Metadata-Version: 2.4
Name: sqlthought
Version: 1.0.6
Summary: Build a database, ask natural-language questions through agents, and evaluate the results
Author-email: Tiyasa Mukherjee <mukherjeetiyasa1998@gmail.com>
License-Expression: MIT
Keywords: nlqtosql,csvtodb,langgraph,groq,agents
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langgraph>=0.3.3
Requires-Dist: sqlparse
Requires-Dist: pydantic>=2.0
Requires-Dist: psutil
Requires-Dist: Pillow
Requires-Dist: deepeval
Requires-Dist: groq>=0.5.0
Requires-Dist: pandas
Requires-Dist: click
Requires-Dist: litellm
Dynamic: license-file

# 🔗 SQLThought

Are you interested in quickly building a database from CSV files and asking questions in natural language? 

You’ve arrived at the right place!

## Quick Install
```bash
pip install sqlthought
```

## CLI Commands
SQLThought includes a full command-line interface:
- `sqlthought` — Show the steps to follow
- `sqlthought --version` — Show the installed version
- `sqlthought configure` — Configure the Groq API key and LLM
- `sqlthought init` — Create a workspace for setting up the database
- `sqlthought build-db` — Build a SQLite database from CSV files
- `sqlthought query` — Query your database using natural language
- `sqlthought evaluate` — Evaluate core metrics and tool governance metrics
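
A typical end-to-end session chains the commands above in order (only the commands listed here; no extra flags are implied):

```bash
pip install sqlthought
sqlthought configure     # store your Groq API key and model choice
sqlthought init          # create a workspace for the database
sqlthought build-db      # build SQLite tables from your CSV files
sqlthought query         # ask a question in natural language
sqlthought evaluate      # score the run with core and governance metrics
```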


# 🤔 What is this?
SQLThought is built to simplify how people interact with structured data.

Many users know exactly what they want to ask, but writing SQL, understanding joins, tuning filters, or debugging errors takes time.

Traditional NLQ (Natural Language Query) systems jump from text → SQL directly, which often leads to invalid queries and unclear reasoning.

SQLThought takes a more reliable approach:

✅ It thinks step-by-step
It analyzes the schema, decomposes the question, plans the query, generates SQL, executes it, and—if something fails—corrects the SQL automatically.

✅ It builds a database for you
Just place your CSV files in a folder and run one command to generate a queryable SQLite database.
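
Conceptually, building the database amounts to something like the following (a minimal plain-Python sketch, not the actual implementation; `build_db` and its arguments are illustrative):

```python
import csv
import sqlite3
from pathlib import Path

def build_db(csv_dir: str, db_path: str) -> None:
    """Create one SQLite table per CSV file; the table name is the file stem."""
    con = sqlite3.connect(db_path)
    for csv_file in Path(csv_dir).glob("*.csv"):
        with open(csv_file, newline="") as f:
            rows = list(csv.reader(f))
        header, data = rows[0], rows[1:]
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        con.execute(f'CREATE TABLE IF NOT EXISTS "{csv_file.stem}" ({cols})')
        con.executemany(f'INSERT INTO "{csv_file.stem}" VALUES ({placeholders})', data)
    con.commit()
    con.close()
```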

✅ It gives natural language answers
Once SQL is executed, SQLThought converts the results back into a clean, conversational answer powered by Groq.

✅ It calculates core and tool governance metrics
SQLThought reports tool-calling and core performance metrics for the agent execution pipeline.


## 🧩 What are the key features?

### Multi-Agent Reasoning
SQLThought uses LangGraph to orchestrate intelligent pipelines with:
- Stepwise decomposition  
- State-aware execution  
- Deterministic branches  
- Automatic SQL correction loops  
- Fully transparent, debuggable agent workflows  

### Groq-Powered LLM Execution
Fast agentic reasoning using Groq’s API, providing:
- Low-latency inference  
- Predictable outputs  
- Easy model switching  
- Secure local configuration  

### Modular, Extensible Architecture
Every reasoning stage is isolated and replaceable.

### Secure Local Config Storage
```
~/.sqlthought/config.json
```
Stores API keys and model preferences locally (never uploaded or logged).

### NLQ-to-SQL Conversion
The first reasoning module shipped with SQLThought is a full NLQ → SQL agentic system with:

* Schema understanding
* Subtask decomposition
* Query planning
* SQL generation
* SQL execution
* Automatic correction loops
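
The stages above can be sketched as a plain-Python loop with a bounded correction step (illustrative only; the real pipeline is orchestrated with LangGraph, and `generate_sql`/`repair_sql` are hypothetical stand-ins for the LLM-backed steps):

```python
import sqlite3

def run_pipeline(question: str, con: sqlite3.Connection,
                 generate_sql, repair_sql, max_retries: int = 2):
    """Generate SQL for a question, execute it, and retry on failure."""
    sql = generate_sql(question)  # schema analysis + decomposition + planning + generation
    for attempt in range(max_retries + 1):
        try:
            return sql, con.execute(sql).fetchall()  # execution
        except sqlite3.Error as err:
            if attempt == max_retries:
                raise
            sql = repair_sql(sql, str(err))          # automatic correction loop
```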

### Evaluation Capabilities
Built-in unsupervised evaluation pipeline to assess the quality, reliability, and safety of generated SQL without requiring ground-truth labels.

**Core Metrics**
* Schema Adherence — Checks whether referenced tables and columns exist in the actual database schema. (Ensures structural correctness.)
* SQL Complexity Score — Rates query complexity using join count, subqueries, and length-based penalties. (Helps identify overly complex or inefficient SQL.)
* SQL Safety Score — Detects risky SQL patterns (e.g., DROP TABLE, UNION SELECT, OR 1=1). (Prevents unsafe or injection-like queries.)
* Performance Metrics
    * Execution Time — How long the SQL took to run.
    * CPU % — CPU load during execution.
    * Memory Usage — Memory delta while running query.
    * Disk I/O — Bytes read from disk.
    * Rows Returned — Output size indicator.
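
For example, a safety check along these lines could flag the risky patterns listed above (a simplified sketch; the actual scoring formula is not shown here):

```python
import re

# Patterns named in the metric description; a real checker would cover more.
RISKY_PATTERNS = [r"\bdrop\s+table\b", r"\bunion\s+select\b", r"\bor\s+1\s*=\s*1\b"]

def sql_safety_score(sql: str) -> float:
    """Return 1.0 when no risky pattern matches, decreasing as more are found."""
    hits = sum(bool(re.search(p, sql, re.IGNORECASE)) for p in RISKY_PATTERNS)
    return 1.0 - hits / len(RISKY_PATTERNS)
```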

**Tool Governance Metrics**
Evaluates how effectively the agentic pipeline performs each reasoning step:
* Tool Call Accuracy — % of required reasoning tools that executed correctly.
* Tool Efficiency — Measures how few correction loops or retries were needed.
* Tool Diversity — Coverage of all agent nodes (schema → subproblem → plan → SQL → execution).
* Pipeline Completion — Whether the end-to-end pipeline successfully produced a final SQL result.
* Node Count — Number of agentic reasoning steps executed during the run. (Useful to detect overthinking or under-utilization.)
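
As an illustration, the accuracy and completion metrics could reduce to something like this (hypothetical helper names; the node names follow the pipeline order above):

```python
REQUIRED_NODES = ["schema", "subproblem", "plan", "sql", "execution"]

def tool_call_accuracy(executed: list[str]) -> float:
    """Fraction of required reasoning nodes that actually ran."""
    return sum(node in executed for node in REQUIRED_NODES) / len(REQUIRED_NODES)

def pipeline_completion(executed: list[str]) -> bool:
    """True when the final execution node ran end-to-end."""
    return "execution" in executed
```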

## 💁 Contributing
Contributions, feature ideas, and pull requests are welcome!
More documentation and developer guides will be added soon.
