Metadata-Version: 2.4
Name: nl2viz-pro
Version: 1.0.0
Summary: Natural Language to Visualization Engine
Home-page: https://github.com/kamaldheen20/nl2viz-pro
Author: AI Systems
Author-email: dheenkamal681@gmail.com
Project-URL: Bug Reports, https://github.com/kamaldheen20/nl2viz-pro/issues
Project-URL: Documentation, https://github.com/kamaldheen20/nl2viz-pro#readme
Project-URL: Source Code, https://github.com/kamaldheen20/nl2viz-pro
Keywords: visualization natural-language nlp data-analysis charts
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Visualization
Classifier: Topic :: Office/Business
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: seaborn>=0.11.0
Requires-Dist: plotly>=5.0.0
Requires-Dist: openpyxl>=3.0.0
Requires-Dist: gradio>=3.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🎨 NL2Viz Pro — Natural Language to Visualization Engine

A **production-grade Python library** that converts natural language queries and datasets into beautiful, insightful visualizations automatically.

## 🎯 Features

✅ **Auto-detect data formats** (CSV, Excel, JSON, SQLite)
✅ **Smart intent parsing** from natural language queries
✅ **Automatic chart type selection** based on data schema
✅ **Multiple visualization libraries** (Matplotlib, Seaborn, Plotly)
✅ **Fuzzy column matching** with typo correction
✅ **Auto insights generation** (trends, correlations, outliers)
✅ **Interactive charts** (Plotly support)
✅ **CLI & Python API**
✅ **Comprehensive error handling**
✅ **Production-ready codebase**

## 📊 Supported Chart Types

- Bar Charts
- Line Charts
- Pie Charts
- Histograms
- Scatter Plots
- Box Plots
- Heatmaps
- Violin Plots
- Area Charts
- Bubble Charts
- Correlation Matrix
- Pairplot

## 📦 Installation

### Via pip (when published)

```bash
pip install nl2viz-pro
```

### From source

```bash
git clone <repository>
cd nl2viz_pro
pip install -r requirements.txt
pip install -e .
```

## 🚀 Quick Start

### Python API

```python
from nl2viz_pro import visualize

# Quick one-liner
visualize("show sales by region", "data.csv")

# Or with more control
result = visualize(
    "compare profit by category",
    "data.xlsx",
    library='plotly',
    show=True,
    save='output.html'
)

# Access results
insights = result['insights']
figure = result['figure']
data = result['data']
```

### Load Data Once, Query Multiple Times

```python
from nl2viz_pro import NL2VizPro

viz = NL2VizPro()
viz.load_data("data.csv")

# Query multiple times
viz.visualize("show sales by region")
viz.visualize("profit trend over time")
viz.visualize("distribution of quantities")
```

### Command Line

```bash
# Basic usage
nl2viz "show sales by region" data.csv

# With options
nl2viz "correlation heatmap" data.xlsx --library plotly --save chart.html

# Verbose mode
nl2viz "analyze profit" data.json --library seaborn --verbose

# Don't display (just save)
nl2viz "sales distribution" data.csv --save output.png --no-show
```
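
The flags used above can be modeled with a small `argparse` parser. This is a minimal sketch of how such a command line might be declared, not the contents of `cli.py`; the function name `build_parser` and the help strings are assumptions, while the flag names and the Matplotlib default come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting the flags shown in the CLI examples (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="nl2viz")
    parser.add_argument("query", help="natural language query")
    parser.add_argument("data_source", help="path to the data file")
    parser.add_argument(
        "--library",
        choices=["matplotlib", "seaborn", "plotly"],
        default="matplotlib",
    )
    parser.add_argument("--save", default=None, help="output file path")
    parser.add_argument("--verbose", action="store_true")
    # --no-show flips `show` from its default of True to False.
    parser.add_argument("--no-show", dest="show", action="store_false")
    return parser


args = build_parser().parse_args(
    ["sales distribution", "data.csv", "--save", "output.png", "--no-show"]
)
print(args.library, args.save, args.show)
```

Storing `--no-show` under `dest="show"` keeps the parsed namespace aligned with the `show=` keyword of the Python API.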

## 📚 API Reference

### `NL2VizPro` Class

Main class for the visualization pipeline.

#### Methods

##### `load_data(data_source)`
Load data from various sources.

```python
# From CSV
viz.load_data("data.csv")

# From Excel
viz.load_data("data.xlsx")

# From JSON
viz.load_data("data.json")

# From DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
viz.load_data(df)

# Returns self for chaining
viz.load_data("data.csv").quick_stats()
```

##### `visualize(query, data_source, library, **kwargs)`
Convert natural language query to visualization.

```python
result = viz.visualize(
    query="show sales by region",
    data_source="data.csv",  # Optional if already loaded
    library='matplotlib',     # 'matplotlib', 'seaborn', 'plotly'
    show=True,               # Display chart
    save='output.png',       # Save to file
    show_insights=True       # Print insights
)
```

**Returns:** A dictionary with the following keys:
- `figure`: Visualization object
- `insights`: Generated insights
- `intent`: Parsed query intent
- `data`: Processed visualization data
- `processed_data`: Full processed dataset

##### `get_columns()`
Get list of available columns.

```python
columns = viz.get_columns()
print(columns)  # ['Date', 'Region', 'Sales', ...]
```

##### `suggest_column(query_col)`
Get fuzzy match suggestion for a column name.

```python
match = viz.suggest_column("revenu")  # Returns "revenue" (typo corrected)
```

##### `get_data_info()`
Get detailed data information.

```python
info = viz.get_data_info()
# {
#   'shape': (100, 5),
#   'columns': [...],
#   'dtypes': {...},
#   'missing_values': {...},
#   'missing_percentage': {...}
# }
```

##### `quick_stats()`
Print quick statistical summary.

```python
viz.quick_stats()
```

##### `set_style(style, palette, figure_size)`
Configure visualization style.

```python
viz.set_style(
    style='darkgrid',
    palette='husl',
    figure_size=(14, 8)
)
```

### `visualize()` Convenience Function

One-liner for quick visualization.

```python
from nl2viz_pro import visualize

result = visualize(
    "show sales by region",
    "data.csv",
    library='matplotlib'
)
```

## 🧠 Natural Language Query Examples

### Chart Type Detection

```python
# Automatically detects line chart (time-based)
visualize("show profit over time", "data.csv")

# Detects bar chart (categorical)
visualize("compare sales by region", "data.csv")

# Detects scatter (two numeric columns)
visualize("relationship between price and quantity", "data.csv")

# Detects distribution
visualize("analyze sales distribution", "data.csv")

# Detects correlation
visualize("correlation heatmap", "data.csv")
```
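
One way to picture this kind of detection is a list of keyword rules with a fallback on column types. The sketch below is illustrative only and is not taken from `intent_engine.py`; the rule table, the function name `pick_chart_type`, and the type labels are assumptions.

```python
from typing import Optional

# Hypothetical keyword rules, checked in order; the first match wins.
KEYWORD_RULES = [
    (("correlation", "heatmap"), "heatmap"),
    (("distribution", "histogram"), "histogram"),
    (("over time", "trend"), "line"),
    (("relationship", " vs "), "scatter"),
    (("compare", " by "), "bar"),
]


def pick_chart_type(
    query: str,
    x_type: Optional[str] = None,
    y_type: Optional[str] = None,
) -> str:
    """Pick a chart type from query keywords, then from column types."""
    padded = f" {query.lower()} "
    for keywords, chart in KEYWORD_RULES:
        if any(keyword in padded for keyword in keywords):
            return chart
    # No keyword matched: fall back to the detected column types.
    if x_type == "datetime":
        return "line"
    if x_type == "numeric" and y_type == "numeric":
        return "scatter"
    return "bar"


print(pick_chart_type("show profit over time"))
print(pick_chart_type("show units", x_type="numeric", y_type="numeric"))
```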

### Column Matching

Fuzzy matching handles typos and synonyms:

```python
# Typo correction
visualize("show revenu by region", "data.csv")  # 'revenu' → 'revenue'

# Synonym matching
visualize("show earnings by category", "data.csv")  # 'earnings' → 'profit'

# Partial matching
visualize("compare sell by region", "data.csv")  # 'sell' → 'sales'
```
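
Typo and synonym matching of this kind can be approximated with the standard library's `difflib`. The following is a rough sketch, not the code in `fuzzy_match.py`; the `SYNONYMS` table, the `match_column` name, and the `0.6` cutoff are assumptions chosen for illustration.

```python
import difflib
from typing import List, Optional

# Hypothetical synonym table; a real one would be much larger.
SYNONYMS = {"earnings": "profit", "income": "revenue"}


def match_column(term: str, columns: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the column that best matches `term`, or None if nothing is close."""
    lowered = {column.lower(): column for column in columns}
    term = term.lower()
    if term in lowered:  # exact, case-insensitive match
        return lowered[term]
    synonym = SYNONYMS.get(term)
    if synonym in lowered:  # known synonym of an existing column
        return lowered[synonym]
    # Fall back to similarity scoring to absorb typos.
    close = difflib.get_close_matches(term, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


columns = ["Revenue", "Region", "Profit"]
print(match_column("revenu", columns))
print(match_column("earnings", columns))
```

A stricter cutoff reduces false matches but rejects short partial terms, so the threshold is a trade-off rather than a fixed value.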

### Aggregations

```python
visualize("total sales by region", "data.csv")      # sum
visualize("average profit by month", "data.csv")    # mean
visualize("count of orders by category", "data.csv") # count
visualize("max price by product", "data.csv")        # max
```
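
The mapping from query words to an aggregation can be sketched as a keyword lookup. This is an assumption about how such parsing might work, not the library's implementation; the keyword table, the `detect_aggregation` name, and the `sum` default are hypothetical. The returned names are valid aggregation strings for pandas `groupby(...).agg(...)`.

```python
import re

# Hypothetical keyword table: aggregation name -> trigger words.
AGG_KEYWORDS = {
    "sum": ("total", "sum"),
    "mean": ("average", "avg", "mean"),
    "count": ("count", "number of"),
    "max": ("max", "maximum", "highest"),
    "min": ("min", "minimum", "lowest"),
}


def detect_aggregation(query: str, default: str = "sum") -> str:
    """Return the first aggregation whose keyword appears as a whole word."""
    lowered = query.lower()
    for aggregation, keywords in AGG_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries stop "sum" from matching inside "summary".
            if re.search(rf"\b{re.escape(keyword)}\b", lowered):
                return aggregation
    return default


for query in ("total sales by region", "average profit by month"):
    print(query, "->", detect_aggregation(query))
```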

## 🔍 Data Format Support

### CSV

```python
visualize("show data", "data.csv")

# Supported options: encoding, delimiter, etc.
```

### Excel

```python
visualize("show data", "data.xlsx")  # Reads first sheet
visualize("show data", "data.xls")   # Also works
```

### JSON

```python
# Array of objects
visualize("show data", "data.json")

# Nested JSON (auto-flattened)
visualize("show data", "data.json")
```
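
Flattening nested JSON usually means turning nested objects into dot-separated column names, which is also what `pandas.json_normalize` does for records. The sketch below shows the idea with the standard library only; the `flatten` helper is hypothetical and may differ from how the library flattens data.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level with joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


order = {"id": 1, "customer": {"name": "Ana", "address": {"city": "Lima"}}}
print(flatten(order))
```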

### SQLite

```python
visualize("show data", "database.db")           # First table
visualize("show data", "sqlite:///database.db?table=sales")  # Specific table
```
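
The two source forms above can be told apart by parsing the string and, when no table is named, asking SQLite for its first table. This is a sketch under assumptions: the helper names are hypothetical, and "first table" is taken here to mean the earliest entry in `sqlite_master`, which may not match the library's rule.

```python
import sqlite3
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_sqlite_source(source: str) -> Tuple[str, Optional[str]]:
    """Split a source string into (database path, table name or None)."""
    if source.startswith("sqlite:///"):
        parsed = urlparse(source)
        path = parsed.path[1:]  # drop the single leading "/" of the URL path
        table = parse_qs(parsed.query).get("table", [None])[0]
        return path, table
    return source, None


def first_table(conn: sqlite3.Connection) -> Optional[str]:
    """Return the name of the earliest user table, or None if there is none."""
    row = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY rowid LIMIT 1"
    ).fetchone()
    return row[0] if row else None


print(parse_sqlite_source("sqlite:///database.db?table=sales"))
```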

### Pandas DataFrame

```python
import pandas as pd

df = pd.read_csv("data.csv")
visualize("show data", df)
```

## 📊 Architecture

```
nl2viz_pro/
├── core/
│   ├── input_handler.py      # Data loading
│   ├── schema_analyzer.py    # Data type detection
│   ├── intent_engine.py      # NLP query parsing
│   ├── processor.py          # Data processing
│   ├── visualizer.py         # Chart creation
│   └── insight_engine.py     # Insight generation
├── utils/
│   ├── fuzzy_match.py        # Column matching
│   └── validator.py          # Input validation
├── connectors/               # Data connectors
├── api.py                    # Main API
├── cli.py                    # CLI
├── example.py                # Examples
├── setup.py                  # Installation
└── requirements.txt          # Dependencies
```

## 🔄 Processing Pipeline

1. **Input Handler** → Auto-detect and load data
2. **Schema Analyzer** → Detect column types (numeric, categorical, datetime)
3. **Intent Engine** → Parse query, extract visualization intent
4. **Data Processor** → Apply filters, aggregations, transformations
5. **Visualizer** → Create chart using selected library
6. **Insight Engine** → Generate insights (trends, correlations, outliers)
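
Step 2 of the pipeline above, column type detection, can be illustrated with a simple rule: a column is numeric if every value parses as a number, datetime if every value parses as an ISO date, and categorical otherwise. This is a simplified sketch with a hypothetical `infer_column_type` helper; the actual schema analyzer may use different rules.

```python
from datetime import datetime
from typing import Iterable


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def infer_column_type(values: Iterable[str]) -> str:
    """Classify a column of raw strings as numeric, datetime, or categorical."""
    present = [value for value in values if value]  # ignore empty cells
    if not present:
        return "categorical"
    if all(_is_number(value) for value in present):
        return "numeric"
    if all(_is_iso_date(value) for value in present):
        return "datetime"
    return "categorical"


print(infer_column_type(["1", "2.5", ""]))
print(infer_column_type(["2024-01-05", "2024-02-01"]))
print(infer_column_type(["North", "South"]))
```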

## 🎨 Visualization Libraries

Switch between visualization libraries:

```python
# Matplotlib (default)
visualize("show data", "data.csv", library='matplotlib')

# Seaborn (statistical)
visualize("show data", "data.csv", library='seaborn')

# Plotly (interactive)
visualize("show data", "data.csv", library='plotly')
```

## 💡 Auto Insights

Each call generates insights automatically and returns them under the `insights` key:

```python
result = visualize("show data", "data.csv")

# Access insights
insights = result['insights']

# {
#   'summary': {...},           # Basic stats
#   'trends': [...],            # Trend analysis
#   'correlations': [...],      # Correlations
#   'outliers': [...],          # Outlier detection
#   'top_bottom': {...},        # Top/bottom values
#   'distributions': {...}      # Distribution analysis
# }
```
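
As an illustration of what an `outliers` entry could be based on, the sketch below flags values outside the interquartile-range fences. The IQR rule, the `find_outliers` name, and the `k=1.5` multiplier are assumptions for illustration; the insight engine may use another method.

```python
import statistics
from typing import List, Sequence


def find_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values lying more than k * IQR outside the first or third quartile."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if value < low or value > high]


print(find_outliers([10, 12, 11, 13, 12, 11, 100]))
```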

## 🛠️ Error Handling

Errors surface as standard Python exceptions:

```python
from nl2viz_pro import NL2VizPro, visualize

try:
    visualize("show data", "missing_file.csv")
except FileNotFoundError:
    print("File not found")

try:
    visualize("show invalid_column", "data.csv")
except ValueError:
    print("Column not found - suggestions available")

# Get column suggestions
viz = NL2VizPro()
viz.load_data("data.csv")
suggest = viz.suggest_column("revenu")  # Returns closest match
```

## 📈 Advanced Usage

### Custom Processing

```python
from nl2viz_pro import NL2VizPro
from nl2viz_pro.core import DataProcessor, IntentEngine

viz = NL2VizPro()
viz.load_data("data.csv")

# Custom intent
intent = {
    'chart_type': 'bar',
    'x_column': 'Region',
    'y_column': 'Sales',
    'aggregation': 'sum',
    'filters': [{'column': 'Sales', 'operator': '>', 'value': '1000'}],
}

# Process data
df_processed = DataProcessor.process(viz.df, intent)

# Create visualization
from nl2viz_pro.core import Visualizer
fig = Visualizer.create(df_processed, 'bar', 'Region', 'Sales')
Visualizer.show(fig)
```
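
The `filters` entry in the intent dictionary pairs a column with an operator and a value. One plausible way to evaluate such filters is shown below on plain dictionaries; this is a sketch, not `DataProcessor` itself, and the `apply_filters` helper and its string-to-number coercion are assumptions.

```python
import operator
from typing import Any, Dict, List

# Map operator strings from the intent to comparison functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def apply_filters(rows: List[Dict[str, Any]], filters: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the rows that satisfy every filter."""

    def keep(row: Dict[str, Any]) -> bool:
        for spec in filters:
            left, right = row[spec["column"]], spec["value"]
            # Filter values may arrive as strings such as '1000'.
            if isinstance(left, (int, float)) and isinstance(right, str):
                right = float(right)
            if not OPERATORS[spec["operator"]](left, right):
                return False
        return True

    return [row for row in rows if keep(row)]


rows = [{"Region": "North", "Sales": 1500}, {"Region": "South", "Sales": 800}]
filters = [{"column": "Sales", "operator": ">", "value": "1000"}]
print(apply_filters(rows, filters))
```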

### Multiple Queries on Same Data

```python
from nl2viz_pro import NL2VizPro

viz = NL2VizPro()
viz.load_data("large_dataset.csv")

queries = [
    "show sales by region",
    "profit trend",
    "distribution of quantities",
    "correlation matrix",
]

# enumerate() gives each chart a unique index, even if a query repeats
for i, query in enumerate(queries):
    result = viz.visualize(query, show=False, save=f"chart_{i}.png")
```

### Batch Processing

```python
from pathlib import Path

from nl2viz_pro import NL2VizPro

viz = NL2VizPro()
Path("output").mkdir(exist_ok=True)  # make sure the save directory exists

for csv_file in Path("data/").glob("*.csv"):
    viz.load_data(csv_file)
    result = viz.visualize(
        "show summary",
        save=f"output/{csv_file.stem}.png"
    )
```

## 🧪 Testing

Run examples:

```bash
python example.py
```

The script runs seven complete examples covering:
1. Basic usage
2. Load and query multiple times
3. DataFrame input
4. Different libraries
5. Advanced queries
6. Schema inspection
7. JSON data

## 📋 Requirements

- Python 3.8+
- pandas >= 1.3.0
- numpy >= 1.20.0
- matplotlib >= 3.5.0
- seaborn >= 0.11.0
- plotly >= 5.0.0
- openpyxl >= 3.0.0 (Excel support)
- gradio >= 3.0.0

## 🚀 Performance

- Handles datasets up to 1M rows
- Smart sampling for large datasets
- Efficient fuzzy matching
- Optimized for common queries
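
Sampling for large datasets can be as simple as drawing a fixed number of rows at random when the input exceeds a cap. The sketch below is an assumption about what such sampling might look like; the `sample_rows` name, the `max_rows` cap, and the fixed seed are hypothetical and not taken from the library.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], max_rows: int = 10_000, seed: int = 42) -> List[T]:
    """Return at most `max_rows` rows, sampled uniformly and kept in original order."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps repeated charts identical
    indices = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in indices]


subset = sample_rows(list(range(1_000_000)), max_rows=5)
print(len(subset))
```

Sorting the sampled indices preserves row order, which matters for time-based charts.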

## 🔮 Future Improvements

- [ ] Gradio UI for web interface
- [ ] Export charts to multiple formats
- [ ] Dashboard mode with multiple visualizations
- [ ] Custom color schemes
- [ ] Caching for repeated queries
- [ ] Multi-file analysis
- [ ] SQL query generation
- [ ] Advanced filtering UI
- [ ] Recommendation engine
- [ ] Integration with APIs

## 📄 License

MIT License. See the LICENSE file for details.

## 📧 Support

For issues, suggestions, or contributions:
- GitHub Issues: https://github.com/kamaldheen20/nl2viz-pro/issues
- Email: dev@nl2viz.com

## 🙌 Contributing

Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

---

**Made with ❤️ for data visualization**
