Metadata-Version: 2.4
Name: localdata-mcp
Version: 1.0.3
Summary: A dynamic MCP server for local databases and text files.
Author-email: "Christian C. Berclaz" <christian@berclaz.org>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ChristianBerclaz/localdata-mcp
Project-URL: Repository, https://github.com/ChristianBerclaz/localdata-mcp
Project-URL: Documentation, https://github.com/ChristianBerclaz/localdata-mcp#readme
Project-URL: Bug Tracker, https://github.com/ChristianBerclaz/localdata-mcp/issues
Keywords: mcp,model-context-protocol,database,ai,assistant
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastmcp
Requires-Dist: pandas>=1.3.0
Requires-Dist: sqlalchemy>=1.4.0
Requires-Dist: psycopg2-binary
Requires-Dist: mysql-connector-python
Requires-Dist: pyyaml>=5.4.0
Requires-Dist: toml>=0.10.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pandas-stubs; extra == "dev"
Requires-Dist: types-toml; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Dynamic: license-file

# LocalData MCP Server

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![PyPI version](https://img.shields.io/pypi/v/localdata-mcp.svg)](https://pypi.org/project/localdata-mcp/)
[![FastMCP](https://img.shields.io/badge/FastMCP-Compatible-green.svg)](https://github.com/jlowin/fastmcp)

**A powerful, secure MCP server for local databases and structured text files with advanced security features and large dataset handling.**

## ✨ Features

### 🗄️ **Multi-Database Support**

- **SQL Databases**: PostgreSQL, MySQL, SQLite
- **Document Databases**: MongoDB
- **Structured Files**: CSV, JSON, YAML, TOML

### 🔒 **Advanced Security**

- **Path Security**: Restricts file access to current working directory only
- **SQL Injection Prevention**: Parameterized queries and safe table identifiers
- **Connection Limits**: Maximum 10 concurrent database connections
- **Input Validation**: Comprehensive validation and sanitization

### 📊 **Large Dataset Handling**

- **Query Buffering**: Automatic buffering for results with 100+ rows
- **Large File Support**: 100MB+ files automatically use temporary SQLite storage
- **Chunk Retrieval**: Paginated access to large result sets
- **Auto-Cleanup**: 10-minute expiry with file modification detection

### 🛠️ **Developer Experience**

- **Comprehensive Tools**: 12 database operation tools
- **Error Handling**: Detailed, actionable error messages
- **Thread Safety**: Concurrent operation support
- **Backward Compatible**: All existing APIs preserved

## 🚀 Quick Start

### Installation

```bash
# Using pip
pip install localdata-mcp

# Using uv (recommended)
uv tool install localdata-mcp

# Development installation
git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
pip install -e .
```

### Configuration

Add to your MCP client configuration:

```json
{
  "mcpServers": {
    "localdata": {
      "command": "localdata-mcp",
      "env": {}
    }
  }
}
```

### Usage Examples

#### Connect to Databases

```python
# PostgreSQL
connect_database("analytics", "postgresql", "postgresql://user:pass@localhost/db")

# SQLite
connect_database("local", "sqlite", "./data.sqlite")

# CSV Files
connect_database("csvdata", "csv", "./data.csv")

# JSON Files
connect_database("config", "json", "./config.json")
```

#### Query Data

```python
# Execute queries with automatic result formatting
execute_query("analytics", "SELECT * FROM users LIMIT 50")

# Large result sets use buffering automatically
execute_query_json("analytics", "SELECT * FROM large_table")
```

#### Handle Large Results

```python
# Get chunked results for large datasets
get_query_chunk("analytics_1640995200_a1b2", 101, "100")

# Check buffer status
get_buffered_query_info("analytics_1640995200_a1b2")

# Manual cleanup
clear_query_buffer("analytics_1640995200_a1b2")
```

## 🔧 Available Tools

| Tool                      | Description                | Use Case      |
| ------------------------- | -------------------------- | ------------- |
| `connect_database`        | Connect to databases/files | Initial setup |
| `disconnect_database`     | Close connections          | Cleanup       |
| `list_databases`          | Show active connections    | Status check  |
| `execute_query`           | Run SQL (markdown output)  | Small results |
| `execute_query_json`      | Run SQL (JSON output)      | Large results |
| `describe_database`       | Show schema/structure      | Exploration   |
| `describe_table`          | Show table details         | Analysis      |
| `get_table_sample`        | Preview table data         | Quick look    |
| `get_table_sample_json`   | Preview (JSON format)      | Development   |
| `find_table`              | Locate tables by name      | Navigation    |
| `read_text_file`          | Read structured files      | File access   |
| `get_query_chunk`         | Paginated result access    | Large data    |
| `get_buffered_query_info` | Buffer status info         | Monitoring    |
| `clear_query_buffer`      | Manual buffer cleanup      | Management    |

## 📋 Supported Data Sources

### SQL Databases

- **PostgreSQL**: Full support with connection pooling
- **MySQL**: Complete MySQL/MariaDB compatibility
- **SQLite**: Local file and in-memory databases

### Document Databases

- **MongoDB**: Collection queries and aggregation

### Structured Files

- **CSV**: Large file automatic SQLite conversion
- **JSON**: Nested structure flattening
- **YAML**: Configuration file support
- **TOML**: Settings and config files

## 🛡️ Security Features

### Path Security

```python
# ✅ Allowed - current directory and subdirectories
"./data/users.csv"
"data/config.json"
"subdir/file.yaml"

# ❌ Blocked - parent directory access
"../etc/passwd"
"../../sensitive.db"
"/etc/hosts"
```

### SQL Injection Prevention

```python
# ✅ Safe - parameterized queries
describe_table("mydb", "users")  # Validates table name

# ❌ Blocked - malicious input
describe_table("mydb", "users; DROP TABLE users; --")
```

### Resource Limits

- **Connection Limit**: Maximum 10 concurrent connections
- **File Size Threshold**: 100MB triggers temporary storage
- **Query Buffering**: Automatic for 100+ row results
- **Auto-Cleanup**: Buffers expire after 10 minutes

## 📊 Performance & Scalability

### Large File Handling

- Files over 100MB automatically use temporary SQLite storage
- Memory-efficient streaming for large datasets
- Automatic cleanup of temporary files

### Query Optimization

- Results with 100+ rows automatically use buffering system
- Chunk-based retrieval for large datasets
- File modification detection for cache invalidation

### Concurrency

- Thread-safe connection management
- Concurrent query execution support
- Resource pooling and limits

## 🧪 Testing & Quality

**✅ 100% Test Coverage**

- 100+ comprehensive test cases
- Security vulnerability testing
- Performance benchmarking
- Edge case validation

**🔒 Security Validated**

- Path traversal prevention
- SQL injection protection
- Resource exhaustion testing
- Malicious input handling

**⚡ Performance Tested**

- Large file processing
- Concurrent connection handling
- Memory usage optimization
- Query response times

## 🔄 API Compatibility

All existing MCP tool signatures remain **100% backward compatible**. New functionality is additive only:

- ✅ All original tools work unchanged
- ✅ Enhanced responses with additional metadata
- ✅ New buffering tools for large datasets
- ✅ Improved error messages and validation

## 📖 Examples

### Basic Database Operations

```python
# Connect to SQLite
connect_database("sales", "sqlite", "./sales.db")

# Explore structure
describe_database("sales")
describe_table("sales", "orders")

# Query data
execute_query("sales", "SELECT product, SUM(amount) FROM orders GROUP BY product")
```

### Large Dataset Processing

```python
# Connect to large CSV
connect_database("bigdata", "csv", "./million_records.csv")

# Query returns buffer info for large results
result = execute_query_json("bigdata", "SELECT * FROM data WHERE category = 'A'")

# Access results in chunks
chunk = get_query_chunk("bigdata_1640995200_a1b2", 1, "1000")
```

### Multi-Database Analysis

```python
# Connect multiple sources
connect_database("postgres", "postgresql", "postgresql://localhost/prod")
connect_database("config", "yaml", "./config.yaml")
connect_database("logs", "json", "./logs.json")

# Query across sources (in application logic)
user_data = execute_query("postgres", "SELECT * FROM users")
config = read_text_file("./config.yaml", "yaml")
```

## 🚧 Roadmap

- [ ] **Enhanced File Formats**: Excel, Parquet support
- [ ] **Caching Layer**: Configurable query result caching
- [ ] **Connection Pooling**: Advanced connection management
- [ ] **Streaming APIs**: Real-time data processing
- [ ] **Monitoring Tools**: Connection and performance metrics

## 🤝 Contributing

Contributions welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Setup

```bash
git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"
pytest
```

## 📄 License

MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **GitHub**: [localdata-mcp](https://github.com/ChrisGVE/localdata-mcp)
- **PyPI**: [localdata-mcp](https://pypi.org/project/localdata-mcp/)
- **MCP Protocol**: [Model Context Protocol](https://modelcontextprotocol.io/)
- **FastMCP**: [FastMCP Framework](https://github.com/jlowin/fastmcp)

## 📊 Stats

![GitHub stars](https://img.shields.io/github/stars/ChrisGVE/localdata-mcp?style=social)
![GitHub forks](https://img.shields.io/github/forks/ChrisGVE/localdata-mcp?style=social)
![PyPI downloads](https://img.shields.io/pypi/dm/localdata-mcp)

## 📚 Additional Resources

- **[FAQ](FAQ.md)**: Common questions and troubleshooting
- **[Troubleshooting Guide](TROUBLESHOOTING.md)**: Comprehensive problem resolution
- **[Advanced Examples](ADVANCED_EXAMPLES.md)**: Production-ready usage patterns
- **[Blog Post](BLOG_POST.md)**: Technical deep dive and use cases

## 🤔 Need Help?

- **Issues**: [GitHub Issues](https://github.com/ChrisGVE/localdata-mcp/issues)
- **Discussions**: [GitHub Discussions](https://github.com/ChrisGVE/localdata-mcp/discussions)
- **Email**: Available in GitHub profile
- **Community**: Join MCP community forums

## 🏷️ Tags

`mcp` `model-context-protocol` `database` `postgresql` `mysql` `sqlite` `mongodb` `csv` `json` `yaml` `toml` `ai` `machine-learning` `data-integration` `python` `security` `performance`

---

**Made with ❤️ for the MCP Community**
