Metadata-Version: 2.4
Name: iflow-mcp_mat1312_datapizza-mcp-server
Version: 0.1.0
Summary: MCP server for datapizza-ai documentation and examples
Project-URL: Homepage, https://github.com/datapizza-labs/datapizza-mcp-server
Project-URL: Repository, https://github.com/datapizza-labs/datapizza-mcp-server
Project-URL: Issues, https://github.com/datapizza-labs/datapizza-mcp-server/issues
Author: DataPizza MCP Server
License: MIT
Keywords: ai,datapizza-ai,documentation,mcp,rag
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: datapizza-ai-clients-openai>=0.0.2
Requires-Dist: datapizza-ai-core>=0.0.2
Requires-Dist: datapizza-ai-embedders-openai>=0.0.2
Requires-Dist: datapizza-ai-vectorstores-qdrant>=0.0.2
Requires-Dist: mcp
Requires-Dist: numpy>=1.24.0
Requires-Dist: openai>=1.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: qdrant-client>=1.7.0
Requires-Dist: requests>=2.31.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: mypy>=1.7.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# DataPizza MCP Server 🍕

A Model Context Protocol (MCP) server that provides intelligent access to datapizza-ai documentation through vector similarity search and retrieval-augmented generation.

## Overview

This MCP server lets AI assistants and applications query the datapizza-ai documentation in natural language. It indexes documentation from the datapizza-ai repository and returns contextually relevant answers through a RAG (Retrieval-Augmented Generation) pipeline.

## Features

- **Intelligent Documentation Search**: Natural language queries across datapizza-ai documentation
- **Vector-Based Retrieval**: Uses OpenAI embeddings and Qdrant vector database for semantic search
- **MCP Protocol Compliance**: Standard Model Context Protocol implementation for broad compatibility
- **Automatic Indexing**: Downloads and indexes documentation from GitHub automatically
- **Cloud-Ready**: Supports Qdrant Cloud for scalable vector storage
- **Configurable**: Environment-based configuration for flexible deployment

## Architecture

The server consists of four main components:

- **MCP Server**: FastMCP-based server exposing the `query_datapizza` tool
- **Indexer**: Downloads and processes datapizza-ai documentation into searchable chunks
- **Retriever**: RAG engine for semantic search and response generation
- **Configuration**: Environment-based settings management with validation
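
To make the Retriever's role concrete, here is a minimal, dependency-free sketch of semantic search: rank stored chunks by cosine similarity to a query vector and keep the best matches. It is an illustration only. The actual server delegates embedding and search to OpenAI and Qdrant, and the names `cosine_similarity` and `top_k`, the chunk ids, and the toy 3-dimensional vectors are all assumptions made for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction; treat it as unrelated.
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], chunks: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Rank chunk ids by similarity to the query and keep the best k."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real vectors would have EMBEDDING_DIMENSIONS entries.
index = {
    "agents.md#0": [0.9, 0.1, 0.0],
    "clients.md#0": [0.1, 0.9, 0.0],
    "pipelines.md#0": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results[0][0])  # id of the closest chunk
```

A vector database performs the same ranking with approximate nearest-neighbor indexes, so it scales past the point where a linear scan like this one is practical.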

## Prerequisites

- Python 3.10 or higher
- OpenAI API key
- Qdrant Cloud account and API key
- Internet connection for documentation indexing

## Installation

1. Clone the repository:
```bash
git clone https://github.com/datapizza-labs/mcp_server_datapizza.git
cd mcp_server_datapizza
```

2. Navigate to the package directory:
```bash
cd datapizza-mcp-server
```

3. Install the package with development dependencies:
```bash
pip install -e ".[dev]"
```

## Configuration

Create a `.env` file in the `datapizza-mcp-server` directory with the following variables:

```env
# Required Configuration
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=your_qdrant_cloud_url
QDRANT_API_KEY=your_qdrant_api_key

# Optional Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
COLLECTION_NAME=datapizza_docs
MAX_RESULTS=5
CHUNK_SIZE=1024
CHUNK_OVERLAP=200
LOG_LEVEL=INFO
```

### Required Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key for generating embeddings |
| `QDRANT_URL` | Qdrant Cloud instance URL |
| `QDRANT_API_KEY` | Qdrant Cloud API key |

### Optional Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model |
| `EMBEDDING_DIMENSIONS` | `1536` | Embedding vector dimensions |
| `COLLECTION_NAME` | `datapizza_docs` | Qdrant collection name |
| `MAX_RESULTS` | `5` | Maximum search results returned |
| `CHUNK_SIZE` | `1024` | Document chunk size for indexing |
| `CHUNK_OVERLAP` | `200` | Overlap between document chunks |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
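
The two tables above can be read as a small settings schema. The sketch below shows one way to load and validate them using only the standard library. It is not the package's actual `config.py`: the `Settings` class, the `load_settings` function, and the overlap check are assumptions for illustration, and loading the `.env` file itself (for example with python-dotenv) is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the documented variables and defaults."""

    openai_api_key: str
    qdrant_url: str
    qdrant_api_key: str
    embedding_model: str = "text-embedding-3-small"
    embedding_dimensions: int = 1536
    collection_name: str = "datapizza_docs"
    max_results: int = 5
    chunk_size: int = 1024
    chunk_overlap: int = 200
    log_level: str = "INFO"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ) and validate it."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    settings = Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        qdrant_url=env["QDRANT_URL"],
        qdrant_api_key=env["QDRANT_API_KEY"],
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1536")),
        collection_name=env.get("COLLECTION_NAME", "datapizza_docs"),
        max_results=int(env.get("MAX_RESULTS", "5")),
        chunk_size=int(env.get("CHUNK_SIZE", "1024")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
    # Assumed sanity check: an overlap as large as the chunk would never advance.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings


# Placeholder values only; none of these are real credentials or endpoints.
settings = load_settings({
    "OPENAI_API_KEY": "sk-placeholder",
    "QDRANT_URL": "https://example.invalid",
    "QDRANT_API_KEY": "placeholder",
    "MAX_RESULTS": "3",
})
print(settings.max_results, settings.chunk_size)
```

Failing fast on missing required variables surfaces configuration mistakes at startup rather than on the first query.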

## Usage

### 1. Index Documentation

Before using the server, index the datapizza-ai documentation:

```bash
python -m datapizza_mcp.indexer
```

To force re-indexing (clears existing data):
```bash
python -m datapizza_mcp.indexer --force
```
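
To illustrate what `CHUNK_SIZE` and `CHUNK_OVERLAP` control during indexing, the following sketch splits text into overlapping windows. It assumes both values are measured in characters and that chunks are cut at fixed offsets; the real indexer may count tokens or split on document structure instead, and `chunk_text` is a hypothetical helper, not part of the package.

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # How far each window advances.
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # This window already reached the end of the text.
    return chunks


# Small numbers keep the example readable; the documented defaults are 1024 and 200.
text = "".join(str(i % 10) for i in range(25))
chunks = chunk_text(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.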

### 2. Start the MCP Server

```bash
python -m datapizza_mcp.server
```

Or, on Windows, run the provided batch script, located one directory above the package directory:
```bash
../run_datapizza.bat
```

### 3. Query the Documentation

The server exposes a `query_datapizza` tool that can be called by MCP clients:

```python
# Example query; `client` is assumed to be an already-initialized MCP client session
result = await client.call_tool("query_datapizza", {
    "query": "how to create an agent with OpenAI",
    "max_results": 5,
})
```

## MCP Tools and Resources

### Tools

- **`query_datapizza`**: Search datapizza-ai documentation
  - `query` (string): Natural language search query
  - `max_results` (int, optional): Maximum number of results (default: 5)

### Resources

- **`datapizza://status`**: System status and configuration information
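
For readers curious about what travels over the wire, the sketch below builds the JSON-RPC 2.0 requests an MCP client would send for the tool and the resource listed above, using the protocol's `tools/call` and `resources/read` methods. The `jsonrpc_request` helper, the request ids, and the query text are assumptions for illustration. A real client must complete the MCP initialization handshake first, and the shape of the server's responses is not shown here.

```python
import json


def jsonrpc_request(request_id: int, method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request (hypothetical helper for this example)."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


# Call the `query_datapizza` tool with its two documented arguments.
tool_call = jsonrpc_request(
    1,
    "tools/call",
    {
        "name": "query_datapizza",
        "arguments": {"query": "how to create an agent", "max_results": 5},
    },
)

# Read the `datapizza://status` resource.
status_read = jsonrpc_request(2, "resources/read", {"uri": "datapizza://status"})

print(tool_call)
print(status_read)
```

In practice an MCP client library builds these messages for you; the sketch only shows how the tool name, its arguments, and the resource URI map onto the protocol.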

## Development

### Code Quality Tools

```bash
# Format code
black src/

# Lint code
ruff check src/
ruff check src/ --fix  # Auto-fix issues

# Type checking
mypy src/

# Run tests
pytest
```

### Project Structure

```
datapizza-mcp-server/
├── src/datapizza_mcp/
│   ├── __init__.py          # Package exports
│   ├── config.py            # Configuration management
│   ├── server.py            # MCP server implementation
│   ├── indexer.py           # Documentation indexing
│   └── retriever.py         # RAG retrieval engine
├── pyproject.toml           # Package configuration
├── .env                     # Environment variables
└── README.md                # This file
```

## Dependencies

### Core Dependencies

- **mcp**: Model Context Protocol framework
- **datapizza-ai-core**: Core datapizza-ai functionality
- **datapizza-ai-embedders-openai**: OpenAI embedding integration
- **datapizza-ai-vectorstores-qdrant**: Qdrant vector store integration
- **openai**: OpenAI API client
- **qdrant-client**: Qdrant database client
- **requests**: HTTP client for GitHub API
- **python-dotenv**: Environment variable management
- **datapizza-ai-clients-openai**: OpenAI client integration for datapizza-ai
- **aiohttp**: Asynchronous HTTP client
- **numpy**: Numerical array operations

### Development Dependencies

- **pytest**: Testing framework
- **black**: Code formatter
- **ruff**: Linter and code style checker
- **mypy**: Static type checker
- **pytest-asyncio**: Async test support for pytest

## Troubleshooting

### Common Issues

1. **Authentication Errors**
   - Verify `OPENAI_API_KEY` is set correctly
   - Check Qdrant Cloud credentials (`QDRANT_URL` and `QDRANT_API_KEY`)

2. **Empty Search Results**
   - Ensure documentation is indexed: `python -m datapizza_mcp.indexer`
   - Check system status: query the `datapizza://status` resource

3. **Connection Issues**
   - Verify internet connectivity for GitHub and Qdrant Cloud access
   - Check firewall settings for outbound HTTPS connections

### Debugging

Enable debug logging by setting `LOG_LEVEL=DEBUG` in your `.env` file.
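
As a rough picture of how a `LOG_LEVEL` value can be turned into a logging configuration, here is a standard-library sketch. How the server itself wires this variable is not shown in this README, so the `configure_logging` function, the fallback to `INFO` for unknown values, and the log format are assumptions.

```python
import logging
import os
from typing import Optional

# The four levels documented for LOG_LEVEL.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(level_name: Optional[str] = None) -> int:
    """Configure the root logger from LOG_LEVEL and return the numeric level used."""
    name = (level_name or os.environ.get("LOG_LEVEL", "INFO")).upper()
    level = LEVELS.get(name, logging.INFO)  # Assumed fallback for unknown names.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # Replace any handlers installed by an earlier call.
    )
    return level


configure_logging("debug")
logging.getLogger("datapizza_mcp.example").debug("debug logging is enabled")
```

With `DEBUG` enabled, messages from libraries that use the standard `logging` module also become visible, which can help when diagnosing connection problems.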

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes following the code style guidelines
4. Run the full test suite and code quality checks
5. Submit a pull request

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Support

For issues and questions:
- GitHub Issues: [datapizza-mcp-server/issues](https://github.com/datapizza-labs/datapizza-mcp-server/issues)
- DataPizza AI Documentation: [datapizza-ai](https://github.com/datapizza-labs/datapizza-ai)
