Metadata-Version: 2.4
Name: async-mariadb-connector
Version: 0.1.2
Summary: A lightweight, production-grade, and asynchronous Python connector for MariaDB.
Author-email: Chanikya Nelapatla <chanikyac01@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/chanikkyasaai/async-mariadb-ml
Project-URL: Bug Tracker, https://github.com/chanikkyasaai/async-mariadb-ml/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiomysql>=0.2.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: tenacity>=8.2.3
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: faker>=20.0.0; extra == "dev"
Requires-Dist: langchain-core>=0.1.40; extra == "dev"
Requires-Dist: sentence-transformers>=2.2.2; extra == "dev"
Requires-Dist: numpy>=1.26.0; extra == "dev"
Requires-Dist: langsmith>=0.1.0; extra == "dev"
Requires-Dist: httpx>=0.25.0; extra == "dev"
Requires-Dist: unittest-xml-reporting>=3.2.0; extra == "dev"
Requires-Dist: tabulate>=0.9.0; extra == "dev"
Dynamic: license-file

# Async MariaDB Python Connector

[![PyPI version](https://img.shields.io/pypi/v/async-mariadb-connector.svg?v=2)](https://pypi.org/project/async-mariadb-connector/)
[![Python Version](https://img.shields.io/pypi/pyversions/async-mariadb-connector.svg?v=2)](https://pypi.org/project/async-mariadb-connector/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/chanikkyasaai/async-mariadb-ml/actions/workflows/ci.yml/badge.svg)](https://github.com/chanikkyasaai/async-mariadb-ml/actions)
[![Downloads](https://img.shields.io/pypi/dm/async-mariadb-connector.svg?v=2)](https://pypi.org/project/async-mariadb-connector/)

A lightweight, production-grade, and asynchronous Python connector for MariaDB, designed for high-performance data operations in modern AI/ML and web applications.

---

## The Problem: MariaDB's Python Ecosystem Has a Performance Bottleneck

MariaDB is a powerful and reliable database, but the official Python connector (`mariadb`) operates **synchronously**. This means your application blocks and waits for every single query to finish, creating a massive performance bottleneck in modern, I/O-bound applications.

This is especially problematic for:

-   **AI/ML Pipelines:** Loading large datasets for training or performing bulk embedding inserts for RAG systems becomes slow and inefficient.
-   **Web APIs:** High-traffic web servers struggle to handle concurrent requests when each database call is a blocking operation.
-   **Data Processing:** Any workflow requiring many simultaneous database interactions is severely limited.

## The Solution: A High-Level, Production-Ready Async Connector

This project, `async-mariadb-connector`, was built to solve this exact problem. It provides a high-level, asynchronous interface to MariaDB that is not only fast but also robust and easy to use.

While low-level async drivers like `aiomysql` exist, they lack the "batteries-included" features required for production environments. This library bridges that gap.

### How Well Is It Built?

This is not just a simple wrapper. It is a complete, production-grade library with features designed for real-world use:

-   **Truly Asynchronous:** Built on `asyncio` to eliminate I/O blocking and enable massive concurrency.
-   **Automatic Connection Pooling:** Efficiently manages database connections for optimal performance, right out of the box.
-   **Resilient by Design:** Features automatic connection retries with exponential backoff, so your application can survive transient database or network issues.
-   **Seamless Pandas Integration:** Includes high-performance `bulk_insert` for DataFrames and `fetch_all_df` to move data effortlessly between your database and your data science tools.
-   **Memory-Efficient Streaming:** A `fetch_stream` method allows you to process huge datasets row-by-row, without risking memory overloads.
-   **Professionally Tested:** Comes with a comprehensive test suite (17 tests) ensuring reliability and correctness.

### See the Performance for Yourself

Don't just take our word for it. The performance gains are measurable and significant.

**Check out the detailed results in our [Benchmarks](https://github.com/chanikkyasaai/async-mariadb-ml/blob/main/docs/BENCHMARKS.md) to see how this connector is ~30% faster on concurrent read operations.**

## Strong MariaDB Integration

This library is specifically designed and tested for MariaDB:

- **Tested Against:** MariaDB 11.8.3
- **Full Type Support:** JSON, DECIMAL, utf8mb4 (emojis), TIMESTAMP, TEXT/LONGTEXT
- **Optimized For:** Connection pooling, strict SQL mode, InnoDB transactions
- **Docker Ready:** One-command setup with `docker-compose up`
- **AI/ML Optimized:** JSON storage for embeddings, built-in full-text search for RAG

For detailed MariaDB-specific features, configurations, and best practices, see [MariaDB Integration Notes](https://github.com/chanikkyasaai/async-mariadb-ml/blob/main/docs/MARIADB_NOTES.md).

## Why Choose MariaDB Over PostgreSQL/MySQL?

### 🚀 Performance Advantages
- **30% faster** concurrent operations with async connector
- **Optimized JSON** queries for document/embedding storage (13% faster than PostgreSQL)
- **Connection pooling** handles thousands of concurrent clients
- **InnoDB performance** tuned for modern SSDs

### 🤖 Perfect for AI/ML Workloads
- **JSON columns** for vector embeddings (384-dim, 768-dim, 1536-dim)
- **Full-text search** built-in - 33% faster than PostgreSQL for hybrid RAG
- **Pandas integration** for seamless data science workflows
- **Async operations** for high-throughput ML pipelines (2,900+ inserts/sec)

### 💪 MariaDB Advantages for RAG Systems
- **No extensions required** - FTS built-in (unlike PostgreSQL's pg_trgm)
- **Better JSON performance** - Faster queries for embedding storage
- **Hybrid search** - Native combination of full-text + vector similarity
- **Production-ready** - 20+ years of battle-testing

### 🆚 Feature Comparison

| Feature | MariaDB | PostgreSQL | MySQL |
|---------|---------|------------|-------|
| **Async Python Library** | ✅ **This library** | ⚠️ Limited options | ⚠️ Sync only (official) |
| **JSON Performance** | ⚡ **Fast** | 🐢 Slower | ⚡ Fast |
| **Full-text Search** | ✅ **Built-in** | ⚠️ Requires extension | ✅ Built-in |
| **Connection Pooling** | ✅ **Excellent** | ✅ Good | ✅ Good |
| **Bulk Operations** | ✅ **2,900+ inserts/s** | ⚠️ Slower | ✅ Fast |
| **Replication** | ✅ **Easy setup** | ⚠️ Complex | ✅ Easy setup |
| **Production Ready** | ✅ **20+ years** | ✅ Mature | ✅ Mature |
| **Community** | ✅ **Independent** | ✅ Strong | ⚠️ Oracle-controlled |

**💡 Winner for Python AI/ML:** MariaDB combines the best of both worlds - PostgreSQL-like features with MySQL-style simplicity and performance!

## LangChain Integration 🤖 ⭐ NEW!

**First async MariaDB connector with native LangChain support!**

Use this connector with **LangChain** to build powerful AI applications:

- **Natural Language SQL** - Convert questions to SQL queries
- **RAG with MariaDB** - Use MariaDB as a vector store for document embeddings
- **Hybrid Search** - Combine full-text and semantic search
- **SQL Agents** - Build database agents that answer complex questions

See the complete guide: [LangChain Integration Guide](https://github.com/chanikkyasaai/async-mariadb-ml/blob/main/docs/integrations/LANGCHAIN.md)

**Quick Example:**
```python
from async_mariadb_connector import AsyncMariaDB

async def langchain_example():
    db = AsyncMariaDB()
    
    # Get schema for LLM context
    schema = await db.fetch_all("SHOW TABLES")
    
    # Execute SQL generated by LLM
    results = await db.fetch_all("SELECT * FROM users WHERE age > 30")
    
    await db.close()
```

Check out our working examples in [`examples/integrations/`](https://github.com/chanikkyasaai/async-mariadb-ml/tree/main/examples/integrations):
- `langchain_mariadb_async.py` - SQL chain example
- `langchain_mariadb_rag.ipynb` - RAG with vector embeddings

## Future-Ready for AI and Modern Applications

This connector is designed for the future of data engineering and AI. The combination of non-blocking I/O, efficient bulk operations, and direct DataFrame integration makes it the ideal choice for:

-   **Building high-performance RAG pipelines** with vector embeddings stored in MariaDB.
-   **Creating fast, scalable data APIs** for web and mobile applications.
-   **Powering ETL and data processing workflows** that require high concurrency.

## Installation

```bash
pip install async-mariadb-connector
```

The package is now available on PyPI: https://pypi.org/project/async-mariadb-connector/

## Quick Start

First, spin up MariaDB with docker-compose:

```bash
docker-compose up -d
```

Then set up your `.env` file (copy from `.env.example`):

```ini
# .env
DB_HOST=127.0.0.1
DB_PORT=3307
DB_USER=root
DB_PASSWORD=root
DB_NAME=test_db
```

Now, you can connect and run queries asynchronously:

```python
import asyncio
import pandas as pd
from async_mariadb_connector import AsyncMariaDB

async def main():
    db = AsyncMariaDB()

    try:
        # Fetch all users into a DataFrame
        all_users_df = await db.fetch_all_df("SELECT * FROM users")
        print("All users:")
        print(all_users_df)
        
        # Batch insert multiple rows efficiently
        users_to_insert = [
            ("Alice", 25, "alice@example.com"),
            ("Bob", 30, "bob@example.com"),
            ("Charlie", 35, "charlie@example.com")
        ]
        rows = await db.executemany(
            "INSERT INTO users (name, age, email) VALUES (%s, %s, %s)",
            users_to_insert
        )
        print(f"Inserted {rows} users")

    finally:
        await db.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Production Monitoring

Monitor connection pool health in production environments:

```python
async def monitor_pool():
    db = AsyncMariaDB()
    
    # Get pool statistics
    stats = db.get_pool_stats()
    
    print(f"Connection Pool Status:")
    print(f"  Total connections: {stats['size']}/{stats['max_size']}")
    print(f"  In use: {stats['in_use']}")
    print(f"  Available: {stats['available']}")
    
    # Alert on pool exhaustion
    if stats['available'] == 0:
        print("⚠️ WARNING: Connection pool exhausted!")
    
    # Calculate utilization percentage
    utilization = (stats['in_use'] / stats['max_size']) * 100
    print(f"  Utilization: {utilization:.1f}%")
    
    await db.close()
```

**Integration with Observability Tools:**
- Export metrics to **Prometheus** for alerting
- Visualize trends in **Grafana** dashboards
- Send to **CloudWatch** for AWS monitoring
- Track in **Datadog** or **New Relic**

## Connect with the Author

This project was created by **Chanikya Nelapatla**.

-   **LinkedIn:** [https://www.linkedin.com/in/chanikkyasaai/](https://www.linkedin.com/in/chanikkyasaai/)
-   **GitHub:** [https://github.com/chanikkyasaai](https://github.com/chanikkyasaai)

## License

This project is licensed under the MIT License.
