Metadata-Version: 2.4
Name: corvus-ai
Version: 0.3.78
Summary: Platform-agnostic, extensible AI-powered ML Development Assistant
Author: Shah Rahman
Project-URL: Homepage, https://github.com/CloudlyIO/corvus
Project-URL: Documentation, https://github.com/CloudlyIO/corvus#readme
Project-URL: Repository, https://github.com/CloudlyIO/corvus
Project-URL: Issues, https://github.com/CloudlyIO/corvus/issues
Keywords: ml,ai,assistant,agents,langchain,claude,mcp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Requires-Dist: prompt-toolkit>=3.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: litellm>=1.30
Requires-Dist: anthropic>=0.25
Requires-Dist: openai>=1.0
Requires-Dist: langgraph>=0.1
Requires-Dist: langchain>=0.2
Requires-Dist: langchain-core>=0.2
Requires-Dist: httpx>=0.25
Requires-Dist: aiofiles>=23.0
Requires-Dist: python-jose[cryptography]>=3.3
Requires-Dist: sqlalchemy>=2.0
Requires-Dist: alembic>=1.13
Requires-Dist: keyring>=24.0
Requires-Dist: aiohttp>=3.11.11
Requires-Dist: urllib3>=2.3.0
Requires-Dist: werkzeug>=3.1.3
Requires-Dist: starlette>=0.45.3
Requires-Dist: setuptools>=75.8.0
Requires-Dist: certifi>=2024.12.14
Requires-Dist: jinja2>=3.1.5
Requires-Dist: cryptography>=44.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: hypothesis>=6.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: pytest-xdist>=3.3; extra == "dev"
Requires-Dist: pytest-split>=0.8; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10; extra == "dev"
Requires-Dist: playwright>=1.40; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: mypy>=1.5; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: types-PyYAML>=6.0; extra == "dev"
Requires-Dist: pandas-stubs>=2.0; extra == "dev"
Requires-Dist: radon>=6.0; extra == "dev"
Requires-Dist: xenon>=0.9; extra == "dev"
Requires-Dist: pydocstyle>=6.3; extra == "dev"
Requires-Dist: sphinx>=7.0; extra == "dev"
Requires-Dist: sphinx-autodoc-typehints>=2.0; extra == "dev"
Requires-Dist: mkdocstrings[python]>=0.24; extra == "dev"
Requires-Dist: pre-commit>=3.5; extra == "dev"
Requires-Dist: mkdocs>=1.5; extra == "dev"
Requires-Dist: mkdocs-material>=9.4; extra == "dev"
Requires-Dist: pandas>=2.0; extra == "dev"
Requires-Dist: scikit-learn>=1.3; extra == "dev"
Requires-Dist: scipy>=1.11; extra == "dev"
Requires-Dist: pyarrow>=14.0; extra == "dev"
Provides-Extra: knowledge
Requires-Dist: qdrant-client>=1.7; extra == "knowledge"
Requires-Dist: sentence-transformers>=2.2; extra == "knowledge"
Requires-Dist: voyageai>=0.2; extra == "knowledge"
Requires-Dist: neo4j>=5.0; extra == "knowledge"
Requires-Dist: unstructured>=0.10; extra == "knowledge"
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == "mcp"
Provides-Extra: ml
Requires-Dist: torch>=2.0; extra == "ml"
Requires-Dist: transformers>=4.35; extra == "ml"
Requires-Dist: huggingface_hub>=0.20; extra == "ml"
Requires-Dist: mlflow>=2.9; extra == "ml"
Requires-Dist: wandb>=0.16; extra == "ml"
Requires-Dist: pandas>=2.0; extra == "ml"
Requires-Dist: pyarrow>=14.0; extra == "ml"
Requires-Dist: ydata-profiling>=4.6; extra == "ml"
Requires-Dist: mem0ai>=0.1; extra == "ml"
Provides-Extra: eval
Requires-Dist: dspy-ai>=2.4; extra == "eval"
Requires-Dist: ragas>=0.1; extra == "eval"
Requires-Dist: deepeval>=0.20; extra == "eval"
Requires-Dist: scipy>=1.11; extra == "eval"
Provides-Extra: api
Requires-Dist: fastapi>=0.115; extra == "api"
Requires-Dist: uvicorn[standard]>=0.32; extra == "api"
Requires-Dist: redis>=5.0; extra == "api"
Provides-Extra: jupyter
Requires-Dist: ipython>=7.0; extra == "jupyter"
Requires-Dist: nest-asyncio>=1.5; extra == "jupyter"
Requires-Dist: tornado>=6.0; extra == "jupyter"
Provides-Extra: modal
Requires-Dist: modal>=0.60; extra == "modal"
Provides-Extra: aws
Requires-Dist: sagemaker>=2.200; extra == "aws"
Requires-Dist: boto3>=1.34; extra == "aws"
Provides-Extra: gcp
Requires-Dist: google-cloud-aiplatform>=1.40; extra == "gcp"
Provides-Extra: azure
Requires-Dist: azure-ai-ml>=1.14; extra == "azure"
Requires-Dist: azure-identity>=1.15; extra == "azure"
Provides-Extra: databricks
Requires-Dist: databricks-connect>=13.0; extra == "databricks"
Provides-Extra: run
Requires-Dist: corvus-ai[aws,azure,gcp,modal]; extra == "run"
Provides-Extra: storage
Requires-Dist: fsspec>=2024.1; extra == "storage"
Requires-Dist: s3fs>=2024.1; extra == "storage"
Requires-Dist: gcsfs>=2024.1; extra == "storage"
Requires-Dist: adlfs>=2024.1; extra == "storage"
Requires-Dist: cloudpathlib[azure,gs,s3]>=0.18; extra == "storage"
Provides-Extra: all
Requires-Dist: corvus-ai[api,dev,eval,jupyter,knowledge,mcp,ml,run,storage]; extra == "all"

# Corvus - ML Development Assistant

[![Test](https://github.com/CloudlyIO/corvus/actions/workflows/test.yml/badge.svg)](https://github.com/CloudlyIO/corvus/actions/workflows/test.yml)
[![Lint](https://github.com/CloudlyIO/corvus/actions/workflows/lint.yml/badge.svg)](https://github.com/CloudlyIO/corvus/actions/workflows/lint.yml)
[![Security](https://github.com/CloudlyIO/corvus/actions/workflows/security.yml/badge.svg)](https://github.com/CloudlyIO/corvus/actions/workflows/security.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Version](https://img.shields.io/badge/version-0.3.78-orange.svg)](CHANGELOG.md)

**Platform-agnostic, extensible AI-powered ML Development Assistant**

## Overview

Corvus is built on four pillars:

1. **Universal Skills Framework** - 24 ML skill functions across 24 modules for data engineering, model training, deployment, observability, ML advisory, and model hub integration
2. **MCP Integration** - 6 MCP server integrations for GitHub, PostgreSQL, Filesystem, MLflow, Docker, and Cloud Storage
3. **Knowledge Base** - Hybrid retrieval (BM25 + Vector + RRF), GraphRAG with Neo4j, persistent memory with Mem0
4. **AI Safety** - Defense-in-depth guardrails: prompt-injection detection, DLP, grounding validation, and quality monitoring

## Features

### Skills Library (24 Skills)

| Category | Skills |
|----------|--------|
| **Data Engineering** | data-profiler, feature-recommender, pipeline-generator, schema-designer, validation-generator |
| **Model Training** | model-selector, architecture-recommender, pytorch-codegen, experiment-tracker, distributed-config, hyperparameter-optimizer |
| **Deployment** | endpoint-generator, docker-generator, terraform-generator, k8s-manifest-generator, cicd-generator, quantization-exporter |
| **Observability** | monitoring-setup, drift-detector, dashboard-creator |
| **ML Advisory** | ml-spec-generator, experiment-advisor |
| **Model Hub** | huggingface-skills, github-ml-skills |

### MCP Integrations (6 Servers)

| Server | Capabilities |
|--------|--------------|
| **GitHub** | Repository access, PRs, issues, commits |
| **PostgreSQL** | Database queries, schema management |
| **Filesystem** | File operations, directory traversal |
| **MLflow** | Experiment tracking, model registry |
| **Docker** | Container management, image operations |
| **S3/GCS** | Cloud storage operations |
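MCP clients and servers exchange JSON-RPC 2.0 messages, and a tool invocation is a `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a message with the standard library only; it is not Corvus's own MCP client, and the tool name `read_file` and its `path` argument are hypothetical (each server advertises its real tools via `tools/list`).

```python
import json
from itertools import count

# Monotonic request IDs; JSON-RPC 2.0 pairs each response with its request by "id".
_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool on a filesystem server; real tool names come from `tools/list`.
request = make_tool_call("read_file", {"path": "README.md"})
print(request)
```

The transport (stdio or HTTP) only moves these messages around; the request shape stays the same for every server in the table.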

### Knowledge Base

| Component | Description |
|-----------|-------------|
| **Qdrant Vector Store** | Semantic search with voyage-code-3 embeddings |
| **Hybrid Retrieval** | BM25 + Vector search with RRF fusion |
| **Cross-Encoder Reranking** | Reranks retrieved candidates to improve precision |
| **Neo4j GraphRAG** | Knowledge graph with entity extraction |
| **Mem0 Memory** | Persistent memory across sessions |
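Reciprocal Rank Fusion (RRF) merges the BM25 and vector rankings without requiring their raw scores to be comparable: a document at 1-based rank r in a result list contributes 1 / (k + r), and contributions are summed across lists, with k conventionally set to 60. The following is a generic RRF sketch for illustration, not Corvus's retrieval code; the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of document IDs with Reciprocal Rank Fusion.

    A document at 1-based position r in a list contributes 1 / (k + r);
    a document absent from a list contributes nothing for that list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a BM25 index and a vector store.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Documents that both retrievers rank well (here `doc_b`, then `doc_a`) rise above those found by only one retriever, which is the point of fusing lexical and semantic search.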

### AI Safety & Guardrails

| Component | Purpose |
|-----------|---------|
| **Prompt Guard** | Injection detection and input validation |
| **Tool Guard** | MCP security and sandboxing |
| **Memory Guard** | Memory injection prevention |
| **Grounding Validator** | Hallucination prevention with citations |
| **DLP Scanner** | PII/credential detection and redaction |
| **Quality Monitor** | Output quality and drift detection |
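To illustrate the detect-and-redact step a DLP scanner performs, here is a minimal regex-based sketch. It is not Corvus's scanner (that is `create_dlp_scanner()`); the two patterns, the `[REDACTED:...]` placeholder format, and the `redact` helper are assumptions chosen for illustration, and a production scanner covers far more PII and credential types.

```python
import re

# Illustrative patterns only: an email address and an AWS-style access key ID.
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace every match with a typed placeholder and count findings per type."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED:{label}]", text)
        if n:
            counts[label] = n
    return text, counts


clean, findings = redact("Contact: john@example.com, key AKIAABCDEFGHIJKLMNOP")
print(clean)
print(findings)
```

Typed placeholders keep the redacted text readable for the LLM while the per-type counts give a quick signal comparable to the `pii_count` check in the usage example.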

## Quick Start

```bash
# Clone
git clone https://github.com/CloudlyIO/corvus.git
cd corvus

# Install with all dependencies
pip install -e ".[all]"

# Install pre-commit hooks
pre-commit install --hook-type pre-commit --hook-type commit-msg

# Run tests
make test
```

## Usage Examples

### Using Skills

```python
import asyncio
from corvus.skills.library.model_selector import select_model
from corvus.skills.library.pytorch_codegen import generate_pytorch_code

async def main():
    # Select a model for your task
    result = await select_model(
        task_type="classification",
        data_characteristics={
            "num_samples": 10000,
            "num_features": 50,
            "num_classes": 3,
        },
    )
    print(f"Recommended: {result['recommended_model']}")

    # Generate PyTorch training code
    code = await generate_pytorch_code(
        model_type="mlp_classifier",
        input_features=50,
        output_classes=3,
        hidden_layers=[128, 64],
    )
    print(code["model_code"])

asyncio.run(main())
```

### Using Safety Features

```python
from corvus.safety import (
    validate_input,
    create_dlp_scanner,
    assess_quality,
)

# Validate user input for injection attempts
result = validate_input("Help me build a classifier")
if result.is_safe:
    print("Input is safe")

# Scan for sensitive data
scanner = create_dlp_scanner()
dlp_result = scanner.scan_content("Contact: john@example.com")
if dlp_result.pii_count > 0:
    print("PII detected!")

# Assess response quality
quality = assess_quality(
    "Here's a detailed explanation with examples...",
    prompt="Explain gradient descent"
)
print(f"Quality score: {quality.overall_score:.2f}")
```

### Running the E2E Demo

```bash
# Run the complete ML workflow demo
python3 demos/e2e/ml_workflow_demo.py
```

## Development

```bash
# Testing (TDD)
make test           # Unit tests
make test-all       # All tests
make test-cov       # With coverage

# Evaluation (EDD)
make benchmark      # Run benchmarks

# Quality
make lint           # Ruff linting
make format         # Black formatting
make quality        # Lint + typecheck
```

## Project Structure

```
corvus/
├── src/corvus/            # Main package
│   ├── core/              # Config, exceptions
│   ├── skills/            # Universal skills framework (24 skills)
│   ├── llm/               # LLM gateway (LiteLLM)
│   ├── agents/            # Agent orchestration (LangGraph)
│   ├── knowledge/         # Knowledge base (RAG/GraphRAG/Memory)
│   ├── mcp/               # MCP integrations (6 servers)
│   └── safety/            # AI safety & guardrails (6 components)
├── tests/                 # Test suite
│   ├── unit/              # 10,168+ tests
│   ├── integration/       # Integration tests
│   └── e2e/               # End-to-end tests
├── benchmarks/            # Performance benchmarks
├── demos/                 # Executable examples
│   ├── skills/            # Individual skill demos
│   ├── knowledge/         # Knowledge base demos
│   ├── mcp/               # MCP integration demos
│   ├── safety/            # Safety feature demos
│   └── e2e/               # End-to-end workflow demos
└── docs/                  # Documentation
```

## Documentation

| Document | Description |
|----------|-------------|
| [Documentation Index](docs/index.md) | All guides, API reference, architecture docs |
| [Status](docs/STATUS.md) | Current project status |
| [Architecture](docs/architecture/overview.md) | Technical architecture |
| [Contributing](CONTRIBUTING.md) | Developer workflow |
| [Changelog](CHANGELOG.md) | Version history |

## Configuration

```bash
# Core
CORVUS_DEBUG=true|false
CORVUS_LOG_LEVEL=INFO|DEBUG|WARNING|ERROR

# LLM
CORVUS_LLM_DEFAULT_PROVIDER=anthropic|openai
CORVUS_LLM_DEFAULT_MODEL=claude-sonnet-4-20250514

# Knowledge Base
CORVUS_KNOWLEDGE_VECTOR_STORE=qdrant
CORVUS_KNOWLEDGE_EMBEDDING_MODEL=voyage-code-3
```

## Technology Stack

| Layer | Technology |
|-------|------------|
| Agent Orchestration | LangGraph |
| LLM Gateway | LiteLLM |
| Primary LLM | Claude Sonnet 4 / Opus 4.5 |
| Vector Database | Qdrant |
| Knowledge Graph | Neo4j |
| Code Embeddings | voyage-code-3 |
| Text Embeddings | e5-large-v2 |
| Memory System | Mem0 |
| Tool Integration | MCP |

## License

MIT
