Metadata-Version: 2.4
Name: mais
Version: 2.1.3
Summary: A Jupyter-compatible plugin that detects risky ML model and dataset loads.
Author-email: Daniel Bardenstein <daniel@manifestcyber.com>
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: anchore-syft>=1.18.1
Requires-Dist: ipython>=7.0.0
Requires-Dist: pydantic>=2.11.7
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: requests>=2.31.0

# MAIS - ML Model Audit & Inspection System

A Jupyter notebook plugin that watches for potentially risky model or dataset loads. MAIS analyzes code in real time to detect attempts to load models that may require special permissions or licensing.

## Detection Architecture - V1 vs V2

MAIS offers two detection architectures that can be toggled via a feature flag:

### 🔄 **V1: Legacy Baseline Detection (Default)**

- **Production-safe default** for backward compatibility
- Uses configuration-based pattern matching
- Watches predefined function lists in `config.py`
- **Best for**: Stable production environments
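
As a rough illustration of configuration-based matching, the sketch below checks call names in a code string against a watch list. It is not MAIS's implementation: `WATCHED_FUNCTIONS`, `dotted_name`, and `find_watched_calls` are hypothetical names, and the real function lists live in `config.py`.

```python
import ast

# Hypothetical watch list for illustration; MAIS keeps its own lists in config.py.
WATCHED_FUNCTIONS = {"torch.load", "from_pretrained", "load_dataset"}


def dotted_name(node: ast.AST) -> str:
    """Return a dotted name such as 'torch.load' for a Name/Attribute chain."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


def find_watched_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) for every call that hits the watch list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            # Match either the full dotted name or only its final segment.
            if name in WATCHED_FUNCTIONS or name.rsplit(".", 1)[-1] in WATCHED_FUNCTIONS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

A fixed list like this is predictable, which suits stable environments, but it only catches the call names someone has listed in advance.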

### 🚀 **V2: Provider-Based Detection (Enhanced)**

- **Specialized detectors** for major ML/AI providers
- **Comprehensive coverage** including patterns V1 misses
- **Provider-specific intelligence** for better accuracy
- **Best for**: Development and comprehensive model monitoring

| Provider        | V1 Detection           | V2 Detection                  |
| --------------- | ---------------------- | ----------------------------- |
| **HuggingFace** | ✅ Basic patterns      | ✅ Advanced + Hub integration |
| **OpenAI**      | ❌ **Missed patterns** | ✅ **Full API coverage**      |
| **PyTorch**     | ✅ torch.load          | ✅ Extended patterns          |
| **Anthropic**   | ❌ **Not detected**    | ✅ **Claude API detection**   |
| **LangChain**   | ❌ **Framework blind** | ✅ **Full framework support** |
| **LlamaIndex**  | ❌ **Not detected**    | ✅ **Document processing**    |

## Architecture Overview

MAIS uses a flexible, strategy-based architecture with multiple specialized components:

![MAIS Architecture](docs/images/architecture.svg)

### Additional Architecture Views

| View                | Purpose                             | Link                                                       |
| ------------------- | ----------------------------------- | ---------------------------------------------------------- |
| **📊 Dependencies** | Component relationships & data flow | [MAIS_DEPENDENCY.svg](docs/images/MAIS_DEPENDENCY.svg)     |
| **⚡ Process Flow** | End-to-end analysis workflow        | [MAIS_PROCESS.svg](docs/images/MAIS_PROCESS.svg)           |
| **🏗️ DDD Layers**   | Domain-driven design structure      | [MAIS_ARCHITECTURE.svg](docs/images/MAIS_ARCHITECTURE.svg) |

### Core Components

#### 📥 **Input Layer**

Processes various types of source code inputs:

- **Source Code**: Direct Python code analysis
- **Notebooks**: Jupyter notebook cell analysis
- **Requirements**: Dependency file scanning
- **Python Files**: Static file analysis

#### 🔍 **Provider-Specific Detectors**

Specialized detectors for different ML/AI providers and frameworks:

- **OpenAI**: Detects GPT, DALL-E, and OpenAI API usage
- **HuggingFace**: Identifies Transformers, Datasets, and Hub model loads
- **Anthropic**: Catches Claude API integrations
- **LangChain**: Finds LangChain components and chains
- **LlamaIndex**: Detects LlamaIndex document processing
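
One way to picture provider-specific detectors is a shared interface with one implementation per provider. The sketch below is an assumption-laden illustration, not MAIS's code: `Finding`, `Detector`, `ImportDetector`, and `run_detectors` are hypothetical names, and the detector only looks at imports of a provider's top-level package.

```python
import ast
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Finding:
    provider: str
    detail: str
    line: int


class Detector(Protocol):
    """Hypothetical interface that every provider detector would satisfy."""

    provider: str

    def detect(self, tree: ast.AST) -> list[Finding]: ...


class ImportDetector:
    """Flags imports of a provider's top-level package (illustrative only)."""

    def __init__(self, provider: str, package: str) -> None:
        self.provider = provider
        self.package = package

    def detect(self, tree: ast.AST) -> list[Finding]:
        findings = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                modules = [node.module or ""]
            else:
                continue
            for module in modules:
                if module.split(".")[0] == self.package:
                    findings.append(Finding(self.provider, f"imports {module}", node.lineno))
        return findings


def run_detectors(source: str, detectors: list[Detector]) -> list[Finding]:
    """Parse the source once and let each detector inspect the same tree."""
    tree = ast.parse(source)
    results: list[Finding] = []
    for detector in detectors:
        results.extend(detector.detect(tree))
    return sorted(results, key=lambda finding: finding.line)
```

With this shape, supporting another provider means adding one more detector to the list rather than editing a shared pattern table.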

#### ⚙️ **Detection Strategies**

Pluggable analysis approaches that detectors can use:

- **AST Strategy**: Advanced parsing with variable resolution for complex code analysis
- **Regex Strategy**: Fast pattern matching for simple detection scenarios
- **LLM-based Strategy**: Future AI-powered code understanding
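
The difference between the regex and AST strategies shows up when a model name is passed through a variable. The following sketch is illustrative only: `regex_model_ids` and `ast_model_ids` are hypothetical helpers, and the variable resolution is deliberately naive (it handles only simple `name = "literal"` assignments and ignores control flow).

```python
import ast
import re

SOURCE = '''
model_id = "gpt2"
model = AutoModel.from_pretrained(model_id)
'''

# Regex strategy: fast, but only sees string literals passed inline.
LITERAL_CALL = re.compile(r"from_pretrained\(\s*['\"]([^'\"]+)['\"]")


def regex_model_ids(source: str) -> list[str]:
    return LITERAL_CALL.findall(source)


def ast_model_ids(source: str) -> list[str]:
    """Resolve simple string assignments, then inspect from_pretrained calls."""
    tree = ast.parse(source)

    # Pass 1: remember variables bound directly to a string literal.
    constants: dict[str, str] = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            constants[node.targets[0].id] = node.value.value

    # Pass 2: collect the first argument of each from_pretrained call.
    ids = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "from_pretrained"
                and node.args):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                ids.append(arg.value)
            elif isinstance(arg, ast.Name) and arg.id in constants:
                ids.append(constants[arg.id])
    return ids
```

On `SOURCE`, the regex helper finds nothing because the argument is a variable, while the AST helper resolves `model_id` to `"gpt2"`.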

#### 📊 **Intermediate Output**

Analysis results from provider detectors:

- **Model Findings**: Detected model usage with metadata
- **Risk Assessment**: Security and compliance evaluation
- **Inventory Mapping**: Model-to-provider relationship mapping

#### 📋 **JSON Schema Standardization**

Converts findings into structured format:

- **AI Detection JSON Schema**: Standardized detection results format
- **Provider Attribution**: Links findings to specific ML providers
- **Risk Categorization**: Security and compliance classifications
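
To make the standardization step concrete, here is a minimal sketch of serializing findings into a uniform JSON document. The field names (`provider`, `model`, `line`, `risk`) and the `schema_version` value are assumptions made for illustration; they are not the actual AI Detection JSON Schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DetectionRecord:
    # Hypothetical fields; the real schema may differ.
    provider: str
    model: str
    line: int
    risk: str


def to_json(records: list[DetectionRecord]) -> str:
    """Serialize detection records into one JSON document with stable key order."""
    payload = {
        "schema_version": "0-example",  # placeholder, not a real schema version
        "detections": [asdict(record) for record in records],
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

Keeping provider attribution and risk category on every record lets downstream tools filter findings without knowing which detector produced them.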

#### 📦 **SBOM Generation**

Creates comprehensive software bills of materials:

- **manifest-cli Integration**: Uses external SBOM generation tools
- **SBOM Builder**: Internal component for SBOM creation
- **Dependency Analysis**: Maps AI/ML dependencies

#### 📤 **Output Formats**

Multiple standard formats for integration:

- **CycloneDX JSON**: Industry-standard SBOM format
- **SPDX JSON**: Open-source license compliance format
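
For orientation, the sketch below builds a minimal CycloneDX-style JSON document that lists detected models as components. It is hand-rolled for illustration and is not validated against the CycloneDX schema; `cyclonedx_bom` and its input shape are hypothetical, and the output MAIS actually produces may contain many more fields. The `machine-learning-model` component type is available from CycloneDX 1.5 onward.

```python
import json


def cyclonedx_bom(models: list[dict[str, str]]) -> dict:
    """Build a minimal CycloneDX 1.5-style BOM with one component per model."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "machine-learning-model",
                "name": model["name"],
                "supplier": {"name": model["provider"]},
            }
            for model in models
        ],
    }


if __name__ == "__main__":
    bom = cyclonedx_bom([{"name": "gpt2", "provider": "HuggingFace"}])
    print(json.dumps(bom, indent=2))
```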

## Installation

```bash
# Using pip
pip install mais
```

Then import and initialize the plugin in a notebook cell:

```python
# Import and initialize the MAIS plugin
from mais import MAIS

# V1: Default legacy detection (production-safe)
m = MAIS(api_token="<manifest-api-token>")

# V2: Enhanced provider-based detection (recommended for dev/comprehensive monitoring)
m = MAIS(api_token="<manifest-api-token>", use_v2_detectors=True)

# Now run your notebook as normal
# MAIS will monitor for potentially risky model loads
```

## Detection Architecture Configuration

### Constructor Parameter (Per Instance)

```python
from mais import MAIS

# Enable V2 provider-based detection
m = MAIS(api_token="token", use_v2_detectors=True)

# Use legacy V1 detection (the default when the parameter is omitted)
m = MAIS(api_token="token")

# Explicitly request legacy V1 detection
m = MAIS(api_token="token", use_v2_detectors=False)
```

### Google Colab Usage

In environments such as Colab, where you can't set environment variables, read the token from Colab's secrets and pass it to the constructor:

```python
from google.colab import userdata
api_token = userdata.get('MANIFEST_API_KEY')

from mais import MAIS
# Use V2 for comprehensive OpenAI + HuggingFace detection
m = MAIS(api_token=api_token, use_v2_detectors=True)
```

### Advanced Usage

MAIS supports different detection strategies and provider combinations:

```python
from mais.application.services.ast_analyzer import ASTAnalyzer
from mais.domain.model_analysis.detectors.baseline_detector import BaselineDetector

# Use default baseline detection (backward compatible)
analyzer = ASTAnalyzer()

# Or pass custom detectors explicitly
analyzer = ASTAnalyzer(detectors=[BaselineDetector()])

# Analyze a string of Python source for model usage
your_code = "import torch\nmodel = torch.load('model.pt')"
findings = analyzer.analyze_code(your_code)
```

## SBOM Generation

```python
# Generate an SBOM for your project or notebook environment.
m.create_sbom(path=".", publish=False)
```

## SBOM Publishing

```python
# Generate the SBOM and publish it
m.create_sbom(path=".", publish=True)
```

## Environment Variables

MAIS supports configuration through environment variables:

### Core Configuration

- `MANIFEST_API_TOKEN` - API token for MOSAIC/Manifest integration
- `MAIS_MOSAIC_API_URL` - Override default API URL
- `MAIS_DEFAULT_VERBOSITY` - Set default logging level
- `MAIS_API_TIMEOUT` - API request timeout in seconds

Any configuration value can be overridden by an environment variable with the `MAIS_` prefix.

## Detection Mode Information

```python
from mais import MAIS

m = MAIS(api_token="token", use_v2_detectors=True)

# Check current detection mode
print(m.get_detection_mode())  # "new" or "legacy"

# Get detailed detection information
info = m.get_detection_info()
print(info["detection_mode"])      # Current mode
print(info["source"])              # "constructor parameter" or "config/environment"
print(info["feature_flag"])        # Environment variable name
print(info["current_value"])       # Boolean value of feature flag
```
