Metadata-Version: 2.3
Name: c3-charm
Version: 0.1.0
Summary: Python SDK for the CHARM time-series foundation model — embeddings, forecasting, and a downstream-task toolkit.
License: Apache-2.0
Keywords: time-series,embeddings,forecasting,foundation-model,anomaly-detection
Author: C3 AI
Author-email: opensource@c3.ai
Requires-Python: >=3.10
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Typing :: Typed
Provides-Extra: toolkit
Requires-Dist: datasetsforecast (>=0.1.0) ; extra == "toolkit"
Requires-Dist: dill (>=0.3.7) ; extra == "toolkit"
Requires-Dist: gin-config (>=0.5.0) ; extra == "toolkit"
Requires-Dist: httpx[http2] (>=0.25.0)
Requires-Dist: lightgbm (>=4.0.0) ; extra == "toolkit"
Requires-Dist: matplotlib (>=3.7.0) ; extra == "toolkit"
Requires-Dist: minisom (>=2.3.1) ; extra == "toolkit"
Requires-Dist: numpy (>=1.24.0) ; extra == "toolkit"
Requires-Dist: optuna (>=3.0.0) ; extra == "toolkit"
Requires-Dist: pandas (>=2.0.0) ; extra == "toolkit"
Requires-Dist: pyarrow (>=14.0.0) ; extra == "toolkit"
Requires-Dist: python-dotenv (>=1.0.0)
Requires-Dist: requests (>=2.31.0)
Requires-Dist: scienceplots (>=2.0.0) ; extra == "toolkit"
Requires-Dist: scikit-learn (>=1.3.0) ; extra == "toolkit"
Requires-Dist: seaborn (>=0.13.0) ; extra == "toolkit"
Requires-Dist: tensordict (>=0.1.0) ; extra == "toolkit"
Requires-Dist: torch (>=2.0.0) ; extra == "toolkit"
Requires-Dist: tqdm (>=4.60.0)
Project-URL: Documentation, https://github.com/c3ai/c3-charm#readme
Project-URL: Homepage, https://c3.ai
Project-URL: Issues, https://github.com/c3ai/c3-charm/issues
Project-URL: Repository, https://github.com/c3ai/c3-charm
Description-Content-Type: text/markdown

# c3-charm

[![PyPI version](https://img.shields.io/pypi/v/c3-charm.svg)](https://pypi.org/project/c3-charm/)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python](https://img.shields.io/pypi/pyversions/c3-charm.svg)](https://pypi.org/project/c3-charm/)

A Python SDK for interacting with the CHARM time-series API. It provides a simple interface for **embeddings** (multivariate time series → vectors) and **forecast/backcast** (predict future or reconstruct past steps).

## What is CHARM?

CHARM (CHannel Aware Representation Model) is a foundation model for multivariate time series data. It maps each time series segment to an embedding vector that summarizes the segment's behavior, and those vectors can feed a range of downstream applications:

- **Anomaly detection**: Identify unusual patterns in time series data
- **Clustering**: Group similar time series together
- **Classification**: Categorize time series into predefined classes
- **Forecasting**: Improve time series predictions
- **Similarity search**: Find similar patterns across large datasets

## Data shapes and API reference

Use this section to shape your inputs and interpret outputs. The API has two endpoints; both expect the same **time series format**.

### Shared input format (embeddings and forecast/backcast)

Every request uses:

- **`descriptions`**: List of **channel names** per time series.  
  - Type: `list[list[str]]`.  
  - Shape: **(N, C)** — N samples, each with C channel names.  
  - Example: `[["engine", "temperature"], ["fan", "speed"]]` for N=2, C=2.

- **`ts_array`**: List of **time series values** (one per sample).  
  - Type: `list[list[list[float]]]`.  
  - Shape: **(N, T, C)** — N samples, each of T timesteps × C channels.  
  - **All samples in a single request must have the same T and the same C.**  
  - Example: one sample with 10 timesteps and 2 channels → a list of 10 rows, each row a list of 2 floats.
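
Sensor data often arrives channel-major, with one list of readings per channel. The sketch below shows one way to transpose such data into the time-major `(T, C)` layout described above. `to_sample` is a hypothetical helper written for this example and is not part of the SDK.

```python
def to_sample(columns: dict[str, list[float]]) -> tuple[list[str], list[list[float]]]:
    """Turn {channel_name: readings} into (channel names, T x C rows)."""
    names = list(columns)
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError(f"channels have different lengths: {sorted(lengths)}")
    # zip(*...) transposes C lists of length T into T rows of C values.
    rows = [[float(value) for value in row] for row in zip(*columns.values())]
    return names, rows


names, rows = to_sample({"engine": [0.1, 0.3, 0.5], "temperature": [0.2, 0.4, 0.6]})
descriptions = [names]  # shape (N=1, C=2)
ts_array = [rows]       # shape (N=1, T=3, C=2)
```

Wrapping the single sample in an outer list adds the batch dimension N that both endpoints expect.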

**Conventions:**

- **N** = batch size (number of time series in the call).  
- **T** = timesteps per series (same for all). Must be **≥ 1** and **< 1500** (SDK enforces T < 1500).  
- **C** = channels per series (same for all). Must be **< 1500**.  
- **N × C × T ≤ 500,000** per request (client may split into multiple requests via batching).
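
The limits above can be checked locally before any request is sent. This is a minimal pre-check sketch built only from the numbers in this section; `check_request` is a hypothetical helper, and the SDK's own validation may differ in details and error types.

```python
MAX_T = 1500            # documented: T < 1500
MAX_C = 1500            # documented: C < 1500
MAX_ELEMENTS = 500_000  # documented: N * C * T <= 500,000 per request


def check_request(descriptions, ts_array):
    """Return (N, T, C), or raise ValueError if a documented limit is violated."""
    n = len(ts_array)
    if n == 0 or len(descriptions) != n:
        raise ValueError("descriptions and ts_array need the same non-zero length N")
    t, c = len(ts_array[0]), len(descriptions[0])
    if not 1 <= t < MAX_T:
        raise ValueError(f"T={t} must satisfy 1 <= T < {MAX_T}")
    if c >= MAX_C:
        raise ValueError(f"C={c} must be < {MAX_C}")
    for names, series in zip(descriptions, ts_array):
        same_t = len(series) == t
        same_c = len(names) == c and all(len(row) == c for row in series)
        if not (same_t and same_c):
            raise ValueError("all samples must share the same T and C")
    if n * c * t > MAX_ELEMENTS:
        raise ValueError(f"N*C*T={n * c * t} exceeds {MAX_ELEMENTS}; split the batch")
    return n, t, c


shape = check_request(
    descriptions=[["engine", "temperature"], ["fan", "speed"]],
    ts_array=[[[0.1, 0.2]] * 10, [[2.1, 2.2]] * 10],
)
```

Here `shape` is `(2, 10, 2)`: two samples, ten timesteps, two channels.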

### 1. Embeddings — `client.embeddings.create` / `client.embeddings.async_create`

**Endpoint:** `POST {base_url}/predict`

| | |
|--|--|
| **Input** | `descriptions` (N×C), `ts_array` (N×T×C). See shared format above. |
| **Output** | `response.embeds`: one vector per time series. Shape **(N, D)** where D = embedding dimension (model-dependent). |
| **Return type** | `EmbeddingsResponse`: `.embeds`, `.model`, `.usage`, `.raw`. |

Use `return_tensors="list"`, `"np"`, or `"torch"` to get lists, a NumPy array, or a PyTorch tensor.
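
Because the output is one vector per series, similarity search reduces to comparing rows of the `(N, D)` result. The sketch below uses cosine similarity on placeholder vectors that stand in for `response.embeds` in list form. The helper names are hypothetical, and cosine similarity is a common choice rather than something this README prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(embeds, query_index):
    """Return (index, similarity) of the row closest to embeds[query_index]."""
    query = embeds[query_index]
    scores = [
        (i, cosine_similarity(query, row))
        for i, row in enumerate(embeds)
        if i != query_index
    ]
    return max(scores, key=lambda pair: pair[1])


# Placeholder vectors standing in for response.embeds (N=3, D=3).
embeds = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
best_index, best_score = most_similar(embeds, query_index=0)
```

For large N, a vectorized or indexed search is a better fit than this pairwise loop.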

### 2. Forecast — `client.prediction.create` / `client.prediction.async_create`

**Endpoint:** `POST {base_url}/forecast`

**Input:** Same `descriptions` and `ts_array` as above, plus:

- **`target_len`** (int, required, non-zero):  
  - **Positive** → forecast that many steps **ahead** (e.g. `10` = next 10 steps).  
  - **Negative** → backcast that many steps **in the past** (e.g. `-8` = last 8 steps).

| | |
|--|--|
| **Output** | `response.denormalized_predictions`: predictions in original scale. Shape **(N, abs(target_len), C, Q)** where Q = number of quantiles (e.g. 21). |
| **Also** | `response.predictions` (normalized), `response.data` (input echo). Same batch dimension N. |
| **Return type** | `ForecastResponse`: `.denormalized_predictions`, `.predictions`, `.data`, `.target_len`, `.mode` (`"forecast"` or `"backcast"`), `.raw`. |

Use `return_tensors="list"`, `"np"`, or `"torch"` for all tensor fields.
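
The last axis of `denormalized_predictions` indexes quantiles, so a point forecast or an interval is a slice along that axis. The sketch below works on the nested-list form with placeholder numbers. It assumes the quantile levels are sorted in ascending order with the median at the middle index; this README does not state the levels, so confirm them against the API before relying on it.

```python
def select_quantile(predictions, q_index):
    """Reduce (N, H, C, Q) nested lists to (N, H, C) by picking one quantile index."""
    return [
        [[channel[q_index] for channel in step] for step in sample]
        for sample in predictions
    ]


# Placeholder predictions: N=1 sample, H=2 steps, C=1 channel, Q=3 quantiles.
predictions = [[[[0.8, 1.0, 1.2]], [[0.9, 1.1, 1.4]]]]
q = len(predictions[0][0][0])

median = select_quantile(predictions, q // 2)  # ASSUMPTION: middle index is the median
lower = select_quantile(predictions, 0)        # ASSUMPTION: index 0 is the lowest level
upper = select_quantile(predictions, q - 1)    # ASSUMPTION: last index is the highest level
```

With `return_tensors="np"`, the same slice can be written as `forecast[..., q_index]`.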

### Quick reference

| Functionality | Method | Input | Output shape (main) |
|---------------|--------|--------|---------------------|
| Embeddings | `embeddings.create` / `async_create` | `descriptions` (N×C), `ts_array` (N×T×C) | **(N, D)** |
| Forecast | `prediction.create` / `async_create` | Same + `target_len` > 0 | **(N, target_len, C, Q)** |
| Backcast | `prediction.create` / `async_create` | Same + `target_len` < 0 | **(N, abs(target_len), C, Q)** |
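
The table's rules for prediction calls can be written as a small function, which is handy for asserting shapes in tests. `expected_prediction_shape` is a hypothetical helper for this example, and `q=21` below is only the example value quoted above, not a guaranteed quantile count.

```python
def expected_prediction_shape(n, c, target_len, q):
    """Return (mode, output shape) for a prediction call, following the table above."""
    if target_len == 0:
        raise ValueError("target_len must be non-zero")
    mode = "forecast" if target_len > 0 else "backcast"
    return mode, (n, abs(target_len), c, q)


forecast_case = expected_prediction_shape(n=1, c=2, target_len=10, q=21)
backcast_case = expected_prediction_shape(n=1, c=2, target_len=-8, q=21)
```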

## Installation

Install from PyPI:

```bash
pip install c3-charm
```

To include the downstream-task toolkit (models, trainers, datasets):

```bash
pip install "c3-charm[toolkit]"   # quotes keep shells such as zsh from expanding the brackets
```

Or install from source with Poetry:

```bash
git clone https://github.com/c3ai/c3-charm.git
cd c3-charm
poetry install                    # core SDK only
poetry install --extras toolkit   # include toolkit dependencies
```

## Dependencies

**Core** (installed by default):
- `requests` — synchronous HTTP client
- `httpx[http2]` — asynchronous HTTP/2 client
- `python-dotenv` — `.env` file loading
- `tqdm` — progress bars

**Toolkit** (optional, `pip install "c3-charm[toolkit]"`):
- `torch`, `tensordict` — tensor operations
- `numpy`, `pandas` — data manipulation
- `matplotlib`, `seaborn`, `scienceplots` — visualization
- `scikit-learn` — ML utilities
- `lightgbm`, `optuna` — gradient boosting & hyperparameter tuning
- `gin-config` — experiment configuration

The `toolkit` extra also installs `datasetsforecast` (benchmark forecasting datasets), `dill` (object serialization), `minisom` (self-organizing maps), and `pyarrow` (columnar data and Parquet I/O).

## Quick Start

```python
from charm import CharmClient
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get API key and base URL from environment variables
api_key = os.getenv("CHARM_API_KEY", "your-api-key")
base_url = os.getenv("CHARM_BASE_URL", "http://your-server-url:8080")

# Create a client
client = CharmClient(
    base_url=base_url,
    api_key=api_key,
    timeout=30,         # Increased timeout for potentially large requests
    max_retries=3,      # Automatically retry failed requests
    http2=True,         # Enable HTTP/2 for async requests (default)
)

# Generate embeddings for time series data (synchronous with progress bar)
response = client.embeddings.create(
    descriptions=[["engine", "temperature"], ["fan", "speed"]],
    ts_array=[
        # First time series (10 timesteps, 2 channels)
        [
            [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0],
            [1.1, 1.2], [1.3, 1.4], [1.5, 1.6], [1.7, 1.8], [1.9, 2.0]
        ],
        # Second time series (10 timesteps, 2 channels)
        [
            [2.1, 2.2], [2.3, 2.4], [2.5, 2.6], [2.7, 2.8], [2.9, 3.0],
            [3.1, 3.2], [3.3, 3.4], [3.5, 3.6], [3.7, 3.8], [3.9, 4.0]
        ]
    ],
    batch_size=32,      # Process in batches of 32 (for large datasets)
    return_tensors="np",  # Options: "list", "np", "torch"
    progress=True       # Show progress bar (default: True)
)

# Access the embeddings
embeddings = response.embeds
print(f"Model: {response.model}")
print(f"Embeddings shape: {embeddings.shape}")

# Asynchronous processing (much faster for large datasets)
import asyncio

async def generate_embeddings_async():
    response = await client.embeddings.async_create(
        descriptions=[["engine", "temperature"], ["fan", "speed"]],
        ts_array=[
            # Same time series data as above
            [
                [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0],
                [1.1, 1.2], [1.3, 1.4], [1.5, 1.6], [1.7, 1.8], [1.9, 2.0]
            ],
            [
                [2.1, 2.2], [2.3, 2.4], [2.5, 2.6], [2.7, 2.8], [2.9, 3.0],
                [3.1, 3.2], [3.3, 3.4], [3.5, 3.6], [3.7, 3.8], [3.9, 4.0]
            ]
        ],
        max_B_per_request=32,    # Process 32 time series per API call
        concurrency_per_call=8,  # Run up to 8 concurrent API calls
        return_tensors="np",     # Options: "list", "np", "torch"
        progress=True            # Show progress bar (default: True)
    )
    return response

# Run the async function
response_async = asyncio.run(generate_embeddings_async())
```

### Time Series Forecasting

The CHARM SDK also supports forecasting and backcasting through the `/forecast` endpoint:

```python
# Forecasting (predict future values)
response = client.prediction.create(
    descriptions=[["sensor_A", "sensor_B"]],
    ts_array=[[
        [1.0, 2.0],
        [1.1, 2.1],
        [1.2, 2.2],
        [1.3, 2.3],
        [1.4, 2.4],
        [1.5, 2.5],
        [1.6, 2.6],
        [1.7, 2.7],
        [1.8, 2.8],
        [1.9, 2.9],
    ]],
    target_len=10,  # Forecast 10 steps ahead
    return_tensors="np"
)

# Access the denormalized predictions
forecast = response.denormalized_predictions
print(f"Forecast shape: {forecast.shape}")  # e.g., (1, 10, 2, Q) where Q is number of quantiles
print(f"Mode: {response.mode}")  # "forecast"

# Backcasting (reconstruct past values)
response = client.prediction.create(
    descriptions=[["sensor_A", "sensor_B"]],
    ts_array=[[
        [1.0, 2.0],
        [1.1, 2.1],
        [1.2, 2.2],
        [1.3, 2.3],
        [1.4, 2.4],
        [1.5, 2.5],
        [1.6, 2.6],
        [1.7, 2.7],
        [1.8, 2.8],
        [1.9, 2.9],
    ]],
    target_len=-8,  # Reconstruct last 8 steps
    return_tensors="np"
)

reconstructed = response.denormalized_predictions
print(f"Mode: {response.mode}")  # "backcast"
```

**Note:** `target_len` is required and must be non-zero:
- **Positive values**: Forecast future timesteps (e.g., `target_len=10`)
- **Negative values**: Reconstruct past timesteps (e.g., `target_len=-8`)

### Using a .env file

You can create a `.env` file in your project directory with the following content:

```
CHARM_API_KEY=your-api-key
CHARM_BASE_URL=http://your-server-url:8080
```

This allows you to keep your credentials separate from your code and avoid hardcoding sensitive information.
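
The Quick Start falls back to placeholder strings when a variable is unset, which postpones the failure until the first request. A sketch of failing fast instead is shown below. `load_charm_settings` is a hypothetical helper, not an SDK function; in a real script, call `load_dotenv()` first and then call the helper with no arguments so it reads `os.environ`.

```python
import os


def load_charm_settings(env=os.environ):
    """Read CHARM_API_KEY and CHARM_BASE_URL, raising if either is missing or empty."""
    required = ("CHARM_API_KEY", "CHARM_BASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {"api_key": env["CHARM_API_KEY"], "base_url": env["CHARM_BASE_URL"]}


# An explicit mapping is passed here so the example runs without a real .env file.
settings = load_charm_settings(
    {"CHARM_API_KEY": "your-api-key", "CHARM_BASE_URL": "http://your-server-url:8080"}
)
```

The returned dictionary uses the same keyword names as the `CharmClient` constructor in the Quick Start, so it can be unpacked with `**settings`.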

## Features

- OpenAI-style SDK for CHARM time-series embeddings
- API key authentication
- Automatic retries with exponential backoff
- Configurable timeouts
- Client-side batching for large datasets
- Flexible return types (Python lists, NumPy arrays, or PyTorch tensors)
- Both synchronous and asynchronous methods in a single client:
  - `client.embeddings.create()` - Synchronous method with progress tracking
  - `await client.embeddings.async_create()` - Asynchronous method with concurrent batch processing
  - `client.prediction.create()` - Synchronous prediction method
  - `await client.prediction.async_create()` - Asynchronous prediction method
- Progress tracking with tqdm for both sync and async methods
- HTTP/2 support for asynchronous requests
- Comprehensive error handling with specific exception types
- Binary protocol for efficient data transfer (handles raw fp16 bytes from server)
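
To illustrate what "retries with exponential backoff" means in general, here is a generic sketch. It is not the SDK's implementation: the base delay, the cap, the jitter, and the choice to retry on any exception are all assumptions made for the example.

```python
import random
import time


def backoff_delays(max_retries, base=0.5, cap=8.0):
    """Yield base * 2**attempt seconds for each retry, never exceeding cap."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt)


def call_with_retries(func, max_retries=3, sleep=time.sleep):
    """Call func(); on failure, wait and retry up to max_retries more times."""
    delays = backoff_delays(max_retries)
    while True:
        try:
            return func()
        except Exception:  # a real client would retry only transient errors
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay + random.uniform(0, delay / 10))  # small random jitter


schedule = list(backoff_delays(5))
```

With these illustrative defaults, `schedule` is `[0.5, 1.0, 2.0, 4.0, 8.0]`: each wait doubles until it reaches the cap.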

## Performance Considerations

- **Synchronous Method** (`client.embeddings.create`): Suitable for smaller datasets or when simplicity is preferred. Batches are processed sequentially, which can be slow for large datasets, and progress is reported with tqdm. To prevent timeouts, avoid sending very large batches (>100 samples) in a single request.

- **Asynchronous Method** (`client.embeddings.async_create`): Recommended for large datasets. It is significantly faster because batches are sent concurrently, and it provides:
  - Parallel batch processing
  - Bounded concurrency to avoid overwhelming the server
  - Progress tracking for long-running operations
  - HTTP/2 support for efficient connections
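
Bounded concurrency, the idea behind a setting like `concurrency_per_call`, can be sketched with an `asyncio.Semaphore`. This is a generic illustration and not the SDK's code; `fake_request` stands in for one HTTP call and `run_bounded` is a hypothetical helper.

```python
import asyncio


async def run_bounded(jobs, limit):
    """Run every coroutine function in jobs, at most `limit` at a time.

    Results are returned in the same order as the input jobs.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:  # blocks while `limit` jobs are already running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_request(batch_id):
    await asyncio.sleep(0.01)  # stands in for one API call
    return batch_id * 2


jobs = [lambda i=i: fake_request(i) for i in range(5)]
results = asyncio.run(run_bounded(jobs, limit=2))
```

Raising the limit increases throughput until the server or the network becomes the bottleneck, which is why the limit is kept bounded.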

## Payload limitations

The SDK and API enforce:

- **Timesteps per series:** T ≥ 1 and **T < 1500** (enforced by SDK).
- **Channels per series:** **C < 1500** (see [usage guide](./usage.md)).
- **Per-request size:** **N × C × T ≤ 500,000** (client-side batching can split larger jobs).
- **Batch consistency:** All time series in a single request must have the **same T** and the **same C**.
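
One way to choose a per-request sample count (for example, a value for `batch_size` or `max_B_per_request`) is to divide the element budget by the size of one sample. The helpers below are hypothetical and only encode the arithmetic; the SDK's own batching logic is not shown in this README.

```python
MAX_ELEMENTS = 500_000  # documented per-request budget for N * C * T


def max_samples_per_request(t, c, budget=MAX_ELEMENTS):
    """Largest N such that N * C * T stays within the budget."""
    per_sample = t * c
    if per_sample <= 0:
        raise ValueError("T and C must both be positive")
    if per_sample > budget:
        raise ValueError("a single sample exceeds the per-request budget")
    return budget // per_sample


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# 1000 timesteps x 100 channels = 100,000 values per sample, so 5 samples fit.
per_request = max_samples_per_request(t=1000, c=100)
batches = chunk(list(range(12)), per_request)
batch_sizes = [len(batch) for batch in batches]
```

Twelve samples of that size split into requests of 5, 5, and 2 samples.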

See the [Data shapes and API reference](#data-shapes-and-api-reference) section above for input/output shapes.

## Requirements

- Python 3.10+
- See [Installation](#installation) for dependency details

## Testing

The CHARM SDK uses pytest for testing. To run the tests:

```bash
# Install pytest if not already installed
pip install pytest

# Run all tests
python -m pytest tests/

# Run specific test file
python -m pytest tests/test_utils.py

# Run with verbose output
python -m pytest -v tests/
```

## Documentation

For detailed documentation, see the [examples](./docs/notebooks) directory, the [usage guide](./usage.md), the [quickstart guide](./quickstart.md), and the docstrings in the code.

## Example Applications

The CHARM SDK can be used for various time series applications:

1. **Anomaly Detection**: Identify unusual patterns in sensor data, network traffic, or financial transactions
2. **Time Series Clustering**: Group similar time series patterns for market segmentation or behavior analysis
3. **Classification**: Categorize time series data for predictive maintenance or activity recognition
4. **Similarity Search**: Find similar patterns across large datasets for pattern discovery
5. **Forecasting**: Predict future values or reconstruct past values in time series data

Check out the notebooks in the `docs/notebooks` directory for detailed examples of these applications.

## License

This project is licensed under the Apache License 2.0 — see the [LICENSE](LICENSE) file for details.

