Metadata-Version: 2.4
Name: cognite-ai
Version: 0.7.5
Summary: A set of AI tools for working with Cognite Data Fusion in Python.
Author-email: Anders Hafreager <anders.hafreager@cognite.com>
License: MIT
License-File: LICENSE.md
Requires-Python: >=3.10
Requires-Dist: pydantic>=2.0
Requires-Dist: regex>=2022.10.31
Provides-Extra: dev
Requires-Dist: cognite-sdk>=7.0.0; extra == 'dev'
Requires-Dist: pandasai==2.3.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pyright>=1.1.343; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.2.0; extra == 'dev'
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.13; extra == 'dev'
Requires-Dist: twine>=5.1.1; extra == 'dev'
Description-Content-Type: text/markdown

# cognite-ai

A set of AI tools for working with CDF (Cognite Data Fusion) in Python, including vector stores and intelligent data manipulation features leveraging large language models (LLMs).

# Installation

This package is intended for use in Cognite's Jupyter notebooks and Streamlit apps. To get started, install the package from a notebook cell:

```bash
%pip install cognite-ai
```

## Smart Data Tools

With `cognite-ai`, you can enhance your data workflows by integrating LLMs for intuitive querying and manipulation of data frames. The module is built on top of [PandasAI](https://docs.pandas-ai.com/en/latest/) and adds Cognite-specific features.

The Smart Data Tools consist of three components:

1. Pandas Smart DataFrame
2. Pandas Smart DataLake
3. Pandas AI Agent

### 1. Pandas Smart DataFrame

`SmartDataframe` enables you to chat with individual data frames, using LLMs to query, summarize, and analyze your data conversationally.

#### Example

```python
from cognite.ai import load_pandasai
from cognite.client import CogniteClient
import pandas as pd

# Load the necessary classes
client = CogniteClient()
SmartDataframe, SmartDatalake, Agent = await load_pandasai()

# Create demo data
workorders_df = pd.DataFrame({
    "workorder_id": ["WO001", "WO002", "WO003", "WO004", "WO005"],
    "description": [
        "Replace filter in compressor unit 3A",
        "Inspect and lubricate pump 5B",
        "Check pressure valve in unit 7C",
        "Repair leak in pipeline 4D",
        "Test emergency shutdown system"
    ],
    "priority": ["High", "Medium", "High", "Low", "Medium"]
})

# Create a SmartDataframe object
s_workorders_df = SmartDataframe(workorders_df, cognite_client=client)

# Chat with the dataframe
s_workorders_df.chat('Which 5 work orders are the most critical based on priority?')
```
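
Because the answer comes from an LLM, it can be useful to cross-check it against a deterministic computation. The following is a minimal sketch in plain pandas, assuming that priorities rank `High` > `Medium` > `Low` (the ordering is an assumption, it is not defined by `cognite-ai`):

```python
import pandas as pd

# Same demo data as above, reduced to the columns needed for ranking
workorders_df = pd.DataFrame({
    "workorder_id": ["WO001", "WO002", "WO003", "WO004", "WO005"],
    "priority": ["High", "Medium", "High", "Low", "Medium"],
})

# Assumption: this is the intended criticality order, most critical first
priority_order = ["High", "Medium", "Low"]

ranked = (
    workorders_df
    # An ordered categorical sorts by the given order instead of alphabetically
    .assign(priority=pd.Categorical(
        workorders_df["priority"], categories=priority_order, ordered=True
    ))
    # A stable sort keeps the original row order within each priority level
    .sort_values("priority", kind="stable")
    .head(5)
)

ranked_ids = ranked["workorder_id"].tolist()
print(ranked_ids)
```

With only five rows in the demo data, all work orders are returned; the ranking only narrows the result on larger data frames.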

#### Customizing LLM Parameters

You can configure the LLM parameters to control aspects like model selection and temperature.

```python
params = {
    "model": "azure/gpt-4.1",
    "temperature": 0.5
}

s_workorders_df = SmartDataframe(workorders_df, cognite_client=client, params=params)
```
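
Lower temperatures generally make the model's output more deterministic, which tends to suit data queries. As a sketch, the parameters can be checked before they are passed on. `build_llm_params` is a hypothetical helper, not part of `cognite-ai`, and the accepted temperature range of 0 to 2 is an assumption that may differ between models:

```python
def build_llm_params(model: str, temperature: float = 0.5) -> dict:
    """Hypothetical helper: validate values before passing them as `params`."""
    if not model:
        raise ValueError("model must be a non-empty string")
    # Assumption: the backing model accepts temperatures in [0, 2]
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be between 0 and 2, got {temperature}")
    return {"model": model, "temperature": temperature}


# A temperature of 0 asks for the least random output
params = build_llm_params("azure/gpt-4.1", temperature=0.0)
print(params)
```

The resulting dictionary has the same shape as the `params` example above and would be passed in the same way.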

### 2. Pandas Smart DataLake

`SmartDatalake` allows you to combine and query multiple data frames simultaneously, treating them as a unified data lake.

#### Example

```python
from cognite.ai import load_pandasai
from cognite.client import CogniteClient
import pandas as pd

# Load the necessary classes
client = CogniteClient()
SmartDataframe, SmartDatalake, Agent = await load_pandasai()

# Create demo data
workorders_df = pd.DataFrame({
    "workorder_id": ["WO001", "WO002", "WO003"],
    "asset_id": ["A1", "A2", "A3"],
    "description": ["Replace filter", "Inspect pump", "Check valve"]
})
workitems_df = pd.DataFrame({
    "workitem_id": ["WI001", "WI002", "WI003"],
    "workorder_id": ["WO001", "WO002", "WO003"],
    "task": ["Filter replacement", "Pump inspection", "Valve check"]
})
assets_df = pd.DataFrame({
    "asset_id": ["A1", "A2", "A3"],
    "name": ["Compressor 3A", "Pump 5B", "Valve 7C"]
})

# Combine them into a smart lake
smart_lake_df = SmartDatalake([workorders_df, workitems_df, assets_df], cognite_client=client)

# Chat with the unified data lake
smart_lake_df.chat("Which assets have the most work orders associated with them?")
```
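
For comparison, the same question can be answered by hand with a join and an aggregation. This sketch uses plain pandas on the demo data above and assumes that `asset_id` is the key linking work orders to assets:

```python
import pandas as pd

# Same demo data as above, reduced to the columns needed for the join
workorders_df = pd.DataFrame({
    "workorder_id": ["WO001", "WO002", "WO003"],
    "asset_id": ["A1", "A2", "A3"],
})
assets_df = pd.DataFrame({
    "asset_id": ["A1", "A2", "A3"],
    "name": ["Compressor 3A", "Pump 5B", "Valve 7C"],
})

counts = (
    workorders_df
    # Attach the asset name to every work order
    .merge(assets_df, on="asset_id", how="left")
    # Count distinct work orders per asset
    .groupby("name")["workorder_id"]
    .nunique()
    .sort_values(ascending=False)
)
print(counts)
```

In the demo data every asset has exactly one work order, so the counts are tied; a correct answer from the data lake should reflect that.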

### 3. Pandas AI Agent

The `Agent` provides conversational querying over a single data frame and lets you ask follow-up questions.

#### Example

```python
from cognite.ai import load_pandasai
from cognite.client import CogniteClient
import pandas as pd

# Load the necessary classes
client = CogniteClient()
SmartDataframe, SmartDatalake, Agent = await load_pandasai()

# Create example data
sensor_readings_df = pd.DataFrame({
    "sensor_id": ["A1", "A2", "A3", "A4", "A5"],
    "temperature": [75, 80, 72, 78, 69],
    "pressure": [30, 35, 33, 31, 29],
    "status": ["Normal", "Warning", "Normal", "Warning", "Normal"]
})

# Create an Agent for the dataframe
agent = Agent(sensor_readings_df, cognite_client=client)

# Ask a question
print(agent.chat("Which sensors are showing a warning status?"))
```

# Development

This project uses [uv](https://docs.astral.sh/uv/) for dependency management and building. To get started with development:

```bash
# Install dependencies
uv sync

# Install the package in editable mode
uv pip install -e .

# Build the package
uv build
```

For more information on using uv, see the [uv documentation](https://docs.astral.sh/uv/).

# Contributing

This package exists mainly to work around the installation problems
that users hit in Pyodide when installing `pandasai`, which depends on
packages that are not available as pure-Python wheels.

The current development cycle is cumbersome: it consists of copying the
source code of this package into, for example, a Jupyter notebook
in Fusion and verifying that everything works there.
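
Since the package targets Pyodide, it can help to confirm which runtime a verification notebook is actually using. The following is a minimal sketch based on the runtime checks described in the Pyodide documentation; `running_in_pyodide` is a hypothetical helper and not part of `cognite-ai`:

```python
import sys


def running_in_pyodide() -> bool:
    """Return True when the code runs under Pyodide (Python on WebAssembly)."""
    # Pyodide reports the "emscripten" platform and preloads the `pyodide` module
    return sys.platform == "emscripten" and "pyodide" in sys.modules


environment = "Pyodide" if running_in_pyodide() else "a non-Pyodide interpreter"
print(f"Running in {environment}")
```

Running this in the first cell makes it clear whether a test reflects the Pyodide environment or a local interpreter.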
