Metadata-Version: 2.4
Name: bot-knows
Version: 0.1.3
Summary: Framework-agnostic Python library for graph-backed personal knowledge bases from chat data
Project-URL: Homepage, https://github.com/Snezhana/bot-knows
Project-URL: Documentation, https://github.com/Snezhana/bot-knows#readme
Project-URL: Repository, https://github.com/Snezhana/bot-knows
Project-URL: Issues, https://github.com/Snezhana/bot-knows/issues
Author-email: Your Name <your@email.com>
License-Expression: MIT
License-File: LICENSE
Keywords: chat,embedding,graph,knowledge-base,memory,nlp,recall
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.13
Requires-Dist: anyio<5.0,>=4.7
Requires-Dist: numpy<3.0,>=2.2
Requires-Dist: pydantic-settings<3.0,>=2.6
Requires-Dist: pydantic<3.0,>=2.10
Requires-Dist: structlog<26.0,>=25.1
Provides-Extra: anthropic
Requires-Dist: anthropic<1.0,>=0.42; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: hypothesis<7.0,>=6.100; extra == 'dev'
Requires-Dist: mongomock-motor>=0.0.32; extra == 'dev'
Requires-Dist: mypy<2.0,>=1.14; extra == 'dev'
Requires-Dist: pre-commit<5.0,>=4.0; extra == 'dev'
Requires-Dist: pytest-asyncio<1.0,>=0.25; extra == 'dev'
Requires-Dist: pytest-cov<7.0,>=6.0; extra == 'dev'
Requires-Dist: pytest<9.0,>=8.3; extra == 'dev'
Requires-Dist: ruff<1.0,>=0.9; extra == 'dev'
Provides-Extra: mongo
Requires-Dist: motor<4.0,>=3.6; extra == 'mongo'
Provides-Extra: neo4j
Requires-Dist: neo4j<6.0,>=5.27; extra == 'neo4j'
Provides-Extra: openai
Requires-Dist: openai<2.0,>=1.59; extra == 'openai'
Provides-Extra: redis
Requires-Dist: redis<6.0,>=5.2; extra == 'redis'
Provides-Extra: taskiq
Requires-Dist: taskiq-redis<2.0,>=1.0; extra == 'taskiq'
Requires-Dist: taskiq<1.0,>=0.11; extra == 'taskiq'
Description-Content-Type: text/markdown

# bot-knows

A framework-agnostic Python library for building graph-backed personal knowledge bases from chat data.
Implemented with Claude Code (model: claude-opus-4-5).

## Features

- **Multi-source Chat Ingestion**: Import chats from ChatGPT, Claude, and custom JSON formats
- **Semantic Topic Extraction**: LLM-powered topic extraction with confidence scores
- **Intelligent Deduplication**: Embedding-based semantic deduplication with configurable thresholds
- **Graph-backed Knowledge Base**: Neo4j-powered relationship graph for topics and messages
- **Evidence-weighted Recall**: Spaced repetition-inspired recall system with decay and reinforcement
- **Pluggable Infrastructure**: Bring your own storage, graph database, or LLM provider
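
As a mental model for the two-threshold deduplication, here is a small cosine-similarity sketch. The function names, thresholds, and decision labels are illustrative, not the library's actual implementation:

```python
import numpy as np

# Illustrative values; the library reads its own thresholds from
# configuration (DEDUP_HIGH_THRESHOLD / DEDUP_LOW_THRESHOLD).
HIGH, LOW = 0.90, 0.75

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dedup_decision(candidate: np.ndarray, existing: list[np.ndarray]) -> str:
    """Decide what to do with a candidate topic embedding."""
    if not existing:
        return "create"
    best = max(cosine(candidate, e) for e in existing)
    if best >= HIGH:
        return "merge"    # near-duplicate: reuse the existing topic
    if best <= LOW:
        return "create"   # clearly a new topic
    return "review"       # ambiguous band between the two thresholds

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.99, 0.05, 0.0])  # almost the same direction as v1
v3 = np.array([0.0, 1.0, 0.0])    # orthogonal to v1
print(dedup_decision(v2, [v1]))  # merge
print(dedup_decision(v3, [v1]))  # create
```

The high/low pair gives three outcomes instead of a single hard cutoff, which is what makes the thresholds independently configurable.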

## Requirements

- Python >= 3.13
- MongoDB (storage) - or custom storage implementation
- Neo4j (graph database) - or custom graph implementation
- Redis (optional, for caching)
- OpenAI or Anthropic API key (for LLM features) - or custom LLM implementation

## Installation

```bash
pip install bot-knows
```

Or with uv:

```bash
uv add bot-knows
```

### Optional Dependencies

Install with optional dependencies for specific infrastructure:

```bash
# With pip - install specific extras (quoted so the shell does not glob the brackets)
pip install "bot-knows[mongo,neo4j,openai]"

# With uv
uv add "bot-knows[mongo,neo4j,openai]"
```

Available extras:
- `mongo` - MongoDB storage (motor)
- `neo4j` - Neo4j graph database
- `redis` - Redis caching
- `taskiq` - Task queue support
- `openai` - OpenAI LLM provider
- `anthropic` - Anthropic LLM provider

## Quick Start

The `BotKnows` class is the main orchestrator that accepts implementation classes for storage, graph database, and LLM providers. Configuration is automatically loaded from environment variables.

### Using Built-in Infrastructure

```python
from bot_knows import (
    BotKnows,
    MongoStorageRepository,
    Neo4jGraphRepository,
    OpenAIProvider,
    ChatGPTAdapter,
)

async def main():
    # Config is loaded from .env automatically
    async with BotKnows(
        storage_class=MongoStorageRepository,
        graphdb_class=Neo4jGraphRepository,
        llm_class=OpenAIProvider,
    ) as bk:
        # Import ChatGPT conversations
        result = await bk.insert_chats("conversations.json", ChatGPTAdapter)
        print(f"Imported {result.chats_new} chats, {result.topics_created} topics")

        # Query the knowledge base (chat_id is a placeholder for an ID you hold)
        topics = await bk.get_chat_topics(chat_id)
        due_topics = await bk.get_due_topics(threshold=0.3)
```

### Available Implementations

**Storage:**
- `MongoStorageRepository` - MongoDB-based storage

**Graph Database:**
- `Neo4jGraphRepository` - Neo4j graph database

**LLM Providers:**
- `OpenAIProvider` - OpenAI API (GPT models + embeddings)
- `AnthropicProvider` - Anthropic API (Claude models)

**Import Adapters:**
- `ChatGPTAdapter` - ChatGPT export format
- `ClaudeAdapter` - Claude export format
- `GenericJSONAdapter` - Custom JSON format


## Custom Implementations

You can supply your own implementations of the required interfaces. Set `config_class = None` on your class and pass configuration via the `*_custom_config` parameters.

### Interfaces

- `StorageInterface` - Persistent storage for chats, messages, topics, evidence, and recall state
- `GraphServiceInterface` - Graph database operations for the knowledge graph
- `LLMInterface` - LLM interactions for classification and topic extraction
- `EmbeddingServiceInterface` - Text embedding generation
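
An embedding service can follow the same `config_class = None` pattern as the other examples below. A minimal sketch, where the hash-based vectors are a stand-in for a real model and the `embed(texts)` signature is assumed from the custom LLM example:

```python
import asyncio
import hashlib

class ToyEmbeddingService:
    """Illustrative EmbeddingServiceInterface implementation.

    Produces deterministic pseudo-embeddings from a SHA-256 digest;
    a real implementation would call an embedding model instead.
    """

    config_class = None  # custom config, as with the other interfaces

    @classmethod
    async def from_dict(cls, config: dict) -> "ToyEmbeddingService":
        return cls(dim=config.get("dim", 8))

    def __init__(self, dim: int = 8):
        self.dim = dim

    async def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one fixed-size vector per input text."""
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode()).digest()
            vectors.append([b / 255.0 for b in digest[: self.dim]])
        return vectors

async def demo():
    svc = await ToyEmbeddingService.from_dict({"dim": 4})
    vecs = await svc.embed(["hello", "world"])
    print(len(vecs), len(vecs[0]))  # 2 4

asyncio.run(demo())
```
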

### Example: Custom Storage Implementation

```python
from bot_knows import (
    BotKnows,
    ChatGPTAdapter,
    Neo4jGraphRepository,
    OpenAIProvider,
    StorageInterface,
)

class MyCustomStorage(StorageInterface):
    """Custom storage implementation."""

    config_class = None  # Signals custom config

    @classmethod
    async def from_dict(cls, config: dict) -> "MyCustomStorage":
        """Factory method for custom config."""
        return cls(connection_string=config["connection_string"])

    def __init__(self, connection_string: str):
        self.conn = connection_string

    # Implement all StorageInterface methods...
    async def save_chat(self, chat): ...
    async def get_chat(self, chat_id): ...
    # ... etc

async def main():
    async with BotKnows(
        storage_class=MyCustomStorage,
        graphdb_class=Neo4jGraphRepository,
        llm_class=OpenAIProvider,
        storage_custom_config={"connection_string": "postgresql://..."},
    ) as bk:
        result = await bk.insert_chats("data.json", ChatGPTAdapter)
```

### Example: Custom LLM Provider

```python
from bot_knows import (
    BotKnows,
    ChatGPTAdapter,
    LLMInterface,
    MongoStorageRepository,
    Neo4jGraphRepository,
)

class MyLLMProvider(LLMInterface):
    """Custom LLM provider (e.g., local model, different API)."""

    config_class = None

    @classmethod
    async def from_dict(cls, config: dict) -> "MyLLMProvider":
        return cls(api_url=config["api_url"], model=config["model"])

    def __init__(self, api_url: str, model: str):
        self.api_url = api_url
        self.model = model

    # Implement LLMInterface methods
    async def classify_chat(self, first_pair, last_pair): ...
    async def extract_topics(self, user_content, assistant_content): ...
    async def normalize_topic_name(self, name): ...

    # Implement EmbeddingServiceInterface if used as the embedding provider
    async def embed(self, texts): ...

async def main():
    async with BotKnows(
        storage_class=MongoStorageRepository,
        graphdb_class=Neo4jGraphRepository,
        llm_class=MyLLMProvider,
        llm_custom_config={"api_url": "http://localhost:8000", "model": "llama3"},
    ) as bk:
        result = await bk.insert_chats("data.json", ChatGPTAdapter)
```

## Configuration

Configuration is loaded from environment variables. See `.env.example` for all available options.

Key environment variables:
- `MONGODB_URI` - MongoDB connection string
- `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD` - Neo4j connection
- `OPENAI_API_KEY` - OpenAI API key
- `ANTHROPIC_API_KEY` - Anthropic API key
- `DEDUP_HIGH_THRESHOLD`, `DEDUP_LOW_THRESHOLD` - Deduplication thresholds
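
A starting point for a local `.env` might look like this (all values are placeholders; consult `.env.example` for the authoritative list):

```bash
MONGODB_URI=mongodb://localhost:27017/bot_knows
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=changeme
OPENAI_API_KEY=sk-...
DEDUP_HIGH_THRESHOLD=0.9
DEDUP_LOW_THRESHOLD=0.75
```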

## Architecture

```
Input Sources (ChatGPT, Claude, Custom JSON)
        ↓
Import Adapters (normalize to ChatIngest)
        ↓
Domain Processing
  ├── Chat identity resolution
  ├── One-time chat classification
  └── Message creation & ordering
        ↓
Topic Extraction
  ├── LLM-based extraction
  ├── Semantic deduplication
  └── Evidence append
        ↓
Graph Updates (Neo4j)
```

## Retrieval API

```python
async with BotKnows(...) as bk:
    # Get messages for a chat
    messages = await bk.get_messages_for_chat(chat_id)

    # Get topics for a chat
    topic_ids = await bk.get_chat_topics(chat_id)

    # Get related topics
    related = await bk.get_related_topics(topic_id, limit=10)

    # Get topic evidence
    evidence = await bk.get_topic_evidence(topic_id)

    # Spaced repetition recall
    recall_state = await bk.get_recall_state(topic_id)
    due_topics = await bk.get_due_topics(threshold=0.3)
    all_states = await bk.get_all_recall_states()
```
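
As a mental model for the spaced-repetition scoring behind `get_due_topics` (not the library's actual formula), an exponential decay whose half-life stretches with reinforcement looks roughly like this:

```python
import math

def recall_score(days_since_review: float, reinforcements: int,
                 half_life: float = 7.0) -> float:
    """Toy recall score: decays exponentially with time, and each
    reinforcement stretches the half-life (illustrative only)."""
    effective_half_life = half_life * (1 + reinforcements)
    return math.exp(-math.log(2) * days_since_review / effective_half_life)

# A topic reviewed once, 14 days ago, with a 7-day base half-life:
score = recall_score(days_since_review=14, reinforcements=1)
print(round(score, 3))  # 0.5  (14 days is exactly one effective half-life)

# Topics whose score falls below the threshold would be "due":
print(recall_score(30, 0) < 0.3)  # True
```

In this picture, `get_due_topics(threshold=0.3)` returns topics whose score has decayed past 0.3, and reviewing a topic reinforces it so it decays more slowly next time.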

## Development

```bash
# Install with dev dependencies
uv sync --dev

# Install with dev and optional dependencies
uv sync --dev --extra mongo --extra neo4j --extra openai

# Install all extras
uv sync --dev --all-extras

# Run tests
uv run pytest

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/
```

## Future Plans

The built-in infrastructure will be extended with additional providers:

- **Storage**: PostgreSQL, SQLite
- **Graph**: Amazon Neptune, TigerGraph, Memgraph
- **LLM**: Google Gemini, Ollama, HuggingFace

## Contributing

Contributions are welcome! If you'd like to add a new infrastructure implementation:

1. Implement the appropriate interface (`StorageInterface`, `GraphServiceInterface`, `LLMInterface`, or `EmbeddingServiceInterface`)
2. Add a `config_class` for environment-based configuration (or set to `None` for custom config)
3. Implement the `from_config` class method (or `from_dict` if `config_class` is `None`)
4. Add tests for your implementation
5. Submit a pull request

## License

MIT
