Metadata-Version: 2.4
Name: memorai
Version: 0.1.3
Summary: Conversation memory as a knowledge graph — pipeline, retrieval, and generation
Author: MemOrai Contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/memorai/memorai
Project-URL: Repository, https://github.com/memorai/memorai
Project-URL: Bug Tracker, https://github.com/memorai/memorai/issues
Project-URL: Changelog, https://github.com/memorai/memorai/blob/main/CHANGELOG.md
Keywords: memory,knowledge-graph,llm,rag,conversation,nlp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: networkx>=3.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: groq>=0.4.0
Requires-Dist: openai>=1.0.0
Requires-Dist: google-generativeai>=0.5.0
Requires-Dist: google-api-core>=2.0.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: psycopg2-binary>=2.9.0
Requires-Dist: alembic>=1.11.0
Requires-Dist: redis>=5.0.0
Requires-Dist: rq>=1.15.0
Requires-Dist: neo4j>=5.0.0
Requires-Dist: pymongo>=4.0.0
Provides-Extra: backend
Requires-Dist: fastapi>=0.100.0; extra == "backend"
Requires-Dist: uvicorn[standard]>=0.22.0; extra == "backend"
Requires-Dist: httpx>=0.24.0; extra == "backend"
Requires-Dist: python-jose[cryptography]>=3.3.0; extra == "backend"
Requires-Dist: google-auth>=2.0.0; extra == "backend"
Requires-Dist: requests>=2.28.0; extra == "backend"
Requires-Dist: pydantic>=2.0.0; extra == "backend"
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: license-file

# MemOrai

[![PyPI version](https://img.shields.io/pypi/v/memorai.svg)](https://pypi.org/project/memorai/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

Build knowledge graphs from conversations and answer questions with retrieval-augmented generation over the resulting graph.

---

## 📦 Installation

### From PyPI

```bash
pip install memorai
```

### With FastAPI backend extras

```bash
pip install "memorai[backend]"
```

### From source (editable install)

```bash
git clone https://github.com/memorai/memorai.git
cd memorai
pip install -e .
# or with backend extras:
pip install -e ".[backend]"
```

---

## ⚙️ Configuration

Copy `.env.example` to `.env` (or set environment variables directly) before running:

```bash
cp .env.example .env
```

Then edit `.env` with your real credentials.

Example:

```env
# LLM Configuration (OpenAI-compatible endpoint)
LLM_API_KEY=your-api-key-here
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=google/gemini-2.0-flash-001

# Embedding Model (HuggingFace model name)
EMBEDDING_MODEL=BAAI/bge-m3

# Database Configuration (Neo4j)
# Use Aura DB or a local Neo4j instance
NEO4J_URI=neo4j+s://your-id.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
```
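
MemOrai pulls in `python-dotenv`, so the same values can also be loaded programmatically in your own code. A minimal sketch (variable names taken from the example above; how MemOrai itself reads them is internal):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

llm_api_key = os.getenv("LLM_API_KEY")
llm_base_url = os.getenv("LLM_BASE_URL")
embedding_model = os.getenv("EMBEDDING_MODEL", "BAAI/bge-m3")
neo4j_uri = os.getenv("NEO4J_URI")
neo4j_auth = (os.getenv("NEO4J_USER", "neo4j"), os.getenv("NEO4J_PASSWORD"))
```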

---

## 🚀 Quick Start

MemOrai uses **Neo4j** as a unified storage backend: it serves as both a **Document Store** (for pipeline states) and a **Knowledge Graph**, so no local files are needed for retrieval once a conversation is indexed.

### Python API

```python
import memorai

# Optional: configure at runtime instead of relying on os.getenv
memorai.configure(
    llm_provider="groq",
    llm_api_key="<your-llm-key>",
    llm_model="llama-3.3-70b-versatile",
    embedding_provider="cloudflare",
    cloudflare_api_token="<your-cf-token>",
    cloudflare_account_id="<your-cf-account-id>",
    neo4j_uri="neo4j+s://<your-instance>.databases.neo4j.io",
    neo4j_user="neo4j",
    neo4j_password="<your-neo4j-password>",
    max_workers=4,
    rpm_limit=60,
    timeout=120,
)

# 1. Initialize a conversation scope
memorai.create_conversation(
    conversation_id="alice-bot",
    name="Alice - Support Bot",
)

# 2. Index conversation history (builds graph in Neo4j)
history = [
    {"role": "user", "content": "My name is Alice and I live in Hanoi."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
]

memorai.index(
    history=history,
    conversation_id="alice-bot",
    session_id="alice-session-001",
    update=True,
    fast_mode=True,
)

# 3. Retrieve using Graph Vector Search
result = memorai.retrieve(
    query="Where does Alice live?",
    conversation_id="alice-bot",
)
print(result["top_turn_contents"])
```

Notes:
- `conversation_id` isolates tenant data in Neo4j.
- `session_id` lets you append incremental chat batches inside one conversation scope (see the sketch after these notes).
- `fast_mode=True` runs low-latency indexing (skips heavy post-processing).

### CLI — Full pipeline

```bash
# Run full pipeline from a JSON file
memorai pipeline \
    --input_json data/conversations.json \
    --output_dir output \
    --save_embeddings \
    --cleanup

# Answer a single question
memorai qa \
    --data_path output/graph_db/session-001 \
    --query "Where does Alice live?"

# Batch QA
memorai qa-batch \
    --questions_file questions.json \
    --data_path output/graph_db/session-001 \
    --output answers.json
```
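
Neither `data/conversations.json` nor `questions.json` has its schema documented in this README, so treat the shapes below as assumptions rather than the actual formats: the conversation file is shown with the same `{"role", "content"}` message objects used in the Python example above, and the questions file as a plain list of question strings.

```json
[
  {"role": "user", "content": "My name is Alice and I live in Hanoi."},
  {"role": "assistant", "content": "Nice to meet you, Alice!"}
]
```

A hypothetical `questions.json` in the same spirit would be `["Where does Alice live?", "What is the user's name?"]`; check the CLI help output for the real expectations.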

---

## 📋 CLI Commands

### Pipeline commands

| Command | Description |
|---------|-------------|
| `memorai segment` | Segment conversations into turns |
| `memorai filter` | Filter important messages |
| `memorai triplets` | Extract knowledge triplets |
| `memorai entities` | Generate entity descriptions |
| `memorai summarize` | Summarize segments |
| `memorai graph` | Build knowledge graph |
| `memorai pipeline` | Run full pipeline end-to-end |

### Post-processing commands

| Command | Description |
|---------|-------------|
| `memorai segment-chunk-map` | Export segment → chunk mapping |
| `memorai consolidate-turns` | Deduplicate turn IDs |
| `memorai rebuild-graph` | Rebuild graph after consolidation |
| `memorai embed-turns` | Add turn embeddings |
| `memorai embed-entities` | Add entity embeddings |
| `memorai embed-triplets` | Add triplet embeddings |
| `memorai embed-summaries` | Add summary embeddings |

### QA commands

| Command | Description |
|---------|-------------|
| `memorai retrieve` | Retrieve relevant nodes from KG |
| `memorai qa` | Answer a single question |
| `memorai qa-batch` | Answer a batch of questions |

---

## 🏗️ Architecture

```
Conversation History
        │
        ▼
  ┌─────────────┐
  │  Segmenter  │  Split into semantic turns
  └──────┬──────┘
         │
  ┌──────▼──────┐
  │   Filter    │  Remove low-signal messages
  └──────┬──────┘
         │
  ┌──────▼──────────┐
  │ TripletExtractor│  Extract (entity, relation, entity)
  └──────┬──────────┘
         │
  ┌──────▼──────────┐
  │ EntityDescriptor│  Describe entities in context
  └──────┬──────────┘
         │
  ┌──────▼──────┐
  │ GraphBuilder│  Build Knowledge Graph in Neo4j
  └──────┬──────┘
         │
  ┌──────▼────────┐
  │ Neo4jRetriever│  Vector Search + Cypher Traversal
  └──────┬────────┘
         │
  ┌──────▼─────────┐
  │ AnswerGenerator│  RAG over Neo4j context
  └────────────────┘
```
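
The `Neo4jRetriever` step is the easiest to picture concretely. MemOrai's actual query logic is internal; the sketch below only shows what "Vector Search + Cypher Traversal" looks like with the official `neo4j` driver, and the index name, labels, and relationship types are assumptions, not MemOrai's real schema.

```python
from neo4j import GraphDatabase

# Hypothetical schema: a vector index over turn nodes plus MENTIONS edges to entities.
CYPHER = """
CALL db.index.vector.queryNodes('turn_embedding_index', $k, $query_embedding)
YIELD node AS turn, score
OPTIONAL MATCH (turn)-[:MENTIONS]->(e:Entity)-[r]->(other:Entity)
RETURN turn.content AS content, score,
       collect(DISTINCT [e.name, type(r), other.name]) AS neighbourhood
ORDER BY score DESC
"""

driver = GraphDatabase.driver(
    "neo4j+s://<your-instance>.databases.neo4j.io",
    auth=("neo4j", "<your-neo4j-password>"),
)
with driver.session() as session:
    # In practice the query embedding comes from the configured embedding model.
    for record in session.run(CYPHER, k=5, query_embedding=[0.0] * 1024):
        print(record["score"], record["content"])
driver.close()
```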

---

## 🔧 Development

```bash
# Install with dev extras
pip install -e ".[dev]"

# Run tests
pytest

# Build distribution
make build

# Check package
twine check dist/*

# Publish to PyPI
make publish
```

---

## 📤 Publish Guide (DIY)

Use this section when you want to publish manually.

### 1. Prepare account + API tokens

1. Create an account on PyPI and TestPyPI.
2. Create API token on TestPyPI (for dry-run upload).
3. Create API token on PyPI (real release).
4. Keep tokens in env vars (recommended):

```bash
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-token>
```

### 2. Build package artifacts

```bash
python -m pip install --upgrade build twine
rm -rf dist build *.egg-info
python -m build
python -m twine check dist/*
```

Expected artifacts:
- `dist/memorai-<version>.tar.gz`
- `dist/memorai-<version>-py3-none-any.whl`

### 3. Upload to TestPyPI first

```bash
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-testpypi-token>
python -m twine upload --repository testpypi dist/*
```

Install test package:

```bash
python -m pip install \
    --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple \
    memorai
```

### 4. Publish to real PyPI

```bash
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=pypi-<your-pypi-token>
python -m twine upload dist/*
```

### 5. Verify release

```bash
python -m pip install --upgrade memorai
python -c "import memorai; print(memorai.__version__)"
```

### Common issues

- `File already exists`: PyPI never accepts the same version twice. Bump the version in `pyproject.toml` and `memorai/__init__.py`, then rebuild (see the check below).
- `403 invalid token`: make sure the token's scope matches the target index (PyPI vs TestPyPI) and that `TWINE_USERNAME` is literally `__token__`.
- README rendering errors: run `python -m twine check dist/*` before uploading to catch invalid markup.
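
For the version bump in particular, a quick sanity check before rebuilding (assuming a static `version = "..."` line in `pyproject.toml` and a `__version__` string in `memorai/__init__.py`, as the verify step above implies):

```bash
# Both version strings must agree before rebuilding
grep -n '^version' pyproject.toml
grep -n '__version__' memorai/__init__.py
rm -rf dist build *.egg-info && python -m build
```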

---

## 📄 License

MIT — see [LICENSE](LICENSE).
