# RAG Memory - AI Agent Usage Guide

## System Architecture
Dual-storage knowledge management:
- **Vector Store** (PostgreSQL+pgvector): Semantic search via embeddings
- **Knowledge Graph** (Neo4j+Graphiti): Entity relationships and temporal reasoning

**IMPORTANT:** Every ingestion writes to BOTH stores using AI models (embeddings + LLMs), so ingestion has a cost. Queries are free (local database operations).

---

## 1. COLLECTION DISCIPLINE (CRITICAL)

**MUST review collections before ingesting ANY content:**

1. `list_collections()` - See available collections
2. `get_collection_info(name)` - Review purpose, domain, metadata schema
3. Choose collection matching content's domain/topic
4. If no good match: `create_collection()` with clear domain/purpose

**Why:** Collections partition BOTH vector search AND knowledge graph. Poor collection choices degrade knowledge quality and search relevance.

**Never:**
- Dump unrelated content into same collection
- Ignore collection descriptions when choosing where to ingest
- Create collections without clear, focused domain/purpose
- Ingest before reviewing what collections already exist

**Pattern:**
```
list_collections()
  → review purposes/domains
  → choose best fit OR create new
  → ingest_*(collection_name=chosen)
```
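
The pattern above can be sketched in Python. This is a minimal illustration under stated assumptions: it treats the result of `list_collections()` as a plain list of dictionaries with `name` and `description` keys (the real return shape is not specified in this guide), and `pick_collection` is a hypothetical helper, not one of the tools.

```python
from typing import Optional


def pick_collection(collections: list[dict], topic_keywords: set[str]) -> Optional[str]:
    """Return the collection whose description overlaps most with the topic.

    ASSUMPTION: each entry looks like {"name": ..., "description": ...}.
    Returns None when nothing overlaps, which signals "create a new collection".
    """
    wanted = {k.lower() for k in topic_keywords}
    best_name, best_score = None, 0
    for coll in collections:
        words = set(coll.get("description", "").lower().split())
        score = len(words & wanted)
        if score > best_score:
            best_name, best_score = coll["name"], score
    return best_name


# Example data standing in for a list_collections() result (hypothetical).
collections = [
    {"name": "python-docs", "description": "Official Python language documentation"},
    {"name": "team-notes", "description": "Internal meeting notes and decisions"},
]

chosen = pick_collection(collections, {"python", "documentation"})
missing = pick_collection(collections, {"kubernetes"})
```

Keyword overlap is only a stand-in for the judgment an agent should apply when reading collection descriptions. The point is the control flow: reuse an existing collection when one fits, and create a new one only when none does.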

---

## 2. SEARCH: USE FULL QUESTIONS (NOT KEYWORDS)

**Semantic search matches MEANING, not exact words.**

- ✅ Good: "How do I configure authentication in the system?"
- ❌ Bad: "authentication configuration"

Applies to: `search_documents`, `query_relationships`, `query_temporal`

---

## 3. INGESTION WORKFLOWS

### Check Before Ingesting (Avoid Duplicates)
```
list_documents(collection_name, include_details=True)
  → Review titles/metadata
  → If exists unchanged: SKIP
  → If exists but updated: update_document()
  → If new: ingest_*()
```

**Why:** Avoids redundant processing and keeps the knowledge base clean.
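
The skip/update/ingest decision above can be sketched as a small function. This is an illustration only: it assumes each entry returned by `list_documents(...)` is a dictionary with a `title` and a `metadata` dictionary, and that a `content_hash` was stored in the metadata at ingestion time. Both the field names and the hashing convention are hypothetical, not part of the documented tools.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def decide_action(existing_docs: list[dict], title: str, new_text: str) -> str:
    """Return "skip", "update", or "ingest" for a candidate document.

    ASSUMPTION: docs look like {"title": ..., "metadata": {"content_hash": ...}}.
    """
    new_hash = content_hash(new_text)
    for doc in existing_docs:
        if doc.get("title") == title:
            stored = doc.get("metadata", {}).get("content_hash")
            # Same title and same hash: unchanged. Otherwise: updated.
            return "skip" if stored == new_hash else "update"
    return "ingest"


# Example data standing in for a list_documents() result (hypothetical).
existing = [
    {"title": "Setup Guide", "metadata": {"content_hash": content_hash("v1 text")}},
]

unchanged = decide_action(existing, "Setup Guide", "v1 text")
changed = decide_action(existing, "Setup Guide", "v2 text")
brand_new = decide_action(existing, "FAQ", "some text")
```

Matching on title is the simplest possible identity check. A source URL or file path stored in metadata would be a sturdier key if the tools expose one.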

### Analyze Before Large Crawls
```
analyze_website(url, include_url_lists=True)
  → Review total_urls and site structure
  → If large scope (dozens/hundreds of pages):
      - Present to user: "This will ingest at least N pages"
      - Warn: "This operation will take significant time and incur costs"
      - Get user confirmation before proceeding
  → ingest_url(url, follow_links=True)
```

**Why:** Knowing the scope before crawling enables an informed decision and a more efficient ingestion strategy.

**Note:** The sitemap gives a minimum page count. An actual crawl with `follow_links` may discover additional pages.
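
The confirmation gate in this workflow can be sketched as follows. This is a hedged example: it assumes the `analyze_website` result can be read as a dictionary with a `total_urls` key, and the threshold of 24 pages is an arbitrary stand-in for "dozens". `crawl_plan` is a hypothetical helper.

```python
from typing import Optional


def crawl_plan(analysis: dict, confirm_threshold: int = 24) -> dict:
    """Decide whether a crawl needs user confirmation before it starts.

    ASSUMPTION: `analysis` exposes "total_urls"; the threshold is illustrative.
    """
    total = int(analysis.get("total_urls", 0))
    needs_confirmation = total >= confirm_threshold
    message: Optional[str] = None
    if needs_confirmation:
        # The sitemap count is a lower bound, hence "at least".
        message = (
            f"This will ingest at least {total} pages. "
            "This operation will take significant time and incur costs."
        )
    return {"needs_confirmation": needs_confirmation, "message": message}


small = crawl_plan({"total_urls": 5})
large = crawl_plan({"total_urls": 150})
```

Only when `needs_confirmation` is false, or the user has explicitly agreed, should the agent go on to call `ingest_url` with `follow_links=True`.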

### Use Recrawl for Website Updates
```
Instead of: delete_document() + ingest_url()
Use: ingest_url(url, mode="recrawl", ...)
```

**Why:** Recrawl is safer, preserves metadata tracking, and keeps the knowledge base cleaner.

---

## 4. RAG vs GRAPH USAGE

**Use `search_documents` for:**
- Content semantically similar to query
- "What does the knowledge base say about X?"

**Use `query_relationships` for:**
- Connections between entities
- "What is related to X?" or "How are A and B connected?"

**Use `query_temporal` for:**
- How knowledge evolved over time
- "How has X changed since 2023?"

**Pro tip:** Combine them: use the graph tools for connections and RAG for detailed content.
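
The mapping from question type to tool can be written down as a lookup table. In this sketch the tool names come from this guide, while the `Intent` labels and the `plan_queries` helper are hypothetical. Classifying a question into an intent is left to the agent.

```python
from enum import Enum


class Intent(Enum):
    CONTENT = "content"          # "What does the knowledge base say about X?"
    CONNECTIONS = "connections"  # "How are A and B connected?"
    EVOLUTION = "evolution"      # "How has X changed since 2023?"


# Tool names are taken from this guide; the Intent labels are illustrative.
TOOL_FOR_INTENT = {
    Intent.CONTENT: "search_documents",
    Intent.CONNECTIONS: "query_relationships",
    Intent.EVOLUTION: "query_temporal",
}


def plan_queries(intents: list[Intent]) -> list[str]:
    """Return the tools to call, in the given order, without duplicates."""
    plan: list[str] = []
    for intent in intents:
        tool = TOOL_FOR_INTENT[intent]
        if tool not in plan:
            plan.append(tool)
    return plan


combined = plan_queries([Intent.CONNECTIONS, Intent.CONTENT])
```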

---

## 5. EFFICIENCY & COST AWARENESS

**Ingestion operations use AI models and have cost:**
- Every `ingest_*` call processes content with embeddings + LLMs
- Writes to both vector store and knowledge graph
- Cost varies by document size and complexity

**Query operations are FREE** (local database):
- `search_documents`
- `query_relationships`
- `query_temporal`
- All list/view operations

**Best practices for efficiency:**
- Check for duplicates before ingesting (see #3)
- Use `update_document()` instead of re-ingesting
- Analyze large crawls before proceeding (see #3)
- Use recrawl mode for website updates (see #3)

---

## 6. COMMON PATTERNS

**Documentation Ingestion:**
```
1. analyze_website(url) - understand scope and structure
2. Review results, get user confirmation if large
3. list_collections() - reuse a fitting collection, else create_collection(name, domain)
4. ingest_url(url, follow_links=True, max_depth=2)
5. get_collection_info(name) - verify completion
```

**Research Query:**
```
1. search_documents(query, collection) - find relevant content
2. query_relationships(query, collection) - find connections
3. Synthesize findings from both sources
```
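
The research pattern above can be sketched as one function that calls both query tools and returns their results side by side. The tool calls are injected as plain callables so the example runs without a server. The stub functions and the shape of their return values are assumptions made for illustration, not the real tool output.

```python
from typing import Callable


def research(
    question: str,
    collection: str,
    search: Callable[[str, str], list],
    relate: Callable[[str, str], list],
) -> dict:
    """Run the vector search and the graph query, then bundle both results."""
    passages = search(question, collection)  # step 1: relevant content
    links = relate(question, collection)     # step 2: connections
    # Step 3 (synthesis) is left to the caller; both sources are returned.
    return {"question": question, "passages": passages, "relationships": links}


# Stubs standing in for search_documents / query_relationships (hypothetical output).
def fake_search(query: str, collection: str) -> list:
    return [f"passage from {collection}"]


def fake_relate(query: str, collection: str) -> list:
    return [("auth-service", "DEPENDS_ON", "user-db")]


result = research(
    "How does the auth service store user sessions?",
    "platform-docs",
    fake_search,
    fake_relate,
)
```

Note that the query is phrased as a full question, in line with section 2.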

**Maintenance:**
```
1. list_documents(collection) - identify stale docs
2. update_document(id) - refresh changed content
   OR ingest_url(url, mode="recrawl") - refresh websites
```
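
One way to "identify stale docs" is an age cutoff. The sketch below assumes each document entry carries an `id` and an ISO 8601 `updated_at` timestamp. Both field names and the 90-day default are hypothetical, so check the `list_documents` docstring for the fields actually returned.

```python
from datetime import datetime, timedelta, timezone


def find_stale(docs: list[dict], now: datetime, max_age_days: int = 90) -> list:
    """Return ids of documents last updated before the cutoff.

    ASSUMPTION: docs look like {"id": ..., "updated_at": "<ISO 8601 with offset>"}.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        doc["id"]
        for doc in docs
        if datetime.fromisoformat(doc["updated_at"]) < cutoff
    ]


# Example data standing in for a list_documents() result (hypothetical).
docs = [
    {"id": 1, "updated_at": "2024-01-10T00:00:00+00:00"},
    {"id": 2, "updated_at": "2024-05-20T00:00:00+00:00"},
]
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

stale_ids = find_stale(docs, now)
```

Each id in `stale_ids` is then a candidate for `update_document`, or for a recrawl if the document came from a website.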

---

## 7. KEY IMPERATIVES

- **MUST** review collections before ingesting (see #1)
- **MUST** use full questions for search, not keywords (see #2)
- **MUST** check for duplicates before ingesting (see #3)
- **MUST** analyze large websites before crawling (see #3)
- **SHOULD** present scope to user for large operations and get confirmation
- **SHOULD** use recrawl mode for website updates (see #3)
- **SHOULD** combine RAG + Graph queries for comprehensive research (see #4)

**For tool-specific details:** See individual tool docstrings.
