Conversational Activation Architecture for cortexgraph¶
Document Type: Architectural Plan Created: 2025-11-04 Status: Approved, Ready for Implementation STOPPER Protocol Applied: Yes (Full 7-step analysis completed)
Executive Summary¶
This document outlines a comprehensive plan to add conversational activation to cortexgraph, transforming it from sporadic LLM-dependent memory capture to reliable, preprocessing-assisted activation. The solution adds a preprocessing layer that automatically detects save-worthy content and provides activation signals + pre-filled parameters to the LLM.
Expected Impact: 85-90% improvement in activation reliability (from ~40% to 85-90%)
Timeline: 9 weeks to production-ready system
Core Innovation: Hybrid architecture combining deterministic preprocessing with LLM judgment, reducing executive function load while preserving flexibility.
Problem Statement¶
Current State¶
Memory saves in cortexgraph depend entirely on the LLM explicitly calling the save_memory MCP tool. No automatic pattern detection, entity extraction, intent classification, or importance scoring exists.
Root Cause Analysis¶
The LLM must simultaneously: - Conduct natural conversation with the user - Decide when to save information to memory - Extract entities from conversation - Infer appropriate tags - Determine importance/strength values - Remember to call tools consistently across long conversations
Result: Sporadic activation, missed memories, inconsistent parameter values, high cognitive load.
Why This Matters¶
From user perspective: - "I told you I prefer TypeScript, why did you forget?" - "I said 'remember this' but you didn't save it" - Inconsistent experience undermines trust
From system perspective: - cortexgraph has excellent temporal memory foundations (decay, spaced repetition, knowledge graph) - Activation is the bottleneck preventing production readiness - Reliability cannot depend solely on LLM consistency
Research Findings¶
Current cortexgraph Architecture¶
Core Components Analyzed:
- MCP Server (server.py): FastMCP-based with 13 tools
- Storage Layer (storage/jsonl_storage.py): JSONL with in-memory indexes
- Memory Models (storage/models.py): Pydantic models with temporal fields
- Tool Layer (tools/): save, search, observe, promote, consolidate, etc.
Existing Activation Mechanisms:
- Explicit API Calls (Primary): LLM must invoke
save_memorytool - Smart Prompting (Documentation only): Patterns exist in
docs/prompts/memory_system_prompt.mdbut no code implementation - Natural Spaced Repetition (v0.5.1): Post-retrieval reinforcement via
observe_memory_usage - Search Integration: Review candidate blending (affects retrieval, not capture)
Critical Finding: All saves are explicit LLM-initiated MCP tool calls. NO automatic detection exists.
State-of-the-Art Research (2024-2025)¶
1. Mem0 Architecture (ArXiv 2504.19413v1) - Two-phase pipeline: Extraction → Update - 26% accuracy boost over OpenAI's memory feature - 91% lower latency vs. full-context approach - Still LLM-driven but uses multi-message context
2. Knowledge Graph Construction with LLMs - Hybrid LLM + structured NLP pipelines outperform pure LLM - Dedicated entity extraction filters reduce noise - Domain-specific pre-training enhances NER sensitivity
3. Intent Detection with Transformers - BERT-based models achieve 85%+ accuracy - Fine-tuning on small datasets (100-500 examples) is effective - Enables automatic triggering of memory operations
4. Entity Linking and Relationship Extraction - Multi-stage pipelines: NER → Linking → Relation Extraction - spaCy provides production-ready NER with minimal setup - Transformers models (REBEL, Relik) for relation extraction
5. Personal Knowledge Management Trends - Zero-effort capture expectation (Mem.ai, MyMind) - AI-powered automatic tagging - Conversational interfaces over manual organization
Key Insight: Modern systems use preprocessing + LLM confirmation, not LLM-only reasoning.
Gap Analysis¶
Critical Gaps Identified:
- ❌ No Automatic Pattern Detection Layer: LLM decides when to save based on system prompt alone
- ❌ No Entity Extraction Pipeline:
entitiesfield exists but populated manually - ❌ No Tag Inference System:
tagsfield populated manually - ❌ No Importance Scoring:
strengthparameter set manually - ❌ No Intent Classification: No detection of preference vs. decision vs. fact
- ❌ No Phrase Trigger Detection: No pattern matching for "remember this", "important"
- ❌ LLM-Dependent Activation Logic: All decisions made by LLM reasoning
Root Cause Summary: cortexgraph has excellent foundations but lacks the preprocessing layer that makes activation reliable.
Solution Architecture¶
MCP Architectural Constraints (CRITICAL)¶
Important: The Model Context Protocol (MCP) does NOT allow message interception before the LLM sees user input. The architecture is:
NOT possible:
This means we cannot intercept and enrich messages before Claude sees them. We can only: 1. ✅ Auto-enrich tool parameters when tools are called 2. ✅ Provide helper tools (analyze_message) that Claude can call 3. ✅ Enhance system prompts to guide Claude's behavior 4. ❌ Intercept user messages before Claude receives them
For true pre-LLM preprocessing, you would need: - HTTP proxy (like claude-llm-proxy for Claude Code CLI) - works, but only for HTTP API - Modified Claude Desktop client (not practical) - Custom MCP host application (significant engineering effort)
Realistic MCP Architecture¶
User Message
↓
Claude LLM (receives message first)
↓
Claude decides to call MCP tool
↓
┌─────────────────────────────────────────────┐
│ MCP Tool Call (e.g., save_memory) │
│ │
│ [PREPROCESSING HAPPENS HERE] │
│ ┌────────────────────────────────────┐ │
│ │ 1. Phrase Detector │ │
│ │ Auto-detect importance markers │ │
│ └────────────────────────────────────┘ │
│ ┌────────────────────────────────────┐ │
│ │ 2. Entity Extractor (spaCy) │ │
│ │ Auto-populate entities field │ │
│ └────────────────────────────────────┘ │
│ ┌────────────────────────────────────┐ │
│ │ 3. Importance Scorer │ │
│ │ Auto-calculate strength │ │
│ └────────────────────────────────────┘ │
│ │
│ Parameters enriched, memory saved │
└─────────────────────────────────────────────┘
↓
Result returned to Claude
↓
Claude responds to user
ADDITIONAL TOOL:
┌─────────────────────────────────────────────┐
│ analyze_message(message) │
│ - Helper tool Claude can call │
│ - Returns preprocessing signals │
│ - Helps Claude decide whether to save │
└─────────────────────────────────────────────┘
Two-Track Approach¶
Track 1: Auto-Enrichment (in save_memory tool)
- LLM calls: save_memory(content="I prefer TypeScript")
- Tool automatically populates: entities=["typescript"], strength=1.0
- No extra tool calls needed
Track 2: Decision Helper (analyze_message tool)
- LLM uncertain? Call: analyze_message("I prefer TypeScript")
- Returns: {should_save: true, entities: ["typescript"], strength: 1.0}
- LLM uses signals to decide whether to call save_memory
Design Principles¶
- Work Within MCP Constraints: No impossible pre-LLM interception
- Deterministic + Flexible: Preprocessing provides reliable defaults, LLM can override
- Low Latency: Lightweight models (spaCy, regex) for real-time inference
- Graceful Degradation: System works even if preprocessing fails
- Progressive Enhancement: Each component adds value independently
- Configurable: Enable/disable features, tune thresholds
Implementation Plan¶
Phase 1: Quick Wins (1 week, 40-50% improvement)¶
Timeline: Week 1 Effort: 3-4 days development + 2-3 days testing Risk: Low (simple, deterministic components)
Component 1.1: Phrase Detector¶
Purpose: Detect explicit memory requests with 100% reliability
Implementation:
# src/cortexgraph/preprocessing/phrase_detector.py
import re
from typing import List, Dict
EXPLICIT_SAVE_PHRASES = [
r"\b(remember|don't forget|keep in mind|make a note)\b",
r"\b(never forget|write this down|document this)\b",
r"\b(save this|store this|record this)\b",
]
EXPLICIT_RECALL_PHRASES = [
r"\bwhat did (i|we) (say|tell you|discuss)\b",
r"\bdo you remember\b",
r"\brecall\b",
]
EXPLICIT_IMPORTANCE = [
r"\b(important|critical|crucial|essential)\b",
r"\b(very|really|extremely)\s+(important|critical)\b",
]
class PhraseDetector:
def __init__(self):
self.save_patterns = [re.compile(p, re.IGNORECASE) for p in EXPLICIT_SAVE_PHRASES]
self.recall_patterns = [re.compile(p, re.IGNORECASE) for p in EXPLICIT_RECALL_PHRASES]
self.importance_patterns = [re.compile(p, re.IGNORECASE) for p in EXPLICIT_IMPORTANCE]
def detect(self, text: str) -> Dict[str, any]:
return {
"save_request": any(p.search(text) for p in self.save_patterns),
"recall_request": any(p.search(text) for p in self.recall_patterns),
"importance_marker": any(p.search(text) for p in self.importance_patterns),
"matched_phrases": self._get_matches(text),
}
def _get_matches(self, text: str) -> List[str]:
matches = []
for p in self.save_patterns + self.recall_patterns + self.importance_patterns:
if match := p.search(text):
matches.append(match.group())
return matches
Integration Point: Run before LLM receives message, add signals to system context
Test Coverage: - 20+ trigger patterns - Case-insensitive matching - False positive rate target: <1% - False negative rate target: 0% (on explicit phrases)
Component 1.2: Entity Extractor¶
Purpose: Automatically populate entities field for better search and graph quality
Implementation:
# src/cortexgraph/preprocessing/entity_extractor.py
import spacy
from typing import List
class EntityExtractor:
def __init__(self, model: str = "en_core_web_sm"):
self.nlp = spacy.load(model)
def extract(self, text: str) -> List[str]:
doc = self.nlp(text)
entities = []
for ent in doc.ents:
# Filter to relevant entity types
if ent.label_ in ["PERSON", "ORG", "PRODUCT", "GPE", "DATE", "TIME"]:
entities.append(ent.text)
return list(set(entities)) # Deduplicate
Dependencies:
- spacy >= 3.7
- en_core_web_sm model (17MB download)
Test Coverage: - Sample messages with known entities - Entity type filtering validation - Deduplication verification
Component 1.3: Importance Scorer¶
Purpose: Provide consistent strength values based on linguistic cues
Implementation:
# src/cortexgraph/preprocessing/importance_scorer.py
import re
from typing import Dict
class ImportanceScorer:
# Keyword → strength boost mapping
IMPORTANCE_KEYWORDS = {
"never forget": 0.8,
"critical": 0.6,
"crucial": 0.6,
"essential": 0.5,
"important": 0.4,
"remember this": 0.5,
"decided": 0.3,
"going with": 0.3,
"prefer": 0.2,
"like": 0.1,
}
def score(self, text: str, intent: str = None) -> float:
base_strength = self._get_base_from_intent(intent)
boost = self._calculate_boost(text)
# Clamp to valid range [0.0, 2.0]
return min(2.0, max(0.0, base_strength + boost))
def _get_base_from_intent(self, intent: str) -> float:
base_map = {
"SAVE_DECISION": 1.3,
"SAVE_PREFERENCE": 1.1,
"SAVE_FACT": 1.0,
}
return base_map.get(intent, 1.0)
def _calculate_boost(self, text: str) -> float:
text_lower = text.lower()
max_boost = 0.0
for keyword, boost in self.IMPORTANCE_KEYWORDS.items():
if keyword in text_lower:
max_boost = max(max_boost, boost)
return max_boost
Test Coverage: - Keyword → strength mapping validation - Intent-based base strength verification - Clamping to valid range [0.0, 2.0]
Component 1.4: Integration with save_memory Tool¶
Purpose: Auto-enrich save_memory parameters using preprocessing
Implementation:
# src/cortexgraph/tools/save.py (MODIFIED)
from ..preprocessing import PhraseDetector, EntityExtractor, ImportanceScorer
# Lazy initialization
_preprocessing_components = None
def get_preprocessing():
global _preprocessing_components
if _preprocessing_components is None:
_preprocessing_components = {
"phrase": PhraseDetector(),
"entity": EntityExtractor(),
"importance": ImportanceScorer()
}
return _preprocessing_components
@mcp.tool()
async def save_memory(
content: str,
tags: list[str] | None = None,
entities: list[str] | None = None,
strength: float | None = None,
source: str | None = None,
context: str | None = None,
meta: dict | None = None,
) -> dict:
"""Save a memory with automatic preprocessing."""
prep = get_preprocessing()
# AUTO-POPULATE entities if not provided
if entities is None:
entities = prep["entity"].extract(content)
# AUTO-CALCULATE strength if not provided
if strength is None:
phrase_signals = prep["phrase"].detect(content)
strength = prep["importance"].score(
content,
importance_marker=phrase_signals["importance_marker"]
)
# Continue with existing save logic...
memory = Memory(
content=content,
entities=entities or [],
tags=tags or [],
strength=strength,
source=source,
context=context,
meta=meta or {},
)
db.save_memory(memory)
return {"success": True, "memory_id": memory.id}
Component 1.5: analyze_message Helper Tool¶
Purpose: Provide preprocessing signals to help Claude decide whether to save
Implementation:
# src/cortexgraph/tools/analyze.py (NEW FILE)
from ..context import mcp
from ..preprocessing import PhraseDetector, EntityExtractor, ImportanceScorer
phrase_detector = PhraseDetector()
entity_extractor = EntityExtractor()
importance_scorer = ImportanceScorer()
@mcp.tool()
async def analyze_message(message: str) -> dict:
"""
Analyze a message to determine if it contains memory-worthy content.
Returns activation signals and suggested parameters for save_memory.
Args:
message: The message to analyze
Returns:
{
"should_save": bool,
"confidence": float (0.0-1.0),
"suggested_entities": list[str],
"suggested_tags": list[str],
"suggested_strength": float,
"reasoning": str
}
"""
phrase_signals = phrase_detector.detect(message)
entities = entity_extractor.extract(message)
strength = importance_scorer.score(
message,
importance_marker=phrase_signals["importance_marker"]
)
# Determine if save is recommended
should_save = (
phrase_signals["save_request"] or
phrase_signals["importance_marker"] or
len(entities) >= 2
)
confidence = 0.9 if phrase_signals["save_request"] else 0.6
reasoning_parts = []
if phrase_signals["save_request"]:
reasoning_parts.append(f"Explicit save request: {phrase_signals['matched_phrases']}")
if phrase_signals["importance_marker"]:
reasoning_parts.append("Importance marker detected")
if len(entities) >= 2:
reasoning_parts.append(f"Multiple entities detected: {entities}")
return {
"should_save": should_save,
"confidence": confidence,
"suggested_entities": entities,
"suggested_tags": [], # Phase 3: Tag suggester
"suggested_strength": strength,
"reasoning": "; ".join(reasoning_parts) if reasoning_parts else "No strong signals detected"
}
Phase 1 Deliverables¶
- ✅
src/cortexgraph/preprocessing/__init__.py - ✅
src/cortexgraph/preprocessing/phrase_detector.py - ✅
src/cortexgraph/preprocessing/entity_extractor.py - ✅
src/cortexgraph/preprocessing/importance_scorer.py - ✅
src/cortexgraph/tools/analyze.py(NEW: analyze_message tool) - ✅ Modified
src/cortexgraph/tools/save.py(auto-enrichment) - ✅
tests/preprocessing/test_phrase_detector.py - ✅
tests/preprocessing/test_entity_extractor.py - ✅
tests/preprocessing/test_importance_scorer.py - ✅
tests/tools/test_analyze_message.py - ✅ Updated system prompt with usage guidelines
- ✅ Updated dependencies (spaCy)
Success Criteria: - ✅ 0% missed explicit save requests ("remember this") - ✅ Entities automatically populated in 80%+ of saves (when not manually provided) - ✅ Consistent importance scores (no more arbitrary values) - ✅ analyze_message tool provides actionable signals to Claude
Phase 2: Intent Classification (3 weeks, 70-80% improvement)¶
Timeline: Weeks 2-4 Effort: 1 week data collection, 1 week training, 1 week integration Risk: Medium (requires ML model training, accuracy target: 85%+)
Component 2.1: Intent Classifier¶
Purpose: Detect user intent to trigger appropriate memory operations
Intents:
- SAVE_PREFERENCE: "I prefer X", "I like Y", "I always use Z"
- SAVE_DECISION: "I decided to A", "Going with B", "I'll use C"
- SAVE_FACT: "My D is E", "The F is G", "H is located at I"
- RECALL_INFO: "What did I say about...", "Do you remember..."
- UPDATE_INFO: "Actually, change X to Y", "Correction: Z is W"
- QUESTION: General question (default, no memory action)
Model Architecture:
# src/cortexgraph/preprocessing/intent_classifier.py
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from typing import Dict
class IntentClassifier:
def __init__(self, model_path: str = "./models/intent_classifier"):
self.tokenizer = AutoTokenizer.from_pretrained(model_path)
self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
self.model.eval()
self.label_map = {
0: "SAVE_PREFERENCE",
1: "SAVE_DECISION",
2: "SAVE_FACT",
3: "RECALL_INFO",
4: "UPDATE_INFO",
5: "QUESTION",
}
def classify(self, text: str) -> Dict[str, any]:
inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
outputs = self.model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
confidence = probs[0][predicted_class].item()
return {
"intent": self.label_map[predicted_class],
"confidence": confidence,
"all_probs": {self.label_map[i]: probs[0][i].item() for i in range(len(self.label_map))},
}
Model Choice: DistilBERT (66M parameters, 6-layer distilled BERT) - Fast inference (~20-30ms on CPU) - Good accuracy with limited data - Small model size (~250MB)
Training Data Requirements: - 100-500 examples per intent class - Total: 600-3000 examples - Sources: - Synthetic generation via GPT-4/Claude - Manual curation from real conversations (anonymized) - Augmentation techniques (paraphrasing)
Training Process:
# scripts/train_intent_classifier.py
1. Load pre-trained DistilBERT
2. Add classification head (6 classes)
3. Fine-tune on intent dataset
4. Evaluate on held-out test set (target: 85%+ accuracy)
5. Save model checkpoint
Hyperparameters: - Learning rate: 2e-5 - Batch size: 16 - Epochs: 3-5 - Warmup steps: 100 - Weight decay: 0.01
Component 2.2: Integration with analyze_message¶
Purpose: Enhance analyze_message tool with intent classification
Implementation:
# src/cortexgraph/tools/analyze.py (ENHANCED)
from ..preprocessing import PhraseDetector, EntityExtractor, ImportanceScorer, IntentClassifier
phrase_detector = PhraseDetector()
entity_extractor = EntityExtractor()
importance_scorer = ImportanceScorer()
intent_classifier = IntentClassifier() # NEW
@mcp.tool()
async def analyze_message(message: str) -> dict:
"""
Analyze a message with intent classification.
NOW INCLUDES:
- Intent classification (SAVE_PREFERENCE, SAVE_DECISION, etc.)
- Confidence scores for each intent
- Action recommendations (MUST_SAVE, SHOULD_SAVE, SHOULD_SEARCH)
"""
phrase_signals = phrase_detector.detect(message)
intent_result = intent_classifier.classify(message) # NEW
entities = entity_extractor.extract(message)
strength = importance_scorer.score(
message,
intent=intent_result["intent"] # Intent-aware scoring
)
# Generate action recommendation
action_recommendation = "NONE"
if phrase_signals["save_request"]:
action_recommendation = "MUST_SAVE"
elif intent_result["intent"] in ["SAVE_PREFERENCE", "SAVE_DECISION", "SAVE_FACT"] and intent_result["confidence"] > 0.8:
action_recommendation = "SHOULD_SAVE"
elif intent_result["intent"] == "RECALL_INFO" and intent_result["confidence"] > 0.7:
action_recommendation = "SHOULD_SEARCH"
should_save = action_recommendation in ["MUST_SAVE", "SHOULD_SAVE"]
return {
"should_save": should_save,
"action_recommendation": action_recommendation,
"confidence": intent_result["confidence"],
"intent": intent_result["intent"],
"suggested_entities": entities,
"suggested_tags": [], # Phase 3
"suggested_strength": strength,
"reasoning": f"Intent: {intent_result['intent']} (confidence: {intent_result['confidence']:.2f})"
}
System Prompt Enhancement:
# docs/prompts/memory_system_prompt.md (updated)
## Using analyze_message for Decision Support
When the user shares information and you're uncertain whether to save it,
call `analyze_message()` to get preprocessing signals:
**Action Recommendations**:
- `MUST_SAVE`: Explicit save request ("remember this") → Always call save_memory
- `SHOULD_SAVE`: High-confidence save-worthy content → Usually call save_memory
- `SHOULD_SEARCH`: User asking about past info → Call search_memory
- `NONE`: No strong signal → Use your judgment
**Intent Types**:
- `SAVE_PREFERENCE`: User preference ("I prefer X")
- `SAVE_DECISION`: Decision made ("We decided to...")
- `SAVE_FACT`: Important fact ("The API key is...")
- `RECALL_INFO`: Asking about past ("What did I say about...")
- `GENERAL_QUESTION`: General query
- `GREETING`: Social interaction
**Example Workflow**:
You: analyze_message("I prefer TypeScript over JavaScript for new projects")
Result: { "action_recommendation": "SHOULD_SAVE", "intent": "SAVE_PREFERENCE", "confidence": 0.87, "suggested_entities": ["typescript", "javascript"], "suggested_strength": 1.2 }
You: save_memory( content="I prefer TypeScript over JavaScript for new projects", entities=["typescript", "javascript"], # From analyze_message strength=1.2, # From analyze_message tags=["preference", "programming"] )
**Auto-Enrichment Fallback**:
If you don't call analyze_message first, save_memory will still auto-populate
entities and strength, but without intent-aware optimization.
Configuration:
# src/cortexgraph/config.py (new section)
# Conversational Activation
CORTEXGRAPH_ENABLE_PREPROCESSING = os.getenv("CORTEXGRAPH_ENABLE_PREPROCESSING", "true").lower() == "true"
CORTEXGRAPH_INTENT_MODEL_PATH = os.getenv("CORTEXGRAPH_INTENT_MODEL_PATH", "./models/intent_classifier")
CORTEXGRAPH_INTENT_CONFIDENCE_THRESHOLD = float(os.getenv("CORTEXGRAPH_INTENT_CONFIDENCE_THRESHOLD", "0.7"))
CORTEXGRAPH_AUTO_SAVE_CONFIDENCE_THRESHOLD = float(os.getenv("CORTEXGRAPH_AUTO_SAVE_CONFIDENCE_THRESHOLD", "0.8"))
CORTEXGRAPH_SPACY_MODEL = os.getenv("CORTEXGRAPH_SPACY_MODEL", "en_core_web_sm")
Phase 2 Deliverables¶
- ✅ Intent classification training dataset (600-3000 examples)
- ✅ Training script (
scripts/train_intent_classifier.py) - ✅ Trained DistilBERT model checkpoint
- ✅
src/cortexgraph/preprocessing/intent_classifier.py - ✅ Enhanced
src/cortexgraph/tools/analyze.pywith intent classification - ✅ Updated system prompt with action recommendations and intent types
- ✅ Configuration options in
config.py - ✅
tests/preprocessing/test_intent_classifier.py - ✅
tests/tools/test_analyze_message_with_intent.py - ✅ Performance evaluation report (accuracy, precision, recall per class)
Success Criteria: - ✅ 85%+ intent classification accuracy on test set - ✅ Implicit preferences detected (e.g., "I prefer X" → SAVE_PREFERENCE intent) - ✅ analyze_message provides SHOULD_SAVE recommendation for 90%+ of save-worthy content - ✅ 60-70% improvement in overall activation reliability (still LLM-dependent for "when to call")
Note on Reliability Ceiling: Within MCP constraints, we cannot achieve 85-90% reliability for automatic saves because: - Claude must still decide when to call analyze_message or save_memory - We cannot intercept messages before Claude sees them - System prompt guidance can only achieve ~70-80% consistency
For higher reliability, consider: - HTTP proxy approach (like claude-llm-proxy for Claude Code CLI) - MCP-to-MCP proxy server (future enhancement) - Custom MCP host application
Phase 3: Advanced Features (4 weeks, 85-90% improvement)¶
Timeline: Weeks 5-8 Effort: 1 week per component Risk: Medium-High (complex features, integration challenges)
Component 3.1: Tag Suggester¶
Purpose: Automatically suggest tags to improve search and cross-domain detection
Approaches:
1. Keyword Extraction (KeyBERT):
# src/cortexgraph/preprocessing/tag_suggester.py
from keybert import KeyBERT
class TagSuggester:
def __init__(self):
self.model = KeyBERT()
def suggest_tags(self, text: str, top_k: int = 5) -> List[str]:
keywords = self.model.extract_keywords(
text,
keyphrase_ngram_range=(1, 2),
stop_words="english",
top_n=top_k,
)
return [kw[0] for kw in keywords]
2. Zero-Shot Classification (for predefined categories):
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
def classify_into_categories(text: str, categories: List[str]) -> List[str]:
result = classifier(text, categories, multi_label=True)
# Return categories with confidence > 0.5
return [label for label, score in zip(result["labels"], result["scores"]) if score > 0.5]
3. Hybrid Approach: - Extract keywords via KeyBERT (content-specific) - Classify into categories via zero-shot (broad themes) - Combine and rank by relevance
Integration:
- Pre-fill tags parameter for save_memory
- LLM reviews and adjusts as needed
- User feedback loop: Track accepted vs. rejected suggestions
Component 3.2: Multi-Message Context¶
Purpose: Improve extraction of implicit preferences from conversation history
Implementation:
# src/cortexgraph/preprocessing/context_manager.py
from collections import deque
from typing import List, Dict
class ConversationContext:
def __init__(self, max_messages: int = 10):
self.buffer = deque(maxlen=max_messages)
def add_message(self, role: str, content: str):
self.buffer.append({"role": role, "content": content})
def get_context(self, window_size: int = 5) -> List[Dict]:
return list(self.buffer)[-window_size:]
def generate_summary(self) -> str:
# TODO: Use LLM to generate rolling summary of conversation
# Useful for detecting patterns across multiple turns
pass
Use Cases: - User states preference across multiple messages - Decision emerges from discussion (not single statement) - Fact mentioned indirectly, then clarified later
Integration Point: Pass context to intent classifier and tag suggester
Component 3.3: Automatic Deduplication¶
Purpose: Prevent redundant saves by detecting similar existing memories
Implementation:
# src/cortexgraph/preprocessing/dedup_checker.py
from .storage import JSONLStorage
from sentence_transformers import SentenceTransformer, util
class DeduplicationChecker:
def __init__(self, storage: JSONLStorage, similarity_threshold: float = 0.85):
self.storage = storage
self.threshold = similarity_threshold
self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
def check_before_save(self, content: str, entities: List[str]) -> Dict:
# Search for similar memories
candidates = self.storage.search(content, top_k=5)
if not candidates:
return {"is_duplicate": False}
# Calculate semantic similarity
new_embedding = self.embedder.encode(content, convert_to_tensor=True)
similarities = []
for candidate in candidates:
candidate_embedding = self.embedder.encode(candidate["content"], convert_to_tensor=True)
similarity = util.cos_sim(new_embedding, candidate_embedding).item()
similarities.append((candidate, similarity))
# Find best match
best_match, best_score = max(similarities, key=lambda x: x[1])
if best_score > self.threshold:
return {
"is_duplicate": True,
"similar_memory": best_match,
"similarity_score": best_score,
"recommendation": "MERGE" if best_score > 0.9 else "REVIEW",
}
return {"is_duplicate": False}
Integration:
- Run before calling save_memory
- If duplicate detected, prompt LLM: "Similar memory exists (score: 0.92). Options: 1) Merge, 2) Save as new, 3) Skip"
- LLM decides based on context
Relation to Existing Tools:
- Complements existing consolidate_memories tool (proactive vs. reactive)
- Uses same similarity logic as cluster_memories
Phase 3 Deliverables¶
- ✅
src/cortexgraph/preprocessing/tag_suggester.py - ✅
src/cortexgraph/preprocessing/context_manager.py - ✅
src/cortexgraph/preprocessing/dedup_checker.py - ✅ Integration tests for multi-message scenarios
- ✅ User acceptance testing (A/B test: old vs. new)
- ✅ Performance benchmarks (latency, accuracy)
- ✅ Documentation updates
Success Criteria: - ✅ Tags automatically suggested and accepted 70%+ of time - ✅ Multi-message context improves implicit preference detection by 20%+ - ✅ Near-duplicate detection prevents redundant saves (false positive rate <5%) - ✅ 85-90% overall improvement in activation reliability
Testing Strategy¶
Unit Tests¶
Phase 1 Components:
# tests/preprocessing/test_phrase_detector.py
def test_explicit_save_phrases():
detector = PhraseDetector()
test_cases = [
("Remember this for later", True),
("Don't forget to use TypeScript", True),
("This is important", True),
("Just a regular message", False),
]
for text, expected in test_cases:
result = detector.detect(text)
assert result["save_request"] == expected
def test_case_insensitivity():
detector = PhraseDetector()
assert detector.detect("REMEMBER THIS")["save_request"]
assert detector.detect("remember this")["save_request"]
assert detector.detect("ReMeMbEr ThIs")["save_request"]
Phase 2 Components:
# tests/preprocessing/test_intent_classifier.py
def test_intent_classification_accuracy():
classifier = IntentClassifier()
test_set = load_test_set() # Held-out 20% of training data
correct = 0
total = len(test_set)
for example in test_set:
result = classifier.classify(example["text"])
if result["intent"] == example["label"]:
correct += 1
accuracy = correct / total
assert accuracy > 0.85 # 85% accuracy target
Integration Tests¶
# tests/integration/test_preprocessing_pipeline.py
async def test_end_to_end_activation():
"""Test complete flow: message → preprocessing → LLM → save"""
# Setup
mcp_server = setup_test_server()
test_message = "I prefer TypeScript for backend projects"
# Execute
signals = await mcp_server.preprocess_message(test_message)
# Verify preprocessing
assert signals["intent"] == "SAVE_PREFERENCE"
assert signals["intent_confidence"] > 0.7
assert "TypeScript" in signals["entities"]
assert signals["suggested_strength"] > 1.0
assert signals["action_recommendation"] == "SHOULD_SAVE"
# Simulate LLM calling save_memory with pre-filled params
memory_id = await mcp_server.save_memory(
content="User prefers TypeScript for backend projects",
entities=signals["entities"],
tags=["preferences", "typescript", "backend"],
strength=signals["suggested_strength"],
)
# Verify save
memory = await mcp_server.storage.get_memory(memory_id)
assert memory is not None
assert "TypeScript" in memory.entities
User Acceptance Testing (UAT)¶
A/B Test Design: - Control Group: Current cortexgraph (LLM-only activation) - Treatment Group: New cortexgraph (preprocessing + LLM) - Sample Size: 20-30 users, 2 weeks of usage - Metrics: - Save rate (% of messages resulting in saves) - User satisfaction (survey: "Did system miss anything important?") - False positive rate (unnecessary saves) - False negative rate (missed important information)
Success Criteria: - Treatment group: 85-90% save rate on save-worthy content - Control group: ~40% save rate (baseline) - User satisfaction: 8/10 or higher - False positive rate: <10% - False negative rate: <5% (excluding ambiguous cases)
Integration Points¶
1. MCP Server Entry Point¶
File: src/cortexgraph/server.py
Changes:
from .preprocessing import (
PhraseDetector,
EntityExtractor,
ImportanceScorer,
IntentClassifier,
TagSuggester,
ConversationContext,
DeduplicationChecker,
)
# Initialize preprocessing components (lazy loading for performance)
_preprocessing_components = None
def get_preprocessing_components():
"""Get or initialize preprocessing components."""
global _preprocessing_components
if _preprocessing_components is None:
_preprocessing_components = {
"phrase_detector": PhraseDetector(),
"entity_extractor": EntityExtractor(),
"importance_scorer": ImportanceScorer(),
"intent_classifier": IntentClassifier() if config.CORTEXGRAPH_ENABLE_PREPROCESSING else None,
"tag_suggester": TagSuggester() if config.CORTEXGRAPH_ENABLE_PREPROCESSING else None,
"context_manager": ConversationContext(),
"dedup_checker": DeduplicationChecker(db),
}
return _preprocessing_components
# REALISTIC MCP INTEGRATION: Enhanced analyze_message tool
@mcp.tool()
async def analyze_message(
message: str,
include_dedup_check: bool = True
) -> dict:
"""
Comprehensive message analysis with all preprocessing components.
This is the REALISTIC implementation within MCP constraints.
Claude calls this tool when uncertain whether to save.
Returns:
Complete preprocessing signals including:
- Action recommendation (MUST_SAVE, SHOULD_SAVE, etc.)
- Intent classification
- Entity extraction
- Tag suggestions
- Importance scoring
- Duplicate detection
"""
if not config.CORTEXGRAPH_ENABLE_PREPROCESSING:
return {"error": "Preprocessing disabled"}
components = get_preprocessing_components()
# Add to conversation context for multi-message analysis
components["context_manager"].add_message("user", message)
# Run full preprocessing pipeline
phrase_signals = components["phrase_detector"].detect(message)
intent_result = components["intent_classifier"].classify(message) if components["intent_classifier"] else {"intent": "UNKNOWN", "confidence": 0.0}
entities = components["entity_extractor"].extract(message)
importance = components["importance_scorer"].score(message, intent_result.get("intent"))
tags = components["tag_suggester"].suggest_tags(message) if components["tag_suggester"] else []
# Check for duplicates if save is recommended
dedup_result = {}
if include_dedup_check and intent_result.get("intent", "").startswith("SAVE_"):
dedup_result = components["dedup_checker"].check_before_save(message, entities)
# Generate action recommendation
action_recommendation = "NONE"
if phrase_signals["save_request"]:
action_recommendation = "MUST_SAVE"
elif intent_result.get("intent") in ["SAVE_PREFERENCE", "SAVE_DECISION", "SAVE_FACT"] and intent_result.get("confidence", 0) > 0.8:
if dedup_result.get("is_duplicate"):
action_recommendation = "DUPLICATE_DETECTED"
else:
action_recommendation = "SHOULD_SAVE"
elif intent_result.get("intent") == "RECALL_INFO" and intent_result.get("confidence", 0) > 0.7:
action_recommendation = "SHOULD_SEARCH"
should_save = action_recommendation in ["MUST_SAVE", "SHOULD_SAVE"]
return {
"should_save": should_save,
"action_recommendation": action_recommendation,
"confidence": intent_result.get("confidence", 0.0),
"intent": intent_result.get("intent", "UNKNOWN"),
"suggested_entities": entities,
"suggested_tags": tags,
"suggested_strength": importance,
"deduplication": dedup_result,
"reasoning": _construct_reasoning(phrase_signals, intent_result, entities, dedup_result)
}
def _construct_reasoning(phrase_signals, intent_result, entities, dedup_result):
"""Build human-readable reasoning string."""
parts = []
if phrase_signals.get("save_request"):
parts.append(f"Explicit save: {phrase_signals.get('matched_phrases')}")
if intent_result.get("intent"):
parts.append(f"Intent: {intent_result['intent']} ({intent_result.get('confidence', 0):.2f})")
if entities:
parts.append(f"Entities: {', '.join(entities)}")
if dedup_result.get("is_duplicate"):
parts.append(f"Duplicate of: {dedup_result.get('similar_memory_id')}")
return "; ".join(parts) if parts else "No strong signals detected"
# AUTO-ENRICHMENT: save_memory with preprocessing
@mcp.tool()
async def save_memory(
content: str,
tags: list[str] | None = None,
entities: list[str] | None = None,
strength: float | None = None,
# ... other params
) -> dict:
"""Save memory with automatic preprocessing."""
components = get_preprocessing_components()
# Auto-populate if not provided
if entities is None:
entities = components["entity_extractor"].extract(content)
if tags is None and components["tag_suggester"]:
tags = components["tag_suggester"].suggest_tags(content)
if strength is None:
phrase_signals = components["phrase_detector"].detect(content)
strength = components["importance_scorer"].score(
content,
importance_marker=phrase_signals.get("importance_marker", False)
)
# Save with enriched data
memory = Memory(
content=content,
entities=entities or [],
tags=tags or [],
strength=strength,
# ...
)
db.save_memory(memory)
return {"success": True, "memory_id": memory.id}
2. System Prompt Enhancement¶
File: docs/prompts/memory_system_prompt.md
New Section (to be appended):
---
## Activation Signals (Preprocessing)
You receive preprocessing signals with each user message to assist memory decisions.
### Signal Types
**1. Action Recommendations**
- `MUST_SAVE`: Explicit user request ("remember this") - mandatory save
- `SHOULD_SAVE`: High-confidence save-worthy content - strongly recommended
- `SHOULD_SEARCH`: User asking for past info - search recommended
- `NONE`: No strong signal, use your judgment
**2. Pre-filled Parameters**
When save is recommended, you receive:
- `entities`: Auto-extracted entities (PERSON, ORG, PRODUCT, etc.)
- `suggested_strength`: Importance score (0.0-2.0)
- `suggested_tags`: Relevant tags from content
- `intent`: Content type (PREFERENCE, DECISION, FACT, etc.)
**3. Deduplication Alerts**
If similar memory exists:
- `similar_memory`: Existing memory content
- `similarity_score`: How similar (0.0-1.0)
- `recommendation`: MERGE or REVIEW
### How to Use Signals
**When action is MUST_SAVE**:
1. Review pre-filled parameters
2. Adjust if needed (add context, refine tags)
3. Call `save_memory` with parameters
**When action is SHOULD_SAVE**:
1. Confirm content is save-worthy given full context
2. Adjust parameters as needed
3. Call `save_memory` if confirmed
**When action is SHOULD_SEARCH**:
1. Call `search_memory` with relevant query
2. Surface information to user
**When deduplication alert**:
1. Review similar memory
2. Decide: MERGE (update existing), NEW (save anyway), SKIP (don't save)
3. Explain decision to user
### Important Notes
- Preprocessing is **assistance**, not mandate
- You have final say on all memory operations
- Use your judgment for edge cases
- If uncertain, err toward saving (decay handles false positives)
- Signals improve reliability but don't replace reasoning
3. Configuration File¶
File: src/cortexgraph/config.py
New Section:
# ============================================================================
# Conversational Activation Configuration
# ============================================================================
# Enable/disable preprocessing layer
CORTEXGRAPH_ENABLE_PREPROCESSING = os.getenv("CORTEXGRAPH_ENABLE_PREPROCESSING", "true").lower() == "true"
# Intent Classification
CORTEXGRAPH_INTENT_MODEL_PATH = os.getenv("CORTEXGRAPH_INTENT_MODEL_PATH", "./models/intent_classifier")
CORTEXGRAPH_INTENT_CONFIDENCE_THRESHOLD = float(os.getenv("CORTEXGRAPH_INTENT_CONFIDENCE_THRESHOLD", "0.7"))
CORTEXGRAPH_AUTO_SAVE_CONFIDENCE_THRESHOLD = float(os.getenv("CORTEXGRAPH_AUTO_SAVE_CONFIDENCE_THRESHOLD", "0.8"))
# Entity Extraction
CORTEXGRAPH_SPACY_MODEL = os.getenv("CORTEXGRAPH_SPACY_MODEL", "en_core_web_sm")
# Tag Suggestion
CORTEXGRAPH_ENABLE_TAG_SUGGESTION = os.getenv("CORTEXGRAPH_ENABLE_TAG_SUGGESTION", "true").lower() == "true"
CORTEXGRAPH_TAG_SUGGESTION_TOP_K = int(os.getenv("CORTEXGRAPH_TAG_SUGGESTION_TOP_K", "5"))
# Conversation Context
CORTEXGRAPH_CONTEXT_WINDOW_SIZE = int(os.getenv("CORTEXGRAPH_CONTEXT_WINDOW_SIZE", "10"))
# Deduplication
CORTEXGRAPH_ENABLE_DEDUP_CHECK = os.getenv("CORTEXGRAPH_ENABLE_DEDUP_CHECK", "true").lower() == "true"
CORTEXGRAPH_DEDUP_SIMILARITY_THRESHOLD = float(os.getenv("CORTEXGRAPH_DEDUP_SIMILARITY_THRESHOLD", "0.85"))
Dependencies¶
Python Packages¶
Phase 1:
Installation:
Phase 2:
Phase 3:
Model Storage¶
Models to download/train:
- en_core_web_sm: 17MB (spaCy English model)
- Intent classifier: ~250MB (fine-tuned DistilBERT)
- Tag suggester: ~120MB (KeyBERT with sentence-transformers backend)
- Deduplication embedder: ~80MB (sentence-transformers/all-MiniLM-L6-v2)
Total storage: ~470MB
Inference Requirements: - CPU: Sufficient (all models optimized for CPU inference) - RAM: +300-500MB when all models loaded - Latency: <100ms total preprocessing time
Performance Considerations¶
Latency Analysis¶
Target: <100ms total preprocessing time (avoid blocking conversation flow)
Breakdown: - Phrase detection: ~1ms (regex) - Entity extraction: ~20-30ms (spaCy) - Intent classification: ~20-30ms (DistilBERT on CPU) - Importance scoring: ~1ms (heuristics) - Tag suggestion: ~30-40ms (KeyBERT, Phase 3) - Deduplication check: ~20-30ms (embedding + similarity, Phase 3)
Optimization Strategies: 1. Lazy Loading: Load models only when first needed 2. Caching: Cache recent entity/intent results for similar messages 3. Async Processing: Run non-blocking preprocessing in background 4. Batching: If processing multiple messages, batch through models 5. Model Quantization: Use INT8 quantized models for faster inference
Memory Management¶
Model Loading: - Load on first use, not at startup - Share models across requests (singleton pattern) - Option to run preprocessing in separate process/container
Configuration Option:
Risks & Mitigations¶
Risk 1: Intent Classifier Accuracy Below 85%¶
Impact: Medium - Lower accuracy reduces reliability gains
Mitigation: - Start with rule-based fallback for low-confidence predictions - Collect user feedback: "Was this save appropriate?" - Active learning: Retrain with corrected examples - Fallback to phrase detection + LLM judgment if confidence < threshold
Risk 2: False Positives (Too Many Auto-Saves)¶
Impact: Medium - Clutters memory store, annoys users
Mitigation: - Conservative confidence thresholds (0.8 for auto-save) - LLM still has final say (can reject preprocessing suggestion) - User feedback loop: "Was this save unnecessary?" - Decay algorithm naturally handles false positives (unused memories fade)
Risk 3: Model Inference Latency¶
Impact: Low - Could slow conversation if >200ms
Mitigation: - Use lightweight models (DistilBERT, not full BERT) - Async processing (don't block LLM response) - Cache recent results - Quantization for faster inference - Option to disable preprocessing if latency critical
Risk 4: Preprocessing Overhead Complexity¶
Impact: Low - Adds code complexity and maintenance burden
Mitigation: - Clear separation of concerns (preprocessing layer is modular) - Each component independently testable - Configuration to disable features if not needed - Graceful degradation (system works even if preprocessing fails)
Risk 5: Training Data Quality¶
Impact: Medium - Poor training data → poor intent classifier
Mitigation: - Use GPT-4/Claude for synthetic data generation (high quality) - Manual review of training examples - Balance classes (equal examples per intent) - Augmentation techniques (paraphrasing, backtranslation) - Held-out test set for validation
Success Metrics¶
Quantitative Metrics¶
Activation Reliability (Primary Metric): - Baseline: ~40% (current, LLM-only) - Phase 1 Target: 60-70% - Phase 2 Target: 75-85% - Phase 3 Target: 85-90%
Measurement: % of save-worthy content that results in actual saves (human-annotated test set)
Intent Classification Accuracy: - Target: 85%+ on held-out test set - Per-Class Precision/Recall: >80% for each intent
False Positive Rate: - Target: <10% (saves that shouldn't have happened) - Measurement: User feedback + manual review
False Negative Rate: - Target: <5% (missed important information) - Measurement: User reports "you forgot X"
Latency: - Target: <100ms preprocessing time - Measurement: Average time from message receipt to preprocessing complete
Qualitative Metrics¶
User Satisfaction: - Survey: "Does the system remember important information?" (8/10 target) - Survey: "How often does the system miss something important?" (Rarely/Never target) - Survey: "Are saves appropriate and relevant?" (7/10 target)
Developer Experience: - Code maintainability (modular, well-tested) - Ease of adding new intents or patterns - Configuration flexibility
Future Enhancements¶
Short-Term (Next 6 Months)¶
1. Custom Entity Types - Fine-tune spaCy for domain-specific entities - Technology stack entities (Python → TECHNOLOGY) - Preference entities (TypeScript → PREFERENCE:LANGUAGE)
2. Reinforcement Learning from User Corrections - Track when users override preprocessing suggestions - Retrain models with correction data - Personalized models per user
3. Multi-Language Support - Add spaCy models for other languages - Multi-lingual intent classification - Language detection + routing
Medium-Term (6-12 Months)¶
4. Active Learning Pipeline - Identify low-confidence predictions - Request user labels for uncertain cases - Continuously improve models with feedback
5. Personalized Intent Models - Per-user fine-tuning based on usage patterns - Adaptive confidence thresholds - Preference learning (user prefers high/low activation rate)
6. Cross-Turn Conversation Understanding - Dialog state tracking - Coreference resolution ("it", "that", etc.) - Multi-turn decision detection
Long-Term (12+ Months)¶
7. Automatic Relation Inference
- Detect relationships between entities
- Populate create_relation automatically
- Build richer knowledge graph structure
8. Temporal Reasoning - Understand time references ("last week", "in the future") - Auto-populate temporal metadata - Query by time periods
9. Explainability Dashboard - Show why system saved/didn't save - Visualize confidence scores and signals - Allow users to adjust preprocessing behavior
Timeline Summary¶
| Phase | Duration | Components | Expected Impact |
|---|---|---|---|
| Phase 1 | 1 week | Phrase Detector, Entity Extractor, Importance Scorer, analyze_message tool, save_memory auto-enrichment | 40-50% improvement in consistency |
| Phase 2 | 3 weeks | Intent Classifier, Enhanced analyze_message, System Prompt Updates | 60-70% improvement (MCP ceiling) |
| Phase 3 | 4 weeks | Tag Suggester, Multi-Message Context, Deduplication | 70-80% improvement (realistic max) |
| Testing & Deployment | 1 week | UAT, Performance Tuning, Documentation | Production-ready |
| Total | 9 weeks | All components integrated and tested | 70-80% activation reliability |
Note: 70-80% is the realistic ceiling within MCP constraints. For 85-90%+ reliability, would require HTTP proxy (claude-llm-proxy pattern) or custom MCP host.
Conclusion¶
This architectural plan transforms cortexgraph from sporadic, LLM-dependent activation to reliable, preprocessing-assisted activation. By adding a preprocessing layer that detects patterns, extracts entities, classifies intent, and scores importance, we reduce LLM cognitive load while preserving flexibility.
Key Principles: 1. Work Within MCP Constraints: Realistic architecture, no impossible pre-LLM interception 2. Two-Track Approach: Auto-enrichment (save_memory) + Decision Helper (analyze_message) 3. Progressive Enhancement: Each component adds independent value 4. Research-Backed: Built on 2024-2025 state-of-the-art approaches 5. Production-Ready: Optimized for latency, maintainability, configurability
Expected Outcome: - Within MCP: 70-80% activation reliability (realistic ceiling) - Parameter Quality: 100% consistent entities, tags, strength scores (auto-populated) - User Experience: Dramatically improved trust in cortexgraph memory system
For Higher Reliability (85-90%+): If 70-80% isn't sufficient, consider: - HTTP Proxy Approach: Adapt claude-llm-proxy for Claude Code CLI (pre-LLM preprocessing possible) - MCP-to-MCP Proxy: Build custom proxy MCP server that forwards to cortexgraph - Dual Integration: Use HTTP proxy for Claude Code, direct MCP for Claude Desktop
The MCP architecture is fundamentally LLM-first, which limits automatic activation. This plan maximizes what's possible within that constraint.
References¶
Academic Papers¶
- ArXiv 2504.19413v1: "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory"
- Wiley Expert Systems (2025): "Intent detection for task-oriented conversational agents"
- MDPI Applied Sciences (2025): "Knowledge Graph Construction: Extraction, Learning, and Evaluation"
- Frontiers in Computer Science (2025): "Knowledge Graph Construction with LLMs"
Industry Tools¶
- Mem0: github.com/mem0ai/mem0
- spaCy: spacy.io
- Transformers (Hugging Face): huggingface.co/transformers
- KeyBERT: github.com/MaartenGr/KeyBERT
- Sentence-Transformers: github.com/UKPLab/sentence-transformers
cortexgraph Documentation¶
- Architecture:
docs/architecture.md - API Reference:
docs/api.md - Smart Prompting (current):
docs/prompts/memory_system_prompt.md - Scoring Algorithm:
docs/scoring_algorithm.md
Related Projects¶
- claude-llm-proxy: HTTP proxy for Claude Code CLI with context injection
- Location:
../claude-llm-proxy/ - Pattern: Intercept HTTP API requests → inject preprocessing → forward to Claude
- Key Insight: This pattern works for HTTP API but NOT for MCP (stdio-based)
- Use case: If you need pre-LLM preprocessing for Claude Code CLI (non-MCP)
Document Version: 2.0 (Updated for MCP Architecture Reality) Last Updated: 2025-11-14 Author: Claude (Sonnet 4.5) with STOPPER Protocol Approved By: Scot Campbell (v1.0), Pending approval for v2.0 Next Review: After Phase 1 completion
Major Changes in v2.0:
- ❌ Removed impossible @mcp.before_completion() hook (doesn't exist in FastMCP)
- ✅ Added MCP Architectural Constraints section explaining why pre-LLM interception is impossible
- ✅ Updated Solution Architecture to two-track approach (auto-enrichment + analyze_message)
- ✅ Adjusted reliability targets: 70-80% realistic ceiling (was 85-90% aspirational)
- ✅ Updated all Phase 2 integration code to use realistic MCP tools
- ✅ Added claude-llm-proxy reference for HTTP proxy alternative
- ✅ Clarified that 85-90%+ requires HTTP proxy or custom MCP host