
Prompt Optimization Flow

Overview

This document describes a prompt optimization architecture that intercepts, analyzes, enhances, and validates user prompts before they reach Claude. The system uses a multi-stage pipeline involving local LLMs, MCP tool chains, knowledge graph integration, and cloud-based optimization to improve prompt quality while limiting API costs.

Key Benefits

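  • Zero Initial API Cost: All optimization happens before any paid Claude API endpoint is called (cloud-hosted optimizer or validator models, if used in place of local LLMs, are billed separately)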
  • Intelligent Complexity Routing: Simple prompts bypass optimization for speed; complex prompts get full treatment
  • Knowledge Graph Integration: Automatically enriches prompts with relevant context from CortexGraph
  • Multi-Model Validation: Cross-validates optimizations using multiple LLMs to ensure quality
  • Flexible Architecture: Local LLMs can be swapped with cloud providers as needed
  • Metadata Enrichment: Adds confidence scores, similarity metrics, and processing metadata to prompts

Architecture Components

1. Proxy Server

  • Central orchestration layer
  • Handles routing decisions based on complexity
  • Manages communication between all components
  • Tracks confidence/similarity thresholds

2. Local LLMs

  • Primary: Prompt optimization and tagging
  • Validation: Multiple instances for cross-validation
  • Can be replaced with cloud providers (OpenAI, Anthropic, etc.)

3. MCP Tool Chain

  • CortexGraph: Knowledge graph for context retrieval
  • STOPPER: Process control and validation
  • Custom Tools: User-defined extensions
  • Gemini Optimizer: Large context window for final assembly

4. Validation Layer

  • Semantic similarity checks
  • Confidence scoring
  • Iterative refinement below thresholds

Detailed Flow Description

Phase 1: Initial Intake

  1. User Input: User enters prompt in Claude Code interface
  2. Proxy Intercept: Proxy captures the prompt before it reaches Claude
  3. Complexity Analysis: NLP-based complexity rating determines routing strategy
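The document does not say how the complexity rating is computed. As a minimal sketch, assuming a simple heuristic scorer stands in for the NLP classifier, a rating in [0, 1] could be derived from prompt length, clause count, and task keywords. The function name `score_complexity`, the keyword list, and the weights are all illustrative assumptions, not the system's actual analyzer.

```python
import re

# Hypothetical keyword list; a real analyzer would more likely use a trained classifier.
TASK_KEYWORDS = {"refactor", "architecture", "design", "migrate",
                 "optimize", "debug", "integrate"}


def score_complexity(prompt: str) -> float:
    """Return a heuristic complexity rating in [0.0, 1.0]."""
    words = re.findall(r"[A-Za-z0-9_']+", prompt.lower())
    if not words:
        return 0.0
    # Longer prompts tend to carry more requirements; saturate at 100 words.
    length_score = min(len(words) / 100.0, 1.0)
    # Sentence and clause separators are a rough proxy for multi-part requests.
    clause_score = min(len(re.findall(r"[.;:?!\n]", prompt)) / 5.0, 1.0)
    # Count distinct task keywords, saturating at three hits.
    keyword_score = min(len(TASK_KEYWORDS.intersection(words)) / 3.0, 1.0)
    # Illustrative weights; they sum to 1 so the result stays in [0, 1].
    return round(0.4 * length_score + 0.3 * clause_score + 0.3 * keyword_score, 3)
```

A short request such as "Fix typo" scores close to 0, while a multi-clause request that mentions several engineering tasks scores much higher, which is the property the routing step relies on.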

Phase 2: Intelligent Routing

  1. Simple Path (Low Complexity):

    • Proxy applies basic formatting rules
    • Routes directly to Claude with minimal processing
    • Optimizes for speed and reduces overhead
  2. Complex Path (High Complexity):

    • Triggers full optimization pipeline
    • Proceeds to Phase 3
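The routing decision can be sketched as a single comparison against the complexity threshold. The threshold value and the "strictly greater than" rule come from the configuration section of this document; the function name `choose_path` and the string return values are assumptions made for illustration.

```python
COMPLEX_PROMPT_THRESHOLD = 0.4  # Value from the Complexity Thresholds configuration


def choose_path(complexity: float,
                threshold: float = COMPLEX_PROMPT_THRESHOLD) -> str:
    """Return "complex" when the rating exceeds the threshold, else "simple"."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    # Strictly greater than: a rating equal to the threshold stays on the simple path.
    return "complex" if complexity > threshold else "simple"
```

Keeping the rule in one small function makes it easy to swap in a per-user threshold later without touching the rest of the proxy.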

Phase 3: Prompt Optimization

  1. Local LLM Processing:

    • Adds semantic tags to categorize intent
    • Restructures prompt for optimal Claude comprehension
    • Formats according to Claude best practices
    • Extracts key entities and concepts

Phase 4: Validation & Refinement

  1. Multi-Model Validation:

    • Routes optimized prompt to 2-n additional local LLMs
    • Each validator scores the optimization independently
    • Can use semantic similarity algorithms instead of LLMs
    • Calculates confidence and similarity metrics
  2. Threshold Check:

    • If scores meet threshold: Proceed to Phase 5
    • If scores below threshold: Return to Phase 3 for reprocessing
    • Prevents low-quality optimizations from proceeding
  3. Tool Recommendation:

    • Proxy receives validated prompt with metadata
    • System suggests relevant MCP tools for the query
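The validation and refinement loop might look like the sketch below. The threshold values are taken from the Validation Settings section; everything else is an assumption for illustration: the callable signatures, averaging the validator scores, and falling back to the unmodified prompt once the attempts are exhausted (the document only says that failed optimization falls back to the simple path).

```python
from statistics import mean
from typing import Callable, Optional, Sequence

# Values from the Validation Settings section.
CONFIDENCE_THRESHOLD = 0.75
SIMILARITY_THRESHOLD = 0.80
MAX_REFINEMENT_ITERATIONS = 3

Optimizer = Callable[[str, Optional[str]], str]  # (prompt, feedback) -> candidate
Scorer = Callable[[str, str], float]             # (original, candidate) -> score in [0, 1]


def refine_until_valid(prompt: str, optimize: Optimizer,
                       validators: Sequence[Scorer], similarity: Scorer):
    """Optimize, validate, and reprocess until thresholds are met or attempts run out."""
    if not validators:
        raise ValueError("at least one validator is required")
    feedback = None
    for iteration in range(1, MAX_REFINEMENT_ITERATIONS + 1):
        candidate = optimize(prompt, feedback)
        # Assumption: validator scores are aggregated with a plain average.
        confidence = mean(v(prompt, candidate) for v in validators)
        sim = similarity(prompt, candidate)
        meta = {"confidence": confidence, "similarity": sim, "iterations": iteration}
        if confidence >= CONFIDENCE_THRESHOLD and sim >= SIMILARITY_THRESHOLD:
            return candidate, {**meta, "passed": True}
        # Feed the failing scores back so the optimizer can adjust its next attempt.
        feedback = f"confidence={confidence:.2f}, similarity={sim:.2f}"
    # Assumption: after the last attempt, fall back to the unmodified prompt.
    return prompt, {**meta, "passed": False}
```

The iteration cap bounds the worst-case latency of this phase, which matters because each extra pass repeats both optimization and validation.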

Phase 5: MCP Tool Chain Execution

  1. CortexGraph Search:

    • Searches knowledge graph for related concepts
    • Retrieves relevant memories and context
    • Returns similarity-scored results
  2. STOPPER Validation:

    • Process control checks
    • Safety and constraint validation
    • Prevents out-of-scope operations
  3. Additional Tools:

    • Routes to n other tools based on user preferences
    • Each tool contributes specialized context
    • Tools run in parallel for efficiency
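Parallel tool execution can be sketched with `asyncio.gather`. This is an illustration only: the tool callables below are stand-ins, not real CortexGraph or STOPPER clients, and the name `run_tools_in_parallel` and the error-dictionary shape are assumptions. Passing `return_exceptions=True` lets one failing tool degrade gracefully instead of aborting the whole chain.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Tool = Callable[[str], Awaitable[dict]]  # Hypothetical tool interface


async def run_tools_in_parallel(prompt: str, tools: Dict[str, Tool]) -> Dict[str, dict]:
    """Run every tool concurrently and collect results keyed by tool name."""
    names = list(tools)
    results = await asyncio.gather(
        *(tools[name](prompt) for name in names),
        return_exceptions=True,  # A failing tool yields its exception as a result
    )
    output: Dict[str, dict] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            output[name] = {"error": repr(result)}
        else:
            output[name] = result
    return output


# Stand-in tools used only to exercise the sketch.
async def fake_graph_search(prompt: str) -> dict:
    await asyncio.sleep(0.01)  # Simulate I/O latency
    return {"matches": ["type hints"]}


async def fake_unavailable_tool(prompt: str) -> dict:
    raise ConnectionError("tool unavailable")


if __name__ == "__main__":
    tools = {"graph": fake_graph_search, "broken": fake_unavailable_tool}
    print(asyncio.run(run_tools_in_parallel("process user data", tools)))
```

Because the tools run concurrently, the chain's latency is roughly that of the slowest tool rather than the sum of all tools.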

Phase 6: Final Assembly

  1. Gemini Optimization:

    • Combines original prompt + optimizations + tool outputs
    • Leverages Gemini's large context window (2M tokens)
    • Uses generous free tier for cost optimization
    • Assembles coherent final prompt
  2. Quality Assurance:

    • Compares input to assembled output
    • Generates similarity score (drift detection)
    • Calculates final confidence rating
    • Appends metadata to prompt
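Drift detection compares the original input with the assembled output. The document does not name a similarity algorithm, so the sketch below uses bag-of-words cosine similarity as a simple stand-in; a production system would more likely compare embeddings. The function names and the default threshold (borrowed from `SIMILARITY_THRESHOLD`) are assumptions.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list:
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(_tokens(a)), Counter(_tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty text has no direction to compare
    return dot / (norm_a * norm_b)


def has_drifted(original: str, assembled: str, threshold: float = 0.80) -> bool:
    """Flag the assembled prompt when it is too dissimilar from the original."""
    return cosine_similarity(original, assembled) < threshold
```

Note that a word-count measure penalizes any added context, so a heavily enriched prompt will score low even when the intent is preserved; that is one reason an embedding-based comparison is the more plausible choice in practice.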

Phase 7: Claude Execution

  1. Final Prompt Delivery:

    • Proxy sends optimized prompt to Claude
    • First API cost incurred at this step
    • Prompt includes:
      • Original user intent (preserved)
      • Optimization tags and structure
      • Knowledge graph context
      • Tool outputs and recommendations
      • Confidence/similarity metadata
      • Processing history
  2. Normal Operation:

    • Claude processes the enriched prompt
    • Claude Code continues standard workflow
    • User receives high-quality response
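One way to picture the final payload is as a set of labeled sections, one per item in the "Prompt includes" list. The sketch below is hypothetical: the section tag names, the function name `build_final_prompt`, and the use of JSON for tool outputs and metadata are assumptions, not the format the proxy actually emits.

```python
import json
from typing import Dict, List


def build_final_prompt(original: str, optimized: str, graph_context: List[str],
                       tool_outputs: Dict[str, dict], metadata: Dict[str, object]) -> str:
    """Assemble the enriched prompt as tagged sections, skipping empty ones."""
    sections = [
        ("original_request", original),
        ("optimized_request", optimized),
        ("knowledge_graph_context", "\n".join(f"- {item}" for item in graph_context)),
        ("tool_outputs", json.dumps(tool_outputs, indent=2, sort_keys=True)),
        ("metadata", json.dumps(metadata, indent=2, sort_keys=True)),
    ]
    return "\n\n".join(
        f"<{tag}>\n{body}\n</{tag}>" for tag, body in sections if body
    )
```

Keeping the original request in its own section preserves the user's intent verbatim, so the downstream model can always see what was asked before any rewriting.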

Sequence Diagram

```mermaid
sequenceDiagram
actor User
participant Claude Code Interface
participant Proxy
participant NLP Complexity Analyzer
participant Local LLM (Optimizer)
participant Local LLM 2 (Validator)
participant Local LLM N (Validator)
participant Semantic Similarity Engine
participant MCP Chain
participant CortexGraph
participant STOPPER
participant Custom Tools
participant Gemini
participant Claude API

%% Phase 1: Initial Intake
User->>Claude Code Interface: Enter prompt
Claude Code Interface->>Proxy: Forward prompt
Proxy->>NLP Complexity Analyzer: Analyze complexity
NLP Complexity Analyzer-->>Proxy: Complexity rating

%% Phase 2: Routing Decision
alt Low Complexity (Simple Prompt)
    Proxy->>Proxy: Apply basic rules
    Proxy->>Claude API: Route directly to Claude
    Note over Proxy,Claude API: Fast path for simple queries
else High Complexity (Complex Prompt)
    Note over Proxy: Trigger full optimization pipeline

    %% Phase 3: Optimization
    Proxy->>Local LLM (Optimizer): Optimize prompt
    Note over Local LLM (Optimizer): - Add semantic tags<br/>- Format for Claude<br/>- Extract entities<br/>- Restructure query
    Local LLM (Optimizer)-->>Proxy: Optimized prompt v1

    %% Phase 4: Validation Loop
    rect rgb(240, 240, 240)
        Note over Proxy,Semantic Similarity Engine: Validation & Refinement Loop

        par Parallel Validation
            Proxy->>Local LLM 2 (Validator): Validate optimization
            Proxy->>Local LLM N (Validator): Validate optimization
            Proxy->>Semantic Similarity Engine: Check semantic similarity
        end

        Local LLM 2 (Validator)-->>Proxy: Confidence score 2
        Local LLM N (Validator)-->>Proxy: Confidence score N
        Semantic Similarity Engine-->>Proxy: Similarity score

        Proxy->>Proxy: Aggregate scores

        alt Below Confidence/Similarity Threshold
            Note over Proxy,Local LLM (Optimizer): Quality check failed
            Proxy->>Local LLM (Optimizer): Reprocess with feedback
            Local LLM (Optimizer)-->>Proxy: Optimized prompt v2
            Note over Proxy: Loop until threshold met
        else Above Threshold
            Note over Proxy: Quality validated, proceed
        end
    end

    Proxy->>Proxy: Append recommendation metadata

    %% Phase 5: MCP Tool Chain
    Proxy->>MCP Chain: Route validated prompt + metadata

    rect rgb(230, 245, 255)
        Note over MCP Chain,Custom Tools: MCP Tool Execution (Parallel)

        par Tool Execution
            MCP Chain->>CortexGraph: Search knowledge graph
            MCP Chain->>STOPPER: Validate constraints
            MCP Chain->>Custom Tools: Execute user-defined tools
        end

        CortexGraph-->>MCP Chain: Context + memories (similarity scored)
        STOPPER-->>MCP Chain: Validation results
        Custom Tools-->>MCP Chain: Tool outputs
    end

    %% Phase 6: Final Assembly
    MCP Chain->>Gemini: Assemble final prompt
    Note over Gemini: - Combine all inputs<br/>- Optimize structure<br/>- 2M token context<br/>- Free tier usage

    Gemini->>Gemini: Compare input vs output
    Gemini->>Gemini: Calculate similarity & confidence
    Gemini-->>MCP Chain: Final prompt + metadata

    MCP Chain-->>Proxy: Return final prompt

    %% Phase 7: Claude Execution
    Note over Proxy,Claude API: 💰 First API cost incurred here
    Proxy->>Claude API: Send final optimized prompt
    Note over Claude API: Prompt includes:<br/>- Original intent<br/>- Optimizations<br/>- Knowledge graph context<br/>- Tool outputs<br/>- Metadata
end

%% Normal Operation
Claude API-->>Claude Code Interface: Process request
Claude Code Interface-->>User: Return response
Note over User,Claude Code Interface: Claude Code continues as normal

```

Configuration Options

Complexity Thresholds

```python
# Proxy configuration
# Prompts with complexity > COMPLEX_PROMPT_THRESHOLD follow the complex path,
# otherwise the simple path is used.
COMPLEX_PROMPT_THRESHOLD = 0.4
```

Validation Settings

```python
# Validation thresholds
CONFIDENCE_THRESHOLD = 0.75      # Minimum confidence to proceed
SIMILARITY_THRESHOLD = 0.80      # Minimum semantic similarity
MAX_REFINEMENT_ITERATIONS = 3    # Prevent infinite loops
```

Model Selection

```python
# Local LLMs (can be replaced with cloud providers)
OPTIMIZER_MODEL = "llama-3.1-70b"  # Primary optimizer
VALIDATOR_MODELS = [               # Validation ensemble
    "mixtral-8x7b",
    "qwen-2.5-72b",
    "deepseek-v2",
]

# Example using cloud providers (alternative to local)
# OPTIMIZER_MODEL = "openai:gpt-4"
# VALIDATOR_MODELS = ["anthropic:claude-3-opus", "openai:gpt-4"]
```

MCP Tools

```python
# Tool chain configuration
MCP_TOOLS = {
    "cortex_graph": {
        "enabled": True,
        "similarity_threshold": 0.7,
        "max_results": 10,
    },
    "stopper": {
        "enabled": True,
        "strict_mode": False,
    },
    "custom": {
        "user_preferences": True,
        "context_retrieval": True,
    },
}
```
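As a sketch of how the `cortex_graph` settings might be applied, the function below filters and ranks search hits using `similarity_threshold` and `max_results`. The result shape (a list of dictionaries with a `score` key) and the function name `filter_graph_results` are assumptions; the document does not describe CortexGraph's actual response format.

```python
from typing import Dict, List

# Subset of the tool chain configuration shown above.
MCP_TOOLS = {
    "cortex_graph": {"enabled": True, "similarity_threshold": 0.7, "max_results": 10},
}


def filter_graph_results(results: List[Dict], settings: Dict) -> List[Dict]:
    """Keep hits at or above the similarity threshold, best first, capped at max_results."""
    if not settings.get("enabled", False):
        return []  # A disabled tool contributes no context
    kept = [r for r in results if r["score"] >= settings["similarity_threshold"]]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[: settings["max_results"]]


# Hypothetical search hits used only to exercise the sketch.
sample_hits = [
    {"text": "prefers type hints", "score": 0.9},
    {"text": "unrelated note", "score": 0.65},
    {"text": "uses pytest", "score": 0.7},
]
relevant = filter_graph_results(sample_hits, MCP_TOOLS["cortex_graph"])
```

With the sample data, the hit scored 0.65 is dropped and the remaining two are returned in descending score order.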

Gemini Settings

```python
# Final assembly configuration
GEMINI_MODEL = "gemini-2.0-flash-exp"  # Free tier, large context
GEMINI_MAX_TOKENS = 2000000            # 2M token context window
GEMINI_TEMPERATURE = 0.3               # Consistent assembly
```

Performance Characteristics

Latency Profile

| Stage | Estimated Time | Notes |
|---|---|---|
| Complexity Analysis | 10-50ms | Fast NLP classification |
| Simple Path (total) | 50-100ms | Minimal processing overhead |
| Optimization | 200-500ms | Local LLM inference |
| Validation | 150-300ms | Parallel execution |
| MCP Tool Chain | 100-400ms | Depends on tool complexity |
| Gemini Assembly | 300-800ms | Large context processing |
| Complex Path (total) | 1-3 seconds | Full pipeline |
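The per-stage estimates can be summed to sanity-check the complex-path total. The short calculation below uses only the figures from the latency profile; the assumption that each extra refinement pass repeats exactly one optimization and one validation stage is mine.

```python
# Stage estimates in milliseconds (low, high), copied from the latency profile.
STAGE_LATENCY_MS = {
    "Complexity Analysis": (10, 50),
    "Optimization": (200, 500),
    "Validation": (150, 300),
    "MCP Tool Chain": (100, 400),
    "Gemini Assembly": (300, 800),
}


def total_range(stages):
    """Sum the low and high bounds of a collection of (low, high) pairs."""
    stages = list(stages)
    return sum(lo for lo, _ in stages), sum(hi for _, hi in stages)


# One pass through the complex path with no refinement.
single_pass = total_range(STAGE_LATENCY_MS.values())

# Assumption: each extra refinement pass repeats optimization and validation once.
extra_pass = total_range(
    [STAGE_LATENCY_MS["Optimization"], STAGE_LATENCY_MS["Validation"]]
)

# Upper bound with the maximum of 3 iterations, i.e. 2 extra passes.
worst_case_ms = single_pass[1] + 2 * extra_pass[1]
```

A single pass sums to 760-2050 ms, and each extra refinement pass adds 350-800 ms, so the upper bound with three iterations is 3650 ms. That is above the stated 1-3 second range but still inside the 5 second timeout listed under error handling.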

Cost Analysis

Traditional Approach (direct to Claude):

  • Every prompt hits Claude API immediately
  • No optimization or context enrichment
  • Cost: $X per request from first token

Optimized Approach (this architecture):

  • Local LLMs: Free (self-hosted) or cheap (cloud)
  • Gemini: Leverages the generous free tier for final assembly
  • Claude API: Only hit after full optimization
  • Cost: $0 until Claude execution, then same $X but better results

Net Effect:

  • Same Claude API cost per request
  • Significantly better prompt quality
  • Higher success rate (fewer retries needed)
  • Lower total cost due to reduced iterations
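The link between success rate and total cost can be made concrete with a small calculation. This is a sketch under an assumption that is not in the original: each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 / success_rate. The success rates used below are placeholders chosen for illustration, not measurements.

```python
def expected_cost_per_task(cost_per_request: float, success_rate: float) -> float:
    """Expected spend to get one successful response, retrying until success."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    # Geometric distribution: expected attempts = 1 / success_rate.
    return cost_per_request / success_rate


COST_X = 1.0  # One Claude request in arbitrary units (the "$X" in the text)

# Placeholder success rates, for illustration only.
direct = expected_cost_per_task(COST_X, success_rate=0.5)
optimized = expected_cost_per_task(COST_X, success_rate=0.8)
```

Under these placeholder rates the direct approach costs 2.0 units per completed task and the optimized approach 1.25 units, even though the price of a single request is identical. The model ignores the extra input tokens an enriched prompt carries, which would raise the per-request figure somewhat.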

Implementation Considerations

1. Local LLM Requirements

  • GPU: RTX 4090 (24 GB VRAM) or better; 70B models need aggressive quantization or CPU offloading to fit on a single 24 GB card
  • RAM: 64GB+ recommended
  • Alternative: Use cloud inference APIs (Groq, Together.ai, OpenRouter)

2. Proxy Server

  • Needs to be MCP-compatible
  • Should support WebSocket for streaming
  • Must handle concurrent validation requests

3. Knowledge Graph Integration

  • CortexGraph needs to be populated with relevant data
  • Index must be kept up-to-date
  • Consider using CortexGraph for temporal memory

4. Error Handling

  • Fallback to simple path if optimization fails
  • Timeout protection (max 5s total processing)
  • Graceful degradation if tools unavailable
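Timeout protection and fallback can be sketched with `asyncio.wait_for`. The 5 second budget comes from the list above; the function name `optimize_with_fallback`, the pipeline callable, and the `(prompt, path)` return shape are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

MAX_PROCESSING_SECONDS = 5.0  # "max 5s total processing"


async def optimize_with_fallback(
    prompt: str,
    full_pipeline: Callable[[str], Awaitable[str]],
    timeout_s: float = MAX_PROCESSING_SECONDS,
) -> Tuple[str, str]:
    """Run the full pipeline; on timeout or error, return the original prompt."""
    try:
        optimized = await asyncio.wait_for(full_pipeline(prompt), timeout=timeout_s)
        return optimized, "complex"
    except asyncio.TimeoutError:
        # The pipeline exceeded its budget; send the prompt down the simple path.
        return prompt, "simple"
    except Exception:
        # Any other failure (model crash, tool error) also degrades to the simple path.
        return prompt, "simple"
```

Returning the unmodified prompt on failure means the user never waits longer than the budget and never receives a worse prompt than the one they typed.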

5. Monitoring & Observability

  • Track optimization success rates
  • Monitor confidence/similarity distributions
  • Log processing times for each stage
  • A/B test optimized vs non-optimized prompts
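Per-stage timing can be collected with a small context manager. This is a minimal sketch using only the standard library; the class name `StageTimer` and the in-memory storage are assumptions, and a real deployment would more likely export these samples to a metrics system.

```python
import time
from contextlib import contextmanager
from statistics import mean
from typing import Dict, List


class StageTimer:
    """Collect wall-clock durations, in milliseconds, for named pipeline stages."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised an exception.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples.setdefault(name, []).append(elapsed_ms)

    def average_ms(self, name: str) -> float:
        return mean(self.samples[name])


timer = StageTimer()
with timer.stage("validation"):
    time.sleep(0.01)  # Stand-in for real validation work
```

Recording in the `finally` branch keeps failed stages visible in the timing data, which helps when correlating latency with optimization failures.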

Future Enhancements

  1. Adaptive Thresholds: Learn optimal confidence/similarity thresholds per user
  2. Caching Layer: Cache optimizations for similar prompts
  3. User Feedback Loop: Incorporate user ratings to improve optimization
  4. Model Selection: Automatically choose best LLM based on prompt type
  5. Streaming Optimization: Stream partial results during processing
  6. Cost Tracking: Detailed cost accounting per stage
  7. A/B Testing Framework: Compare different optimization strategies

Security Considerations

  • Prompt Injection: Validate all optimized prompts for injection attempts
  • Data Privacy: Local LLMs keep sensitive data on-premise
  • Rate Limiting: Prevent abuse of free tier services
  • Access Control: Authenticate proxy requests
  • Audit Trail: Log all prompt transformations

Example Workflow

Input Prompt

```
"Help me write a Python function to process user data"
```

After Optimization

```markdown
Task: Python Function Development

User Intent: Create data processing function

Context (from CortexGraph):
- User prefers type hints (from memory: 2025-10-15)
- Uses pytest for testing (from memory: 2025-10-20)
- Prefers dataclasses over dicts (from memory: 2025-10-12)

Requirements:
1. Function should process user data
2. Follow user's Python style preferences
3. Include type hints and docstrings
4. Consider testing approach

Metadata:
- Confidence: 0.87
- Similarity: 0.92
- Optimization iterations: 1
- Tools used: CortexGraph, STOPPER
- Processing time: 1.2s
```

Result

Claude receives a rich, contextualized prompt that produces higher-quality output on the first try, reducing the need for follow-up iterations.


Built with Claude Code 🤖