Changelog¶
All notable changes to MeshFlow are documented here. Format follows Keep a Changelog.
[1.0.0] — 2026-05-30¶
First stable release — Production/Stable¶
4,349 tests passing (19 skipped = live API + optional deps).
MeshFlow 1.0 is the first production-stable release. The public API is now locked under semantic versioning. Breaking changes will require a major version bump.
Stable API surface¶
All symbols exported from meshflow.__all__ are now part of the stable public API.
Internal modules (prefixed with _) remain subject to change.
What's included in 1.0¶
Agents
- Agent — role, tools, memory, guardrails, streaming, structured output, healing, handoffs
- Team, GroupChat, GroupChatManager — multi-agent coordination patterns
- Supervisor, AdversarialTeam — orchestrator and debate patterns
- ReActAgent — Plan → Act → Observe → Reflect loop
- AgentSession — stateful multi-turn with compression
- AgentPool — async queue, round-robin, global registry
- CriticAgent, AdaptiveAgent, DebatePanel, EarlyExitAgent — specialized patterns
- Pre-built agents library: agents.ResearchAgent(), agents.CoderAgent(), etc.
Orchestration
- StateGraph — typed LangGraph-compatible state graph with interrupt() / Command HITL
- Flow — event-driven decorator API (CrewAI Flows parity)
- Crew, Task, Process — CrewAI-compatible team primitives
- DurableWorkflowExecutor — SQLite / Redis / Postgres / S3 checkpoint and resume
- WorkflowDefinition.from_yaml() — full YAML-driven pipeline execution
- @workflow — decorator API for defining typed workflows
- BranchCompare — parallel fork comparison (LangGraph Branch & Compare parity)
Governance (the kernel)
- StepRuntime — 15-step governed execution kernel
- Compliance profiles: hipaa, sox, gdpr, pci, nerc
- ComplianceGuard — real-time mid-run enforcement
- ComplianceReporter + SnapshotExporter — post-hoc audit artifacts
- PolicyEngine / PolicyLoader — YAML policy-as-code (DENY wins, 10 operators)
- DascGate + AutoRiskClassifier + TaintGraph — 4-tier risk governance
- VaultStore — Fernet AES secret vault with PBKDF2 key derivation
- TenantStore / TenantContext — full tenant isolation with scoped DB paths
- SLATracker — p50/p95/p99 latency, breach detection, CLI reporting
- AuditLedger — SHA-256 hash chain for tamper-evident audit trails
- KeyStore — PBKDF2 API key management, roles (admin/operator/viewer)
Security
- GuardrailStack — 9 built-in guardrails (PII, toxicity, cost cap, JSON schema, regex, keyword)
- SensitiveDataDetector — 23 PHI/PII + credential patterns, mask/audit
- PromptInjectionDetector + SecretScanner — supply chain and injection defenses
- AgentIdentity / sign_token / verify_token — zero-trust agent authentication
- CircuitBreaker — per-model circuit breakers with rolling-window stats
Memory & RAG
- AgentMemory — 4-tier: Working → Episodic → Semantic (BM25) → Procedural
- VectorStore, KnowledgeSource, AgentKnowledge — native RAG pipeline
- HybridRetriever (BM25 + dense RRF), LLMRanker, SelfCorrectingRAG
- SemanticMemoryStore — dense embedding search
- CrossSessionMemoryStore — persist memories across sessions
- MemoryConsolidator, TeamWorkspace — shared team memory
Evaluation
- EvalSuite — YAML-driven evals, --save-baseline / --compare-baseline / --fail-on-regression
- LLMJudge — LLM-as-judge with structured scoring
- ConversationEval, ABTest, QualityGate — multi-turn and A/B eval primitives
- ShadowResult / shadow_run — production shadow mode with regression detection
- FeedbackStore — collect human feedback in production
Observability
- EventProjector — AuditTrail, NodeLatency, PolicyViolation, WorkflowSummary projections
- OTELExporter — OTLP/HTTP span export (zero external deps in core)
- TraceServer — visual trace studio (Sprint 69+)
- MetricsCollector — Prometheus-compatible metrics
- WebhookManager + WebhookRetryQueue — HMAC-signed durable webhook delivery
- AlertEngine — metric-threshold alert rules
Providers
- AnthropicProvider (with prompt caching), OpenAICompatibleProvider, GeminiProvider
- BedrockProvider, AzureOpenAIProvider, OllamaProvider, LiteLLMProvider
- AzureIdentityProvider, BedrockIAMProvider, VertexAIProvider — cloud managed identity
- LLM("model-name") — universal entry point
- ProviderRouter — role × budget × compliance → model selection
- ModelHealthTracker — rolling-window health, fallback chain
- AnthropicBatchClient — Anthropic Batch API for high-throughput eval and inference
- CachedProvider — SQLite LLM response cache
Deployment
- Doctor — pre-deploy environment health check
- EnvGenerator — generate production .env from schema
- DockerDeployer — programmatic Docker build + run
- Helm chart at k8s/helm/
- meshflow serve — FastAPI REST + SSE + WebSocket server
- /health/live + /health/ready — Kubernetes probes
- Graceful SIGTERM/SIGINT shutdown
Protocols
- A2A (Agent-to-Agent) protocol — AgentCard, A2AClient, A2AServer, A2ATaskStore
- MCP gateway — MCPServer, MCPClient (consume and expose MCP tools)
- TypeScript client SDK — typed REST + SSE, WebCrypto signature verification
- Go SDK — generated from OpenAPI spec
CLI
meshflow serve, eval, run, graph, audit, compliance, vault, tenant,
tracing, policy, sla, snapshot, dasc, keys, webhooks, analytics,
queue, doctor, bench
Migration from 0.x¶
- No breaking changes in the stable
__all__surface between 0.77 and 1.0. pyproject.tomlversion classifier updated from4 - Betato5 - Production/Stable.- Sprint-numbered section headers removed from
__init__.py(cosmetic only — no symbol changes).
[0.77.0] — 2026-05-30¶
Sprint 77 — Integration, CLI completeness, Studio navigation¶
4231 tests passing (19 skipped).
Wires Sprint 74-76 capabilities into the execution path, adds missing CLI
commands, integrates HybridRetriever/SelfCorrectingRAG as Agent knowledge
backends, adds RoleRouter to Crew, adds model_router: YAML section, and
connects the three studio pages with a shared navigation bar.
[0.76.0] — 2026-05-30¶
Sprint 76 — Strict competitive gap closure (all 6 frameworks)¶
4231 tests passing (19 skipped).
Closed every remaining gap from the May 2026 Competitive Intelligence document.
BranchCompare — LangGraph Branch & Compare mode¶
BranchCompareruns N workflow forks in parallel from a checkpoint, diffs outputs, picks winner by confidence score (core/branch_compare.py)ForkConfig(label, model_override, prompt_override, context_patch)— each fork independently configured;context_patchimplements LangGraph's State Injection modeCompareResult.cost_comparison()/quality_comparison()— structured analytics_word_diff()— unified diff between fork outputs
S3 backend for DurableWorkflowExecutor¶
_S3Store— checkpoints stored as S3 objects under<prefix>/<run_id>/<node_id>.jsonDurableWorkflowExecutor(backend="s3", s3_bucket=..., s3_prefix=...)— serverless resume- Fork method dispatches all four backends: memory / sqlite / redis / postgres / s3
RoleRouter — first-mover dynamic role assignment¶
RoleRouter— LLM-driven role classification with 13-role catalogueAgentSpec(role, goal, tools, model_tier)→spec.to_agent()instantiates live agents- Keyword heuristic fallback (no LLM required for offline use)
RAG depth parity with Haystack¶
LLMRanker— LLM relevance scoring with heuristic fallback;score_thresholdfilteringHybridRetriever— BM25 + dense Reciprocal Rank Fusion;add_texts(),query()SelfCorrectingRAG— retrieve → grade → refine loop;grade_threshold,max_correction_roundsRAGAnswer(text, correction_rounds, grade, context_used)
Curated template library — 20 specialist templates¶
- 20 pre-built templates: HIPAA analyst, SOC2 auditor, GDPR DPO, security CVE researcher, contract legal analyst, financial risk, market researcher, Python code reviewer, data pipeline analyst, clinical literature, API planner, incident response, prompt engineer, PCI DSS checker, technical writer, A/B test analyst, cloud cost optimiser, accessibility auditor, agent workflow designer, competitive intelligence analyst
load_curated_library(registry_dir)— loads all 20 into a local registrytemplate_by_name(name)/templates_by_tag(*tags)— lookup helpers
Interactive studio pages¶
graph.html— browser-based interactive Mermaid graph with clickable nodes (cost/latency/token detail panel), runs viameshflow studioon/graphrag_builder.html— no-code RAG pipeline configurator (7 stages: data → chunk → embed → retrieve → rank → generate → guard), YAML export, Dify feature parityTraceServer.get_mermaid(run_id)— generates Mermaid TD syntax from ledger steps/graphand/ragroutes added to TraceServer
ModelRouter → analytics integration¶
ModelRouter(analytics_ledger=ledger)— routing decisions emitted as async fire-and-forget events to the ReplayLedger, closing the cost-analytics feedback loop
[0.75.0] — 2026-05-30¶
Sprint 75 — Token optimization layer + CriticAgent + Haystack pipeline parity¶
4141 tests passing (19 skipped).
ModelRouter — pre-dispatch model tier routing (first mover)¶
ModelRouterwithRouterConfig— classifies task → nano/small/medium/large- YAML-configurable tiers, keyword catalogue, token-count thresholds
record_decisions=True+savings_vs_default()for cost analytics- Zero competitors (LangGraph/CrewAI/AutoGen/Dify/Flowise/Haystack) have this
CriticAgent — propose / challenge / refine loop¶
CriticAgent(proposer, critic, max_refinements, stop_on_confidence)closes AutoGen multi-agent critique gap; lighter than DebatePanel (no arbiter required)CriticResult.improvement_delta— confidence gain across refinement turns
ToolOutputSummarizer — token Tier 1 gap closure¶
ToolOutputSummarizer(max_tokens=500)— nano-model summarization pass when tool output exceeds threshold;passthrough_toolsset;summary_report()analytics
WorkflowDefinition.to_yaml() — Haystack pipeline portability¶
- Round-trip YAML export: nodes, edges, loop edges, policy, terminal
to_yaml(path=...)writes to file; closes Haystack pipeline-serialization gap
DurableWorkflowExecutor cloud backends¶
backend="redis"—_RedisStorewith TTL, run index (pip install redis)backend="postgres"—_PostgresStorewith UPSERT (pip install psycopg2-binary)
Subprocess sandbox hardening¶
CodeInterpreter(max_memory_mb=N)—resource.setrlimit(RLIMIT_AS)on UnixCodeInterpreter(block_network=True)— strips proxy env vars from subprocess env
[0.74.0] — 2026-05-30¶
Sprint 74 — Scorecard gap closures: public API, managed identity, marketplace¶
4080 tests passing (19 skipped).
Public API promotion (7 modules → all)¶
AdaptiveAgent,DebatePanel/DebateNode/DebateResult,EarlyExitAgent,ContextDeduplicator,TokenBudgetPlanner/ModelSizingAdvisor,RewindEngine/RewindResult/StepSnapshot,ParetoAnalyzer/ModelBenchmark/BenchmarkRun
Cloud managed identity providers¶
AzureIdentityProvider—DefaultAzureCredential(CLI, managed identity, workload identity)BedrockIAMProvider— IAM role assumption viasts:AssumeRole+ named AWS profilesVertexAIProvider— GCP Application Default Credentials, Vertex AI Gemini
Marketplace HTTP registry¶
MarketplaceClient—push(tmpl),pull(name),list_all(),search(query)MarketplaceServer— self-hostable HTTP server wrappingTemplateRegistry- CLI:
meshflow templates share <name> --url http://marketplace.example.com
Docker isolation CI test¶
test_sprint74.py::TestDockerCodeInterpreter— provesdocker=Trueflag wiring and graceful-fail path without Docker daemon
[0.69.0] — 2026-05-29¶
Sprint 68 — Structured Output on Agent/LLM¶
3747 tests passing (18 skipped).
Agent.with_structured_output(schema)¶
Agent.with_structured_output(schema, *, max_retries=3)(meshflow/agents/builder.py): Returns aStructuredAgentbound to the given schema. Calling.run(task)returns the validated Pydantic instance or dict directly — noStructuredOutputResultwrapper..ainvoke(task)is a LangChain-compatible alias.StructuredAgent(meshflow/agents/builder.py): Thin wrapper aroundAgent.run_structuredthat unwraps.dataautomatically. Exported asmeshflow.StructuredAgent.
Provider response_format parameter¶
LLMProvider.complete(…, response_format=None)(meshflow/agents/base.py): Protocol updated — all providers accept an optionalresponse_formatstring.AnthropicProvider:response_format="json"prepends a JSON-only directive to the system prompt.OpenAICompatibleProvider:response_format="json"passesresponse_format={"type": "json_object"}natively to the API.EchoProvider:response_format="json"returns{"echo": <input>}JSON, enabling deterministic structured-output tests without an API key.
Tests¶
tests/test_structured_output.py— 27 tests coveringStructuredOutputParser,with_structured_output,ainvoke, Pydantic schema validation, andresponse_formatonEchoProvider.
[0.68.0] — 2026-05-29¶
Sprint 67 — Flows Decorator API¶
3699+ tests passing.
Event-driven workflow decorators (CrewAI Flows parity)¶
@start()(meshflow/core/flows.py): Marks one or more Flow methods as entry points. All@startmethods fire concurrently whenFlow.kickoff()is called.@listen(trigger)(meshflow/core/flows.py): Fires aftertriggercompletes. Trigger may be a method name string, a method reference, or a(method, route)tuple for router branches.@router(trigger)(meshflow/core/flows.py): Conditional branching — return a route string;@listen((trigger, route))handlers fire only when the route matches.Flow[S](meshflow/core/flows.py): Generic base class.Smust subclassFlowState. Handles BFS execution, state propagation, and result collection.Flow.kickoff(inputs=None)— async execution entry point.Flow.kickoff_sync(inputs=None)— synchronous wrapper.Flow.plot()— returns a Mermaid diagram string of the flow graph.FlowState— typed shared state base class (subclass to add fields).FlowResult—final_output,state,steps_executed,total_tokens,total_cost_usd,duration_s.
All six symbols exported as meshflow.Flow, meshflow.FlowState,
meshflow.FlowResult, meshflow.flow_start, meshflow.flow_listen,
meshflow.flow_router.
Bug fix¶
- Fixed
Flowrouter routing key: routed listeners (@listen((trigger, route))) now correctly use the trigger method name as the key, matching the documented@listen((fn, route))convention.
Tests¶
tests/test_flows_api.py— 28 tests covering all decorators,kickoff,kickoff_sync,plot, chaining, branching, and public API exports.
[0.67.0] — 2026-05-29¶
Sprint 66 — Prebuilt Agent Graphs + StateGraph enhancements + Scorecard gap closure¶
3658+ tests passing.
Prebuilt Agent Graphs (LangGraph parity)¶
MessagesState(meshflow/core/prebuilt.py): Built-inTypedDictwith amessageschannel using theaddreducer.ToolNode(meshflow/core/prebuilt.py): Graph node that dispatches tool calls from the last AI message. Supports Anthropic content blocks, OpenAItool_calls, and ReAct inlineAction: / Action Input:format.handle_errors=Trueby default.create_react_agent(model, tools, *, state_schema, system_message, max_iterations): One-liner factory for a full ReAct loop as aCompiledGraph.create_tool_calling_agent(model, tools, *, system_message): Single-shot tool-calling graph (agent → tools → end, no loop).
StateGraph enhancements (Sprint 68)¶
Send(node, state={})— dynamic fan-out: returnSendorlist[Send]from a conditional edge to dispatch parallel branches with per-branch state.add_sequence([(name, fn), ...])— chain nodes in one call.- Subgraph nesting — pass a
CompiledGraphdirectly toadd_node(). MemorySaver— in-memory checkpoint store keyed bythread_id.SqliteSaver— SQLite-backed checkpoint store, survives restarts.compile(checkpointer=...)— attach a checkpointer at compile time.CompiledGraph.get_state(config)/update_state(config, values)— inspect and patch saved thread state between runs.add_conditional_edges(..., mapping=None)— mapping is now optional forSend-based routing.
Scorecard gap closure (Sprints 67 RAG/context/memory)¶
RAGTokenBudget(meshflow/agents/rag_budget.py): Enforcemax_chars/max_tokensper knowledge injection. Strategies:"truncate","drop","tail".SlidingWindowPruner(meshflow/core/context_pruner.py): Keep the N most recent messages; always preserves system prompt.SummaryPruner(meshflow/core/context_pruner.py): Compress old messages into a rolling summary when token count exceeds limit. Supports custom sync/async summarise functions.CrossSessionMemoryStore(meshflow/intelligence/cross_session.py): SQLite-backed persistent episodic memory across sessions. Features: bigram-similarity deduplication, LRU eviction, multi-agent isolation, tag/session filtering, keyword search.
[0.26.0] — 2026-05-24¶
Sprint 26 — Streaming at all layers + Sprint 27 — Native RAG / Knowledge¶
1521+ tests passing (18 skipped).
Sprint 26 — Streaming¶
-
StreamChunk(meshflow/core/streaming.py): Unified streaming event type across all MeshFlow layers.kind:token | node_start | node_end | task_start | task_end | done | error. Fields:content,node_name,task_index,metadata. Properties:is_token,is_done. Exported asmeshflow.StreamChunk. -
Team.stream(task, context): Async generator yieldingStreamChunkobjects. Sequential/hierarchical/supervised patterns stream each agent in order, passing accumulated output forward. Parallel pattern interleaves token chunks across agents viaasyncio.Queue. Each agent produces:node_start → token… → node_end. Ends withdone. -
Crew.kickoff_stream(inputs): Stream token-by-token from each Task in the crew — one LLM call per task (no double-calling). Collects streamed tokens, setstask.outputfor downstream context injection, then yieldstask_endwith full content. Supports sequential, hierarchical, and parallel process modes. Events:task_start → token… → task_end → done. -
Agent.stream()already existed — regression tested (Sprint 9). -
34 new deterministic tests in
tests/test_sprint26.py.
Sprint 27 — Native RAG / Knowledge¶
-
VectorStore(meshflow/intelligence/knowledge.py): In-memory semantic search with zero required dependencies. Embedding chain: sentence-transformers → numpy BoW → pure-Python char n-gram (always works offline).from_texts(texts),from_file(path)(txt/md/py/json/yaml/csv/pdf),from_directory(dir, extensions).query(text, top_k) → list[str]. Sentence-boundary-aware chunking with configurablechunk_sizeandoverlap. -
KnowledgeSource: A single retrievable source — file path, directory, raw text snippet, URL, orVectorStore. Lazy-loaded on firstretrieve(). Configurabletop_k. -
AgentKnowledge: Aggregates multipleKnowledgeSource/VectorStore/ string sources.retrieve(query)deduplicates across sources.context_string(query, max_chars)returns a prompt-ready[Knowledge]block with---separators. -
Agent(knowledge=[...]): Accepts file paths, text snippets,VectorStore, orKnowledgeSourceobjects. Before each LLM call, the agent queries its knowledge and injects retrieved chunks as[Knowledge]\n...context. Zero cost when no knowledge is provided. -
Task(knowledge=[...]): Per-task knowledge override; injected as[Task Knowledge]\n...in the task prompt. Independent from (and additive to) the agent's own knowledge. -
48 new deterministic tests in
tests/test_sprint27.py.
[0.25.0] — 2026-05-24¶
Sprint 25 — Guardrails: input/output validation at every agent and node¶
1357+ tests passing (18 skipped).
Added¶
-
Guardrailbase class (meshflow/security/guardrails.py): Abstract base withcheck(text) -> GuardrailResultandactionfield ("block"/"warn"/"modify").GuardrailResultcarriespassed,guardrail_name,reason,modified_text,severity, andmetadata.GuardrailViolationexception carries the failingGuardrailResult. -
GuardrailStack: Compose multiple guardrails in sequence.mode="strict"raisesGuardrailViolationon first blocking failure;mode="collect"runs all. "modify" action guardrails rewrite the text in-place for downstream checks. "warn" action guardrails record the failure but do not block the stack.stack.run(text) -> (all_passed, final_text, results). -
8 built-in guardrails:
PIIBlockGuardrail— detect & block/mask/warn PHI/PII viaSensitiveDataDetectorConfidenceGuardrail— block outputs below stated CONFIDENCE:0.XX thresholdLengthGuardrail— enforce min/max chars or word countToxicityGuardrail— block violence / self_harm / hate / profanity patternsJSONSchemaGuardrail— validate JSON output; extracts from markdown fencesRegexGuardrail— require or forbid a regex patternKeywordBlockGuardrail— block forbidden keywords/phrases (whole-word or substring)CostCapGuardrail— reject tasks whose estimated input cost exceeds a budget-
CustomGuardrail— wrap any callable; supportsbool,(bool, str),(bool, str, str)and(bool, modified_text)for modify-mode -
Agent(input_guardrails=[], output_guardrails=[])parameters: Input guardrails run on the task text before the LLM call; output guardrails run on the LLM response before returning to the caller. A blocking violation returns{"blocked": True, "guardrail": name, "guardrail_reason": reason}instead of calling the LLM (zero cost on input block). Non-blocking runs include{"blocked": False, "guardrail_results": [...]}in the result dict. -
Exported from
meshflow: all 9 guardrail classes +GuardrailResult,GuardrailStack,GuardrailViolation.
Tests¶
83 new deterministic tests in tests/test_sprint25.py across 14 test classes.
[0.24.0] — 2026-05-24¶
Sprint 24 — CrewAI/LangGraph/AutoGen feature parity¶
1274+ tests passing (18 skipped).
Added¶
-
Task class (
meshflow/agents/task.py): CrewAI-compatible first-class task abstraction.Task(description, expected_output, agent, human_input=False, context=[], tools=[]). Supports{placeholder}substitution indescriptionviakickoff(inputs={...}). Auto-injects prior task outputs as context whencontextis set. Extratoolsare merged for the duration of the task, then restored.TaskOutputholdsraw,agent_name,tokens,cost_usd. Theoutputfield isNonebefore run and filled after. Exported asmeshflow.Task. -
Crew + Process + CrewOutput (
meshflow/agents/crew.py):Crew(agents, tasks, process=Process.sequential, verbose=False)— governed crew with three execution modes:sequential(chain with auto context injection),parallel(concurrent asyncio.gather),hierarchical(first task is manager, rest are workers that receive manager output as context).CrewOutputaggregates per-task outputs, total tokens, and total cost.Crew.kickoff(inputs={})is the entry point.Processis astrenum so string literals work interchangeably. Exported asmeshflow.Crew,meshflow.Process,meshflow.CrewOutput. -
Built-in skills library (
meshflow/agents/skills.py): 15 built-in skills:python,javascript,data_analysis,sql,web_search,code_review,writing,legal,medical,security,api_design,devops,machine_learning,finance,product. EachSkillis a frozen dataclass withname,description, andtags.skill_prompt(["python", "security"])returns a combined system-prompt snippet. Unknown skill names are silently ignored.list_skills()returns sorted names. Exported asmeshflow.SKILLS,meshflow.Skill,meshflow.skill_prompt,meshflow.list_skills. -
Agent(skills=[], mcps=[])parameters (meshflow/agents/builder.py):skills: list of built-in skill names that augment the agent's system prompt. Skills are appended after the role prompt (or customsystem_prompt).mcps: list of MCP server URLs (strings); each is registered inMCPGatewayand exposed as aToolin the agent's tool list. Stdio params objects are accepted too. -
@nodedecorator (meshflow/core/state.py): LangGraph-style decorator to mark functions as StateGraph nodes.@nodebare sets_is_meshflow_node=Trueand_node_name=fn.__name__.@node("custom_name")sets a custom node name. Decorated functions are still callable directly. Exported asmeshflow.node. -
interrupt()+CommandHITL (meshflow/core/state.py):interrupt(value)raisesInterruptfrom inside a node to pause graph execution.CompiledGraph.run()catches theInterrupt, attaches.node,.value,.stateto the raisedInterruptedError, and stores_interrupted_nodefor resume.Command(resume=..., goto=None, update={})resumes execution:updateis merged intoinitialstate,gotoredirects to a different node,resumecarries the human decision back into the graph. Exported asmeshflow.interrupt,meshflow.Command,meshflow.Interrupt.
Tests¶
74 new deterministic tests in tests/test_sprint24.py across 14 test classes:
Task, TaskOutput, Crew (sequential/parallel/hierarchical), Process, CrewOutput,
Skills library, Agent skills integration, @node decorator, interrupt/Command,
StateGraph reducers (regression), and public API surface.
[0.23.0] — 2026-05-23¶
Sprint 23 — Sensitive data detection, model health tracking, workflow analytics, background task queue¶
1190+ tests passing (18 skipped).
Added¶
-
Sprint 23A — SensitiveDataDetector (
meshflow/security/sensitive_data.py): Rich PHI + PII + credential detection over arbitrary text. Returns structuredSensitiveMatchobjects withkind,category,value_preview,start,end,confidence. 11 PHI/PII patterns (SSN, EMAIL, PHONE, DATE, ZIP, IP, URL, MRN, NPI, CREDIT_CARD, NAME) and 12 credential patterns (Anthropic/ OpenAI/AWS/GCP/GitHub API keys, JWT, RSA private key, DB connection strings, high-entropy hex, Bearer tokens).mask()replaces PHI with[REDACTED]and credentials with[CREDENTIAL-REDACTED]in a single non-shifting pass.audit_report()returns a compliance-ready summary. Module singleton viaget_detector()/reset_detector(). Exported asmeshflow.SensitiveDataDetector. -
Sprint 23B — ModelHealthTracker + ProviderRouter auto-fallback (
meshflow/agents/health.py,meshflow/agents/router.py):ModelHealthTrackerrecords per-model success/failure outcomes in a rolling window (default 50; configurable viaMESHFLOW_HEALTH_WINDOW). Health score = success fraction; models below threshold (default 0.7;MESHFLOW_HEALTH_DEGRADED_THRESHOLD) are marked degraded.summary(model)returns aModelHealthSummarywith p50/p95 latency percentiles and last error. Global singleton viaget_health_tracker().ProviderRoutergainsset_fallback_chain(*models)androute_with_health()which skips degraded models and returns the best healthy candidate (orbest_model()if all degraded). -
Sprint 23C — WorkflowAnalytics (
meshflow/core/analytics.py): Async post-hoc analytics overReplayLedger.WorkflowAnalyticsexposes:cost_trend(n),latency_percentiles(n)(p50/p95/p99),blocked_rate(n),quality_drift(n)(uncertainty trend → "degrading"/"stable"/"improving"),carbon_trend(n),top_costly_nodes(n_runs, top_n), andfull_report(n). All methods are async. New server endpointGET /analytics?n=N. New CLI subcommandmeshflow analytics [--metric cost|latency|blocked|quality|carbon|nodes|full] [--runs N] [--format text|json]. New dashboard "Analytics" page with KPI tiles, cost bar chart, latency metrics, blocked-rate progress bar, quality drift delta, and top-costly-nodes table. -
Sprint 23D — Background task queue (
meshflow/queue/): SQLite-backed durable async task queue.TaskItem(task_id, payload, status, priority, timestamps, result, error) persists across restarts.TaskQueueis crash-safe: tasks stuck in "running" are automatically re-queued on startup.QueueWorkerprovides a bounded async concurrency pool with pluggablehandler(TaskItem) → dict. New server endpoints:GET /queue/status,POST /queue/push,DELETE /queue/{task_id}/cancel,GET /queue/{task_id}. New CLI subcommands:meshflow queue push <yaml>,meshflow queue status,meshflow queue list [--status ...],meshflow queue cancel <id>,meshflow queue worker [--concurrency N].
Changed¶
meshflow/__init__.pyversion bumped to 0.23.0.- New top-level exports:
SensitiveDataDetector,SensitiveMatch,get_sensitive_detector,ModelHealthTracker,ModelHealthSummary,get_health_tracker,WorkflowAnalytics,RunSummary,TaskQueue,QueueWorker,TaskItem,TaskStatus.
[0.22.0] — 2026-05-23¶
Sprint 22 — Dashboard v2, per-tenant rate limiting, scheduled compliance reports, declarative YAML workflows¶
1131 tests passing (18 skipped).
Added¶
-
Sprint 22A — Dashboard v2: New
API Keyspage in the Streamlit dashboard: list, generate, and revoke keys via the/keysREST endpoints (admin-only). Sidebar now shows the authenticated user's name, role, and tenant fromGET /keys/whoami. OTEL page updated to show liveOTELExporterstats:exported_count,error_count, endpoint, and service name with a Refresh button. (dashboard/app.py) -
Sprint 22D — Per-tenant rate limiting:
RateLimiternow usestenant_idas the bucket key instead of the raw API key string.status()returnstenant_idfield. Per-tenant limits are configurable via env vars:MESHFLOW_RATE_LIMIT_TENANT_<ID>_RPSandMESHFLOW_RATE_LIMIT_TENANT_<ID>_BURST. Server_require_auth()passes principal'stenant_idto the limiter. (meshflow/observability/sla.py,meshflow/runtime/server.py) -
Sprint 22B — Scheduled compliance reports: New
meshflow/compliance/scheduler.pywithReportSchedule(dataclass),ScheduleStore(JSON-backed persistence at~/.meshflow/schedules.json),ScheduledReporter.run_now()(generatesComplianceReporterartifact, delivers to file/webhook/stdout sink).create_schedule()factory. HMAC-SHA256 signatures on webhook delivery. Three sinks:file(write/append with separator),webhook(HTTP POST + signature),stdout. CLI:meshflow compliance schedule add|list|run|remove. -
Sprint 22C — Declarative YAML workflow extensions:
WorkflowDefinition.from_yaml()now parses: —loop_edges:list →add_loop_edge()(back-edges with condition + max_iterations) —compliance:section → liveComplianceGuardwired intoStepRuntime—metadata:section → stored aswf.metadatadictWorkflowDefinition.__init__gainscompliance_guardandmetadataattrs.describe()includesloop_edges,compliance_guard,metadatafields. CLImeshflow runpasseswf.compliance_guardtoStepRuntime.
Changed¶
- Rate limiter bucket key changed from raw API key string to
tenant_id("anonymous"for open-mode / unauthenticated requests). RateLimiter.status()now returnstenant_idkey instead ofkey.
[0.21.0] — 2026-05-23¶
Sprint 21 — Tenant isolation, CI, benchmarks, docs¶
1082 tests passing (18 skipped).
Added¶
-
Sprint 21A — Tenant isolation:
WebhookRegistrationgainstenant_idfield;WebhookManager.list(),get(),unregister(),deliver(),delivery_history()all accept optionaltenant_idparameter for scoped filtering. Tenant-scoped webhooks are only visible/deletable by the owning tenant; global hooks (empty tenant_id) are visible to all. Server: all data endpoints (/traces,/hitl,/webhooks,/eval-results) use_ledger_for(principal)helper to scope ledger reads/writes to the authenticated key's tenant. Per-tenant ledger cache avoids per-requestReplayLedgerconstruction. -
Sprint 21B — GitHub Actions CI (
.github/workflows/ci.yml): Matrix test job on Python 3.11 + 3.12; mypy type-check job; ruff lint job; benchmark smoke-test job (--quick); artifact upload for test results and benchmark output. -
Sprint 21C — Benchmark integration:
bench_core.pytracked;--quickflag added (first concurrency level only — used in CI);benchmarks/README.mdupdated with--quickdocumentation and latency regression comparison script. -
Sprint 21D — Docs:
docs/QUICKSTART.md— 9-section developer quickstart covering install, first run, team API, policy-as-code, server, keys, endpoints, Kubernetes, and OTEL tracing.SECURITY.mdat repo root (GitHub's standard location). All compliance guides (HIPAA_GUIDE.md,GDPR_GUIDE.md,SOC2_CONTROLS_MAPPING.md) tracked indocs/compliance/. -
tests/test_sprint21.py— 58 deterministic tests across all Sprint 21 features
[0.20.0] — 2026-05-23¶
Sprint 20 — API auth, Helm chart, policy-as-code, OTEL export¶
1024 tests passing (18 skipped).
Added¶
-
Sprint 20A — API key management (
meshflow/security/api_keys.py): SQLite-backedKeyStorewith PBKDF2-SHA256 hashed secrets; three roles (admin,operator,viewer); per-tenant scoping;create(),verify(),revoke(),list()API; co-exists with legacyMESHFLOW_API_KEYSenv var. Server:GET /keys,POST /keys,DELETE /keys/{key_id},GET /keys/whoami; key management endpoints restricted toadminrole via_require_role(). CLI:meshflow keys generate|list|revoke --db --role --tenant -
Sprint 20B — Deployment artifacts: Helm chart (
k8s/helm/) —Chart.yaml,values.yaml, templates for Deployment, Service, Secret, PVC, HPA,_helpers.tpl; Deployment uses/health/live+/health/readyprobes; autoscaling and ingress configurable via values.Dockerfileupdated: non-root user (uid 1000), healthcheck uses/health/live.docker-compose.ymladds Redis service profile.k8s/deployment.yamlupdated to use/health/live+/health/ready. -
Sprint 20C — Policy-as-code (
meshflow/core/policy_loader.py):load_policy_yaml(path)→Policy;load_guard_yaml(path)→ComplianceGuard | None;load_yaml(path)→(Policy, ComplianceGuard | None)convenience;validate_policy_yaml(path)→list[str]of issues. Minimal built-in YAML parser (no PyYAML dep; uses PyYAML when installed).meshflow serve --policy-file meshflow.policy.yamlvalidates on startup. Examplemeshflow.policy.yamlin project root. -
Sprint 20D — OTEL export pipeline (
meshflow/observability/otel_exporter.py):OTELExporterships spans as OTLP/HTTP JSON to any OTEL collector (Jaeger, Grafana Tempo, Honeycomb, etc.) using zero external dependencies.from_env()factory readsOTEL_EXPORTER_OTLP_ENDPOINT,OTEL_SERVICE_NAME,OTEL_EXPORTER_OTLP_HEADERS. Global singleton viaget_global_exporter().StepRuntimeexports a span per step viarun_in_executor(never blocks).GET /otel/confignow reports live exporter state includingexported_countanderror_count. -
tests/test_sprint20.py— 75 deterministic tests across all Sprint 20 features
[0.19.0] — 2026-05-23¶
Sprint 19 — Webhook wiring, production hardening, TypeScript SDK, ComplianceGuard¶
949 tests passing (18 skipped).
Added¶
-
Sprint 19A — Webhook wiring into StepRuntime:
meshflow/core/runtime.pynow firespolicy_violation,budget_exceeded,hitl_pending, andcollusion_alertwebhooks directly from step execution via fire-and-forgetasyncio.create_task()— step execution is never blocked -
Sprint 19B — TypeScript client SDK (
sdks/typescript/): Full typed REST+SSE client for all MeshFlow API endpoints;verifyWebhookSignature()using WebCrypto (constant-time HMAC comparison);liveEvents(runId?)AsyncIterable for Server-Sent Events streaming;createClient()factory readingMESHFLOW_SERVER/MESHFLOW_API_KEYenv vars; complete TypeScript interfaces for all MeshFlow types -
Sprint 19C — Production hardening:
RedisBusBackendinmeshflow/agents/messaging.py— asyncio pub/sub backed byredis[asyncio]; drop-in for the in-memory bus- PostgreSQL connection pool config via
MESHFLOW_PG_POOL_MIN/MAX/TIMEOUTenv vars (or constructor kwargs);statement_cache_size=100enabled - Kubernetes probes:
GET /health/live(always 200),GET /health/ready(503 during shutdown or ledger unreachable) -
Graceful SIGTERM/SIGINT shutdown with 2 s drain window in
server.py -
Sprint 19D — ComplianceGuard (
meshflow/compliance/guard.py): Real-time mid-run enforcement that runs before each step executes; 8 built-in rules across 5 frameworks — HIPAA§164.502(b)/§164.312(e), SOX§302/§404, GDPR Art.5(1)(b)/5(1)(c), PCI DSS Req 3, NERC CIP-007;block_on_violation=TrueraisesComplianceViolationand halts the step;Falserecords violations without blocking; integrates withStepRuntimevia optionalcompliance_guardparameter -
tests/test_sprint19.py— 47 deterministic tests across all Sprint 19 features
[0.18.0] — 2026-05-23¶
Sprint 18 — Compliance reporting, webhook alerting¶
902 tests passing (18 skipped).
Added¶
meshflow/compliance/reporter.py—ComplianceReporterthat generates structured audit artifacts from ledger data for five regulated frameworks: HIPAA (§164.308/312), SOX (§302/§404/§409), GDPR (Art. 5/6/30/32), PCI DSS v4 (Req 6/7/8/10/12), NERC CIP v6 (CIP-007/008/012/014)ComplianceReport,ComplianceFinding,ComplianceSummarydata model withto_dict(),to_json(),to_text()serialisationmeshflow/observability/webhooks.py—WebhookManager: in-memory webhook registry, HMAC-SHA256 signed payloads, async delivery with 3-attempt exponential-backoff retry, per-webhook delivery history- Supported event types:
policy_violation,budget_exceeded,hitl_pending,run_failed,run_completed,collusion_alert,* GET /compliance/report?framework=hipaa&run_id=X[&format=text|json]— on-demand compliance report from the ledgerPOST /webhooks— register a webhook endpointGET /webhooks— list webhooks + delivery statsDELETE /webhooks/{id}— remove a webhookGET /webhooks/{id}/deliveries— per-webhook delivery history- Dashboard pages: Compliance (generate/download reports per framework) and Alerts (manage webhooks, view delivery stats, register new hooks)
meshflow compliance report --framework <fw> [--run-id] [--format] [--out]meshflow webhooks list|add <url>|remove <id>CLI subcommandstests/test_sprint18.py— 64 deterministic tests across all new features
[0.17.0] — 2026-05-23¶
Sprint 17 — OTEL traces, graph export, audit CSV, SLA monitoring, rate limiting¶
838 tests passing (18 skipped).
Added¶
meshflow/observability/trace_context.py— W3C Trace Context RFC;TraceContext,extract_trace_context(),inject_trace_headers()StepRuntime.run()propagatestraceparentheader throughcontextdict so every step carries its trace lineagemeshflow/core/graph_export.py—steps_to_mermaid(),steps_to_dot(),graph_to_mermaid(),graph_to_dot()ReplayLedger.export_run_csv()— tamper-evident CSV audit artifactmeshflow/observability/sla.py—NodeLatencyTracker(p50/p95/p99 per node, thread-safe sorted reservoir) +RateLimiter(token-bucket per API key, env-configurable)GET /otel/config— OTEL setup introspectionGET /graph/{run_id}[?format=mermaid|dot]— execution graph exportGET /audit/export[?run_id=&format=csv|json]— compliance downloadGET /sla[?node_id=]— p50/p95/p99 latency per nodeGET /rate-limit/status— token-bucket stats per API key- Rate limiting wired into
_require_auth(global singleton, 60 RPS default, override viaMESHFLOW_RATE_LIMIT_RPS/_BURST) - SLA recording wired into
StepRuntime(best-effort, never raises) meshflow graph [--run-id] [--format] [--db] [--out]CLI subcommandmeshflow audit export [--run-id] [--format] [--db] [--out]CLI- Dashboard pages: Graph, Audit, SLA, OTEL
test_sprint17.py— 47 deterministic tests across all four features
[0.16.0] — 2026-05-23¶
Sprint 16 — Dashboard integration, eval history CLI, plugins endpoint¶
791 tests passing (18 skipped = live API tests + streamlit missing).
Added¶
GET /plugins[?group=]— REST endpoint exposing installed plugin entry-points as JSONmeshflow eval-history [--db] [--suite] [--json]— list stored eval results from the ledger in a formatted table or raw JSON- Dashboard Evals page — browse stored eval baselines, select two
to diff, display
BaselineDiff.report() - Dashboard Plugins page — list installed plugins, filter by group, one-click load-verify
- Dashboard Overview — two new KPI tiles: Last Eval Pass Rate and Installed Plugins count
fetch_eval_results()/fetch_plugins()cached fetch helpers indashboard/app.pytest_sprint16.py— 15 new deterministic tests (4 skipped when streamlit absent)
Fixed¶
weighted_scorefield name used consistently in CLI table and dashboard (wasmean_score)
[0.15.0] — 2026-05-23¶
Sprint 14 — SSE events, WebSocket message bus, OpenAI adapter parity¶
733 tests passing (14 skipped = live API tests).
Added¶
-
GET /eventsSSE endpoint — streams allWorkflowEventlifecycle events (STEP_START,STEP_COMPLETE,STEP_BLOCKED,HITL_REQUIRED,WORKFLOW_START/COMPLETE, etc.) to any SSE client. Pass?run_id=<id>to filter to a single run. Past events are replayed on connect; then live events follow in real time. -
Dashboard "Live" page — new Streamlit page that connects to
GET /events, renders a live event table (kind / run_id / node / timestamp), and accepts an optional run_id filter. Updates the table with each arriving SSE event. -
WebSocketBusBackend(meshflow/agents/messaging.py) — cross-process agent messaging over WebSocket. Connects to the server'sGET /ws/bushub; serialises/deserialisesMessageas JSON; fans out to remote peers and delivers incoming messages to local subscribers via a background drain task. -
InMemoryBusBackend— explicit in-process backend (replaces the implicit defaultdict behaviour).MessageBus()uses it by default; passMessageBus(backend=WebSocketBusBackend(url))for cross-process. -
BusBackendprotocol (runtime_checkable) —publish(),incoming(),connect(),disconnect(). Both built-in backends satisfy it. -
GET /ws/bus— WebSocket hub in the aiohttp server. Receives any JSON message from a connected client and fans it out to every other live connection, enabling agents in separate processes to communicate through one shared hub. -
team_from_openai_agents(agents, name, policy, pattern)— wraps a list of OpenAI Agents SDK agents as a governed MeshFlowTeam(any pattern). Mirrorsteam_from_autogen/team_from_crewaito give full adapter parity. -
mesh_tool_to_openai_function(tool)— converts a MeshFlowToolto an OpenAI function-calling schema dict compatible with Chat Completions and Assistants APItoolsparameters. Handlesstr/int/float/bool/list/dictannotations and marks only required params asrequired.
Fixed¶
-
team_from_autogenreturnedGroupChatManagerinstead ofTeam; corrected to returnTeamand updated staletest_integration_fixes.pyassertions. -
mesh_tool_to_openai_functionnow usestyping.get_type_hints()(not justparam.annotation) to resolve annotations correctly under Python 3.14 deferred evaluation (PEP 649).
[0.15.1] — 2026-05-23¶
Sprint 15 — Eval CI regression, AgentPool, Plugin system¶
776 tests passing (14 skipped = live API tests).
Added¶
-
EvalBaseline(meshflow/eval/baseline.py) — golden-set snapshot of anEvalResult.from_result(result),save(path),load(path),to_dict(). Serialised as plain JSON for version control and artefact storage. -
BaselineDiff— regression diff between two baselines. Tracks regressions (PASS→FAIL), improvements (FAIL→PASS), per-scenario score deltas, new/removed scenarios, and pass-rate delta.diff.has_regressionsis the CI gate.diff.report()produces a human-readable table. -
EvalBaseline+BaselineDiffexported frommeshflowtop-level. -
meshflow eval --save-baseline <path>— save the current run result as a golden baseline JSON after evaluation. -
meshflow eval --compare-baseline <path> --fail-on-regression— compare the current run against a saved baseline and exit 1 on any regression. Enables golden-set regression testing in CI without an LLM. -
meshflow eval-diff <baseline_a> <baseline_b>— standalone diff command comparing two baseline JSON files. Supports--fail-on-regression. -
meshflow eval --save-to-ledger— persist anEvalResultin the ledger as a checkpoint entry (eval:<suite>:<timestamp>). -
ReplayLedger.save_eval_result(result)— stores a result in the ledger; returns the storage key. -
ReplayLedger.list_eval_results(suite_name=None)— retrieves stored eval results, optionally filtered by suite name. -
AgentPool(meshflow/agents/pool.py) — a governed, bounded pool of agents driven by anasyncio.Queue.submit(task)dispatches to the next free agent;map(tasks)fans out and collects results in order. Accumulatestotal_cost_usdandtotal_tokensacross the pool. Context-manager (async with pool) handles start/stop. Raises on empty agents or zero concurrency. -
PoolStats— snapshot of pool counters: active workers, queue depth, submitted/completed/failed, cost, tokens, uptime.to_dict()for JSON. -
register_pool/deregister_pool— global pool registry so the server can expose stats without tight coupling. -
GET /pool/status— returns{"pools": [PoolStats.to_dict(), ...]}for all registered pools. -
Dashboard "Pool" page — live view of registered pools: metrics cards per pool (active workers, queued, completed, failed, cost, tokens, concurrency, uptime).
-
AgentPool,PoolStats,register_pool,deregister_poolexported frommeshflowtop-level. -
meshflow/plugins.py— entry-point–based plugin registry.discover_plugins(group=None)discovers installed packages that declaremeshflow.agents,meshflow.tools,meshflow.compliance, ormeshflow.ledgerentry points.load_plugin(name, group)loads the object.verify_plugin(name, group)returns(ok, message)without raising.list_plugins_table()for tabular display. -
PluginInfodataclass — name, group, ep_group, module, dist_name, version, description, loaded, error.to_dict(). -
meshflow plugins list [--group]— lists all installed plugins in a table. -
meshflow plugins verify <name> [--group]— loads and validates a plugin; exits 1 on failure. -
meshflow plugins info <name>— shows full metadata + load-check result. -
PluginInfo,discover_plugins,load_plugin,verify_pluginexported frommeshflowtop-level.
Fixed¶
-
Bumped
__version__to0.15.0; updated three stale== "0.14.0"version assertions in test files. -
Added
[[tool.mypy.overrides]] module = ["dashboard.*"]to suppressdisallow_untyped_decoratorserrors from@st.cache_data(Streamlit has no type stubs — pre-existing in all dashboard functions).
[0.14.0] — 2026-05-22¶
Release readiness — Sprint 13¶
644 tests passing (14 skipped = live API tests). All gap-plan items resolved.
Added¶
-
meshflow benchCLI command — runs the full performance benchmark suite without an API key. Concurrency sweep (10/100/1000), provider microbench, ledger write throughput, and hash-chain validation speed.--quickflag for CI smoke-check.--output results.jsonfor machine-readable results. -
Real multi-tenant isolation (
ReplayLedger) —write()now injects the ledger'stenant_idinto the SQLite row for non-default tenants.list_runs()filters bytenant_idso each tenant sees only their own runs. Previouslytenant_id='default'was written for every row regardless ofReplayLedger(tenant_id=...).
Fixed¶
ReplayLedger.write()tenant isolation — non-default tenant ledger instances now correctly scope their rows sodelete_tenant()andlist_runs()work as documented.
[0.13.0] — 2026-05-22¶
Sprint 12 — Comprehensive test coverage¶
88 new tests across 5 subsystems — previously implemented but untested.
Added¶
-
Built-in tool library tests (
tests/test_builtin_tools.py, 28 tests) — calculator, datetime_now, json_query, shell blocklist, web_search, web_fetch, python_repl, http_request, read_file/write_file, global_registry coverage. -
Provider extension tests (
tests/test_providers.py, 16 tests) — GeminiProvider, BedrockProvider, AzureOpenAIProvider complete() paths and provider_for() factory. -
HITL notification tests (
tests/test_hitl_notifications.py, 11 tests) — HITLNotifier webhook dispatch, HMAC-SHA256 signatures, approve/reject URL injection, network error handling; HITLTimeoutWatcher auto-reject/approve/escalate paths. -
RAG pipeline tests (
tests/test_rag_pipeline.py, 14 tests) — NumpyCosineIndex add/search/top_k, TFIDFEmbeddings async embed/determinism/semantic quality, DocumentStore ingest+retrieve, fixed/sentence chunking, metadata, RAGNode MeshNode wrapping. -
GDPR + multi-tenancy tests (
tests/test_gdpr_multitenancy.py, 19 tests) — delete_run, anonymize_run, delete_tenant, tenant isolation (shared SQLite DB), schema migration ordering.
Fixed¶
NumpyCosineIndexandTFIDFEmbeddingstest constructors corrected (zero-arg, async embed API).- HITL sync test converted to
@pytest.mark.asyncio. - GDPR tests use
write(StepRecord(...))— the correct ledger API.
[0.11.0–0.12.0] — 2026-05-22¶
Sprints 11–12 — SwarmTRM embeddings + EventProjector¶
Added¶
-
Real SwarmTRM embeddings (
meshflow/swarm/embeddings.py) — three-tier fallback:SentenceTransformerEmbedder(all-MiniLM-L6-v2) →NumpyBowEmbedder(random projection, seeded) →CharNgramEmbedder(zero-dep, hash-based).get_embedder(dim)factory islru_cache'd.embed_text(text, dim)convenience function.SwarmTRM._input_embedding()and_role_vector()now use real embeddings; falls back to hash-seeded noise only on exception. -
EventProjector (
meshflow/core/projections.py) — four projections over theMeshEventstream: AuditTrailProjection— per-run ordered timeline +to_dict(run_id).NodeLatencyProjection— STEP_START/STEP_COMPLETE pairs;query(),slowest(n).PolicyViolationProjection— captures BLOCKED/PAUSED/HITL_REQUIRED;violation_count().WorkflowSummaryProjection— per-run rollup (WorkflowSummarydataclass).-
EventProjector— coordinates all four;report(run_id)→ full dict. -
GroupChat + GroupChatManager (
meshflow/agents/conversation.py) — AutoGen-style multi-agent conversations.round_robin,random,auto,customspeaker strategies. Keyword and callable termination conditions.GroupChatManager.stream()yieldsChatMessageobjects. 18 tests intests/test_agentic_platform.py. -
DurableWorkflowExecutor (
meshflow/core/durable.py) — SQLite + in-memory checkpoint/resume._wrap_node()skips completed nodes on replay. -
GovernedToolRegistry (
meshflow/agents/tool_registry.py) —ToolPermissiontiers (READ_ONLY → DATABASE_WRITE → CODE_EXEC → EXTERNAL_API), async/sync dispatch, fullAuditEntrylog.
[0.10.0] — 2026-05-22¶
Added — MeshFlow as an MCP Server¶
MCPServer(meshflow/mcp/server.py) — MeshFlow now speaks MCP as a server, not just a client. Claude Desktop, Cursor, VS Code Copilot, and any MCP-capable host can connect and invoke governed workflows as tools.- Full JSON-RPC 2.0 dispatch:
initialize,tools/list,tools/call,resources/list,prompts/list,ping. - Built-in tools:
meshflow_run,meshflow_approve_hitl,meshflow_reject_hitl,meshflow_get_trace,meshflow_list_runs. register_agent(agent)— anyAgentbecomes an MCP tool automatically.register_team(team)— anyTeambecomes an MCP tool automatically.register_workflow(wf)— anyWorkflowDefinitionbecomes an MCP tool.- Every tool call returns a governance receipt: run_id, cost, tokens, HITL status.
-
mcp_from_config("meshflow.yaml")— builds a fully configured MCP server from YAML. -
HTTP+SSE transport (
/mcpendpoint on the aiohttp server): GET /mcp— discovery endpoint (server info, capabilities, full tool list).POST /mcp— JSON-RPC 2.0 endpoint (Claude Desktop remote connection).GET /mcp/sse— SSE stream for server→client notifications.- Full auth (
Authorization: Bearer/X-API-Key) and CORS support. -
204 No Contentfor MCP notifications (noidfield). -
stdio transport (
meshflow mcp-stdioCLI command): meshflow mcp-stdio— starts a governed MCP stdio server for Claude Desktop local mode.meshflow mcp-stdio --config meshflow.yaml— loads agents/teams from YAML.-
meshflow mcp-stdio --print-config— prints the exactclaude_desktop_config.jsonsnippet to add to Claude Desktop, including the correct executable path. -
meshflow.MCPServer,MCPToolEntry,mcp_from_configexported from the top-levelmeshflowpackage.
Claude Desktop integration¶
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"meshflow": {
"command": "meshflow",
"args": ["mcp-stdio", "--policy", "standard"],
"env": {"ANTHROPIC_API_KEY": "sk-ant-..."}
}
}
}
Or generate this snippet automatically: meshflow mcp-stdio --print-config
Updated in 0.10.0¶
- Server version string updated to
0.10.0in/healthand MCPserverInfo. MCPServer,MCPToolEntry,mcp_from_configadded tomeshflow.__all__.- Version bumped
0.9.0 → 0.10.0.
[0.9.0] — 2026-05-22¶
Added — Golden Standard Sprint¶
-
Typed state channels —
StateGraphwith reducer-awareChanneldescriptors.Annotated[list[str], add]accumulates across parallel branches;last,first,max_reducer,min_reducerbuilt-in reducers.compile()returns aCompiledGraphwith streaming (stream()async generator) and parallel fan-out execution. Parity with LangGraphStateGraph+ MeshFlow governance layer on top. -
Pre-built agent library — 21 drop-in specialist agents in
meshflow.agents:ResearchAgent,CoderAgent,ReviewerAgent,AnalystAgent,WriterAgent,CriticAgent,PlannerAgent,SummarizerAgent,ExtractorAgent,ClassifierAgent,ValidatorAgent,TranslatorAgent,SQLAgent,APIAgent,AuditorAgent,ReporterAgent,DebugAgent,TeacherAgent,NegotiatorAgent,OrchestratorAgent,GuardianAgent. All acceptpolicy=,model=,tools=. -
GroupChat — AutoGen-style multi-agent conversational orchestration.
GroupChat(agents, max_turns, speaker_selection)withround_robin,random,auto(LLM-driven),custom(callback) speaker strategies.GroupChatManager.stream()yieldsChatMessageobjects in real time.ConversationResult.transcript()returns full formatted dialogue. Callable or keyword-string termination conditions. -
Declarative YAML config —
meshflow.load("meshflow.yaml")builds a complete governed multi-agent system from a single file. Supportsagents,team,workflow(graph), andgroupchatsections. Environment variable expansion with${VAR}.meshflow.loads(yaml_string)for in-process use. -
Agent evaluation framework —
EvalSuite,EvalScenario,run_eval(). Scenarios supportexpected_contains,expected_not_contains,min_confidence,max_tokens,eval_fn(built-in:valid_json,check_runnable_python,non_empty,no_hallucination_markers; or inline Python expression).--fail-underthreshold for CI gating.meshflow eval evals.yaml --tags smoke --fail-under 0.9. -
LangChain tool bridge —
meshflow.integrations.langchain:lc_tool(lc_tool_obj)wraps any LangChainBaseToolas a MeshFlowTool.lc_tools([...])wraps a list.mesh_tool_to_lc(tool)converts the other way.agent_from_lc(lc_agent)wraps anAgentExecutoror LCEL chain as aMeshFlow Agent. -
meshflow evalCLI command —meshflow eval evals.yaml [--agent path.py] [--tags smoke] [--concurrency 4] [--fail-under 0.9]. Auto-loads aResearchAgentif--agentis omitted.
Updated¶
meshflow/__init__.py— exportsStateGraph,END,START,add,last,first,Channel,GroupChat,GroupChatManager,ConversationResult,MeshFlowConfig,load,loads,EvalSuite,EvalScenario,EvalResult,ScenarioResult,run_eval, and theagentsnamespace module.- Version — bumped
0.8.0 → 0.9.0. - Description — updated to "the golden standard of multi-agent orchestration."
- Deprecation fix — replaced
asyncio.iscoroutinefunction()withinspect.iscoroutinefunction()throughout (deprecated in Python 3.16).
Test Coverage¶
- 36 new tests in
tests/test_golden_standard.pycovering all six new feature areas. - Full suite: 265/265 passing.
[0.8.0] — 2026-05-22¶
Added — Critical gaps closed¶
- Token-level streaming —
AnthropicProviderandOpenAICompatibleProvidernow implementstream_complete()yieldingTokenChunkobjects. The HTTP server streams NDJSON overaiohttp.StreamResponse. - API key authentication —
Authorization: BearerandX-API-Keyheader support. Keys loaded fromMESHFLOW_API_KEYSenv var (comma-separated). Server rewritten fromBaseHTTPRequestHandlerto fully asyncaiohttp. - Graph cycles / loop edges —
WorkflowDefinition.add_loop_edge(src, dst, condition, max_iterations).MaxIterationsErrorraised as safety cap. Powers the new"reflective"team pattern. - Output compression + schema migrations — ledger entries >10 KB are gzip+base64
compressed transparently.
_MIGRATIONSregistry applied on startup for both SQLite and PostgreSQL.
Added — High priority¶
- Vector memory —
TFIDFEmbeddings(zero-dep, in-process TF-IDF) andNumpyCosineIndex(cosine similarity).MEM1Storegains semanticretrieve_relevant(). Vocabulary frozen after ingestion to guarantee consistent vector dimensions. - HITL webhooks + timeout —
HITLNotifierPOSTs HMAC-SHA256 signed payloads.HITLTimeoutWatcherauto-approves/rejects/escalates after configurable timeout. - Rich tool schemas —
_ann_to_json_schema()handlesAnnotated,Optional,Literal,list[X], PydanticBaseModel. Parallel tool dispatch viaasyncio.gather. - Schema migrations — versioned migration registry; SQLite wraps in
try/except, Postgres usesADD COLUMN IF NOT EXISTS.
Added — Medium priority¶
- RAG pipeline —
DocumentStore(chunk → embed → index),RAGNode(MeshNode),RAGPipeline(synchronous façade for scripts/tests),Evidence+RAGResulttypes. - Multi-tenancy —
ReplayLedger(tenant_id=...)scopes all queries.delete_run(),delete_tenant(),anonymize_run()for GDPR right-to-erasure. - Trace viewer + Prometheus metrics —
meshflow trace <run-id>rich terminal table with chain validation.MetricsCollectorsingleton;/metricsendpoint emits Prometheus text format. - Additional providers —
GeminiProvider,BedrockProvider,AzureOpenAIProvider.provider_for(name, **kwargs)factory. - Pre-built tool library — 10 tools:
web_search,web_fetch,python_repl,read_file,write_file,shell(with blocklist),json_query,http_request,datetime_now,calculator(AST-based safe eval). - Deployment —
Dockerfile(multi-stage,python:3.11-slim),docker-compose.yml(SQLite + PostgreSQL profiles),k8s/deployment.yaml(Deployment + Service + PVC + HPA).
Added — Low priority / DX¶
- TypeScript SDK —
@meshflow/client:MeshFlowClientwithrun(),stream()(async generator),getTrace(),listRuns(), HITL approve/reject.package.json+tsconfig.jsonwithtsupdual CJS/ESM build. - Python client SDK —
meshflow.client.MeshFlowClient(async) +_SyncClientwrapper. Exported frommeshflowtop-level. - SOC 2 / HIPAA / GDPR compliance docs —
docs/compliance/:SOC2_CONTROLS_MAPPING.md(CC1–CC9 + A1/C1/P),HIPAA_GUIDE.md,GDPR_GUIDE.md,SECURITY.md. - PHI scrubber —
PHIScrubbercovers all 18 HIPAA Safe Harbor categories. Activated viaPolicy.scrub_phi=Trueormode="hipaa". - CLI improvements —
meshflow trace,meshflow runs,meshflow dev,meshflow servewith--api-key,--ledger,--tls-cert,--tls-key. - Streamlit dashboard —
dashboard/app.py: Overview, Runs (trace inspector + hash viewer), HITL Queue, Metrics, Submit Task.make dashboardto launch. - Benchmarks —
benchmarks/bench_core.py: concurrency sweep (10/100/1000), provider microbench (155k calls/s), ledger writes (69k/s), chain validation (116 steps/ms). - Live integration tests —
tests/test_live.py(14 tests, gated behindANTHROPIC_API_KEY).make test-live. - conftest.py —
in_memory_ledger,shared_ledger,dev_policy,regulated_policy,make_step_recordfixtures. Session-scopedlive_server_url+live_client. - Policy-mode examples —
examples/hipaa_phi_pipeline.py,examples/regulated_financial_review.py,examples/legal_critical_nda_review.py.
Changed¶
pyproject.toml— mandatory deps trimmed to 6 (anthropic,aiohttp,httpx,aiosqlite,pyyaml,rich). Heavy deps moved to named extras:meshflow[openai],meshflow[gemini],meshflow[bedrock],meshflow[rag],meshflow[postgres],meshflow[s3],meshflow[dashboard],meshflow[otel],meshflow[full].- Ledger —
StepRecordgainstimestamp(required),prev_hash(default""),metadata(default{}). Output stored compressed when >10 KB. - Server — replaced
BaseHTTPRequestHandler+HTTPServerwithaiohttpapp. Added/metrics,/hitl/pending,/hitl/{id}/approve,/hitl/{id}/rejectroutes. pytestmarkers —liveandslowmarkers registered inpyproject.tomlandconftest.py. No more marker warnings.- Version — bumped
0.7.0 → 0.8.0.
Fixed¶
- TF-IDF embedding vocabulary frozen after corpus ingestion — prevents dimension mismatch between stored document vectors and query vectors.
RAGPipelinenow batch-ingests all documents on firstretrieve()call (lazy build) rather than per-add_document(), ensuring consistent vocabulary.PostgresLedgerBackendschema-migrations query no longer fails when the fake test connection returns step rows for unrecognised SQL.
[0.7.0] — 2026-05-01¶
- Universal
MeshNode+StepRuntimekernel WorkflowDefinitionwith fan-out/fan-in parallel execution- Conditional edge routing with transitive skip propagation
- Durable human approval checkpoints
- Pluggable ledger backends (SQLite, PostgreSQL, S3 archive)
- 33 integration tests
[0.1.0–0.6.x] — 2025¶
Initial development: cross-framework execution, governance layers, DID identity, SHA-256 audit chain, DascGate policy engine, HITL, collusion detection, uncertainty scoring, environmental optimizer, cross-run learner, MCP gateway.