Metadata-Version: 2.4
Name: envbert-mcp
Version: 3.0.0
Summary: MCP server exposing raw envbert EDD classification to Claude, no LLM fallback, no Ollama/Azure dependency
License: MIT
Project-URL: Homepage, https://github.com/YOUR_ORG/envbert-mcp
Keywords: envbert,mcp,edd,environmental,due-diligence,claude,bert
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: mcp>=1.0.0
Requires-Dist: envbert>=0.1.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"

# envbert-mcp

MCP server exposing **raw envbert** (DistilBERT classification for
environmental due diligence, EDD) to Claude: standalone, no LLM fallback,
no Ollama/Azure dependency.

---

## Why raw envbert, not envbert-agent

envbert-agent adds an LLM fallback step for low-confidence classifications.
That is useful when there's no other reasoning layer downstream (CLI usage,
batch pipelines writing straight to a CSV). But Claude *is* a reasoning
layer, and it is already in the loop. Routing through a second, hidden LLM
call inside the tool adds ~10s and, if configured for Azure, real cost. It
is redundant when Claude can look at a low-confidence label directly and
decide what to do with it.

This server calls `envbert.due_diligence.envbert_predict()` directly. No
network hop, no Ollama, no Azure, no envbert-api. Responses are sub-second
once the model has loaded.

If you need the LLM-fallback behaviour (e.g. for a pipeline with no LLM
downstream), use [`envbert-agent`](https://pypi.org/project/envbert-agent/) or
[`envbert-api`](../envbert-api) instead. See the cross-comparison below.

---

## Architecture

```
Claude / Claude Code
        │ MCP (stdio)
        ▼
envbert_mcp/server.py     ←── this package, single process
        │ in-process call
        ▼
envbert.due_diligence.envbert_predict()
        │
        ▼
DistilBERT (d4data/environmental-due-diligence-model)
```

Everything runs in one process. The model loads once at startup
(warmup, ~10-40s on a cold HuggingFace cache) and stays in memory for
the life of the server.
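
The load-once, stay-in-memory pattern described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: `ModelHolder` and `fake_loader` are hypothetical names, and the loader is a stand-in for the real DistilBERT load.

```python
import threading
import time


class ModelHolder:
    """Holds the model and tracks whether warmup has finished."""

    def __init__(self, loader):
        self._loader = loader            # callable that returns the loaded model
        self._model = None
        self._ready = threading.Event()

    def start_warmup(self):
        # Load in a daemon thread so status calls can be answered meanwhile.
        thread = threading.Thread(target=self._load, daemon=True)
        thread.start()
        return thread

    def _load(self):
        self._model = self._loader()
        self._ready.set()

    def status(self):
        # Mirrors the "loading" flag mentioned for check_envbert_status.
        return {"loading": not self._ready.is_set()}

    def get(self, timeout=None):
        # Block until warmup is done; the same object is reused afterwards.
        if not self._ready.wait(timeout):
            raise TimeoutError("model is still loading")
        return self._model


def fake_loader():
    # Hypothetical stand-in for the real (slow) model load.
    time.sleep(0.05)
    return object()


holder = ModelHolder(fake_loader)
holder.start_warmup()
model = holder.get(timeout=5)   # waits for warmup, then returns the model
print(holder.status())
```

Keeping the readiness flag in a `threading.Event` means a status check never blocks on the load itself, which is one way to report progress while warmup is still running.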

---

## Tools

| Tool | Purpose |
|---|---|
| `check_envbert_status` | Confirm the model is loaded before relying on fast responses — the very first call in a session may be slow while warmup is still in progress |
| `classify_environmental_text` | Classify one sentence/paragraph — label + confidence, no LLM step |
| `classify_environmental_document` | Classify all paragraphs of a document concurrently, with category distribution and low-confidence flagging |
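
One way to classify paragraphs concurrently is to push each blocking prediction onto a worker thread and gather the results in order. The sketch below assumes this approach; `predict_stub`, its `(label, confidence)` return shape, and the 0-based `index` are illustrative assumptions rather than the server's confirmed behaviour.

```python
import asyncio


def predict_stub(text):
    # Hypothetical stand-in for the blocking envbert call.
    # Assumed to return a (label, confidence) pair.
    if "shale" in text:
        return ("Geology", 0.93)
    return ("Contaminants", 0.42)


async def classify_paragraphs(paragraphs, predict=predict_stub):
    # Run each blocking prediction in a worker thread.
    # asyncio.gather returns results in the same order as the inputs.
    tasks = [asyncio.to_thread(predict, p) for p in paragraphs]
    outputs = await asyncio.gather(*tasks)
    return [
        {"index": i, "label": label, "confidence": confidence}
        for i, (label, confidence) in enumerate(outputs)
    ]


paragraphs = [
    "weathered shale was encountered below the surface",
    "elevated benzene was detected in the groundwater sample",
]
results = asyncio.run(classify_paragraphs(paragraphs))
```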

### Why low-confidence flagging matters here

Without an LLM fallback, a low envbert confidence score (e.g. 0.42) is the
final answer — there's no second opinion baked in. `classify_environmental_document`
surfaces a `low_confidence_items` list explicitly so Claude can apply its
own judgement to exactly those paragraphs, rather than treating every
result as equally reliable.

```json
{
  "category_distribution": {"Geology": 4, "Contaminants": 2},
  "low_confidence_count": 1,
  "low_confidence_items": [
    {"index": 3, "label": "Remediation Standards", "confidence": 0.42}
  ],
  "results": [ ... ]
}
```
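
A summary of this shape could be built from per-paragraph results along these lines. The function name `summarise` and the `0.5` cutoff are assumptions for illustration; the actual threshold used by the server is not stated here.

```python
from collections import Counter

# Assumed cutoff for illustration only; the server's real threshold may differ.
LOW_CONFIDENCE_THRESHOLD = 0.5


def summarise(results, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Aggregate per-paragraph results into a document-level summary."""
    distribution = Counter(r["label"] for r in results)
    # Keep only the items whose score falls below the cutoff.
    low = [r for r in results if r["confidence"] < threshold]
    return {
        "category_distribution": dict(distribution),
        "low_confidence_count": len(low),
        "low_confidence_items": low,
        "results": results,
    }


sample = [
    {"index": 0, "label": "Geology", "confidence": 0.97},
    {"index": 1, "label": "Contaminants", "confidence": 0.88},
    {"index": 2, "label": "Geology", "confidence": 0.91},
    {"index": 3, "label": "Remediation Standards", "confidence": 0.42},
]
summary = summarise(sample)
```

Here only the paragraph at index 3 lands in `low_confidence_items`, so that is the one Claude would be prompted to look at more closely.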

---

## Quickstart

```bash
pip install envbert envbert-mcp
```

Add to `~/.claude/mcp_config.json`:

```json
{
  "mcpServers": {
    "envbert": {
      "command": "envbert-mcp"
    }
  }
}
```

Restart Claude Code. The model warms up in the background on first
launch — `check_envbert_status` will report `"loading": true` until ready.

**Example prompts:**
- *"Is envbert ready?"*
- *"Classify this: 'weathered shale was encountered below the surface with fluvial deposits'"*
- *"Here's a 12-paragraph site report — classify each section and flag anything you're not confident about."*

---

## envbert-mcp vs envbert-agent / envbert-api — which to use

| | This package (raw envbert) | envbert-agent / envbert-api |
|---|---|---|
| Used by | Claude / MCP clients | CLI, pipelines, non-LLM consumers |
| LLM fallback | None | Yes — Ollama (local) or Azure |
| Typical latency | <1s once the model is warm | <1s confident, ~10s on fallback |
| External dependencies | None beyond envbert | Ollama or Azure OpenAI |
| Confidence on ambiguous text | Raw model score only | LLM-resolved final label |
| Why this shape | Claude can reason over raw scores itself — a second hidden LLM call is redundant | No reasoning layer downstream; the agent must resolve ambiguity itself |

Both are legitimate — they're solving for different consumers of the
classification, not competing implementations of the same thing.

---

## Configuration

| Variable | Default | Description |
|---|---|---|
| `LOG_LEVEL` | `INFO` | Logging verbosity |

No other configuration needed — there's no backend URL, no LLM provider,
no API keys. That's the point.
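
A minimal sketch of how a `LOG_LEVEL` variable like this is commonly read, assuming standard `logging` level names; `resolve_log_level` is a hypothetical helper, not a function this package is known to export.

```python
import logging
import os


def resolve_log_level(env=os.environ):
    """Map LOG_LEVEL to a logging constant, falling back to INFO."""
    name = env.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, name, None)
    # Unknown names (or non-level attributes) fall back to INFO.
    return level if isinstance(level, int) else logging.INFO


logging.basicConfig(level=resolve_log_level())
```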

---

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

Tests mock `envbert_predict()` directly — no real model download needed
to run the suite.

## License

MIT
