Metadata-Version: 2.4
Name: envbert-agent
Version: 0.2.0
Summary: Agentic environmental due-diligence text classification using EnvBert and LangGraph
Author: Afreen Aman, Deepak John Reji
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: EnvBert>=1.0.9
Requires-Dist: langgraph
Requires-Dist: langdetect
Requires-Dist: transformers<4.40,>=4.30
Requires-Dist: torch
Provides-Extra: ollama
Requires-Dist: langchain-community; extra == "ollama"
Provides-Extra: openai
Requires-Dist: langchain-openai; extra == "openai"
Provides-Extra: azure
Requires-Dist: langchain-openai; extra == "azure"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# EnvBert-Agent

EnvBert-Agent is an agentic AI pipeline for environmental due diligence text classification.

It combines:
- EnvBert (domain-specific transformer backbone)
- Optional LLM fallback
- Agentic routing & confidence arbitration
- Quality control & explainability
- Modular LangGraph orchestration

## Features

- Environmental domain classification using EnvBert
- Confidence-based fallback to LLM
- Agentic workflow orchestration via LangGraph
- Command-line interface for quick usage
- Python API for integration
- Designed for due diligence, remediation, and compliance workflows

---

## Requirements

- Python 3.9+
- Transformers >=4.30,<4.40 (version constrained for TensorFlow compatibility)
- TensorFlow (required by the EnvBert backbone)
- Ollama, for the local LLM fallback (alternatively, Azure OpenAI; see below)
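
The Transformers pin is easy to break when other installed packages upgrade it. A minimal pre-flight sketch (not part of envbert-agent; the helper names are hypothetical) reads the installed version with `importlib.metadata` and compares it against the `>=4.30,<4.40` range from the package metadata. It compares only the major and minor components, a simplification of full PEP 440 specifier matching (pre-releases are not handled).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Bounds copied from the package metadata: transformers>=4.30,<4.40
LOWER, UPPER = (4, 30), (4, 40)


def in_supported_range(ver: str) -> bool:
    """Compare only major.minor against the pinned range (simplified, not full PEP 440)."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        return False
    major_minor = (int(match.group(1)), int(match.group(2)))
    return LOWER <= major_minor < UPPER


def installed_transformers() -> Optional[str]:
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers()
    if found is None:
        print("transformers is not installed")
    else:
        status = "OK" if in_supported_range(found) else "outside >=4.30,<4.40"
        print(f"transformers {found}: {status}")
```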


## Ollama Setup

1. Download and install from:
https://ollama.com/download

2. Pull a model

```bash
ollama pull llama3
```

3. Common error:

```text
ConnectionRefusedError: localhost:11434
```

This means the Ollama server is not running.

4. Fix: start the server.

```bash
ollama serve
```

By default, it listens at `http://localhost:11434`. Keep it running in a separate terminal.
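
To check from Python whether the server is reachable before classifying, a minimal standard-library sketch is shown below. The helper names are hypothetical and not part of envbert-agent; `pulled_models` assumes Ollama's `/api/tags` endpoint, which lists the models pulled locally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address noted above


def ollama_is_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if a server answers at base_url with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, ConnectionRefusedError and timeouts are all OSError subclasses.
        return False


def pulled_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> list:
    """Return the model names reported by Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    if not ollama_is_running():
        print("Ollama is not reachable; start it with `ollama serve`.")
    else:
        print("Pulled models:", pulled_models())
```

Model names are reported with their tag, for example `llama3:latest`.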


## Using Azure OpenAI

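Set the following environment variables before running with the Azure provider (`--provider azure` on the CLI, or `LLMConfig(provider="azure")` in Python):

Windows (Command Prompt):
```bat
set AZURE_OPENAI_API_KEY=...
set AZURE_OPENAI_ENDPOINT=...
set AZURE_OPENAI_DEPLOYMENT_NAME=...
```

macOS/Linux:
```bash
export AZURE_OPENAI_API_KEY=...
export AZURE_OPENAI_ENDPOINT=...
export AZURE_OPENAI_DEPLOYMENT_NAME=...
```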
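
A pre-flight check can catch a missing variable before the first classification request. The sketch below uses a hypothetical helper, not part of envbert-agent; it only verifies that the three variables named above are set and non-empty, not that their values are valid.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the Azure OpenAI setup above.
AZURE_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
)


def missing_azure_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required Azure OpenAI variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in AZURE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_vars()
    if missing:
        print(f"Missing Azure OpenAI settings: {', '.join(missing)}")
    else:
        print("Azure OpenAI environment looks complete.")
```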


## Installation

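Install from PyPI. Optional extras for the LLM providers (`ollama`, `openai`, `azure`) are declared in the package metadata:

```bash
pip install envbert-agent                 # or, e.g., pip install "envbert-agent[ollama]"
ollama serve                              # in a separate terminal; keep it running
envbert-agent "BEHP was detected in groundwater."
```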

## Python Usage

```python
from envbert_agent import run
from envbert_agent.config import LLMConfig

text = "BEHP was detected in groundwater"

# Classify with the default LLM provider for the fallback
result = run(text)
print(result)

# Use Azure OpenAI for the fallback (requires the AZURE_OPENAI_* variables above)
config = LLMConfig(provider="azure")
result = run(text, llm_config=config)
print(result)
```

## CLI Usage

After installation:

```bash
# Default LLM provider
envbert-agent "BEHP was detected in groundwater"

# Azure OpenAI as the LLM provider
envbert-agent "BEHP was detected in groundwater" --provider azure
```
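
Example output for two inputs (the first falls below the confidence threshold and is routed to the LLM fallback; the second is accepted directly from EnvBert):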

```text
[INPUT]
raw_text: BEHP was the only SVOC detected in groundwater at concentrations exceeding the MCL.
clean_text: BEHP was the only SVOC detected in groundwater at concentrations exceeding the MCL.
language: en
quality_score: 1.0

[CLASSIFICATION]
envbert_label: Remediation Standards
envbert_confidence: 0.437928160341927
route: llm
llm_label: Contaminants
llm_confidence: 0.9
final_label: Contaminants
final_confidence: 0.9

[META]
llm_reasoning: The text mentions a specific contaminant (BEHP) and its concentration exceeding the Maximum Contaminated Level (MCL), indicating the presence of contaminants in groundwater.
decision_trace: LLM fallback
key_phrases: ['behp', 'svoc', 'detected', 'groundwater', 'concentrations', 'exceeding']

[MONITORING]
drift_flag: False
```

```text
[INPUT]
raw_text: soil and groundwater are both contaminated on the site
clean_text: soil and groundwater are both contaminated on the site
language: en
quality_score: 1.0

[CLASSIFICATION]
envbert_label: Contaminated media
envbert_confidence: 0.7678613004581079
route: accept
llm_label: N/A
llm_confidence: N/A
final_label: Contaminated media
final_confidence: 0.7678613004581079

[META]
llm_reasoning: N/A
decision_trace: EnvBert backbone
key_phrases: ['soil', 'groundwater', 'both', 'contaminated', 'site']

[MONITORING]
drift_flag: False
```
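
When scripting around the CLI, the sectioned `key: value` layout above can be parsed back into a dictionary. This is a minimal sketch, assuming the output keeps exactly this format (a `[SECTION]` header followed by `key: value` lines); `parse_cli_output` is a hypothetical helper, and every value is kept as a string.

```python
from typing import Dict, Optional


def parse_cli_output(text: str) -> Dict[str, Dict[str, str]]:
    """Group `key: value` lines under their [SECTION] headers; values stay strings."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may themselves contain colons.
            key, value = line.split(":", 1)
            sections[current][key.strip()] = value.strip()
    return sections


sample = """[CLASSIFICATION]
route: accept
final_label: Contaminated media
final_confidence: 0.7678613004581079
"""
parsed = parse_cli_output(sample)
print(parsed["CLASSIFICATION"]["final_label"])  # Contaminated media
```

To obtain the text in the first place, `subprocess.run(["envbert-agent", text], capture_output=True, text=True).stdout` should work, assuming the CLI prints its report to standard output.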

## Direct CLI Invocation from Python

```python
from envbert_agent.cli import main

# Equivalent to: envbert-agent "BEHP was detected in groundwater"
main(["BEHP was detected in groundwater"])
```
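
If you need the printed report as a string (for logging or parsing), the standard library's `contextlib.redirect_stdout` can capture it. `capture_stdout` is a hypothetical helper, and this sketch assumes `main` writes its report to `sys.stdout`; anything written to stderr, such as library warnings, is not captured.

```python
import contextlib
import io
from typing import Any, Callable


def capture_stdout(func: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Call func with the given arguments and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()
```

Usage would be `report = capture_stdout(main, ["BEHP was detected in groundwater"])`. If `main` exits via `SystemExit` (for example, on invalid arguments), that exception propagates out of the helper unchanged.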

## Architecture: Graph Edges & Flow

```text
START
  ↓
[preprocess] ────────────────┐
  ↓                           │
[envbert] ────────────────────┤
  ↓                           │
[arbitrate]                   │  Conditional Router:
  ├─ (quality < 0.4)          │      route = "review"
  ├─ (confidence ≥ 0.75) ─────┼──────→ route = "accept"
  └─ (confidence < 0.75) ─────┘      route = "llm"
       ↓
    ┌──────────┬──────────┬──────────┐
    ↓          ↓          ↓
  (review)   (accept)   (llm)
    ↓          ↓          ↓
    │ ─────────→[llm]←─────
            ↓
        [evaluate]
            ↓
        [explain]
            ↓
        [monitor]
            ↓
           END
```
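
The conditional router after `[arbitrate]` can be read as a small decision function. The sketch below is a hypothetical reimplementation for illustration, not the package's source: `choose_route` and the constant names are made up, the 0.4 and 0.75 thresholds come from the diagram, and checking quality before confidence is an assumption based on the order the branches are listed.

```python
from typing import Literal

Route = Literal["review", "accept", "llm"]

QUALITY_FLOOR = 0.4          # below this, the text is routed to review
CONFIDENCE_THRESHOLD = 0.75  # at or above this, EnvBert's label is accepted


def choose_route(quality_score: float, envbert_confidence: float) -> Route:
    """Mirror the diagram's router; the quality check is assumed to run first."""
    if quality_score < QUALITY_FLOOR:
        return "review"
    if envbert_confidence >= CONFIDENCE_THRESHOLD:
        return "accept"
    return "llm"
```

With the scores from the example outputs (quality 1.0 in both), a confidence of 0.4379 routes to `llm` and 0.7679 routes to `accept`, matching the `route` fields shown there.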

## License

MIT License

See the LICENSE file for details.


## ⚠️ Notes

- This package depends on the EnvBert backbone.

- Transformers version is constrained for TensorFlow compatibility.

- Future versions may migrate to a PyTorch backend for improved compatibility and lighter installation footprint.
