Metadata-Version: 2.4
Name: helium-agent
Version: 0.1.1
Summary: A terminal-focused AI agent with RAG and local tool capabilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: rich
Requires-Dist: requests
Requires-Dist: numpy
Requires-Dist: prompt-toolkit
Requires-Dist: python-dotenv
Requires-Dist: pyyaml
Requires-Dist: duckduckgo-search
Requires-Dist: beautifulsoup4
Requires-Dist: fastapi
Requires-Dist: uvicorn[standard]
Requires-Dist: playwright
Provides-Extra: voice
Requires-Dist: speechrecognition>=3.10.0; extra == "voice"
Requires-Dist: kokoro>=0.9.0; extra == "voice"
Requires-Dist: sounddevice>=0.4.0; extra == "voice"
Requires-Dist: pyaudio>=0.2.11; extra == "voice"
Requires-Dist: openwakeword>=0.6.0; extra == "voice"
Requires-Dist: mlx-whisper>=0.1.0; extra == "voice"

<img src="assets/Helium-agent-logo.png" width="1080" height="480" alt="Helium Agent Logo"/>

# Helium Agent

> **Important:** Voice support is still under development so kindly use TEXT mode.

Helium is a local-first AI assistant with a voice pipeline, tool-calling agent loop, structured memory, RAG support and an optional web chat UI. It is designed for macOS and Apple Silicon, with local STT through MLX Whisper, wake-word detection through OpenWakeWord, TTS through Kokoro, and an LLM brain served from your own llama.cpp or Ollama-compatible local stack.

> If you directly want to try go to [Docker](#docker) Section

## What It Can Do

- **Answer everyday questions:** Just like any other agent it can repsond to any mundane queries you might have. It won't judge you.
- **Tool calling:** Helium can calls tools it has to perform complex operations in order to respond to your queries.
- **Research:** For queries that include an `in-depth` knowledge and information retrieval Helium will take help of its research tool to provide with most accurate repsonse with proper citations.
- **Web Search:** It can use `DuckDuckGoSearch` API to get web results and if necessary it will use `playwright` to dig deeper into complex websites all to make sure you get the best answer.
- **RAG:** Currently a simple RAG pipeline is integrated where only 1 file at a time can be given to Helium and it will respond accordingly. [Future plans to scale this]
- **Bash execution:** Helium can perform `safe` bash operations in its terminal.
- **Long-term memory:** It uses a `in-memory sqlite` database which is currently session-scoped to remember important facts.

## Project Structure

```text
Helium/
├── main.py                 # Voice assistant entry point
├── assistant.py            # Assistant-facing orchestration helpers
├── requirements.txt        # Python dependencies
├── requirements-rag.txt    # Heavy RAG dependencies
├── docker-compose.yml      # Terminal + RAG containers
├── Dockerfile.api          # FastAPI backend image
├── Dockerfile.frontend     # React frontend image
├── Dockerfile.terminal     # Terminal UI image
├── api/
│   └── main.py             # FastAPI WebSocket chat API
├── config/
│   ├── settings.py         # Typed defaults and settings loader
│   └── settings.toml       # Local service, wake, speech, and assistant settings
├── core/
│   ├── llm.py              # LLM response generation and tool loop
│   └── orchestrator.py     # Assistant orchestration layer
├── engine/
│   ├── stt.py              # Speech-to-text handling
│   ├── tts.py              # Text-to-speech handling
│   └── wake_word.py        # Wake-word detection
├── frontend/
│   ├── src/                # React chat interface
│   ├── nginx/              # Static app server config
│   └── package.json        # Vite scripts and frontend dependencies
├── memory/
│   └── graph.py            # Local memory graph support
├── rag_service/            # standalone document intelligence FastAPI service
├── tools/
│   ├── registry.py         # Tool definitions and prompt context
│   ├── file_ops.py         # File creation tool
│   ├── memory_ops.py       # Memory tools
│   ├── system_ops.py       # System tools
│   ├── web_search.py       # Web-search tool entry point
│   ├── search/             # Search providers, planning, ranking, fetching, extraction
│   └── research/           # Research planner, models, pipeline, execution
├── utils/
│   ├── audio.py            # macOS sound cues
│   ├── health.py           # Service health checks
│   ├── history.py          # Command/conversation history helpers
│   └── parser.py           # Robust JSON/tool-call parsing
├── tests/                  # Unit tests for parser, tools, search, memory, and wake word logic
└── .env.example            # Example env file
```

## Prerequisites

Helium is optimized for **macOS on Apple Silicon** because the voice pipeline uses `mlx-whisper` and macOS audio cues. Some server-only pieces can run in containers, but microphone capture and local audio playback are best run directly on macOS.

You will need:

- Python 3.11+
- A working microphone with terminal/app permission [Not needed currently]
- PortAudio dependencies for `pyaudio` and `sounddevice` [Not needed currently]
- A local LLM service, usually `llama.cpp`
- If you have an API endpoint to any LLM you can use that too.
- Optional local SearxNG for local-first web search [No longer needed]
- Node/Bun only if you are developing the frontend outside Docker

Default service URLs are configured in [`config/settings.py`](config/settings.py) and can be overridden in [`config/settings.toml`](config/settings.toml).

## Docker

> Use this if you just want to chat without worrying the technical complexities but make sure to have you `env` configured accordingly.
>
> It will take care of RAG pipeline automatically.

You can build and run the entire terminal application using:

```bash
docker compose up --build
```

You might need to wait for a bit. So, go have a coffee while it is building.

This will run the image:

```bash
docker compose run --rm --service-ports helium
```

The API container is configured to reach host services through `host.docker.internal`. Keep llama.cpp instance running on the host, then update [`docker-compose.yml`](docker-compose.yml) if your ports differ.

## Dev Installation

> ## IMPORTANT: Use this only if you want to run it manually otherwise go to [Docker](#docker) section.

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd helium-agent
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   ```

3. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   pip install -r requirements-rag.txt
   ```

4. Doctor command for RAG check:

   ```bash
   python -m rag_service doctor
   ```

If audio dependencies fail to build, install PortAudio first, then rerun the Python dependency install.

## Local Services

> Note: You can either use llama.cpp or any LLM provider API.

### Start llama.cpp

Run a compatible instruction-tuned GGUF model on port `3000`:

```bash
./llama-server -m /path/to/your/model.gguf -c 4096 --port 3000
```

Helium expects the default completion endpoint to be OPENAI compatible version:

```text
http://127.0.0.1:3000/v1/chat/completion
```

### Use LLM API

If you have an API to any LLM provider then you can use them directly by adding the `API Key` to a `.env` file in the directory.

```text
LLM_API_KEY=your-llm-url-llm-api-key
LLM_API_URL=your-llm-url
```

Look at `.env.example` for more detail.

### Start Playwright

Helium comes with playwright compatibility. So, if you want to get more in-depth results from web you can turn on this feature by updating `use_playwright=true` in `config/settings.toml`

Then install playwright and chromium.

```bash
pip install playwright
playright install chromium
```

> These are not added in `requirements.txt` because Helium aims to be lightweight. But you can do whatever you want!

> Note: Playwright is heavy as it downloads chromium so it can take some of your memory. Use with caution.

### Start RAG pipeline

Helium comes with its own RAG pipeline. This allows you to add files with `@` prefix to the file path to your file. Then you can ask anything about that file.

Currently it is good enough to answer what is inside it, summarize it, and other basic questions. Later I intend to deepen the understanding of the file using local embeddings.

This is an **optional** feature. Look into `rag_service` directory for more detail.

## Run The Assistant

> Note: Only TEXT mode is ready for use.

1. Confirm the LLM service is running.
2. Confirm your web services are running if you want better results.
3. Start Helium:

   ```bash
   python main.py --mode text
   ```

4. Wait for:

Animation to load and welcome message to be shown.

5. Type your query and enjoy Helium.

Example requests:

```text
What is the latest news on AI?
Remember that I prefer concise responses.
Create a file named hello.txt that says hi.
Open Safari.
Compare India and China GDP in 2025.
Why is the Indian Rupee falling recently?
Give me a report on the latest AI regulation changes in the EU.
```

RAG request example:

```text
@README.md what does this project do?
@docs/plan.pdf summarize the risks
```

## Run The Web UI

The web UI has two parts:

- FastAPI backend: WebSocket endpoint at `ws://localhost:8080/ws/chat`
- React/Vite frontend: browser chat interface under [`frontend/`](frontend/)

### Backend

```bash
uvicorn api.main:app --host 0.0.0.0 --port 8080
```

### Frontend

```bash
cd frontend
bun install
bun run dev
```

The frontend opens a WebSocket to port `8080`, so keep the API running while using the browser UI.

## Configuration

Most runtime behavior lives in [`config/settings.toml`](config/settings.toml):

- `services.llama_cpp_url`
- `wake_word.threshold`
- `wake_word.push_to_talk`
- `speech.whisper_model`
- `speech.timeout_seconds`
- `speech.follow_up_timeout_seconds`
- `assistant.tts_voice`
- `assistant.follow_up_mode`
- `assistant.confirm_risky_tools`
- `assistant.persona`

When a key is missing, Helium falls back to defaults in [`config/settings.py`](config/settings.py).

## Testing

Run the test suite from the repository root:

```bash
python -m unittest discover -s tests
```

For the frontend:

```bash
cd frontend
bun run lint
bun run build
```

## Troubleshooting

- **No wake detection:** Check microphone permissions, input device selection, and `wake_word.threshold`.
- **False wakes:** Increase `wake_word.threshold` or `wake_word.required_hits`.
- **No transcription:** Confirm `mlx-whisper`, microphone access, and PortAudio dependencies are working.
- **No LLM response:** Confirm the llama.cpp completion endpoint matches `services.llama_cpp_url`.
- **Search is weak or failing:** Start SearxNG or verify `services.searxng_url`; DDGS fallback may be less consistent.
- **Web UI cannot connect:** Make sure the FastAPI backend is running on port `8080`.
- **Tool call JSON errors:** Check terminal logs. [`utils/parser.py`](utils/parser.py) includes recovery logic, but malformed model output can still skip a tool step.
