Plan 1 (Phases 1–6) established the core: LLM-native tool calling, episodic memory, pairwise evaluation, and per-participant observability UI. Plan 2 extends Gemma's capabilities with file attachments, new tools, and a persistent knowledge base.
| Phase | Title | Status |
|---|---|---|
| 7 | File attachments + multimodal | ▶ NEXT |
| 8 | Date/time + location tools | ▶ NEXT |
| 9 | Semantic Scholar tool | planned |
| 10 | RAG — persistent document knowledge base | planned |
| 11 | Documentation | planned |
| Type | Extensions | Handling |
|---|---|---|
| Image | .jpg .jpeg .png .gif .webp | Base64-encoded → images field in Ollama message |
| PDF | .pdf | Text extracted via pypdf → prepended to user message |
| Text | .txt .md .py .js .ts .yaml .json .csv | Read directly → prepended to user message |
| Other | — | Error chip: "unsupported format" |
A paperclip button (⌁) sits to the left of the query input. Clicking it opens a QFileDialog. Drag-and-drop onto the input area also works — dragEnterEvent / dropEvent on the input container.
Attached files appear as small chips above the query input. Each chip shows the filename and an ✕ to remove it. Multiple files are allowed. Chips are cleared after the query is sent.
┌─────────────────────────────────────────────────┐
│ 📎 diagram.png ✕ 📎 notes.pdf ✕ │ ← attachment chips
├─────────────────────────────────────────────────┤
│ What does this architecture diagram show? Send │ ← query input
└─────────────────────────────────────────────────┘
The existing StreamingResponseWidget shows a small attachment summary line: "[attached: diagram.png, notes.pdf]" below the query badge.
Processing happens in the UI thread at send time (files are small, extraction is fast). No background worker needed unless PDFs are very large.
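import base64
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
# Same extension list as the Text row of the supported-types table.
TEXT_EXTS = {".txt", ".md", ".py", ".js", ".ts", ".yaml", ".json", ".csv"}

def _process_attachment(path: str) -> dict:
    """Classify one attached file and return its payload entry."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext in IMAGE_EXTS:
        # Images travel base64-encoded in the Ollama message's images field.
        data = base64.b64encode(p.read_bytes()).decode()
        return {"type": "image", "name": p.name, "data": data}
    elif ext == ".pdf":
        text = _extract_pdf_text(path)  # pypdf
        return {"type": "text", "name": p.name, "data": text}
    elif ext in TEXT_EXTS:
        text = p.read_text(errors="replace")
        return {"type": "text", "name": p.name, "data": text}
    else:
        # Unsupported format: the UI shows an error chip.
        return {"type": "error", "name": p.name}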
query.received payload gains an attachments field:
{
  "query": "What does this architecture diagram show?",
  "session_id": "...",
  "query_id": "...",
  "attachments": [
    {"type": "image", "name": "diagram.png", "data": "<base64>"},
    {"type": "text", "name": "notes.pdf", "data": "extracted text..."}
  ]
}
When attachments are present, the user message is built differently:
[Attached: notes.pdf]
<extracted text (truncated to max_attachment_chars)>
<user query>
Image attachments are sent as an images list alongside the content string. Ollama passes these to Gemma's vision encoder.
{
  "role": "user",
  "content": "What does this architecture diagram show?",
  "images": ["<base64>"]
}
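The message-building rules above can be sketched as a single function. This is a minimal illustration, not the project's `_build_messages`: the name `build_user_message`, its signature, the newline separator, and the choice to silently skip `error` attachments are all assumptions made for the example.

```python
def build_user_message(query: str, attachments: list[dict],
                       max_attachment_chars: int = 8000) -> dict:
    """Build one Ollama-style user message from a query and its attachments.

    Hypothetical helper: text attachments are truncated and prepended to the
    query, image attachments are collected into the images list.
    """
    parts: list[str] = []
    images: list[str] = []
    for att in attachments:
        if att["type"] == "text":
            # Truncate each text attachment independently.
            body = att["data"][:max_attachment_chars]
            parts.append(f"[Attached: {att['name']}]\n{body}")
        elif att["type"] == "image":
            images.append(att["data"])  # already base64-encoded
        # Assumption: "error" entries are reported by the UI and skipped here.
    parts.append(query)

    message = {"role": "user", "content": "\n".join(parts)}
    if images:
        # Only add the key when there is at least one image.
        message["images"] = images
    return message
```

Truncating per attachment rather than on the combined string keeps one large file from crowding out the others.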
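A message can carry both kinds: text attachments are prepended to content and images go in the images list.
config/generator.yaml gains one setting:
max_attachment_chars: 8000 # truncation limit per text attachment
requirements.txt gains one dependency:
pypdf>=4.0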
| File | Purpose |
|---|---|
| `src/local/ui/attachment_bar.py` | Chip strip widget + file processing logic |
| File | Change |
|---|---|
| `src/local/ui/main_window.py` | Add paperclip button, drag-drop, wire AttachmentBar, include attachments in query payload |
| `src/local/agents/generator_agent.py` | _build_messages reads attachments from envelope payload |
| `config/generator.yaml` | Add max_attachment_chars |
| `requirements.txt` | Add pypdf |
- Add pypdf to requirements, install
- Build AttachmentBar widget (chips, file picker, drag-drop)
- Wire AttachmentBar into the MainWindow input area
- Extend the query.received payload with attachments
- Update _build_messages to handle image and text attachments

Returns current local date, time, timezone, and day of week. Single system call.
Tool name: get_datetime
Schema: no parameters required
Result: "Wednesday 2026-06-03 09:17:42 PDT (UTC-7)"
No config file needed. Follows the existing tool pattern: announces schema on startup, responds to tool.request.get_datetime.
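A minimal sketch of how the result string could be produced with the standard library alone. The function names `format_datetime` and `get_datetime_result` are hypothetical, and the exact format string is inferred from the sample result rather than taken from the project.

```python
from datetime import datetime, timedelta, timezone


def format_datetime(now: datetime) -> str:
    """Format an aware datetime as 'Weekday YYYY-MM-DD HH:MM:SS TZ (UTC±H)'."""
    offset = now.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    total_minutes = int(offset.total_seconds() // 60)
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    # Show minutes only for offsets that are not whole hours (e.g. UTC+5:30).
    utc = f"UTC{sign}{hours}" + (f":{minutes:02d}" if minutes else "")
    return f"{now:%A %Y-%m-%d %H:%M:%S} {now.tzname()} ({utc})"


def get_datetime_result() -> str:
    """Single system call: read the local clock with its timezone attached."""
    return format_datetime(datetime.now().astimezone())


# Example with a fixed clock so the output does not depend on the host machine.
fixed = datetime(2026, 6, 3, 9, 17, 42, tzinfo=timezone(timedelta(hours=-7), "PDT"))
print(format_datetime(fixed))
```

Separating formatting from the clock read keeps the formatter testable with a fixed timestamp.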
Reads from config/location.yaml and returns a structured location string. No live geolocation — user sets their own context.
# config/location.yaml
city: "Cupertino"
state: "California"
country: "United States"
timezone: "America/Los_Angeles"
coordinates: "37.3230° N, 122.0322° W" # optional, for distance queries
Tool name: get_location
Result: "Cupertino, California, United States (America/Los_Angeles, 37.3230° N, 122.0322° W)"
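The mapping from config to result string can be sketched as below. `format_location` is a hypothetical name, and the config is passed as an already-parsed dict so the example does not assume a particular YAML loader.

```python
def format_location(cfg: dict) -> str:
    """Render a location config as 'City, State, Country (timezone, coordinates)'.

    Missing or empty optional fields (such as coordinates) are left out.
    """
    place = ", ".join(cfg[k] for k in ("city", "state", "country") if cfg.get(k))
    extras = [cfg[k] for k in ("timezone", "coordinates") if cfg.get(k)]
    return f"{place} ({', '.join(extras)})" if extras else place


# Values copied from the sample config/location.yaml.
sample = {
    "city": "Cupertino",
    "state": "California",
    "country": "United States",
    "timezone": "America/Los_Angeles",
    "coordinates": "37.3230° N, 122.0322° W",
}
print(format_location(sample))
```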
| File | Purpose |
|---|---|
| `src/local/tools/datetime_tool.py` | DateTimeTool |
| `src/local/tools/location_tool.py` | LocationTool |
| `config/location.yaml` | User location config |
Tool name: search_papers
Parameters:
query (string, required) — research topic or keywords
limit (integer, optional, default 5) — max papers to return
Uses the Semantic Scholar Graph API (https://api.semanticscholar.org/graph/v1/paper/search). Free, no API key required for basic use (rate limited to 100 req/5min). Optional key for higher limits, stored in .env.
[2026-06-03] Papers: "transformer attention mechanisms"
1. Attention Is All You Need (2017) — Vaswani et al.
Citations: 98,432 | https://semanticscholar.org/paper/...
Transformers dispense with recurrence entirely, relying instead on...
2. ...
max_results: 5
timeout: 15
fields: "title,authors,year,abstract,citationCount,url"
A separate ChromaDB collection (collective.documents) holds chunked, embedded document passages. A search_documents tool lets Gemma query it semantically, same pattern as search_memory.
Two paths:
- CLI: python scripts/ingest.py path/to/file.pdf chunks, embeds, and stores the file. Runs offline, no stack needed.

Fixed-size chunks (512 tokens) with 64-token overlap. Each chunk is stored with metadata: source_file, chunk_index, page (for PDFs), ingested_at.
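The fixed-size chunking with overlap can be sketched as a sliding window. This is an illustration under stated assumptions: `chunk_tokens` is a hypothetical name, and it operates on an already-tokenized list because the plan does not say which tokenizer defines a "token".

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 64) -> list[list]:
    """Split tokens into windows of `size` that share `overlap` tokens.

    Each window starts `size - overlap` tokens after the previous one, so the
    last `overlap` tokens of one chunk are the first of the next.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            # This window reached the end; a further one would only repeat
            # tokens already covered by the overlap.
            break
    return chunks


# Whitespace splitting stands in for a real tokenizer here.
words = ("lorem ipsum " * 600).split()
pieces = chunk_tokens(words)
print(len(pieces), [len(p) for p in pieces])
```

The list index of each chunk can serve directly as the chunk_index metadata value.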
Tool name: search_documents
Parameters:
query (string, required) — what to look for in the knowledge base
| File | Purpose |
|---|---|
| `src/local/tools/search_documents_tool.py` | RAG search tool |
| `src/local/services/document_service.py` | ChromaDB collection wrapper + chunking |
| `scripts/ingest.py` | CLI ingestion script |
| `config/documents.yaml` | Chunk size, overlap, collection name |
| Phase | Deliverable | Key dependency |
|---|---|---|
| 7 | File attachments, image + text, AttachmentBar UI | pypdf |
| 8 | DateTimeTool, LocationTool, location.yaml | — |
| 9 | SemanticScholarTool, search_papers schema | httpx (already present) |
| 10 | DocumentService, search_documents tool, ingest script | Phase 7 (PDF extraction reuse) |
| 11 | Architecture + developer docs | Stable feature surface |
Phase 10 reuses the PDF extraction and AttachmentBar from Phase 7. Build 7 before 10.