images field. PDFs and text files are extracted and prepended as
context. Gemma sees everything natively — no prompt preprocessing or intermediate summarization.

| Type | Extensions | Handling |
|---|---|---|
| Image | .jpg .jpeg .png .gif .webp | Base64-encode bytes → images field in Ollama user message |
| PDF | .pdf | Text extracted via pypdf → prepended to user message content |
| Text | .txt .md .py .js .ts .yaml .json .csv | Read directly → prepended to user message content |
| Other | any other | Red error chip: "unsupported format"; excluded from payload |

src/local/ui/attachment_bar.py

    ┌──────────────────────────────────────────────────────────────┐
    │ ⌁  📎 diagram.png ✕  📎 notes.pdf ✕                          │
    └──────────────────────────────────────────────────────────────┘

The AttachmentBar is a QWidget containing a horizontal flow of chips plus a
paperclip button (⌁) on the left. It is hidden when no files are attached and
becomes visible as soon as the first file is added.

| Method | Description |
|---|---|
| add_files(paths: list[str]) | Process and add files; called by button picker and drag-drop handler |
| attachments() → list[dict] | Returns current [{type, name, data}, …]; only valid attachments included |
| clear() | Remove all chips and reset internal list; called after send |

- Each chip is a QPushButton: 📎 filename.ext ✕
- ✕ removes that chip and its attachment from the list
- Error chips are excluded from attachments() so they are never sent

Each file passed to add_files is processed as follows:

    import base64
    from pathlib import Path


    def _process_file(path: str) -> dict:
        """Classify a file by extension and return an attachment dict."""
        ext = Path(path).suffix.lower()
        name = Path(path).name
        if ext in {".jpg", ".jpeg", ".png", ".gif", ".webp"}:
            # Images travel as base64 strings (Ollama images field).
            data = base64.b64encode(Path(path).read_bytes()).decode()
            return {"type": "image", "name": name, "data": data}
        elif ext == ".pdf":
            text = _extract_pdf_text(path)  # pypdf PdfReader
            return {"type": "text", "name": name, "data": text}
        elif ext in {".txt", ".md", ".py", ".js", ".ts", ".yaml", ".json", ".csv"}:
            text = Path(path).read_text(errors="replace")
            return {"type": "text", "name": name, "data": text}
        else:
            # Unsupported format: shown as a red error chip, never sent.
            return {"type": "error", "name": name}


    def _extract_pdf_text(path: str) -> str:
        from pypdf import PdfReader

        reader = PdfReader(path)
        # Pages without extractable text contribute an empty string.
        return "\n".join(page.extract_text() or "" for page in reader.pages)

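
The list behaviour described above (every file gets a chip, but error entries never reach the payload) can be sketched without Qt. This is a minimal model under stated assumptions, not the actual widget: `AttachmentModel`, `classify` and `chip_names` are hypothetical names, and `classify` is a simplified stand-in for `_process_file` that omits the PDF branch so the sketch needs no third-party package.

```python
import base64
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
TEXT_EXTS = {".txt", ".md", ".py", ".js", ".ts", ".yaml", ".json", ".csv"}


def classify(path: str) -> dict:
    """Simplified stand-in for _process_file (PDF branch omitted)."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext in IMAGE_EXTS:
        data = base64.b64encode(p.read_bytes()).decode()
        return {"type": "image", "name": p.name, "data": data}
    if ext in TEXT_EXTS:
        return {"type": "text", "name": p.name, "data": p.read_text(errors="replace")}
    return {"type": "error", "name": p.name}


class AttachmentModel:
    """Hypothetical Qt-free model of the AttachmentBar's internal list."""

    def __init__(self) -> None:
        self._items: list[dict] = []

    def add_files(self, paths: list[str]) -> None:
        self._items.extend(classify(p) for p in paths)

    def chip_names(self) -> list[str]:
        # Every entry gets a chip, including error entries.
        return [item["name"] for item in self._items]

    def attachments(self) -> list[dict]:
        # Error entries are filtered out so they never reach the payload.
        return [item for item in self._items if item["type"] != "error"]

    def clear(self) -> None:
        self._items.clear()


def demo() -> tuple[list[str], list[str]]:
    """Attach one supported and one unsupported file; report chips vs. payload."""
    with tempfile.TemporaryDirectory() as tmp:
        notes = Path(tmp) / "notes.txt"
        notes.write_text("hello")
        archive = Path(tmp) / "archive.zip"
        archive.write_bytes(b"PK")
        model = AttachmentModel()
        model.add_files([str(notes), str(archive)])
        return model.chip_names(), [a["name"] for a in model.attachments()]
```

Calling `demo()` yields two chip names but only one payload entry, which is the separation the real widget would need to keep between what is displayed and what is sent.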
The parent input_container in MainWindow sets setAcceptDrops(True) and
overrides dragEnterEvent / dropEvent. Dropped files are forwarded to
self._attachment_bar.add_files(paths). The AttachmentBar itself does not
handle drag events — it only receives already-resolved paths.

src/local/ui/main_window.py

    ┌──────────────────────────────────────────────────────────────────┐
    │ 📎 diagram.png ✕  📎 notes.pdf ✕                                 │ ← AttachmentBar (hidden when empty)
    ├──────────────────────────────────────────────────────────────────┤
    │ Type a query and press Enter…                              Send  │ ← input row
    └──────────────────────────────────────────────────────────────────┘

The paperclip button lives inside AttachmentBar (leftmost element). The query
QLineEdit and Send button remain unchanged in their own row below.

_build_conversation_page() changes

- Create self._attachment_bar = AttachmentBar()
- Insert self._attachment_bar into input_container above the existing input row
- Override dragEnterEvent / dropEvent on input_container

_send_query() changes

    def _send_query(self) -> None:
        query = self._query_input.text().strip()
        if not query:
            return
        query_id = str(uuid.uuid4())
        # Snapshot the attachments before the bar is cleared below.
        attachments = self._attachment_bar.attachments()
        envelope = MessageEnvelope.create(
            message_type="query",
            subject=QUERY_RECEIVED,
            sender_id="ui",
            payload={
                "query": query,
                "session_id": self._session_id,
                "query_id": query_id,
                "attachments": attachments,  # [] when none
            },
            correlation_id=query_id,
            metadata={"session_id": self._session_id},
        )
        self._publisher.publish(envelope)
        self._query_input.clear()
        self._attachment_bar.clear()

When attachments are present at send time, a dim summary line is shown in the
StreamingResponseWidget below the query timestamp badge:
[attached: diagram.png, notes.pdf]
This requires passing attachment_names: list[str] when creating the widget. The
widget adds a QLabel (objectName attachmentSummary) that is hidden when the list
is empty. BusLogger does not need to change — the names are known in the UI at send
time and passed directly.
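
The summary line can be derived from the attachment names with a small helper. The sketch below is an assumption about how the label text might be built; `attachment_summary` is a hypothetical name, and the empty-string return simply mirrors the rule that the label is hidden when the list is empty.

```python
def attachment_summary(attachment_names: list[str]) -> str:
    """Return the dim summary line, or "" when nothing was attached."""
    if not attachment_names:
        return ""
    return f"[attached: {', '.join(attachment_names)}]"


# Names are taken from the attachment dicts captured at send time.
attachments = [
    {"type": "image", "name": "diagram.png", "data": "..."},
    {"type": "text", "name": "notes.pdf", "data": "..."},
]
summary = attachment_summary([a["name"] for a in attachments])
```

Because `_send_query()` clears the bar after publishing, the names would have to be collected before `clear()` is called.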

src/local/agents/generator_agent.py

_handle_query()

Extract attachments from the envelope payload and pass them to _build_messages:

    attachments = payload.get("attachments") or []
    messages = self._build_messages(query, session_id, attachments)


_build_messages(query, session_id, attachments)

    def _build_messages(
        self, query: str, session_id: str | None, attachments: list[dict] | None = None
    ) -> list[dict]:
        history = self._conv.get_history(session_id)
        messages: list[dict] = []
        if self._system_prompt:
            messages.append({"role": "system", "content": self._system_prompt})
        messages.extend(history)

        # Build user message: text attachments first, then the query itself.
        content_parts = []
        image_b64s = []
        max_chars = get_config("generator").get("max_attachment_chars", 8000)
        for att in (attachments or []):
            if att.get("type") == "text":
                # Truncate each text attachment independently.
                text = (att.get("data") or "")[:max_chars]
                content_parts.append(f'[Attached: {att["name"]}]\n{text}')
            elif att.get("type") == "image":
                image_b64s.append(att.get("data", ""))
        content_parts.append(query)

        user_msg: dict = {"role": "user", "content": "\n\n".join(content_parts)}
        if image_b64s:
            user_msg["images"] = image_b64s
        messages.append(user_msg)
        return messages

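
To make the resulting message shape concrete, the user-message part of the logic above can be exercised in isolation. This is a sketch, not the agent's actual code path: `build_user_message` is a hypothetical free function that takes the truncation limit as a parameter instead of reading it from config, and the image bytes are placeholder data rather than a real picture.

```python
import base64


def build_user_message(query: str, attachments: list[dict], max_chars: int = 8000) -> dict:
    """Standalone copy of the user-message logic, for illustration only."""
    content_parts: list[str] = []
    images: list[str] = []
    for att in attachments:
        if att.get("type") == "text":
            text = (att.get("data") or "")[:max_chars]
            content_parts.append(f'[Attached: {att["name"]}]\n{text}')
        elif att.get("type") == "image":
            images.append(att.get("data", ""))
    content_parts.append(query)
    msg: dict = {"role": "user", "content": "\n\n".join(content_parts)}
    if images:
        msg["images"] = images
    return msg


example = build_user_message(
    "What does the file say?",
    [
        {"type": "text", "name": "notes.txt", "data": "alphabet soup"},
        # Placeholder bytes standing in for real image data.
        {"type": "image", "name": "dot.png", "data": base64.b64encode(b"hi").decode()},
    ],
    max_chars=8,  # small limit so the truncation is visible
)
print(example["content"])
```

With the limit set to 8, the text attachment is cut to `alphabet`, the query is appended last, and the base64 string lands in the separate `images` list rather than in `content`.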

config/generator.yaml

    max_attachment_chars: 8000  # truncation limit per text attachment

requirements.txt

    pypdf>=4.0

Story file: tests/stories/s9_multimodal.yaml

| Turn | Input | Assertion | Mode |
|---|---|---|---|
| 1 | Attach a .txt file containing a known unique phrase; query: "What does the attached file say?" | Answer contains the unique phrase from the file | Automated (fixture file) |
| 2 | Attach a .png screenshot; query: "Describe what you see in the image" | Answer is non-empty and does not start with an error marker | Manual smoke test (vision assertion not deterministic) |

The automated test mocks GeneratorAgent._generate to verify that
_build_messages produces the correct content string and that the
query.received envelope carries the expected attachments list.
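
One way such a check could look is to replace `_generate` with a mock and inspect the messages it receives. The sketch below uses a hypothetical `StubAgent` in place of the real `GeneratorAgent` (whose constructor and dependencies are not shown here), and the fixture phrase is invented for illustration. It verifies that the phrase reaches the prompt, not that a model's answer repeats it.

```python
from unittest.mock import MagicMock


class StubAgent:
    """Hypothetical stand-in for GeneratorAgent; only the call shape matters."""

    def _build_messages(self, query: str, attachments: list[dict]) -> list[dict]:
        parts = [
            f'[Attached: {a["name"]}]\n{a["data"]}'
            for a in attachments
            if a.get("type") == "text"
        ]
        parts.append(query)
        return [{"role": "user", "content": "\n\n".join(parts)}]

    def _generate(self, messages: list[dict]) -> str:
        raise RuntimeError("would call the model")

    def handle_query(self, payload: dict) -> str:
        attachments = payload.get("attachments") or []
        messages = self._build_messages(payload["query"], attachments)
        return self._generate(messages)


def run_story_turn_1() -> str:
    """Send a query with a text fixture and return the content seen by _generate."""
    agent = StubAgent()
    agent._generate = MagicMock(return_value="ok")  # no model call is made
    phrase = "purple-walrus-7731"  # invented unique phrase for the fixture
    agent.handle_query({
        "query": "What does the attached file say?",
        "attachments": [{"type": "text", "name": "fixture.txt", "data": phrase}],
    })
    # First positional argument of the single recorded call.
    messages = agent._generate.call_args.args[0]
    return messages[-1]["content"]
```

Asserting on the captured `content` keeps the automated turn deterministic, since no model output is involved.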

1. Add pypdf>=4.0 to requirements.txt; run pip install pypdf
2. Create src/local/ui/attachment_bar.py: chips, paperclip button, file processing
3. Wire AttachmentBar into MainWindow._build_conversation_page(); add drag-drop
4. Update _send_query() to include attachments in payload and call clear()
5. Add the attachment summary line to StreamingResponseWidget
6. Update GeneratorAgent._build_messages() to handle text and image attachments
7. Add max_attachment_chars: 8000 to config/generator.yaml

| File | Change |
|---|---|
| src/local/ui/attachment_bar.py | NEW: chip strip, paperclip button, file processing |
| src/local/ui/main_window.py | Wire AttachmentBar, drag-drop, attachments in send payload, response card summary |
| src/local/agents/generator_agent.py | _build_messages handles text prepend + image list |
| config/generator.yaml | Add max_attachment_chars |
| requirements.txt | Add pypdf>=4.0 |
| tests/stories/s9_multimodal.yaml | NEW: acceptance story |