Extends the single-topic document library into a named, multi-collection store. Each collection has its own description used in the tool schema so Gemma can route queries to the right collection. The UI gains two-level navigation (collections → sources) and CRUD for collections.
Chunk IDs: the old ID was sha256(source_file::chunk_index).
New: sha256(collection::source_file::chunk_index).
The same filename in two collections now gets distinct IDs, so upserts never collide.
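Each chunk carries a collection metadata field.
Moving a source between collections cannot use Chroma's update(): update() replaces metadata entirely, and the chunk ID itself encodes the collection, so moved chunks need new IDs anyway. Instead:
get the chunks (with documents and embeddings), upsert them under new IDs with merged metadata, then delete the old IDs.
Embeddings are reused, so no Ollama call is needed.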
list_collections() reads from config, not from Chroma metadata.
Empty collections (no chunks yet) still appear. Chunk counts come from Chroma as a secondary query.
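A minimal sketch of that config-first merge follows. It assumes chunk metadata uses the collection key described above, and that the "secondary query" fetches all document-chunk metadata in one Chroma get() call. merge_collection_rows is a hypothetical helper name, and the econ entry is illustrative example data.

```python
from collections import Counter
from typing import Any


def merge_collection_rows(
    config_collections: list[dict[str, str]],
    chunk_metadatas: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical helper: one row per configured collection, plus chunk_count.

    config_collections: the collections list from documents.yaml (authoritative).
    chunk_metadatas: metadata of every document chunk, e.g. the "metadatas" list
        returned by coll.get(where={"type": "document"}, include=["metadatas"]).
    """
    # One pass over the chunk metadata gives counts for every collection at once.
    counts = Counter(meta.get("collection") for meta in chunk_metadatas)
    # Iterate the config, not the counts: an empty collection still gets a row
    # (count 0), and chunks tagged with a name absent from the config are not listed.
    return [
        {**entry, "chunk_count": counts.get(entry["name"], 0)}
        for entry in config_collections
    ]


config = [
    {"name": "mba", "display_name": "MBA Textbooks", "description": "MBA textbooks"},
    {"name": "econ", "display_name": "Economics", "description": "Research papers"},
]
rows = merge_collection_rows(config, [{"collection": "mba"}] * 3)
```

Here rows lists mba with 3 chunks and econ with 0. The empty collection is still present because the loop is driven by the config.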
_handle_request extracts the optional collection argument and passes it to DocumentService.search().
Tests added in tests/test_search_library_tool.py.
scripts/ingest.py: --list shows a per-collection breakdown; --delete requires --collection.
Collection membership is a collection metadata field on each chunk (e.g. "mba"). The single ChromaDB collection
(collective.documents) is retained. Moving a source = upsert new chunks with the updated collection, then delete the old chunks;
embeddings are reused from Chroma, so there is no re-embedding.
Chunk ID: sha256(f"{collection}::{source_file}::{chunk_index}").
Finance.pdf in "mba" and Finance.pdf in "econ" are fully independent rows with no ID collision.
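list_collections() reads the collections list from config, not from Chroma.
Chroma is queried only for chunk/source counts, so empty collections appear in the UI immediately.
Renaming or deleting a collection updates both the config and the Chroma metadata field. Because the collection name is part of every chunk ID, a rename also has to re-key the chunks the way move_source does (upsert under new IDs, then delete the old IDs). Updating only the metadata field would leave IDs hashed with the old name, and re-ingesting a file into the renamed collection would then create duplicates.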
The tool schema adds a collection enum parameter only when more than one collection exists.
One collection → schema identical to current (description = collection description, no enum).
Multiple collections → a collection string enum is added. A plain enum cannot carry per-value descriptions, so the parameter's description lists each value with its collection description.
Current config/documents.yaml:
collection: collective.documents
topic: "MBA Textbooks"
chunk_size: 1500
…
New config/documents.yaml:
collection: collective.documents # ChromaDB collection name (unchanged)
chunk_size: 1500
chunk_overlap: 200
n_results: 5
embed_model: nomic-embed-text
chroma_path: .chroma
collections:
  - name: mba
    display_name: "MBA Textbooks"
    description: "MBA textbooks covering strategy, finance, marketing, and operations"
The old topic key is removed. If collections is absent, the tool falls back to a generic
description with no enum parameter (same as today with no topic).
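One way to read and check that list at load time is sketched below. It assumes the file has already been parsed with yaml.safe_load() into a dict. The read_collections helper and the slug naming rule are hypothetical additions, not part of the plan. An absent or empty collections key returns an empty list, which corresponds to the generic, no-enum case.

```python
import re
from typing import Any

# Hypothetical naming rule: lowercase slugs. This keeps enum values tidy and keeps
# "::" (the chunk-ID key separator) out of collection names.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")


def read_collections(config: dict[str, Any]) -> list[dict[str, str]]:
    """Hypothetical helper: validate the collections list of a parsed documents.yaml.

    `config` is the dict yaml.safe_load() returns for the file. An absent or empty
    collections key yields [], i.e. the generic-description, no-enum case.
    """
    result: list[dict[str, str]] = []
    seen: set[str] = set()
    for entry in config.get("collections") or []:
        name = str(entry.get("name", ""))
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid collection name: {name!r}")
        # Names double as enum values and chunk-ID prefixes, so they must be unique.
        if name in seen:
            raise ValueError(f"duplicate collection name: {name!r}")
        seen.add(name)
        result.append({
            "name": name,
            "display_name": str(entry.get("display_name", name)),
            "description": str(entry.get("description", "")),
        })
    return result


config = {
    "collection": "collective.documents",
    "collections": [{
        "name": "mba",
        "display_name": "MBA Textbooks",
        "description": "MBA textbooks covering strategy, finance, marketing, and operations",
    }],
}
collections = read_collections(config)
```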
Collections view (default):
Sources view (drill-down on collection row click):
| Change | Detail |
|---|---|
| _chunk_id(collection, source, idx) | Add collection to hash key. Old signature was (source, idx) — update all callers. |
| ingest_file(path, collection, …) | Pass collection through to _ingest_pdf and _upsert_chunks. Stored in chunk metadata. |
| ingest_text(text, source, collection, …) | Same: collection is a required parameter. |
| search(query, collection=None, n=None) | collection=None → where={"type": "document"} (all collections). collection=X → where={"$and": [{"type": "document"}, {"collection": X}]}. |
| list_sources(collection=None) | Returns list[str] of unique source_file values. If collection given, filter by metadata; else all. |
| list_sources_detail(collection=None) | Returns list[dict] with source_file + chunk_count. Used by UI for table display. |
| list_collections() | Reads collections list from documents.yaml config. Returns list[dict] with name, display_name, description + chunk_count from Chroma. |
| delete_source(source, collection) | Filter by source_file AND collection, delete matching IDs. |
| move_source(source, from_col, to_col) | Get chunks (docs + embeddings + metas) → new IDs with to_col → upsert → delete old IDs. No Ollama call. |
| delete_collection(name) | Delete all chunks where collection == name. Then remove from documents.yaml. |
| count(collection=None) | Count chunks matching type=document (+ optional collection filter). |
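Most rows in the table above filter on the same metadata keys (type, collection, source_file). The following is a hedged sketch of a shared filter builder and of the per-source aggregation that list_sources_detail needs. Both helper names are hypothetical. The idea is that search and count would pass document_where(collection), delete_source would use document_where(collection, source) to find IDs, and list_sources_detail would feed the metadatas from coll.get(where=..., include=["metadatas"]) into sources_detail.

```python
from collections import Counter
from typing import Any, Optional


def document_where(
    collection: Optional[str] = None, source_file: Optional[str] = None
) -> dict[str, Any]:
    """Hypothetical helper: Chroma where filter for document chunks, optionally narrowed."""
    conditions: list[dict[str, Any]] = [{"type": "document"}]
    if collection is not None:
        conditions.append({"collection": collection})
    if source_file is not None:
        conditions.append({"source_file": source_file})
    # Chroma's where validation expects a single top-level key, so 2+ conditions use $and.
    return conditions[0] if len(conditions) == 1 else {"$and": conditions}


def sources_detail(metadatas: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: group chunk metadata into source_file + chunk_count rows."""
    counts = Counter(meta["source_file"] for meta in metadatas)
    return [
        {"source_file": name, "chunk_count": n} for name, n in sorted(counts.items())
    ]
```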
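move_source implementation: result = coll.get(where={…}, include=["documents","embeddings","metadatas"]) → build new_ids with the new collection → coll.upsert(ids=new_ids, documents=…, embeddings=…, metadatas=new_metas) → coll.delete(ids=old_ids). Metadata merge is simply {**old_meta, "collection": to_col}. Upserting before deleting means a failure between the two steps leaves duplicates rather than losing chunks. If from_col == to_col, return early: the new IDs would equal the old ones, and the final delete would remove the chunks that were just upserted.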
| Change | Detail |
|---|---|
| _build_schema() | Reads collections list from documents.yaml. Zero/one collection → no enum, description = collection description (or generic). Two+ collections → add collection enum parameter; its description lists each enum value with that collection's description. |
| _handle_request() | Read args.get("collection"); it is optional and absent under the single-collection schema. Pass it to _search(). |
| _search(query, collection=None) | Pass collection to docs.search(). Result header shows collection name if provided. |
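A hedged sketch of the routing and labeling follows. handle_request is a hypothetical stand-in for _handle_request plus _search, the search callable stands in for DocumentService.search(), and the result-header wording is illustrative. The sketch also rejects collection values outside the configured list. That check is an addition here, not something the plan specifies: a model can occasionally emit a value outside the enum, and an explicit error is easier to recover from than a silent search across everything.

```python
from typing import Any, Callable, Optional

SearchFn = Callable[[str, Optional[str]], list[str]]


def handle_request(args: dict[str, Any], known: list[str], search: SearchFn) -> str:
    """Hypothetical stand-in for _handle_request + _search in search_library_tool.py."""
    query = str(args.get("query", "")).strip()
    if not query:
        return "Error: query is required."
    # Optional: the single-collection schema never sends it.
    collection = args.get("collection")
    # Added check (not in the plan): report out-of-enum values instead of searching all.
    if collection is not None and collection not in known:
        return f"Error: unknown collection {collection!r}; choose one of {known}."
    hits = search(query, collection)
    # Illustrative header format: name the collection only when one was requested.
    header = f"Results from {collection}:" if collection else "Results:"
    return "\n".join([header, *hits])


calls: list[tuple[str, Optional[str]]] = []


def fake_search(query: str, collection: Optional[str]) -> list[str]:
    """Test double for DocumentService.search(): records the call, returns one hit."""
    calls.append((query, collection))
    return ["chunk A"]


out = handle_request({"query": "wacc", "collection": "mba"}, ["mba", "econ"], fake_search)
```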
# Single collection (no enum)
{
  "name": "search_library",
  "description": "Search MBA Textbooks: MBA textbooks covering strategy, finance…",
  "parameters": {
    "type": "object",
    "properties": { "query": {"type": "string"} },
    "required": ["query"]
  }
}
# Multiple collections
{
  "name": "search_library",
  "description": "Search the document library. Choose the collection that best matches your query.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "collection": {
        "type": "string",
        "enum": ["mba", "econ"],
        "description": "mba: MBA textbooks (strategy, finance, ops); econ: macro/micro research papers"
      }
    },
    "required": ["query"]
  }
}
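A minimal sketch of a builder that produces schemas shaped like the examples above. It makes three assumptions: parameters is a JSON Schema object (type/properties/required); the single-collection description follows the "Search {display_name}: {description}" pattern seen in the first example; and GENERIC_DESCRIPTION is placeholder wording for the zero-collection case. build_schema is a hypothetical stand-in for _build_schema, and any outer wrapper the tool transport adds is not shown.

```python
from typing import Any

# Hypothetical fallback text for the zero-collection case.
GENERIC_DESCRIPTION = "Search the document library."


def build_schema(collections: list[dict[str, str]]) -> dict[str, Any]:
    """Hypothetical _build_schema: no enum for 0-1 collections, a collection enum for 2+."""
    properties: dict[str, Any] = {"query": {"type": "string"}}
    if not collections:
        description = GENERIC_DESCRIPTION
    elif len(collections) == 1:
        only = collections[0]
        description = f"Search {only['display_name']}: {only['description']}"
    else:
        description = ("Search the document library. "
                       "Choose the collection that best matches your query.")
        properties["collection"] = {
            "type": "string",
            "enum": [c["name"] for c in collections],
            # A plain enum has no per-value descriptions, so they are joined here.
            "description": "; ".join(f"{c['name']}: {c['description']}" for c in collections),
        }
    return {
        "name": "search_library",
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": ["query"]},
    }


mba = {"name": "mba", "display_name": "MBA Textbooks",
       "description": "MBA textbooks (strategy, finance, ops)"}
econ = {"name": "econ", "display_name": "Economics",
        "description": "macro/micro research papers"}
single = build_schema([mba])
multi = build_schema([mba, econ])
```

Because the schema is derived entirely from config, re-announcing it after a description edit amounts to calling the builder again and re-sending the result.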
| Change | Detail |
|---|---|
| Collections view | list_collections(); ✎ inline rename dialog; 🗑 delete with confirmation; "+ Collection" dialog (name + display_name + description); click a row to drill in. |
| Sources view | list_sources_detail(collection); "Move to…" QComboBox per row; 🗑 delete source; "+ Files" / "+ Folder" ingest into the current collection; editable description bar whose Save button triggers a schema re-announce. |
| _IngestWorker | Gains a collection field, passed to ingest_file(path, collection). |

| Change | Detail |
|---|---|
| --collection NAME | Required for ingest; optional for --list (show all if omitted); required for --delete. |
| --list | With no --collection: show all collections from config with source/chunk counts. With --collection: show sources in that collection. |
| --delete FILE --collection NAME | Delete named source from named collection. |
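The flag rules in this table can be enforced directly in the argument parser. The sketch below is hedged: parse_args is hypothetical, and the positional paths argument is an assumed shape for how ingest.py receives files. Only --collection, --list, and --delete come from the plan.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for the collection-aware flags of scripts/ingest.py."""
    parser = argparse.ArgumentParser(prog="ingest.py")
    # Assumed shape: files/folders to ingest are positional arguments.
    parser.add_argument("paths", nargs="*", help="files or folders to ingest")
    parser.add_argument("--collection", metavar="NAME")
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--delete", metavar="FILE")
    args = parser.parse_args(argv)
    if not (args.list or args.delete or args.paths):
        parser.error("nothing to do: give paths, --list, or --delete FILE")
    # Ingest and --delete need a target collection; --list alone may omit it (show all).
    if (args.paths or args.delete) and args.collection is None:
        parser.error("--collection NAME is required for ingest and --delete")
    return args
```

parser.error() prints the usage message and exits with status 2, so a missing --collection fails before any Chroma work starts.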
| File | Change |
|---|---|
| config/documents.yaml | Remove topic; add collections list |
| src/local/services/document_service.py | collection in chunk ID; collection param throughout; list_collections, list_sources_detail, move_source, delete_collection |
| src/local/tools/search_library_tool.py | Dynamic schema; collection arg extraction; result labeling |
| src/local/ui/documents_window.py | Full rewrite: QStackedWidget, collections view, sources view, move/rename/delete |
| scripts/ingest.py | --collection flag; collection-aware --list and --delete |
| tests/test_document_service.py | Tests for collection CRUD, move_source, same filename in two collections, count per collection |
| tests/test_search_library_tool.py | New file: schema generation (1 vs N collections), collection arg routing, result labeling |