Design options · skills search · generated 13 May 2026
This document translates plan/skills-search.md into design and deployment choices. The target feature indexes the external huggingface/skills repository into Meilisearch, then exposes relevant skill results through the existing Agent Finder CLI and REST surfaces.
The plan’s strongest architecture separates content building from serving. A Hugging Face Job generates portable artifacts; the Agent Finder Space consumes them and syncs Meilisearch when stale.
agentfinder skills build parses SKILL.md, chunks sections, writes artifact.latest/ and commits/<sha>/ contain NDJSON, manifest, and _SUCCESS.This is the best initial production option because the source-of-truth artifact is portable, inspectable, reproducible, and independent of Meilisearch internal backup formats.
If Meilisearch or the skills artifact is unavailable, the existing Hugging Face Spaces backend should still serve /search; health/log metadata should report skills-index unavailability.
These are not open options unless implementation uncovers a blocker. They define the baseline around which deployment options should be evaluated.
Integration code should use the Python meilisearch package, not raw HTTP, for index configuration, load, and search.
Meilisearch dumps/snapshots can exist for operational backup, but should not be the long-lived content handoff format.
agentfinder owns schema, ingestion, Meili settings, sync, and API integration. huggingface/skills remains external source content.
Parse each SKILL.md, split by Markdown headings, preserve heading ancestry, and chunk long sections with overlap.
Search requests should ask Meili for _rankingScore and carry it into Agent Finder result score and/or metadata.
Indexed skills are an additive backend. Existing Hugging Face Space search, MCP discovery, and generated Space skills continue to work.
The initial document model is intentionally simple: one searchable document per skill section, with source metadata repeated on every hit.
| Option | Shape | Pros | Cons | Recommendation |
|---|---|---|---|---|
| A. Section documents | One doc per Markdown heading section from SKILL.md; split very long sections into parts. | Good search granularity; preserves context through heading_path; easy to debug in NDJSON. | Multiple hits can point to the same skill; needs grouping later if UI wants skill-level summaries. | Use first |
| B. Whole skill documents | One doc per SKILL.md. | Simplest load and result shape; no duplicate skill hits. | Poorer relevance for large skills; less precise snippet/context; more noise in ranking. | Useful fallback only |
| C. AST-aware chunks | Markdown AST parsing with semantic chunking of lists, code blocks, and sections. | Better chunk boundaries and future display quality. | More implementation complexity; not needed for first production validation. | Later improvement |
| D. Include supporting files | Index SKILL.md plus related text files. | Broader recall for rich skills with references. | Schema and relevance tuning become harder; more source-file edge cases. | Reserve kind=supporting_file for later |
The build step writes two required files and one completion marker. Consumers should only read a directory after _SUCCESS is present.
/bucket/agentfinder/hf-skills/
latest/
hf-skills.ndjson
manifest.json
_SUCCESS
commits/
<commit>/
hf-skills.ndjson
manifest.json
_SUCCESSThe Space compares loaded state to source_commit and schema_version. If unchanged, it avoids reloading. If stale, it configures the index and loads hf-skills.ndjson.
{
"schema_version": 1,
"source_repo": "huggingface/skills",
"source_branch": "main",
"source_commit": "<sha>",
"document_count": 435,
"index": "hf_skills",
"artifact": "hf-skills.ndjson"
}There are two distinct deployment choices: where artifacts are built, and where Meilisearch runs. The plan favors artifact build in an HF Job and leaves Meili placement as an open question.
| Option | Description | Strengths | Risks / costs | Best use |
|---|---|---|---|---|
| 1. HF Job builds artifact; Space syncs; colocated Meili | Job writes NDJSON to shared storage. Agent Finder Space starts Meilisearch locally or in the same deployment envelope and syncs on startup. | Simple operational boundary; no separate external service; artifact remains canonical. | Space startup complexity; process supervision; persistence semantics depend on Space storage; resource contention. | Likely first hosted implementation |
| 2. HF Job builds artifact; Space syncs; external Meili | Agent Finder Space connects to separately hosted Meilisearch with durable storage. | Cleaner runtime separation; better durability and scaling; easier backups. | Needs service hosting, networking, secrets, and lifecycle management outside the Space. | Production scale / shared Meili |
| 3. HF Job builds and loads Meili directly | Job writes artifact and, if MEILI_URL is reachable, also runs agentfinder skills load. | Space startup stays fast; index is ready before serving. | Job needs network access and secrets; failure path can be split between artifact write and index load. | Good once Meili endpoint is stable |
| 4. Space builds from GitHub on startup | Space clones huggingface/skills, builds docs, and loads Meili during boot. | Few moving pieces; no separate artifact builder. | Slow and fragile startup; repeated work; GitHub dependency at serve time; harder to make atomic. | Local experiments only |
| 5. GitHub Actions builds artifact | A CI workflow builds NDJSON and uploads to bucket or release assets. | Familiar automation; easy logs; independent of HF Job availability. | Cross-repo triggering and credentials; less aligned with intended HF-hosted data path. | Fallback or mirror builder |
| 6. Meili dump/snapshot as deploy artifact | Build imports data into Meili, exports dump/snapshot, runtime restores it. | Potentially fast restore for same Meili version. | Version-coupled; less inspectable; explicitly rejected as source-of-truth in plan. | Operational backup only |
The existing /search endpoint already returns generated Space skills for application/ai-skill. Indexed repository skills need a routing or merge policy.
Add a source selector such as source=huggingface-skills so clients can explicitly search indexed skills without changing current default behavior.
Blend indexed skills with current Space-generated skills for omitted or application/ai-skill searches. Requires score normalization and deduplication policy.
Add a dedicated route for skills search. This isolates behavior but weakens the single Agent Finder discovery surface.
{
"identifier": "huggingface-skills:<doc-id>",
"displayName": "<skill_name>",
"description": "<title or skill_description>",
"mediaType": "application/ai-skill",
"url": "https://github.com/huggingface/skills/blob/<commit>/.../SKILL.md",
"score": 0.9679,
"metadata": {
"source": "huggingface/skills",
"sourceType": "meilisearch",
"skill": "huggingface-datasets",
"path": "skills/huggingface-datasets/SKILL.md",
"commit": "<commit>",
"rankingScore": 0.9679
}
}The proposed package splits pure document construction, artifact I/O, Meili operations, and sync policy. That keeps CLI, HTTP, and deployment hooks thin.
| Module | Responsibility | Design note |
|---|---|---|
skills_index/documents.py | Clone/walk/parse/chunk huggingface/skills into documents. | Mostly pure functions over fixture directories; easiest to test thoroughly. |
skills_index/artifacts.py | Read/write NDJSON and manifest.json. | Owns schema version and completion marker behavior. |
skills_index/meili.py | Configure, load, and search Meilisearch using the SDK. | Owns searchable/displayed/filterable/sortable attributes and ranking score flags. |
skills_index/sync.py | Compare artifact manifest with loaded index state and load only when stale. | Encapsulates startup/admin refresh idempotence. |
hf_skills.py | High-level Hugging Face skills source entrypoints. | Good place for API-facing search adapter that returns Agent Finder SearchResult. |
cli.py additions | agentfinder skills build/load/search/sync. | CLI should call the same core modules used by HTTP/deploy paths. |
latest/ artifact, manifest, and _SUCCESS; Space consumes it.AGENTFINDER_MEILI_TEST_URL.ty already enforces._SUCCESS.MEILI_MASTER_KEY and HF tokens must remain environment/request scoped.| Question | Suggested default | Reason |
|---|---|---|
| Merge indexed skills with Space-generated skills by default? | Not initially; require explicit source/filter. | Preserves current behavior and avoids score normalization problems. |
| Should result URLs point to GitHub blob, raw URL, or Agent Finder route? | Start with GitHub blob URL; add Agent Finder route later if clients need direct installable full skill/section documents. | Blob URLs are human-readable, stable for commits, and already in the planned schema. |
Index only SKILL.md or supporting files too? | Only SKILL.md in first production pass. | Lower schema/relevance complexity; supporting files can use reserved kind=supporting_file later. |
| Where should Meilisearch run? | Start colocated if operationally simple; move external when durability/scale requires it. | Keeps first deployment contained while preserving artifact portability. |
| Where does loaded manifest metadata live? | Sidecar file first; use Meili metadata/index document if SDK support is clean. | Sidecar is simple and avoids coupling to uncertain SDK metadata APIs. |
Is top-level score spec-compatible? | Keep top-level score because current SearchResult has it; also include metadata.rankingScore. | Matches existing repository model while retaining metadata fallback. |
The biggest deployment decision is whether Meili is colocated with the Space or separately hosted.
Bucket promotion must be atomic or guarded by _SUCCESS; consumers must never read in-progress builds.
Meili ranking scores are not automatically comparable with Hugging Face semantic Space scores, so merging needs care.