Design options · skills search · generated 13 May 2026

Adding first-class skills search to Agent Finder

This document translates plan/skills-search.md into design and deployment choices. The target feature indexes the external huggingface/skills repository into Meilisearch, then exposes relevant skill results through the existing Agent Finder CLI and REST surfaces.

Source: huggingface/skills · Artifact: NDJSON + manifest · Index: hf_skills · Runtime: agentfinder Space · Search: Meilisearch SDK
Canonical artifact: NDJSON
Implementation phases: 4
Expected skill docs: 400+
01

Recommended production shape

The plan’s strongest architecture separates content building from serving. A Hugging Face Job generates portable artifacts; the Agent Finder Space consumes them and syncs Meilisearch when stale.

1. huggingface/skills: the external content repository remains the source input only.
2. HF Job builds: agentfinder skills build parses SKILL.md, chunks sections, and writes the artifact.
3. Bucket stores: latest/ and commits/<sha>/ contain the NDJSON, manifest, and _SUCCESS.
4. Space syncs: on startup or refresh, it compares the manifest against the loaded state.
5. Meili serves: Agent Finder queries combine with, or route to, indexed skill results.
Recommendation

Use artifact-first indexing with Space sync-on-start

This is the best initial production option because the source-of-truth artifact is portable, inspectable, reproducible, and independent of Meilisearch internal backup formats.

recommended · decoupled · reproducible
Keep current service alive

Skills search should degrade independently

If Meilisearch or the skills artifact is unavailable, the existing Hugging Face Spaces backend should still serve /search; health/log metadata should report skills-index unavailability.
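A minimal sketch of the degrade-independently rule, assuming a hypothetical `safe_skills_search` wrapper around whatever callable the Meili adapter exposes. The existing Spaces path is called separately and untouched; the wrapper only converts skills-side failures into an empty result plus a status string that health and log metadata can report.

```python
from __future__ import annotations

import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("agentfinder.skills")


def safe_skills_search(
    query: str,
    search_skills: Optional[Callable[[str], list[dict[str, Any]]]],
) -> tuple[list[dict[str, Any]], str]:
    """Run the (hypothetical) skills adapter without letting it break /search.

    Returns (hits, status) where status is "not_configured", "ok", or
    "unavailable"; the existing Spaces search is called elsewhere as before.
    """
    if search_skills is None:
        return [], "not_configured"
    try:
        return search_skills(query), "ok"
    except Exception as exc:  # Meili down, missing artifact, bad response, ...
        logger.warning("skills index unavailable: %s", exc)
        return [], "unavailable"
```

Catching broadly is deliberate here: any skills-side error should become a reported status, never a failed /search response.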

02

Settled design decisions from the plan

These are not open options unless implementation uncovers a blocker. They define the baseline around which deployment options should be evaluated.

Use Meilisearch SDK

Integration code should use the Python meilisearch package, not raw HTTP, for index configuration, load, and search.
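As an illustration of SDK-based configuration, the sketch below applies index settings with `client.index(uid).update_settings(...)` and waits on the returned task with `client.wait_for_task(...)`, both calls from the Python meilisearch package. The attribute lists and the `configure_skills_index` name are assumptions; the plan names concepts such as heading_path, kind, path, and commit but does not fix the exact settings.

```python
from __future__ import annotations

from typing import Any

# Hypothetical document fields; the plan fixes the concepts (skill, heading_path,
# kind, path, commit) but not these exact attribute lists.
SKILLS_INDEX_SETTINGS: dict[str, Any] = {
    # Order matters: earlier searchable attributes rank higher in Meilisearch.
    "searchableAttributes": ["skill_name", "title", "heading_path", "content", "skill_description"],
    "displayedAttributes": ["*"],
    "filterableAttributes": ["skill", "kind", "commit"],
    "sortableAttributes": [],
}


def configure_skills_index(client: Any, index_uid: str = "hf_skills") -> Any:
    """Apply hf_skills settings through the SDK instead of raw HTTP calls.

    `client` is expected to be a meilisearch.Client. Settings updates are
    processed as asynchronous tasks, so the call waits for the task to finish.
    """
    index = client.index(index_uid)
    task = index.update_settings(SKILLS_INDEX_SETTINGS)
    client.wait_for_task(task.task_uid)
    return task
```

In a real run, `client` would be `meilisearch.Client(url, api_key)`; taking it as a parameter keeps meili.py testable with a fake.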

Use NDJSON as canonical artifact

Meilisearch dumps/snapshots can exist for operational backup, but should not be the long-lived content handoff format.

Own indexing in this repository

agentfinder owns schema, ingestion, Meili settings, sync, and API integration. huggingface/skills remains external source content.

Start with one document per section

Parse each SKILL.md, split by Markdown headings, preserve heading ancestry, and chunk long sections with overlap.
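The section model can be sketched in plain Python. This version assumes ATX headings only (`#` to `######`), ignores headings inside fenced code, skips YAML frontmatter handling, and uses word-based windows; `split_sections`, `chunk_words`, and the 200/40 sizes are hypothetical choices, not the plan's settled parameters (AST-aware chunking remains the later option C).

```python
from __future__ import annotations

import re
from dataclasses import dataclass

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
FENCE = "`" * 3  # a line starting with this opens or closes a code fence


@dataclass
class Section:
    heading_path: list[str]  # ancestry, e.g. ["Skill", "Usage", "Flags"]
    text: str


def split_sections(markdown: str) -> list[Section]:
    """Split Markdown by ATX headings, keeping the chain of parent headings."""
    sections: list[Section] = []
    stack: list[tuple[int, str]] = []  # (level, title) of currently open headings
    lines: list[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            sections.append(Section([title for _, title in stack], body))
        lines.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while stack and stack[-1][0] >= level:
                stack.pop()  # close siblings and deeper headings
            stack.append((level, match.group(2)))
        else:
            lines.append(line)
    flush()
    return sections


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split long text into word windows that overlap by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if len(words) <= max_words:
        return [text]
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words) - overlap, step)]
```

Each chunk of a section would become one NDJSON document carrying the same heading_path, so hits keep their context.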

Expose ranking score

Search requests should ask Meili for _rankingScore (via the showRankingScore search parameter) and carry it into the Agent Finder result's score and/or metadata.
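A small sketch of the search call, assuming a hypothetical `search_skills` helper that receives a meilisearch `Index` object. `Index.search(query, opt_params)` with `showRankingScore` is part of the Python SDK and recent Meilisearch versions; copying the value into `score` is one way to satisfy the decision above.

```python
from __future__ import annotations

from typing import Any


def search_skills(index: Any, query: str, limit: int = 10) -> list[dict[str, Any]]:
    """Search the skills index and copy Meili's _rankingScore into `score`.

    `index` is expected to be a meilisearch Index, e.g. client.index("hf_skills").
    """
    # showRankingScore asks Meilisearch to add _rankingScore (0.0 to 1.0) to every hit.
    response = index.search(query, {"limit": limit, "showRankingScore": True})
    return [
        {**hit, "score": float(hit.get("_rankingScore", 0.0))}
        for hit in response["hits"]
    ]
```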

Preserve existing Spaces behavior

Indexed skills are an additive backend. Existing Hugging Face Space search, MCP discovery, and generated Space skills continue to work.

03

Index design options

The initial document model is intentionally simple: one searchable document per skill section, with source metadata repeated on every hit.

A. Section documents
  Shape: One doc per Markdown heading section from SKILL.md; very long sections are split into parts.
  Pros: Good search granularity; preserves context through heading_path; easy to debug in NDJSON.
  Cons: Multiple hits can point to the same skill; needs grouping later if the UI wants skill-level summaries.
  Recommendation: Use first.

B. Whole skill documents
  Shape: One doc per SKILL.md.
  Pros: Simplest load and result shape; no duplicate skill hits.
  Cons: Poorer relevance for large skills; less precise snippets and context; more noise in ranking.
  Recommendation: Useful fallback only.

C. AST-aware chunks
  Shape: Markdown AST parsing with semantic chunking of lists, code blocks, and sections.
  Pros: Better chunk boundaries and future display quality.
  Cons: More implementation complexity; not needed for first production validation.
  Recommendation: Later improvement.

D. Include supporting files
  Shape: Index SKILL.md plus related text files.
  Pros: Broader recall for rich skills with references.
  Cons: Schema and relevance tuning become harder; more source-file edge cases.
  Recommendation: Reserve kind=supporting_file for later.
04

Artifact and sync design

Portable artifact contract

The build step writes two required files and one completion marker. Consumers should only read a directory after _SUCCESS is present.

bucket layout
/bucket/agentfinder/hf-skills/
  latest/
    hf-skills.ndjson
    manifest.json
    _SUCCESS
  commits/
    <commit>/
      hf-skills.ndjson
      manifest.json
      _SUCCESS
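A minimal sketch of this contract against a local or mounted directory, using the file names from the layout. The hypothetical `write_artifact` removes any stale `_SUCCESS`, writes the NDJSON and manifest, and creates `_SUCCESS` last; `read_artifact` refuses directories without it and checks `document_count`. Promotion into `latest/` on a real bucket would need the same marker-last ordering or an atomic rename, which this sketch does not cover.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterable

NDJSON_NAME = "hf-skills.ndjson"
MANIFEST_NAME = "manifest.json"
SUCCESS_NAME = "_SUCCESS"


def write_artifact(out_dir: Path, docs: Iterable[dict[str, Any]], manifest: dict[str, Any]) -> None:
    """Write NDJSON and manifest first; create _SUCCESS only after both exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / SUCCESS_NAME).unlink(missing_ok=True)  # invalidate any previous build
    count = 0
    with (out_dir / NDJSON_NAME).open("w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(json.dumps(doc, ensure_ascii=False) + "\n")
            count += 1
    manifest = {**manifest, "document_count": count, "artifact": NDJSON_NAME}
    (out_dir / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    (out_dir / SUCCESS_NAME).touch()  # completion marker, always written last


def read_artifact(art_dir: Path) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Refuse directories without _SUCCESS so in-progress builds are never loaded."""
    if not (art_dir / SUCCESS_NAME).exists():
        raise FileNotFoundError(f"{art_dir} has no {SUCCESS_NAME}; refusing to read")
    manifest = json.loads((art_dir / MANIFEST_NAME).read_text(encoding="utf-8"))
    with (art_dir / manifest["artifact"]).open(encoding="utf-8") as fh:
        docs = [json.loads(line) for line in fh if line.strip()]
    if len(docs) != manifest["document_count"]:
        raise ValueError("manifest document_count does not match the NDJSON file")
    return manifest, docs
```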

Manifest controls idempotence

The Space compares loaded state to source_commit and schema_version. If unchanged, it avoids reloading. If stale, it configures the index and loads hf-skills.ndjson.

manifest shape
{
  "schema_version": 1,
  "source_repo": "huggingface/skills",
  "source_branch": "main",
  "source_commit": "<sha>",
  "document_count": 435,
  "index": "hf_skills",
  "artifact": "hf-skills.ndjson"
}
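The idempotence check can be sketched with a sidecar file (the default suggested under the open questions). The hypothetical `needs_reload` compares only `source_commit` and `schema_version`, as described above, and `record_loaded` should run only after a load succeeds.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

# Fields that decide staleness, per the manifest description.
STATE_KEYS = ("source_commit", "schema_version")


def needs_reload(manifest: dict[str, Any], state_file: Path) -> bool:
    """Return True when nothing is recorded or the manifest has changed."""
    if not state_file.exists():
        return True  # no sidecar: treat the index as empty or unknown
    loaded = json.loads(state_file.read_text(encoding="utf-8"))
    return any(loaded.get(k) != manifest.get(k) for k in STATE_KEYS)


def record_loaded(manifest: dict[str, Any], state_file: Path) -> None:
    """Persist what was loaded; call only after the Meili load task succeeds."""
    state = {k: manifest.get(k) for k in STATE_KEYS}
    state["document_count"] = manifest.get("document_count")
    state_file.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

A sidecar alone cannot notice a colocated Meili that restarted with empty storage, so a real sync might also compare the live index document count before skipping.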
05

Deployment options

There are two distinct deployment choices: where artifacts are built, and where Meilisearch runs. The plan favors building artifacts in an HF Job and leaves Meili placement as an open question.

1. HF Job builds artifact; Space syncs; colocated Meili
  Description: The Job writes NDJSON to shared storage. The Agent Finder Space starts Meilisearch locally or in the same deployment envelope and syncs on startup.
  Strengths: Simple operational boundary; no separate external service; the artifact remains canonical.
  Risks / costs: Space startup complexity; process supervision; persistence semantics depend on Space storage; resource contention.
  Best use: Likely first hosted implementation.

2. HF Job builds artifact; Space syncs; external Meili
  Description: The Agent Finder Space connects to a separately hosted Meilisearch with durable storage.
  Strengths: Cleaner runtime separation; better durability and scaling; easier backups.
  Risks / costs: Needs service hosting, networking, secrets, and lifecycle management outside the Space.
  Best use: Production scale or shared Meili.

3. HF Job builds and loads Meili directly
  Description: The Job writes the artifact and, if MEILI_URL is reachable, also runs agentfinder skills load.
  Strengths: Space startup stays fast; the index is ready before serving.
  Risks / costs: The Job needs network access and secrets; failure handling is split between the artifact write and the index load.
  Best use: Good once the Meili endpoint is stable.

4. Space builds from GitHub on startup
  Description: The Space clones huggingface/skills, builds docs, and loads Meili during boot.
  Strengths: Few moving pieces; no separate artifact builder.
  Risks / costs: Slow and fragile startup; repeated work; GitHub dependency at serve time; harder to make atomic.
  Best use: Local experiments only.

5. GitHub Actions builds artifact
  Description: A CI workflow builds the NDJSON and uploads it to the bucket or to release assets.
  Strengths: Familiar automation; easy logs; independent of HF Job availability.
  Risks / costs: Cross-repo triggering and credentials; less aligned with the intended HF-hosted data path.
  Best use: Fallback or mirror builder.

6. Meili dump/snapshot as deploy artifact
  Description: The build imports data into Meili and exports a dump/snapshot; the runtime restores it.
  Strengths: Potentially fast restore for the same Meili version.
  Risks / costs: Version-coupled; less inspectable; explicitly rejected as source of truth in the plan.
  Best use: Operational backup only.
06

Agent Finder API integration options

The existing /search endpoint already returns generated Space skills for the application/ai-skill media type. Indexed repository skills need a routing or merge policy.

Option A · Source filter first

Add a source selector such as source=huggingface-skills so clients can explicitly search indexed skills without changing current default behavior.

recommended first · low risk
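Option A amounts to a small routing rule. In this sketch, `route_search` and the two adapter callables are hypothetical; rejecting unknown `source` values is an assumption, since the plan does not say how other values should behave.

```python
from __future__ import annotations

from typing import Any, Callable

SearchFn = Callable[[str], list[dict[str, Any]]]
SKILLS_SOURCE = "huggingface-skills"


def route_search(
    query: str,
    source: str | None,
    search_spaces: SearchFn,
    search_skills: SearchFn,
) -> list[dict[str, Any]]:
    """Only an explicit source=huggingface-skills reaches the Meili adapter."""
    if source == SKILLS_SOURCE:
        return search_skills(query)
    if source is None:
        return search_spaces(query)  # default behavior stays exactly as today
    # Assumption: unknown sources are rejected rather than silently ignored.
    raise ValueError(f"unknown source: {source!r}")
```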

Option B · Merge by default later

Blend indexed skills with current Space-generated skills for omitted or application/ai-skill searches. Requires score normalization and deduplication policy.

later · better UX
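Option B needs both score normalization and deduplication. One simple approach, shown here as an assumption rather than the plan's policy, is per-source min-max scaling followed by deduplication on `url` (falling back to `identifier`). `min_max` and `merge_results` are hypothetical names.

```python
from __future__ import annotations

from typing import Any


def min_max(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rescale one source's scores to [0, 1]; a flat list gets 1.0 everywhere."""
    if not results:
        return []
    scores = [r["score"] for r in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [{**r, "score": (r["score"] - lo) / span if span else 1.0} for r in results]


def merge_results(
    spaces: list[dict[str, Any]], skills: list[dict[str, Any]], limit: int = 10
) -> list[dict[str, Any]]:
    """Normalize each source separately, keep the best hit per url, sort by score."""
    best: dict[str, dict[str, Any]] = {}
    for r in min_max(spaces) + min_max(skills):
        key = r.get("url") or r["identifier"]
        if key not in best or r["score"] > best[key]["score"]:
            best[key] = r
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)[:limit]
```

Min-max gives each source's top hit 1.0 regardless of how relevant it actually is, so it does not make Meili and semantic Space scores truly comparable; it is a starting point for evaluation, not a fix.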

Option C · Separate endpoint

Add a dedicated route for skills search. This isolates behavior but weakens the single Agent Finder discovery surface.

debug/admin · less spec-aligned
proposed result shape from plan
{
  "identifier": "huggingface-skills:<doc-id>",
  "displayName": "<skill_name>",
  "description": "<title or skill_description>",
  "mediaType": "application/ai-skill",
  "url": "https://github.com/huggingface/skills/blob/<commit>/.../SKILL.md",
  "score": 0.9679,
  "metadata": {
    "source": "huggingface/skills",
    "sourceType": "meilisearch",
    "skill": "huggingface-datasets",
    "path": "skills/huggingface-datasets/SKILL.md",
    "commit": "<commit>",
    "rankingScore": 0.9679
  }
}
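A sketch of the adapter that produces this shape from a Meilisearch hit. The document field names read from `hit` (id, skill, skill_name, title, skill_description, path, commit) are assumptions chosen to match the placeholders above; `hit_to_result` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any

REPO_BLOB = "https://github.com/huggingface/skills/blob"


def hit_to_result(hit: dict[str, Any]) -> dict[str, Any]:
    """Map a Meilisearch hit (with _rankingScore) to the proposed result shape.

    Field names on `hit` are assumed document fields, not a confirmed schema.
    """
    score = float(hit.get("_rankingScore", 0.0))
    return {
        "identifier": f"huggingface-skills:{hit['id']}",
        "displayName": hit["skill_name"],
        "description": hit.get("title") or hit.get("skill_description", ""),
        "mediaType": "application/ai-skill",
        # Commit-pinned blob URL, so the link stays stable after later pushes.
        "url": f"{REPO_BLOB}/{hit['commit']}/{hit['path']}",
        "score": score,
        "metadata": {
            "source": "huggingface/skills",
            "sourceType": "meilisearch",
            "skill": hit["skill"],
            "path": hit["path"],
            "commit": hit["commit"],
            "rankingScore": score,
        },
    }
```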
07

Proposed code layout

The proposed package splits pure document construction, artifact I/O, Meili operations, and sync policy. That keeps CLI, HTTP, and deployment hooks thin.

skills_index/documents.py
  Responsibility: Clone, walk, parse, and chunk huggingface/skills into documents.
  Design note: Mostly pure functions over fixture directories; easiest to test thoroughly.

skills_index/artifacts.py
  Responsibility: Read and write NDJSON and manifest.json.
  Design note: Owns schema version and completion marker behavior.

skills_index/meili.py
  Responsibility: Configure, load, and search Meilisearch using the SDK.
  Design note: Owns searchable/displayed/filterable/sortable attributes and ranking score flags.

skills_index/sync.py
  Responsibility: Compare the artifact manifest with the loaded index state and load only when stale.
  Design note: Encapsulates startup/admin refresh idempotence.

hf_skills.py
  Responsibility: High-level Hugging Face skills source entrypoints.
  Design note: Good place for the API-facing search adapter that returns Agent Finder SearchResult.

cli.py additions
  Responsibility: agentfinder skills build/load/search/sync.
  Design note: The CLI should call the same core modules used by the HTTP and deploy paths.
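The CLI surface can be sketched with the standard library's argparse. The subcommand names come from the plan; the flags (`--source`, `--out`, `--artifact`, `--limit`) and the choice of argparse over another CLI library are assumptions. Each parsed command would dispatch into the same skills_index modules the HTTP path uses.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical `agentfinder skills ...` wiring; flag names are illustrative."""
    parser = argparse.ArgumentParser(prog="agentfinder")
    commands = parser.add_subparsers(dest="command", required=True)
    skills = commands.add_parser("skills", help="index and search huggingface/skills")
    actions = skills.add_subparsers(dest="action", required=True)

    build = actions.add_parser("build", help="parse SKILL.md files into an NDJSON artifact")
    build.add_argument("--source", required=True, help="local checkout of huggingface/skills")
    build.add_argument("--out", required=True, help="artifact output directory")

    # load always loads; sync loads only when the manifest is stale.
    for name in ("load", "sync"):
        sub = actions.add_parser(name, help=f"{name} an artifact into Meilisearch")
        sub.add_argument("--artifact", required=True, help="directory containing _SUCCESS")

    search = actions.add_parser("search", help="query the hf_skills index")
    search.add_argument("query")
    search.add_argument("--limit", type=int, default=10)
    return parser
```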
08

Implementation phases and acceptance

Phase 1 · Local pipeline: build NDJSON + manifest and load 400+ docs into local Meili; CLI search returns scored hits.
Phase 2 · API integration: add a Meili-backed adapter behind a source/filter option; existing Spaces tests keep passing.
Phase 3 · Artifact sync: startup/manual sync loads an empty or stale index and skips unchanged artifacts.
Phase 4 · HF Job: the Job writes the latest/ artifact, manifest, and _SUCCESS; the Space consumes it.
09

Testing and operations

Test strategy

  • Pure document-building tests over small temporary fixture directories.
  • Manifest and NDJSON read/write roundtrip tests.
  • CLI smoke tests for build against local fixtures.
  • Meilisearch integration smoke test gated by AGENTFINDER_MEILI_TEST_URL.
  • Avoid tests for type-only DTO properties that ty already enforces.
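Two of these tests sketched with the standard unittest module (the repository may use pytest instead; either is an assumption here). `dumps_ndjson` and `loads_ndjson` are hypothetical helpers; the integration test is skipped unless AGENTFINDER_MEILI_TEST_URL is set and uses the SDK's `client.health()` call.

```python
from __future__ import annotations

import json
import os
import unittest

MEILI_TEST_URL = os.environ.get("AGENTFINDER_MEILI_TEST_URL")


def dumps_ndjson(docs: list[dict]) -> str:
    return "".join(json.dumps(d, ensure_ascii=False) + "\n" for d in docs)


def loads_ndjson(text: str) -> list[dict]:
    return [json.loads(line) for line in text.splitlines() if line.strip()]


class NdjsonRoundtripTest(unittest.TestCase):
    def test_roundtrip_preserves_documents(self) -> None:
        docs = [{"id": "a", "heading_path": ["Skill", "Usage"]}, {"id": "b", "content": "naïve ✓"}]
        self.assertEqual(loads_ndjson(dumps_ndjson(docs)), docs)


@unittest.skipUnless(MEILI_TEST_URL, "set AGENTFINDER_MEILI_TEST_URL to run Meilisearch smoke tests")
class MeiliSmokeTest(unittest.TestCase):
    def test_server_is_healthy(self) -> None:
        import meilisearch  # imported only when the gated test actually runs

        client = meilisearch.Client(MEILI_TEST_URL, os.environ.get("MEILI_MASTER_KEY"))
        self.assertEqual(client.health()["status"], "available")
```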

Operational checks

  • Health metadata should say whether skills search is configured, reachable, and loaded.
  • Logs should include artifact commit, schema version, document count, and load result.
  • Refresh should be idempotent and refuse artifact directories without _SUCCESS.
  • Secrets: MEILI_MASTER_KEY and HF tokens must remain environment/request scoped.
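A sketch of the health metadata, assuming a hypothetical `skills_health` function, an injected `ping` callable, and a sidecar JSON file holding the manifest fields recorded after the last successful load. It reports configured, reachable, and loaded state plus commit, schema version, and document count, and never echoes the URL or keys.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable


def skills_health(meili_url: str | None, ping: Callable[[], bool], state_file: Path) -> dict[str, Any]:
    """Report configured / reachable / loaded without exposing URLs or secrets."""
    configured = bool(meili_url)
    reachable = False
    if configured:
        try:
            reachable = bool(ping())
        except Exception:  # network errors count as unreachable, not as a crash
            reachable = False
    loaded: dict[str, Any] = {}
    if state_file.exists():
        loaded = json.loads(state_file.read_text(encoding="utf-8"))
    return {
        "configured": configured,
        "reachable": reachable,
        "loaded": bool(loaded),
        "sourceCommit": loaded.get("source_commit"),
        "schemaVersion": loaded.get("schema_version"),
        "documentCount": loaded.get("document_count"),
    }
```

With the SDK, `ping` might be `lambda: client.health().get("status") == "available"`.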
10

Open questions and suggested defaults

Merge indexed skills with Space-generated skills by default?
  Suggested default: Not initially; require an explicit source/filter.
  Reason: Preserves current behavior and avoids score normalization problems.

Should result URLs point to a GitHub blob URL, a raw URL, or an Agent Finder route?
  Suggested default: Start with the GitHub blob URL; add an Agent Finder route later if clients need directly installable full skill or section documents.
  Reason: Blob URLs are human-readable, stable when pinned to a commit, and already in the planned schema.

Index only SKILL.md, or supporting files too?
  Suggested default: Only SKILL.md in the first production pass.
  Reason: Lower schema and relevance complexity; supporting files can use the reserved kind=supporting_file later.

Where should Meilisearch run?
  Suggested default: Start colocated if operationally simple; move to an external service when durability or scale requires it.
  Reason: Keeps the first deployment contained while preserving artifact portability.

Where does loaded manifest metadata live?
  Suggested default: A sidecar file first; use Meili metadata or an index document if SDK support is clean.
  Reason: A sidecar is simple and avoids coupling to uncertain SDK metadata APIs.

Is the top-level score spec-compatible?
  Suggested default: Keep the top-level score because the current SearchResult has it; also include metadata.rankingScore.
  Reason: Matches the existing repository model while retaining a metadata fallback.
11

Primary risks

Meili placement uncertainty

The biggest deployment decision is whether Meili is colocated with the Space or separately hosted.

Partial artifact reads

Bucket promotion must be atomic or guarded by _SUCCESS; consumers must never read in-progress builds.

Ranking comparability

Meili ranking scores are not automatically comparable with Hugging Face semantic Space scores, so merging needs care.