Repository explainer · generated 13 May 2026

How hf-discover works

This repository is a thin ARD registry adapter over Hugging Face Spaces. It exposes Hugging Face semantic Space search as ARD catalog entries through a CLI, a REST API, and generated SKILL.md artifacts.

Python ≥ 3.14 · FastAPI · Typer CLI · Pydantic v2 · Hugging Face Spaces backend
6 source modules · 3 artifact kinds · 1 backend registry

01

Mental model

The core of the project is an adapter pipeline: query Hugging Face’s agent-oriented semantic search, keep only running Spaces, and project each Space into ARD’s value-or-reference catalog envelope.

1. Client request: CLI command or POST /search with query text and optional media type.
2. Kind selection: application/ai-skill, application/mcp-server+json, raw Space JSON, or all.
3. HF search: /api/spaces/semantic-search with agents=true; MCP adds filter=mcp-server.
4. Runtime gate: only Spaces whose runtime stage is exactly RUNNING are emitted.
5. Catalog entries: results become skill URLs, inline MCP descriptors, or inline raw Space metadata.
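
The runtime gate in step 4 can be sketched as follows. This is a minimal illustration, not the repository's code: the `Space` class and `emit_running` function are hypothetical names, and applying the limit after the filter is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Space:
    """Hypothetical stand-in for one Hugging Face search hit."""

    space_id: str  # "owner/name"
    stage: str  # runtime stage reported by Hugging Face
    tags: tuple[str, ...] = ()


def emit_running(spaces: list[Space], limit: int) -> list[Space]:
    """Keep only Spaces whose stage is exactly RUNNING, then apply the limit."""
    running = [s for s in spaces if s.stage == "RUNNING"]
    return running[:limit]


hits = [
    Space("acme/bg-remover", "RUNNING"),
    Space("acme/sleepy", "SLEEPING"),
    Space("acme/builder", "BUILDING"),
    Space("acme/upscaler", "RUNNING", ("mcp-server",)),
]
emitted = emit_running(hits, limit=5)
print([s.space_id for s in emitted])
```

Only the two RUNNING Spaces survive the gate; the sleeping and building ones are dropped before any catalog entry is built.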
02

Main features

The README frames the repository as a small adapter, but it already exposes the same core search through multiple surfaces.

Spaces semantic search

Calls Hugging Face’s semantic Spaces API with agents=true, optional SDK/tag filters, and optional request-scoped token forwarding.

core · hf_search.py
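
As a rough sketch of how such a query URL could be assembled: the path, `agents=true`, and `filter=mcp-server` come from this explainer, while `build_search_url` and the `q` parameter name are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_PATH = "/api/spaces/semantic-search"  # path named in this explainer


def build_search_url(base: str, text: str, mcp_only: bool = False) -> str:
    """Build a semantic-search URL (hypothetical helper, not the real client)."""
    # "q" is an assumed name for the query-text parameter.
    params = [("q", text), ("agents", "true")]
    if mcp_only:
        # MCP searches add the mcp-server filter.
        params.append(("filter", "mcp-server"))
    return f"{base}{SEARCH_PATH}?{urlencode(params)}"


url = build_search_url("https://huggingface.co", "remove background", mcp_only=True)
query = parse_qs(urlsplit(url).query)
print(url)
```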

ARD REST API

POST /search accepts the ARD-ish search envelope and returns SearchResponse models with compact JSON output.

HTTP · server.py

Generated skills

GET /skills/huggingface/{owner}/{space}/SKILL.md wraps a Space’s agents.md with skill frontmatter and source metadata.

skill · hf_spaces.py
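
A generated skill could be assembled along these lines. This is a sketch only: the function name `build_skill_markdown` and the frontmatter fields (`name`, `description`, `source`) are assumptions and may not match what build_space_skill_markdown emits.

```python
import json


def build_skill_markdown(space_id: str, description: str, agents_md: str) -> str:
    """Wrap a Space's agents.md body in skill frontmatter (hypothetical layout)."""
    name = space_id.replace("/", "-").lower()
    frontmatter = "\n".join(
        [
            "---",
            f"name: {name}",
            # json.dumps yields a double-quoted string, which keeps colons safe.
            f"description: {json.dumps(description)}",
            f"source: https://huggingface.co/spaces/{space_id}",
            "---",
        ]
    )
    return f"{frontmatter}\n\n{agents_md.strip()}\n"


doc = build_skill_markdown(
    "acme/bg-remover", "Remove image backgrounds", "Call /predict with an image.\n"
)
print(doc)
```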

MCP server discovery

Spaces tagged mcp-server can be returned as application/mcp-server+json entries pointing at the Gradio MCP endpoint.

MCP · /gradio_api/mcp/
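
The tag check and the inline descriptor might look like the sketch below. The `mcp_entry_data` helper and the `transport`/`url` key names are assumptions; only the mcp-server tag and the /gradio_api/mcp/ path are taken from this explainer.

```python
from typing import Optional


def mcp_entry_data(tags: list[str], domain: str) -> Optional[dict]:
    """Return inline MCP data for a tagged Space, or None when it is not tagged."""
    if "mcp-server" not in tags:
        return None
    # Key names are illustrative; the real descriptor schema may differ.
    return {"transport": "http", "url": f"https://{domain}/gradio_api/mcp/"}


tagged = mcp_entry_data(["gradio", "mcp-server"], "acme-upscaler.hf.space")
untagged = mcp_entry_data(["gradio"], "acme-plain.hf.space")
print(tagged, untagged)
```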

CLI surface

discover spaces search and the top-level discover search command share the same search core and can print tables or JSON.

Typer · Rich

Release/deploy automation

CI and release scripts run locked sync, Ruff format/lint, ty, pytest, package build checks, and deployment hooks for a Hugging Face Space.

GitHub Actions · uv
03

Code structure

The source package is intentionally small. The best reading path is models → HF transport → conversion core → API/CLI wrappers.

| File | Primary role | Important structures |
| --- | --- | --- |
| src/discover/models.py | Protocol-shaped Pydantic models. | CatalogEntry, SearchQuery, SearchRequest, SearchResult, SearchResponse |
| src/discover/hf_search.py | Low-level Hugging Face HTTP client and JSON normalization. | HfSemanticSpaceSearcher, SpaceSearchResult, SpaceRuntime |
| src/discover/hf_spaces.py | Core domain adapter: URLs, identifiers, filtering, result conversion, skill generation. | search_hf_spaces, space_to_*_result, build_space_skill_markdown |
| src/discover/server.py | FastAPI wrapper over the core adapter. | create_app, search_discover, token helpers, skill/agents routes |
| src/discover/cli.py | Typer commands and Rich table formatting. | search_alias, spaces_search, serve, _search_response |
| tests/test_hf_spaces.py | Behavioral coverage with stubs and a local HTTP server. | Conversion tests, search filtering, token precedence, OpenAPI examples, generated skill frontmatter |

Repository map

src/discover/
├─ models.py ARD envelopes
├─ hf_search.py HF semantic-search client
├─ hf_spaces.py adapter core
├─ server.py FastAPI app
└─ cli.py Typer entry point

spec/ ARD draft + HF search notes
deploy/huggingface-space/ Docker Space runtime
scripts/ release/version checks
04

Data model and artifact kinds

The models mirror the ARD draft: every catalog entry must have identity fields and exactly one content delivery mechanism, either url or data.

Value-or-reference rule

CatalogEntry.validate_value_or_reference rejects entries with both url and data, or neither. This keeps REST results predictable for clients.

models.py · core invariant
if (self.url is None) == (self.data is None):
    raise ValueError("exactly one of url or data must be present")

| Kind | Media type | Delivery |
| --- | --- | --- |
| Generated skill | application/ai-skill | url to generated SKILL.md |
| MCP server | application/mcp-server+json | Inline data with HTTP transport and MCP URL |
| Raw HF Space | application/vnd.huggingface.space+json | Inline data with Space metadata |
| Legacy Space alias | application/huggingface-space+json | Accepted by HTTP routing as raw Space output |
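
To see the value-or-reference rule in isolation, here is a stand-in that uses a standard-library dataclass instead of the Pydantic model the repository actually uses. The `Entry` class, its field names, and the example URL are hypothetical; only the comparison inside `__post_init__` mirrors the invariant quoted above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Entry:
    """Dataclass stand-in for CatalogEntry (the real model is Pydantic)."""

    identifier: str
    media_type: str
    url: Optional[str] = None
    data: Optional[dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Both set or both missing compare equal, so either case is rejected.
        if (self.url is None) == (self.data is None):
            raise ValueError("exactly one of url or data must be present")


def is_valid(**fields: Any) -> bool:
    """Report whether an Entry can be constructed from the given fields."""
    try:
        Entry(**fields)
    except ValueError:
        return False
    return True


by_reference = is_valid(
    identifier="acme/bg-remover",
    media_type="application/ai-skill",
    url="https://example.invalid/skills/huggingface/acme/bg-remover/SKILL.md",
)
by_value = is_valid(
    identifier="acme/upscaler",
    media_type="application/mcp-server+json",
    data={"transport": "http"},
)
neither = is_valid(identifier="acme/empty", media_type="application/ai-skill")
print(by_reference, by_value, neither)
```

Comparing the two `is None` checks for equality rejects the both-set and both-missing cases with a single condition.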
05

Request paths

CLI search

The CLI builds a SearchResponse through _search_response, then prints either JSON or a Rich table with score, type, name, SDK, stage, endpoint, and description.

usage
uv run discover spaces search "generate image" --limit 5
uv run discover spaces search "image generation" --kind mcp --json
uv run discover serve --port 8080

HTTP search

search_discover maps request media type to a result kind. Unsupported media types intentionally return an empty result set rather than attempting a best-effort conversion.

POST /search
{
  "query": {
    "text": "remove background from image",
    "mediaType": "application/ai-skill"
  },
  "pageSize": 5
}
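
The media-type routing described under HTTP search can be sketched as a lookup table. The kind labels (`skill`, `mcp`, `space`, `all`) and the `result_kind` helper are illustrative assumptions; the real server._result_kind may be structured differently.

```python
from typing import Optional

# Media types listed in this explainer, mapped to assumed kind labels.
KIND_BY_MEDIA_TYPE = {
    "application/ai-skill": "skill",
    "application/mcp-server+json": "mcp",
    "application/vnd.huggingface.space+json": "space",
    "application/huggingface-space+json": "space",  # legacy alias
}


def result_kind(media_type: Optional[str]) -> Optional[str]:
    """Map a request media type to a kind; None means 'return no results'."""
    if media_type is None:
        return "all"  # an omitted media type searches every kind
    return KIND_BY_MEDIA_TYPE.get(media_type)


print(result_kind("application/ai-skill"), result_kind("text/plain"))
```

Returning `None` for an unknown media type lets the caller respond with an empty result set instead of guessing at a conversion.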
06

Important boundaries and safeguards

Network I/O is isolated

hf_search.py owns the semantic search HTTP request. server.py owns fetching remote agents.md and runs that fetch in a threadpool so blocking urllib calls do not block the async route.
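
The threadpool hand-off can be illustrated with `asyncio.to_thread` from the standard library. This is a swapped-in technique for the sake of a self-contained example: the explainer does not say which threadpool helper server.py uses, and `fetch_blocking` is a fake that sleeps instead of calling urllib.

```python
import asyncio
import time


def fetch_blocking(url: str) -> str:
    """Pretend to be a blocking urllib fetch of agents.md (no network here)."""
    time.sleep(0.05)
    return f"# agents.md for {url}"


async def fetch_agents_md(url: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(fetch_blocking, url)


async def main() -> list[str]:
    # Two fetches overlap instead of running back to back.
    return await asyncio.gather(fetch_agents_md("a"), fetch_agents_md("b"))


results = asyncio.run(main())
print(results)
```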

Token handling is request-scoped

The server checks X-HF-Authorization, then Authorization, then HF_TOKEN. A header value overrides the configured token and is only forwarded downstream for that request; it is not stored.
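
A minimal sketch of that precedence order follows. The `resolve_token` and `_bearer` helpers are hypothetical, and stripping a `Bearer ` prefix is an assumption about how header values are normalized.

```python
from typing import Mapping, Optional


def _bearer(value: Optional[str]) -> Optional[str]:
    """Strip an optional 'Bearer ' prefix; return None for empty values."""
    if not value:
        return None
    value = value.strip()
    if value.lower().startswith("bearer "):
        value = value[len("bearer "):].strip()
    return value or None


def resolve_token(headers: Mapping[str, str], configured: Optional[str]) -> Optional[str]:
    """Pick X-HF-Authorization, then Authorization, then the configured token."""
    lowered = {key.lower(): val for key, val in headers.items()}
    return (
        _bearer(lowered.get("x-hf-authorization"))
        or _bearer(lowered.get("authorization"))
        or configured
    )


print(resolve_token({"Authorization": "Bearer from-header"}, "from-env"))
```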

Runtime readiness is strict

Even when the search call includes non-running Spaces, search_hf_spaces emits only results with runtime stage RUNNING.

Domain selection prefers runtime metadata

When Hugging Face runtime metadata includes a domain, generated app and MCP URLs use it. Otherwise code falls back to the standard {owner}-{space}.hf.space slug.
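
The fallback can be sketched as below. `space_domain` is a hypothetical helper, and the lowercase join is a simplification: the real slug normalization may handle other characters differently.

```python
from typing import Optional


def space_domain(space_id: str, runtime_domain: Optional[str] = None) -> str:
    """Prefer the runtime-reported domain, else derive the default slug."""
    if runtime_domain:
        return runtime_domain
    owner, name = space_id.split("/", 1)
    # Assumed normalization: lowercase only.
    return f"{owner}-{name}.hf.space".lower()


print(space_domain("acme/bg-remover"))
print(space_domain("acme/bg-remover", "custom.example.invalid"))
```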

07

Tests as executable documentation

The single test file covers the meaningful behavior of the adapter without requiring live Hugging Face calls for most cases.

| Behavior | Evidence in tests | Why it matters |
| --- | --- | --- |
| Skill, Space, and MCP conversions | test_space_to_search_result_*, MCP URL tests | Ensures entries satisfy ARD artifact shapes. |
| Running-only filtering | test_search_hf_spaces_only_returns_running_spaces | Prevents dead or building Spaces from being advertised. |
| HF query construction | Local HTTPServer captures path and Authorization header | Verifies agents=true, filters, non-running flag, and token forwarding. |
| HTTP media-type routing | test_discover_search_* | Documents empty unsupported-media behavior and MCP routing. |
| Token precedence | hf_token_from_headers and route forwarding tests | Supports the README’s request-scoped HF_TOKEN requirement. |
| OpenAPI clarity | Examples and auth header documentation tests | Makes the HTTP surface discoverable for clients. |
08

Where to change things

Add a new artifact kind

Start in hf_spaces.py with a converter and media type, then update server._result_kind, CLI _result_type, docs, and tests.

Change result metadata

Edit _space_metadata and the relevant space_to_*_result function. Watch the value-or-reference invariant in CatalogEntry.

Change search behavior

Update search_hf_spaces for filtering/limiting semantics, or HfSemanticSpaceSearcher.search_spaces for Hugging Face query parameters.

Change generated skills

Modify build_space_skill_markdown. Tests already assert required frontmatter and source instruction inclusion.

Change HTTP API

Use create_app and search_discover in server.py. Keep the CLI and HTTP layers wrapping the same core logic.

Release or deploy

Use scripts/check-release.sh, GitHub Actions, and deploy/huggingface-space/. README documents the release flow.

09

Gotchas and caveats

  • include_non_running is not an output policy. It is passed to Hugging Face search, but emitted results are still filtered to RUNNING.
  • Default kind differs by layer. Core search_hf_spaces defaults to skill; the CLI and the HTTP route default to all when the media type is omitted.
  • Skill URLs are adapter URLs. Search results point back to this server’s generated skill route, not directly to Hugging Face.
  • MCP is tag-based. MCP entries are only returned for Spaces tagged mcp-server.
  • No persistence layer. This service is a live adapter/proxy. It does not cache search results, store tokens, or maintain a registry database.