GET /v1/resources. For deep NPU/GPU counters, click "Run winml perf" on the Models page. Its results include HWMonitor data, which requires the --monitor flag of winml perf.
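A minimal standard-library sketch for polling GET /v1/resources from a script. It assumes the server is reachable at http://localhost:8000, which is a placeholder because the address winml serve listens on is not given here. The response fields are not documented in this section either, so each snapshot is returned as decoded JSON without interpretation. The `opener` parameter exists only so the HTTP call can be swapped out in tests.

```python
import json
import time
import urllib.request
from typing import Any, Callable, List

# Placeholder address: the host and port winml serve listens on are an
# assumption here, so change this to match your server.
BASE_URL = "http://localhost:8000"


def fetch_json(url: str, opener: Callable = urllib.request.urlopen) -> Any:
    """GET a URL and decode the JSON response body."""
    with opener(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_resources(base_url: str = BASE_URL, samples: int = 3,
                   interval_s: float = 1.0,
                   opener: Callable = urllib.request.urlopen) -> List[Any]:
    """Take `samples` snapshots of GET /v1/resources, `interval_s` apart.

    Each snapshot is the decoded JSON body, returned as-is.
    """
    snapshots: List[Any] = []
    for i in range(samples):
        snapshots.append(fetch_json(f"{base_url}/v1/resources", opener))
        if i < samples - 1:  # no sleep after the last sample
            time.sleep(interval_s)
    return snapshots
```

This covers only the lightweight memory and request stats; NPU/GPU counters still come from winml perf with --monitor.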
claude-sdk or anthropic-sdk.
Start the server: winml serve <model>
Claude Desktop config: %APPDATA%\Claude\claude_desktop_config.json
Claude Code config: .mcp.json in the project root
scripts/mcp_server.py avoids importing PyTorch, keeping startup under 4s.
Models are discovered via GET /v1/models, and each one becomes a uniquely-named tool:
- classification_image_<model>(image_path, top_k): reads a local file and sends it as a multipart upload
- classification_text_<model>(text, top_k): sends a JSON POST
- list_models(): enumerates loaded models
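Both config files named above hold a JSON object whose "mcpServers" key maps a server name to the command that launches it. The sketch below merges an entry for scripts/mcp_server.py into such a file without disturbing other servers. It assumes the script is started directly by a Python interpreter with no extra arguments, which this section does not confirm; the server name "winml" in the usage note is a hypothetical choice.

```python
import json
import os
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional


def claude_desktop_config() -> Path:
    r"""%APPDATA%\Claude\claude_desktop_config.json (Windows only)."""
    return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"


def project_mcp_config(project_root: Path) -> Path:
    """.mcp.json in the project root."""
    return project_root / ".mcp.json"


def mcp_entry(script: str, python: str = sys.executable,
              extra_args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Launch command for one MCP server.

    Assumption: mcp_server.py runs under a plain Python interpreter. An
    absolute script path is safest, since the client may start the process
    from a different working directory.
    """
    return {"command": python, "args": [script, *(extra_args or [])]}


def register_server(config_path: Path, name: str,
                    entry: Dict[str, Any]) -> Dict[str, Any]:
    """Add or replace config["mcpServers"][name], keeping any other servers."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `register_server(project_mcp_config(Path.cwd()), "winml", mcp_entry(str(Path("scripts/mcp_server.py").resolve())))` writes a project-scoped entry; pass `claude_desktop_config()` instead to target Claude Desktop.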
GET /v1/tools returns tool definitions in the OpenAI function-calling format, and GET /v1/mcp-schema returns tool definitions in MCP format. Both are generated automatically from the model manifest.
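The two formats wrap the same kind of information differently: an OpenAI function-calling tool nests `name`, `description` and a JSON Schema `parameters` object under `"function"`, while an MCP tool definition uses top-level `name`, `description` and `inputSchema`. The sketch below maps one to the other and lists names missing from an MCP list, which can serve as a quick consistency check between the two endpoints. It assumes GET /v1/tools returns a JSON list of such objects and that the MCP side is a list of dicts with a `"name"` key; the exact shape of the GET /v1/mcp-schema response is not confirmed here.

```python
from typing import Any, Dict, List


def openai_tool_to_mcp(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Map one OpenAI function-calling tool to an MCP-style tool definition."""
    if tool.get("type") != "function":
        raise ValueError(f"unsupported tool type: {tool.get('type')!r}")
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # OpenAI calls the JSON Schema "parameters"; MCP calls it "inputSchema".
        "inputSchema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


def tool_names(openai_tools: List[Dict[str, Any]]) -> List[str]:
    """Sorted tool names from an OpenAI-format tool list."""
    return sorted(openai_tool_to_mcp(t)["name"] for t in openai_tools)


def missing_from_mcp(openai_tools: List[Dict[str, Any]],
                     mcp_tools: List[Dict[str, Any]]) -> List[str]:
    """Names present in the OpenAI list but absent from the MCP list."""
    mcp_names = {t["name"] for t in mcp_tools}
    return [n for n in tool_names(openai_tools) if n not in mcp_names]
```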
.claude/skills/winmlcli-inference.md (Claude Code)
.cursorrules or .windsurfrules (Cursor or Windsurf)

API endpoints:
- POST /v1/predict: JSON inputs (text, base64 image, raw tensors)
- POST /v1/predict/file: Image file upload (multipart)
- GET /v1/tools: OpenAI function-calling tool definitions
- GET /v1/mcp-schema: MCP tool definitions
- GET /v1/models: All loaded models and their status
- GET /v1/schema: Request/response schema for the current model
- GET /v1/models/{id}/schema: Schema for a specific model
- GET /v1/hub: Model catalog (built-in + user cache)
- GET /v1/health: Server status and uptime
- POST /v1/models: Load a new model (multi-model mode)
- DELETE /v1/models/{id}: Unload a model
- POST /v1/ep: Switch execution provider
- GET /v1/resources: Memory and request stats
- GET /v1/models/{id}/stats: Live latency stats for a model
- GET /v1/logs: Poll recent log lines
- POST /v1/cli/{command}: Run any winml CLI command
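A minimal standard-library client for the two prediction endpoints. It assumes the base URL http://localhost:8000, a multipart form field named "file", and JSON responses from both endpoints; none of these details are confirmed in this section, so treat them as placeholders. The JSON payload is sent unchanged because its fields depend on the loaded model, and GET /v1/schema is the place to look them up.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

BASE_URL = "http://localhost:8000"  # placeholder: match your winml serve address


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post(url: str, body: bytes, content_type: str,
         opener: Callable = urllib.request.urlopen) -> Any:
    """POST raw bytes and decode the JSON reply."""
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with opener(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def predict_json(payload: Dict[str, Any], base_url: str = BASE_URL,
                 opener: Callable = urllib.request.urlopen) -> Any:
    """POST /v1/predict with a JSON body (fields as reported by GET /v1/schema)."""
    body = json.dumps(payload).encode("utf-8")
    return post(f"{base_url}/v1/predict", body, "application/json", opener)


def predict_file(path: str, base_url: str = BASE_URL, field: str = "file",
                 opener: Callable = urllib.request.urlopen) -> Any:
    """POST /v1/predict/file with a local image as a multipart upload.

    The form field name "file" is an assumption, not a documented value.
    """
    p = Path(path)
    ctype = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    body, header = encode_multipart(field, p.name, p.read_bytes(), ctype)
    return post(f"{base_url}/v1/predict/file", body, header, opener)
```

With a server started by winml serve <model>, `predict_file("cat.jpg")` uploads a local image, while `predict_json({...})` needs whatever fields GET /v1/schema reports for that model.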