Not connected
🎯
Provide input and click Run Inference
📋 Request / Response

Models

Active

0

WinML CLI Catalog

Loading catalog…
💡 Live gauges use GET /v1/resources. For deep NPU/GPU counters, click Run winml perf on the Models page — results include HWMonitor data (requires --monitor flag in winml perf).
Process RSS
MB (Python + ORT + model)
Request Count
total requests
Uptime
seconds
Inference Latency — last 20 requests
p50 ms p95 ms
20 requests agonow
Process Memory (RSS) — rolling
current: MB
oldestnow

Tools

💡 OpenAI-compatible function-calling tool definitions. Auto-generated from the model manifest.
Loading tools…

MCP Server

💡 Model Context Protocol (MCP) tools for Claude integration. Use with claude-sdk or anthropic-sdk.
Loading MCP schema…
Client Integration Guide
WinML CLI exposes inference via MCP, OpenAI-compatible tools, REST API, and more. Pick your integration method below.
Skillrecommended
MCP
OpenAI Tools
Agent Frameworks
REST API
Reference
Claude Desktop
Full MCP support with auto-discovered tools
  1. Start WinML CLI serve: winml serve <model>
  2. Edit %APPDATA%\Claude\claude_desktop_config.json
  3. Add the config below, then restart Claude Desktop
The standalone scripts/mcp_server.py avoids importing PyTorch, keeping startup under 4s.
Claude Code (.mcp.json)
Project-scoped MCP for Claude Code IDE / CLI
  1. Place .mcp.json in the project root
  2. Claude Code auto-discovers MCP servers on startup
Claude Code CLI (one-liner)
Register via command line
How it works
MCP server starts alongside Claude and auto-discovers loaded models via GET /v1/models. Each model becomes a uniquely-named tool:
Image tasks → classification_image_<model>(image_path, top_k) — reads local file, multipart upload
Text tasks → classification_text_<model>(text, top_k) — JSON POST
Always available → list_models() — enumerate loaded models
Anthropic SDK (Python)
Use WinML CLI tools with Claude API tool_use
OpenAI SDK (Python)
Works with any OpenAI-compatible provider
GET /v1/tools returns OpenAI function-calling format. GET /v1/mcp-schema returns MCP format. Both are auto-generated from the model manifest.
Tool-calling loop pattern
Full agentic loop: Claude decides when to call WinML CLI
WinML CLI exposes standard REST endpoints that any agent framework can call. Below are copy-paste examples for the most popular frameworks.
LangChain / LangGraph
Custom tool with @tool decorator
Semantic Kernel
Native function plugin
AutoGen / AG2
Register function for ConversableAgent
CrewAI
Custom BaseTool subclass
OpenAI Agents SDK
Function tool for the Agents framework
Image inference (file upload)
Multipart upload — best for image tasks
Image inference (base64 JSON)
Base64-encoded image in JSON body
Text inference
JSON body — for NLP / text classification
Python (httpx)
Programmatic access without any SDK
What is a Skill?
Skills are reusable prompt templates that AI coding agents (Claude Code, Cursor, Windsurf, etc.) can load automatically. Create a skill to give the agent persistent knowledge of your WinML CLI server — it learns the available endpoints, input formats, and usage patterns without needing MCP or tool registration.
Claude Code
Save as .claude/skills/winmlcli-inference.md
Cursor / Windsurf
Save as .cursorrules or .windsurfrules
Generic (any agent)
System prompt snippet — paste into any AI agent's instructions
When to use Skills vs MCP vs Tools
Skill — Agent calls REST API via shell/code. Zero setup, works everywhere. Best for quick integration and custom workflows.
MCP — Agent auto-discovers typed tools. Richer UX, but requires MCP-compatible client and server process.
OpenAI Tools — Programmatic tool-calling loop via SDK. Best for custom applications and pipelines.
Combine them: Skill for domain knowledge + MCP for native tool calling = best of both.
API Endpoints
Inference
POST /v1/predict — JSON inputs (text, base64 image, raw tensors)
POST /v1/predict/file — Image file upload (multipart)

Discovery
GET /v1/tools — OpenAI function-calling tool definitions
GET /v1/mcp-schema — MCP tool definitions
GET /v1/models — All loaded models and status
GET /v1/schema — Request/response schema for current model
GET /v1/models/{id}/schema — Schema for a specific model
GET /v1/hub — Model catalog (built-in + user cache)

Management
GET /v1/health — Server status and uptime
POST /v1/models — Load a new model (multi-model mode)
DELETE /v1/models/{id} — Unload a model
POST /v1/ep — Switch execution provider
GET /v1/resources — Memory and request stats
GET /v1/models/{id}/stats — Live latency stats for a model
GET /v1/logs — Poll recent log lines
POST /v1/cli/{command} — Run any winml CLI command
Server Info
Loading...