Metadata-Version: 2.4
Name: xinference
Version: 2.8.0
Summary: Model Serving Made Easy
Home-page: https://github.com/xorbitsai/inference
Author: Qin Xuye
Author-email: qinxuye@xprobe.io
Maintainer: Qin Xuye
Maintainer-email: qinxuye@xprobe.io
License: Apache License 2.0
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: xoscar>=0.9.6
Requires-Dist: torch
Requires-Dist: gradio<6.0.0
Requires-Dist: pillow
Requires-Dist: click<8.2.0
Requires-Dist: tqdm>=4.27
Requires-Dist: tabulate
Requires-Dist: requests
Requires-Dist: aiohttp
Requires-Dist: pydantic
Requires-Dist: fastapi>=0.110.3
Requires-Dist: uvicorn
Requires-Dist: huggingface-hub>=0.19.4
Requires-Dist: typing_extensions
Requires-Dist: modelscope>=1.19.0
Requires-Dist: sse_starlette>=1.6.5
Requires-Dist: openai>=1.40.0
Requires-Dist: python-jose[cryptography]
Requires-Dist: bcrypt>=4.0.0
Requires-Dist: aioprometheus[starlette]>=23.12.0
Requires-Dist: nvidia-ml-py
Requires-Dist: pynvml>=12
Requires-Dist: async-timeout
Requires-Dist: peft<=0.17.1
Requires-Dist: timm
Requires-Dist: setproctitle
Requires-Dist: uv
Provides-Extra: dev
Requires-Dist: cython>=0.29; extra == "dev"
Requires-Dist: pytest>=3.5.0; extra == "dev"
Requires-Dist: pytest-cov>=2.5.0; extra == "dev"
Requires-Dist: pytest-timeout>=1.2.0; extra == "dev"
Requires-Dist: pytest-forked>=1.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.14.0; extra == "dev"
Requires-Dist: pytest-mock>=3.11.1; extra == "dev"
Requires-Dist: ipython>=6.5.0; extra == "dev"
Requires-Dist: sphinx>=3.0.0; extra == "dev"
Requires-Dist: pydata-sphinx-theme>=0.3.0; extra == "dev"
Requires-Dist: sphinx-intl>=0.9.9; extra == "dev"
Requires-Dist: jieba>=0.42.0; extra == "dev"
Requires-Dist: flake8>=3.8.0; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: openai>=1.40.0; extra == "dev"
Requires-Dist: anthropic; extra == "dev"
Requires-Dist: langchain; extra == "dev"
Requires-Dist: langchain-community; extra == "dev"
Requires-Dist: langchain-openai; extra == "dev"
Requires-Dist: orjson; extra == "dev"
Requires-Dist: sphinx-tabs; extra == "dev"
Requires-Dist: sphinx-design; extra == "dev"
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-instrumentation-fastapi>=0.41b0; extra == "otel"
Requires-Dist: opentelemetry-instrumentation-httpx>=0.41b0; extra == "otel"
Provides-Extra: all
Requires-Dist: anthropic; extra == "all"
Requires-Dist: xllamacpp>=0.2.0; extra == "all"
Requires-Dist: transformers>=4.53.3; extra == "all"
Requires-Dist: torch; extra == "all"
Requires-Dist: accelerate>=0.28.0; extra == "all"
Requires-Dist: sentencepiece; extra == "all"
Requires-Dist: transformers_stream_generator; extra == "all"
Requires-Dist: bitsandbytes; sys_platform == "linux" and extra == "all"
Requires-Dist: protobuf; extra == "all"
Requires-Dist: einops; extra == "all"
Requires-Dist: tiktoken; extra == "all"
Requires-Dist: optimum; extra == "all"
Requires-Dist: attrdict; extra == "all"
Requires-Dist: timm>=0.9.16; extra == "all"
Requires-Dist: torchvision; extra == "all"
Requires-Dist: peft; extra == "all"
Requires-Dist: eva-decord; extra == "all"
Requires-Dist: jj-pytorchvideo; extra == "all"
Requires-Dist: qwen-vl-utils!=0.0.9; extra == "all"
Requires-Dist: qwen_omni_utils; extra == "all"
Requires-Dist: datamodel_code_generator; extra == "all"
Requires-Dist: jsonschema; extra == "all"
Requires-Dist: blobfile; extra == "all"
Requires-Dist: vllm>=0.2.6; sys_platform == "linux" and extra == "all"
Requires-Dist: xxhash; extra == "all"
Requires-Dist: mlx-lm>=0.21.5; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
Requires-Dist: mlx-vlm>=0.3.4; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
Requires-Dist: mlx-whisper; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
Requires-Dist: f5-tts-mlx; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
Requires-Dist: mlx-audio; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
Requires-Dist: qwen_vl_utils!=0.0.9; extra == "all"
Requires-Dist: tomli; extra == "all"
Requires-Dist: sentence-transformers>=3.1.0; extra == "all"
Requires-Dist: FlagEmbedding; extra == "all"
Requires-Dist: datasets>=3.4.0; extra == "all"
Requires-Dist: FlagEmbedding; extra == "all"
Requires-Dist: datasets>=3.4.0; extra == "all"
Requires-Dist: diffusers>=0.32.0; extra == "all"
Requires-Dist: controlnet_aux; extra == "all"
Requires-Dist: deepcache; extra == "all"
Requires-Dist: verovio>=4.3.1; extra == "all"
Requires-Dist: transformers>=4.53.3; extra == "all"
Requires-Dist: tiktoken>=0.6.0; extra == "all"
Requires-Dist: accelerate>=0.28.0; extra == "all"
Requires-Dist: torch; extra == "all"
Requires-Dist: torchvision; extra == "all"
Requires-Dist: gguf; extra == "all"
Requires-Dist: diffusers>=0.32.0; extra == "all"
Requires-Dist: imageio-ffmpeg; extra == "all"
Requires-Dist: funasr==1.2.7; extra == "all"
Requires-Dist: omegaconf~=2.3.0; extra == "all"
Requires-Dist: nemo_text_processing<=1.1.0; sys_platform == "linux" and extra == "all"
Requires-Dist: WeText; extra == "all"
Requires-Dist: librosa; extra == "all"
Requires-Dist: xxhash; extra == "all"
Requires-Dist: torch>=2.0.0; extra == "all"
Requires-Dist: torchaudio>=2.0.0; extra == "all"
Requires-Dist: ChatTTS>=0.2.1; extra == "all"
Requires-Dist: tiktoken; extra == "all"
Requires-Dist: lightning>=2.0.0; extra == "all"
Requires-Dist: hydra-core>=1.3.2; extra == "all"
Requires-Dist: inflect; extra == "all"
Requires-Dist: conformer; extra == "all"
Requires-Dist: diffusers>=0.32.0; extra == "all"
Requires-Dist: gdown; extra == "all"
Requires-Dist: pyarrow; extra == "all"
Requires-Dist: HyperPyYAML; extra == "all"
Requires-Dist: onnxruntime>=1.16.0; extra == "all"
Requires-Dist: pyworld>=0.3.4; extra == "all"
Requires-Dist: loguru; extra == "all"
Requires-Dist: natsort; extra == "all"
Requires-Dist: loralib; extra == "all"
Requires-Dist: ormsgpack; extra == "all"
Requires-Dist: cachetools; extra == "all"
Requires-Dist: silero-vad; extra == "all"
Requires-Dist: vector-quantize-pytorch<=1.17.3,>=1.14.24; extra == "all"
Requires-Dist: torchdiffeq; extra == "all"
Requires-Dist: x_transformers>=1.31.14; extra == "all"
Requires-Dist: pypinyin; extra == "all"
Requires-Dist: tomli; extra == "all"
Requires-Dist: vocos; extra == "all"
Requires-Dist: librosa; extra == "all"
Requires-Dist: jieba; extra == "all"
Requires-Dist: soundfile; extra == "all"
Requires-Dist: cached_path; extra == "all"
Requires-Dist: unidic-lite; extra == "all"
Requires-Dist: cn2an; extra == "all"
Requires-Dist: mecab-python3; extra == "all"
Requires-Dist: num2words; extra == "all"
Requires-Dist: pykakasi; extra == "all"
Requires-Dist: fugashi; extra == "all"
Requires-Dist: g2p_en; extra == "all"
Requires-Dist: anyascii; extra == "all"
Requires-Dist: gruut[de,es,fr]; extra == "all"
Requires-Dist: kokoro>=0.7.15; extra == "all"
Requires-Dist: misaki[en,zh]>=0.7.15; extra == "all"
Requires-Dist: langdetect; extra == "all"
Requires-Dist: pyloudnorm; extra == "all"
Requires-Dist: json5; extra == "all"
Requires-Dist: munch; extra == "all"
Requires-Dist: matplotlib; extra == "all"
Requires-Dist: flatten_dict; extra == "all"
Requires-Dist: julius; extra == "all"
Requires-Dist: tensorboard; extra == "all"
Requires-Dist: randomname; extra == "all"
Requires-Dist: argbind; extra == "all"
Provides-Extra: intel
Requires-Dist: torch==2.1.0a0; extra == "intel"
Requires-Dist: intel_extension_for_pytorch==2.1.10+xpu; extra == "intel"
Provides-Extra: llama-cpp
Requires-Dist: xllamacpp>=0.2.0; extra == "llama-cpp"
Provides-Extra: transformers
Requires-Dist: transformers>=4.53.3; extra == "transformers"
Requires-Dist: torch; extra == "transformers"
Requires-Dist: accelerate>=0.28.0; extra == "transformers"
Requires-Dist: sentencepiece; extra == "transformers"
Requires-Dist: transformers_stream_generator; extra == "transformers"
Requires-Dist: bitsandbytes; sys_platform == "linux" and extra == "transformers"
Requires-Dist: protobuf; extra == "transformers"
Requires-Dist: einops; extra == "transformers"
Requires-Dist: tiktoken; extra == "transformers"
Requires-Dist: optimum; extra == "transformers"
Requires-Dist: attrdict; extra == "transformers"
Requires-Dist: timm>=0.9.16; extra == "transformers"
Requires-Dist: torchvision; extra == "transformers"
Requires-Dist: peft; extra == "transformers"
Requires-Dist: eva-decord; extra == "transformers"
Requires-Dist: jj-pytorchvideo; extra == "transformers"
Requires-Dist: qwen-vl-utils!=0.0.9; extra == "transformers"
Requires-Dist: qwen_omni_utils; extra == "transformers"
Requires-Dist: datamodel_code_generator; extra == "transformers"
Requires-Dist: jsonschema; extra == "transformers"
Requires-Dist: blobfile; extra == "transformers"
Provides-Extra: transformers-quantization
Requires-Dist: bitsandbytes; sys_platform == "linux" and extra == "transformers-quantization"
Requires-Dist: gptqmodel; extra == "transformers-quantization"
Requires-Dist: autoawq!=0.2.6; sys_platform != "darwin" and extra == "transformers-quantization"
Requires-Dist: datasets>=3.4.0; extra == "transformers-quantization"
Provides-Extra: vllm
Requires-Dist: vllm>=0.2.6; sys_platform == "linux" and extra == "vllm"
Requires-Dist: xxhash; extra == "vllm"
Provides-Extra: sglang
Requires-Dist: sglang[srt]>=0.4.2.post4; sys_platform == "linux" and extra == "sglang"
Provides-Extra: mlx
Requires-Dist: mlx-lm>=0.21.5; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "mlx"
Requires-Dist: mlx-vlm>=0.3.4; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "mlx"
Requires-Dist: mlx-whisper; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "mlx"
Requires-Dist: f5-tts-mlx; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "mlx"
Requires-Dist: mlx-audio; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "mlx"
Requires-Dist: qwen_vl_utils!=0.0.9; extra == "mlx"
Requires-Dist: tomli; extra == "mlx"
Provides-Extra: embedding
Requires-Dist: sentence-transformers>=3.1.0; extra == "embedding"
Requires-Dist: FlagEmbedding; extra == "embedding"
Requires-Dist: datasets>=3.4.0; extra == "embedding"
Provides-Extra: rerank
Requires-Dist: FlagEmbedding; extra == "rerank"
Requires-Dist: datasets>=3.4.0; extra == "rerank"
Provides-Extra: image
Requires-Dist: diffusers>=0.32.0; extra == "image"
Requires-Dist: controlnet_aux; extra == "image"
Requires-Dist: deepcache; extra == "image"
Requires-Dist: verovio>=4.3.1; extra == "image"
Requires-Dist: transformers>=4.53.3; extra == "image"
Requires-Dist: tiktoken>=0.6.0; extra == "image"
Requires-Dist: accelerate>=0.28.0; extra == "image"
Requires-Dist: torch; extra == "image"
Requires-Dist: torchvision; extra == "image"
Requires-Dist: gguf; extra == "image"
Provides-Extra: video
Requires-Dist: diffusers>=0.32.0; extra == "video"
Requires-Dist: imageio-ffmpeg; extra == "video"
Provides-Extra: audio
Requires-Dist: funasr==1.2.7; extra == "audio"
Requires-Dist: omegaconf~=2.3.0; extra == "audio"
Requires-Dist: nemo_text_processing<=1.1.0; sys_platform == "linux" and extra == "audio"
Requires-Dist: WeText; extra == "audio"
Requires-Dist: librosa; extra == "audio"
Requires-Dist: xxhash; extra == "audio"
Requires-Dist: torch>=2.0.0; extra == "audio"
Requires-Dist: torchaudio>=2.0.0; extra == "audio"
Requires-Dist: ChatTTS>=0.2.1; extra == "audio"
Requires-Dist: tiktoken; extra == "audio"
Requires-Dist: lightning>=2.0.0; extra == "audio"
Requires-Dist: hydra-core>=1.3.2; extra == "audio"
Requires-Dist: inflect; extra == "audio"
Requires-Dist: conformer; extra == "audio"
Requires-Dist: diffusers>=0.32.0; extra == "audio"
Requires-Dist: gdown; extra == "audio"
Requires-Dist: pyarrow; extra == "audio"
Requires-Dist: HyperPyYAML; extra == "audio"
Requires-Dist: onnxruntime>=1.16.0; extra == "audio"
Requires-Dist: pyworld>=0.3.4; extra == "audio"
Requires-Dist: loguru; extra == "audio"
Requires-Dist: natsort; extra == "audio"
Requires-Dist: loralib; extra == "audio"
Requires-Dist: ormsgpack; extra == "audio"
Requires-Dist: cachetools; extra == "audio"
Requires-Dist: silero-vad; extra == "audio"
Requires-Dist: vector-quantize-pytorch<=1.17.3,>=1.14.24; extra == "audio"
Requires-Dist: torchdiffeq; extra == "audio"
Requires-Dist: x_transformers>=1.31.14; extra == "audio"
Requires-Dist: pypinyin; extra == "audio"
Requires-Dist: tomli; extra == "audio"
Requires-Dist: vocos; extra == "audio"
Requires-Dist: librosa; extra == "audio"
Requires-Dist: jieba; extra == "audio"
Requires-Dist: soundfile; extra == "audio"
Requires-Dist: cached_path; extra == "audio"
Requires-Dist: unidic-lite; extra == "audio"
Requires-Dist: cn2an; extra == "audio"
Requires-Dist: mecab-python3; extra == "audio"
Requires-Dist: num2words; extra == "audio"
Requires-Dist: pykakasi; extra == "audio"
Requires-Dist: fugashi; extra == "audio"
Requires-Dist: g2p_en; extra == "audio"
Requires-Dist: anyascii; extra == "audio"
Requires-Dist: gruut[de,es,fr]; extra == "audio"
Requires-Dist: kokoro>=0.7.15; extra == "audio"
Requires-Dist: misaki[en,zh]>=0.7.15; extra == "audio"
Requires-Dist: langdetect; extra == "audio"
Requires-Dist: pyloudnorm; extra == "audio"
Requires-Dist: json5; extra == "audio"
Requires-Dist: munch; extra == "audio"
Requires-Dist: matplotlib; extra == "audio"
Requires-Dist: flatten_dict; extra == "audio"
Requires-Dist: julius; extra == "audio"
Requires-Dist: tensorboard; extra == "audio"
Requires-Dist: randomname; extra == "audio"
Requires-Dist: argbind; extra == "audio"
Provides-Extra: doc
Requires-Dist: ipython>=6.5.0; extra == "doc"
Requires-Dist: sphinx>=3.0.0; extra == "doc"
Requires-Dist: pydata-sphinx-theme>=0.3.0; extra == "doc"
Requires-Dist: sphinx-intl>=0.9.9; extra == "doc"
Requires-Dist: sphinx-tabs; extra == "doc"
Requires-Dist: sphinx-design; extra == "doc"
Requires-Dist: prometheus_client; extra == "doc"
Requires-Dist: timm; extra == "doc"
Provides-Extra: musa
Requires-Dist: mthreads-ml-py>=2.2.8; extra == "musa"
Requires-Dist: torchada>=0.1.11; extra == "musa"
Provides-Extra: benchmark
Requires-Dist: psutil; extra == "benchmark"
Provides-Extra: anthropic
Requires-Dist: anthropic; extra == "anthropic"
Dynamic: description
Dynamic: description-content-type
Dynamic: license-file

<div align="center">
<img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />

# Xorbits Inference: Model Serving Made Easy 🤖

<p align="center">
  <a href="https://xinference.io/en">Xinference Enterprise</a> ·
  <a href="https://inference.readthedocs.io/en/latest/getting_started/installation.html#installation">Self-hosting</a> ·
  <a href="https://inference.readthedocs.io/">Documentation</a>
</p>

[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
[![Docker Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge&logo=docker)](https://hub.docker.com/r/xprobe/xinference)
[![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5)
[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)

<p align="center">
  <a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-454545?style=for-the-badge"></a>
  <a href="./README_zh_CN.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/中文介绍-d9d9d9?style=for-the-badge"></a>
  <a href="./README_ja_JP.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-d9d9d9?style=for-the-badge"></a>
</p>

</div>
<br />


Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, 
speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy 
and serve your or state-of-the-art built-in models using just a single command. Whether you are a 
researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full 
potential of cutting-edge AI models.

<div align="center">
<i><a href="https://discord.gg/Xw9tszSkr5">👉 Join our Discord community!</a></i>
</div>

## 🔥 Hot Topics
### Framework Enhancements
- Agent-native Serving: Xinference integrates with [Xagent](https://github.com/xorbitsai/xagent) to enable dynamic planning, tool use, and autonomous multi-step reasoning — moving beyond static pipelines.
- Auto batch: Multiple concurrent requests are automatically batched, significantly improving throughput: [#4197](https://github.com/xorbitsai/inference/pull/4197)
- [Xllamacpp](https://github.com/xorbitsai/xllamacpp): New llama.cpp Python binding, maintained by Xinference team, supports continuous batching and is more production-ready.: [#2997](https://github.com/xorbitsai/inference/pull/2997)
- Distributed inference: running models across workers: [#2877](https://github.com/xorbitsai/inference/pull/2877)
- VLLM enhancement: Shared KV cache across multiple replicas: [#2732](https://github.com/xorbitsai/inference/pull/2732)
### New Models
- Built-in support for [MiniMax-M2.7](https://www.minimax.io/models/text/m27): [#4843](https://github.com/xorbitsai/inference/pull/4843)
- Built-in support for [GLM-5.1](https://z.ai/blog/glm-5.1): [#4832](https://github.com/xorbitsai/inference/pull/4832)
- Built-in support for [Qwen3.6](https://github.com/QwenLM/Qwen3.6): [#4831](https://github.com/xorbitsai/inference/pull/4831)
- Built-in support for [Gemma-4](https://deepmind.google/models/gemma/gemma-4/): [#4768](https://github.com/xorbitsai/inference/pull/4768)
- Built-in support for [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS): [#4781](https://github.com/xorbitsai/inference/pull/4781)
- Built-in support for [Qwen-3.5](https://github.com/QwenLM/Qwen3.5): [#4639](https://github.com/xorbitsai/inference/pull/4639)
- Built-in support for [GLM-5](https://github.com/zai-org/GLM-5): [#4638](https://github.com/xorbitsai/inference/pull/4638)
- Built-in support for [MiniMax-M2.5](https://github.com/MiniMax-AI/MiniMax-M2.5): [#4630](https://github.com/xorbitsai/inference/pull/4630)
### Integrations
- [Xagent](https://github.com/xorbitsai/xagent): an enterprise agent platform for building and running AI agents with planning, memory, and tool use — not limited to rigid workflows.
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.
- [RAGFlow](https://github.com/infiniflow/ragflow): is an open-source RAG engine based on deep document understanding.
- [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB = Max Knowledge Brain, it is a powerful and easy-to-use AI assistant that integrates Retrieval-Augmented Generation (RAG) pipelines, supports robust workflows, and provides advanced MCP tool-use capabilities.


## Key Features
🌟 **Model Serving Made Easy**: Simplify the process of serving large language, speech 
recognition, and multimodal models. You can set up and deploy your models
for experimentation and production with a single command.

⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single 
command. Inference provides access to state-of-the-art open-source models!

🖥 **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with
[ggml](https://github.com/ggerganov/ggml). Xorbits Inference intelligently utilizes heterogeneous
hardware, including GPUs and CPUs, to accelerate your model inference tasks.

⚙️ **Flexible API and Interfaces**: Offer multiple interfaces for interacting
with your models, supporting OpenAI compatible RESTful API (including Function Calling API), RPC, CLI 
and WebUI for seamless model management and interaction.

🌐 **Distributed Deployment**: Excel in distributed deployment scenarios, 
allowing the seamless distribution of model inference across multiple devices or machines.

🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
with popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).

## Why Xinference
| Feature                                        | Xinference | FastChat | OpenLLM | RayLLM |
|------------------------------------------------|------------|----------|---------|--------|
| OpenAI-Compatible RESTful API                  | ✅ | ✅ | ✅ | ✅ |
| vLLM Integrations                              | ✅ | ✅ | ✅ | ✅ |
| More Inference Engines (GGML, TensorRT)        | ✅ | ❌ | ✅ | ✅ |
| More Platforms (CPU, Metal)                    | ✅ | ✅ | ❌ | ❌ |
| Multi-node Cluster Deployment                  | ✅ | ❌ | ❌ | ✅ |
| Image Models (Text-to-Image)                   | ✅ | ✅ | ❌ | ❌ |
| Text Embedding Models                          | ✅ | ❌ | ❌ | ❌ |
| Multimodal Models                              | ✅ | ❌ | ❌ | ❌ |
| Audio Models                                   | ✅ | ❌ | ❌ | ❌ |
| More OpenAI Functionalities (Function Calling) | ✅ | ❌ | ❌ | ❌ |

## Using Xinference

- **Self-hosting Xinference Community Edition</br>**
Quickly get Xinference running in your environment with this [starter guide](#getting-started).
Use our [documentation](https://inference.readthedocs.io/) for further references and more in-depth instructions.

- **Xinference for enterprise / organizations</br>**
We provide additional enterprise-centric features. [send us an email](mailto:business@xprobe.io?subject=[GitHub]Business%20License%20Inquiry) to discuss enterprise needs. </br>

## Staying Ahead

Star Xinference on GitHub and be instantly notified of new releases.

![star-us](assets/stay_ahead.gif)

## Getting Started

* [Docs](https://inference.readthedocs.io/en/latest/index.html)
* [Built-in Models](https://inference.readthedocs.io/en/latest/models/builtin/index.html)
* [Custom Models](https://inference.readthedocs.io/en/latest/models/custom.html)
* [Deployment Docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html)
* [Examples and Tutorials](https://inference.readthedocs.io/en/latest/examples/index.html)

### Jupyter Notebook

The lightest way to experience Xinference is to try our [Jupyter Notebook on Google Colab](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb).

### Docker 

Nvidia GPU users can start Xinference server using [Xinference Docker Image](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html). Prior to executing the installation command, ensure that both [Docker](https://docs.docker.com/get-docker/) and [CUDA](https://developer.nvidia.com/cuda-downloads) are set up on your system.

```bash
docker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v </on/your/host>:/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0
```

### K8s via helm

Ensure that you have GPU support in your Kubernetes cluster, then install as follows.

```
# add repo
helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts

# update indexes and query xinference versions
helm repo update xinference
helm search repo xinference/xinference --devel --versions

# install xinference
helm install xinference xinference/xinference -n xinference --version 0.0.1-v<xinference_release_version>
```

For more customized installation methods on K8s, please refer to the [documentation](https://inference.readthedocs.io/en/latest/getting_started/using_kubernetes.html).

### Quick Start

Install Xinference by using pip as follows. (For more options, see [Installation page](https://inference.readthedocs.io/en/latest/getting_started/installation.html).)

```bash
pip install "xinference[all]"
```

To start a local instance of Xinference, run the following command:

```bash
$ xinference-local
```

Once Xinference is running, there are multiple ways you can try it: via the web UI, via cURL,
 via the command line, or via the Xinference’s python client. Check out our [docs]( https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally) for the guide.

![web UI](assets/screenshot.png)

## Getting involved

| Platform                                                                                        | Purpose                                     |
|-------------------------------------------------------------------------------------------------|---------------------------------------------|
| [Github Issues](https://github.com/xorbitsai/inference/issues)                                  | Reporting bugs and filing feature requests. |
| [Discord](https://discord.gg/Xw9tszSkr5) | Collaborating with other Xinference users.  |
| [Twitter](https://twitter.com/xorbitsio)                                                        | Staying up-to-date on new features.         |

## Citation

If this work is helpful, please kindly cite as:

```bibtex
@inproceedings{lu2024xinference,
    title = "Xinference: Making Large Model Serving Easy",
    author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.30",
    pages = "291--300",
}
```

## Contributors

<a href="https://github.com/xorbitsai/inference/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=xorbitsai/inference" />
</a>

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=xorbitsai/inference&type=Date)](https://star-history.com/#xorbitsai/inference&Date)
