Metadata-Version: 2.4
Name: its_hub
Version: 0.1.3a1
Summary: A Python library for inference-time scaling of LLMs
Author-email: Kai Xu and the Red Hat AI Innovation Team <xuk@redhat.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/Red-Hat-AI-Innovation-Team/its_hub
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=1.75.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: typing-extensions>=4.0.0
Requires-Dist: reward-hub==0.1.0
Requires-Dist: backoff>=2.2.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Dynamic: license-file

# A Python library for inference-time scaling of LLMs

[![Tests](https://github.com/Red-Hat-AI-Innovation-Team/its_hub/actions/workflows/tests.yml/badge.svg)](https://github.com/Red-Hat-AI-Innovation-Team/its_hub/actions/workflows/tests.yml)
[![codecov](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/its_hub/graph/badge.svg?token=6WD8NB9YPN)](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/its_hub)

Example: using the particle-filtering algorithm from `[1]` for inference-time scaling

```python
from its_hub.utils import SAL_STEP_BY_STEP_SYSTEM_PROMPT
from its_hub.lms import OpenAICompatibleLanguageModel, StepGeneration
from its_hub.algorithms import ParticleFiltering
from its_hub.integration.reward_hub import LocalVllmProcessRewardModel

# NOTE launched via `CUDA_VISIBLE_DEVICES=0 vllm serve Qwen/Qwen2.5-Math-1.5B-Instruct --dtype float16`
lm = OpenAICompatibleLanguageModel(
    endpoint="http://0.0.0.0:8000/v1", 
    api_key="NO_API_KEY", 
    model_name="Qwen/Qwen2.5-Math-1.5B-Instruct", 
    system_prompt=SAL_STEP_BY_STEP_SYSTEM_PROMPT, 
)
prompt = r"Let $a$ be a positive real number such that all the roots of \[x^3 + ax^2 + ax + 1 = 0\]are real. Find the smallest possible value of $a.$" # question from MATH500
budget = 8 # inference-time compute budget (here, the number of particles)

sg = StepGeneration("\n\n", 32, r"\boxed") # split on "\n\n", up to 32 steps, stop once \boxed appears
prm = LocalVllmProcessRewardModel(
    model_name="Qwen/Qwen2.5-Math-PRM-7B", device="cuda:1", aggregation_method="prod"
)
scaling_alg = ParticleFiltering(sg, prm)

scaling_alg.infer(lm, prompt, budget, show_progress=True) # => returns the selected response
```

`[1]`: Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava. “A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods”, 2025.
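For intuition, the resampling step at the heart of the algorithm in `[1]` can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not the its_hub implementation: candidate partial generations ("particles") are weighted by a process reward model, then resampled in proportion to those weights so that compute concentrates on promising trajectories.

```python
import random

def resample(particles, weights, seed=0):
    """Resample particles in proportion to their reward weights.

    High-reward candidates tend to be duplicated and low-reward ones
    dropped, while the population size stays fixed.
    """
    rng = random.Random(seed)
    return rng.choices(particles, weights=weights, k=len(particles))

# four candidate partial generations with hypothetical PRM scores
particles = ["step A", "step B", "step C", "step D"]
weights = [0.10, 0.70, 0.15, 0.05]
survivors = resample(particles, weights)
```

In the full algorithm this resampling alternates with step-by-step generation (as driven by `StepGeneration` above) until a particle reaches a final answer.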

## Installation

Latest release from PyPI:

```sh
pip install its_hub
```

Latest main branch:
```sh
pip install git+https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
```

## Benchmark

The script `scripts/benchmark.py` benchmarks inference-time scaling algorithms.
Its CLI is self-contained; run `python scripts/benchmark.py --help` for usage.
Example command:
```sh
python scripts/benchmark.py --benchmark aime-2024 --model_name Qwen/Qwen2.5-Math-1.5B-Instruct --alg particle-filtering --rm_device cuda:1 --endpoint http://0.0.0.0:8000/v1 --shuffle_seed 1110 --does_eval --budgets 1,2,4,8,16,32,64 --rm_agg_method model
```

## Development

```sh
git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
cd its_hub
pip install -e ".[dev]"
pytest tests
```
