Metadata-Version: 2.4
Name: strands-sglang
Version: 0.3.7
Summary: SGLang model provider for Strands Agents SDK with Token-in/Token-out support for agentic RL training.
Project-URL: Homepage, https://github.com/horizon-rl/strands-sglang/
Project-URL: Documentation, https://github.com/horizon-rl/strands-sglang/#readme
Project-URL: Repository, https://github.com/horizon-rl/strands-sglang/
Project-URL: Issues, https://github.com/horizon-rl/strands-sglang/issues
Author-email: Yuan He <yuanhe.cs.ai@gmail.com>
License: Apache-2.0
License-File: LICENSE
Keywords: agentic-rl,agents,ai,llm,reinforcement-learning,rl,sglang,strands
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: jinja2
Requires-Dist: pybase64
Requires-Dist: strands-agents
Requires-Dist: transformers<5.0.0,>=4.0.0
Provides-Extra: dev
Requires-Dist: build>=0.10.0; extra == 'dev'
Requires-Dist: ipykernel>=7.1.0; extra == 'dev'
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.0.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Requires-Dist: strands-agents-tools; extra == 'dev'
Requires-Dist: strands-agents[openai]; extra == 'dev'
Requires-Dist: tiktoken>=0.5.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# Strands-SGLang

[![CI](https://github.com/horizon-rl/strands-sglang/actions/workflows/test.yml/badge.svg)](https://github.com/horizon-rl/strands-sglang/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/strands-sglang.svg)](https://pypi.org/project/strands-sglang/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/horizon-rl/strands-sglang)
[![Notion](https://img.shields.io/badge/Notion-Blog-000?logo=notion&logoColor=fff)](https://www.yuanhe.wiki/Bridging-Agent-Scaffolding-and-RL-Training-with-Strands-SGLang-2e655dc580e680e28c78f6d743ab987f)
[![Stranes-Agents](https://img.shields.io/badge/Strands-Featured-111111?logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjkwIiBoZWlnaHQ9IjQ2MyIgdmlld0JveD0iMCAwIDI5MCA0NjMiIGZpbGw9Im5vbmUiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI%2BCjxwYXRoIGQ9Ik05Ny4yOTAyIDUyLjc4ODRDODUuMDY3NCA0OS4xNjY3IDcyLjIyMzQgNTYuMTM4OSA2OC42MDE3IDY4LjM2MTZDNjQuOTgwMSA4MC41ODQzIDcxLjk1MjQgOTMuNDI4MyA4NC4xNzQ5IDk3LjA1MDFMMjM1LjExNyAxMzkuNzc1QzI0NS4yMjMgMTQyLjc2OSAyNDYuMzU3IDE1Ni42MjggMjM2Ljg3NCAxNjEuMjI2TDMyLjU0NiAyNjAuMjkxQy0xNC45NDM5IDI4My4zMTYgLTkuMTYxMDcgMzUyLjc0IDQxLjQ4MzUgMzY3LjU5MUwxODkuNTUxIDQxMS4wMDlMMTkwLjEyNSA0MTEuMTY5QzIwMi4xODMgNDE0LjM3NiAyMTQuNjY1IDQwNy4zOTYgMjE4LjE5NiAzOTUuMzU1QzIyMS43ODQgMzgzLjEyMiAyMTQuNzc0IDM3MC4yOTYgMjAyLjU0MSAzNjYuNzA5TDU0LjQ3MzggMzIzLjI5MUM0NC4zNDQ3IDMyMC4zMjEgNDMuMTg3OSAzMDYuNDM2IDUyLjY4NTcgMzAxLjgzMUwyNTcuMDE0IDIwMi43NjZDMzA0LjQzMiAxNzkuNzc2IDI5OC43NTggMTEwLjQ4MyAyNDguMjMzIDk1LjUxMkw5Ny4yOTAyIDUyLjc4ODRaIiBmaWxsPSIjOTg5ODk4Ii8%2BCjxwYXRoIGQ9Ik0yNTkuMTQ3IDAuOTgxODEyQzI3MS4zODkgLTIuNTc0OTggMjg0LjE5NyA0LjQ2NTcxIDI4Ny43NTQgMTYuNzA3NEMyOTEuMzExIDI4Ljk0OTIgMjg0LjI3IDQxLjc1NyAyNzIuMDI4IDQ1LjMxMzhMNzEuMTcyNyAxMDMuNjcxQzQwLjcxNDIgMTEyLjUyMSAzNy4xOTc2IDE1NC4yNjIgNjUuNzQ1OSAxNjguMDgzTDI0MS4zNDMgMjUzLjA5M0MzMDcuODcyIDI4NS4zMDIgMjk5Ljc5NCAzODIuNTQ2IDIyOC44NjIgNDAzLjMzNkwzMC40MDQxIDQ2MS41MDJDMTguMTcwNyA0NjUuMDg4IDUuMzQ3MDggNDU4LjA3OCAxLjc2MTUzIDQ0NS44NDRDLTEuODIzOSA0MzMuNjExIDUuMTg2MzcgNDIwLjc4NyAxNy40MTk3IDQxNy4yMDJMMjE1Ljg3OCAzNTkuMDM1QzI0Ni4yNzcgMzUwLjEyNSAyNDkuNzM5IDMwOC40NDkgMjIxLjIyNiAyOTQuNjQ1TDQ1LjYyOTcgMjA5LjYzNUMtMjAuOTgzNCAxNzcuMzg2IC0xMi43NzcyIDc5Ljk4OTMgNTguMjkyOCA1OS4zNDAyTDI1OS4xNDcgMC45ODE4MTJaIiBmaWxsPSIjMDBGRjc3Ii8%2BCjwvc3ZnPgo%3D&logoWidth=14)](https://strandsagents.com/latest/documentation/docs/community/model-providers/sglang/)

SGLang model provider for [Strands Agents SDK](https://github.com/strands-agents/sdk-python) with Token-in/Token-out rollouts for on-policy agentic RL training (no retokenization drift).

## Features

This package is designed to make the serving-oriented agent scaffold [Strands Agents SDK](https://github.com/strands-agents/sdk-python) training-ready by exposing end-to-end, token-level rollouts from SGLang while reusing Strands’ customizable agent loop.

- **Token-In/Token-Out** rollouts (token IDs + logprobs/masks): no retokenization drift
- **Strict, on-policy tool-call parsing**: no heuristic repair or post-processing; tool calls are parsed exactly as generated by models
- **Native SGLang `/generate`**: high-throughput, non-streaming rollouts

> For RL environment integration, please refer to [`strands-env`](https://github.com/horizon-rl/strands-env)

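To make "retokenization drift" concrete, here is a toy sketch. The three-token vocabulary and the greedy longest-match `encode` are illustrative assumptions, not the behavior of any real tokenizer. It shows how decoding sampled token IDs to text and encoding that text again can yield a different ID sequence than the one the policy actually sampled.

```python
# Toy vocabulary (hypothetical): the merged token "ab" exists alongside "a" and "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}


def decode(token_ids):
    """Concatenate token strings back into text."""
    return "".join(ID_TO_TOKEN[i] for i in token_ids)


def encode(text):
    """Greedy longest-match tokenization (a stand-in for a real tokenizer)."""
    ids, pos = [], 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink.
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


sampled_ids = [0, 1]  # the policy sampled "a" then "b"
retokenized_ids = encode(decode(sampled_ids))  # round trip through text

print(sampled_ids, retokenized_ids)  # [0, 1] [2]
```

Training on `retokenized_ids` would score a sequence the policy never sampled. Keeping the sampled IDs end to end avoids this mismatch.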

## Requirements

- Python 3.10+
- Strands Agents SDK
- SGLang server running with your model
- HuggingFace tokenizer for the model

## Installation

```bash
pip install strands-sglang strands-agents-tools
```

Or install from source with development dependencies:

```bash
git clone https://github.com/horizon-rl/strands-sglang.git
cd strands-sglang
pip install -e ".[dev]"
```

## Quick Start

### 1. Start SGLang Server

```bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen3.5-4B \
    --port 30000 \
    --host 0.0.0.0
```
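
Once the server is up, a token-in request against the native `/generate` endpoint can be sketched with the standard library alone. This assumes the request fields `input_ids`, `sampling_params`, and `return_logprob` from SGLang's native API; verify them against your SGLang version. The helper names `build_generate_payload` and `post_generate` are hypothetical, and this is not how `SGLangClient` is implemented.

```python
import json
import urllib.request


def build_generate_payload(input_ids, max_new_tokens=32, temperature=1.0):
    """Build a token-in request body (field names assumed from SGLang's native API)."""
    return {
        "input_ids": list(input_ids),
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": temperature},
        "return_logprob": True,  # ask the server for per-token logprobs
    }


def post_generate(base_url, payload, timeout=60.0):
    """POST the payload to <base_url>/generate and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the server started above; token IDs are placeholders):
# response = post_generate("http://localhost:30000", build_generate_payload([1, 2, 3]))
# print(sorted(response))
```

In practice, `SGLangModel` handles this exchange. The sketch only illustrates what "token-in" means at the HTTP level.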

### 2. Basic Agent

```python
import asyncio
from transformers import AutoTokenizer
from strands import Agent
from strands_tools import calculator
from strands_sglang import SGLangClient, SGLangModel
from strands_sglang.tool_parsers import get_tool_parser

async def main():
    client = SGLangClient(base_url="http://localhost:30000")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-4B")
    model = SGLangModel(client=client, tokenizer=tokenizer, tool_parser=get_tool_parser("qwen_xml"))
    agent = Agent(model=model, tools=[calculator])

    result = await agent.invoke_async("What is 25 * 17?")
    print(result)

    # Access token data for RL training
    print(f"Tokens: {model.token_manager.token_ids}")
    print(f"Loss mask: {model.token_manager.loss_mask}")
    print(f"Logprobs: {model.token_manager.logprobs}")

asyncio.run(main())
```
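
The three lists printed above are what an RL loss typically consumes. As a minimal sketch, assume they are index-aligned lists of equal length and that the loss mask is 1 for model-generated tokens and 0 elsewhere. The values below are toy data, and `masked_mean_logprob` is a hypothetical helper, not part of this package.

```python
def masked_mean_logprob(logprobs, loss_mask):
    """Average log-probability over positions where loss_mask is truthy."""
    if len(logprobs) != len(loss_mask):
        raise ValueError("logprobs and loss_mask must have the same length")
    selected = [lp for lp, keep in zip(logprobs, loss_mask) if keep]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


# Toy trajectory: 3 prompt tokens, 2 generated, 2 tool-output, 1 generated.
token_ids = [11, 12, 13, 21, 22, 31, 32, 23]
loss_mask = [0, 0, 0, 1, 1, 0, 0, 1]
logprobs = [0.0, 0.0, 0.0, -0.5, -1.0, 0.0, 0.0, -1.5]

print(masked_mean_logprob(logprobs, loss_mask))  # -1.0
```

Only the masked-in positions contribute, so prompt and tool-output tokens do not affect the statistic.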

## Training with `slime`

For RL training with [slime](https://github.com/THUDM/slime/), `SGLangModel` eliminates the retokenization step. See the concrete example at [slime/examples/strands_sglang](https://github.com/THUDM/slime/tree/main/examples/strands_sglang):

```python
import logging
from strands import Agent, tool
from strands_sglang import SGLangModel, ToolLimiter, get_client_from_slime_args
from strands_sglang.tool_parsers import HermesToolParser
from slime.rollout.sglang_rollout import GenerateState
from slime.utils.types import Sample

logger = logging.getLogger(__name__)

SYSTEM_PROMPT = "..."
MAX_TOOL_ITERS = 5
MAX_TOOL_CALLS = None  # No limit


@tool
def execute_python_code(code: str):
    """Execute Python code and return the output."""
    ...


async def generate(args, sample: Sample, sampling_params) -> Sample:
    """Generate with tokens captured during generation, no retokenization."""
    assert not args.partial_rollout, "Partial rollout not supported."

    state = GenerateState(args)
    model = SGLangModel(
        tokenizer=state.tokenizer,
        client=get_client_from_slime_args(args),  # LRU-cached client
        tool_parser=HermesToolParser(),  # tool parsing for wrapped JSON tool calls
        sampling_params=sampling_params,
    )

    tool_limiter = ToolLimiter(max_tool_iters=MAX_TOOL_ITERS, max_tool_calls=MAX_TOOL_CALLS)
    agent = Agent(
        model=model,
        tools=[execute_python_code],
        hooks=[tool_limiter],
        callback_handler=None,
        system_prompt=SYSTEM_PROMPT,
    )

    # Don't set --apply-chat-template in the rollout args; it would wrap the user prompt twice
    prompt = sample.prompt if isinstance(sample.prompt, str) else sample.prompt[0]["content"]

    try:
        await agent.invoke_async(prompt)
        sample.status = Sample.Status.COMPLETED
    except Exception as e:
        # Default all failed rollouts to TRUNCATED; customize your logic here if needed
        sample.status = Sample.Status.TRUNCATED
        logger.warning(f"TRUNCATED: {type(e).__name__}: {e}")

    # Extract token trajectory from token_manager
    tm = model.token_manager
    prompt_len = len(tm.segments[0])  # system + user messages form the first segment
    sample.tokens = tm.token_ids
    sample.loss_mask = tm.loss_mask[prompt_len:]
    sample.rollout_log_probs = tm.logprobs[prompt_len:]
    sample.response_length = len(sample.tokens) - prompt_len
    sample.response = model.tokenizer.decode(sample.tokens[prompt_len:], skip_special_tokens=False)

    # Record tool call stats for reward computation if needed
    # Multiple parallel tool calls count as one tool_iter
    sample.tool_iters = tool_limiter.tool_iter_count
    sample.tool_calls = tool_limiter.tool_call_count

    model.reset()
    agent.cleanup()
    return sample
```
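
The slicing at the end of `generate` keeps the full token sequence in `sample.tokens` while the loss mask and logprobs cover only the part after the prompt. A toy sketch of that bookkeeping follows. The per-segment lists and the `is_model_output` flags are hypothetical stand-ins for illustration, not the actual `token_manager` data structures.

```python
# Hypothetical per-segment token lists: prompt, model turn, tool result, model turn.
segments = [[11, 12, 13], [21, 22], [31, 32], [23]]
is_model_output = [False, True, False, True]

# Flatten segments; mark each token with 1 if the model generated it, else 0.
token_ids = [tok for seg in segments for tok in seg]
loss_mask = [int(flag) for seg, flag in zip(segments, is_model_output) for _ in seg]

prompt_len = len(segments[0])  # the first segment holds the prompt
response_tokens = token_ids[prompt_len:]
response_mask = loss_mask[prompt_len:]
response_length = len(token_ids) - prompt_len

print(response_length, response_mask)  # 5 [1, 1, 0, 0, 1]
```

Tool-output tokens stay inside the response span but carry a mask of 0, so they count toward `response_length` without contributing to the loss.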

## Testing

```bash
# Unit tests
pytest tests/unit/ -v

# Integration tests (requires SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000
```

## Contributing

Contributions welcome! Install pre-commit hooks for code style and commit message validation:

```bash
pip install -e ".[dev]"
pre-commit install -t pre-commit -t commit-msg
```

This project uses [Conventional Commits](https://www.conventionalcommits.org/). Commit messages must follow the format:

```
<type>(<scope>): <description>

# Examples:
feat(client): add retry backoff configuration
fix(sglang): handle empty response from server
docs: update usage examples
```

Allowed types: `feat`, `fix`, `docs`, `style`, `refactor`, `perf`, `test`, `build`, `ci`, `chore`, `revert`
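
To check a subject line locally, the format and the allowed types can be approximated with a regular expression. This is a rough sketch, not the project's actual `commit-msg` hook. It treats the scope as optional (as in the `docs:` example above) and ignores parts of the Conventional Commits specification such as the `!` breaking-change marker. The name `is_valid_commit_subject` is hypothetical.

```python
import re

ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "build", "ci", "chore", "revert",
)

# <type>(<scope>): <description>, with the scope optional.
COMMIT_RE = re.compile(rf"^(?:{'|'.join(ALLOWED_TYPES)})(?:\([^()\s]+\))?: \S.*$")


def is_valid_commit_subject(subject):
    """Return True if the first line of a commit message matches the format."""
    return COMMIT_RE.match(subject) is not None


for subject in (
    "feat(client): add retry backoff configuration",
    "docs: update usage examples",
    "update readme",
):
    print(f"{subject!r}: {is_valid_commit_subject(subject)}")
```

The first two subjects pass and the third fails, since it has no type prefix.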

## Related Projects

- [agentcore-rl-toolkit](https://github.com/awslabs/agentcore-rl-toolkit) - RL training toolkit with Bedrock AgentCore
- [strands-vllm](https://github.com/agents-community/strands-vllm) - Community vLLM provider for Strands Agents SDK

## License

Apache License 2.0 - see [LICENSE](LICENSE).
