Metadata-Version: 2.4
Name: strands-env
Version: 0.1.0
Summary: RL environment abstraction for Strands agents — step, observe, reward.
Author-email: Yuan He <yuanhe.cs.ai@gmail.com>
License: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: strands-agents-tools
Requires-Dist: strands-sglang>=0.1.2
Provides-Extra: dev
Requires-Dist: build>=0.10.0; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# strands-env

RL environments for [Strands](https://github.com/strands-agents/sdk-python) agents — step, observe, reward.

> `strands-agents` is designed for serving, not training. `strands-env` integrates [`strands-sglang`](https://github.com/horizon-rl/strands-sglang) to bridge this gap.

## Define an environment

Subclass `Environment` and override `get_tools()` to choose which tools the agent can call:

```python
from strands_tools import calculator
from strands_env.core.environment import Environment

class MathEnv(Environment):
    def get_tools(self):
        return [calculator]
```

## Run it

```python
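# Assumes `factory`, `reward_fn`, `Action`, and `TaskContext` are already
# defined or imported, and that this code runs inside an async function.
env = MathEnv(model_factory=factory, reward_fn=reward_fn)
result = await env.step(
    Action(message="What is 2^10?", task_context=TaskContext(ground_truth="1024"))
)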

result.observation.final_response   # "1024"
result.observation.tokens           # TokenObservation (SGLang only, for on-policy RL training)
result.reward.reward                # 1.0
result.termination_reason           # task_complete
```

Each `step()` runs a full agent loop (reasoning + tool calls), not a single model call. Strands' hook-based design makes it easy to customize what happens within each step.
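To make the step/observe/reward flow concrete without a model backend, here is a minimal, library-independent sketch. `StubEnv`, `StubResult`, and `exact_match_reward` are hypothetical stand-ins, not part of the `strands-env` API, and the real `step()` and reward function signatures may differ. It only illustrates how an async `step()` could be driven from a script with `asyncio.run` and scored against a ground truth.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for a step result (not the strands-env type)."""

    final_response: str
    reward: float


def exact_match_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the whitespace-stripped strings are equal, else 0.0."""
    return 1.0 if response.strip() == ground_truth.strip() else 0.0


class StubEnv:
    """Hypothetical environment whose step() fakes the agent loop."""

    async def step(self, message: str, ground_truth: str) -> StubResult:
        # A real step would run the full agent loop (reasoning + tool calls).
        await asyncio.sleep(0)
        response = str(2**10) if message == "What is 2^10?" else ""
        return StubResult(response, exact_match_reward(response, ground_truth))


async def main() -> StubResult:
    env = StubEnv()
    return await env.step("What is 2^10?", ground_truth="1024")


if __name__ == "__main__":
    result = asyncio.run(main())
    print(result.final_response, result.reward)
```

An exact-match check is only one possible scoring rule. It is used here because the ground truth in the example above is a single string.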

## Install

```bash
pip install strands-env
```

For development:

```bash
git clone <repo-url> && cd strands-env
pip install -e ".[dev]"
```

See [`examples/math_env.py`](examples/math_env.py) for a complete runnable example that can use either an SGLang server or Bedrock as the backend:

```bash
python examples/math_env.py --backend sglang --sglang-base-url http://localhost:30000
python examples/math_env.py --backend bedrock --model-id us.anthropic.claude-sonnet-4-20250514
```

## Development

```bash
# Lint
ruff check src/ && ruff format --check src/

# Unit tests
pytest tests/unit/ -v

# Integration tests (require a running SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000
```

## License

Apache License 2.0 - see [LICENSE](LICENSE).
