Metadata-Version: 2.4
Name: isage-agentic-tooluse-benchmark
Version: 0.1.0.1
Summary: SAGE Tool Use Benchmark - Tool selection and use evaluation framework
Author-email: IntelliStream Team <shuhao_zhang@hust.edu.cn>
License-Expression: MIT
Project-URL: Homepage, https://github.com/intellistream/sage-agentic-tooluse-benchmark
Project-URL: Documentation, https://github.com/intellistream/sage-agentic-tooluse-benchmark#readme
Project-URL: Repository, https://github.com/intellistream/sage-agentic-tooluse-benchmark
Project-URL: Issues, https://github.com/intellistream/sage-agentic-tooluse-benchmark/issues
Keywords: sage,benchmark,tool-selection,tool-use,planning,timing-detection,evaluation,intellistream
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: isage-common
Requires-Dist: isage-libs
Requires-Dist: pyyaml>=6.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy<2.3.0,>=1.26.0
Requires-Dist: typer<1.0.0,>=0.15.0
Requires-Dist: rich<14.0.0,>=13.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff==0.14.6; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-PyYAML>=6.0.0; extra == "dev"
Provides-Extra: all
Requires-Dist: isage-agentic-tooluse-benchmark[dev]; extra == "all"
Dynamic: license-file

# SAGE Agentic Tool Use Benchmark

A configuration-driven experiment framework for evaluating how well agents select and use tools.

## Features

- **Tool Selection Evaluation**: Tool retrieval and ranking benchmarks
- **Planning Evaluation**: Multi-step planning with tool composition
- **Timing Detection**: Benchmarks for judging when a tool should be invoked
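
To make the tool retrieval and ranking idea concrete, here is a minimal sketch of two common ranking metrics, recall@k and reciprocal rank. It is an illustration only, not this package's metric implementation: the function names `recall_at_k` and `reciprocal_rank` and the tool names are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant tools that appear in the top-k ranked results."""
    if not relevant:
        return 0.0
    hits = sum(1 for tool in ranked[:k] if tool in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant tool, or 0.0 if none is retrieved."""
    for position, tool in enumerate(ranked, start=1):
        if tool in relevant:
            return 1.0 / position
    return 0.0


# Hypothetical example: these tool names are made up for illustration.
ranked_tools = ["web_search", "calculator", "weather_api", "calendar"]
ground_truth = {"calculator", "calendar"}

print(recall_at_k(ranked_tools, ground_truth, k=2))  # 0.5
print(reciprocal_rank(ranked_tools, ground_truth))   # 0.5
```

Averaging these scores over all queries in a dataset gives recall@k and mean reciprocal rank (MRR) for a tool selector.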

## Quick Start

```bash
# Install
pip install isage-agentic-tooluse-benchmark

# Run tool selection experiment
sage-agentic-tooluse-bench tool-selection --config config/tool_selection_exp.yaml

# Run planning experiment
sage-agentic-tooluse-bench planning --config config/planning_exp.yaml
```

## Documentation

See [benchmark_agent/README.md](src/sage/benchmark/benchmark_agent/README.md) for detailed documentation.

## Development

```bash
# Clone
git clone https://github.com/intellistream/sage-agentic-tooluse-benchmark.git
cd sage-agentic-tooluse-benchmark

# Setup virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest
```

## License

MIT License - see [LICENSE](LICENSE) for details.
