Metadata-Version: 2.3
Name: costly
Version: 2.3.1
Summary: Estimate costs and running times of complex LLM workflows/experiments/pipelines in advance before spending money, via simulations.
Author: abhimanyu
Author-email: abhimanyupallavisudhir@gmail.com
Requires-Python: >=3.11,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: Faker (>=28.0.0,<29.0.0)
Requires-Dist: instructor (==1.4.2)
Requires-Dist: jsonlines (>=4.0.0,<5.0.0)
Requires-Dist: litellm (>=1.59.6,<2.0.0)
Requires-Dist: openai (>=1.45.1,<2.0.0)
Requires-Dist: pydantic (>=2.8.2,<3.0.0)
Requires-Dist: pytest (>=8.1.1,<9.0.0)
Requires-Dist: pytest-asyncio (>=0.23.7,<0.24.0)
Requires-Dist: pytest-env (>=1.1.3,<2.0.0)
Requires-Dist: pytest-mock (>=3.14.0,<4.0.0)
Requires-Dist: python-dotenv (>=0.19.0,<0.20.0)
Requires-Dist: tiktoken (>=0.7.0,<0.8.0)
Requires-Dist: typing-extensions (>=4.12.2,<5.0.0)
Description-Content-Type: text/markdown

# costly
Estimate costs and running times of complex LLM workflows/experiments/pipelines in advance, before spending money, via simulations. Just put `@costly()` on the load-bearing function, make sure all functions that call it pass `**kwargs` down to it, and call your complex function with `simulate=True` and a `cost_log: Costlog` object. See [examples.ipynb](examples.ipynb) for more details.

(You don't actually have to pass `cost_log` and `simulate` through every call; you can instead define a global `cost_log` and `simulate` at the top of the file that contains your `@costly` functions and pass them as defaults to your `@costly` decorators.)

(Pass `@costly(disable_costly=True)` to disable costly for a function. For example, you can set `disable_costly` as a global variable and pass it to all your `@costly` decorators.)
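The propagation pattern described above can be sketched with a toy stand-in for the decorator (the names `costly_sketch`, `call_llm`, and `pipeline` are illustrative only; the real `@costly` decorator lives in the `costly` package and does much more):

```python
# Toy sketch of the kwargs-propagation pattern. Illustrative only;
# not the actual implementation of the @costly decorator.
def costly_sketch(func):
    def wrapper(*args, simulate=False, cost_log=None, **kwargs):
        if simulate:
            # Record a (fake) cost entry instead of calling the real, paid API.
            if cost_log is not None:
                cost_log.append({"func": func.__name__, "simulated": True})
            return "<simulated output>"
        return func(*args, **kwargs)
    return wrapper


@costly_sketch
def call_llm(prompt: str) -> str:
    raise RuntimeError("would hit a paid API")  # never reached when simulating


def pipeline(prompt: str, **kwargs) -> str:
    # Intermediate functions just forward **kwargs, so simulate/cost_log
    # reach the decorated function at the bottom of the call stack.
    return call_llm(prompt, **kwargs)


log = []
result = pipeline("hello", simulate=True, cost_log=log)
```

The point is that only the innermost function is decorated; everything above it merely forwards `**kwargs`.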

https://github.com/abhimanyupallavisudhir/costly

## Installation

```bash
pip install costly
```

## Usage

See [examples.ipynb](examples.ipynb) for a full walkthrough; some examples below.

```python
from costly import Costlog, costly, CostlyResponse
from costly.estimators.llm_api_estimation import LLM_API_Estimation


@costly()
def chatgpt(input_string: str, model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": input_string}]
    )
    output_string = response.choices[0].message.content
    return output_string


@costly(
    input_tokens=lambda kwargs: LLM_API_Estimation.messages_to_input_tokens(
        kwargs["messages"], kwargs["model"]
    ),
)
def chatgpt_messages(messages: list[dict[str, str]], model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(model=model, messages=messages)
    output_string = response.choices[0].message.content
    return output_string


@costly()
def chatgpt_with_usage(input_string: str, model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": input_string},
        ],
    )

    return CostlyResponse(
        output=response.choices[0].message.content,
        cost_info={
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens,
        },
    )  # callers still receive just the output string, not the whole CostlyResponse object

```
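The unwrapping behavior noted in the last example can be illustrated with a minimal stand-in (the `Fake*` names and `unwrap_costly` are hypothetical, for illustration; the real logic is inside the `costly` package): a decorator that receives a response object carrying both the output and its cost info, records the cost info, and hands the caller only the output.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for costly's CostlyResponse/Costlog, for illustration.
@dataclass
class FakeCostlyResponse:
    output: str
    cost_info: dict


@dataclass
class FakeCostlog:
    items: list = field(default_factory=list)


def unwrap_costly(func):
    def wrapper(*args, cost_log=None, **kwargs):
        result = func(*args, **kwargs)
        if isinstance(result, FakeCostlyResponse):
            if cost_log is not None:
                cost_log.items.append(result.cost_info)  # record exact usage
            return result.output  # caller sees only the plain output
        return result
    return wrapper


@unwrap_costly
def fake_chatgpt(prompt: str) -> str:
    # Pretend the API reported exact token usage alongside its answer.
    return FakeCostlyResponse(
        output=f"echo: {prompt}",
        cost_info={"input_tokens": 3, "output_tokens": 5},
    )


log = FakeCostlog()
answer = fake_chatgpt("hi", cost_log=log)
```

This is why returning a `CostlyResponse` lets you log exact usage numbers from the API response without changing what your callers see.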

## Testing

```bash
poetry run pytest -s -m "not slow"
poetry run pytest -s -m "slow"
```

Tests for instructor currently fail.

## TODO

- [x] Make it work with async
- [x] Decide and document the best way to "propagate" `description` (for breakdown purposes) through function calls. Have the user manually write `def f(...): ... g(description=kwargs.get("description") + ["f"])`? Add a `@description("blabla")` decorator? Add a `@description` decorator that automatically appends the function name and arguments to `description`?
- [x] Better solution for token counting for Chat messages (search `HACK` in the repo)
- [x] make instructor tests pass https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/11
- [ ] Support for locally run LLMs -- ideally need a cost & time estimator that takes into account your machine details, GPU pricing etc.
- [ ] Support more models

Instructor tests don't fully pass, but this is tolerable for now. Let me know if you have a good way to count tokens from messages that include tool calling (I'm currently using [this approach](https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/11)).
```
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-4o] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 83']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-4o-mini] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 83']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-4-turbo] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 85']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-3.5-turbo] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 85']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-4o] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 108']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-4o-mini] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 108']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-4-turbo] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 113']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-3.5-turbo] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 113']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4o] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 168']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4o-mini] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 168']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4-turbo] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 178']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 126']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-3.5-turbo] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 178']
```

