Metadata-Version: 2.4
Name: llmservice
Version: 0.2.2
Summary: A lightweight, production-ready service layer for modular, rate-aware LLM integrations
Author: Enes Kuzucu
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: langchain
Requires-Dist: langchain-community
Requires-Dist: langchain-openai
Requires-Dist: proteas
Requires-Dist: string2dict
Requires-Dist: indented_logger
Requires-Dist: pyyaml
Requires-Dist: tqdm
Requires-Dist: python-magic
Requires-Dist: python-dotenv
Requires-Dist: pytest
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary



![LLMSERVICE Logo](https://raw.githubusercontent.com/karaposu/llmkit/refs/heads/main/assets/text_logo_transp.png)


-----------------

A clean, production-ready service layer that centralizes prompts, invocations, and post-processing, ensuring rate-aware, maintainable, and scalable LLM logic in your application.

|             |                                                                                                                                                                                |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Package** | [![PyPI Latest Release](https://img.shields.io/pypi/v/llmservice.svg)](https://pypi.org/project/llmservice/) [![PyPI Downloads](https://img.shields.io/pypi/dm/llmservice.svg?label=PyPI%20downloads)](https://pypi.org/project/llmservice/) |

## Installation

Install LLMService via pip:

```bash
pip install llmservice
```

## Table of Contents

- [Installation](#installation)
- [What makes it unique?](#what-makes-it-unique)
- [Main Features](#main-features)
- [Architecture](#architecture)
- [Usage](#usage)
  - [Step 0: Config & Installation](#step-0-config--installation)
  - [Step 1: Subclass `BaseLLMService` and create methods](#step-1-subclass-basellmservice-and-create-methods)
  - [Step 2: Import your LLM layer and use the methods](#step-2-import-your-llm-layer-and-use-the-methods)
  - [Step 3: A simple fact](#step-3-a-simple-fact)
- [Postprocessing Pipeline](#postprocessing-pipeline)
  - [Method 1: Semantic Isolation](#method-1-semantic-isolation)
  - [Method 2: ConvertToDict](#method-2-converttodict)
  - [Method 3: ExtractValue](#method-3-extractvalue)
  - [Using Pipeline Methods Together](#using-pipeline-methods-together)
- [Async Support](#async-support)
  - [Translating a 100-page book (chunked into pieces)](#translating-a-100-page-book-chunked-into-pieces)

 

## What makes it unique?

| Feature                             | LLMService                                                                                                                                | LangChain                                                                                                          |
| ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| **Result Handling**                 | Returns a single `GenerationResult` dataclass encapsulating success/failure, rich metadata (tokens, cost, latency), and pipeline outcomes | Composes chains of tools and agents; success/failure handling is dispersed via callbacks and exceptions            |
| **Rate-Limit & Throughput Control** | Built-in sliding-window RPM/TPM counters and an adjustable semaphore for concurrency, automatically pausing when you hit your API quota   | Relies on external throttlers or underlying client logic; no native RPM/TPM management                             |
| **Cost Monitoring**                 | Automatic per-model token-level cost calculation and aggregated usage stats for real-time billing insights                                | No built-in cost monitoring—you must implement your own wrappers or middleware                                     |
| **Post-Processing Pipelines**       | Declarative configs for JSON parsing, semantic extraction, validation, and transformation without ad-hoc parsing code                     | Encourages embedding output parsers inside chains or writing ad-hoc post-chain functions, scattering parsing logic |
| **Dependencies**                    | Minimal footprint: only Tenacity, your LLM client, and optionally YAML for prompts                                                        | Broad ecosystem: agents, retrievers, vector stores, callback managers, and other heavy dependencies                |
| **Extensibility**                   | Provides a clear `BaseLLMService` subclassing interface so you encapsulate each business operation and never call the engine directly     | You wire together chains or agents at call-site, mixing business logic with prompt orchestration                   |



LLMService delivers a well-structured alternative to more monolithic frameworks like LangChain.

> "LangChain isn't a library, it's a collection of demos held together by duct tape, fstrings, and prayers." 


## Main Features

* **Minimal Footprint & Low Coupling**  
  Designed for dependency injection—your application code never needs to know about LLM logic.

* **Result Monad Pattern**  
  Returns a `GenerationResult` dataclass for every invocation, encapsulating success/failure status, raw and processed outputs, error details, retry information, and per-step results—giving you full control over custom workflows.

* **Declarative Post-Processing Pipelines**  
  Chain semantic extraction, JSON parsing, string validation, and more via simple, declarative configurations.

* **Rate-Limit-Aware Asynchronous Requests**  
  Dynamically queue and scale workers based on real-time RPM/TPM metrics to maximize throughput without exceeding API quotas.

* **Transparent Cost & Usage Monitoring**  
  Automatically track input/output tokens and compute per-model cost, exposing detailed metadata with each response.

* **Automated Retry & Exponential Backoff**  
  Handle transient errors (rate limits, network hiccups) with configurable retries and exponential backoff powered by Tenacity.

* **Custom Exception Handling**  
  Provide clear, operation-specific fallbacks (e.g., insufficient quota, unsupported region) for graceful degradation.



## Architecture

LLMService provides an abstract `BaseLLMService` class to guide users in implementing their own service layers. It includes `llmhandler`, which manages interactions with different LLM providers, and `generation_engine`, which handles prompt crafting, LLM invocation, and post-processing.

![LLMService Architecture](https://raw.githubusercontent.com/karaposu/LLMService/refs/heads/main/assets/architecture.png) 

![schemas](https://raw.githubusercontent.com/karaposu/LLMService/refs/heads/main/assets/schemas.png)  

## Usage

### Step 0: Config & Installation

- Put your `OPENAI_API_KEY` inside a `.env` file (see the loading sketch after this list)

- Install LLMService via pip:

```bash
pip install llmservice
```
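
LLMService depends on `python-dotenv`, so the key can be loaded from `.env` into the environment. Whether the service auto-loads it for you may vary by version, so as a safe default you can load it explicitly in your own entrypoint:

```python
# Standard python-dotenv pattern: copies variables from .env
# into os.environ so the OpenAI client can see OPENAI_API_KEY.
from dotenv import load_dotenv

load_dotenv()
```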

### Step 1: Subclass `BaseLLMService` and create methods

Create a new Python file (e.g., `myllmservice.py`) and extend the `BaseLLMService` class. All of the LLM-facing parts of your business logic will be defined here as methods.


```python
# Import path below is an assumption; adjust it to match the package layout.
from llmservice import BaseLLMService, GenerationRequest, GenerationResult


class MyLLMService(BaseLLMService):
    def translate_to_latin(self, input_paragraph: str) -> GenerationResult:
        my_prompt = f"Translate this text to Latin: {input_paragraph}"

        generation_request = GenerationRequest(
            formatted_prompt=my_prompt,
            model="gpt-4o",
        )

        # Execute the generation synchronously
        generation_result = self.execute_generation(generation_request)
        return generation_result
```

### Step 2: Import your LLM layer and use the methods

```python
# in your app.py
from myllmservice import MyLLMService

if __name__ == '__main__':
    service = MyLLMService()
    result = service.translate_to_latin("Hello, how are you?")
    print(result)

    # `result` is a GenerationResult object that includes
    # all the information you need.
```
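
Because every invocation returns a `GenerationResult`, the call site can branch on it directly. Below is a minimal sketch; attribute names such as `success`, `content`, and `error_message` are assumptions based on the dataclass described above, so verify them against your installed version:

```python
# Attribute names here are illustrative assumptions — check the
# GenerationResult dataclass in your installed version for exact fields.
result = service.translate_to_latin("Hello, how are you?")

if result.success:
    print(result.content)  # processed output, after any pipeline steps
else:
    print(f"Generation failed: {result.error_message}")
```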

### Step 3: A simple fact
Don't forget to live your life, man. Remember: all code is legacy the moment it is written.


## Postprocessing Pipeline
Several postprocessing methods are integrated natively into LLMService. They cover the most common transformations, so you don't have to write ad-hoc parsing code; the three you will reach for most often are documented below.

### Method 1: Semantic Isolation

Use the **SemanticIsolation** step whenever you need to extract a specific semantic element (for example, a code snippet, a name, or any targeted fragment) from an LLM’s output.

For example, imagine you asked an LLM to write a SQL snippet and it returns:

```text
Here is your answer:
SELECT * FROM users;
Do you need anything else?

```

Let's say you plan to feed this output directly to your database connection. As-is, it won't run, because it is wrapped in filler text like "Here is your answer:".

In a scenario like this, where you need just the pure semantic element, this postprocessing step is what you want.

Here is sample usage for the example above:

```python
# in your myllmservice.py

def create_sql_code(self, user_question: str, database_desc: str) -> GenerationResult:
    formatted_prompt = f"""Here is my database description: {database_desc},
                        and here is what the user wants to learn: {user_question}.
                        I want you to generate a SQL query. The answer should contain only SQL code."""

    pipeline_config = [
        {
            'type': 'SemanticIsolation',
            'params': {
                'semantic_element_for_extraction': 'SQL code'
            }
        }
    ]

    generation_request = GenerationRequest(
        formatted_prompt=formatted_prompt,
        model="gpt-4o",
        pipeline_config=pipeline_config,
    )

    generation_result = self.execute_generation(generation_request)
    return generation_result
```

The **SemanticIsolation** postprocessing step fixes this by running a second query that extracts **only** the semantic element you specified (in this case, SQL code).


### Method 2: ConvertToDict

When you ask an LLM to output a JSON-like response, you typically convert it into a dictionary (for example, using `json.loads()`). However, if the output is missing quotes or otherwise isn’t strictly valid JSON, `json.loads()` will fail. **ConvertToDict** leverages the `string2dict` module to handle these edge cases—even with missing quotes or minor formatting issues, it can parse the string into a proper Python `dict`.

Below are some LLM outputs where `json.loads()` fails but **ConvertToDict** succeeds:



sample_1:
```
'{\n    "key": "SELECT DATE_FORMAT(bills.bill_date, \'%Y-%m\') AS month, SUM(bills.total) AS total_spending FROM bills WHERE YEAR(bills.bill_date) = 2023 GROUP BY DATE_FORMAT(bills.bill_date, \'%Y-%m\') ORDER BY month;"\n}'
```
sample_2:
```
"{\n    'key': 'SELECT DATE_FORMAT(bill_date, \\'%Y-%m\\') AS month, SUM(total) AS total_spendings FROM bills WHERE YEAR(bill_date) = 2023 GROUP BY month ORDER BY month;'\n}"
```
sample_3:
```
'{   \'key\': "https://dfasdfasfer.vercel.app/"}'
```

Usage:

```python
pipeline_config = [
    {
        'type': 'ConvertToDict',
        'params': {}
    }
]
```
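
For context, here is a sketch of a full service method that asks the model for a JSON-like object and normalizes it with this step (the method name and prompt wording are illustrative, following the same pattern as the earlier examples):

```python
def summarize_as_json(self, text: str) -> GenerationResult:
    formatted_prompt = (
        f"Summarize the following text: {text}. "
        'Reply as a JSON object like {"summary": "...", "keywords": [...]}.'
    )

    generation_request = GenerationRequest(
        formatted_prompt=formatted_prompt,
        model="gpt-4o",
        pipeline_config=[{'type': 'ConvertToDict', 'params': {}}],
    )
    return self.execute_generation(generation_request)
```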


### Method 3: ExtractValue

Use this pipeline step **with** the `ConvertToDict` method to extract a single field from a JSON-like response. Simply specify the field name as a parameter.

For example, if your LLM returns:

```json
{"answer": "<LLM-generated answer>"}
```

add the following to your pipeline config:

```python
{
    'type': 'ExtractValue',
    'params': {'key': 'answer'}
}
```

Combined with `ConvertToDict`, this configuration first parses the output into a Python `dict`, then returns the value associated with `"answer"`.
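
In full, the two-step config looks like this; order matters, since `ConvertToDict` must produce the `dict` before `ExtractValue` can read from it:

```python
pipeline_config = [
    {'type': 'ConvertToDict', 'params': {}},
    {'type': 'ExtractValue', 'params': {'key': 'answer'}},
]
```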

  
### Using Pipeline Methods Together

A common scenario is to chain multiple pipeline steps to extract a specific value from an LLM response:

1. **SemanticIsolation**  
   Extracts the JSON-like snippet from a larger text response.  
2. **ConvertToDict**  
   Normalizes that snippet into a Python `dict`, even if it isn’t strictly valid JSON.  
3. **ExtractValue**  
   Retrieves the value associated with a given key from the dictionary.


```python
pipeline_config = [
    {
        'type': 'SemanticIsolation',
        'params': {'semantic_element_for_extraction': 'SQL code'}
    },
    {
        'type': 'ConvertToDict',
        'params': {}
    },
    {
        'type': 'ExtractValue',
        'params': {'key': 'answer'}
    }
]
```
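
Plugged into a service method, the chain might look like the sketch below. The prompt wording and method name are illustrative, and the isolation target is `'JSON object'` rather than `'SQL code'` here, since this prompt asks for JSON:

```python
def answer_with_sql(self, user_question: str, database_desc: str) -> GenerationResult:
    formatted_prompt = f"""Here is my database description: {database_desc},
                        and here is what the user wants to learn: {user_question}.
                        Reply with a JSON object of the form {{"answer": "<SQL query>"}}."""

    generation_request = GenerationRequest(
        formatted_prompt=formatted_prompt,
        model="gpt-4o",
        pipeline_config=[
            {'type': 'SemanticIsolation',
             'params': {'semantic_element_for_extraction': 'JSON object'}},
            {'type': 'ConvertToDict', 'params': {}},
            {'type': 'ExtractValue', 'params': {'key': 'answer'}},
        ],
    )
    return self.execute_generation(generation_request)
```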


## Async Support

LLMService includes first-class asynchronous methods with built-in rate and concurrency controls. You can configure `max_rpm`, `max_tpm`, and `max_concurrent_requests`; the concurrency cap indirectly governs token throughput over the same window. Here’s an example for your `myllmservice.py`:


```python
class MyLLMService(BaseLLMService):
    def __init__(self):
        super().__init__(default_model_name="gpt-4o-mini")
        self.set_rate_limits(max_rpm=120, max_tpm=10_000)
        self.set_concurrency(100)

    async def translate_to_latin_async(self, input_paragraph: str) -> GenerationResult:
        my_prompt = f"Translate this text to Latin: {input_paragraph}"

        generation_request = GenerationRequest(
            formatted_prompt=my_prompt,
            model="gpt-4o-mini",
            operation_name="translate_to_latin",
        )

        generation_result = await self.execute_generation_async(generation_request)
        return generation_result
```

### Translating a 100-page book (chunked into pieces)

For this experiment we use a text that has already been chunked into pieces, and compare several sync/async configurations (the result columns below are to be filled in as runs complete).
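
A minimal driver for such an experiment might look like the sketch below, where `chunks` stands in for the pre-chunked book text and the service's built-in limiter throttles the fan-out:

```python
import asyncio

from myllmservice import MyLLMService


async def translate_book(chunks):
    service = MyLLMService()
    # One async request per chunk; the built-in rate limiter pauses
    # workers whenever max_rpm / max_tpm would otherwise be exceeded.
    tasks = [service.translate_to_latin_async(chunk) for chunk in chunks]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    chunks = ["page 1 text ...", "page 2 text ..."]  # your pre-chunked book
    results = asyncio.run(translate_book(chunks))
    print(f"Translated {len(results)} chunks")
```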


| Model Name   | Method | Max Concurrency | Max RPM | Max TPM | Elapsed Time | Total Cost |
|--------------|--------|-----------------|---------|---------|--------------|------------|
| gpt-4o-mini  | sync   | –               | –       | –       |              |            |
| gpt-4o-mini  | async  | 10              | 100     | 10000   |              |            |
| gpt-4o-mini  | async  | 50              | 100     | 10000   |              |            |
| gpt-4o-mini  | async  | 100             | 150     | 20000   |              |            |
| gpt-4o-mini  | async  | 200             | 300     | 30000   |              |            |
| gpt-4o       | sync   | –               | –       | –       |              |            |
| gpt-4o       | async  |                 |         |         |              |            |
| gpt-4.1-nano | sync   |                 |         |         |              |            |
| gpt-4.1-nano | async  |                 |         |         |              |            |




