Metadata-Version: 2.4
Name: llm-distiller
Version: 0.1.0
Summary: Model distiller automator — recursively drives an LLM with seed prompts and stores compressed outputs in SQLite
Project-URL: Homepage, https://github.com/daedalus/llm-distiller
Project-URL: Repository, https://github.com/daedalus/llm-distiller
Project-URL: Issues, https://github.com/daedalus/llm-distiller/issues
Author-email: Dario Clavijo <clavijodario@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,bloom-filter,huggingface,llm,model-distillation,ngrams,openai,sqlite,tfidf,torch
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: bitarray
Requires-Dist: colorama
Requires-Dist: huggingface-hub
Requires-Dist: joblib
Requires-Dist: openai
Requires-Dist: pyyaml
Requires-Dist: scikit-learn
Requires-Dist: torch
Requires-Dist: transformers
Requires-Dist: zstd
Provides-Extra: all
Requires-Dist: hatch; extra == 'all'
Requires-Dist: hypothesis; extra == 'all'
Requires-Dist: mypy; extra == 'all'
Requires-Dist: pytest; extra == 'all'
Requires-Dist: pytest-asyncio; extra == 'all'
Requires-Dist: pytest-cov; extra == 'all'
Requires-Dist: pytest-mock; extra == 'all'
Requires-Dist: ruff; extra == 'all'
Provides-Extra: dev
Requires-Dist: hatch; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: lint
Requires-Dist: mypy; extra == 'lint'
Requires-Dist: ruff; extra == 'lint'
Provides-Extra: test
Requires-Dist: hypothesis; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-asyncio; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: pytest-mock; extra == 'test'
Provides-Extra: unsloth
Requires-Dist: unsloth; extra == 'unsloth'
Description-Content-Type: text/markdown

## The llm-distiller ##

### Running ###

```
usage: main.py [-h] [--prompt PROMPT] [--model MODEL] [--db DB] [--max-depth MAX_DEPTH] [--max-tokens MAX_TOKENS] [--compression-level {1,2,3,4,5,6,7,8,9}] [--seed SEED]
               [--bloom-size BLOOM_SIZE] [--bloom-hash-count BLOOM_HASH_COUNT] [--max-ngrams MAX_NGRAMS] [--no-color] [--retrieve-to-bloom] [--use-unsloth] [--api-url API_URL]
               [--api-key API_KEY] [--system-prompt SYSTEM_PROMPT] [--threads THREADS] [--secrets-file SECRETS_FILE] [--load-prompts-from-file LOAD_PROMPTS_FROM_FILE]
               [--api-hf-provider API_HF_PROVIDER] [--compression-algo COMPRESSION_ALGO] [--prompt-prefixes PROMPT_PREFIXES [PROMPT_PREFIXES ...]] [--batch-size BATCH_SIZE]
               [--remove-prompt] [--ngram-mode] [--min-tfidf-score MIN_TFIDF_SCORE] [--save-to-textfile SAVE_TO_TEXTFILE] [--q-mode] [--randomize-prompts] [--randomize-model-retry]
               [--randomize-remote-endpoint] [--strip-think-tag-form-prompt] [--exp-backoff] [--stream]

LLM Distiller with Bloom filter and SQLite storage.

options:
  -h, --help            show this help message and exit
  --prompt PROMPT       Root word or prompt to distill.
  --model MODEL         Huggingface model name (default: distilgpt2).
  --db DB               Path to SQLite database (default: words/data.db).
  --max-depth MAX_DEPTH
                        Max recursion depth (default: 10).
  --max-tokens MAX_TOKENS
                        Max tokens (default: 1024).
  --compression-level {1,2,3,4,5,6,7,8,9}
                        Zlib compression level (1-9, default: 6).
  --seed SEED           Torch manual seed (optional).
  --bloom-size BLOOM_SIZE
                        Bloom filter size (default: 100,000,000).
  --bloom-hash-count BLOOM_HASH_COUNT
                        Bloom filter hash count (default: 6).
  --max-ngrams MAX_NGRAMS
                        Max ngrams (default: 10).
  --no-color            Disable colored output.
  --retrieve-to-bloom   Retrieve words from the database to the Bloom filter.
  --use-unsloth         Use unsloth
  --api-url API_URL     OpenAI compatible API url.
  --api-key API_KEY     API key for auth.
  --system-prompt SYSTEM_PROMPT
                        System prompt
  --threads THREADS     Number of CPU threads for PyTorch (default: auto)
  --secrets-file SECRETS_FILE
                        Specify the secrets JSON file.
  --load-prompts-from-file LOAD_PROMPTS_FROM_FILE
                        Specify the prompts file.
  --api-hf-provider API_HF_PROVIDER
                        Specify the Hugging Face inference provider.
  --compression-algo COMPRESSION_ALGO
                        Specify the compression algorithm to use.
  --prompt-prefixes PROMPT_PREFIXES [PROMPT_PREFIXES ...]
                        List of prompt prefixes (each may contain spaces).
  --batch-size BATCH_SIZE
                        Number of prompts to process in parallel (default: 1)
  --remove-prompt       Remove the prompt from generation.
  --ngram-mode          ngram mode from generation.
  --min-tfidf-score MIN_TFIDF_SCORE
                        Specify the min_tfidf_score.
  --save-to-textfile SAVE_TO_TEXTFILE
                        Specify a text file to save generated text.
  --q-mode              Q-mode.
  --randomize-prompts   Randomize prompts when read from file.
  --randomize-model-retry
                        Randomize model to retry.
  --randomize-remote-endpoint
                        Randomize remote endpoint.
  --strip-think-tag-form-prompt
                        Strip the <think> and </think> tags from prompts.
  --exp-backoff         Enable exponential backoff.
  --stream              Enable streaming.
```
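
The `--bloom-size` and `--bloom-hash-count` options suggest a classic Bloom filter that remembers which words were already explored. The sketch below is a minimal illustration under stated assumptions: a packed bit array of `size` bits and `hash_count` positions derived by double hashing over SHA-256. The `BloomFilter` class and its methods are hypothetical and are not the project's implementation (which depends on `bitarray`).

```python
import hashlib
from collections.abc import Iterator


class BloomFilter:
    """Hypothetical Bloom filter: `size` bits, `hash_count` probes per item."""

    def __init__(self, size: int, hash_count: int) -> None:
        self.size = size
        self.hash_count = hash_count
        self.bits = bytearray((size + 7) // 8)  # packed bit array, all zeros

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit indexes from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        # False means "definitely unseen"; True means "probably seen".
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))


seen = BloomFilter(size=1_000_000, hash_count=6)
explored: list[str] = []
for word in ["history", "theory", "history"]:
    if word in seen:
        continue  # probably explored already, so skip it
    seen.add(word)
    explored.append(word)
print(explored)
```

A Bloom filter can report false positives but never false negatives, so a larger `--bloom-size` mainly lowers the chance of wrongly skipping a new word.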

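The `--strip-think-tag-form-prompt` option is described as stripping the `<think>` and `</think>` tags from prompts. A small regex sketch of that idea follows; `strip_think` and its `drop_reasoning` switch are hypothetical, and whether the CLI also removes the text between the tags is not stated, so both variants are shown.

```python
import re

# Matches an opening or closing think tag, e.g. "<think>" or "</think>".
THINK_TAG = re.compile(r"</?think>", re.IGNORECASE)
# Matches a whole reasoning block, including the text between the tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)


def strip_think(text: str, drop_reasoning: bool = False) -> str:
    """Hypothetical helper: remove think tags, or whole think blocks, from text."""
    pattern = THINK_BLOCK if drop_reasoning else THINK_TAG
    return pattern.sub("", text).strip()


raw = "<think>plan the answer</think>Explain entropy."
print(strip_think(raw))                       # plan the answerExplain entropy.
print(strip_think(raw, drop_reasoning=True))  # Explain entropy.
```
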

With a local model (loaded through Unsloth, no `--api-url`):

```
#!/bin/bash
set -x

PROMPT='make a list of the most important people in history'
MODEL=meta-llama/llama-4-scout-17b-16e-instruct

python main.py --prompt "$PROMPT" \
     --compression-level 9 \
     --max-tokens=2048 \
     --max-depth 100 \
     --seed=0 \
     --model=$MODEL \
     --use-unsloth \
     --db /content/drive/MyDrive/IA/data.db \
     --prompt-prefixes 'please explain' 'please elaborate' 'think about' 'formulate a theory about' 'demonstrate that' \
     --batch-size 8 \
     --remove-prompt \
     --min-tfidf-score=0.1
```

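The package summary says the tool recursively drives an LLM from seed prompts, and `--prompt-prefixes` and `--max-depth` shape that expansion. The rough sketch below assumes that each prefix is prepended to a topic to form a new prompt and that the next level's topics come from the previous output. `distill`, `generate`, and `extract_topics` are hypothetical stand-ins for the model call and the TF-IDF/n-gram step, and the recursion is written iteratively with a queue.

```python
from collections import deque
from collections.abc import Callable


def distill(
    seed: str,
    prefixes: list[str],
    generate: Callable[[str], str],
    extract_topics: Callable[[str], list[str]],
    max_depth: int = 10,
) -> dict[str, str]:
    """Hypothetical breadth-first expansion of a seed prompt, bounded by max_depth."""
    results: dict[str, str] = {}
    queue = deque([(seed, 0)])
    while queue:
        topic, depth = queue.popleft()
        if depth >= max_depth:
            continue
        # The seed is used as-is; deeper topics are combined with every prefix.
        prompts = [topic] if depth == 0 else [f"{p} {topic}" for p in prefixes]
        for prompt in prompts:
            if prompt in results:  # skip prompts that were already generated
                continue
            output = generate(prompt)
            results[prompt] = output
            for new_topic in extract_topics(output):
                queue.append((new_topic, depth + 1))
    return results


def toy_model(prompt: str) -> str:
    return f"notes on {prompt.split()[-1]}"


def toy_topics(text: str) -> list[str]:
    return [text.split()[-1] + "-detail"]


out = distill("entropy", ["please explain", "think about"], toy_model, toy_topics, max_depth=2)
print(list(out))  # ['entropy', 'please explain entropy-detail', 'think about entropy-detail']
```
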
With a remote, OpenAI-compatible inference endpoint (Groq in this example):

```
#!/bin/bash
set -x

PROMPT='make a list of the most important people in history'
PROVIDER=https://api.groq.com/openai/v1/
MODEL=meta-llama/llama-4-scout-17b-16e-instruct

python main.py --prompt "$PROMPT" \
  --api-url=$PROVIDER \
  --model=$MODEL \
  --max-tokens=4096 \
  --prompt-prefixes "please explain" "please elaborate" "think about" "formulate a theory about" "demonstrate that" \
  --remove-prompt \
  --secrets-file=.secrets.json \
  --min-tfidf-score=0.1
```
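
Both examples pass `--min-tfidf-score=0.1`, which suggests that terms are kept only when their TF-IDF score clears a threshold. The project depends on scikit-learn; the stdlib sketch below is an assumption about the idea, not its code. It uses term frequency normalized by document length and the smoothed IDF form that scikit-learn applies with `smooth_idf=True` (scikit-learn itself uses raw counts followed by L2 normalization, so its scores differ). `tfidf_terms` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_terms(docs: list[str], min_score: float = 0.1) -> list[dict[str, float]]:
    """Hypothetical filter: per document, the terms whose TF-IDF is >= min_score."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    kept: list[dict[str, float]] = []
    for tokens in tokenized:
        scores: dict[str, float] = {}
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            if tf * idf >= min_score:
                scores[term] = tf * idf
        kept.append(scores)
    return kept


docs = ["the cat sat", "the dog sat", "the cat ran"]
print(tfidf_terms(docs, min_score=0.5))
```

With a threshold of 0.5, only the terms unique to one document (`dog`, `ran`) survive; common words such as `the` score lowest because they appear everywhere.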

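Remote providers often rate-limit requests, which is presumably what `--exp-backoff` addresses. A generic sketch of retrying with exponential backoff and full jitter; `with_backoff` and its parameters are hypothetical, and a real client would catch only transient errors (for example `openai.RateLimitError`) rather than every `Exception`.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call`, doubling the maximum wait after each failure (full jitter)."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2**attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out concurrent retries
    raise AssertionError("unreachable: the loop always returns or raises")


attempts: list[int] = []
delays: list[float] = []


def flaky() -> str:
    """Fails twice, then succeeds, standing in for a rate-limited API call."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


result = with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result)  # ok
```
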
The license is MIT.
