Metadata-Version: 2.4
Name: llm-forge-playground
Version: 0.1.0
Summary: Small CLI playground for Qwen via an OpenAI-compatible LLM Forge gateway.
Author-email: The AIC Project <dev@the-aic-project.org>
License: MIT
Project-URL: Homepage, https://github.com/the-aic-project/openai-playground
Project-URL: Source, https://github.com/the-aic-project/openai-playground
Project-URL: Issues, https://github.com/the-aic-project/openai-playground/issues
Keywords: llm,qwen,cli,openai-compatible,llm-forge
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Environment :: Console
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=1.58.0
Requires-Dist: requests>=2.32.0
Provides-Extra: dev
Requires-Dist: ruff>=0.8.0; extra == "dev"
Dynamic: license-file

## LLM Forge Playground (Qwen via OpenAI-compatible gateway)

[![CI](https://github.com/the-aic-project/openai-playground/actions/workflows/ci.yaml/badge.svg)](https://github.com/the-aic-project/openai-playground/actions/workflows/ci.yaml)
[![Lint](https://github.com/the-aic-project/openai-playground/actions/workflows/lint.yaml/badge.svg)](https://github.com/the-aic-project/openai-playground/actions/workflows/lint.yaml)
[![PyPI version](https://img.shields.io/pypi/v/llm-forge-playground.svg)](https://pypi.org/project/llm-forge-playground/)
[![Python 3.9+](https://img.shields.io/pypi/pyversions/llm-forge-playground.svg)](https://pypi.org/project/llm-forge-playground/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A small **interactive** Python CLI for talking to a Qwen model behind an **OpenAI-compatible** LLM Forge gateway. Responses **stream into the console** by default. You can change the model, temperature, streaming, and other settings **on the fly**, with no restart needed.

- **Dynamic config** — Change any setting in-session with `set <key> <value>`.
- **Streaming by default** — Tokens print as they arrive.
- **Discoverable** — Type `?` or `help` for all commands.
- **Scenarios** — Run predefined tests anytime with `scenario 1` … `scenario 4`.

---

### Quickstart

**Using uv** (recommended):

```bash
# install as a standalone tool:
uv tool install llm-forge-playground
llm-forge-cli
# or in a project:
uv add llm-forge-playground
uv run llm-forge-cli
```

**Using pipenv**:

```bash
pipenv install llm-forge-playground
pipenv run llm-forge-cli
```

**Using pip**:

```bash
pip install llm-forge-playground
llm-forge-cli
```

Set your gateway token (and optional base URL), then type a message. Use `?` for in-session commands.

```bash
export LLM_FORGE_ACCESS_TOKEN="your_jwt_here"
export LLM_FORGE_BASE_URL="https://your-gateway.example.com/v1"  # optional
llm-forge-cli
```

---

### Environment configuration

Set these before starting the CLI:

- **`LLM_FORGE_BASE_URL`**: Gateway base URL (default: `http://localhost:8000/v1`).
- **`LLM_FORGE_ACCESS_TOKEN`**: JWT / bearer token (**recommended**).

Or use basic-auth login (client obtains a token once):

- **`LLM_FORGE_AUTH_URL`**, **`LLM_FORGE_USERNAME`**, **`LLM_FORGE_PASSWORD`**

Example:

```bash
export LLM_FORGE_BASE_URL="https://your-gateway.example.com/v1"
export LLM_FORGE_ACCESS_TOKEN="your_jwt_here"
```

If neither a token nor auth credentials are set, the program exits with a clear error.
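
The lookup described above could be sketched roughly as follows. This is a minimal illustration, not the package's actual code: `resolve_auth` is a hypothetical helper, the returned dictionary shape is invented for the example, and giving the token precedence over login credentials is an assumption based on the token being the recommended option.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_BASE_URL = "http://localhost:8000/v1"
LOGIN_KEYS = ("LLM_FORGE_AUTH_URL", "LLM_FORGE_USERNAME", "LLM_FORGE_PASSWORD")


def resolve_auth(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: choose an auth mode from LLM_FORGE_* variables."""
    env = os.environ if env is None else env
    base_url = env.get("LLM_FORGE_BASE_URL", DEFAULT_BASE_URL)

    # Assumption: a ready-made token wins over login credentials.
    token = env.get("LLM_FORGE_ACCESS_TOKEN")
    if token:
        return {"mode": "token", "base_url": base_url, "token": token}

    # Login mode needs all three variables to be present and non-empty.
    if all(env.get(key) for key in LOGIN_KEYS):
        return {
            "mode": "login",
            "base_url": base_url,
            "auth_url": env["LLM_FORGE_AUTH_URL"],
            "username": env["LLM_FORGE_USERNAME"],
            "password": env["LLM_FORGE_PASSWORD"],
        }

    # Neither option is configured: stop with a readable message.
    raise SystemExit(
        "Set LLM_FORGE_ACCESS_TOKEN, or LLM_FORGE_AUTH_URL, "
        "LLM_FORGE_USERNAME and LLM_FORGE_PASSWORD."
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.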

---

### Installation (from source)

```bash
git clone https://github.com/the-aic-project/openai-playground.git
cd openai-playground
uv sync                # or: pipenv install -e .  /  pip install -e .
uv run llm-forge-cli   # or: pipenv run llm-forge-cli  /  llm-forge-cli
```

---

### Running the CLI

```bash
llm-forge-cli
```

or:

```bash
python -m llm_forge_playground.cli
```

You enter an **interactive REPL**. The prompt shows `[stream]` or `[sync]` so you know the current mode.

- **Type a message** — It’s sent to the model; the reply streams (or prints when sync).
- **Change settings anytime** — No restart. Use the commands below.

#### In-session commands

| Input | Description |
|-------|-------------|
| `?` or `help` | List all commands and set keys |
| `config` | Show current model, temp, stream, thinking, max_tokens, etc. |
| `set <key> <value>` | Change a setting (e.g. `set stream off`, `set temp 0.3`, `set model other/model`) |
| `clear` | Clear conversation history; start a new thread |
| `scenario 1` … `scenario 4` | Run a predefined test (see below) |
| `exit`, `quit`, `q` | Exit |

The leading **`/`** is optional: `set stream on` and `/set stream on` both work.

#### Set keys and aliases

- **model** (alias: **m**) — Model name.
- **temp** (alias: **t**) — Temperature, e.g. `0.7`.
- **top_p** — Top-p sampling.
- **max_tokens** (alias: **mt**) — Max tokens to generate.
- **top_k** — Extra body `top_k`; use `none` to clear.
- **thinking** — `on` / `off` (client-side flag; may have no effect on vLLM Qwen).
- **stream** (alias: **s**) — `on` / `off` (stream responses or wait for full reply).

Examples:

```
set stream off
set temp 0.2
set model Qwen/Qwen3.5-397B-A17B-FP8
set thinking on
```
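
To make the key, alias, and value rules concrete, here is one way a `set` line could be parsed. It is a sketch under stated assumptions rather than the CLI's real parser: `parse_set` and `ALIASES` are hypothetical names, and the type coercions (float for `temp` and `top_p`, int for `max_tokens` and `top_k`) are guesses that match the examples above.

```python
from typing import Tuple, Union

# Aliases listed in this README, mapped to their canonical keys.
ALIASES = {"m": "model", "t": "temp", "mt": "max_tokens", "s": "stream"}

Value = Union[str, int, float, bool, None]


def parse_set(line: str) -> Tuple[str, Value]:
    """Hypothetical parser for `set <key> <value>` (leading `/` optional)."""
    parts = line.strip().lstrip("/").split(maxsplit=2)
    if len(parts) != 3 or parts[0].lower() != "set":
        raise ValueError("usage: set <key> <value>")

    name = parts[1].lower()
    key = ALIASES.get(name, name)
    raw = parts[2].strip()

    if key in ("stream", "thinking"):
        if raw.lower() not in ("on", "off"):
            raise ValueError(f"{key} expects on or off")
        return key, raw.lower() == "on"
    if key in ("temp", "top_p"):
        return key, float(raw)
    if key == "max_tokens":
        return key, int(raw)
    if key == "top_k":
        # `none` clears the value instead of setting a number.
        return key, None if raw.lower() == "none" else int(raw)
    if key == "model":
        return key, raw
    raise ValueError(f"unknown key: {parts[1]}")
```

With this sketch, `parse_set("/set s off")` and `parse_set("set stream off")` both resolve to the `stream` key with the value `False`.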

#### Predefined scenarios (run anytime)

- **scenario 1** — Short explanation, no thinking, no streaming.
- **scenario 2** — Same question with thinking (non-streaming).
- **scenario 3** — Same question, streaming, no thinking.
- **scenario 4** — Longer reasoning task, thinking, higher max_tokens.

Each scenario sends one user message and appends the reply to the current conversation.

#### Scripting: run one scenario and exit

```bash
llm-forge-cli --scenario 2
```

This runs scenario 2 and exits without entering the REPL.

---

### How thinking and streaming work

- **Thinking** — The client sends `extra_body["chat_template_kwargs"]["enable_thinking"]`. On many vLLM Qwen gateways, whether reasoning tokens are returned is decided **server-side** (parser/config), not by this flag.
- **Streaming** — When `stream` is on, the CLI uses `chat.completions.create(stream=True, ...)` and prints each content delta as it arrives; the full reply is still stored in conversation history.
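
The two mechanisms above can be sketched as a pair of small helpers. This is an illustration, not the package's implementation: `build_request` and `collect_stream` are hypothetical names, and the default temperature is an arbitrary choice. The request dictionary is meant to be passed to the OpenAI Python SDK as `client.chat.completions.create(**request)`; the offline demo at the end uses stand-in objects shaped like the SDK's streaming chunks so it runs without a gateway.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def build_request(
    model: str,
    messages: List[Dict[str, str]],
    *,
    stream: bool = True,
    enable_thinking: bool = False,
    temperature: float = 0.7,
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for chat.completions.create."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
        # The thinking flag travels in extra_body; the server may ignore it.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }


def collect_stream(chunks: Iterable[Any]) -> str:
    """Print each content delta as it arrives and return the full reply."""
    parts: List[str] = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (e.g. usage-only chunks)
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


# Offline demo: stand-in chunks instead of a real streaming response.
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
    for text in ("Hel", "lo", None)
]
reply = collect_stream(fake_chunks)
```

Against a real gateway, the same `collect_stream` would consume the iterator returned by `client.chat.completions.create(**build_request(...))`, and the joined string is what gets appended to the conversation history.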

---

### Publishing to PyPI (maintainers)

The [Release](.github/workflows/release.yaml) workflow (tag → release) uses GitHub’s built-in **`GITHUB_TOKEN`**, so it needs no extra secret. The [Publish to PyPI](.github/workflows/publish.yaml) workflow runs when a release is published; to enable it, add a PyPI API token as a GitHub secret:

1. **Create a PyPI token**
   - Log in at [pypi.org](https://pypi.org), go to **Account settings → API tokens**.
   - Click **Add API token**. Name it (e.g. `github-actions`) and set scope to the project (e.g. **Project: llm-forge-playground**) or “Entire account” if you prefer.
   - Copy the token; it starts with `pypi-` and is shown only once.

2. **Add the token to GitHub**
   - In your repo: **Settings → Secrets and variables → Actions**.
   - Click **New repository secret**. Name: `PYPI_API_TOKEN`, Value: paste the `pypi-...` token. Save.

3. **Publish**
   - **Option A:** Push a version tag; the [Release](.github/workflows/release.yaml) workflow creates a GitHub Release (with generated notes), which triggers [Publish to PyPI](.github/workflows/publish.yaml):
     ```bash
     git tag v0.1.0
     git push origin v0.1.0
     ```
   - **Option B:** Create a release from the GitHub **Releases** UI (e.g. “Draft new release” → choose tag or create one → “Publish release”).
   - **Option C:** Run the **Publish to PyPI** workflow manually from the **Actions** tab.

---

### Example code (project abstractions)

```python
from llm_forge_playground.config import AppConfig
from llm_forge_playground.client import build_openai_client, chat_once

config = AppConfig.from_env(
    stream=True,           # default
    enable_thinking=False,
)
client = build_openai_client(config)

messages = [{"role": "user", "content": "Explain what a knowledge graph is in two sentences."}]
reply = chat_once(client, config, messages)
```

You can change `config.model`, `config.temperature`, and other fields between calls; there is no need to recreate the client unless you change the base URL or auth.
