Metadata-Version: 2.4
Name: llm-web-crawler
Version: 0.1.0
Summary: LLM data collection and synthetic fine-tuning dataset pipeline
Requires-Python: >=3.11
Requires-Dist: aiofiles>=23
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: datasets>=2.20
Requires-Dist: httpx[http2]>=0.27
Requires-Dist: huggingface-hub>=0.23
Requires-Dist: jinja2>=3.1
Requires-Dist: kaggle>=1.6
Requires-Dist: litellm>=1.40
Requires-Dist: loguru>=0.7
Requires-Dist: lxml>=5
Requires-Dist: markdownify>=0.12
Requires-Dist: psutil>=5.9
Requires-Dist: pyarrow>=16
Requires-Dist: pydantic-settings>=2.3
Requires-Dist: pydantic>=2.7
Requires-Dist: python-dotenv>=1.0
Requires-Dist: questionary>=2.0
Requires-Dist: rich>=13
Requires-Dist: sqlmodel>=0.0.18
Requires-Dist: tenacity>=8.3
Requires-Dist: tiktoken>=0.7
Requires-Dist: typer>=0.12
Requires-Dist: xmltodict>=0.13
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: unsloth
Requires-Dist: unsloth>=2024.0; extra == 'unsloth'
Description-Content-Type: text/markdown

# DataForge

LLM data collection and synthetic fine-tuning dataset pipeline.

## Installation

Requires Python 3.11 or later.

### Via pip (any platform)
```bash
pip install llm-web-crawler
```

### From source
```bash
git clone https://github.com/yourusername/website-explorer.git
cd website-explorer
pip install -e .
```

### Standalone executables
Download the pre-built executable for your platform from [Releases](https://github.com/yourusername/website-explorer/releases):
- **Linux**: `dataforge-linux-x64`
- **Windows**: `dataforge-windows-x64.exe`
- **macOS**: `dataforge-macos-x64`

## Usage

```bash
dataforge
```

## Development

Install development dependencies:
```bash
pip install -e ".[dev]"
```

Run tests:
```bash
pytest
```

Run linting:
```bash
ruff check src/ tests/
```

## Publishing Releases

1. Update the version in `pyproject.toml`
2. Commit the change
3. Create a tag that matches the new version: `git tag v0.1.0`
4. Push the tag: `git push origin v0.1.0`

Pushing the tag triggers:
- Automated builds for Windows, macOS, and Linux
- Publishing to PyPI
- Creation of a GitHub Release with executables
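
Because the tag has to match the version in `pyproject.toml`, it can help to derive the tag name from the file instead of typing it by hand. The following is a minimal sketch, not part of the project's tooling: the helper names `pyproject_version` and `release_tag` are hypothetical, and it assumes the version is declared statically as a `version = "..."` line rather than computed dynamically.

```bash
# Hypothetical helper: print the first `version = "..."` value in a pyproject.toml.
# Assumes a static version line; defaults to ./pyproject.toml when no path is given.
pyproject_version() {
  sed -n 's/^version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "${1:-pyproject.toml}" | head -n 1
}

# Hypothetical helper: build the tag name used in the steps above ("v" + version).
release_tag() {
  printf 'v%s\n' "$(pyproject_version "$1")"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   tag="$(release_tag pyproject.toml)"
#   git tag "$tag"
#   git push origin "$tag"
```

Keeping the helpers free of side effects means they can be sourced and checked before any tag is created.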
