Metadata-Version: 2.4
Name: github-ai-scraper
Version: 0.2.0
Summary: AI Engineering Trending - scrape GitHub Trending and filter for AI projects
Project-URL: Homepage, https://github.com/lwx66615/github-ai-scraper
Project-URL: Repository, https://github.com/lwx66615/github-ai-scraper
Author: lwx66615
License-Expression: MIT
Keywords: ai,cli,github,machine-learning,scraper,trending
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: click>=8.1.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# AI Engineering Trending

English | [简体中文](README_CN.md)

A CLI tool for discovering trending AI Engineering projects from GitHub.

## What is AI Engineering?

AI Engineering focuses on **practical engineering applications** of AI/LLM technologies, including:

- **LLM SDK & API Clients** - OpenAI, Anthropic, Gemini, Mistral SDKs
- **Agent Frameworks** - LangChain, LlamaIndex, AutoGPT, CrewAI
- **RAG Tools** - Retrieval-augmented generation, document processing
- **Vector Databases** - Chroma, Pinecone, Weaviate, Milvus
- **AI Gateways** - LiteLLM, OpenRouter, unified API proxies
- **AI Code Assistants** - Cursor, Copilot, Claude Code
- **AI Observability** - Langfuse, LangSmith, monitoring tools
- **LLM Inference** - vLLM, Ollama, inference servers

This tool filters out research papers, datasets, tutorials, and model weights to focus on **production-ready** projects.

## Installation

```bash
# Install from PyPI
pip install github-ai-scraper

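# Or install from source (editable, with dev dependencies)
git clone https://github.com/lwx66615/github-ai-scraper.git
cd github-ai-scraper
pip install -e ".[dev]"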
```

### Windows Quick Start

After cloning the repository, double-click `install.bat`.

If the `ai-scraper` command is not recognized afterwards, invoke the module directly:
```bash
py -m ai_scraper.cli --help
```

## Quick Start

```bash
# Show this week's trending AI Engineering projects
ai-scraper trending

# Today's trending
ai-scraper trending --period daily

# This month's trending
ai-scraper trending --period monthly

# Save as Markdown
ai-scraper trending --save output.md

# Verbose output (show filtered projects)
ai-scraper trending -v
```
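As a rough illustration of how the `--period` option could translate into a request, the sketch below builds a GitHub Trending URL. It assumes the period values map directly onto the `since` query parameter of the trending page; the function name `trending_url` is hypothetical and not part of this package's API.

```python
from urllib.parse import quote, urlencode

# Assumption: the CLI's --period values correspond one-to-one to the
# `since` query parameter of GitHub's trending page.
VALID_PERIODS = ("daily", "weekly", "monthly")


def trending_url(language: str = "", period: str = "weekly") -> str:
    """Build a GitHub Trending URL for an optional language and a period."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {VALID_PERIODS}, got {period!r}")
    base = "https://github.com/trending"
    if language:
        # Percent-encode names such as "c++" so they are safe in a URL path.
        base = f"{base}/{quote(language.lower())}"
    return f"{base}?{urlencode({'since': period})}"


if __name__ == "__main__":
    print(trending_url("python", "daily"))
    # prints: https://github.com/trending/python?since=daily
```

The default of `weekly` mirrors the behavior of `ai-scraper trending` without a `--period` flag.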

## Commands

| Command | Description |
|---------|-------------|
| `ai-scraper trending` | Show trending AI Engineering projects |
| `ai-scraper trending --period daily` | Today's trending |
| `ai-scraper trending --period monthly` | This month's trending |
| `ai-scraper trending --limit 50` | Show top 50 projects |
| `ai-scraper trending --save output.md` | Save as Markdown |
| `ai-scraper trending -v` | Verbose output (show filtered projects) |
| `ai-scraper config show` | Show current configuration |
| `ai-scraper db stats` | Show database statistics |
| `ai-scraper db clean --vacuum` | Optimize database |
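
For readers curious what "optimize" might involve, here is a minimal sketch that assumes `db clean --vacuum` ultimately issues SQLite's `VACUUM` statement. It uses the standard-library `sqlite3` module rather than the project's `aiosqlite` dependency, and the helper name `vacuum_database` is hypothetical.

```python
import sqlite3


def vacuum_database(path: str) -> None:
    """Rebuild the SQLite file at `path`, reclaiming space left by deleted rows."""
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()


if __name__ == "__main__":
    # Path taken from the sample configuration; adjust to your setup.
    vacuum_database("./data/ai_scraper.db")
```

`VACUUM` rewrites the whole database file, so it is most useful after a large number of rows have been deleted.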

## Configuration

To override the defaults, create an `ai-scraper.yaml` file:

```yaml
github:
  token: ${GITHUB_TOKEN}  # Optional, for higher rate limits

database:
  path: ./data/ai_scraper.db

trending:
  languages:
    - python
    - typescript
    - javascript
    - go
    - rust
  timeout: 30

summary:
  enabled: false
  provider: anthropic
  api_key: ${ANTHROPIC_API_KEY}
  model: claude-3-5-haiku-20241022
```
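
The `${GITHUB_TOKEN}` and `${ANTHROPIC_API_KEY}` placeholders suggest that values are filled in from environment variables. The sketch below shows one way such expansion could work on the raw file text before YAML parsing. It is an assumption about the mechanism, not the package's actual loader, and `expand_env` is a hypothetical name.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} placeholders such as ${GITHUB_TOKEN}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} with its environment value, or "" if it is unset."""
    source = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: source.get(m.group(1), ""), text)


if __name__ == "__main__":
    sample = "token: ${GITHUB_TOKEN}"
    print(expand_env(sample, {"GITHUB_TOKEN": "ghp_example"}))
    # prints: token: ghp_example
```

Substituting an empty string for unset variables keeps optional settings such as the GitHub token optional; a stricter loader might raise an error instead.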

## AI Chinese Summaries

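Enable AI-generated Chinese summaries of repository descriptions. This requires an Anthropic API key. Note that the metadata for version 0.2.0 declares only the `dev` extra, so `[ai]` may not pull in any additional dependencies: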

```bash
pip install "github-ai-scraper[ai]"
# Windows (cmd):
set ANTHROPIC_API_KEY=your_api_key
# macOS/Linux:
export ANTHROPIC_API_KEY=your_api_key
ai-scraper trending --ai-summary
```

## Project Structure

```
github-ai-scraper/
├── src/ai_scraper/
│   ├── cli.py              # CLI entry point
│   ├── config.py           # Configuration management
│   ├── classifier.py       # AI Engineering classification
│   ├── scraper/
│   │   └── trending.py     # GitHub Trending scraper
│   ├── output/
│   │   ├── exporter.py     # Markdown exporter
│   │   └── summarizer.py   # AI summary generator
│   ├── models/
│   │   └── repository.py   # Data models
│   └── storage/
│       └── database.py     # SQLite storage
├── tests/                  # Test suite
└── pyproject.toml          # Package metadata
```

## How It Works

1. **Scrape GitHub Trending** - Fetches trending repos from GitHub's trending page
2. **Filter Engineering Projects** - Removes tutorials, datasets, model weights, etc.
3. **Classify AI Projects** - Uses keyword/topic matching to identify AI Engineering projects
4. **Sort by Growth** - Orders by star growth rate
5. **Export Results** - Outputs to console or Markdown file
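
Steps 2 through 4 can be sketched in a few lines of Python. Everything here is illustrative: the token lists, the `Repo` dataclass, and the definition of growth rate (stars gained divided by the star count before the period) are assumptions, and the package's real classifier may use different rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical token lists for illustration; the real rules may differ.
EXCLUDE_TOKENS = {"tutorial", "tutorials", "dataset", "datasets", "paper", "papers", "weights"}
AI_TOKENS = {"llm", "llms", "agent", "agents", "rag", "inference", "embeddings"}


@dataclass
class Repo:
    name: str
    description: str
    stars: int         # total stars now
    stars_gained: int  # stars added during the trending period
    topics: list[str] = field(default_factory=list)


def tokens(repo: Repo) -> set[str]:
    """Lowercased word tokens from the name, description, and topics."""
    text = " ".join([repo.name, repo.description, *repo.topics]).lower()
    return set(re.findall(r"[a-z0-9]+", text))


def growth_rate(repo: Repo) -> float:
    """Assumed metric: stars gained relative to the count before the period."""
    return repo.stars_gained / max(repo.stars - repo.stars_gained, 1)


def rank(repos: list[Repo]) -> list[Repo]:
    # Step 2: drop tutorials, datasets, and similar non-engineering repos.
    engineering = [r for r in repos if not tokens(r) & EXCLUDE_TOKENS]
    # Step 3: keep only repos whose tokens match an AI keyword or topic.
    ai_repos = [r for r in engineering if tokens(r) & AI_TOKENS]
    # Step 4: fastest-growing first.
    return sorted(ai_repos, key=growth_rate, reverse=True)


if __name__ == "__main__":
    sample = [
        Repo("acme/llm-gateway", "Unified proxy for LLM APIs", 1200, 300, ["llm"]),
        Repo("acme/rag-tutorial", "Step-by-step RAG tutorial", 5000, 900, ["rag"]),
        Repo("acme/fast-agent", "Lightweight agent framework", 400, 200, ["agent"]),
    ]
    for repo in rank(sample):
        print(repo.name, round(growth_rate(repo), 2))
```

Matching on whole tokens rather than substrings avoids false positives such as `rag` matching inside `storage`.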

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

## License

MIT