Metadata-Version: 2.4
Name: feedray
Version: 0.3.0
Summary: FastAPI news intelligence service with source crawling, AI article analysis, event clustering, timelines, and recommendations.
Project-URL: Homepage, https://github.com/johnvonneumann36/FeedRay
Project-URL: Repository, https://github.com/johnvonneumann36/FeedRay
Project-URL: Issues, https://github.com/johnvonneumann36/FeedRay/issues
Project-URL: Releases, https://github.com/johnvonneumann36/FeedRay/releases
Author: johnvonneumann36
License: Apache-2.0
License-File: LICENSE
Keywords: event-clustering,fastapi,news,pgvector,recommendations
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: FastAPI
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: apscheduler<4.0,>=3.10
Requires-Dist: asyncpg<1.0,>=0.29
Requires-Dist: fastapi<1.0,>=0.115
Requires-Dist: httpx[http2]<1.0,>=0.28
Requires-Dist: loguru<1.0,>=0.7
Requires-Dist: numpy<3.0,>=1.26
Requires-Dist: passlib[bcrypt]<2.0,>=1.7
Requires-Dist: pgvector<1.0,>=0.3
Requires-Dist: pydantic[email]<3.0,>=2.0
Requires-Dist: python-dotenv<2.0,>=1.0
Requires-Dist: python-jose[cryptography]<4.0,>=3.3
Requires-Dist: python-multipart<1.0,>=0.0.9
Requires-Dist: pyyaml<7.0,>=6.0
Requires-Dist: requests<3.0,>=2.31
Requires-Dist: sqlalchemy[asyncio]<3.0,>=2.0
Requires-Dist: uvicorn[standard]<1.0,>=0.30
Provides-Extra: all
Requires-Dist: beautifulsoup4<5.0,>=4.12; extra == 'all'
Requires-Dist: curl-cffi<1.0,>=0.7; extra == 'all'
Requires-Dist: feedparser<7.0,>=6.0; extra == 'all'
Requires-Dist: lxml<7.0,>=5.0; extra == 'all'
Requires-Dist: playwright-stealth<3.0,>=2.0; extra == 'all'
Requires-Dist: playwright<2.0,>=1.45; extra == 'all'
Requires-Dist: sentence-transformers<4.0,>=3.4; extra == 'all'
Requires-Dist: trafilatura<3.0,>=2.0; extra == 'all'
Requires-Dist: transformers<5.0,>=4.48; extra == 'all'
Provides-Extra: crawler
Requires-Dist: beautifulsoup4<5.0,>=4.12; extra == 'crawler'
Requires-Dist: curl-cffi<1.0,>=0.7; extra == 'crawler'
Requires-Dist: feedparser<7.0,>=6.0; extra == 'crawler'
Requires-Dist: lxml<7.0,>=5.0; extra == 'crawler'
Requires-Dist: playwright-stealth<3.0,>=2.0; extra == 'crawler'
Requires-Dist: playwright<2.0,>=1.45; extra == 'crawler'
Requires-Dist: trafilatura<3.0,>=2.0; extra == 'crawler'
Provides-Extra: dev
Requires-Dist: build<2.0,>=1.2; extra == 'dev'
Requires-Dist: pytest<9.0,>=8.3; extra == 'dev'
Requires-Dist: twine<7.0,>=5.0; extra == 'dev'
Provides-Extra: local-hf
Requires-Dist: sentence-transformers<4.0,>=3.4; extra == 'local-hf'
Requires-Dist: transformers<5.0,>=4.48; extra == 'local-hf'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/feedray-logo.png" alt="FeedRay logo" width="180">
</p>

<p align="center">
  <a href="https://pypi.org/project/feedray/"><img src="https://img.shields.io/pypi/v/feedray.svg" alt="PyPI"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache--2.0-blue.svg" alt="Apache-2.0"></a>
  <img src="https://img.shields.io/badge/status-alpha-f6c85f.svg" alt="Alpha">
  <img src="https://img.shields.io/badge/python-3.11%2B-5aa9ff.svg" alt="Python 3.11+">
</p>

# FeedRay

FeedRay is a Python/FastAPI news intelligence service. It crawls configured news sources, stores article embeddings in PostgreSQL with pgvector, extracts article topics/entities/importance, groups related articles into event clusters, builds event timelines, and recommends both fresh articles and evolving stories.

Status: alpha. The backend is ready for local development and early integration. APIs, schema, and packaging may still change before a stable release.

## What It Does

- Collects articles from Google News RSS and direct RSS feeds.
- Resolves publisher pages and extracts article metadata.
- Runs provider-based embedding and chat model analysis.
- Stores article `topics`, `entities`, and public `importance_score`.
- Groups related articles into event clusters.
- Promotes coherent pending articles into new events.
- Compresses long-running events into snapshots and timelines.
- Recommends articles and events from user interests and behavior.
- Tracks pipeline metrics, retry state, source reliability, and stale event lifecycle.

## Architecture

```text
sources
  -> article shells
  -> embedding queue
  -> chat analysis queue
  -> event assignment queue
  -> pending promotion
  -> event snapshots / lifecycle
  -> recommendations
```
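
The queue-per-stage flow above can be illustrated with a small `asyncio` sketch. This is not FeedRay's pipeline code: the stage names are taken from the diagram, while `stage_worker`, `run_pipeline`, and the `completed` field are hypothetical names used only to show how work could pass from one queue to the next.

```python
import asyncio

# Stage names borrowed from the diagram; everything else is illustrative.
STAGES = ["embedding", "chat_analysis", "event_assignment"]


async def stage_worker(name, inbox, outbox):
    """Read items from inbox, record the stage, and pass them to outbox."""
    while True:
        article = await inbox.get()
        if article is None:
            # Sentinel: forward it so the next stage also shuts down.
            await outbox.put(None)
            break
        article["completed"].append(name)
        await outbox.put(article)


async def run_pipeline(titles):
    """Push article shells through every stage and return the results."""
    # One queue in front of each stage, plus a final output queue.
    queues = [asyncio.Queue() for _ in range(len(STAGES) + 1)]
    workers = [
        asyncio.create_task(stage_worker(name, queues[i], queues[i + 1]))
        for i, name in enumerate(STAGES)
    ]
    for title in titles:
        await queues[0].put({"title": title, "completed": []})
    await queues[0].put(None)
    await asyncio.gather(*workers)

    results = []
    while True:
        item = queues[-1].get_nowait()
        if item is None:
            break
        results.append(item)
    return results


if __name__ == "__main__":
    for article in asyncio.run(run_pipeline(["story one", "story two"])):
        print(article["title"], "->", article["completed"])
```

Keeping one queue per stage means a slow stage (for example, chat analysis) only delays its own backlog and does not block crawling or embedding.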

Core modules:

| Area | Module |
| --- | --- |
| API | `feedray.api` |
| SQLAlchemy models | `feedray.db.models` |
| Model providers | `feedray.providers` |
| Article analysis | `feedray.services.analysis` |
| Event assignment | `feedray.services.event_assignment` |
| Pending event promotion | `feedray.services.pending_events` |
| Event compression | `feedray.services.event_compression` |
| Async pipeline | `feedray.services.pipeline` |
| Crawler | `feedray.scraping.crawler` |

## Install

Package install:

```bash
pip install feedray
```

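Crawler extras (the package also defines `local-hf` for local Hugging Face models and `all`, which combines both sets):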

```bash
pip install "feedray[crawler]"
playwright install chromium
```

Development install:

```bash
git clone https://github.com/johnvonneumann36/FeedRay.git
cd FeedRay
pip install -e ".[dev,crawler]"
```

Legacy local install (from a repository checkout):

```bash
pip install -r requirements.txt
```

## Configuration

Copy `.env.example` to `.env`, then edit local values. Do not commit `.env`.

| Setting | Purpose |
| --- | --- |
| `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME` | PostgreSQL connection |
| `JWT_SECRET_KEY`, `JWT_ALGORITHM` | API auth token signing |
| `FEEDRAY_EMBEDDING_PROVIDER_TYPE` | Embedding provider |
| `FEEDRAY_EMBEDDING_MODEL_NAME` | Embedding model |
| `FEEDRAY_EMBEDDING_BASE_URL` | Optional OpenAI-compatible embedding endpoint |
| `FEEDRAY_EMBEDDING_API_KEY` | Embedding provider credential |
| `FEEDRAY_CHAT_PROVIDER_TYPE` | Chat provider |
| `FEEDRAY_CHAT_MODEL_NAME` | Chat model |
| `FEEDRAY_CHAT_BASE_URL` | Optional OpenAI-compatible chat endpoint |
| `FEEDRAY_CHAT_API_KEY` | Chat provider credential |

Supported provider types:

| Role | Providers |
| --- | --- |
| Embedding | `ollama`, `openai`, `gemini`, `local_huggingface` |
| Chat | `ollama`, `openai`, `anthropic`, `gemini`, `local_huggingface` |
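
A quick pre-flight check against the table above could look like the following sketch. The provider names and environment variable names come from this README; the helper `check_provider_settings` is hypothetical, and it assumes both variables must be set explicitly, which may not match FeedRay's actual defaults.

```python
import os

# Provider names copied from the table above.
SUPPORTED = {
    "embedding": {"ollama", "openai", "gemini", "local_huggingface"},
    "chat": {"ollama", "openai", "anthropic", "gemini", "local_huggingface"},
}
ENV_VARS = {
    "embedding": "FEEDRAY_EMBEDDING_PROVIDER_TYPE",
    "chat": "FEEDRAY_CHAT_PROVIDER_TYPE",
}


def check_provider_settings(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for role, var in ENV_VARS.items():
        value = env.get(var)
        if value is None:
            problems.append(f"{var} is not set")
        elif value not in SUPPORTED[role]:
            problems.append(f"{var}={value!r} is not a supported {role} provider")
    return problems


if __name__ == "__main__":
    for problem in check_provider_settings(os.environ):
        print("config problem:", problem)
```

Note that `anthropic` appears only in the chat row, so it would be flagged if used as the embedding provider.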

## Database

FeedRay expects PostgreSQL with the `vector` extension available.

Warning: `feedray-init-db` drops all existing FeedRay tables before recreating them.

```bash
feedray-init-db
```

Use it only for fresh local setup or disposable development databases.
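
Because the command is destructive, some teams may prefer to wrap it in a guard. The sketch below is one possible approach, not part of FeedRay: the `_dev`/`_test` database-name suffix convention, the local host list, and the helper names are all hypothetical, and it assumes the `DB_*` values are exported in the process environment.

```python
import os
import subprocess

# Hosts treated as local for this illustration.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def looks_disposable(env):
    """Heuristic: local host plus a database name ending in _dev or _test."""
    host = env.get("DB_HOST", "")
    name = env.get("DB_NAME", "")
    return host in LOCAL_HOSTS and name.endswith(("_dev", "_test"))


def reset_database(env=None):
    """Run feedray-init-db only when the target looks disposable."""
    env = dict(os.environ if env is None else env)
    if not looks_disposable(env):
        raise RuntimeError(
            "Refusing to run feedray-init-db: target database does not look disposable"
        )
    # feedray-init-db drops and recreates all FeedRay tables.
    subprocess.run(["feedray-init-db"], check=True, env=env)


if __name__ == "__main__":
    reset_database()
```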

## Run API

```bash
uvicorn feedray.api.app:app --host 0.0.0.0 --port 8000
```

Root endpoint:

```text
GET /
```

Most API routes require a bearer token from `/auth/login`.
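
A minimal client-side sketch of the bearer-token flow, using only the standard library. The request and response shapes are assumptions, not confirmed by this README: it guesses that `/auth/login` accepts form-encoded `username` and `password` fields and returns JSON with an `access_token` key. Check the running service's `/docs` page for the real schema. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_login_request(username, password, base_url=BASE_URL):
    """Build a POST to /auth/login (form-encoded body is an assumption)."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode()
    return urllib.request.Request(
        f"{base_url}/auth/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def extract_token(response_body):
    """Pull the token out of the login response (field name is an assumption)."""
    return json.loads(response_body)["access_token"]


def build_authed_request(path, token, base_url=BASE_URL):
    """Build a GET request that carries the bearer token."""
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Requires a running FeedRay API and an existing user.
    with urllib.request.urlopen(build_login_request("alice", "change-me")) as resp:
        token = extract_token(resp.read())
    with urllib.request.urlopen(build_authed_request("/articles", token)) as resp:
        print(resp.status)
```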

## Jobs

Console entry points:

```bash
feedray-crawler
feedray-analyze-backfill --missing-only
feedray-event-backfill --missing-only
feedray-promote-pending-events --window-hours 48 --min-articles 2
feedray-compress-events --window-hours 24
feedray-archive-events --quiet-after-hours 48 --archive-after-hours 168
```

Python module equivalents:

```bash
python -m feedray.jobs.crawler
python -m feedray.jobs.analyze_backfill --missing-only
python -m feedray.jobs.event_backfill --missing-only
python -m feedray.jobs.promote_pending_events
python -m feedray.jobs.compress_events
python -m feedray.jobs.archive_events
```
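
To run the console entry points in sequence, a small driver script is one option. This sketch assumes each command exits once its work is done and that the listed order is a sensible run order; neither is stated in this README. If `feedray-crawler` runs as a long-lived scheduler in your setup, remove it from the list. `run_jobs` is a hypothetical helper.

```python
import shlex
import subprocess

# Commands and flags copied from the console entry points listed above.
JOB_COMMANDS = [
    "feedray-crawler",
    "feedray-analyze-backfill --missing-only",
    "feedray-event-backfill --missing-only",
    "feedray-promote-pending-events --window-hours 48 --min-articles 2",
    "feedray-compress-events --window-hours 24",
    "feedray-archive-events --quiet-after-hours 48 --archive-after-hours 168",
]


def run_jobs(commands=JOB_COMMANDS, dry_run=False):
    """Run each job in order, stopping at the first failure.

    Returns the argv lists so a dry run can be inspected or logged.
    """
    planned = [shlex.split(command) for command in commands]
    if not dry_run:
        for argv in planned:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
    return planned


if __name__ == "__main__":
    for argv in run_jobs(dry_run=True):
        print(" ".join(argv))
```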

## API Highlights

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/auth/register` | Create user |
| `POST` | `/auth/login` | Get bearer token |
| `GET` | `/articles` | List/filter/search articles |
| `GET` | `/articles/{id}` | Article detail |
| `GET` | `/events` | List/filter/search event clusters |
| `GET` | `/events/{id}` | Event detail |
| `GET` | `/events/{id}/articles` | Event articles |
| `GET` | `/events/{id}/timeline` | Event snapshots |
| `GET` | `/recommendations` | Article recommendations |
| `GET` | `/recommendations/events` | Event recommendations |
| `POST` | `/activities` | Article feedback |
| `POST` | `/activities/events` | Event feedback |
| `GET` | `/models/health` | Provider health |

## Release Hygiene

Before publishing:

```bash
python -m pytest
python -m compileall feedray
python -m build
python -m twine check dist/*
```

Recommended smoke test (Windows paths shown; on Linux or macOS use `.venv-smoke/bin/python` and forward slashes):

```bash
python -m venv .venv-smoke
.venv-smoke\Scripts\python -m pip install dist\feedray-0.3.0-py3-none-any.whl
.venv-smoke\Scripts\python -c "import feedray; print(feedray.__version__)"
```
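
For a platform-independent variant of the smoke test, the same three steps can be driven from Python. This is a sketch under stated assumptions: the wheel filename is the one shown above, the helper names are hypothetical, and `feedray.__version__` is assumed to exist as in the shell example.

```python
import os
import subprocess
import sys
from pathlib import Path


def venv_python(venv_dir, os_name=os.name):
    """Return the interpreter path inside a virtual environment."""
    if os_name == "nt":
        return Path(venv_dir) / "Scripts" / "python.exe"
    return Path(venv_dir) / "bin" / "python"


def smoke_test_commands(venv_dir, wheel, os_name=os.name):
    """Build the create-venv, install, and import-check commands."""
    python = str(venv_python(venv_dir, os_name))
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [python, "-m", "pip", "install", str(wheel)],
        [python, "-c", "import feedray; print(feedray.__version__)"],
    ]


if __name__ == "__main__":
    wheel = Path("dist") / "feedray-0.3.0-py3-none-any.whl"
    for argv in smoke_test_commands(".venv-smoke", wheel):
        # Stop immediately if any step fails.
        subprocess.run(argv, check=True)
```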

Security notes:

- `.env` is ignored and must stay local.
- `.env.example` contains placeholders only.
- `chrome_profile/`, `logs/`, `dist/`, and `*.egg-info/` are ignored.
- Generated database, browser, log, and cache files should not be committed.

## License

Apache-2.0. See `LICENSE`.
