Metadata-Version: 2.4
Name: scrapy_item_ingest
Version: 0.2.6
Summary: Scrapy extension for database ingestion with job/spider tracking
Home-page: https://github.com/fawadss1/scrapy_item_ingest
Author: Fawad Ali
Author-email: fawadstar6@gmail.com
Project-URL: Documentation, https://scrapy-item-ingest.readthedocs.io/
Project-URL: Source, https://github.com/fawadss1/scrapy_item_ingest
Project-URL: Tracker, https://github.com/fawadss1/scrapy_item_ingest/issues
Keywords: scrapy,database,postgresql,web-scraping,data-pipeline
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Framework :: Scrapy
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Database
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scrapy>=2.13.3
Requires-Dist: psycopg2-binary>=2.9.10
Requires-Dist: itemadapter>=0.11.0
Requires-Dist: SQLAlchemy>=2.0.41
Requires-Dist: pytz>=2025.2
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx_rtd_theme>=1.2.0; extra == "docs"
Requires-Dist: myst-parser>=0.18.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.19.0; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5.0; extra == "docs"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=0.991; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.8.0; extra == "test"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Scrapy Item Ingest

A tiny, straightforward add-on for Scrapy that saves your items, requests, and logs to PostgreSQL. No boilerplate, no ceremony.

## Install

```bash
pip install scrapy-item-ingest
```

## Minimal setup (settings.py)

```python
ITEM_PIPELINES = {
    'scrapy_item_ingest.DbInsertPipeline': 300,
}

EXTENSIONS = {
    'scrapy_item_ingest.LoggingExtension': 500,
}

# Pick ONE of the two database config styles:
DB_URL = "postgresql://user:password@localhost:5432/database"
# Or use discrete fields (avoids URL encoding):
# DB_HOST = "localhost"
# DB_PORT = 5432
# DB_USER = "user"
# DB_PASSWORD = "password"
# DB_NAME = "database"

# Optional
CREATE_TABLES = True     # auto‑create tables on first run (default True)
JOB_ID = 1               # optional; if omitted, the spider name is used
```
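If you would rather not hard-code credentials, the discrete fields can be filled from environment variables. This is only a sketch: the `SCRAPY_DB_*` variable names are made up for illustration and are not read by the package itself; only the `DB_*` setting names come from the setup above.

```python
import os

# Hypothetical environment variable names; pick whatever fits your deployment.
# The DB_* names on the left are the discrete settings shown above.
DB_HOST = os.environ.get("SCRAPY_DB_HOST", "localhost")
DB_PORT = int(os.environ.get("SCRAPY_DB_PORT", "5432"))  # keep the port an int
DB_USER = os.environ.get("SCRAPY_DB_USER", "user")
DB_PASSWORD = os.environ.get("SCRAPY_DB_PASSWORD", "")   # no URL encoding needed here
DB_NAME = os.environ.get("SCRAPY_DB_NAME", "database")
```

Because the password is passed as a plain field, special characters need no escaping.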

Run your spider:

```bash
scrapy crawl your_spider
```

## Troubleshooting

- Password has special characters like `@` or `$`?
  - In a URL, encode them: `@` -> `%40`, `$` -> `%24`.
  - Example: `postgresql://user:PAK%40swat1%24@localhost:5432/db`
  - Or use the discrete fields (no encoding needed).
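If you prefer to keep `DB_URL`, the encoding can be done in code rather than by hand. A minimal sketch using the standard library's `urllib.parse.quote`; the helper name `build_db_url` is hypothetical and not part of this package.

```python
from urllib.parse import quote


def build_db_url(user, password, host, port, database):
    """Build a PostgreSQL URL with percent-encoded credentials.

    Hypothetical helper for illustration; not provided by scrapy_item_ingest.
    """
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"postgresql://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


DB_URL = build_db_url("user", "PAK@swat1$", "localhost", 5432, "db")
print(DB_URL)  # postgresql://user:PAK%40swat1%24@localhost:5432/db
```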

## Useful settings (optional)

- `LOG_DB_LEVEL` (default: `DEBUG`) — minimum level stored in DB
- `LOG_DB_CAPTURE_LEVEL` — capture level for Scrapy loggers routed to DB (does not affect console)
- `LOG_DB_LOGGERS` — allowed logger prefixes (defaults always include `[spider.name, 'scrapy']`)
- `LOG_DB_EXCLUDE_LOGGERS` (default: `['scrapy.core.scraper']`)
- `LOG_DB_EXCLUDE_PATTERNS` (default: `['Scraped from <']`)
- `CREATE_TABLES` (default: `True`) — create `job_items`, `job_requests`, `job_logs` on startup
- `ITEMS_TABLE`, `REQUESTS_TABLE`, `LOGS_TABLE` — override table names
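As a rough illustration, several of these settings could be combined in `settings.py` like this. The values are examples only: `myproject` and the `crawl_*` table names are placeholders, and the value formats (a level name as a string, lists of strings) are assumed from the defaults listed above.

```python
# settings.py (illustrative values, not recommendations)

LOG_DB_LEVEL = "INFO"                              # store INFO and above instead of DEBUG
LOG_DB_LOGGERS = ["myproject"]                     # placeholder logger prefix
LOG_DB_EXCLUDE_LOGGERS = ["scrapy.core.scraper"]   # same as the listed default
LOG_DB_EXCLUDE_PATTERNS = ["Scraped from <"]       # same as the listed default

CREATE_TABLES = True                               # create the tables on startup

# Placeholder table names overriding job_items / job_requests / job_logs
ITEMS_TABLE = "crawl_items"
REQUESTS_TABLE = "crawl_requests"
LOGS_TABLE = "crawl_logs"
```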

## Links

- Docs: https://scrapy-item-ingest.readthedocs.io/
- Changelog: docs/development/changelog.rst
- Issues: https://github.com/fawadss1/scrapy_item_ingest/issues

## License

MIT License. See [LICENSE](LICENSE).
