Metadata-Version: 2.4
Name: pazufa_corelib
Version: 0.1.0
Summary: Core library for collecting parliamentary data
License-Expression: GPL-3.0-only
License-File: LICENSE
Keywords: parliament,open-data,scraper,germany
Author: PaZuFa team
Maintainer: PaZuFa team
Maintainer-email: pub@pazufa.de
Requires-Python: >=3.12,<3.15
Classifier: Development Status :: 3 - Alpha
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: German
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Text Processing
Requires-Dist: instructor[litellm] (>=1.14.0,<2.0.0)
Requires-Dist: litellm (>=1.81.13,<2.0.0)
Requires-Dist: numpy (>=2.4.4,<3.0.0)
Requires-Dist: pydantic (>=2.13,<3.0)
Requires-Dist: pyyaml (>=6.0,<7.0)
Requires-Dist: rapidfuzz (>=3.14.5,<4.0.0)
Requires-Dist: w3lib (>=2.4.1,<3.0.0)
Project-URL: Changelog, https://codeberg.org/PaZuFa/pazufa-scraper-core/src/branch/main/CHANGELOG.md
Project-URL: Documentation, https://wiki.pazufa.de/books/scraper-core
Project-URL: Homepage, https://pazufa.de
Project-URL: Issues, https://codeberg.org/PaZuFa/pazufa-scraper-core/issues
Project-URL: Repository, https://codeberg.org/PaZuFa/pazufa-scraper-core
Description-Content-Type: text/markdown

# Scraper-core

Core library for collectors/scrapers of [PaZuFa](https://codeberg.org/PaZuFa/parlamentszusammenfasser.git), providing shared functionality and base classes.

> **Status:** Work in Progress (WIP). For a detailed status overview, see the [status page in the wiki](https://wiki.pazufa.de/books/scraper-core/page/aktueller-stand) (German).

The detailed documentation can be found in the [wiki](https://wiki.pazufa.de/books/scraper-core) (German).

## Requests

The best way to submit a request for Scraper-core is to open a [Codeberg issue](https://codeberg.org/PaZuFa/pazufa-scraper-core/issues) with the label `external-request`.

For bug reports, you can use the label `Bug` instead.

You are also welcome to contact us on [Mattermost](https://chat.pazufa.de) with requests and questions.

## Structure

The library consists of three parts:

1. **CoreLib:** shared classes and utilities used by all scrapers, regardless of implementation approach. This includes [Pydantic](https://docs.pydantic.dev/latest/) validation models, API client helpers, common data transformation functions, standardised phrases and tag mappings (e.g. normalising committee names, document types and Schlagworte (subject keywords) across parliaments), and reusable components for tasks such as LLM enrichment.
2. **[Scrapy](https://www.scrapy.org/)-based:** an opinionated implementation of CoreLib as Scrapy-based classes.
3. **Collector-based:** our project's own scaffolding for scrapers, an opinionated implementation built on CoreLib.

## Requirements

- Python 3.12, 3.13 or 3.14 (`>=3.12,<3.15`)
- Poetry 2.x

For the full dependency list, see [pyproject.toml](https://codeberg.org/PaZuFa/pazufa-scraper-core/src/branch/main/pyproject.toml).

## Setup

See [SETUP.md](https://codeberg.org/PaZuFa/pazufa-scraper-core/src/branch/main/SETUP.md) for the full setup guide, which is versioned alongside the code.

## Contribution

See [CONTRIBUTING.md](https://codeberg.org/PaZuFa/pazufa-scraper-core/src/branch/main/CONTRIBUTING.md) for development setup, git workflow, code generation, documentation and project context.

## License

GPL-3.0-only. See the `LICENSE` file.

