Metadata-Version: 2.4
Name: LinkedInWebScraper
Version: 1.1.1
Summary: A library for scraping LinkedIn job postings.
Author-email: Ricardo Garcia Ramirez <rgr.5882@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ricardogr07/LinkedInWebScraper
Project-URL: Source, https://github.com/ricardogr07/LinkedInWebScraper
Keywords: linkedin,scraper,jobs,openai,pandas,sqlite
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4>=4.12.3
Requires-Dist: numpy>=1.26.4
Requires-Dist: pandas>=2.2.0
Requires-Dist: requests>=2.31.0
Requires-Dist: SQLAlchemy>=2.0.36
Provides-Extra: openai
Requires-Dist: openai>=2.29.0; extra == "openai"
Requires-Dist: pydantic>=2.7.0; extra == "openai"
Provides-Extra: dev
Requires-Dist: build>=1.2.2; extra == "dev"
Requires-Dist: coverage[toml]>=7.6.0; extra == "dev"
Requires-Dist: mkdocs>=1.6.1; extra == "dev"
Requires-Dist: mkdocs-material>=9.5.34; extra == "dev"
Requires-Dist: mkdocstrings[python]>=0.26.1; extra == "dev"
Requires-Dist: openai>=2.29.0; extra == "dev"
Requires-Dist: pydantic>=2.7.0; extra == "dev"
Requires-Dist: pytest>=8.3.2; extra == "dev"
Requires-Dist: pytest-cov>=5.0.0; extra == "dev"
Requires-Dist: pyrefly>=0.26.0; extra == "dev"
Requires-Dist: ruff>=0.6.9; extra == "dev"
Requires-Dist: tox>=4.21.2; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6.1; extra == "docs"
Requires-Dist: mkdocs-material>=9.5.34; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.26.1; extra == "docs"
Dynamic: license-file

# LinkedInWebScraper

[![CI](https://github.com/ricardogr07/LinkedInWebScraper/actions/workflows/ci.yml/badge.svg)](https://github.com/ricardogr07/LinkedInWebScraper/actions/workflows/ci.yml)
[![Docs](https://github.com/ricardogr07/LinkedInWebScraper/actions/workflows/docs.yml/badge.svg)](https://github.com/ricardogr07/LinkedInWebScraper/actions/workflows/docs.yml)
[![Docs site](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://ricardogr07.github.io/LinkedInWebScraper/)
[![Release](https://github.com/ricardogr07/LinkedInWebScraper/actions/workflows/release.yml/badge.svg)](https://github.com/ricardogr07/LinkedInWebScraper/actions/workflows/release.yml)
[![PyPI version](https://img.shields.io/pypi/v/LinkedInWebScraper.svg)](https://pypi.org/project/LinkedInWebScraper/)
[![Python versions](https://img.shields.io/pypi/pyversions/LinkedInWebScraper.svg)](https://pypi.org/project/LinkedInWebScraper/)
[![License](https://img.shields.io/pypi/l/LinkedInWebScraper.svg)](https://github.com/ricardogr07/LinkedInWebScraper/blob/main/LICENSE)

LinkedInWebScraper is a production-minded Python library and scheduled job runner for collecting LinkedIn job listings, normalizing the data, persisting run history, and exporting reusable datasets.

## Highlights

- Canonical package namespace under `linkedin_web_scraper`
- Typed programmatic config for single scrapes and TOML runtime config for CLI and scheduled runs
- Managed artifacts under `artifacts/jobs`, `artifacts/logs`, and `artifacts/state`
- SQLite-backed persistence through a clean application storage port
- Package CLI with `scrape once`, `scrape daily`, `export`, and `--dry-run`
- Optional OpenAI enrichment built on the current Responses API
- Runnable examples under `examples/`
- Automated release pipeline that waits for green CI and Docs runs on `main`

## Install

```bash
pip install LinkedInWebScraper              # core library
pip install "LinkedInWebScraper[openai]"    # with optional OpenAI enrichment
pip install -e ".[dev]"                     # editable install from a source checkout
```

Quote the extras specifier so shells like zsh do not expand the brackets.

## Quickstart

```python
from linkedin_web_scraper import (
    JobScraperConfig,
    LinkedInJobScraper,
    RemoteType,
    configure_logging,
)

logger = configure_logging(filename="example.log")
config = JobScraperConfig(
    position="Data Analyst",
    location="San Francisco",
    remote=RemoteType.REMOTE,
)

jobs = LinkedInJobScraper(logger=logger, config=config).run()
print(jobs.head())
```
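Since the Quickstart calls `.head()` on the result, `run()` is assumed to return a pandas DataFrame, and both pandas and SQLAlchemy are core dependencies. That means scraped results can be persisted to SQLite without extra installs; the DataFrame below is a stand-in for real scraper output:

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in for the DataFrame returned by LinkedInJobScraper.run();
# column names here are illustrative, not the library's actual schema.
jobs = pd.DataFrame(
    {
        "title": ["Data Analyst", "Data Scientist"],
        "company": ["Acme", "Globex"],
        "location": ["San Francisco", "Remote"],
    }
)

# pandas writes straight through a SQLAlchemy engine to SQLite.
engine = create_engine("sqlite:///jobs.db")
jobs.to_sql("jobs", engine, if_exists="replace", index=False)

# Read the rows back to confirm the round trip.
stored = pd.read_sql_table("jobs", engine)
print(len(stored))  # 2
```

`if_exists="replace"` keeps the snippet idempotent across reruns; switch to `"append"` to accumulate rows across scrape runs.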

## Examples

Run the example scripts from `examples/`:

```bash
python examples/example.py
python examples/example_advanced_config.py
python examples/example_openai.py
```

The OpenAI example requires `OPENAI_API_KEY` in the environment.

## CLI Runtime

```bash
linkedin-webscraper scrape once --dry-run
linkedin-webscraper scrape daily
linkedin-webscraper export --run-id <run-id>
```

Use `runtime.example.toml` as the template for a real `runtime.toml`. The root runtime scripts remain available for the daily and once workflows:

```bash
python main.py
python process_ds_jobs.py
```
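A `runtime.toml` for the CLI might look like the sketch below. The authoritative schema lives in `runtime.example.toml` in the repository; every key here is illustrative only:

```toml
# Illustrative shape only -- copy runtime.example.toml for the real schema.
[scrape]
position = "Data Analyst"
location = "San Francisco"

[output]
# Managed artifacts land under artifacts/jobs, artifacts/logs, and artifacts/state.
directory = "artifacts"
```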

## Docs

- [Getting Started](docs/getting-started.md)
- [Configuration](docs/configuration.md)
- [Runtime and Deployment](docs/runtime.md)
- [Release and Automation](docs/development/release-and-automation.md)
- [Validation](docs/development/validation.md)
- [API Reference](docs/api.md)

## Development

Run the local gate before risky pushes or merges:

```bash
python -m tox -e preflight
```

For a faster smoke-only path:

```bash
python -m tox -e smoke
```

The detailed validation matrix and release flow live in `docs/development/validation.md` and `docs/development/release-and-automation.md`.

## License

This project is licensed under the MIT License.
