Metadata-Version: 2.4
Name: scrape-forvo
Version: 0.1.0
Summary: Download pronunciation MP3s from Forvo search pages.
Requires-Python: >=3.13
Description-Content-Type: text/markdown
Requires-Dist: playwright>=1.58.0
Requires-Dist: requests>=2.32.5
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: responses>=0.25.0; extra == "dev"

# scrape-forvo

Download pronunciation MP3s from Forvo search pages.

## Installation

```bash
python -m pip install -e .
```

## Usage

Currently, only the following invocation (Playwright with a visible browser window) is confirmed to work reliably:

```bash
scrape-forvo https://forvo.com/search/egg/no/ --use-playwright --headed
```

## Scriptable Usage

You can also import `scrape_forvo` and use it from Python:

```python
from scrape_forvo import scrape

result = scrape(
    "https://forvo.com/search/egg/no/",
    outdir="forvo_mp3",
    use_playwright=True,
    headed=True,
)

print(result.downloaded_count)
for candidate in result.candidates:
    print(candidate.url, "->", candidate.out_path)
```

The `scrape()` arguments map directly to CLI flags, so both interfaces share the same behavior without duplicated logic.
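
As a rough illustration of that mapping, the sketch below builds a hypothetical `argparse` parser and converts its result into arguments for `scrape()`. It is not the package's actual CLI code: the `--outdir` flag, its default value, and the helper names `build_parser` and `cli_args_to_kwargs` are assumptions. Only `--use-playwright`, `--headed`, and the `scrape()` keyword names come from the examples above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real scrape-forvo CLI may define other flags."""
    parser = argparse.ArgumentParser(prog="scrape-forvo")
    parser.add_argument("url", help="Forvo search page URL")
    # Assumption: --outdir mirrors the scrape() keyword; the default is a guess.
    parser.add_argument("--outdir", default="forvo_mp3")
    parser.add_argument("--use-playwright", action="store_true")
    parser.add_argument("--headed", action="store_true")
    return parser


def cli_args_to_kwargs(argv: list[str]) -> tuple[str, dict]:
    """Return the URL and the keyword arguments to forward to scrape()."""
    ns = build_parser().parse_args(argv)
    # argparse turns "--use-playwright" into the attribute "use_playwright",
    # so flag names and keyword names line up one to one.
    kwargs = {
        "outdir": ns.outdir,
        "use_playwright": ns.use_playwright,
        "headed": ns.headed,
    }
    return ns.url, kwargs


url, kwargs = cli_args_to_kwargs(
    ["https://forvo.com/search/egg/no/", "--use-playwright", "--headed"]
)
print(url, kwargs)
```

With a mapping like this, a CLI entry point could simply call `scrape(url, **kwargs)`, which keeps the scraping logic in one place.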

## Development

Install dev dependencies:

```bash
python -m pip install -e ".[dev]"
```

Run tests:

```bash
pytest
```

### Optional live test

Set `FORVO_LIVE_TEST=1` in the environment (for example, `FORVO_LIVE_TEST=1 pytest`) to enable the live integration test.
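
A minimal sketch of how such an opt-in check might look is shown below. The helper name `live_tests_enabled` is hypothetical, and the project's test suite may gate the live test differently; only the `FORVO_LIVE_TEST` variable name comes from this README.

```python
import os
from collections.abc import Mapping


def live_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when FORVO_LIVE_TEST is set to exactly "1"."""
    # Assumption: any other value, or an unset variable, keeps the test off.
    return environ.get("FORVO_LIVE_TEST") == "1"


print(live_tests_enabled({"FORVO_LIVE_TEST": "1"}))
print(live_tests_enabled({}))
```

In a pytest suite, a check like this could feed `pytest.mark.skipif(not live_tests_enabled(), reason="live test disabled")` so the network test is skipped by default.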

## TODO

Edge cases
- [ ] Decide which file to pick when a search returns multiple pronunciation files.
- [ ] Handle searches for which no pronunciation is available.

Integration
- [ ] Integrate with the vocab repo.
