Metadata-Version: 2.4
Name: mapscopex
Version: 0.1.1
Summary: Search Google Maps listings and extract business website emails.
Project-URL: Homepage, https://github.com/rohdahal/geoprobe
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: playwright>=1.40
Requires-Dist: requests>=2.31

# mapscopex

`mapscopex` searches Google Maps for businesses and attempts to extract email addresses from their websites.

## Install

```bash
pip install mapscopex
```

## Public API

```python
from mapscopex import build_search_query, collect_searchprobe, searchprobe
```

- `searchprobe(keyword, location, max_items=None)` yields the cumulative result list each time a new business is found.
- `collect_searchprobe(keyword, location, max_items=None)` waits for the scrape to finish and returns the final list once.
- `build_search_query(keyword, location)` normalizes inputs into the Google Maps query string.

## Usage

```python
from mapscopex import searchprobe

for results_so_far in searchprobe(
    keyword="lawyers and law firm",
    location="Houston, TX",
    max_items=None,
):
    print(results_so_far[-1])
```

`max_items=None` means `mapscopex` will keep scrolling until Google Maps stops yielding new businesses.
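Because every yielded value is the full list so far, a consumer that wants each business exactly once has to skip what it has already seen. The sketch below shows one way to do that. `new_items` and `fake_stream` are hypothetical names that are not part of `mapscopex`; `fake_stream` only imitates the documented streaming shape so the example runs without a browser.

```python
from typing import Any, Dict, Iterable, Iterator, List

Result = Dict[str, Any]


def new_items(snapshots: Iterable[List[Result]]) -> Iterator[Result]:
    """Yield each business once from a stream of cumulative result lists."""
    seen = 0
    for snapshot in snapshots:
        # Each snapshot repeats everything found so far, so only the tail
        # beyond the already-seen prefix is new.
        for item in snapshot[seen:]:
            yield item
        seen = max(seen, len(snapshot))


def fake_stream() -> Iterator[List[Result]]:
    """Hypothetical stand-in for searchprobe(): yields a growing list."""
    found: List[Result] = []
    for name in ("Acme Law", "Bayou Legal", "Lone Star Counsel"):
        found.append({"name": name, "phone": "", "website": "", "emails": []})
        yield list(found)


names = [item["name"] for item in new_items(fake_stream())]
print(names)
```

With the real package, the same helper should work by passing `searchprobe(keyword, location)` in place of `fake_stream()`, assuming the stream behaves as described above.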

On first use, `mapscopex` checks whether Playwright Chromium is available. If it is missing, `mapscopex` attempts to install it automatically by passing these arguments to Playwright's install command:
- macOS: `chromium`
- Windows: `chromium`
- Linux: `--with-deps chromium`, then falls back to `chromium` if needed

On Linux, `--with-deps` may still fail if the environment blocks privileged package installation. In that case, the package raises a runtime error with the last Playwright install failure.
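A caller may want to catch that failure and print a manual install hint instead of a traceback. The following is a minimal sketch, assuming the error surfaces as Python's built-in `RuntimeError` (the exact exception class is not confirmed here). `run_or_explain` and `failing_scrape` are hypothetical helpers, and `failing_scrape` only simulates a blocked install.

```python
from typing import Any, Callable, Dict, List, Optional

Result = Dict[str, Any]


def run_or_explain(scrape: Callable[[], List[Result]]) -> Optional[List[Result]]:
    """Run a scrape callable; print install guidance if the bootstrap fails."""
    try:
        return scrape()
    except RuntimeError as exc:
        # Assumption: the bootstrap failure is raised as RuntimeError.
        # Note that this also catches unrelated RuntimeErrors from the scrape.
        print(f"Chromium bootstrap failed: {exc}")
        print("Try installing manually: python -m playwright install chromium")
        return None


def failing_scrape() -> List[Result]:
    # Hypothetical stand-in that simulates a blocked privileged install.
    raise RuntimeError("playwright install --with-deps chromium failed")


result = run_or_explain(failing_scrape)
```

With the real package, the callable could be something like `lambda: collect_searchprobe("lawyers and law firm", "Houston, TX")`.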

If you prefer to wait for the scrape to finish and receive the final list in one call, use `collect_searchprobe()`:

```python
from mapscopex import collect_searchprobe

results = collect_searchprobe(
    keyword="lawyers and law firm",
    location="Houston, TX",
    max_items=None,
)
```

## Output Shape

Each result is a dictionary with these keys:

```python
{
    "name": str,
    "phone": str,
    "website": str,
    "emails": list[str],
}
```

`searchprobe()` yields a list of these dictionaries after each new business is added. `collect_searchprobe()` returns the final list once the scraper stops.
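Since each result is a plain dictionary, the final list can be post-processed with the standard library alone. As an illustration, the sketch below flattens results into CSV text with one row per email and skips businesses that have none. `results_to_csv` is a hypothetical helper, and the sample data is invented to match the documented keys.

```python
import csv
import io
from typing import Any, Dict, List

Result = Dict[str, Any]


def results_to_csv(results: List[Result]) -> str:
    """Return CSV text with one row per (business, email) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "phone", "website", "email"])
    for item in results:
        # Businesses with an empty "emails" list produce no rows.
        for email in item["emails"]:
            writer.writerow([item["name"], item["phone"], item["website"], email])
    return buffer.getvalue()


# Invented sample data in the documented output shape.
sample = [
    {
        "name": "Acme Law",
        "phone": "555-0100",
        "website": "https://acme.example",
        "emails": ["info@acme.example", "jobs@acme.example"],
    },
    {"name": "No Email LLC", "phone": "555-0101", "website": "", "emails": []},
]

csv_text = results_to_csv(sample)
print(csv_text)
```

The same function can be applied to the return value of `collect_searchprobe()` or to the last list yielded by `searchprobe()`.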

## Release Notes

The current package version is `0.1.1`, an early public release while the API is still settling. The repository also includes:

- CI for build plus Playwright bootstrap checks on macOS, Linux, and Windows
- a tag-based PyPI publish workflow for trusted publishing via GitHub Actions
