Metadata-Version: 2.4
Name: google-maps-scraper
Version: 0.1.0
Summary: Scrape Google Maps place details (rating, reviews, address, etc.) using Playwright — no API key needed
Project-URL: Homepage, https://github.com/noworneverev/google-maps-scraper
Project-URL: Repository, https://github.com/noworneverev/google-maps-scraper
Project-URL: Issues, https://github.com/noworneverev/google-maps-scraper/issues
Project-URL: Changelog, https://github.com/noworneverev/google-maps-scraper/releases
Author-email: Yan-Ying Liao <liao961120@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: async,google-maps,places,playwright,ratings,reviews,scraper,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: playwright>=1.40.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Provides-Extra: stealth
Requires-Dist: playwright-stealth>=1.0.6; extra == 'stealth'
Description-Content-Type: text/markdown

# google-maps-scraper

[![PyPI version](https://badge.fury.io/py/google-maps-scraper.svg)](https://pypi.org/project/google-maps-scraper/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

Scrape Google Maps place details — **rating, review count, address, phone, hours, coordinates, and more** — without an API key.

Built with [Playwright](https://playwright.dev/) (Firefox) for reliable rendering and **asyncio** for high-throughput batch processing.

## Features

- 🔍 **Scrape place details** from any Google Maps URL or search query
- ⭐ **Extract 20+ fields** — rating, review count, address, phone, website, hours, coordinates, category, and more
- 📝 **Review scraping** — extract individual user reviews with ratings and text
- 🚀 **Async batch processing** — configurable concurrency for scraping thousands of URLs
- 💾 **Crash recovery** — auto-save with resume support; pick up where you left off
- 🌍 **Multi-language** — supports any Google Maps locale (`en`, `ja`, `zh-TW`, `ko`, ...)
- 🔎 **Smart search handling** — auto-clicks the first search result when a query returns multiple matches
- 🤖 **Headless-ready** — runs in CI/CD and other headless environments out of the box
- 📦 **CLI + Python API** — use from the command line or import as a library

## Installation

```bash
pip install google-maps-scraper
playwright install firefox
```

> **Note:** On a server without a GUI, run `playwright install firefox --with-deps` to also install the browser's system dependencies.

### Optional: Stealth Mode

To reduce the likelihood of bot detection, install the optional `playwright-stealth` dependency:

```bash
pip install "google-maps-scraper[stealth]"
```

## Quick Start

### CLI

```bash
# Scrape a single place
gmaps-scraper scrape "https://www.google.com/maps/search/?api=1&query=Eiffel+Tower"

# Scrape with language setting
gmaps-scraper scrape "https://www.google.com/maps/search/?api=1&query=東京タワー" --lang ja

# Batch scrape from CSV
gmaps-scraper batch urls.csv -o results.json --concurrency 5

# Batch scrape to CSV
gmaps-scraper batch urls.csv -o results.csv --lang zh-TW --concurrency 3
```

### Python API (Async)

```python
import asyncio
from gmaps_scraper import GoogleMapsScraper, ScrapeConfig

async def main():
    config = ScrapeConfig(language="en", headless=True)
    async with GoogleMapsScraper(config) as scraper:
        result = await scraper.scrape(
            "https://www.google.com/maps/search/?api=1&query=Machu+Picchu"
        )
        if result.success:
            print(f"Name:    {result.place.name}")
            print(f"Rating:  {result.place.rating}")
            print(f"Reviews: {result.place.review_count}")
            print(f"Address: {result.place.address}")

asyncio.run(main())
```

### Python API (Sync)

```python
from gmaps_scraper import scrape_place

result = scrape_place("https://www.google.com/maps/search/?api=1&query=Colosseum")
print(result.place.name, result.place.rating)
```

### Batch Processing

```python
import asyncio
from pathlib import Path

from gmaps_scraper import scrape_batch, ScrapeConfig

async def main():
    urls = Path("urls.txt").read_text().splitlines()

    config = ScrapeConfig(
        concurrency=5,
        delay_min=1.0,
        delay_max=3.0,
        headless=True,
        save_interval=50,
    )

    results = await scrape_batch(
        urls=urls,
        config=config,
        output_path="results.json",
        resume=True,  # Skip already-scraped URLs on restart
    )

    success = sum(1 for r in results if r.success)
    print(f"Done: {success}/{len(results)} succeeded")

asyncio.run(main())
```

## CLI Reference

### `gmaps-scraper scrape <url>`

Scrape a single Google Maps URL and output JSON.

| Option | Default | Description |
|---|---|---|
| `--lang` | — | Language code (e.g., `en`, `ja`, `zh-TW`) |
| `--no-headless` | — | Show the browser window (for debugging) |
| `--reviews` | — | Also scrape individual reviews |
| `--max-reviews` | `20` | Max reviews to extract |
| `-v, --verbose` | — | Enable debug logging |

### `gmaps-scraper batch <input> -o <output>`

Batch scrape URLs from a file. Output format is inferred from file extension (`.json` or `.csv`).

| Option | Default | Description |
|---|---|---|
| `-o, --output` | *required* | Output file path (`.json` or `.csv`) |
| `--concurrency` | `5` | Parallel browser tabs |
| `--lang` | — | Language code |
| `--proxy` | — | Proxy server URL (e.g., `http://proxy:8080`) |
| `--delay-min` | `2.0` | Min delay between requests (seconds) |
| `--delay-max` | `5.0` | Max delay between requests (seconds) |
| `--no-resume` | — | Start fresh, don't resume from existing output |
| `--reviews` | — | Also scrape individual reviews |
| `--max-reviews` | `20` | Max reviews per place |
| `--save-interval` | `50` | Auto-save every N results |

## Input File Format

**CSV** — the scraper looks for a column named `url`, `URL`, or `link`:

```csv
url,name
https://www.google.com/maps/search/?api=1&query=Eiffel+Tower,Eiffel Tower
https://www.google.com/maps/search/?api=1&query=Colosseum,Colosseum
```

**Text** — one URL per line:

```text
https://www.google.com/maps/search/?api=1&query=Eiffel+Tower
https://www.google.com/maps/search/?api=1&query=Colosseum
```
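If you pre-process input files yourself, the column lookup described above can be mirrored with the standard library alone. This is an illustrative, stdlib-only sketch (the scraper's own loader may be implemented differently); `extract_urls` and `URL_COLUMNS` are names invented for this example:

```python
import csv
import io

# Sample CSV input in the documented format (a "url" column plus extras).
SAMPLE = """url,name
https://www.google.com/maps/search/?api=1&query=Eiffel+Tower,Eiffel Tower
https://www.google.com/maps/search/?api=1&query=Colosseum,Colosseum
"""

URL_COLUMNS = ("url", "URL", "link")  # header names the scraper accepts

def extract_urls(text: str) -> list[str]:
    """Return URLs from CSV input, falling back to one-URL-per-line text."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    col = next((c for c in URL_COLUMNS if c in fields), None)
    if col is None:
        # No recognised URL header: treat the input as a plain text URL list.
        return [line.strip() for line in text.splitlines() if line.strip()]
    return [row[col] for row in reader]

urls = extract_urls(SAMPLE)
print(len(urls))  # 2
```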

## Output Format

### JSON

```json
[
  {
    "input_url": "https://www.google.com/maps/search/?api=1&query=Eiffel+Tower",
    "success": true,
    "place": {
      "name": "Eiffel Tower",
      "rating": 4.7,
      "review_count": 344856,
      "address": "Av. Gustave Eiffel, 75007 Paris, France",
      "phone": "+33 8 92 70 12 39",
      "website": "https://www.toureiffel.paris/",
      "category": "Historical landmark",
      "latitude": 48.8583701,
      "longitude": 2.2944813,
      "hours": ["Monday 09:30–23:45", "..."],
      "google_maps_url": "https://www.google.com/maps/place/...",
      "permanently_closed": false
    },
    "reviews": [],
    "scraped_at": "2025-03-06T12:00:00"
  }
]
```

### CSV

Flat structure with all place fields as columns. Ideal for data analysis.

## Extracted Fields

| Field | Type | Description |
|---|---|---|
| `name` | `str` | Place name |
| `rating` | `float` | Star rating (1.0–5.0) |
| `review_count` | `int` | Total number of reviews |
| `address` | `str` | Full address |
| `phone` | `str` | Phone number |
| `website` | `str` | Website URL |
| `category` | `str` | Place category (e.g., "Restaurant") |
| `hours` | `list[str]` | Opening hours per day |
| `latitude` | `float` | Latitude coordinate |
| `longitude` | `float` | Longitude coordinate |
| `plus_code` | `str` | Google Plus Code |
| `place_id` | `str` | Google Maps Place ID |
| `url` | `str` | Canonical Google Maps URL |
| `google_maps_url` | `str` | Direct Google Maps link |
| `price_level` | `str` | Price level indicator |
| `image_url` | `str` | Main image URL |
| `description` | `str` | Place description |
| `photos_count` | `int` | Number of photos |
| `permanently_closed` | `bool` | Whether permanently closed |
| `temporarily_closed` | `bool` | Whether temporarily closed |

## Performance Guide

| Concurrency | Est. Throughput | Time for 10K URLs | Notes |
|---|---|---|---|
| 3 | ~1,200/hr | ~8.3 hrs | Conservative, stable |
| 5 | ~2,000/hr | ~5.0 hrs | Default |
| 10 | ~4,000/hr | ~2.5 hrs | Recommended with proxy |

**Tips:**

- Use `--proxy` with rotating proxies for higher concurrency
- The scraper auto-saves progress; if interrupted, just re-run and it will resume
- For large batches in CI (e.g., GitHub Actions with 6-hour limit), split into chunks
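The chunk-splitting tip takes only a few lines of plain Python; the chunk size of 2,000 here is an arbitrary example, sized so that one chunk finishes well inside a CI job at the default throughput (~2,000 URLs/hr):

```python
def split_into_chunks(urls: list[str], chunk_size: int = 2_000) -> list[list[str]]:
    """Split a URL list into fixed-size chunks; the last chunk may be shorter."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]

# 10,000 hypothetical URLs -> five 2,000-URL chunks.
urls = [f"https://www.google.com/maps/search/?api=1&query=place{i}" for i in range(10_000)]
chunks = split_into_chunks(urls)
print(len(chunks))  # 5
```

Writing each chunk to its own text file and running `gmaps-scraper batch` once per CI job keeps every job comfortably under a six-hour limit; the per-chunk outputs can be concatenated afterwards.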

## Development

```bash
git clone https://github.com/noworneverev/google-maps-scraper.git
cd google-maps-scraper
pip install -e ".[dev]"
playwright install firefox
pytest tests/ -v
```

## License

[MIT](LICENSE) © [Yan-Ying Liao](https://github.com/noworneverev)
