Metadata-Version: 2.4
Name: google-maps-scraper
Version: 0.1.2
Summary: Scrape Google Maps place details (rating, reviews, address, etc.) using Playwright — no API key needed
Project-URL: Homepage, https://github.com/noworneverev/google-maps-scraper
Project-URL: Repository, https://github.com/noworneverev/google-maps-scraper
Project-URL: Issues, https://github.com/noworneverev/google-maps-scraper/issues
Project-URL: Changelog, https://github.com/noworneverev/google-maps-scraper/releases
Author-email: Yan-Ying Liao <n9102125@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: async,google-maps,places,playwright,ratings,reviews,scraper,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: playwright>=1.40.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Provides-Extra: stealth
Requires-Dist: playwright-stealth>=1.0.6; extra == 'stealth'
Description-Content-Type: text/markdown

# google-maps-scraper

[![PyPI version](https://badge.fury.io/py/google-maps-scraper.svg)](https://pypi.org/project/google-maps-scraper/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

Scrape Google Maps place details — **rating, review count, address, phone, hours, coordinates, and more** — without an API key.

Built with [Playwright](https://playwright.dev/) (Firefox) for reliable rendering and **asyncio** for high-throughput batch processing.

## Features

- 🔍 **Scrape place details** from any Google Maps URL or search query
- ⭐ **Extract 20+ fields** — rating, review count, address, phone, website, hours, coordinates, category, and more
- 🚀 **Async batch processing** — configurable concurrency for scraping thousands of URLs
- 💾 **Crash recovery** — auto-save with resume support; pick up where you left off
- 🌍 **Multi-language** — supports any Google Maps locale (`en`, `ja`, `zh-TW`, `ko`, ...)
- 🔎 **Smart search handling** — auto-clicks the first search result when a query returns multiple matches
- 🤖 **Headless-ready** — runs headless by default, suitable for CI/CD and servers without a display
- 📦 **CLI + Python API** — use from the command line or import as a library

## Installation

```bash
pip install google-maps-scraper
playwright install firefox
```

> **Note:** If running on a server without GUI, use `playwright install firefox --with-deps` to install browser dependencies.

### Optional: Stealth Mode

To reduce the chance of bot detection, install the `stealth` extra:

```bash
pip install "google-maps-scraper[stealth]"
```

## Quick Start

### CLI

```bash
# Scrape a single place
gmaps-scraper scrape "https://www.google.com/maps/search/?api=1&query=Eiffel+Tower"

# Scrape with language setting
gmaps-scraper scrape "https://www.google.com/maps/search/?api=1&query=東京タワー" --lang ja

# Batch scrape from CSV
gmaps-scraper batch urls.csv -o results.json --concurrency 5

# Batch scrape to CSV
gmaps-scraper batch urls.csv -o results.csv --lang zh-TW --concurrency 3
```

### Python API (Async)

```python
import asyncio
from gmaps_scraper import GoogleMapsScraper, ScrapeConfig

async def main():
    config = ScrapeConfig(language="en", headless=True)
    async with GoogleMapsScraper(config) as scraper:
        result = await scraper.scrape(
            "https://www.google.com/maps/search/?api=1&query=Machu+Picchu"
        )
        if result.success:
            print(f"Name:    {result.place.name}")
            print(f"Rating:  {result.place.rating}")
            print(f"Reviews: {result.place.review_count}")
            print(f"Address: {result.place.address}")

asyncio.run(main())
```

### Python API (Sync)

```python
from gmaps_scraper import scrape_place

result = scrape_place("https://www.google.com/maps/search/?api=1&query=Colosseum")
print(result.place.name, result.place.rating)
```

### Batch Processing

```python
import asyncio
from gmaps_scraper import scrape_batch, ScrapeConfig

async def main():
    with open("urls.txt") as f:
        urls = f.read().splitlines()

    config = ScrapeConfig(
        concurrency=5,
        delay_min=1.0,
        delay_max=3.0,
        headless=True,
        save_interval=50,
    )

    results = await scrape_batch(
        urls=urls,
        config=config,
        output_path="results.json",
        resume=True,  # Skip already-scraped URLs on restart
    )

    success = sum(1 for r in results if r.success)
    print(f"Done: {success}/{len(results)} succeeded")

asyncio.run(main())
```

## CLI Reference

### `gmaps-scraper scrape <url>`

Scrape a single Google Maps URL and output JSON.

| Option | Default | Description |
|---|---|---|
| `--lang` | — | Language code (e.g., `en`, `ja`, `zh-TW`) |
| `--no-headless` | — | Show the browser window (for debugging) |
| `-v, --verbose` | — | Enable debug logging |

### `gmaps-scraper batch <input> -o <output>`

Batch scrape URLs from a file. Output format is inferred from file extension (`.json` or `.csv`).

| Option | Default | Description |
|---|---|---|
| `-o, --output` | *required* | Output file path (`.json` or `.csv`) |
| `--concurrency` | `5` | Parallel browser tabs |
| `--lang` | — | Language code |
| `--proxy` | — | Proxy server URL (e.g., `http://proxy:8080`) |
| `--delay-min` | `2.0` | Min delay between requests (seconds) |
| `--delay-max` | `5.0` | Max delay between requests (seconds) |
| `--no-resume` | — | Start fresh, don't resume from existing output |
| `--save-interval` | `50` | Auto-save every N results |

## Input File Format

**CSV** — the scraper looks for a column named `url`, `URL`, or `link`:

```csv
url,name
https://www.google.com/maps/search/?api=1&query=Eiffel+Tower,Eiffel Tower
https://www.google.com/maps/search/?api=1&query=Colosseum,Colosseum
```

**Text** — one URL per line:

```text
https://www.google.com/maps/search/?api=1&query=Eiffel+Tower
https://www.google.com/maps/search/?api=1&query=Colosseum
```

## Output Format

### JSON

```json
[
  {
    "input_url": "https://www.google.com/maps/search/?api=1&query=Eiffel+Tower",
    "success": true,
    "place": {
      "name": "Eiffel Tower",
      "rating": 4.7,
      "review_count": 344856,
      "address": "Av. Gustave Eiffel, 75007 Paris, France",
      "phone": "+33 8 92 70 12 39",
      "website": "https://www.toureiffel.paris/",
      "category": "Historical landmark",
      "latitude": 48.8583701,
      "longitude": 2.2944813,
      "hours": ["Monday 09:30–23:45", "..."],
      "google_maps_url": "https://www.google.com/maps/place/...",
      "image_url": "https://lh3.googleusercontent.com/gps-cs-s/...",
      "permanently_closed": false
    },
    "scraped_at": "2025-03-06T12:00:00"
  }
]
```

### CSV

Flat structure with all place fields as columns. Ideal for data analysis.
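The flattening can be pictured like this. A hedged sketch only — `flatten_result` is a hypothetical helper, not the library's API; how list fields such as `hours` are joined is an assumption:

```python
def flatten_result(result: dict) -> dict:
    """Lift nested place fields to top-level columns; join lists into one cell."""
    row = {"input_url": result["input_url"], "success": result["success"]}
    for key, value in (result.get("place") or {}).items():
        # List fields (e.g. hours) become a single delimited cell.
        row[key] = "; ".join(value) if isinstance(value, list) else value
    return row
```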

## Extracted Fields

| Field | Type | Description |
|---|---|---|
| `name` | `str` | Place name |
| `rating` | `float` | Star rating (1.0–5.0) |
| `review_count` | `int` | Total number of reviews |
| `address` | `str` | Full address |
| `phone` | `str` | Phone number |
| `website` | `str` | Website URL |
| `category` | `str` | Place category (e.g., "Restaurant") |
| `hours` | `list[str]` | Opening hours per day |
| `latitude` | `float` | Latitude coordinate |
| `longitude` | `float` | Longitude coordinate |
| `plus_code` | `str` | Google Plus Code |
| `place_id` | `str` | Google Maps Place ID |
| `url` | `str` | Canonical Google Maps URL |
| `google_maps_url` | `str` | Direct Google Maps link |
| `price_level` | `str` | Price level indicator |
| `image_url` | `str` | Main image URL |
| `description` | `str` | Place description |
| `photos_count` | `int` | Number of photos |
| `permanently_closed` | `bool` | Whether permanently closed |
| `temporarily_closed` | `bool` | Whether temporarily closed |

## Performance Guide

| Concurrency | Est. Throughput | Time for 10K URLs | Notes |
|---|---|---|---|
| 3 | ~1,200/hr | ~8.3 hrs | Conservative, stable |
| 5 | ~2,000/hr | ~5.0 hrs | Default |
| 10 | ~4,000/hr | ~2.5 hrs | Recommended with proxy |

**Tips:**

- Use `--proxy` with rotating proxies for higher concurrency
- The scraper auto-saves progress; if interrupted, just re-run and it will resume
- For large batches in CI (e.g., GitHub Actions with 6-hour limit), split into chunks
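Splitting a large batch into chunks can be as simple as the sketch below; the chunk size is an assumption you would tune to your runner's time limit and measured throughput:

```python
def chunk(urls: list[str], size: int) -> list[list[str]]:
    """Split a URL list into fixed-size chunks, one per CI job."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

Each chunk can then be written to its own input file and scraped in a separate job, with resume support covering any job that gets cut off.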

## Development

```bash
git clone https://github.com/noworneverev/google-maps-scraper.git
cd google-maps-scraper
pip install -e ".[dev]"
playwright install firefox
pytest tests/ -v
```

## ⚠️ Disclaimer

This tool is provided for **educational and research purposes only**. By using this software, you acknowledge and agree that:

- **Google Maps Terms of Service**: Web scraping may violate [Google Maps' Terms of Service](https://www.google.com/intl/en/help/terms_maps/). You are solely responsible for ensuring your use complies with all applicable terms, laws, and regulations.
- **No Warranty**: This software is provided "as is", without warranty of any kind. The authors are not responsible for any consequences arising from the use of this tool.
- **Rate Limiting**: Excessive or aggressive scraping may result in your IP being temporarily or permanently blocked by Google. Use appropriate delays and concurrency settings.
- **Data Privacy**: Respect the privacy of individuals whose reviews or information may be collected. Handle all scraped data in accordance with applicable privacy laws (e.g., GDPR, CCPA).
- **Personal Responsibility**: The user assumes all responsibility for how the tool is used and the data it collects.

The authors and contributors of this project do not endorse or encourage any misuse of this software.

## License

[MIT](LICENSE) © [Yan-Ying Liao](https://github.com/noworneverev)
