Metadata-Version: 2.4
Name: google-flights-scraper-api
Version: 0.0.1
Summary: Python client for scraping Google Flights using the ScrapingBee web scraping API
Author: wordstotech
License: MIT
Project-URL: Homepage, https://www.scrapingbee.com/scrapers/google-flights-scraper/
Keywords: google flights api,google flights scraper,google flights scraper api,web scraping,scrapingbee,flights,travel
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Dynamic: license-file

# google-flights-scraper-api

A Python client for the **Google Flights scraper API** powered by ScrapingBee. It turns a public Google Flights page into clean data you can load into pandas, a database, or a price-monitoring job, without running a headless browser or a proxy pool yourself.

Google does not ship a public **Google Flights API**, and the page builds its fares with JavaScript behind anti-bot protection. This package sends the work to ScrapingBee, which renders the page, rotates residential proxies, and hands back rendered HTML or structured JSON.

[Built on the ScrapingBee web scraping API](https://www.scrapingbee.com/scrapers/google-flights-scraper/)

If you searched for any of these, you are in the right place:

- google flights api
- google flights scraper
- google flights scraper api

## Why a Google Flights scraper API instead of plain requests

A direct `requests.get()` against Google Flights returns an empty shell. The fares, durations, and stop counts are injected by JavaScript after load, and Google quickly blocks datacenter IPs with consent walls and challenges.

A managed **Google Flights API** layer handles those problems for you:

- Executes the page JavaScript in a real headless browser
- Rotates residential proxies so requests are not blocked
- Skips the Google consent interstitial
- Returns structured JSON when you supply extraction rules

You write the query and read the data. The infrastructure is someone else's problem.

## Installation

```bash
pip install google-flights-scraper-api
```

Requires Python 3.8+ and `requests`.

## Quick start

```python
from google_flights_scraper_api import GoogleFlightsScraper

scraper = GoogleFlightsScraper(api_key="YOUR_API_KEY")

html = scraper.search(query="Flights to London from New York")
print(html[:500])
```

Grab a free key first. ScrapingBee gives 1,000 credits with no card required at [scrapingbee.com](https://www.scrapingbee.com/).

## How it works

Every call hits the ScrapingBee HTML API:

```
https://app.scrapingbee.com/api/v1/
```

The client builds the request with documented parameters: the Google Flights `url`, `render_js=true`, `premium_proxy=true`, and the Google `CONSENT` cookie so the consent page is skipped. You never assemble the query string yourself.
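
To make the parameter list above concrete, here is a minimal sketch of how such a request URL could be assembled with the standard library. It is not the package's internal code: `build_request_url`, the Google Flights base URL, and the cookie value are assumptions for illustration, and only the parameter names `url`, `render_js`, `premium_proxy`, and `cookies` come from ScrapingBee's documented API.

```python
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"
# Assumption: the public Google Flights search page; the client may use a different base URL.
GOOGLE_FLIGHTS_URL = "https://www.google.com/travel/flights"


def build_request_url(api_key, query, consent_cookie="CONSENT=YES+"):
    """Return a full ScrapingBee request URL for a natural-language flight query."""
    # The target page carries the search text in its own ?q= parameter.
    target = f"{GOOGLE_FLIGHTS_URL}?{urlencode({'q': query})}"
    params = {
        "api_key": api_key,
        "url": target,
        "render_js": "true",
        "premium_proxy": "true",
        # Assumption: placeholder cookie value, not necessarily what the client sends.
        "cookies": consent_cookie,
    }
    # urlencode percent-encodes the nested target URL so it travels as one parameter.
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"


request_url = build_request_url("YOUR_API_KEY", "Flights to London from New York")
print(request_url)
```

The nested Google Flights URL is encoded twice on purpose: once for its own `q` value and once more as the value of ScrapingBee's `url` parameter.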

## Structured data with AI extraction

Rather than parse Google's rotating markup, pass `ai_extract_rules` and get JSON back. The schema you define becomes the response shape.

```python
from google_flights_scraper_api import GoogleFlightsScraper

scraper = GoogleFlightsScraper(api_key="YOUR_API_KEY")

data = scraper.search(
    query="Flights to Tokyo from San Francisco",
    ai_extract_rules={
        "flights": {
            "description": "every flight result on the page",
            "type": "list",
            "output": {
                "airline": "name of the airline",
                "price": "ticket price in dollars",
                "departure_time": "departure time",
                "arrival_time": "arrival time",
                "duration": "total trip duration",
                "stops": "number of stops",
            },
        },
    },
)

# AI extraction may omit a field it could not find, so read keys defensively.
for flight in data.get("flights", []):
    print(flight.get("airline"), flight.get("price"), flight.get("stops"))
```

The `description`, `type`, and `output` keys follow ScrapingBee's documented extraction schema. `type` accepts `string`, `list`, `number`, `boolean`, and `item`.
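
Extracted values often arrive as display strings, so a small normalization step can help before loading them elsewhere. The sketch below assumes prices come back as text such as `$1,240`; the sample payload is made up to match the schema above, and `parse_price` and `to_rows` are hypothetical helpers, not part of this package.

```python
import csv
import io
import re

# Illustrative payload only: made-up values in the shape the schema above requests.
sample = {
    "flights": [
        {"airline": "Example Air", "price": "$1,240", "stops": "1"},
        {"airline": "Sample Airways", "price": "$980", "stops": "0"},
        {"airline": "Demo Jet", "price": None, "stops": "2"},
    ]
}


def parse_price(raw):
    """Turn a price like '$1,240' into 1240.0, or None if no digits are present."""
    if raw is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(raw))
    return float(match.group().replace(",", "")) if match else None


def to_rows(payload):
    """Keep flights with a usable price, cheapest first."""
    rows = []
    for flight in payload.get("flights", []):
        price = parse_price(flight.get("price"))
        if price is not None:
            rows.append(
                {
                    "airline": flight.get("airline"),
                    "price": price,
                    "stops": flight.get("stops"),
                }
            )
    return sorted(rows, key=lambda row: row["price"])


rows = to_rows(sample)

# Write the cleaned rows as CSV text, ready for a file or a database loader.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["airline", "price", "stops"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Rows without a parseable price are dropped rather than guessed, which keeps downstream sorting and threshold checks simple.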

## Waiting for fares to load

Google Flights sometimes streams results in after first paint. Use a `js_scenario` to wait or scroll before the page is captured. A scenario can run for at most 40 seconds.

```python
html = scraper.search(
    query="Flights to Rome from Boston",
    js_scenario={
        "instructions": [
            {"wait": 3000},
            {"scroll_y": 1000},
            {"wait": 1000},
        ],
    },
)
```
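
Because a scenario has a 40-second budget, it can be worth checking the waits before sending it. The following is a rough sketch with a hypothetical `build_scenario` helper; it only adds up explicit `wait` steps and does not account for the time scrolling or rendering takes.

```python
import json

MAX_SCENARIO_MS = 40_000  # the 40-second limit mentioned above


def build_scenario(*steps):
    """Return a js_scenario dict, refusing step lists whose waits exceed the limit."""
    total_wait = sum(step.get("wait", 0) for step in steps)
    if total_wait > MAX_SCENARIO_MS:
        raise ValueError(
            f"waits add up to {total_wait} ms, over the {MAX_SCENARIO_MS} ms limit"
        )
    return {"instructions": list(steps)}


scenario = build_scenario({"wait": 3000}, {"scroll_y": 1000}, {"wait": 1000})
print(json.dumps(scenario))
```

The resulting dict has the same shape as the `js_scenario` argument in the example above.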

## Configuration options

| Argument | API parameter | Description |
|---|---|---|
| `query` | `url` (`?q=`) | Natural-language flight search appended to the Google Flights URL |
| `url` | `url` | A full Google Flights URL, used instead of `query` |
| `render_js` | `render_js` | Execute page JavaScript (default `True`) |
| `premium_proxy` | `premium_proxy` | Residential proxies (default `True`) |
| `stealth_proxy` | `stealth_proxy` | Stealth tier for the hardest blocks |
| `country_code` | `country_code` | ISO country code, needs `premium_proxy=True` |
| `ai_extract_rules` | `ai_extract_rules` | Natural-language extraction, returns JSON, adds 5 credits |
| `extract_rules` | `extract_rules` | CSS or XPath extraction rules |
| `js_scenario` | `js_scenario` | Script waits, scrolls, and clicks before capture |
| `wait` | `wait` | Fixed wait in milliseconds |
| `screenshot_full_page` | `screenshot_full_page` | Return a full-page screenshot as bytes |
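| `json_response` | `json_response` | Wrap the response in a JSON envelope |
| `skip_consent` | `cookies` | Send the Google `CONSENT` cookie to skip the consent page (default `True`) |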
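
Two constraints in the table can be checked before a call is made: `url` replaces `query`, and `country_code` needs `premium_proxy=True`. The sketch below assumes exactly one of `query` or `url` should be given; `validate_options` is a hypothetical helper, and the client itself may validate differently or not at all.

```python
def validate_options(query=None, url=None, premium_proxy=True, country_code=None):
    """Return a list of problems found in a set of search() keyword arguments."""
    problems = []
    # Assumption: a search needs exactly one of query or url.
    if (query is None) == (url is None):
        problems.append("pass exactly one of query or url")
    # From the table: country_code only applies with a premium proxy.
    if country_code is not None and not premium_proxy:
        problems.append("country_code needs premium_proxy=True")
    return problems


print(validate_options(query="Flights to Rome from Boston", country_code="us"))
print(validate_options(query="Flights to Rome from Boston", premium_proxy=False, country_code="us"))
```

Returning a list instead of raising on the first issue lets a caller report every problem at once.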

## What you get back

- Default: the rendered HTML of the Google Flights page as a string.
- With `ai_extract_rules` or `extract_rules`: parsed JSON matching the schema you defined.
- With `screenshot_full_page=True`: raw PNG bytes.
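
Since the return type depends on the options used, calling code may want to branch on it. This is a small sketch with a hypothetical `classify_result` helper; it assumes the three shapes listed above (string, parsed JSON, PNG bytes) and recognizes a screenshot by the standard 8-byte PNG signature.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def classify_result(result):
    """Label a search() return value as 'html', 'json', 'png', or 'unknown'."""
    if isinstance(result, (bytes, bytearray)):
        return "png" if bytes(result[:8]) == PNG_SIGNATURE else "unknown"
    if isinstance(result, (dict, list)):
        return "json"
    if isinstance(result, str):
        return "html"
    return "unknown"


print(classify_result("<html></html>"))
print(classify_result({"flights": []}))
print(classify_result(PNG_SIGNATURE + b"rest of image"))
```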

## Production use cases

This Google Flights scraper fits cleanly into:

- Fare-tracking jobs that alert when a route drops below a threshold
- Competitive pricing dashboards for travel agencies and OTAs
- Route and demand research across markets
- Data pipelines feeding a warehouse or a notebook for analysis
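
As an illustration of the fare-tracking case, the sketch below flags a route only when its fare newly crosses below a threshold, so the same cheap fare does not trigger an alert on every run. The route labels and prices are made up, and `fares_to_alert` is a hypothetical helper that expects prices already converted to numbers.

```python
def fares_to_alert(previous, current, threshold):
    """Return routes whose fare is below threshold now but was not on the last run."""
    alerts = []
    for route, price in current.items():
        last = previous.get(route)
        was_below = last is not None and last < threshold
        if price < threshold and not was_below:
            alerts.append(route)
    return sorted(alerts)


# Made-up fares for illustration only.
last_run = {"JFK-LHR": 520.0, "SFO-NRT": 450.0}
this_run = {"JFK-LHR": 480.0, "SFO-NRT": 430.0, "BOS-FCO": 610.0}

print(fares_to_alert(last_run, this_run, threshold=500.0))
```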

## Pricing

ScrapingBee bills successful requests. A request that fails with HTTP 500 is not charged. Scraping a Google URL through the HTML API is billed at a flat rate per request, and toggling `render_js` does not change it:

- Classic or Premium proxy: 20 credits per request
- Stealth proxy: 75 credits per request
- `ai_extract_rules`: adds 5 credits

Current rate card: [scrapingbee.com/pricing](https://www.scrapingbee.com/pricing/).
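
For budgeting, the rates listed above can be turned into a quick estimate. This sketch hard-codes the figures from this README, which may lag behind the live rate card; `credits_per_request` and `requests_within` are hypothetical helpers.

```python
def credits_per_request(stealth_proxy=False, ai_extract_rules=False):
    """Credits for one successful Google request, using the rates listed above."""
    base = 75 if stealth_proxy else 20
    return base + (5 if ai_extract_rules else 0)


def requests_within(credits, **options):
    """How many successful requests a credit balance covers."""
    return credits // credits_per_request(**options)


# The free tier mentioned earlier provides 1,000 credits.
print(requests_within(1000))
print(requests_within(1000, ai_extract_rules=True))
print(requests_within(1000, stealth_proxy=True, ai_extract_rules=True))
```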

## FAQ

**Is there an official Google Flights API?**
No. Google does not offer a public Google Flights API for fares, so a scraper API that renders the public page is the practical route. This package wraps that approach.

**Why not parse the HTML myself?**
You can, but Google Flights uses obfuscated, rotating class names. Defining `ai_extract_rules` is more durable than maintaining selectors that break every few weeks.

**Can I target a specific country or currency view?**
Yes. Set `country_code` together with `premium_proxy=True`. The country code has no effect without a premium proxy.

**Does it handle the Google consent page?**
Yes. The client sends the Google `CONSENT` cookie by default. Disable it with `skip_consent=False`.

## Documentation

- [Google Flights scraper API](https://www.scrapingbee.com/scrapers/google-flights-scraper/)
- [ScrapingBee documentation](https://www.scrapingbee.com/documentation/)
- [Data extraction rules](https://www.scrapingbee.com/documentation/data-extraction/)
- [ScrapingBee pricing](https://www.scrapingbee.com/pricing/)

## License

MIT

## Disclaimer

This is an unofficial Python client built on top of the ScrapingBee web scraping API. It is not affiliated with ScrapingBee or Google. Scrape only public pages, and comply with Google's terms of service and applicable data-protection law.
