Metadata-Version: 2.4
Name: nagooglesearch-playwright
Version: 1.2
Summary: Not another Google searching tool.
Author: Ivan Sincek
Project-URL: Homepage, https://github.com/ivan-sincek/nagooglesearch-playwright
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: bot_safe_agents>=1.0
Requires-Dist: requests>=2.31.0
Requires-Dist: bs4>=0.0.1
Requires-Dist: beautifulsoup4>=4.12.3
Requires-Dist: playwright>=1.47.0

# Not Another Google Search - Playwright

Not another Google searching library. Just kidding - it is.

Made for educational purposes. I hope it will help!

## Table of Contents

* [How to Install](#how-to-install)
	* [Install Playwright and Chromium](#install-playwright-and-chromium)
	* [Standard Install](#standard-install)
	* [Build and Install From the Source](#build-and-install-from-the-source)
* [Usage](#usage)
	* [Standard](#standard)
	* [Shortest Possible](#shortest-possible)
	* [Time Sensitive Search](#time-sensitive-search)
	* [User Agents](#user-agents)

## How to Install

### Install Playwright and Chromium

```bash
pip3 install --upgrade playwright

playwright install chromium
```

Re-install Chromium each time you upgrade the Playwright dependency; otherwise, the headless browser might fail with an error.

### Standard Install

```bash
pip3 install nagooglesearch-playwright

pip3 install --upgrade nagooglesearch-playwright
```

### Build and Install From the Source

```bash
git clone https://github.com/ivan-sincek/nagooglesearch-playwright && cd nagooglesearch-playwright

python3 -m pip install --upgrade build

python3 -m build

python3 -m pip install dist/nagooglesearch_playwright-1.2-py3-none-any.whl
```

## Usage

### Standard

Default values:

```python
nagooglesearch_playwright.GoogleClient(
	tld = "com",
	homepage_parameters = {
		"btnK": "Google+Search",
		"source": "hp"
	},
	search_parameters = {
	},
	cookies = {
	},
	user_agent = "",
	proxy = "",
	max_results = 100,
	min_sleep = 8,
	max_sleep = 18,
	consent_selector = "xpath=//img[@alt='Google']/../../following-sibling::div[2]/div/button[1]",
	headless = True,
	humanize = False,
	debug = False
)
```

**Only domains that do not contain the keyword `google` and do not end with the keyword `goo.gl` are accepted as valid results. The final output is a unique and sorted list of URLs.**
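The filtering rule above can be illustrated with a minimal sketch. This is not the library's actual implementation; `filter_urls` is a hypothetical helper, and the case-insensitive comparison on the hostname is an assumption:

```python
from urllib.parse import urlsplit

def filter_urls(urls):
	"""Hypothetical helper: drop Google-owned domains, then deduplicate and sort."""
	accepted = set()
	for url in urls:
		# assumption: the rule is applied to the lowercased hostname
		domain = (urlsplit(url).hostname or "").lower()
		if not domain:
			continue
		if "google" in domain or domain.endswith("goo.gl"):
			continue
		accepted.add(url)
	return sorted(accepted)

candidates = [
	"https://www.example.com/login",
	"https://support.google.com/websearch",
	"https://goo.gl/abc",
	"https://www.example.com/login",
	"https://blog.example.org/post"
]

print(filter_urls(candidates))
# ['https://blog.example.org/post', 'https://www.example.com/login']
```

Collecting the accepted URLs in a set removes duplicates, and `sorted()` gives a stable order.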

Example, standard:

```python
import nagooglesearch_playwright, asyncio

# the following query string parameters are set only if the 'start' query string parameter is not set or is equal to zero
# simulate a homepage search
homepage_parameters = {
	"btnK": "Google+Search",
	"source": "hp"
}

# search the internet for additional query string parameters
# https://brightdata.com/blog/web-data/google-search-url-parameters
search_parameters = {
	"q": "site:*.example.com intext:password", # search query
	"tbs": "li:1", # specify 'li:1' for verbatim search (no alternate spellings, etc.)
	"hl": "en",
	"lr": "lang_en",
	"cr": "countryUS",
	"udm": "14", # only web results
	"filter": "0", # specify '0' to display hidden results
	"safe": "images" # specify 'images' to turn off safe search, or specify 'active' to turn on safe search
}

# specify custom cookies here
cookies = {
}

client = nagooglesearch_playwright.GoogleClient(
	tld = "com", # top level domain, e.g., www.google.com or www.google.hr
	homepage_parameters = homepage_parameters, # 'search_parameters' will override 'homepage_parameters'
	search_parameters = search_parameters,
	cookies = cookies,
	user_agent = "curl/3.30.1", # a random user agent will be set if none is provided
	proxy = "socks5://127.0.0.1:9050", # supported URL schemes are 'http[s]', 'socks4[h]', and 'socks5[h]'
	max_results = 200, # maximum unique URLs to return
	min_sleep = 15, # minimum sleep between page requests
	max_sleep = 30, # maximum sleep between page requests
	consent_selector = "xpath=//img[@alt='Google']/../../following-sibling::div[2]/div/button[1]", # 'button[1]' rejects all, 'button[2]' accepts all
	headless = False, # show the web browser
	humanize = True, # enable human-like web browser interactions
	debug = True # enable debug output
)

urls = asyncio.run(client.search())

if client.get_error() == nagooglesearch_playwright.Error.PLAYWRIGHT:
	print("[ Playwright Exception ]")
	# do something
elif client.get_error() == nagooglesearch_playwright.Error.REQUEST:
	print("[ Request Exception ]")
	# do something
elif client.get_error() == nagooglesearch_playwright.Error.RATE_LIMIT:
	print("[ HTTP 429 Too Many Requests ]")
	# do something

for url in urls:
	print(url)
	# do something
```

Check the list of user agents [here](https://github.com/ivan-sincek/bot-safe-agents/blob/main/src/bot_safe_agents/user_agents.txt). For more user agents, check [scrapeops.io](https://scrapeops.io).

### Shortest Possible

Example, shortest possible:

```python
import nagooglesearch_playwright, asyncio

urls = asyncio.run(nagooglesearch_playwright.GoogleClient(search_parameters = {"q": "site:*.example.com intext:password"}).search())

# do something
```

### Time Sensitive Search

Example, do not show results older than 6 months:

```python
# 'dateutil' is provided by the 'python-dateutil' package
import nagooglesearch_playwright, datetime, dateutil.relativedelta as relativedelta

def get_tbs(months: int):
	today = datetime.datetime.today()
	return nagooglesearch_playwright.get_tbs(today, today - relativedelta.relativedelta(months = months))

search_parameters = {
	"tbs": get_tbs(6)
}

# do something
```
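For reference, Google's custom date range is commonly written in the `tbs` parameter as `cdr:1,cd_min:<from>,cd_max:<to>`, with dates in `MM/DD/YYYY` format. The sketch below builds such a value by hand. `build_tbs` is a hypothetical helper, and it is an assumption that `nagooglesearch_playwright.get_tbs()` produces exactly the same string:

```python
import datetime

def build_tbs(date_from, date_to):
	"""Hypothetical helper: format a custom date range for the 'tbs' parameter."""
	fmt = "%m/%d/%Y"
	return f"cdr:1,cd_min:{date_from.strftime(fmt)},cd_max:{date_to.strftime(fmt)}"

print(build_tbs(datetime.date(2024, 1, 1), datetime.date(2024, 6, 30)))
# cdr:1,cd_min:01/01/2024,cd_max:06/30/2024
```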

### User Agents

Example, get all user agents:

```python
import nagooglesearch_playwright

user_agents = nagooglesearch_playwright.get_all_user_agents()
print(user_agents)

# do something
```

Example, get a random user agent:

```python
import nagooglesearch_playwright

user_agent = nagooglesearch_playwright.get_random_user_agent()
print(user_agent)

# do something
```
