Metadata-Version: 2.4
Name: geonode-scraper-cli
Version: 0.1.0
Summary: Command-line interface for the Geonode Scraper API
Author: Geonode Team
License-Expression: MIT
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: geonode-scraper-tools-core>=0.3.1
Requires-Dist: typer>=0.12
Requires-Dist: rich>=13.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: ruff>=0.12.11; extra == "dev"

# Geonode Scraper CLI

`gscraper` is the command-line interface for the Geonode Scraper API. It is a thin
presentation layer over [`geonode-scraper-tools-core`](../python-agent-tools-core):
commands parse flags, resolve configuration, call a stable service method, and
render the result. All domain logic (validation, polling, retries) lives in the
service layer, not in the CLI.

## Requirements

- Python 3.10+
- Works on **Linux**, **macOS**, and **Windows**

## Installation

**Recommended — install as a standalone tool with [pipx](https://pipx.pypa.io):**

```sh
pipx install geonode-scraper-cli
```

`pipx` installs `gscraper` into its own isolated virtual environment and puts it
on your `PATH`, so it never conflicts with other Python projects. This is the
preferred way to install CLI tools globally.

**Alternative — install with pip:**

```sh
pip install geonode-scraper-cli
```

> **Windows note:** on Windows, pip places the `gscraper` command in the Python
> `Scripts` folder (for a per-user install, e.g. `%APPDATA%\Python\Python3xx\Scripts`).
> If the command is not found after installation, add that folder to your `PATH`,
> or use `python -m geonode_scraper_cli` as a fallback. `pipx` handles this
> automatically and is the simpler choice on Windows.

## Configuration

Configuration is resolved with the following precedence (highest first):

1. Command-line flags (`--api-key`, `--host`, ...)
2. Environment variables (`GEONODE_SCRAPER_API_KEY`, `GEONODE_SCRAPER_HOST`,
   `GEONODE_SCRAPER_VERIFY_SSL`, `GEONODE_SCRAPER_TIMEOUT`, `GEONODE_SCRAPER_PROFILE`)
3. A TOML config file at `~/.config/geonode-scraper/config.toml`
4. Built-in defaults
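
The precedence order above can be sketched in a few lines of Python. This is an illustration only, not the CLI's actual resolver: the `resolve_setting` function, the `DEFAULTS` table, and the default value shown are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical built-in defaults; the real defaults are not documented here.
DEFAULTS: dict[str, Any] = {"verify_ssl": True}

ENV_PREFIX = "GEONODE_SCRAPER_"


def resolve_setting(
    name: str,
    flags: Mapping[str, Any],
    env: Mapping[str, str],
    file_profile: Mapping[str, Any],
) -> Any:
    """Return the first value found, walking the precedence list top-down."""
    if flags.get(name) is not None:       # 1. command-line flag
        return flags[name]
    env_key = ENV_PREFIX + name.upper()
    if env_key in env:                    # 2. environment variable (raw string)
        return env[env_key]
    if name in file_profile:              # 3. profile from the TOML config file
        return file_profile[name]
    return DEFAULTS.get(name)             # 4. built-in default, or None
```

Note that values read from the environment arrive as strings, so a real resolver would also need to coerce settings such as `verify_ssl` and `timeout` to their proper types.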

Prefer environment variables or the config file for your API key: passing
`--api-key` on the command line can leak it into your shell history and into
process listings.

Example `~/.config/geonode-scraper/config.toml`:

```toml
[default]
host = "https://api.example.com"
api_key = "your-api-key"
verify_ssl = true

[staging]
host = "https://staging.example.com"
api_key = "your-staging-key"
```

Select a non-default profile with `--profile staging` or
`GEONODE_SCRAPER_PROFILE=staging`. Inspect the active configuration with:

```sh
gscraper config path     # print the config file location
gscraper config show     # show profiles (API keys masked)
```
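
To make the profile layout concrete, here is a minimal sketch of reading one profile from TOML text and masking its API key for display. The `load_profile` and `mask_secret` helpers are hypothetical, and the masking style (keep the last four characters) is an assumption; the output of `gscraper config show` may look different. The sketch also does not assume that named profiles inherit values from `[default]`.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # backport with the same API on Python 3.10


def load_profile(toml_text: str, profile: str = "default") -> dict:
    """Parse TOML text and return the table for one profile."""
    data = tomllib.loads(toml_text)
    if profile not in data:
        raise KeyError(f"profile {profile!r} not found in config")
    return dict(data[profile])


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

For example, `load_profile(text, "staging")["host"]` returns the staging host from the file above, and `mask_secret` can be applied to `api_key` before printing.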

## Output

Commands print a human-readable summary by default. Use `--json` or `--yaml`
to print the raw result envelope for scripting. Either flag can be given
before the subcommand (global position) or after it (per-command position):

```sh
gscraper extract https://example.com --json | jq -r .result.data.markdown
gscraper --json extract https://example.com | jq -r .result.data.markdown
```

The JSON/YAML envelope has the shape `{ "ok": bool, "operation": str, "result": {...} }`
on success, or `{ "ok": false, "operation": str, "error": {...} }` on failure.
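
A script that consumes `--json` output can branch on the `ok` field. The sketch below shows one way to do that in Python; `unwrap_envelope` and `ScraperError` are hypothetical names, and the contents of `result` and `error` are treated as opaque because their fields are not specified here.

```python
import json
from typing import Any


class ScraperError(Exception):
    """Raised when the envelope reports a failure (hypothetical helper)."""

    def __init__(self, operation: str, error: Any) -> None:
        super().__init__(f"{operation} failed: {error}")
        self.operation = operation
        self.error = error


def unwrap_envelope(raw: str) -> dict[str, Any]:
    """Return `result` from a success envelope, or raise on failure."""
    envelope = json.loads(raw)
    if envelope.get("ok"):
        return envelope["result"]
    raise ScraperError(
        envelope.get("operation", "unknown"),
        envelope.get("error", {}),
    )
```

The input would typically be the captured standard output of a command such as `gscraper --json extract https://example.com`.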

## Commands

```text
gscraper extract URL [--format markdown|html] [--render-js] [--async] \
                     [--proxy-country US] [--proxy-type residential] \
                     [--header "K: V"] [--output out.md]

gscraper jobs get JOB_ID
gscraper jobs list [--status completed] [--url ...] [--page N]
gscraper jobs wait JOB_ID [--timeout S] [--interval S]

gscraper batch create URL [URL ...] [--format markdown]
gscraper batch status JOB_ID
gscraper batch wait JOB_ID [--timeout S] [--interval S]
gscraper batch list [--status ...]
gscraper batch cancel JOB_ID

gscraper crawl create URL [--depth 2] [--limit 50] [--include-subdomains]
gscraper crawl status JOB_ID
gscraper crawl wait JOB_ID
gscraper crawl list [--url ...]
gscraper crawl cancel JOB_ID

gscraper map run URL [--search term] [--no-subdomains]   # primary action
gscraper map jobs list                                   # inspect past map jobs
gscraper map jobs get JOB_ID

gscraper stats [--start-date ISO] [--end-date ISO]
gscraper health
```

Run `gscraper --help` or `gscraper <command> --help` for full details.
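
When driving `gscraper` from Python, it helps to assemble the argument list in one place. The helper below is a hypothetical sketch that uses only flags from the synopsis above; `build_extract_argv` is not part of the package, and it covers only a subset of the `extract` options.

```python
from __future__ import annotations


def build_extract_argv(
    url: str,
    *,
    fmt: str | None = None,
    render_js: bool = False,
    proxy_country: str | None = None,
) -> list[str]:
    """Build an argv list for `gscraper extract` with JSON output."""
    argv = ["gscraper", "--json", "extract", url]
    if fmt is not None:
        if fmt not in ("markdown", "html"):
            raise ValueError(f"unsupported format: {fmt!r}")
        argv += ["--format", fmt]
    if render_js:
        argv.append("--render-js")
    if proxy_country is not None:
        argv += ["--proxy-country", proxy_country]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)`. Passing a list rather than a single shell string avoids quoting problems with URLs.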

## Exit codes

| Code | Meaning |
| ---- | ------- |
| 0 | Success |
| 1 | Generic error |
| 2 | Usage / invalid arguments |
| 4 | Authentication / authorization (401, 403) |
| 5 | Not found (404) |
| 6 | Validation error (422) |
| 7 | Network / connection error |
| 8 | Polling timeout (`wait` commands) |
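
Scripts that wrap the CLI can mirror this table as an enum instead of scattering magic numbers. The sketch below is illustrative: the `ExitCode` member names and both helper functions are hypothetical. Treating any unlisted HTTP error status as a generic error is an assumption, not documented behavior.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the table above (member names are illustrative)."""

    SUCCESS = 0
    GENERIC_ERROR = 1
    USAGE = 2
    AUTH = 4
    NOT_FOUND = 5
    VALIDATION = 6
    NETWORK = 7
    POLL_TIMEOUT = 8


def describe_exit(code: int) -> str:
    """Return the symbolic name for a return code, or UNKNOWN."""
    try:
        return ExitCode(code).name
    except ValueError:
        return "UNKNOWN"


def exit_code_for_status(status: int) -> ExitCode:
    """Map an HTTP status to an exit code, following the table."""
    if status in (401, 403):
        return ExitCode.AUTH
    if status == 404:
        return ExitCode.NOT_FOUND
    if status == 422:
        return ExitCode.VALIDATION
    if 200 <= status < 300:
        return ExitCode.SUCCESS
    return ExitCode.GENERIC_ERROR  # assumption: everything else is generic
```

A caller could then write `describe_exit(proc.returncode)` after `subprocess.run` to log a readable failure reason.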

## Shell completion

```sh
gscraper --install-completion
```
