API

class sneakpeek.server.SneakpeekServer(worker: Optional[sneakpeek.worker.WorkerABC] = None, scheduler: Optional[sneakpeek.scheduler.SchedulerABC] = None, api: Optional[fastapi_jsonrpc.API] = None, api_port: int = 8080, expose_metrics: bool = True, metrics_port: int = 9090)

Bases: object

Sneakpeek server. It can run multiple services at once:

  • API - allows interacting with the scrapers storage and scrapers via JsonRPC or the UI

  • Worker - executes scheduled scrapers

  • Scheduler - automatically schedules scrapers that are stored in the storage

Parameters
  • worker (WorkerABC | None, optional) – Worker that consumes the scraper jobs queue. Defaults to None.

  • scheduler (SchedulerABC | None, optional) – Scrapers scheduler. Defaults to None.

  • api (jsonrpc.API | None, optional) – API to interact with the system. Defaults to None.

  • api_port (int, optional) – Port which is used for API and UI. Defaults to 8080.

  • expose_metrics (bool, optional) – Whether to expose metrics (prometheus format). Defaults to True.

  • metrics_port (int, optional) – Port which is used to expose metrics. Defaults to 9090.

Return type

None

static create(handlers: list[sneakpeek.scraper_handler.ScraperHandler], scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, lease_storage: sneakpeek.lib.storage.base.LeaseStorage, with_api: bool = True, with_worker: bool = True, with_scheduler: bool = True, expose_metrics: bool = True, worker_max_concurrency: int = 50, api_port: int = 8080, scheduler_storage_poll_delay: datetime.timedelta = datetime.timedelta(seconds=5), scheduler_lease_duration: datetime.timedelta = datetime.timedelta(seconds=60), plugins: Optional[list[sneakpeek.scraper_context.BeforeRequestPlugin | sneakpeek.scraper_context.AfterResponsePlugin]] = None, metrics_port: int = 9090)

Create Sneakpeek server using the default API, worker and scheduler implementations (a usage sketch follows the parameter list below)

Parameters
  • handlers (list[ScraperHandler]) – List of handlers that implement scraper logic

  • scrapers_storage (ScrapersStorage) – Scrapers storage

  • jobs_storage (ScraperJobsStorage) – Jobs storage

  • lease_storage (LeaseStorage) – Lease storage

  • with_api (bool, optional) – Whether to run the API service. Defaults to True.

  • with_worker (bool, optional) – Whether to run the worker service. Defaults to True.

  • with_scheduler (bool, optional) – Whether to run the scheduler service. Defaults to True.

  • expose_metrics (bool, optional) – Whether to expose metrics (prometheus format). Defaults to True.

  • worker_max_concurrency (int, optional) – Maximum number of concurrently executed scrapers. Defaults to 50.

  • api_port (int, optional) – Port which is used for API and UI. Defaults to 8080.

  • scheduler_storage_poll_delay (timedelta, optional) – How long the scheduler waits before polling the storage for scraper updates. Defaults to 5 seconds.

  • scheduler_lease_duration (timedelta, optional) – How long the scheduler lease lasts. The lease is required for the scheduler to be able to create new scraper jobs; this ensures that at any point in time there is only one active scheduler instance. Defaults to 1 minute.

  • plugins (list[Plugin] | None, optional) – List of plugins that will be used by the scraper runner. Can be omitted if with_worker is False. Defaults to None.

  • metrics_port (int, optional) – Port which is used to expose metrics. Defaults to 9090.

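Example (a minimal usage sketch, not part of the generated reference): wiring the default in-memory storages into SneakpeekServer.create() and starting the server. DemoHandler is a placeholder for your own ScraperHandler implementation and the URL is illustrative.

from sneakpeek.lib.storage.in_memory_storage import (
    InMemoryLeaseStorage,
    InMemoryScraperJobsStorage,
    InMemoryScrapersStorage,
)
from sneakpeek.scraper_context import ScraperContext
from sneakpeek.scraper_handler import ScraperHandler
from sneakpeek.server import SneakpeekServer


class DemoHandler(ScraperHandler):
    # Placeholder handler used only for this sketch
    @property
    def name(self) -> str:
        return "demo_handler"

    async def run(self, context: ScraperContext) -> str:
        response = await context.get("https://example.com")
        return f"status={response.status}"


server = SneakpeekServer.create(
    handlers=[DemoHandler()],
    scrapers_storage=InMemoryScrapersStorage(),
    jobs_storage=InMemoryScraperJobsStorage(),
    lease_storage=InMemoryLeaseStorage(),
    api_port=8080,
)
server.serve()  # blocks the current thread while the server is running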

serve(loop: Optional[asyncio.events.AbstractEventLoop] = None, blocking: bool = True) None

Start Sneakpeek server

Parameters
  • loop (asyncio.AbstractEventLoop | None, optional) – AsyncIO loop to use. If None, the result of asyncio.get_event_loop() is used. Defaults to None.

  • blocking (bool, optional) – Whether to block the thread while the server is running. Defaults to True.

Return type

None

stop(loop: Optional[asyncio.events.AbstractEventLoop] = None) None

Stop Sneakpeek server

Parameters

loop (asyncio.AbstractEventLoop | None, optional) – AsyncIO loop to use. If None, the result of asyncio.get_event_loop() is used. Defaults to None.

Return type

None

class sneakpeek.scraper_config.ScraperConfig(*, params: dict[str, typing.Any] | None = None, plugins: dict[str, typing.Any] | None = None)

Bases: pydantic.main.BaseModel

Scraper configuration

Parameters
  • params (dict[str, typing.Any] | None) –

  • plugins (dict[str, typing.Any] | None) –

Return type

None

params

Scraper configuration that is passed to the handler. Defaults to None.

Type

dict[str, Any] | None

plugins

Plugins configuration that defines which plugins to use (besides global ones). Takes precedence over global plugin configuration. Defaults to None.

Type

dict[str, Any] | None

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
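
Example (a minimal sketch): the params keys and the "rate_limiter" plugin name/options below are illustrative values, not ones defined by the library.

from sneakpeek.scraper_config import ScraperConfig

config = ScraperConfig(
    # Arbitrary parameters made available to the scraper handler
    params={"start_url": "https://example.com/news"},
    # Per-scraper plugin configuration, keyed by plugin name
    plugins={"rate_limiter": {"max_requests": 30}},
)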

class sneakpeek.scraper_context.AfterResponsePlugin

Bases: abc.ABC

Abstract class for the plugin which is called after each request

abstract async after_response(request: sneakpeek.scraper_context.Request, response: aiohttp.client_reqrep.ClientResponse, config: Optional[Any] = None) aiohttp.client_reqrep.ClientResponse

Function that is called on each (HTTP) response before its result is returned to the caller.

Parameters
  • request (Request) – Request metadata

  • response (aiohttp.ClientResponse) – HTTP Response

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

HTTP Response

Return type

aiohttp.ClientResponse

abstract property name: str

Name of the plugin

class sneakpeek.scraper_context.BeforeRequestPlugin

Bases: abc.ABC

Abstract class for the plugin which is called before each request (like Middleware)

abstract async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any] = None) sneakpeek.scraper_context.Request

Function that is called on each (HTTP) request before it is dispatched.

Parameters
  • request (Request) – Request metadata

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

Request metadata

Return type

Request

abstract property name: str

Name of the plugin
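
Example (a minimal sketch of a custom plugin implementing both hooks): the plugin name, header and log line are illustrative, and it assumes Request.headers can be mutated before dispatch.

from typing import Any, Optional

from aiohttp import ClientResponse

from sneakpeek.scraper_context import AfterResponsePlugin, BeforeRequestPlugin, Request


class TracingPlugin(BeforeRequestPlugin, AfterResponsePlugin):
    @property
    def name(self) -> str:
        return "tracing"

    async def before_request(self, request: Request, config: Optional[Any] = None) -> Request:
        # Attach a tracing header before the request is dispatched
        request.headers = {**(request.headers or {}), "X-Trace-Id": "sketch"}
        return request

    async def after_response(
        self, request: Request, response: ClientResponse, config: Optional[Any] = None
    ) -> ClientResponse:
        # Inspect the response before it is returned to the caller
        print(f"{request.url} -> {response.status}")
        return response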

class sneakpeek.scraper_context.HttpMethod(value)

Bases: str, enum.Enum

HTTP method

class sneakpeek.scraper_context.Request(method: sneakpeek.scraper_context.HttpMethod, url: str, headers: Optional[dict[str, str]] = None, kwargs: Optional[dict[str, typing.Any]] = None)

Bases: object

HTTP Request metadata

Parameters
  • method (HttpMethod) – HTTP method

  • url (str) – Request URL

  • headers (dict[str, str] | None, optional) – HTTP headers. Defaults to None.

  • kwargs (dict[str, Any] | None, optional) – Additional arguments passed to the underlying aiohttp call. Defaults to None.

Return type

None

class sneakpeek.scraper_context.ScraperContext(config: sneakpeek.scraper_config.ScraperConfig, plugins: Optional[list[sneakpeek.scraper_context.BeforeRequestPlugin | sneakpeek.scraper_context.AfterResponsePlugin]] = None, ping_session_func: Optional[Callable] = None)

Bases: object

Scraper context - helper class that implements basic HTTP client which logic can be extended by plugins that can preprocess request (e.g. Rate Limiter) and postprocess response (e.g. Response logger).

Parameters
  • config (ScraperConfig) – Scraper configuration

  • plugins (list[BeforeRequestPlugin | AfterResponsePlugin] | None, optional) – List of available plugins. Defaults to None.

  • ping_session_func (Callable | None, optional) – Function that pings scraper job. Defaults to None.

Return type

None

async delete(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse

Make DELETE request to the given URL

Parameters
  • url (str) – URL to send DELETE request to

  • headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.

  • **kwargs – See aiohttp.delete() for the full list of arguments

Return type

aiohttp.client_reqrep.ClientResponse

async get(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse

Make GET request to the given URL

Parameters
  • url (str) – URL to send GET request to

  • headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.

  • **kwargs – See aiohttp.get() for the full list of arguments

Return type

aiohttp.client_reqrep.ClientResponse

async head(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse

Make HEAD request to the given URL

Parameters
  • url (str) – URL to send HEAD request to

  • headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.

  • **kwargs – See aiohttp.head() for the full list of arguments

Return type

aiohttp.client_reqrep.ClientResponse

async options(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse

Make OPTIONS request to the given URL

Parameters
  • url (str) – URL to send OPTIONS request to

  • headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.

  • **kwargs – See aiohttp.options() for the full list of arguments

Return type

aiohttp.client_reqrep.ClientResponse

async ping_session() None

Ping scraper job, so it’s not considered dead

Return type

None

async post(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse

Make POST request to the given URL

Parameters
  • url (str) – URL to send POST request to

  • headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.

  • **kwargs – See aiohttp.get() for the full list of arguments

Return type

aiohttp.client_reqrep.ClientResponse

async put(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse

Make PUT request to the given URL

Parameters
  • url (str) – URL to send PUT request to

  • headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.

  • **kwargs – See aiohttp.put() for the full list of arguments

Return type

aiohttp.client_reqrep.ClientResponse

class sneakpeek.scraper_handler.ScraperHandler

Bases: abc.ABC

Abstract class that scraper logic handler must implement

abstract property name: str

Name of the handler

abstract async run(context: sneakpeek.scraper_context.ScraperContext) str

Execute scraper logic

Parameters

context (ScraperContext) – Scraper context

Returns

Scraper result that will be persisted in the storage (should be a relatively small piece of information that summarizes the job outcome)

Return type

str
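
Example (a minimal handler sketch): fetches a page via the context's HTTP client, sends a heartbeat and returns a short summary. URLs and headers are illustrative, and it assumes the response body can still be read after context.get() returns.

from sneakpeek.scraper_context import ScraperContext
from sneakpeek.scraper_handler import ScraperHandler


class NewsHandler(ScraperHandler):
    @property
    def name(self) -> str:
        return "news_handler"

    async def run(self, context: ScraperContext) -> str:
        response = await context.get(
            "https://example.com/news",
            headers={"Accept": "text/html"},
        )
        page = await response.text()
        await context.ping_session()  # heartbeat so the job isn't marked dead
        return f"fetched {len(page)} bytes"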

class sneakpeek.lib.models.Lease(*, name: str, owner_id: str, acquired: datetime.datetime, acquired_until: datetime.datetime)

Bases: pydantic.main.BaseModel

Lease metadata

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters
  • name (str) –

  • owner_id (str) –

  • acquired (datetime.datetime) –

  • acquired_until (datetime.datetime) –

Return type

None

acquired: datetime.datetime

Time when the lease was acquired

acquired_until: datetime.datetime

Time until which the lease is held

name: str

Lease name (resource name to be locked)

owner_id: str

ID of the acquirer (should be the same if you already have the lease and want to prolong it)

class sneakpeek.lib.models.Scraper(*, id: int, name: str, schedule: sneakpeek.lib.models.ScraperSchedule, schedule_crontab: str | None = None, handler: str, config: sneakpeek.scraper_config.ScraperConfig, schedule_priority: sneakpeek.lib.models.ScraperJobPriority = ScraperJobPriority.NORMAL)

Bases: pydantic.main.BaseModel

Scraper metadata

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters
Return type

None

config: sneakpeek.scraper_config.ScraperConfig

Scraper configuration that is passed to the handler

handler: str

Name of the scraper handler that implements scraping logic

id: int

Scraper unique identifier

name: str

Scraper name

schedule: sneakpeek.lib.models.ScraperSchedule

Scraper schedule configuration

schedule_crontab: str | None

Must be defined if schedule equals to CRONTAB

schedule_priority: sneakpeek.lib.models.ScraperJobPriority

Default priority to enqueue scraper jobs with
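
Example (a minimal sketch): all values are illustrative; the handler name must match the name of a registered ScraperHandler.

from sneakpeek.lib.models import Scraper, ScraperJobPriority, ScraperSchedule
from sneakpeek.scraper_config import ScraperConfig

scraper = Scraper(
    id=1,
    name="example_news_scraper",
    schedule=ScraperSchedule.CRONTAB,
    schedule_crontab="0 * * * *",  # required because schedule is CRONTAB
    handler="news_handler",
    config=ScraperConfig(params={"start_url": "https://example.com/news"}),
    schedule_priority=ScraperJobPriority.NORMAL,
)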

class sneakpeek.lib.models.ScraperJob(*, id: int, scraper: sneakpeek.lib.models.Scraper, status: sneakpeek.lib.models.ScraperJobStatus, priority: sneakpeek.lib.models.ScraperJobPriority, created_at: datetime.datetime, started_at: datetime.datetime | None = None, last_active_at: datetime.datetime | None = None, finished_at: datetime.datetime | None = None, result: str | None = None)

Bases: pydantic.main.BaseModel

Scraper job metadata

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters
Return type

None

created_at: datetime.datetime

When the job was created and enqueued

finished_at: datetime.datetime | None

When the job finished

id: int

Job unique identifier

last_active_at: datetime.datetime | None

When the job last sent heartbeat

priority: sneakpeek.lib.models.ScraperJobPriority

Scraper job priority

result: str | None

Information with the job result (should be rather small and should summarize the outcome of the scraping)

scraper: sneakpeek.lib.models.Scraper

Scraper metadata

started_at: datetime.datetime | None

When the job was dequeued and started being processed by the worker

status: sneakpeek.lib.models.ScraperJobStatus

Scraper job status

class sneakpeek.lib.models.ScraperJobPriority(value)

Bases: enum.Enum

Priority of the scraper job

HIGH = 1

NORMAL = 2

UTMOST = 0

class sneakpeek.lib.models.ScraperJobStatus(value)

Bases: str, enum.Enum

Scraper job status

DEAD = 'dead'

Scraper job was inactive for too long, so the scheduler marked it as dead and may schedule the scraper again

FAILED = 'failed'

Scraper job failed

KILLED = 'killed'

Scraper job was killed by the user

PENDING = 'pending'

Scraper job is in the queue

STARTED = 'started'

Scraper job was dequeued by the worker and is being processed

SUCCEEDED = 'succeeded'

Scraper job succeeded

class sneakpeek.lib.models.ScraperSchedule(value)

Bases: str, enum.Enum

Scraper schedule options. Note that two concurrent scraper jobs are not allowed, so if there's an active scraper job, a new one won't be scheduled

CRONTAB = 'crontab'

Specify crontab when scraper should be scheduled

EVERY_DAY = 'every_day'

Scraper will be scheduled every day

EVERY_HOUR = 'every_hour'

Scraper will be scheduled every hour

EVERY_MINUTE = 'every_minute'

Scraper will be scheduled every minute

EVERY_MONTH = 'every_month'

Scraper will be scheduled every month

EVERY_SECOND = 'every_second'

Scraper will be scheduled every second

EVERY_WEEK = 'every_week'

Scraper will be scheduled every week

INACTIVE = 'inactive'

Scraper won’t be automatically scheduled

class sneakpeek.scheduler.Scheduler(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, lease_storage: sneakpeek.lib.storage.base.LeaseStorage, queue: sneakpeek.lib.queue.QueueABC, storage_poll_frequency: datetime.timedelta = datetime.timedelta(seconds=5), lease_duration: datetime.timedelta = datetime.timedelta(seconds=60), jobs_to_keep: int = 100)

Bases: sneakpeek.scheduler.SchedulerABC

Sneakpeek scheduler - schedules scrapers and performs maintenance jobs. Uses APScheduler under the hood.

Initialize scheduler

Parameters
  • scrapers_storage (ScrapersStorage) – Scrapers storage

  • jobs_storage (ScraperJobsStorage) – Jobs storage

  • lease_storage (LeaseStorage) – Lease storage

  • queue (Queue) – Sneakpeek queue implementation

  • storage_poll_frequency (timedelta, optional) – How long the scheduler waits before polling the storage for scraper updates. Defaults to 5 seconds.

  • lease_duration (timedelta, optional) – How long the scheduler lease lasts. The lease is required for the scheduler to be able to create new scraper jobs; this ensures that at any point in time there is only one active scheduler instance. Defaults to 1 minute.

  • jobs_to_keep (int, optional) – Maximum number of historical scraper jobs to keep in the storage. Storage is cleaned up every 10 minutes. Defaults to 100.

Return type

None
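
Example (a minimal sketch of wiring the scheduler by hand; SneakpeekServer.create() normally does this for you), using the in-memory storages and the default queue.

from datetime import timedelta

from sneakpeek.lib.queue import Queue
from sneakpeek.lib.storage.in_memory_storage import (
    InMemoryLeaseStorage,
    InMemoryScraperJobsStorage,
    InMemoryScrapersStorage,
)
from sneakpeek.scheduler import Scheduler

scrapers_storage = InMemoryScrapersStorage()
jobs_storage = InMemoryScraperJobsStorage()
scheduler = Scheduler(
    scrapers_storage=scrapers_storage,
    jobs_storage=jobs_storage,
    lease_storage=InMemoryLeaseStorage(),
    queue=Queue(scrapers_storage, jobs_storage),
    storage_poll_frequency=timedelta(seconds=5),
    lease_duration=timedelta(minutes=1),
)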

class sneakpeek.scheduler.SchedulerABC

Bases: abc.ABC

class sneakpeek.worker.Worker(runner: sneakpeek.runner.RunnerABC, queue: sneakpeek.lib.queue.QueueABC, loop: Optional[asyncio.events.AbstractEventLoop] = None, max_concurrency: int = 50)

Bases: sneakpeek.worker.WorkerABC

Sneakpeek worker - consumes the scraper jobs queue and executes scraper logic

Parameters
  • runner (RunnerABC) – Scraper runner

  • queue (Queue) – Sneakpeek queue implementation

  • loop (asyncio.AbstractEventLoop | None, optional) – AsyncIO loop to use. If None, the result of asyncio.get_event_loop() is used. Defaults to None.

  • max_concurrency (int, optional) – Maximum number of concurrent scraper jobs. Defaults to 50.

Return type

None

class sneakpeek.worker.WorkerABC

Bases: abc.ABC

class sneakpeek.runner.Runner(handlers: List[sneakpeek.lib.models.Scraper], queue: sneakpeek.lib.queue.QueueABC, storage: sneakpeek.lib.storage.base.ScraperJobsStorage, plugins: Optional[list[sneakpeek.scraper_context.BeforeRequestPlugin | sneakpeek.scraper_context.AfterResponsePlugin]] = None)

Bases: sneakpeek.runner.RunnerABC

Default scraper runner implementation

Initialize runner

Parameters
  • handlers (list[ScraperHandler]) – List of handlers that implement scraper logic

  • queue (Queue) – Sneakpeek queue implementation

  • storage (ScraperJobsStorage) – Scraper jobs storage

  • plugins (list[Plugin] | None, optional) – List of plugins that will be used by scraper runner. Defaults to None.

Return type

None
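
Example (a minimal sketch of constructing the default runner and worker by hand; SneakpeekServer.create() normally does this for you).

from sneakpeek.lib.queue import Queue
from sneakpeek.lib.storage.in_memory_storage import (
    InMemoryScraperJobsStorage,
    InMemoryScrapersStorage,
)
from sneakpeek.runner import Runner
from sneakpeek.worker import Worker

jobs_storage = InMemoryScraperJobsStorage()
queue = Queue(InMemoryScrapersStorage(), jobs_storage)
runner = Runner(
    handlers=[],  # register your ScraperHandler implementations here
    queue=queue,
    storage=jobs_storage,
)
worker = Worker(runner=runner, queue=queue, max_concurrency=10)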

async run(job: sneakpeek.lib.models.ScraperJob) None

Execute scraper. The following logic is performed:

  • Ping scraper job

  • Build scraper context

  • Execute scraper logic

  • [On success] Set scraper job status to SUCCEEDED

  • [On fail] Set scraper job status to FAILED

  • [If the scraper job was killed] Do nothing

  • Persist scraper job status

Parameters

job (ScraperJob) – Scraper job metadata

Return type

None

class sneakpeek.runner.RunnerABC

Bases: abc.ABC

Scraper runner - manages scraper job lifecycle and runs the scraper logic

abstract async run(job: sneakpeek.lib.models.ScraperJob) None

Execute scraper job

Parameters

job (ScraperJob) – Scraper job metadata

Return type

None

sneakpeek.api.create_api(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, queue: sneakpeek.lib.queue.QueueABC, handlers: list[sneakpeek.scraper_handler.ScraperHandler]) fastapi_jsonrpc.API

Create JsonRPC API (FastAPI is used under the hood)

Parameters
  • scrapers_storage (ScrapersStorage) – Scrapers storage

  • jobs_storage (ScraperJobsStorage) – Jobs storage

  • queue (QueueABC) – Sneakpeek queue implementation

  • handlers (list[ScraperHandler]) – List of handlers that implement scraper logic

Return type

fastapi_jsonrpc.API

sneakpeek.api.get_api_entrypoint(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, queue: sneakpeek.lib.queue.Queue, handlers: list[sneakpeek.scraper_handler.ScraperHandler]) fastapi_jsonrpc.Entrypoint

Create public JsonRPC API entrypoint (mostly mimics storage and queue API)

Parameters
  • scrapers_storage (ScrapersStorage) – Scrapers storage

  • jobs_storage (ScraperJobsStorage) – Jobs storage

  • queue (Queue) – Sneakpeek queue implementation

  • handlers (list[ScraperHandler]) – List of handlers that implement scraper logic

Returns

FastAPI JsonRPC entrypoint

Return type

jsonrpc.Entrypoint

sneakpeek.metrics.count_invocations(subsystem: str)

Decorator for measuring the number of function invocations (works for both sync and async functions).

@count_invocations(subsystem="my subsystem")
def my_awesome_func():
    ...

This will export the following Prometheus counter metrics:

# Total number of invocations
sneakpeek_invocations{subsystem="my subsystem", method="my_awesome_func", type="total", error=""}
# Total number of successful invocations (ones that haven't thrown an exception)
sneakpeek_invocations{subsystem="my subsystem", method="my_awesome_func", type="success", error=""}
# Total number of failed invocations (ones that have thrown an exception)
sneakpeek_invocations{subsystem="my subsystem", method="my_awesome_func", type="error", error="<Exception class name>"}

Parameters

subsystem (str) – Subsystem name to be used in the metric annotation

sneakpeek.metrics.measure_latency(subsystem: str)

Decorator for measuring the latency of a function (works for both sync and async functions).

@measure_latency(subsystem="my subsystem")
def my_awesome_func():
    ...

This will export the following Prometheus histogram metric:

sneakpeek_latency{subsystem="my subsystem", method="my_awesome_func"}

Parameters

subsystem (str) – Subsystem name to be used in the metric annotation

class sneakpeek.logging.ScraperContextInjectingFilter(name='')

Bases: logging.Filter

Scraper context filter which automatically injects scraper and scraper job IDs to the logging metadata.

Example of usage:

logger = logging.getLogger()
handler = logging.StreamHandler()
handler.addFilter(ScraperContextInjectingFilter())
logger.addHandler(handler)

Initialize a filter.

Initialize with the name of the logger which, together with its children, will have its events allowed through the filter. If no name is specified, allow every event.

filter(record: logging.LogRecord) bool

Injects scraper metadata into log record:

  • scraper_job_id - Scraper Job ID

  • scraper_id - Scraper ID

  • scraper_name - Scraper name

  • scraper_handler - Scraper logic implementation

  • scraper_job_human_name - Formatted scraper job ID (<name>::<scraper_id>::<scraper_job_id>)

Parameters

record (logging.LogRecord) – Log record to inject metadata into

Returns

Always True

Return type

bool

sneakpeek.logging.configure_logging(level: int = 20)

Helper function to configure logging:

  • Adds console logger to the root logger

  • Adds scraper context injector filter to the console logger

  • Configures console formatting to use scraper metadata

Parameters

level (int, optional) – Minimum logging level. Defaults to logging.INFO.

sneakpeek.logging.scraper_job_context(scraper_job: sneakpeek.lib.models.ScraperJob) None

Initialize scraper job logging context which automatically adds scraper and scraper job IDs to the logging metadata

Parameters

scraper_job (ScraperJob) – Scraper job definition

Return type

None

exception sneakpeek.lib.errors.ScraperHasActiveRunError(data=None)

Bases: fastapi_jsonrpc.BaseError

exception sneakpeek.lib.errors.ScraperJobNotFoundError(data=None)

Bases: fastapi_jsonrpc.BaseError

exception sneakpeek.lib.errors.ScraperJobPingFinishedError(data=None)

Bases: fastapi_jsonrpc.BaseError

exception sneakpeek.lib.errors.ScraperJobPingNotStartedError(data=None)

Bases: fastapi_jsonrpc.BaseError

exception sneakpeek.lib.errors.ScraperNotFoundError(data=None)

Bases: fastapi_jsonrpc.BaseError

exception sneakpeek.lib.errors.UnknownScraperHandlerError(data=None)

Bases: fastapi_jsonrpc.BaseError

class sneakpeek.lib.queue.Queue(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, scraper_jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, dead_timeout: datetime.timedelta = datetime.timedelta(seconds=300))

Bases: object

Default priority queue implementation

Parameters
  • scrapers_storage (ScrapersStorage) – Scrapers storage

  • scraper_jobs_storage (ScraperJobsStorage) – Scraper jobs storage

  • dead_timeout (timedelta, optional) – If the scraper job hasn't pinged for the given time period, the job will be marked as dead. Defaults to 5 minutes.

Return type

None
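
Example (a minimal sketch, assuming the default Queue implements the QueueABC interface documented below): enqueue a job for scraper ID 1 at normal priority; this raises ScraperNotFoundError if that scraper is not in the scrapers storage.

import asyncio

from sneakpeek.lib.models import ScraperJobPriority
from sneakpeek.lib.queue import Queue
from sneakpeek.lib.storage.in_memory_storage import (
    InMemoryScraperJobsStorage,
    InMemoryScrapersStorage,
)


async def main() -> None:
    queue = Queue(InMemoryScrapersStorage(), InMemoryScraperJobsStorage())
    job = await queue.enqueue(scraper_id=1, priority=ScraperJobPriority.NORMAL)
    print(job.status, await queue.get_queue_len(ScraperJobPriority.NORMAL))


asyncio.run(main())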

class sneakpeek.lib.queue.QueueABC

Bases: abc.ABC

Sneakpeek scraper job priority queue

abstract async dequeue() sneakpeek.lib.models.ScraperJob | None

Try to dequeue a job from the queue.

Returns

Scraper job metadata if the queue wasn't empty, None otherwise

Return type

ScraperJob | None

abstract async enqueue(scraper_id: int, priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob

Enqueue scraper job.

Parameters
  • scraper_id (int) – ID of the scraper to enqueue

  • priority (ScraperJobPriority) – Priority of the job to enqueue

Returns

Scraper job metadata

Return type

ScraperJob

abstract async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int

Get number of pending items in the queue

Parameters

priority (ScraperJobPriority) – Queue priority

Returns

Number of pending items in the queue

Return type

int

abstract async kill_dead_scraper_jobs(scraper_id: int) list[sneakpeek.lib.models.ScraperJob]

Kill dead scraper jobs for the given scraper

Parameters

scraper_id (int) – Scraper ID to kill jobs for

Returns

List of dead scraper jobs

Return type

list[ScraperJob]

abstract async ping_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob

Send a heartbeat for the scraper job

Parameters
  • scraper_id (int) – Scraper ID

  • scraper_job_id (int) – Scraper job ID

Returns

Updated scraper job metadata

Return type

ScraperJob

class sneakpeek.lib.storage.base.LeaseStorage

Bases: abc.ABC

Sneakpeek lease storage abstract class

abstract async maybe_acquire_lease(lease_name: str, owner_id: str, acquire_for: datetime.timedelta) sneakpeek.lib.models.Lease | None

Try to acquire lease (global lock).

Parameters
  • lease_name (str) – Lease name (resource name to be locked)

  • owner_id (str) – ID of the acquirer (should be the same if you already have the lease and want to prolong it)

  • acquire_for (timedelta) – For how long lease will be acquired

Returns

Lease metadata if it was acquired, None otherwise

Return type

Lease | None

abstract async release_lease(lease_name: str, owner_id: str) None

Release lease (global lock)

Parameters
  • lease_name (str) – Lease name (resource name to be unlocked)

  • owner_id (str) – ID of the acquirer

Return type

None
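
Example (a minimal sketch using the in-memory implementation documented further below): acquire the lock for one minute, do the work and release it. The lease and owner names are illustrative.

import asyncio
from datetime import timedelta

from sneakpeek.lib.storage.in_memory_storage import InMemoryLeaseStorage


async def main() -> None:
    storage = InMemoryLeaseStorage()
    lease = await storage.maybe_acquire_lease("scheduler", "owner-1", timedelta(minutes=1))
    if lease is not None:
        # ... do the work that requires the global lock ...
        await storage.release_lease("scheduler", "owner-1")


asyncio.run(main())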

class sneakpeek.lib.storage.base.ScraperJobsStorage

Bases: abc.ABC

Sneakpeek scraper jobs storage abstract class

abstract async add_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
Parameters

scraper_job (ScraperJob) – scraper job to add

Returns

Created scraper job

Return type

ScraperJob

abstract async delete_old_scraper_jobs(keep_last: int = 50) None

Delete old historical scraper jobs

Parameters

keep_last (int, optional) – How many historical scraper jobs to keep. Defaults to 50.

Return type

None

abstract async dequeue_scraper_job(priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob | None

Try to dequeue pending scraper job of given priority

Parameters

priority (ScraperJobPriority) – Queue priority

Returns

First pending scraper job or None if the queue is empty

Return type

ScraperJob | None

abstract async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int

Get number of pending scraper jobs in the queue

Parameters

priority (ScraperJobPriority) – Queue priority

Returns

Number of pending scraper jobs in the queue

Return type

int

abstract async get_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob

Get scraper job by ID. Throws ScraperNotFoundError if the scraper doesn't exist and ScraperJobNotFoundError if the scraper job doesn't exist.

Parameters
  • scraper_id (int) – Scraper ID

  • scraper_job_id (int) – scraper job ID

Returns

Found scraper job

Return type

ScraperJob

abstract async get_scraper_jobs(scraper_id: int) List[sneakpeek.lib.models.ScraperJob]
Parameters

scraper_id (int) – Scraper ID

Returns

List of scraper jobs

Return type

List[ScraperJob]

abstract async update_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
Parameters

scraper_job (ScraperJob) – scraper job to update

Returns

Updated scraper job

Return type

ScraperJob

class sneakpeek.lib.storage.base.ScrapersStorage

Bases: abc.ABC

Sneakpeek scrapers storage abstract class

abstract async create_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
Parameters

scraper (Scraper) – Scraper Metadata

Returns

Created scraper

Return type

Scraper

abstract async delete_scraper(id: int) sneakpeek.lib.models.Scraper
Parameters

id (int) – Scraper ID

Returns

Deleted scraper

Return type

Scraper

abstract async get_scraper(id: int) sneakpeek.lib.models.Scraper

Get scraper by ID. Throws ScraperNotFoundError if scraper doesn’t exist

Parameters

id (int) – Scraper ID

Returns

Scraper metadata

Return type

Scraper

abstract async get_scrapers() List[sneakpeek.lib.models.Scraper]
Returns

List of all available scrapers

Return type

List[Scraper]

abstract async is_read_only() bool
Returns

Whether the storage is read-only (modifying the scrapers list and metadata is disallowed)

Return type

bool

abstract async maybe_get_scraper(id: int) sneakpeek.lib.models.Scraper | None

Get scraper by ID. Return None if scraper doesn’t exist

Parameters

id (int) – Scraper ID

Returns

Scraper metadata, or None if the scraper doesn't exist

Return type

Scraper | None

abstract async search_scrapers(name_filter: Optional[str] = None, max_items: Optional[int] = None, offset: Optional[int] = None) List[sneakpeek.lib.models.Scraper]

Search scrapers using given filters

Parameters
  • name_filter (str | None, optional) – Search scrapers that have given substring in the name. Defaults to None.

  • max_items (int | None, optional) – Maximum number of items to return. Defaults to None.

  • offset (int | None, optional) – Offset for search results. Defaults to None.

Returns

Found scrapers

Return type

List[Scraper]

abstract async update_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
Parameters

scraper (Scraper) – Scraper Metadata

Returns

Updated scraper

Return type

Scraper

class sneakpeek.lib.storage.in_memory_storage.InMemoryLeaseStorage

Bases: sneakpeek.lib.storage.base.LeaseStorage

In memory storage for leases. Should only be used for development purposes

Return type

None

async maybe_acquire_lease(lease_name: str, owner_id: str, acquire_for: datetime.timedelta) sneakpeek.lib.models.Lease | None

Try to acquire lease (global lock).

Parameters
  • lease_name (str) – Lease name (resource name to be locked)

  • owner_id (str) – ID of the acquirer (should be the same if you already have the lease and want to prolong it)

  • acquire_for (timedelta) – For how long lease will be acquired

Returns

Lease metadata if it was acquired, None otherwise

Return type

Lease | None

async release_lease(lease_name: str, owner_id: str) None

Release lease (global lock)

Parameters
  • lease_name (str) – Lease name (resource name to be unlocked)

  • owner_id (str) – ID of the acquirer

Return type

None

class sneakpeek.lib.storage.in_memory_storage.InMemoryScraperJobsStorage

Bases: sneakpeek.lib.storage.base.ScraperJobsStorage

In memory storage for scraper jobs. Should only be used for development purposes

Return type

None

async add_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
Parameters

scraper_job (ScraperJob) – scraper job to add

Returns

Created scraper job

Return type

ScraperJob

async delete_old_scraper_jobs(keep_last: int = 50) None

Delete old historical scraper jobs

Parameters

keep_last (int, optional) – How many historical scraper jobs to keep. Defaults to 50.

Return type

None

async dequeue_scraper_job(priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob | None

Try to dequeue pending scraper job of given priority

Parameters

priority (ScraperJobPriority) – Queue priority

Returns

First pending scraper job or None if the queue is empty

Return type

ScraperJob | None

async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int

Get number of pending scraper jobs in the queue

Parameters

priority (ScraperJobPriority) – Queue priority

Returns

Number of pending scraper jobs in the queue

Return type

int

async get_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob

Get scraper job by ID. Throws ScraperNotFoundError if the scraper doesn't exist and ScraperJobNotFoundError if the scraper job doesn't exist.

Parameters
  • scraper_id (int) – Scraper ID

  • scraper_job_id (int) – scraper job ID

Returns

Found scraper job

Return type

ScraperJob

async get_scraper_jobs(id: int) list[sneakpeek.lib.models.ScraperJob]
Parameters

id (int) – Scraper ID

Returns

List of scraper jobs

Return type

List[ScraperJob]

async update_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
Parameters

scraper_job (ScraperJob) – scraper job to update

Returns

Updated scraper job

Return type

ScraperJob

class sneakpeek.lib.storage.in_memory_storage.InMemoryScrapersStorage(scrapers: Optional[list[sneakpeek.lib.models.Scraper]] = None, is_read_only: bool = True)

Bases: sneakpeek.lib.storage.base.ScrapersStorage

In-memory storage implementation

Parameters
  • scrapers (list[Scraper] | None, optional) – List of pre-defined scrapers. Defaults to None.

  • is_read_only (bool, optional) – Whether the storage is read-only (modifications of the scrapers list are disallowed). Defaults to True.

Return type

None

async create_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
Parameters

scraper (Scraper) – Scraper Metadata

Returns

Created scraper

Return type

Scraper

async delete_scraper(id: int) sneakpeek.lib.models.Scraper
Parameters

id (int) – Scraper ID

Returns

Deleted scraper

Return type

Scraper

async get_scraper(id: int) sneakpeek.lib.models.Scraper

Get scraper by ID. Throws ScraperNotFoundError if scraper doesn’t exist

Parameters

id (int) – Scraper ID

Returns

Scraper metadata

Return type

Scraper

async get_scrapers() list[sneakpeek.lib.models.Scraper]
Returns

List of all available scrapers

Return type

List[Scraper]

async is_read_only() bool
Returns

Whether the storage is read-only (modifying the scrapers list and metadata is disallowed)

Return type

bool

async maybe_get_scraper(id: int) sneakpeek.lib.models.Scraper | None

Get scraper by ID. Return None if scraper doesn’t exist

Parameters

id (int) – Scraper ID

Returns

Scraper metadata, or None if the scraper doesn't exist

Return type

Scraper | None

async search_scrapers(name_filter: str | None = None, max_items: int | None = None, offset: int | None = None) list[sneakpeek.lib.models.Scraper]

Search scrapers using given filters

Parameters
  • name_filter (str | None, optional) – Search scrapers that have given substring in the name. Defaults to None.

  • max_items (int | None, optional) – Maximum number of items to return. Defaults to None.

  • offset (int | None, optional) – Offset for search results. Defaults to None.

Returns

Found scrapers

Return type

List[Scraper]

async update_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
Parameters

scraper (Scraper) – Scraper Metadata

Returns

Updated scraper

Return type

Scraper

class sneakpeek.lib.storage.redis_storage.RedisLeaseStorage(redis: redis.asyncio.client.Redis)

Bases: sneakpeek.lib.storage.base.LeaseStorage

Redis storage for leases. Should only be used for development purposes

Parameters

redis (Redis) – Async redis client

Return type

None

async maybe_acquire_lease(lease_name: str, owner_id: str, acquire_for: datetime.timedelta) sneakpeek.lib.models.Lease | None

Try to acquire lease (global lock).

Parameters
  • lease_name (str) – Lease name (resource name to be locked)

  • owner_id (str) – ID of the acquirer (should be the same if you already have the lease and want to prolong it)

  • acquire_for (timedelta) – For how long lease will be acquired

Returns

Lease metadata if it was acquired, None otherwise

Return type

Lease | None

async release_lease(lease_name: str, owner_id: str) None

Release lease (global lock)

Parameters
  • lease_name (str) – Lease name (resource name to be unlocked)

  • owner_id (str) – ID of the acquirer

Return type

None

class sneakpeek.lib.storage.redis_storage.RedisScraperJobsStorage(redis: redis.asyncio.client.Redis, scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage)

Bases: sneakpeek.lib.storage.base.ScraperJobsStorage

Redis storage for scraper jobs. Should only be used for development purposes

Parameters
  • redis (Redis) – Async redis client

  • scrapers_storage (ScrapersStorage) – Scrapers storage

Return type

None

async add_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
Parameters

scraper_job (ScraperJob) – scraper job to add

Returns

Created scraper job

Return type

ScraperJob

async delete_old_scraper_jobs(keep_last: int = 50) None

Delete old historical scraper jobs

Parameters

keep_last (int, optional) – How many historical scraper jobs to keep. Defaults to 50.

Return type

None

async dequeue_scraper_job(priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob | None

Try to dequeue pending scraper job of given priority

Parameters

priority (ScraperJobPriority) – Queue priority

Returns

First pending scraper job or None if the queue is empty

Return type

ScraperJob | None

async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int

Get number of pending scraper jobs in the queue

Parameters

priority (ScraperJobPriority) – Queue priority

Returns

Number of pending scraper jobs in the queue

Return type

int

async get_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob

Get scraper job by ID. Throws ScraperNotFoundError if the scraper doesn't exist and ScraperJobNotFoundError if the scraper job doesn't exist.

Parameters
  • scraper_id (int) – Scraper ID

  • scraper_job_id (int) – scraper job ID

Returns

Found scraper job

Return type

ScraperJob

async get_scraper_jobs(scraper_id: int) list[sneakpeek.lib.models.ScraperJob]
Parameters

scraper_id (int) – Scraper ID

Returns

List of scraper jobs

Return type

List[ScraperJob]

async update_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
Parameters

scraper_job (ScraperJob) – scraper job to update

Returns

Updated scraper job

Return type

ScraperJob

class sneakpeek.lib.storage.redis_storage.RedisScrapersStorage(redis: redis.asyncio.client.Redis, is_read_only: bool = False)

Bases: sneakpeek.lib.storage.base.ScrapersStorage

Redis scrapers storage implementation

Parameters
  • redis (Redis) – Async redis client

  • is_read_only (bool, optional) – Whether the storage is read-only (modifications of the scrapers list are disallowed). Defaults to False.

Return type

None
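
Example (a minimal sketch of wiring the Redis-backed storages into the server): the Redis connection settings are illustrative.

from redis.asyncio import Redis

from sneakpeek.lib.storage.redis_storage import (
    RedisLeaseStorage,
    RedisScraperJobsStorage,
    RedisScrapersStorage,
)
from sneakpeek.server import SneakpeekServer

redis = Redis(host="localhost", port=6379)
scrapers_storage = RedisScrapersStorage(redis)
server = SneakpeekServer.create(
    handlers=[],  # register your ScraperHandler implementations here
    scrapers_storage=scrapers_storage,
    jobs_storage=RedisScraperJobsStorage(redis, scrapers_storage),
    lease_storage=RedisLeaseStorage(redis),
)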

async create_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
Parameters

scraper (Scraper) – Scraper Metadata

Returns

Created scraper

Return type

Scraper

async delete_scraper(id: int) sneakpeek.lib.models.Scraper
Parameters

id (int) – Scraper ID

Returns

Deleted scraper

Return type

Scraper

async get_scraper(id: int) sneakpeek.lib.models.Scraper

Get scraper by ID. Throws ScraperNotFoundError if scraper doesn’t exist

Parameters

id (int) – Scraper ID

Returns

Scraper metadata

Return type

Scraper

async get_scrapers() list[sneakpeek.lib.models.Scraper]
Returns

List of all available scrapers

Return type

List[Scraper]

async is_read_only() bool
Returns

Whether the storage is read-only (modifying the scrapers list and metadata is disallowed)

Return type

bool

async maybe_get_scraper(id: int) sneakpeek.lib.models.Scraper | None

Get scraper by ID. Return None if scraper doesn’t exist

Parameters

id (int) – Scraper ID

Returns

Scraper metadata, or None if the scraper doesn't exist

Return type

Scraper | None

async search_scrapers(name_filter: str | None = None, max_items: int | None = None, offset: int | None = None) list[sneakpeek.lib.models.Scraper]

Search scrapers using given filters

Parameters
  • name_filter (str | None, optional) – Search scrapers that have given substring in the name. Defaults to None.

  • max_items (int | None, optional) – Maximum number of items to return. Defaults to None.

  • offset (int | None, optional) – Offset for search results. Defaults to None.

Returns

Found scrapers

Return type

List[Scraper]

async update_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
Parameters

scraper (Scraper) – Scraper Metadata

Returns

Updated scraper

Return type

Scraper

class sneakpeek.plugins.proxy_plugin.ProxyPlugin(default_config: Optional[sneakpeek.plugins.proxy_plugin.ProxyPluginConfig] = None)

Bases: sneakpeek.scraper_context.BeforeRequestPlugin

Proxy plugin automatically sets proxy arguments for all HTTP requests.

Parameters

default_config (sneakpeek.plugins.proxy_plugin.ProxyPluginConfig | None) –

Return type

None

async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request

Function that is called on each (HTTP) request before it is dispatched.

Parameters
  • request (Request) – Request metadata

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

Request metadata

Return type

Request

property name: str

Name of the plugin

class sneakpeek.plugins.proxy_plugin.ProxyPluginConfig(*, proxy: str | yarl.URL | None = None, proxy_auth: aiohttp.helpers.BasicAuth | None = None)

Bases: pydantic.main.BaseModel

Proxy plugin config

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters
  • proxy (str | yarl.URL | None) –

  • proxy_auth (aiohttp.helpers.BasicAuth | None) –

Return type

None

proxy: str | yarl.URL | None

Proxy URL

proxy_auth: aiohttp.helpers.BasicAuth | None

Proxy authentication info to use
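
Example (a minimal sketch): register the proxy plugin with a global default configuration; the proxy URL and credentials are illustrative. The resulting plugin instance can be passed via the plugins argument of SneakpeekServer.create().

from aiohttp import BasicAuth

from sneakpeek.plugins.proxy_plugin import ProxyPlugin, ProxyPluginConfig

proxy_plugin = ProxyPlugin(
    default_config=ProxyPluginConfig(
        proxy="http://proxy.example.com:8080",
        proxy_auth=BasicAuth("user", "password"),
    )
)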

exception sneakpeek.plugins.rate_limiter_plugin.RateLimitedException

Bases: Exception

Request is rate limited because too many requests were made to the host

class sneakpeek.plugins.rate_limiter_plugin.RateLimitedStrategy(value)

Bases: enum.Enum

What to do if the request is rate limited

THROW = 1

Throw an exception

WAIT = 2

Wait until request is no longer rate limited

class sneakpeek.plugins.rate_limiter_plugin.RateLimiterPlugin(default_config: Optional[sneakpeek.plugins.rate_limiter_plugin.RateLimiterPluginConfig] = None)

Bases: sneakpeek.scraper_context.BeforeRequestPlugin

Rate limiter implements the leaky bucket algorithm to limit the number of requests made to each host. If a request is rate limited, it can either raise an exception or wait until the request is no longer rate limited.

Parameters

default_config (sneakpeek.plugins.rate_limiter_plugin.RateLimiterPluginConfig | None) –

Return type

None

async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request

Function that is called on each (HTTP) request before it is dispatched.

Parameters
  • request (Request) – Request metadata

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

Request metadata

Return type

Request

property name: str

Name of the plugin

class sneakpeek.plugins.rate_limiter_plugin.RateLimiterPluginConfig(*, max_requests: int = 60, rate_limited_strategy: sneakpeek.plugins.rate_limiter_plugin.RateLimitedStrategy = RateLimitedStrategy.WAIT, time_window: datetime.timedelta = datetime.timedelta(seconds=60))

Bases: pydantic.main.BaseModel

Rate limiter plugin configuration

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters
Return type

None

max_requests: int

Maximum number of allowed requests per host within time window

rate_limited_strategy: sneakpeek.plugins.rate_limiter_plugin.RateLimitedStrategy

What to do if the request is rate limited

time_window: datetime.timedelta

Time window to aggregate requests
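
Example (a minimal sketch): allow at most 30 requests per host per minute and raise RateLimitedException instead of waiting when the limit is hit.

from datetime import timedelta

from sneakpeek.plugins.rate_limiter_plugin import (
    RateLimitedStrategy,
    RateLimiterPlugin,
    RateLimiterPluginConfig,
)

rate_limiter = RateLimiterPlugin(
    default_config=RateLimiterPluginConfig(
        max_requests=30,
        rate_limited_strategy=RateLimitedStrategy.THROW,
        time_window=timedelta(minutes=1),
    )
)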

class sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPlugin(default_config: Optional[sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPluginConfig] = None)

Bases: sneakpeek.scraper_context.BeforeRequestPlugin, sneakpeek.scraper_context.AfterResponsePlugin

Requests logging plugin logs all requests being made and all responses received.

Parameters

default_config (sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPluginConfig | None) –

Return type

None

async after_response(request: sneakpeek.scraper_context.Request, response: aiohttp.client_reqrep.ClientResponse, config: Optional[Any]) aiohttp.client_reqrep.ClientResponse

Function that is called on each (HTTP) response before its result is returned to the caller.

Parameters
  • request (Request) – Request metadata

  • response (aiohttp.ClientResponse) – HTTP Response

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

HTTP Response

Return type

aiohttp.ClientResponse

async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request

Function that is called on each (HTTP) request before it is dispatched.

Parameters
  • request (Request) – Request metadata

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

Request metadata

Return type

Request

property name: str

Name of the plugin

class sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPluginConfig(*, log_request: bool = True, log_response: bool = True)

Bases: pydantic.main.BaseModel

Requests logging plugin config

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters
  • log_request (bool) –

  • log_response (bool) –

Return type

None

log_request: bool

Whether to log request being made

log_response: bool

Whether to log the response received

class sneakpeek.plugins.robots_txt_plugin.RobotsTxtPlugin(default_config: Optional[sneakpeek.plugins.robots_txt_plugin.RobotsTxtPluginConfig] = None)

Bases: sneakpeek.scraper_context.BeforeRequestPlugin

Robots.txt plugin can log and optionally block requests if they are disallowed by website robots.txt.

Parameters

default_config (sneakpeek.plugins.robots_txt_plugin.RobotsTxtPluginConfig | None) –

Return type

None

async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request

Function that is called on each (HTTP) request before it is dispatched.

Parameters
  • request (Request) – Request metadata

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

Request metadata

Return type

Request

property name: str

Name of the plugin

class sneakpeek.plugins.robots_txt_plugin.RobotsTxtPluginConfig(*, violation_strategy: sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationStrategy = RobotsTxtViolationStrategy.LOG)

Bases: pydantic.main.BaseModel

robots.txt plugin configuration

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters

violation_strategy (sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationStrategy) –

Return type

None

exception sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationException

Bases: Exception

Exception which is raised if request is disallowed by website robots.txt

class sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationStrategy(value)

Bases: enum.Enum

What to do if the request is disallowed by website robots.txt

LOG = 1

Only log violation

THROW = 2

Raise an exception on violation

class sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPlugin(default_config: Optional[sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPluginConfig] = None)

Bases: sneakpeek.scraper_context.BeforeRequestPlugin

This plugin automatically adds a User-Agent header if it's not present. It uses fake-useragent to generate realistic user agents.

Parameters

default_config (sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPluginConfig | None) –

Return type

None

async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request

Function that is called on each (HTTP) request before it is dispatched.

Parameters
  • request (Request) – Request metadata

  • config (Any | None, optional) – Plugin configuration. Defaults to None.

Returns

Request metadata

Return type

Request

property name: str

Name of the plugin

class sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPluginConfig(*, use_external_data: bool = True, browsers: list[str] = ['chrome', 'edge', 'firefox', 'safari', 'opera'])

Bases: pydantic.main.BaseModel

Plugin configuration

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters
  • use_external_data (bool) –

  • browsers (list[str]) –

Return type

None

browsers: list[str]

List of browsers which are used to generate user agents

use_external_data: bool

Whether to use external data as a fallback