API
- class sneakpeek.server.SneakpeekServer(worker: Optional[sneakpeek.worker.WorkerABC] = None, scheduler: Optional[sneakpeek.scheduler.SchedulerABC] = None, api: Optional[fastapi_jsonrpc.API] = None, api_port: int = 8080, expose_metrics: bool = True, metrics_port: int = 9090)
Bases:
object
Sneakpeek server. It can run multiple services at once:
API - allows interacting with the scrapers storage and scrapers via JsonRPC or the UI
Worker - executes scheduled scrapers
Scheduler - automatically schedules scrapers that are stored in the storage
- Parameters
worker (WorkerABC | None, optional) – Worker that consumes scraper jobs queue. Defaults to None.
scheduler (SchedulerABC | None, optional) – Scrapers scheduler. Defaults to None.
api (jsonrpc.API | None, optional) – API to interact with the system. Defaults to None.
api_port (int, optional) – Port which is used for API and UI. Defaults to 8080.
expose_metrics (bool, optional) – Whether to expose metrics (Prometheus format). Defaults to True.
metrics_port (int, optional) – Port which is used to expose metrics. Defaults to 9090.
- Return type
None
- static create(handlers: list[sneakpeek.scraper_handler.ScraperHandler], scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, lease_storage: sneakpeek.lib.storage.base.LeaseStorage, with_api: bool = True, with_worker: bool = True, with_scheduler: bool = True, expose_metrics: bool = True, worker_max_concurrency: int = 50, api_port: int = 8080, scheduler_storage_poll_delay: datetime.timedelta = datetime.timedelta(seconds=5), scheduler_lease_duration: datetime.timedelta = datetime.timedelta(seconds=60), plugins: Optional[list[sneakpeek.scraper_context.BeforeRequestPlugin | sneakpeek.scraper_context.AfterResponsePlugin]] = None, metrics_port: int = 9090)
Create Sneakpeek server using default API, worker and scheduler implementations
- Parameters
handlers (list[ScraperHandler]) – List of handlers that implement scraper logic
scrapers_storage (ScrapersStorage) – Scrapers storage
jobs_storage (ScraperJobsStorage) – Jobs storage
lease_storage (LeaseStorage) – Lease storage
with_api (bool, optional) – Whether to run the API service. Defaults to True.
with_worker (bool, optional) – Whether to run the worker service. Defaults to True.
with_scheduler (bool, optional) – Whether to run the scheduler service. Defaults to True.
expose_metrics (bool, optional) – Whether to expose metrics (prometheus format). Defaults to True.
worker_max_concurrency (int, optional) – Maximum number of concurrently executed scrapers. Defaults to 50.
api_port (int, optional) – Port which is used for API and UI. Defaults to 8080.
scheduler_storage_poll_delay (timedelta, optional) – How long the scheduler waits between polling the storage for scraper updates. Defaults to 5 seconds.
scheduler_lease_duration (timedelta, optional) – How long the scheduler lease lasts. The lease is required for the scheduler to be able to create new scraper jobs; it ensures that at any point in time there's only one active scheduler instance. Defaults to 1 minute.
plugins (list[Plugin] | None, optional) – List of plugins that will be used by the scraper runner. Can be omitted if with_worker is False. Defaults to None.
metrics_port (int, optional) – Port which is used to expose metrics. Defaults to 9090.
- serve(loop: Optional[asyncio.events.AbstractEventLoop] = None, blocking: bool = True) None
Start Sneakpeek server
- Parameters
loop (asyncio.AbstractEventLoop | None, optional) – AsyncIO loop to use. If it's None, the result of asyncio.get_event_loop() will be used. Defaults to None.
blocking (bool, optional) – Whether to block thread while server is running. Defaults to True.
- Return type
None
- stop(loop: Optional[asyncio.events.AbstractEventLoop] = None) None
Stop Sneakpeek server
- Parameters
loop (asyncio.AbstractEventLoop | None, optional) – AsyncIO loop to use. If it's None, the result of asyncio.get_event_loop() will be used. Defaults to None.
- Return type
None
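A minimal end-to-end sketch, assuming in-memory storages and a trivial handler (the DemoHandler class, its name, and the scraped URL are illustrative, not part of the library):

from sneakpeek.lib.models import Scraper, ScraperSchedule
from sneakpeek.lib.storage.in_memory_storage import (
    InMemoryLeaseStorage,
    InMemoryScraperJobsStorage,
    InMemoryScrapersStorage,
)
from sneakpeek.scraper_config import ScraperConfig
from sneakpeek.scraper_context import ScraperContext
from sneakpeek.scraper_handler import ScraperHandler
from sneakpeek.server import SneakpeekServer


class DemoHandler(ScraperHandler):
    # Illustrative handler: fetch a single page and report the HTTP status.
    @property
    def name(self) -> str:
        return "demo_handler"

    async def run(self, context: ScraperContext) -> str:
        response = await context.get("https://example.com")
        return f"status={response.status}"


server = SneakpeekServer.create(
    handlers=[DemoHandler()],
    scrapers_storage=InMemoryScrapersStorage(
        [
            Scraper(
                id=1,
                name="demo_scraper",
                schedule=ScraperSchedule.EVERY_MINUTE,
                handler="demo_handler",
                config=ScraperConfig(),
            )
        ]
    ),
    jobs_storage=InMemoryScraperJobsStorage(),
    lease_storage=InMemoryLeaseStorage(),
)
server.serve()  # blocks; the API/UI listens on port 8080, metrics on 9090 by default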
- class sneakpeek.scraper_config.ScraperConfig(*, params: dict[str, typing.Any] | None = None, plugins: dict[str, typing.Any] | None = None)
Bases:
pydantic.main.BaseModel
Scraper configuration
- Parameters
params (dict[str, typing.Any] | None) –
plugins (dict[str, typing.Any] | None) –
- Return type
None
- params
Scraper configuration that is passed to the handler. Defaults to None.
- Type
dict[str, Any] | None
- plugins
Plugins configuration that defines which plugins to use (besides global ones). Takes precedence over global plugin configuration. Defaults to None.
- Type
dict[str, Any] | None
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
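A short sketch of a scraper configuration; the params keys and the "rate_limiter" plugin key are illustrative assumptions (plugin keys are expected to match each plugin's name property):

from sneakpeek.scraper_config import ScraperConfig

config = ScraperConfig(
    # Free-form parameters that the handler reads at run time (keys are illustrative).
    params={"url": "https://example.com", "max_pages": 10},
    # Per-scraper plugin configuration; keys are assumed to match plugin names.
    plugins={"rate_limiter": {"max_requests": 30}},
)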
- class sneakpeek.scraper_context.AfterResponsePlugin
Bases:
abc.ABC
Abstract class for the plugin which is called after each request
- abstract async after_response(request: sneakpeek.scraper_context.Request, response: aiohttp.client_reqrep.ClientResponse, config: Optional[Any] = None) aiohttp.client_reqrep.ClientResponse
Function that is called on each (HTTP) response before its result is returned to the caller.
- Parameters
request (Request) – Request metadata
response (aiohttp.ClientResponse) – HTTP Response
config (Any | None, optional) – Plugin configuration. Defaults to None.
- Returns
HTTP Response
- Return type
aiohttp.ClientResponse
- abstract property name: str
Name of the plugin
- class sneakpeek.scraper_context.BeforeRequestPlugin
Bases:
abc.ABC
Abstract class for the plugin which is called before each request (like Middleware)
- abstract async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any] = None) sneakpeek.scraper_context.Request
Function that is called on each (HTTP) request before it is dispatched.
- abstract property name: str
Name of the plugin
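A sketch of a custom plugin that adds a static header to every request (the plugin name, the header, and the assumption that Request.headers can be reassigned are illustrative):

from typing import Any, Optional

from sneakpeek.scraper_context import BeforeRequestPlugin, Request


class StaticHeaderPlugin(BeforeRequestPlugin):
    # Adds a fixed header to every outgoing request.
    @property
    def name(self) -> str:
        return "static_header"

    async def before_request(self, request: Request, config: Optional[Any] = None) -> Request:
        request.headers = {**(request.headers or {}), "X-Demo": "1"}
        return request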
- class sneakpeek.scraper_context.HttpMethod(value)
Bases:
str, enum.Enum
HTTP method
- class sneakpeek.scraper_context.Request(method: sneakpeek.scraper_context.HttpMethod, url: str, headers: Optional[dict[str, str]] = None, kwargs: Optional[dict[str, typing.Any]] = None)
Bases:
object
HTTP Request metadata
- Parameters
method (sneakpeek.scraper_context.HttpMethod) –
url (str) –
headers (dict[str, str] | None) –
kwargs (dict[str, typing.Any] | None) –
- Return type
None
- class sneakpeek.scraper_context.ScraperContext(config: sneakpeek.scraper_config.ScraperConfig, plugins: Optional[list[sneakpeek.scraper_context.BeforeRequestPlugin | sneakpeek.scraper_context.AfterResponsePlugin]] = None, ping_session_func: Optional[Callable] = None)
Bases:
object
Scraper context - a helper class that implements a basic HTTP client whose logic can be extended by plugins that preprocess requests (e.g. a rate limiter) and postprocess responses (e.g. a response logger).
- Parameters
config (ScraperConfig) – Scraper configuration
plugins (list[BeforeRequestPlugin | AfterResponsePlugin] | None, optional) – List of available plugins. Defaults to None.
ping_session_func (Callable | None, optional) – Function that pings scraper job. Defaults to None.
- Return type
None
- async delete(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse
Make DELETE request to the given URL
- Parameters
url (str) – URL to send DELETE request to
headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.
**kwargs – See aiohttp.delete() for the full list of arguments
- Return type
aiohttp.client_reqrep.ClientResponse
- async get(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse
Make GET request to the given URL
- Parameters
url (str) – URL to send GET request to
headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.
**kwargs – See aiohttp.get() for the full list of arguments
- Return type
aiohttp.client_reqrep.ClientResponse
- async head(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse
Make HEAD request to the given URL
- Parameters
url (str) – URL to send HEAD request to
headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.
**kwargs – See aiohttp.head() for the full list of arguments
- Return type
aiohttp.client_reqrep.ClientResponse
- async options(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse
Make OPTIONS request to the given URL
- Parameters
url (str) – URL to send OPTIONS request to
headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.
**kwargs – See aiohttp.options() for the full list of arguments
- Return type
aiohttp.client_reqrep.ClientResponse
- async ping_session() None
Ping scraper job, so it’s not considered dead
- Return type
None
- async post(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse
Make POST request to the given URL
- Parameters
url (str) – URL to send POST request to
headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.
**kwargs – See aiohttp.post() for the full list of arguments
- Return type
aiohttp.client_reqrep.ClientResponse
- async put(url: str, *, headers: Optional[dict[str, str]] = None, **kwargs) aiohttp.client_reqrep.ClientResponse
Make PUT request to the given URL
- Parameters
url (str) – URL to send PUT request to
headers (HttpHeaders | None, optional) – HTTP headers. Defaults to None.
**kwargs – See aiohttp.put() for the full list of arguments
- Return type
aiohttp.client_reqrep.ClientResponse
- class sneakpeek.scraper_handler.ScraperHandler
Bases:
abc.ABC
Abstract class that scraper logic handler must implement
- abstract property name: str
Name of the handler
- abstract async run(context: sneakpeek.scraper_context.ScraperContext) str
Execute scraper logic
- Parameters
context (ScraperContext) – Scraper context
- Returns
Scraper result that will be persisted in the storage (should be a relatively small piece of information that summarizes the job outcome)
- Return type
str
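A sketch of a handler implementation (the handler name and the hard-coded URL are illustrative; a real handler would typically read them from the scraper's ScraperConfig.params):

import json

from sneakpeek.scraper_context import ScraperContext
from sneakpeek.scraper_handler import ScraperHandler


class PageSizeHandler(ScraperHandler):
    @property
    def name(self) -> str:
        return "page_size_handler"

    async def run(self, context: ScraperContext) -> str:
        response = await context.get("https://example.com")
        body = await response.text()
        # Heartbeat so a long-running job is not marked as dead.
        await context.ping_session()
        # Keep the persisted result small - it only summarizes the outcome.
        return json.dumps({"status": response.status, "length": len(body)})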
- class sneakpeek.lib.models.Lease(*, name: str, owner_id: str, acquired: datetime.datetime, acquired_until: datetime.datetime)
Bases:
pydantic.main.BaseModel
Lease metadata
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
name (str) –
owner_id (str) –
acquired (datetime.datetime) –
acquired_until (datetime.datetime) –
- Return type
None
- acquired: datetime.datetime
Time when the lease was acquired
- acquired_until: datetime.datetime
Time until which the lease is held
- name: str
Lease name (resource name to be locked)
- owner_id: str
ID of the acquirer (should be the same if you already have the lease and want to prolong it)
- class sneakpeek.lib.models.Scraper(*, id: int, name: str, schedule: sneakpeek.lib.models.ScraperSchedule, schedule_crontab: str | None = None, handler: str, config: sneakpeek.scraper_config.ScraperConfig, schedule_priority: sneakpeek.lib.models.ScraperJobPriority = ScraperJobPriority.NORMAL)
Bases:
pydantic.main.BaseModel
Scraper metadata
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
id (int) –
name (str) –
schedule (sneakpeek.lib.models.ScraperSchedule) –
schedule_crontab (str | None) –
handler (str) –
config (sneakpeek.scraper_config.ScraperConfig) –
schedule_priority (sneakpeek.lib.models.ScraperJobPriority) –
- Return type
None
- config: sneakpeek.scraper_config.ScraperConfig
Scraper configuration that is passed to the handler
- handler: str
Name of the scraper handler that implements scraping logic
- id: int
Scraper unique identifier
- name: str
Scraper name
- schedule: sneakpeek.lib.models.ScraperSchedule
Scraper schedule configuration
- schedule_crontab: str | None
Must be defined if schedule equals CRONTAB
- schedule_priority: sneakpeek.lib.models.ScraperJobPriority
Default priority to enqueue scraper jobs with
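An illustrative Scraper definition that runs every night at 03:00 via a crontab schedule (all field values are examples; handler must match a registered handler's name):

from sneakpeek.lib.models import Scraper, ScraperJobPriority, ScraperSchedule
from sneakpeek.scraper_config import ScraperConfig

scraper = Scraper(
    id=42,
    name="nightly_catalog_scraper",
    schedule=ScraperSchedule.CRONTAB,
    schedule_crontab="0 3 * * *",  # required because schedule is CRONTAB
    handler="page_size_handler",   # name of a registered ScraperHandler
    config=ScraperConfig(params={"url": "https://example.com"}),
    schedule_priority=ScraperJobPriority.HIGH,
)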
- class sneakpeek.lib.models.ScraperJob(*, id: int, scraper: sneakpeek.lib.models.Scraper, status: sneakpeek.lib.models.ScraperJobStatus, priority: sneakpeek.lib.models.ScraperJobPriority, created_at: datetime.datetime, started_at: datetime.datetime | None = None, last_active_at: datetime.datetime | None = None, finished_at: datetime.datetime | None = None, result: str | None = None)
Bases:
pydantic.main.BaseModel
Scraper job metadata
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
id (int) –
scraper (sneakpeek.lib.models.Scraper) –
status (sneakpeek.lib.models.ScraperJobStatus) –
priority (sneakpeek.lib.models.ScraperJobPriority) –
created_at (datetime.datetime) –
started_at (datetime.datetime | None) –
last_active_at (datetime.datetime | None) –
finished_at (datetime.datetime | None) –
result (str | None) –
- Return type
None
- created_at: datetime.datetime
When the job was created and enqueued
- finished_at: datetime.datetime | None
When the job finished
- id: int
Job unique identifier
- last_active_at: datetime.datetime | None
When the job last sent heartbeat
- priority: sneakpeek.lib.models.ScraperJobPriority
Scraper job priority
- result: str | None
Information with the job result (should be rather small and should summarize the outcome of the scraping)
- scraper: sneakpeek.lib.models.Scraper
Scraper metadata
- started_at: datetime.datetime | None
When the job was dequeued and started being processed by the worker
- status: sneakpeek.lib.models.ScraperJobStatus
Scraper job status
- class sneakpeek.lib.models.ScraperJobPriority(value)
Bases:
enum.Enum
Priority of the scraper job
- HIGH = 1
- NORMAL = 2
- UTMOST = 0
- class sneakpeek.lib.models.ScraperJobStatus(value)
Bases:
str, enum.Enum
Scraper job status
- DEAD = 'dead'
Scraper job was inactive for too long, so the scheduler marked it as dead; the scraper can be scheduled again
- FAILED = 'failed'
Scraper job failed
- KILLED = 'killed'
Scraper job was killed by the user
- PENDING = 'pending'
Scraper job is in the queue
- STARTED = 'started'
Scraper job was dequeued by the worker and is being processed
- SUCCEEDED = 'succeeded'
Scraper job succeeded
- class sneakpeek.lib.models.ScraperSchedule(value)
Bases:
str, enum.Enum
Scraper schedule options. Note that two concurrent jobs for the same scraper are not allowed, so if there's an active scraper job, a new one won't be scheduled
- CRONTAB = 'crontab'
Specify crontab when scraper should be scheduled
- EVERY_DAY = 'every_day'
Scraper will be scheduled every day
- EVERY_HOUR = 'every_hour'
Scraper will be scheduled every hour
- EVERY_MINUTE = 'every_minute'
Scraper will be scheduled every minute
- EVERY_MONTH = 'every_month'
Scraper will be scheduled every month
- EVERY_SECOND = 'every_second'
Scraper will be scheduled every second
- EVERY_WEEK = 'every_week'
Scraper will be scheduled every week
- INACTIVE = 'inactive'
Scraper won’t be automatically scheduled
- class sneakpeek.scheduler.Scheduler(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, lease_storage: sneakpeek.lib.storage.base.LeaseStorage, queue: sneakpeek.lib.queue.QueueABC, storage_poll_frequency: datetime.timedelta = datetime.timedelta(seconds=5), lease_duration: datetime.timedelta = datetime.timedelta(seconds=60), jobs_to_keep: int = 100)
Bases:
sneakpeek.scheduler.SchedulerABC
Sneakpeek scheduler - schedules scrapers and performs maintenance jobs. Uses APScheduler under the hood.
Initialize scheduler
- Parameters
scrapers_storage (ScrapersStorage) – Scrapers storage
jobs_storage (ScraperJobsStorage) – Jobs storage
lease_storage (LeaseStorage) – Lease storage
queue (Queue) – Sneakpeek queue implementation
storage_poll_frequency (timedelta, optional) – How long the scheduler waits between polling the storage for scraper updates. Defaults to 5 seconds.
lease_duration (timedelta, optional) – How long the scheduler lease lasts. The lease is required for the scheduler to be able to create new scraper jobs; it ensures that at any point in time there's only one active scheduler instance. Defaults to 1 minute.
jobs_to_keep (int, optional) – Maximum number of historical scraper jobs to keep in the storage. Storage is cleaned up every 10 minutes. Defaults to 100.
- Return type
None
- class sneakpeek.scheduler.SchedulerABC
Bases:
abc.ABC
- class sneakpeek.worker.Worker(runner: sneakpeek.runner.RunnerABC, queue: sneakpeek.lib.queue.QueueABC, loop: Optional[asyncio.events.AbstractEventLoop] = None, max_concurrency: int = 50)
Bases:
sneakpeek.worker.WorkerABC
Sneakpeek worker - consumes the scraper jobs queue and executes scraper logic
- Parameters
runner (RunnerABC) – Scraper runner
queue (Queue) – Sneakpeek queue implementation
loop (asyncio.AbstractEventLoop | None, optional) – AsyncIO loop to use. If it's None, the result of asyncio.get_event_loop() will be used. Defaults to None.
max_concurrency (int, optional) – Maximum number of concurrent scraper jobs. Defaults to 50.
- Return type
None
- class sneakpeek.worker.WorkerABC
Bases:
abc.ABC
- class sneakpeek.runner.Runner(handlers: List[sneakpeek.lib.models.Scraper], queue: sneakpeek.lib.queue.QueueABC, storage: sneakpeek.lib.storage.base.ScraperJobsStorage, plugins: Optional[list[sneakpeek.scraper_context.BeforeRequestPlugin | sneakpeek.scraper_context.AfterResponsePlugin]] = None)
Bases:
sneakpeek.runner.RunnerABC
Default scraper runner implementation
Initialize runner
- Parameters
handlers (list[ScraperHandler]) – List of handlers that implement scraper logic
queue (Queue) – Sneakpeek queue implementation
storage (ScraperJobsStorage) – Scraper jobs storage implementation
plugins (list[Plugin] | None, optional) – List of plugins that will be used by scraper runner. Defaults to None.
- Return type
None
- async run(job: sneakpeek.lib.models.ScraperJob) None
Execute scraper. The following steps are performed:
Ping the scraper job
Build the scraper context
Execute the scraper logic
[On success] Set scraper job status to SUCCEEDED
[On failure] Set scraper job status to FAILED
[If the scraper job was killed] Do nothing
Persist the scraper job status
- Parameters
job (ScraperJob) – Scraper job metadata
- Return type
None
- class sneakpeek.runner.RunnerABC
Bases:
abc.ABC
Scraper runner - manages scraper job lifecycle and runs the scraper logic
- abstract async run(job: sneakpeek.lib.models.ScraperJob) None
Execute scraper job
- Parameters
job (ScraperJob) – Scraper job metadata
- Return type
None
- sneakpeek.api.create_api(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, queue: sneakpeek.lib.queue.QueueABC, handlers: list[sneakpeek.scraper_handler.ScraperHandler]) fastapi_jsonrpc.API
Create JsonRPC API (FastAPI is used under the hood)
- Parameters
scrapers_storage (ScrapersStorage) – Scrapers storage
jobs_storage (ScraperJobsStorage) – Scraper jobs storage
queue (Queue) – Sneakpeek queue implementation
handlers (list[ScraperHandler]) – List of handlers that implement scraper logic
- Return type
fastapi_jsonrpc.API
- sneakpeek.api.get_api_entrypoint(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, queue: sneakpeek.lib.queue.Queue, handlers: list[sneakpeek.scraper_handler.ScraperHandler]) fastapi_jsonrpc.Entrypoint
Create public JsonRPC API entrypoint (mostly mimics storage and queue API)
- Parameters
scrapers_storage (ScrapersStorage) – Scrapers storage
jobs_storage (ScraperJobsStorage) – Scraper jobs storage
queue (Queue) – Sneakpeek queue implementation
handlers (list[ScraperHandler]) – List of handlers that implement scraper logic
- Returns
FastAPI JsonRPC entrypoint
- Return type
jsonrpc.Entrypoint
- sneakpeek.metrics.count_invocations(subsystem: str)
Decorator for measuring number of function invocations (works for both sync and async functions).
@count_invocations(subsystem="my subsystem")
def my_awesome_func(): ...
This will export the following Prometheus counter metrics:
# Total number of invocations
sneakpeek_invocations{subsystem="my subsystem", method="my_awesome_func", type="total", error=""}
# Total number of successful invocations (ones that haven't thrown an exception)
sneakpeek_invocations{subsystem="my subsystem", method="my_awesome_func", type="success", error=""}
# Total number of failed invocations (ones that have thrown an exception)
sneakpeek_invocations{subsystem="my subsystem", method="my_awesome_func", type="error", error="<Exception class name>"}
- Parameters
subsystem (str) – Subsystem name to be used in the metric annotation
- sneakpeek.metrics.measure_latency(subsystem: str)
Decorator for measuring latency of the function (works for both sync and async functions).
@measure_latency(subsystem="my subsystem")
def my_awesome_func(): ...
This will export the following Prometheus histogram metric:
sneakpeek_latency{subsystem="my subsystem", method="my_awesome_func"}
- Parameters
subsystem (str) – Subsystem name to be used in the metric annotation
- class sneakpeek.logging.ScraperContextInjectingFilter(name='')
Bases:
logging.Filter
Scraper context filter which automatically injects scraper and scraper job IDs to the logging metadata.
Example of usage:
logger = logging.getLogger()
handler = logging.StreamHandler()
handler.addFilter(ScraperContextInjectingFilter())
logger.addHandler(handler)
Initialize a filter.
Initialize with the name of the logger which, together with its children, will have its events allowed through the filter. If no name is specified, allow every event.
- filter(record: logging.LogRecord) bool
Injects scraper metadata into the log record:
scraper_job_id - Scraper job ID
scraper_id - Scraper ID
scraper_name - Scraper name
scraper_handler - Scraper logic implementation
scraper_job_human_name - Formatted scraper job ID (<name>::<scraper_id>::<scraper_job_id>)
- Parameters
record (logging.LogRecord) – Log record to inject metadata into
- Returns
Always True
- Return type
bool
- sneakpeek.logging.configure_logging(level: int = 20)
Helper function to configure logging:
Adds console logger to the root logger
Adds scraper context injector filter to the console logger
Configures console formatting to use scraper metadata
- Parameters
level (int, optional) – Minimum logging level. Defaults to logging.INFO.
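For example, a minimal one-line setup at DEBUG level:

import logging

from sneakpeek.logging import configure_logging

configure_logging(logging.DEBUG)  # console handler + scraper context filter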
- sneakpeek.logging.scraper_job_context(scraper_job: sneakpeek.lib.models.ScraperJob) None
Initialize scraper job logging context which automatically adds scraper and scraper job IDs to the logging metadata
- Parameters
scraper_job (ScraperJob) – Scraper job definition
- Return type
None
- exception sneakpeek.lib.errors.ScraperHasActiveRunError(data=None)
Bases:
fastapi_jsonrpc.BaseError
- exception sneakpeek.lib.errors.ScraperJobNotFoundError(data=None)
Bases:
fastapi_jsonrpc.BaseError
- exception sneakpeek.lib.errors.ScraperJobPingFinishedError(data=None)
Bases:
fastapi_jsonrpc.BaseError
- exception sneakpeek.lib.errors.ScraperJobPingNotStartedError(data=None)
Bases:
fastapi_jsonrpc.BaseError
- exception sneakpeek.lib.errors.ScraperNotFoundError(data=None)
Bases:
fastapi_jsonrpc.BaseError
- exception sneakpeek.lib.errors.UnknownScraperHandlerError(data=None)
Bases:
fastapi_jsonrpc.BaseError
- class sneakpeek.lib.queue.Queue(scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage, scraper_jobs_storage: sneakpeek.lib.storage.base.ScraperJobsStorage, dead_timeout: datetime.timedelta = datetime.timedelta(seconds=300))
Bases:
object
Default priority queue implementation
- Parameters
scrapers_storage (ScrapersStorage) – Scrapers storage
scraper_jobs_storage (ScraperJobsStorage) – Scraper jobs storage
dead_timeout (timedelta, optional) – If the scraper job hasn't pinged for the given time period, the job will be marked as dead. Defaults to 5 minutes.
- Return type
None
- class sneakpeek.lib.queue.QueueABC
Bases:
abc.ABC
Sneakpeek scraper job priority queue
- abstract async dequeue() sneakpeek.lib.models.ScraperJob | None
Try to dequeue a job from the queue.
- Returns
Scraper job metadata if the queue wasn't empty, None otherwise
- Return type
ScraperJob | None
- abstract async enqueue(scraper_id: int, priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob
Enqueue scraper job.
- Parameters
scraper_id (int) – ID of the scraper to enqueue
priority (ScraperJobPriority) – Priority of the job to enqueue
- Returns
Scraper job metadata
- Return type
ScraperJob
- Raises
ScraperNotFoundError – If scraper doesn’t exist
ScraperHasActiveRunError – If there are scraper jobs in PENDING or STARTED state
- abstract async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int
- Parameters
priority (ScraperJobPriority) – Queue priority
- Returns
Number of pending items in the queue
- Return type
int
- abstract async kill_dead_scraper_jobs(scraper_id: int) list[sneakpeek.lib.models.ScraperJob]
Kill dead scraper jobs for the given scraper
- Parameters
scraper_id (int) – Scraper ID to kill jobs for
- Returns
List of dead scraper jobs
- Return type
list[ScraperJob]
- abstract async ping_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob
Send a heartbeat for the scraper job
- Parameters
scraper_id (int) – Scraper ID
scraper_job_id (int) – Scraper job ID
- Returns
Updated scraper job metadata
- Return type
ScraperJob
- Raises
ScraperNotFoundError – If scraper doesn’t exist
ScraperJobNotFoundError – If scraper job doesn't exist
ScraperJobPingNotStartedError – If scraper job is still in the PENDING state
ScraperJobPingFinishedError – If scraper job is not in the STARTED state but in a finished state (e.g. DEAD)
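A sketch of working with the default Queue implementation and in-memory storages (it is assumed here that Queue implements the QueueABC interface described above; the scraper definition is illustrative):

import asyncio

from sneakpeek.lib.models import Scraper, ScraperJobPriority, ScraperSchedule
from sneakpeek.lib.queue import Queue
from sneakpeek.lib.storage.in_memory_storage import (
    InMemoryScraperJobsStorage,
    InMemoryScrapersStorage,
)
from sneakpeek.scraper_config import ScraperConfig


async def main() -> None:
    scraper = Scraper(
        id=1,
        name="demo_scraper",
        schedule=ScraperSchedule.INACTIVE,
        handler="demo_handler",
        config=ScraperConfig(),
    )
    queue = Queue(InMemoryScrapersStorage([scraper]), InMemoryScraperJobsStorage())

    job = await queue.enqueue(scraper.id, ScraperJobPriority.NORMAL)
    print(job.status)                                            # expected: PENDING
    print(await queue.get_queue_len(ScraperJobPriority.NORMAL))  # expected: 1
    print(await queue.dequeue())                                 # the same job, now dequeued


asyncio.run(main())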
- class sneakpeek.lib.storage.base.LeaseStorage
Bases:
abc.ABC
Sneakpeek lease storage abstract class
- abstract async maybe_acquire_lease(lease_name: str, owner_id: str, acquire_for: datetime.timedelta) sneakpeek.lib.models.Lease | None
Try to acquire lease (global lock).
- Parameters
lease_name (str) – Lease name (resource name to be locked)
owner_id (str) – ID of the acquirer (should be the same if you already have the lease and want to prolong it)
acquire_for (timedelta) – For how long lease will be acquired
- Returns
Lease metadata if it was acquired, None otherwise
- Return type
Lease | None
- abstract async release_lease(lease_name: str, owner_id: str) None
Release lease (global lock)
- Parameters
lease_name (str) – Lease name (resource name to be unlocked)
owner_id (str) – ID of the acquirer
- Return type
None
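A sketch of the lease-acquisition pattern using the in-memory implementation documented below (the lease and owner names are illustrative):

import asyncio
from datetime import timedelta

from sneakpeek.lib.storage.in_memory_storage import InMemoryLeaseStorage


async def main() -> None:
    storage = InMemoryLeaseStorage()
    lease = await storage.maybe_acquire_lease("scheduler", "instance-1", timedelta(minutes=1))
    print(lease)  # Lease metadata - "instance-1" now holds the lock
    # While the lease is held, another owner is expected to get None back.
    print(await storage.maybe_acquire_lease("scheduler", "instance-2", timedelta(minutes=1)))
    await storage.release_lease("scheduler", "instance-1")


asyncio.run(main())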
- class sneakpeek.lib.storage.base.ScraperJobsStorage
Bases:
abc.ABC
Sneakpeek scraper jobs storage abstract class
- abstract async add_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
- Parameters
scraper_job (ScraperJob) – scraper job to add
- Returns
Created scraper job
- Return type
ScraperJob
- abstract async delete_old_scraper_jobs(keep_last: int = 50) None
Delete old historical scraper jobs
- Parameters
keep_last (int, optional) – How many historical scraper jobs to keep. Defaults to 50.
- Return type
None
- abstract async dequeue_scraper_job(priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob | None
Try to dequeue pending scraper job of given priority
- Parameters
priority (ScraperJobPriority) – Queue priority
- Returns
First pending scraper job or None if the queue is empty
- Return type
ScraperJob | None
- abstract async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int
Get number of pending scraper jobs in the queue
- Parameters
priority (ScraperJobPriority) – Queue priority
- Returns
Number of pending scraper jobs in the queue
- Return type
int
- abstract async get_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob
Get scraper job by ID. Throws ScraperNotFoundError if scraper doesn't exist. Throws ScraperJobNotFoundError if scraper job doesn't exist.
- Parameters
scraper_id (int) – Scraper ID
scraper_job_id (int) – scraper job ID
- Returns
Found scraper job
- Return type
ScraperJob
- abstract async get_scraper_jobs(scraper_id: int) List[sneakpeek.lib.models.ScraperJob]
- Parameters
scraper_id (int) – Scraper ID
- Returns
List of scraper jobs
- Return type
List[ScraperJob]
- abstract async update_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
- Parameters
scraper_job (ScraperJob) – scraper job to update
- Returns
Updated scraper job
- Return type
ScraperJob
- class sneakpeek.lib.storage.base.ScrapersStorage
Bases:
abc.ABC
Sneakpeek scrapers storage abstract class
- abstract async create_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
- abstract async delete_scraper(id: int) sneakpeek.lib.models.Scraper
- Parameters
id (int) – Scraper ID
- Returns
Deleted scraper
- Return type
Scraper
- abstract async get_scraper(id: int) sneakpeek.lib.models.Scraper
Get scraper by ID. Throws ScraperNotFoundError if scraper doesn't exist.
- Parameters
id (int) – Scraper ID
- Returns
Scraper metadata
- Return type
Scraper
- abstract async get_scrapers() List[sneakpeek.lib.models.Scraper]
- Returns
List of all available scrapers
- Return type
List[Scraper]
- abstract async is_read_only() bool
- Returns
Whether the storage allows modifying the scrapers list and metadata
- Return type
bool
- abstract async maybe_get_scraper(id: int) sneakpeek.lib.models.Scraper | None
Get scraper by ID. Return None if scraper doesn’t exist
- Parameters
id (int) – Scraper ID
- Returns
Scraper metadata
- Return type
Scraper | None
- abstract async search_scrapers(name_filter: Optional[str] = None, max_items: Optional[int] = None, offset: Optional[int] = None) List[sneakpeek.lib.models.Scraper]
Search scrapers using given filters
- Parameters
name_filter (str | None, optional) – Search scrapers that have given substring in the name. Defaults to None.
max_items (int | None, optional) – Maximum number of items to return. Defaults to None.
offset (int | None, optional) – Offset for search results. Defaults to None.
- Returns
Found scrapers
- Return type
List[Scraper]
- abstract async update_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
- class sneakpeek.lib.storage.in_memory_storage.InMemoryLeaseStorage
Bases:
sneakpeek.lib.storage.base.LeaseStorage
In memory storage for leases. Should only be used for development purposes
- Return type
None
- async maybe_acquire_lease(lease_name: str, owner_id: str, acquire_for: datetime.timedelta) sneakpeek.lib.models.Lease | None
Try to acquire lease (global lock).
- Parameters
lease_name (str) – Lease name (resource name to be locked)
owner_id (str) – ID of the acquirer (should be the same if you already have the lease and want to prolong it)
acquire_for (timedelta) – For how long lease will be acquired
- Returns
Lease metadata if it was acquired, None otherwise
- Return type
Lease | None
- async release_lease(lease_name: str, owner_id: str) None
Release lease (global lock)
- Parameters
lease_name (str) – Lease name (resource name to be unlocked)
owner_id (str) – ID of the acquirer
- Return type
None
- class sneakpeek.lib.storage.in_memory_storage.InMemoryScraperJobsStorage
Bases:
sneakpeek.lib.storage.base.ScraperJobsStorage
In memory storage for scraper jobs. Should only be used for development purposes
- Return type
None
- async add_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
- Parameters
scraper_job (ScraperJob) – scraper job to add
- Returns
Created scraper job
- Return type
ScraperJob
- async delete_old_scraper_jobs(keep_last: int = 50) None
Delete old historical scraper jobs
- Parameters
keep_last (int, optional) – How many historical scraper jobs to keep. Defaults to 50.
- Return type
None
- async dequeue_scraper_job(priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob | None
Try to dequeue pending scraper job of given priority
- Parameters
priority (ScraperJobPriority) – Queue priority
- Returns
First pending scraper job or None if the queue is empty
- Return type
ScraperJob | None
- async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int
Get number of pending scraper jobs in the queue
- Parameters
priority (ScraperJobPriority) – Queue priority
- Returns
Number of pending scraper jobs in the queue
- Return type
int
- async get_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob
Get scraper job by ID. Throws ScraperNotFoundError if scraper doesn't exist. Throws ScraperJobNotFoundError if scraper job doesn't exist.
- Parameters
scraper_id (int) – Scraper ID
scraper_job_id (int) – scraper job ID
- Returns
Found scraper job
- Return type
ScraperJob
- async get_scraper_jobs(id: int) list[sneakpeek.lib.models.ScraperJob]
- Parameters
id (int) – Scraper ID
- Returns
List of scraper jobs
- Return type
List[ScraperJob]
- async update_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
- Parameters
scraper_job (ScraperJob) – scraper job to update
- Returns
Updated scraper job
- Return type
ScraperJob
- class sneakpeek.lib.storage.in_memory_storage.InMemoryScrapersStorage(scrapers: Optional[list[sneakpeek.lib.models.Scraper]] = None, is_read_only: bool = True)
Bases:
sneakpeek.lib.storage.base.ScrapersStorage
In-memory storage implementation
- Parameters
scrapers (list[Scraper] | None, optional) – List of pre-defined scrapers. Defaults to None.
is_read_only (bool, optional) – Whether the storage is read-only, i.e. whether modifications of the scrapers list are disallowed. Defaults to True.
- Return type
None
- async create_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
- async delete_scraper(id: int) sneakpeek.lib.models.Scraper
- Parameters
id (int) – Scraper ID
- Returns
Deleted scraper
- Return type
Scraper
- async get_scraper(id: int) sneakpeek.lib.models.Scraper
Get scraper by ID. Throws ScraperNotFoundError if scraper doesn't exist.
- Parameters
id (int) – Scraper ID
- Returns
Scraper metadata
- Return type
Scraper
- async get_scrapers() list[sneakpeek.lib.models.Scraper]
- Returns
List of all available scrapers
- Return type
List[Scraper]
- async is_read_only() bool
- Returns
Whether the storage allows modifying the scrapers list and metadata
- Return type
bool
- async maybe_get_scraper(id: int) sneakpeek.lib.models.Scraper | None
Get scraper by ID. Return None if scraper doesn’t exist
- Parameters
id (int) – Scraper ID
- Returns
Scraper metadata
- Return type
Scraper | None
- async search_scrapers(name_filter: str | None = None, max_items: int | None = None, offset: int | None = None) list[sneakpeek.lib.models.Scraper]
Search scrapers using given filters
- Parameters
name_filter (str | None, optional) – Search scrapers that have given substring in the name. Defaults to None.
max_items (int | None, optional) – Maximum number of items to return. Defaults to None.
offset (int | None, optional) – Offset for search results. Defaults to None.
- Returns
Found scrapers
- Return type
List[Scraper]
- async update_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
- class sneakpeek.lib.storage.redis_storage.RedisLeaseStorage(redis: redis.asyncio.client.Redis)
Bases:
sneakpeek.lib.storage.base.LeaseStorage
Redis storage for leases
- Parameters
redis (Redis) – Async redis client
- Return type
None
- async maybe_acquire_lease(lease_name: str, owner_id: str, acquire_for: datetime.timedelta) sneakpeek.lib.models.Lease | None
Try to acquire lease (global lock).
- Parameters
lease_name (str) – Lease name (resource name to be locked)
owner_id (str) – ID of the acquirer (should be the same if you already have the lease and want to prolong it)
acquire_for (timedelta) – For how long lease will be acquired
- Returns
Lease metadata if it was acquired, None otherwise
- Return type
Lease | None
- async release_lease(lease_name: str, owner_id: str) None
Release lease (global lock)
- Parameters
lease_name (str) – Lease name (resource name to be unlocked)
owner_id (str) – ID of the acquirer
- Return type
None
- class sneakpeek.lib.storage.redis_storage.RedisScraperJobsStorage(redis: redis.asyncio.client.Redis, scrapers_storage: sneakpeek.lib.storage.base.ScrapersStorage)
Bases:
sneakpeek.lib.storage.base.ScraperJobsStorage
Redis storage for scraper jobs
- Parameters
redis (Redis) – Async redis client
scrapers_storage (ScrapersStorage) – Scrapers storage
- Return type
None
- async add_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
- Parameters
scraper_job (ScraperJob) – scraper job to add
- Returns
Created scraper job
- Return type
ScraperJob
- async delete_old_scraper_jobs(keep_last: int = 50) None
Delete old historical scraper jobs
- Parameters
keep_last (int, optional) – How many historical scraper jobs to keep. Defaults to 50.
- Return type
None
- async dequeue_scraper_job(priority: sneakpeek.lib.models.ScraperJobPriority) sneakpeek.lib.models.ScraperJob | None
Try to dequeue pending scraper job of given priority
- Parameters
priority (ScraperJobPriority) – Queue priority
- Returns
First pending scraper job or None if the queue is empty
- Return type
ScraperJob | None
- async get_queue_len(priority: sneakpeek.lib.models.ScraperJobPriority) int
Get number of pending scraper jobs in the queue
- Parameters
priority (ScraperJobPriority) – Queue priority
- Returns
Number of pending scraper jobs in the queue
- Return type
int
- async get_scraper_job(scraper_id: int, scraper_job_id: int) sneakpeek.lib.models.ScraperJob
Get scraper job by ID. Throws ScraperNotFoundError if scraper doesn't exist. Throws ScraperJobNotFoundError if scraper job doesn't exist.
- Parameters
scraper_id (int) – Scraper ID
scraper_job_id (int) – scraper job ID
- Returns
Found scraper job
- Return type
ScraperJob
- async get_scraper_jobs(scraper_id: int) list[sneakpeek.lib.models.ScraperJob]
- Parameters
scraper_id (int) – Scraper ID
- Returns
List of scraper jobs
- Return type
List[ScraperJob]
- async update_scraper_job(scraper_job: sneakpeek.lib.models.ScraperJob) sneakpeek.lib.models.ScraperJob
- Parameters
scraper_job (ScraperJob) – scraper job to update
- Returns
Updated scraper job
- Return type
ScraperJob
- class sneakpeek.lib.storage.redis_storage.RedisScrapersStorage(redis: redis.asyncio.client.Redis, is_read_only: bool = False)
Bases:
sneakpeek.lib.storage.base.ScrapersStorage
Redis scrapers storage implementation
- Parameters
redis (Redis) – Async redis client
is_read_only (bool, optional) – Whether the storage is read-only, i.e. whether modifications of the scrapers list are disallowed. Defaults to False.
- Return type
None
- async create_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
- async delete_scraper(id: int) sneakpeek.lib.models.Scraper
- Parameters
id (int) – Scraper ID
- Returns
Deleted scraper
- Return type
Scraper
- async get_scraper(id: int) sneakpeek.lib.models.Scraper
Get scraper by ID. Throws ScraperNotFoundError if scraper doesn't exist.
- Parameters
id (int) – Scraper ID
- Returns
Scraper metadata
- Return type
Scraper
- async get_scrapers() list[sneakpeek.lib.models.Scraper]
- Returns
List of all available scrapers
- Return type
List[Scraper]
- async is_read_only() bool
- Returns
Whether the storage allows modifying the scrapers list and metadata
- Return type
bool
- async maybe_get_scraper(id: int) sneakpeek.lib.models.Scraper | None
Get scraper by ID. Return None if scraper doesn’t exist
- Parameters
id (int) – Scraper ID
- Returns
Scraper metadata
- Return type
Scraper | None
- async search_scrapers(name_filter: str | None = None, max_items: int | None = None, offset: int | None = None) list[sneakpeek.lib.models.Scraper]
Search scrapers using given filters
- Parameters
name_filter (str | None, optional) – Search scrapers that have given substring in the name. Defaults to None.
max_items (int | None, optional) – Maximum number of items to return. Defaults to None.
offset (int | None, optional) – Offset for search results. Defaults to None.
- Returns
Found scrapers
- Return type
List[Scraper]
- async update_scraper(scraper: sneakpeek.lib.models.Scraper) sneakpeek.lib.models.Scraper
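A sketch of wiring the Redis storages into a server (connection parameters are illustrative; redis.asyncio.Redis is the async client expected by the constructors):

from redis.asyncio import Redis

from sneakpeek.lib.storage.redis_storage import (
    RedisLeaseStorage,
    RedisScraperJobsStorage,
    RedisScrapersStorage,
)
from sneakpeek.server import SneakpeekServer

redis_client = Redis(host="localhost", port=6379)
scrapers_storage = RedisScrapersStorage(redis_client)

server = SneakpeekServer.create(
    handlers=[],  # register your ScraperHandler implementations here
    scrapers_storage=scrapers_storage,
    jobs_storage=RedisScraperJobsStorage(redis_client, scrapers_storage),
    lease_storage=RedisLeaseStorage(redis_client),
)
server.serve()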
- class sneakpeek.plugins.proxy_plugin.ProxyPlugin(default_config: Optional[sneakpeek.plugins.proxy_plugin.ProxyPluginConfig] = None)
Bases:
sneakpeek.scraper_context.BeforeRequestPlugin
Proxy plugin automatically sets proxy arguments for all HTTP requests.
- Parameters
default_config (sneakpeek.plugins.proxy_plugin.ProxyPluginConfig | None) –
- Return type
None
- async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request
Function that is called on each (HTTP) request before it is dispatched.
- property name: str
Name of the plugin
- class sneakpeek.plugins.proxy_plugin.ProxyPluginConfig(*, proxy: str | yarl.URL | None = None, proxy_auth: aiohttp.helpers.BasicAuth | None = None)
Bases:
pydantic.main.BaseModel
Proxy plugin config
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
proxy (str | yarl.URL | None) –
proxy_auth (aiohttp.helpers.BasicAuth | None) –
- Return type
None
- proxy: str | yarl.URL | None
Proxy URL
- proxy_auth: aiohttp.helpers.BasicAuth | None
Proxy authentication info to use
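A sketch of a globally configured proxy plugin (the proxy URL and credentials are placeholders):

from aiohttp import BasicAuth

from sneakpeek.plugins.proxy_plugin import ProxyPlugin, ProxyPluginConfig

proxy_plugin = ProxyPlugin(
    ProxyPluginConfig(
        proxy="http://proxy.example.com:3128",
        proxy_auth=BasicAuth("user", "password"),
    )
)
# Pass it in the plugins list of SneakpeekServer.create(..., plugins=[proxy_plugin]).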
- exception sneakpeek.plugins.rate_limiter_plugin.RateLimitedException
Bases:
Exception
Request is rate limited because too many requests were made to the host
- class sneakpeek.plugins.rate_limiter_plugin.RateLimitedStrategy(value)
Bases:
enum.Enum
What to do if the request is rate limited
- THROW = 1
Throw an exception
- WAIT = 2
Wait until request is no longer rate limited
- class sneakpeek.plugins.rate_limiter_plugin.RateLimiterPlugin(default_config: Optional[sneakpeek.plugins.rate_limiter_plugin.RateLimiterPluginConfig] = None)
Bases:
sneakpeek.scraper_context.BeforeRequestPlugin
Rate limiter implements the leaky bucket algorithm to limit the number of requests made to each host. If a request is rate limited, the plugin can either raise an exception or wait until the request is no longer limited.
- Parameters
default_config (sneakpeek.plugins.rate_limiter_plugin.RateLimiterPluginConfig | None) –
- Return type
None
- async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request
Function that is called on each (HTTP) request before it is dispatched.
- property name: str
Name of the plugin
- class sneakpeek.plugins.rate_limiter_plugin.RateLimiterPluginConfig(*, max_requests: int = 60, rate_limited_strategy: sneakpeek.plugins.rate_limiter_plugin.RateLimitedStrategy = RateLimitedStrategy.WAIT, time_window: datetime.timedelta = datetime.timedelta(seconds=60))
Bases:
pydantic.main.BaseModel
Rate limiter plugin configuration
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
max_requests (int) –
rate_limited_strategy (sneakpeek.plugins.rate_limiter_plugin.RateLimitedStrategy) –
time_window (datetime.timedelta) –
- Return type
None
- max_requests: int
Maximum number of allowed requests per host within time window
- rate_limited_strategy: sneakpeek.plugins.rate_limiter_plugin.RateLimitedStrategy
What to do if the request is rate limited
- time_window: datetime.timedelta
Time window to aggregate requests
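A sketch of a rate limiter that allows at most 30 requests per host per minute and raises instead of waiting (the values are illustrative):

from datetime import timedelta

from sneakpeek.plugins.rate_limiter_plugin import (
    RateLimitedStrategy,
    RateLimiterPlugin,
    RateLimiterPluginConfig,
)

rate_limiter = RateLimiterPlugin(
    RateLimiterPluginConfig(
        max_requests=30,
        time_window=timedelta(minutes=1),
        rate_limited_strategy=RateLimitedStrategy.THROW,
    )
)
# Add it to the plugins list passed to SneakpeekServer.create().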
- class sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPlugin(default_config: Optional[sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPluginConfig] = None)
Bases:
sneakpeek.scraper_context.BeforeRequestPlugin, sneakpeek.scraper_context.AfterResponsePlugin
Requests logging plugin logs all requests being made and the responses received.
- Parameters
default_config (sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPluginConfig | None) –
- Return type
None
- async after_response(request: sneakpeek.scraper_context.Request, response: aiohttp.client_reqrep.ClientResponse, config: Optional[Any]) aiohttp.client_reqrep.ClientResponse
Function that is called on each (HTTP) response before its result is returned to the caller.
- Parameters
request (Request) – Request metadata
response (aiohttp.ClientResponse) – HTTP Response
config (Any | None, optional) – Plugin configuration. Defaults to None.
- Returns
HTTP Response
- Return type
aiohttp.ClientResponse
- async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request
Function that is called on each (HTTP) request before it is dispatched.
- property name: str
Name of the plugin
- class sneakpeek.plugins.requests_logging_plugin.RequestsLoggingPluginConfig(*, log_request: bool = True, log_response: bool = True)
Bases:
pydantic.main.BaseModel
Requests logging plugin config
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
log_request (bool) –
log_response (bool) –
- Return type
None
- log_request: bool
Whether to log request being made
- log_response: bool
Whether to log the response received
- class sneakpeek.plugins.robots_txt_plugin.RobotsTxtPlugin(default_config: Optional[sneakpeek.plugins.robots_txt_plugin.RobotsTxtPluginConfig] = None)
Bases:
sneakpeek.scraper_context.BeforeRequestPlugin
Robots.txt plugin can log and optionally block requests that are disallowed by the website's robots.txt.
- Parameters
default_config (sneakpeek.plugins.robots_txt_plugin.RobotsTxtPluginConfig | None) –
- Return type
None
- async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request
Function that is called on each (HTTP) request before it is dispatched.
- property name: str
Name of the plugin
- class sneakpeek.plugins.robots_txt_plugin.RobotsTxtPluginConfig(*, violation_strategy: sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationStrategy = RobotsTxtViolationStrategy.LOG)
Bases:
pydantic.main.BaseModel
robots.txt plugin configuration
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
violation_strategy (sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationStrategy) –
- Return type
None
- exception sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationException
Bases:
Exception
Exception which is raised if a request is disallowed by the website's robots.txt
- class sneakpeek.plugins.robots_txt_plugin.RobotsTxtViolationStrategy(value)
Bases:
enum.Enum
What to do if the request is disallowed by website robots.txt
- LOG = 1
Only log violation
- THROW = 2
Raise an exception on violation
- class sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPlugin(default_config: Optional[sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPluginConfig] = None)
Bases:
sneakpeek.scraper_context.BeforeRequestPlugin
This plugin automatically adds the User-Agent header if it's not present. It uses fake-useragent to generate realistic fake user agents.
- Parameters
default_config (sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPluginConfig | None) –
- Return type
None
- async before_request(request: sneakpeek.scraper_context.Request, config: Optional[Any]) sneakpeek.scraper_context.Request
Function that is called on each (HTTP) request before it is dispatched.
- property name: str
Name of the plugin
- class sneakpeek.plugins.user_agent_injecter_plugin.UserAgentInjecterPluginConfig(*, use_external_data: bool = True, browsers: list[str] = ['chrome', 'edge', 'firefox', 'safari', 'opera'])
Bases:
pydantic.main.BaseModel
Plugin configuration
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
use_external_data (bool) –
browsers (list[str]) –
- Return type
None
- browsers: list[str]
List of browsers which are used to generate user agents
- use_external_data: bool
Whether to use external data as a fallback