cmoncrawl.processor.pipeline.extractor.BaseExtractor
- class cmoncrawl.processor.pipeline.extractor.BaseExtractor(encoding: Optional[str] = None)
- __init__(encoding: Optional[str] = None)
Methods
__init__([encoding])
extract(response, metadata)
extract_soup(soup, metadata)
filter_raw(response, metadata)
filter_soup(soup, metadata)
preprocess(response, metadata)
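
The method names above suggest a filter, preprocess, parse, filter, extract flow. The sketch below illustrates one way such a flow could fit together. It does not import cmoncrawl: `SketchExtractor` and `TitleParser` are hypothetical stand-ins, the metadata is a plain dict, and the call order, return types, and the use of `None` for a dropped page are all assumptions rather than the library's documented behavior.

```python
from html.parser import HTMLParser
from typing import Any, Dict, Optional


class TitleParser(HTMLParser):
    """Hypothetical stand-in for a parsed document ("soup"): collects <title> text."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


class SketchExtractor:
    """Stand-in mirroring the method names listed above (not the real BaseExtractor)."""

    def __init__(self, encoding: Optional[str] = None):
        self.encoding = encoding

    def filter_raw(self, response: str, metadata: Dict[str, Any]) -> bool:
        # Assumed role: cheap check on the raw text before any parsing.
        return bool(response.strip())

    def preprocess(self, response: str, metadata: Dict[str, Any]) -> str:
        # Assumed role: normalise the raw text before parsing.
        return response.strip()

    def filter_soup(self, soup: TitleParser, metadata: Dict[str, Any]) -> bool:
        # Assumed role: decide on the parsed document whether to keep the page.
        return bool(soup.title.strip())

    def extract_soup(self, soup: TitleParser, metadata: Dict[str, Any]) -> Dict[str, Any]:
        # Assumed role: build the output record from the parsed document.
        return {"title": soup.title.strip(), "url": metadata.get("url")}

    def extract(self, response: str, metadata: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Assumed orchestration: raw filter, preprocess, parse, soup filter, extract.
        if not self.filter_raw(response, metadata):
            return None
        text = self.preprocess(response, metadata)
        soup = TitleParser()
        soup.feed(text)
        soup.close()
        if not self.filter_soup(soup, metadata):
            return None
        return self.extract_soup(soup, metadata)


page = "<html><head><title> Hello </title></head><body></body></html>"
result = SketchExtractor().extract(page, {"url": "https://example.com"})
```

In this sketch, only `extract_soup` and the two filters would need to change for a different page type, which is presumably why the listing above separates them from `extract`.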