cmoncrawl.processor.pipeline.extractor

Classes

BaseExtractor([encoding])

DomainRecordExtractor([filter_non_ok])

Dummy extractor that simply extracts the HTML.

HTMLExtractor([filter_non_ok])

Dummy extractor that simply extracts the HTML.

IExtractor()