cmoncrawl.processor.pipeline.downloader.DownloaderDummy
- class cmoncrawl.processor.pipeline.downloader.DownloaderDummy(files: List[Path], url: Optional[str] = None, date: Optional[datetime] = None)
Dummy downloader for testing. It doesn’t download anything; instead it returns the files passed to the constructor and extracts metadata from each file.
- __init__(files: List[Path], url: Optional[str] = None, date: Optional[datetime] = None)
Methods
- __init__(files[, url, date])
- download(domain_record)
- extract_url(content)
- extract_year(file_path)
- mine_metadata(content, file_path)
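The idea behind a dummy downloader can be sketched in a few lines. The block below is a minimal, self-contained illustration and not the cmoncrawl implementation: the names `SketchDummyDownloader` and `SketchMetadata` are hypothetical, `download` is written as a plain synchronous method, and the metadata is simply the `url` and `date` given to the constructor rather than values mined from the file.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Any, List, Optional, Tuple


@dataclass
class SketchMetadata:
    """Hypothetical metadata container; not a cmoncrawl type."""

    url: Optional[str]
    date: Optional[datetime]
    source: Path


class SketchDummyDownloader:
    """Illustrative stand-in that mirrors the documented constructor signature."""

    def __init__(
        self,
        files: List[Path],
        url: Optional[str] = None,
        date: Optional[datetime] = None,
    ) -> None:
        self.files = list(files)
        self.url = url
        self.date = date

    def download(self, domain_record: Any = None) -> List[Tuple[str, SketchMetadata]]:
        # domain_record is ignored: nothing is fetched from the network.
        results = []
        for path in self.files:
            content = path.read_text(encoding="utf-8")
            # Assumption: metadata is passed through from the constructor;
            # the real class is documented as extracting it from the file.
            results.append((content, SketchMetadata(self.url, self.date, path)))
        return results
```

In a test, such a class lets the rest of a pipeline run against local fixture files, so the test needs no network access and always sees the same input.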