The configuration file is the basic specification of the required extractor. It contains the URL of the web page to be loaded, the selector expressions for the data to be extracted and, in the case of crawlers, the selector expression for the links to be followed.
The keys used in the configuration file are:
project_name: Specifies the name of the project with which the configuration file is associated.
selector_type: Specifies the type of selector expressions used. This can be "xpath" or "css".
url: Specifies the URL of the base web page to be loaded.
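The following is a minimal sketch of how the three keys above might be read and checked. It assumes the configuration file is written in JSON, which the text does not state, and the function name load_config and the example values are hypothetical. Only the key names and the two allowed selector_type values come from the description above.

```python
import json

# Key names taken from the description; the JSON file format is an assumption.
REQUIRED_KEYS = ("project_name", "selector_type", "url")
SELECTOR_TYPES = ("xpath", "css")


def load_config(text):
    """Parse a JSON configuration string (hypothetical helper) and check its basic keys."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    if config["selector_type"] not in SELECTOR_TYPES:
        raise ValueError("selector_type must be 'xpath' or 'css'")
    return config


# Hypothetical example configuration.
example = """
{
    "project_name": "myproject",
    "selector_type": "xpath",
    "url": "http://example.com"
}
"""

config = load_config(example)
print(config["project_name"], config["selector_type"], config["url"])
```

Checking selector_type early means a typo such as "xpth" is reported when the file is loaded, rather than surfacing later as a confusing selector error.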
The main purpose of the configuration file is to specify extraction rules, each expressed as a selector expression together with the attribute to be extracted. A fixed set of selector/attribute value pairs is supported, and each form performs a particular type of content extraction.
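To illustrate the idea of a selector/attribute pair, here is a minimal sketch using only the Python standard library. The rule layout (field, selector, attr) and the convention that the attribute "text" means "the element's text content" are assumptions made for this example, not the extractor's documented format. Note that xml.etree.ElementTree supports only a limited subset of XPath and requires well-formed markup; a real extractor would typically use a full XPath or CSS selector engine.

```python
import xml.etree.ElementTree as ET

# A small, well-formed XHTML fragment standing in for a loaded web page.
PAGE = """
<html>
  <body>
    <h1>Example heading</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

# Hypothetical extraction rules: each pairs a selector expression with the
# attribute to extract. "text" is assumed to mean the element's text content.
RULES = [
    {"field": "heading", "selector": ".//h1", "attr": "text"},
    {"field": "links", "selector": ".//a", "attr": "href"},
]


def extract(root, selector, attr):
    """Return the requested attribute (or text) of every matching element."""
    results = []
    for element in root.findall(selector):
        if attr == "text":
            results.append((element.text or "").strip())
        else:
            results.append(element.get(attr))
    return results


root = ET.fromstring(PAGE)
data = {rule["field"]: extract(root, rule["selector"], rule["attr"]) for rule in RULES}
print(data)  # {'heading': ['Example heading'], 'links': ['/first', '/second']}
```

The same selector can yield different data depending on the attribute paired with it: ".//a" with "href" returns link targets, while ".//a" with "text" would return the link labels.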
Selector expressions:
Attribute selectors: