Package lxml :: Module etree :: Class HTMLParser

Class HTMLParser



 object --+    
          |    
_BaseParser --+
              |
             HTMLParser

The HTML parser. This parser reads HTML into a normal XML tree. By default, it can also read broken (non-well-formed) HTML, to the extent that libxml2 can recover from the errors. Set the 'recover' option to False to switch this off.

Available boolean keyword arguments:

* recover - try hard to parse through broken HTML (default: True)
* no_network - prevent network access (default: True)
* remove_blank_text - discard empty text nodes
* remove_comments - discard comments
* remove_pis - discard processing instructions
* compact - save memory for short text content (default: True)
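A minimal sketch of how these keyword arguments might be used, assuming lxml is installed. The sample markup and the variable names are made up for illustration. It relies on the default recover=True to get through the unclosed tags and on remove_comments=True to drop the comment.

```python
from lxml import etree

# Deliberately broken markup: <p> and <b> are never closed,
# and there is a comment that we want the parser to discard.
broken_html = "<html><body><p>Hello <b>world<!-- note --></body></html>"

# recover defaults to True, so only remove_comments is set explicitly.
parser = etree.HTMLParser(remove_comments=True)
root = etree.fromstring(broken_html, parser)

# Collect the text content of <body> and check for leftover comments.
body = root.find("body")
text = "".join(body.itertext())
has_comment = any(node.tag is etree.Comment for node in root.iter())

print(root.tag, repr(text), has_comment)
```

Passing recover=False instead makes the parser stricter about malformed input. Which inputs are rejected depends on the libxml2 version in use, so that case is not shown here.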

Note that you should avoid sharing parsers between threads for performance reasons.
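One way to follow this advice is to give each thread its own parser instance. The sketch below uses threading.local for that. The helper names get_parser, parse_title and worker are hypothetical and not part of lxml.

```python
import threading

from lxml import etree

# Per-thread storage: each thread lazily creates its own HTMLParser.
_local = threading.local()


def get_parser():
    """Return the HTMLParser that belongs to the current thread."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = etree.HTMLParser()
    return parser


def parse_title(html):
    """Parse an HTML string with the thread's parser and return its <title> text."""
    root = etree.fromstring(html, get_parser())
    return root.findtext(".//title")


results = {}


def worker(name, html):
    results[name] = parse_title(html)


threads = [
    threading.Thread(
        target=worker,
        args=(f"t{i}", f"<html><head><title>Page {i}</title></head></html>"),
    )
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
```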

Instance Methods
 
__init__(...)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
 
__new__(T, S, ...)
Returns: a new object with type S, a subtype of T

Inherited from _BaseParser: copy, makeelement, setElementClassLookup, set_element_class_lookup

Inherited from object: __delattr__, __getattribute__, __hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from _BaseParser: error_log, resolvers
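As an illustration of the inherited error_log property, the following sketch parses a malformed fragment and then reads whatever the parser recorded. The input string is invented for the example, and the number and wording of the log entries depend on the libxml2 version, so no particular output is assumed.

```python
from lxml import etree

parser = etree.HTMLParser()  # recover=True by default

# A stray </b> end tag with no matching start tag.
root = etree.fromstring("<p>text</b>", parser)

# Each log entry carries position and message information.
messages = [
    (entry.line, entry.column, entry.message)
    for entry in parser.error_log
]

for line, column, message in messages:
    print(f"line {line}, column {column}: {message}")
```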

Inherited from object: __class__

Method Details

__init__(...)
(Constructor)

 
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
Overrides: _BaseParser.__init__

__new__(T, S, ...)

 
Returns:
a new object with type S, a subtype of T

Overrides: _BaseParser.__new__