Coverage for lino/utils/html2xhtml.py : 68%

# -*- coding: utf-8 -*-
# Copyright 2011-2016 Luc Saffre
# License: BSD (see file COPYING for details)

# How to test this document:
#
#   $ python setup.py test -s tests.UtilsTests.test_tidy
valid XHTML.
It uses Jason Stitt's `pytidylib <http://countergram.com/open-source/pytidylib/docs/index.html>`__ module. This module requires the `HTML Tidy library <http://tidy.sourceforge.net/>`__ to be installed on the system::
  $ sudo aptitude install tidy
Some examples:
>>> print(html2xhtml('''\
... <p>Hello, world!<br>Again I say: Hello, world!</p>
... <img src="foo.org" alt="Foo">'''))
... #doctest: +NORMALIZE_WHITESPACE -SKIP
<p>Hello, world!<br />
Again I say: Hello, world!</p>
<img src="foo.org" alt="Foo" />
The test above is currently skipped because tidylib output can differ slightly (``alt="Foo">`` versus ``alt="Foo" >``) depending on the installed version of tidylib.
>>> html = '''\
... <p style="font-family: &quot;Verdana&quot;;">Verdana</p>'''
>>> print(html2xhtml(html))
<p style="font-family: &quot;Verdana&quot;;">Verdana</p>
>>> print(html2xhtml('A &amp; B'))
A &amp; B

>>> print(html2xhtml('a &lt; b'))
a &lt; b
A `<div>` inside a `<span>` is not valid XHTML. Neither is a `<li>` inside a `<strong>`.
But how to convert it? Inline tags must be "temporarily" closed before and reopened after a block element.
>>> print(html2xhtml('<p>foo<span class="c">bar<div> oops </div>baz</span>bam</p>'))
<p>foo<span class="c">bar</span></p>
<div><span class="c">oops</span></div>
<span class="c">baz</span>bam

>>> print(html2xhtml('''<strong><ul><em><li>Foo</li></em><li>Bar</li></ul></strong>'''))
<ul>
<li><strong><em>Foo</em></strong></li>
<li><strong>Bar</strong></li>
</ul>
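
The "close before, reopen after" rule can be sketched in pure Python. This is not how `html2xhtml` works (it delegates the job to HTML Tidy); it is only a minimal illustration built on the standard library's `html.parser`. The names `InlineSplitter`, `split_inline_around_blocks`, `INLINE` and `BLOCK` are hypothetical, and the two tag sets are deliberately incomplete.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small tag sets, for illustration only.
INLINE = {"span", "strong", "em", "b", "i", "a"}
BLOCK = {"div", "p", "ul", "ol", "li"}


class InlineSplitter(HTMLParser):
    """Close open inline tags around a block tag and reopen them afterwards."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_inline = []  # stack of (tag name, raw start-tag text)

    def _close_inline(self):
        for tag, _raw in reversed(self.open_inline):
            self.out.append("</%s>" % tag)

    def _reopen_inline(self):
        for _tag, raw in self.open_inline:
            self.out.append(raw)

    def _emit_block_boundary(self, text):
        # Interrupt the inline tags, emit the block tag, resume the inline tags.
        self._close_inline()
        self.out.append(text)
        self._reopen_inline()

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if tag in INLINE:
            self.open_inline.append((tag, raw))
            self.out.append(raw)
        elif tag in BLOCK and self.open_inline:
            self._emit_block_boundary(raw)
        else:
            self.out.append(raw)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in INLINE:
            # Forget the innermost open inline tag with this name.
            for i in range(len(self.open_inline) - 1, -1, -1):
                if self.open_inline[i][0] == tag:
                    del self.open_inline[i]
                    break
            self.out.append("</%s>" % tag)
        elif tag in BLOCK and self.open_inline:
            self._emit_block_boundary("</%s>" % tag)
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def split_inline_around_blocks(html):
    parser = InlineSplitter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(split_inline_around_blocks(
    'foo<span class="c">bar<div>oops</div>baz</span>bam'))
```

Unlike Tidy, this sketch does not reflow whitespace, does not handle comments or doctype declarations, and can leave empty inline pairs behind (for example `<strong></strong>` when a block element immediately follows the opening inline tag).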
HTML tolerated leaving certain tags unclosed. For example, the string "<p>foo<p>bar<p>baz" converts to "<p>foo</p><p>bar</p><p>baz</p>".
>>> print(html2xhtml('<p>foo<p>bar<p>baz'))
<p>foo</p>
<p>bar</p>
<p>baz</p>
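
As a rough illustration of what closing an unterminated `<p>` involves, here is a small standard-library sketch. It is an assumption-laden stand-in, not the Tidy algorithm: the hypothetical `ParagraphCloser` only knows about `<p>` and ignores every other tag that HTML allows to stay open.

```python
from html.parser import HTMLParser


class ParagraphCloser(HTMLParser):
    """Insert the </p> that HTML allowed authors to omit (sketch only)."""

    def __init__(self):
        # Keep entities as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.p_open = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.p_open:
                # A new paragraph implicitly ends the previous one.
                self.out.append("</p>")
            self.p_open = True
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            if not self.p_open:
                return  # drop a stray </p>
            self.p_open = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def close_paragraphs(html):
    parser = ParagraphCloser()
    parser.feed(html)
    parser.close()
    if parser.p_open:
        # The last paragraph was never closed either.
        parser.out.append("</p>")
    return "".join(parser.out)


print(close_paragraphs("<p>foo<p>bar<p>baz"))
```

The output has no line breaks between paragraphs, whereas the doctest above shows Tidy putting each paragraph on its own line.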
"""
# from __future__ import print_function, unicode_literals
<html> <head> <title></title> </head> <body> """
# http://tidy.sourceforge.net/docs/quickref.html
# options.update(output_xml=1)
#~ raise Exception(repr(errors))
raise Exception("Errors while processing %s\n==========\n%s" % (html, errors))
# if document.startswith(WRAP_BEFORE):
#     document = document[len(WRAP_BEFORE):]
#     document = document[:-15]
except OSError:
    # happens on readthedocs.org and Travis CI: OSError: Could not
    # load libtidy using any of these names:
    # libtidy,libtidy.so,libtidy-0.99.so.0,cygtidy-0-99-0,tidylib,
    # libtidy.dylib,tidy
    # We can simply ignore it since it is just for building the docs.
    from lino.utils.mytidylib import html2xhtml
    # TODO: emulate it well enough so that at least the test suite passes
    HAS_TIDYLIB = False
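
# The optional-dependency pattern above can be sketched in isolation as follows.

The block below is a self-contained sketch of that fallback, not the module's actual code. It assumes pytidylib's `tidy_fragment(text, options=...)`, which returns a `(fragment, errors)` tuple; the names `to_xhtml` and `TIDY_OPTIONS` are hypothetical, and the pass-through fallback merely stands in for `lino.utils.mytidylib`.

```python
# Hypothetical option set; the keys are standard HTML Tidy option names.
TIDY_OPTIONS = {
    "output-xhtml": 1,
    "doctype": "omit",
    "show-warnings": 0,
    "indent": 0,
}

try:
    from tidylib import tidy_fragment
    # Depending on the pytidylib version, libtidy is loaded either at import
    # time or on first use, so a probe call covers both cases.
    tidy_fragment("<p>probe</p>")
    HAS_TIDYLIB = True
except (ImportError, OSError):
    # pytidylib or the underlying libtidy is not installed.
    HAS_TIDYLIB = False


def to_xhtml(html):
    """Return XHTML via Tidy if available, else the input unchanged."""
    if not HAS_TIDYLIB:
        # Assumption: a pass-through stand-in for a real emulation.
        return html
    fragment, _errors = tidy_fragment(html, options=TIDY_OPTIONS)
    return fragment


print(HAS_TIDYLIB, to_xhtml("<p>foo<p>bar"))
```

Catching `ImportError` as well as `OSError` is a choice made for this sketch: it lets the same code run where the Python package itself is missing, not only the shared library.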
import doctest
doctest.testmod()
_test()