Usage

Converting files using the command line interface

The pydocx command takes the output format as a flag, followed by the input and output files:

$ pydocx --html input.docx output.html
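
To convert several files in one go, the same command can be driven from a script. The following is a minimal sketch, not part of PyDocX: the helper names build_command and convert_all are hypothetical, and it assumes the pydocx executable is on the PATH and accepts exactly the arguments shown above.

```python
import os
import subprocess


def build_command(docx_path):
    """Return the argv list for converting one .docx file to HTML.

    The output file is placed next to the input, with an .html extension.
    """
    base, _ext = os.path.splitext(docx_path)
    return ['pydocx', '--html', docx_path, base + '.html']


def convert_all(docx_paths):
    """Run the pydocx command once per input file (hypothetical helper)."""
    for path in docx_paths:
        # check_call raises CalledProcessError if the command exits non-zero.
        subprocess.check_call(build_command(path))
```

Calling convert_all(['a.docx', 'b.docx']) would then write a.html and b.html alongside the inputs, provided the command is installed.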

Converting files using the library directly

If you don't need to work with exporter classes directly, you can use the PyDocX.to_html helper method:

from pydocx.pydocx import PyDocX

# Pass in a path
html = PyDocX.to_html('file.docx')

# Pass in a file object (a .docx is a zip archive, so open it in binary mode)
with open('file.docx', 'rb') as f:
    html = PyDocX.to_html(f)

# Pass in a file-like object
from io import BytesIO
with open('file.docx', 'rb') as f:
    buf = BytesIO(f.read())

html = PyDocX.to_html(buf)

Of course, you can do the same using the exporter class:

from pydocx.export import PyDocXHTMLExporter

# Pass in a path
exporter = PyDocXHTMLExporter('file.docx')
html = exporter.parsed

# Pass in a file object (a .docx is a zip archive, so open it in binary mode)
with open('file.docx', 'rb') as f:
    exporter = PyDocXHTMLExporter(f)
    html = exporter.parsed

# Pass in a file-like object
from io import BytesIO
with open('file.docx', 'rb') as f:
    buf = BytesIO(f.read())

exporter = PyDocXHTMLExporter(buf)
html = exporter.parsed

Currently Supported HTML elements

  • tables
    • nested tables
    • rowspans
    • colspans
    • lists in tables
  • lists
    • list styles
    • nested lists
    • list of tables
    • list of paragraphs
  • justification
  • images
  • styles
    • bold
    • italics
    • underline
    • hyperlinks
  • headings

HTML Styles

The export class pydocx.export.PyDocXHTMLExporter expresses some formatting through CSS classes, so the corresponding behavior only appears if those classes are defined.

Currently these include:

  • class pydocx-insert -> Turns the text green.
  • class pydocx-delete -> Turns the text red and draws a line through the text.
  • class pydocx-center -> Aligns the text to the center.
  • class pydocx-right -> Aligns the text to the right.
  • class pydocx-left -> Aligns the text to the left.
  • class pydocx-comment -> Turns the text blue.
  • class pydocx-underline -> Underlines the text.
  • class pydocx-caps -> Makes all text uppercase.
  • class pydocx-small-caps -> Makes all text uppercase; letters that were originally lowercase are rendered smaller than their uppercase counterparts.
  • class pydocx-strike -> Strikes a line through the text.
  • class pydocx-hidden -> Hides the text.
  • class pydocx-tab -> Represents a tab within the document.
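
If the exported markup is embedded in a page of your own, these classes need matching rules. The sketch below builds such a stylesheet as a string. The declarations are illustrative choices that follow the descriptions above, not necessarily the rules PyDocX itself uses (the tab width in particular is an assumption), and PYDOCX_CSS and build_stylesheet are hypothetical names.

```python
# Illustrative CSS declarations for each class listed above (assumed values).
PYDOCX_CSS = [
    ('pydocx-insert', 'color: green;'),
    ('pydocx-delete', 'color: red; text-decoration: line-through;'),
    ('pydocx-center', 'text-align: center;'),
    ('pydocx-right', 'text-align: right;'),
    ('pydocx-left', 'text-align: left;'),
    ('pydocx-comment', 'color: blue;'),
    ('pydocx-underline', 'text-decoration: underline;'),
    ('pydocx-caps', 'text-transform: uppercase;'),
    ('pydocx-small-caps', 'font-variant: small-caps;'),
    ('pydocx-strike', 'text-decoration: line-through;'),
    ('pydocx-hidden', 'visibility: hidden;'),
    ('pydocx-tab', 'display: inline-block; width: 4em;'),
]


def build_stylesheet(rules=PYDOCX_CSS):
    """Render (class name, declarations) pairs as a <style> element."""
    lines = ['.%s { %s }' % (name, decl) for name, decl in rules]
    return '<style>\n%s\n</style>' % '\n'.join(lines)


stylesheet = build_stylesheet()
```

The resulting string can be placed in the head of the page that hosts the converted HTML.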

Exceptions

PyDocX defines a single custom exception, MalformedDocxException. It is raised whenever either the xml or the zipfile library raises an exception.
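
Because every parsing failure surfaces as one exception type, calling code needs only a single except clause. The sketch below takes the converter and the exception class as arguments so that it runs without PyDocX installed. The name convert_or_none is a hypothetical helper. In real use you would pass PyDocX.to_html as convert and MalformedDocxException as malformed_error; its import location is not shown in this document, so check your installed version.

```python
def convert_or_none(convert, source, malformed_error):
    """Return convert(source), or None if the document is malformed.

    convert: a callable such as PyDocX.to_html
    source: a path, file object, or file-like object
    malformed_error: the exception class that signals bad input,
        such as MalformedDocxException
    """
    try:
        return convert(source)
    except malformed_error:
        # Any other exception type is deliberately left to propagate.
        return None
```

Returning None keeps the example short; a real application might prefer to log the failure or re-raise with the file name attached.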

Deviations from the ECMA-376 Specification

Missing val attribute in underline tag