Package nsi :: Package granulate :: Module GranulateOffice :: Class GranulateOffice

Class GranulateOffice


object --+
         |
        GranulateOffice


- Provide the grain extraction functionality for ms-office and odf documents
- Retrieve tables, images, thumbnails and summary

Instance Methods
 
__call__(self)
 
__convertDocumentToOdf(self)
Convert an ms-office document to Open Document Format (odf)
 
__createNewOOoDocument(self)
Create a new odt document based on a blank template
 
__getAttrStyles(self, Node)
Get the styles associated with the given node
 
__getAttributesR(self, Node, styles=[])
 
__getImageDocumentList(self)
Extract the images from a document and return a list of Grain instances
 
__getNodeText(self, node)
Get the text value of an xml node
 
__getSummaryDocument(self)
Get the summary of an odf document
 
__getTableDocumentList(self)
Extract the tables from a document and return a list of Grain instances
 
__getTextChildNodesImage(self, node, text=[])
Get the subtitle text of an image in an odf document
 
__getTextChildNodesTable(self, node, text=[])
Get the subtitle text of a table in an odf document
 
__getThumbnailsDocument(self)
Get the thumbnails of an odf document
 
__init__(self, Document=None, ooodServer=None)
- The "Document" parameter is an instance of the "File" class from the FileUtils module...
 
__mkServer(self)
Create a connection to the OpenOffice (oood-ERP5) server
 
__parseXmlZipFile(self)
Uncompress an odf file and parse the "content.xml" file.
 
getImageDocumentList(self)
Invoke the private method __getImageDocumentList in order to retrieve the document's images
 
getSummaryDocument(self)
Get document's summary
 
getTableDocumentList(self)
Invoke the private method __getTableDocumentList in order to retrieve the document's tables
 
getThumbnailsDocument(self)
Get document's thumbnails
 
granulateDocument(self)
Extract the grains from a document, returning a dictionary with a list of tables and a list of images

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__
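
The method summaries above describe uncompressing an odf file, parsing its "content.xml", and collecting tables and images. The following is a minimal sketch of that idea using only the Python 3 standard library. It is not the code of GranulateOffice: the function names (parse_content_xml, list_table_names, list_image_paths, build_sample_odt) are hypothetical, the sample package is deliberately incomplete, and the real class returns Grain instances rather than plain strings.

```python
import io
import zipfile
from xml.dom import minidom

# Standard OpenDocument and XLink namespace URIs.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def parse_content_xml(odf_bytes):
    """Open an odf package (a zip archive) and parse its content.xml."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        return minidom.parseString(package.read("content.xml"))


def list_table_names(dom):
    """Return the table:name attribute of every table:table element."""
    tables = dom.getElementsByTagNameNS(TABLE_NS, "table")
    return [table.getAttributeNS(TABLE_NS, "name") for table in tables]


def list_image_paths(dom):
    """Return the xlink:href of every draw:image element."""
    images = dom.getElementsByTagNameNS(DRAW_NS, "image")
    return [image.getAttributeNS(XLINK_NS, "href") for image in images]


def build_sample_odt():
    """Build a tiny in-memory package holding only content.xml (illustrative)."""
    content = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<office:document-content'
        ' xmlns:office="%s" xmlns:table="%s"'
        ' xmlns:draw="%s" xmlns:xlink="%s">'
        '<office:body><office:text>'
        '<table:table table:name="Table1"/>'
        '<draw:frame><draw:image xlink:href="Pictures/img1.png"/></draw:frame>'
        '</office:text></office:body>'
        '</office:document-content>'
    ) % (OFFICE_NS, TABLE_NS, DRAW_NS, XLINK_NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
    return buffer.getvalue()


dom = parse_content_xml(build_sample_odt())
print(list_table_names(dom))
print(list_image_paths(dom))
```

Matching on namespace URI rather than on the "table:" or "draw:" prefix keeps the lookup independent of whichever prefixes a particular document declares.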

Class Variables
  Document = None
  __ooodServer = None
  __parseContent = None
  __zipFile = None
  supportedConvertionMimeTypes = ('application/msword', 'applica...
  supportedGranulateMimeTypes = ('application/vnd.oasis.opendocu...
  supportedMimeType = ('application/vnd.oasis.opendocument.text'...
Properties

Inherited from object: __class__

Method Details

__init__(self, Document=None, ooodServer=None)
(Constructor)


- The "Document" parameter is an instance of the "File" class from the FileUtils module
- The ooodServer MUST be specified if the file is an ms-office file; otherwise the conversion server
  will not be found and the grains will not be extracted

Overrides: object.__init__
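
The constructor notes imply a decision based on the document's MIME type: odf files can be granulated directly, while ms-office files first need the conversion server. A rough sketch of that decision follows. The tuples are copied from the class variable values listed in this reference, but the function plan_granulation and its return strings are hypothetical and only illustrate the rule, not how GranulateOffice implements it.

```python
# Values copied from supportedConvertionMimeTypes in this reference.
SUPPORTED_CONVERSION_MIME_TYPES = (
    "application/msword",
    "application/rtf",
    "application/vnd.ms-powerpoint",
)

# Values copied from supportedGranulateMimeTypes in this reference.
SUPPORTED_GRANULATE_MIME_TYPES = (
    "application/vnd.oasis.opendocument.text",
    "application/vnd.oasis.opendocument.presentation",
)


def plan_granulation(mime_type, has_oood_server):
    """Describe what would have to happen before grains can be extracted.

    Hypothetical helper: the labels returned here are made up for this sketch.
    """
    if mime_type in SUPPORTED_GRANULATE_MIME_TYPES:
        # odf documents need no conversion step.
        return "granulate"
    if mime_type in SUPPORTED_CONVERSION_MIME_TYPES:
        # ms-office documents depend on the conversion server.
        if has_oood_server:
            return "convert-then-granulate"
        return "no-grains-without-server"
    return "unsupported"


print(plan_granulation("application/msword", has_oood_server=True))
print(plan_granulation("application/msword", has_oood_server=False))
```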

Class Variable Details

supportedConvertionMimeTypes

Value:
('application/msword',
 'application/rtf',
 'application/vnd.ms-powerpoint')

supportedGranulateMimeTypes

Value:
('application/vnd.oasis.opendocument.text',
 'application/vnd.oasis.opendocument.presentation')
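
One way to test a file against supportedGranulateMimeTypes, sketched here as an assumption rather than as the class's own approach, is to read the "mimetype" entry that OpenDocument packages carry inside the zip archive. The helper names (read_package_mime_type, can_granulate, build_sample_package) are hypothetical, and the sample package contains nothing but that entry.

```python
import io
import zipfile

# Values copied from supportedGranulateMimeTypes above.
SUPPORTED_GRANULATE_MIME_TYPES = (
    "application/vnd.oasis.opendocument.text",
    "application/vnd.oasis.opendocument.presentation",
)


def read_package_mime_type(odf_bytes):
    """Return the MIME type stored in the package's "mimetype" entry."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        return package.read("mimetype").decode("ascii").strip()


def can_granulate(odf_bytes):
    """Check the package MIME type against the granulation whitelist."""
    return read_package_mime_type(odf_bytes) in SUPPORTED_GRANULATE_MIME_TYPES


def build_sample_package(mime_type):
    """Build an in-memory zip with only a "mimetype" entry (illustrative)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as package:
        package.writestr("mimetype", mime_type)
    return buffer.getvalue()


text_package = build_sample_package("application/vnd.oasis.opendocument.text")
print(can_granulate(text_package))
```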

supportedMimeType

Value:
('application/vnd.oasis.opendocument.text',
 'application/vnd.sun.xml.writer',
 'application/msword',
 'application/rtf',
 'application/vnd.stardivision.writer',
 'application/x-starwriter',
 'text/plain',
 'application/vnd.oasis.opendocument.spreadsheet',
...