matutils – Math utils

This module contains math helper functions.

class gensim.matutils.MmReader(fname)

Wrap a term-document matrix stored on disk in Matrix Market format and present it as an object that supports iteration over its rows (documents).

Note that the file is read into memory one document at a time, not as a whole matrix at once (unlike scipy.io.mmread). This makes it possible to work with corpora that are larger than the available RAM.

Initialize the matrix reader.

fname is the path to a file on the local filesystem, which is expected to be in sparse (coordinate) Matrix Market format. Documents are assumed to be the rows of the matrix, and document features its columns.
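
The row-at-a-time reading described above can be sketched without gensim. The following is a minimal illustration, not MmReader's actual implementation: the helper name iter_mm_rows is hypothetical, and it assumes that entries in the file are sorted by row and that column ids should be returned 0-based.

```python
import os
import tempfile


def iter_mm_rows(path):
    """Yield one document (matrix row) at a time as a list of (column, value) pairs."""
    with open(path) as handle:
        lines = (ln.strip() for ln in handle)
        # Skip the banner, comment lines (starting with '%') and blank lines.
        data = (ln for ln in lines if ln and not ln.startswith('%'))
        num_rows, num_cols, num_entries = map(int, next(data).split())
        current, doc = 1, []
        for ln in data:
            row, col, value = ln.split()
            row = int(row)
            # Emit finished rows, including empty rows that have no entries.
            while current < row:
                yield doc
                current, doc = current + 1, []
            doc.append((int(col) - 1, float(value)))  # assumption: 0-based ids
        # Flush the last row and any trailing empty rows.
        while current <= num_rows:
            yield doc
            current, doc = current + 1, []


# Hypothetical sample: 3 documents, 4 features, 4 non-zero entries.
sample = """%%MatrixMarket matrix coordinate real general
% rows are documents, columns are features
3 4 4
1 1 0.5
1 3 2.0
3 2 1.0
3 4 4.0
"""
fd, path = tempfile.mkstemp(suffix='.mm')
with os.fdopen(fd, 'w') as handle:
    handle.write(sample)
rows = list(iter_mm_rows(path))
os.remove(path)
```

Only the current row is held in memory at any time, which is why the size of the file does not matter. In the sample, the second document has no entries and comes out as an empty list.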

class gensim.matutils.MmWriter(fname)

Store a corpus in Matrix Market format.

static writeCorpus(fname, corpus)

Save the vector space representation of an entire corpus to disk.

Note that the documents are processed one at a time, so the corpus may be larger than the available RAM.

writeVector(docNo, vector)

Write a single sparse vector to the file.

A sparse vector is any iterable yielding (field id, field value) pairs.
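
To illustrate how a corpus of such sparse vectors can be streamed to disk one document at a time, here is a minimal sketch that does not use gensim. The function name write_mm is hypothetical, and the approach shown (reserve a fixed-width header line, then overwrite it once the dimensions are known) is one possible design, not necessarily what MmWriter does. Field ids are assumed to be 0-based.

```python
import os
import tempfile


def write_mm(path, corpus):
    """Stream `corpus` (an iterable of sparse vectors) to `path`.

    Returns the tuple (number of documents, number of terms, number of entries).
    """
    banner = '%%MatrixMarket matrix coordinate real general\n'
    num_docs, num_terms, num_entries = 0, 0, 0
    with open(path, 'w') as out:
        out.write(banner)
        # The dimensions are unknown until the corpus has been consumed, so
        # reserve a fixed-width line now and overwrite it at the end.
        header_pos = out.tell()
        out.write(' ' * 49 + '\n')
        for doc_no, vector in enumerate(corpus):
            for field_id, value in sorted(vector):
                # Matrix Market indices are 1-based.
                out.write('%d %d %s\n' % (doc_no + 1, field_id + 1, value))
                num_terms = max(num_terms, field_id + 1)
                num_entries += 1
            num_docs = doc_no + 1
        out.seek(header_pos)
        out.write(('%d %d %d' % (num_docs, num_terms, num_entries)).ljust(49))
    return num_docs, num_terms, num_entries


# A generator stands in for a corpus that does not fit in memory.
corpus = (doc for doc in [[(0, 0.5), (2, 2.0)], [], [(1, 1.0)]])
fd, path = tempfile.mkstemp(suffix='.mm')
os.close(fd)
stats = write_mm(path, corpus)
with open(path) as handle:
    written = [ln.strip() for ln in handle]
os.remove(path)
```

Because the corpus is consumed as a plain iterable, only one document is in memory at a time. The price is the second, tiny write that patches the header after the final counts are known.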

gensim.matutils.doc2vec(doc, length)

Convert a document in sparse format (a sequence of 2-tuples) into a full numpy array of size length.
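
The conversion can be sketched in a few lines. This is an illustration only: the name sparse_to_dense is hypothetical, a plain Python list stands in for the numpy array so that the snippet has no dependencies, and field ids are assumed to be 0-based and smaller than length.

```python
def sparse_to_dense(doc, length):
    """Expand (field id, value) pairs into a dense list of `length` floats."""
    dense = [0.0] * length
    for field_id, value in doc:
        dense[field_id] = value  # ids not mentioned in `doc` stay at zero
    return dense


dense = sparse_to_dense([(1, 0.5), (3, 2.0)], 5)
```

Here dense is [0.0, 0.5, 0.0, 2.0, 0.0].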

gensim.matutils.pad(mat, padRow, padCol)

Add additional rows/columns to a numpy.matrix mat. The new rows/columns are initialized with zeros.
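
A minimal sketch of zero-padding follows, under two assumptions that the text above does not state explicitly: padRow and padCol are counts of rows and columns to add, and the new rows and columns are appended at the bottom and on the right. The name pad_rows_cols is hypothetical, and nested lists stand in for numpy.matrix.

```python
def pad_rows_cols(mat, pad_row, pad_col):
    """Return a copy of `mat` with zero-filled rows and columns appended."""
    # Assumption: negative counts are treated as zero.
    pad_row, pad_col = max(pad_row, 0), max(pad_col, 0)
    width = (len(mat[0]) if mat else 0) + pad_col
    # Extend every existing row on the right, then add full-width zero rows.
    padded = [list(row) + [0.0] * pad_col for row in mat]
    padded.extend([0.0] * width for _ in range(pad_row))
    return padded


padded = pad_rows_cols([[1.0, 2.0], [3.0, 4.0]], 1, 2)
```

The 2x2 input becomes a 3x4 matrix whose original values sit in the top-left corner.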

gensim.matutils.unitVec(vec)

Scale a sparse vector to another sparse vector of unit length.
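
As an illustration, the scaling can be written as below. The sketch assumes that "unit length" means unit Euclidean (L2) norm and that an all-zero or empty vector is returned unchanged. The name unit_vec is hypothetical.

```python
import math


def unit_vec(vec):
    """Scale (field id, value) pairs so that the Euclidean norm becomes 1."""
    vec = list(vec)  # the input may be a one-shot iterable
    norm = math.sqrt(sum(value * value for _, value in vec))
    if norm == 0.0:
        return vec  # assumption: nothing sensible to scale
    return [(field_id, value / norm) for field_id, value in vec]


unit = unit_vec([(0, 3.0), (2, 4.0)])
```

The input has norm 5, so the values become 0.6 and 0.8 while the field ids are left untouched.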
