This module contains math helper functions.
Treat a dense numpy array as a sparse gensim corpus.
No copy of the data is made, so changes to the underlying matrix are reflected in the corpus.
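A minimal sketch of the idea follows. It assumes documents are the rows of the array (the actual class may use another orientation). The hypothetical class DenseCorpusSketch keeps a reference to the caller's array and converts each row to (feature id, value) pairs lazily during iteration, skipping exact zeros. Because nothing is copied, editing the array after wrapping changes what the corpus yields.

```python
import numpy as np


class DenseCorpusSketch:
    """Hypothetical wrapper: each row of a 2-D array is one document."""

    def __init__(self, dense):
        # Keep a reference to the caller's array; no copy is made.
        self.dense = dense

    def __len__(self):
        return self.dense.shape[0]

    def __iter__(self):
        for row in self.dense:
            # Convert the row on the fly, omitting entries that are exactly zero.
            yield [(j, float(v)) for j, v in enumerate(row) if v != 0]


matrix = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])
corpus = DenseCorpusSketch(matrix)
print(list(corpus))  # [[(0, 1.0), (2, 2.0)], [(1, 3.0)]]
matrix[1, 0] = 5.0   # the change is visible through the corpus
print(list(corpus)[1])  # [(0, 5.0), (1, 3.0)]
```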
Wrap a term-document matrix stored on disk in Matrix Market format, and present it as an object that supports iteration over its rows (i.e. documents).
Note that the file is read into memory one document at a time, not the whole matrix at once (unlike scipy.io.mmread). This allows us to process corpora which are larger than the available RAM.
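The streaming behaviour can be sketched as a generator. The function iter_mm_documents is hypothetical; it assumes a coordinate Matrix Market stream whose entries are sorted by row, keeps only the document currently being built in memory, and emits an empty list for any row that has no entries.

```python
import io


def iter_mm_documents(lines):
    """Hypothetical sketch: yield one sparse document per matrix row.

    Assumes a coordinate Matrix Market stream whose entries are sorted by
    row (document) id. Only the document being built is held in memory.
    """
    it = iter(lines)
    if not next(it).startswith('%%MatrixMarket'):
        raise ValueError('not a Matrix Market file')
    for line in it:  # skip comments up to the "rows cols nnz" size line
        if not line.startswith('%'):
            num_docs = int(line.split()[0])
            break
    else:
        raise ValueError('missing size line')

    current, doc = 0, []  # 0-based id of the document being built
    for line in it:
        if not line.strip():
            continue
        row, col, value = line.split()
        row_id = int(row) - 1  # Matrix Market ids are 1-based
        while current < row_id:  # finish earlier rows, including empty ones
            yield doc
            current, doc = current + 1, []
        doc.append((int(col) - 1, float(value)))
    while current < num_docs:  # flush the last document and trailing empty rows
        yield doc
        current, doc = current + 1, []


sample = (
    '%%MatrixMarket matrix coordinate real general\n'
    '% a comment line\n'
    '3 3 3\n'
    '1 1 1.0\n'
    '1 3 2.0\n'
    '3 2 4.0\n'
)
docs = list(iter_mm_documents(io.StringIO(sample)))
print(docs)  # [[(0, 1.0), (2, 2.0)], [], [(1, 4.0)]]
```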
Initialize the matrix reader.
The input refers to a file on the local filesystem, which is expected to be in the sparse (coordinate) Matrix Market format. Documents are assumed to be the rows of the matrix, and document features are its columns.
The input argument is either a string (a file path) or a file-like object that supports seek(0) (e.g. gzip.GzipFile, bz2.BZ2File).
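A hedged sketch of accepting either kind of input: the hypothetical read_mm_size opens a path itself but calls seek(0) on a supplied file-like object, so the same compressed stream can be read more than once. It only parses the header and assumes the object yields bytes, as gzip.GzipFile and bz2.BZ2File do.

```python
import gzip
import io


def read_mm_size(source):
    """Hypothetical helper: return (num_rows, num_cols, num_nonzero).

    `source` is a file path or a binary file-like object supporting seek(0).
    """
    if isinstance(source, str):
        fin, owned = open(source, 'rb'), True
    else:
        source.seek(0)  # rewind so every call starts from the top
        fin, owned = source, False
    try:
        if not fin.readline().startswith(b'%%MatrixMarket'):
            raise ValueError('not a Matrix Market file')
        for line in fin:
            if not line.startswith(b'%'):  # first non-comment line holds the sizes
                rows, cols, nnz = (int(x) for x in line.split())
                return rows, cols, nnz
        raise ValueError('missing size line')
    finally:
        if owned:  # close only what this function opened
            fin.close()


# Build a small gzip-compressed Matrix Market file in memory.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode='wb') as gz:
    gz.write(b'%%MatrixMarket matrix coordinate real general\n'
             b'2 5 3\n1 1 1.0\n1 4 0.5\n2 2 3.0\n')
raw.seek(0)
gz_in = gzip.GzipFile(fileobj=raw, mode='rb')
print(read_mm_size(gz_in))  # (2, 5, 3)
print(read_mm_size(gz_in))  # (2, 5, 3) again, thanks to seek(0)
```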
Store corpus in Matrix Market format.
Save the vector space representation of an entire corpus to disk.
Note that the documents are processed one at a time, so the corpus may be larger than the available RAM.
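One way to save a corpus in a single pass, assumed here rather than taken from the library, is to reserve a blank placeholder where the size line belongs and overwrite it once the numbers of documents, features and non-zero entries are known. The function save_corpus_mm and its width parameter are hypothetical, and the sketch needs a seekable text stream.

```python
import io


def save_corpus_mm(fout, corpus, width=50):
    """Hypothetical sketch: stream `corpus` into a seekable text stream.

    The "rows cols nnz" size line is unknown until every document has been
    seen, so `width` spaces are reserved for it and overwritten at the end.
    Ids in `corpus` are 0-based; Matrix Market ids are 1-based.
    """
    fout.write('%%MatrixMarket matrix coordinate real general\n')
    size_pos = fout.tell()
    fout.write(' ' * width + '\n')  # placeholder for the size line

    num_docs = num_terms = num_nnz = 0
    for docno, doc in enumerate(corpus):  # only one document held at a time
        for term_id, value in sorted(doc):
            fout.write(f'{docno + 1} {term_id + 1} {float(value)!r}\n')
            num_terms = max(num_terms, term_id + 1)
            num_nnz += 1
        num_docs = docno + 1

    size_line = f'{num_docs} {num_terms} {num_nnz}'
    if len(size_line) > width:
        raise ValueError('size line does not fit in the reserved placeholder')
    end_pos = fout.tell()
    fout.seek(size_pos)
    fout.write(size_line)  # overwrites the first characters of the placeholder
    fout.seek(end_pos)
    return num_docs, num_terms, num_nnz


buf = io.StringIO()
print(save_corpus_mm(buf, iter([[(0, 1.0), (2, 2.0)], [], [(1, 4.0)]])))  # (3, 3, 3)
print(buf.getvalue())
```

The placeholder trick avoids a second pass over the corpus; a non-seekable destination such as a pipe would need a different strategy.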
Write a single sparse vector to the file.
A sparse vector is any iterable yielding (field id, field value) pairs.
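Since any iterable of pairs qualifies, a writer only needs to consume it once. The hypothetical write_vector below accepts a list, a generator or dict.items(), converts 0-based ids to the 1-based ids that Matrix Market uses, and reports how many entries it wrote.

```python
import io


def write_vector(fout, docno, vector):
    """Hypothetical helper: write one sparse vector as Matrix Market entries.

    Returns (max field id + 1, number of entries written).
    """
    max_id, count = -1, 0
    for field_id, value in sorted(vector):  # sorting materializes this one vector
        fout.write(f'{docno + 1} {field_id + 1} {float(value)!r}\n')
        max_id = max(max_id, field_id)
        count += 1
    return max_id + 1, count


out = io.StringIO()
print(write_vector(out, 0, {2: 0.5, 0: 1.5}.items()))  # (3, 2)
print(out.getvalue(), end='')  # "1 1 1.5" then "1 3 0.5"
```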
Convert a dense numpy array into the sparse corpus format (sequence of 2-tuples).
Values of magnitude < eps are treated as zero (ignored).
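A minimal NumPy sketch of this conversion; the function name dense_to_sparse and the default eps of 1e-9 are assumptions for illustration.

```python
import numpy as np


def dense_to_sparse(vec, eps=1e-9):
    """Hypothetical sketch: dense 1-D array -> list of (index, value) 2-tuples.

    Entries whose magnitude is below `eps` are treated as zero and dropped.
    """
    vec = np.asarray(vec, dtype=float).ravel()
    keep = np.nonzero(np.abs(vec) >= eps)[0]  # indices of entries to keep
    return [(int(i), float(vec[i])) for i in keep]


print(dense_to_sparse([0.0, 3.0, 1e-12, -2.5]))  # [(1, 3.0), (3, -2.5)]
```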
Scale a vector to unit length. The only exception is the zero vector, which is returned unchanged.
If the input is sparse (list of 2-tuples), output will also be sparse. Otherwise, output will be a numpy array.
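A minimal sketch of both cases, assuming Euclidean (L2) length and treating a list of 2-tuples as sparse input; unit_vector is a hypothetical name.

```python
import math

import numpy as np


def unit_vector(vec):
    """Hypothetical sketch: scale `vec` to unit Euclidean (L2) length.

    A list of 2-tuples is treated as sparse and stays sparse; anything else
    is converted to a numpy array. The zero vector is returned unchanged.
    """
    if isinstance(vec, list) and all(isinstance(x, tuple) and len(x) == 2 for x in vec):
        norm = math.sqrt(sum(v * v for _, v in vec))
        if norm == 0:
            return vec  # zero (or empty) sparse vector
        return [(i, v / norm) for i, v in vec]
    arr = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(arr)
    if norm == 0:
        return arr  # zero dense vector
    return arr / norm


print(unit_vector([(0, 3.0), (2, 4.0)]))  # [(0, 0.6), (2, 0.8)]
print(unit_vector([3.0, 0.0, 4.0]))
```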