models.tfidfmodel – TF-IDF model

class gensim.models.tfidfmodel.TfidfModel(corpus=None, id2word=None, dictionary=None, normalize=True)

Objects of this class transform a word-document co-occurrence matrix (integer counts) into a locally/globally weighted matrix (positive floats).

This is done by combining the term frequency counts (the TF part) with inverse document frequency counts (the IDF part), optionally normalizing the resulting documents to unit length.
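The weighting described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name `tfidf_weights`, the base-2 logarithm of `total_docs / df` as the IDF formula, and the sample numbers are all assumptions made for illustration.

```python
import math

def tfidf_weights(doc, dfs, total_docs, normalize=True):
    """Weight a bag-of-words document given as (term_id, count) pairs.

    Hypothetical helper; the IDF formula log2(total_docs / df) is an assumption.
    """
    # TF part (raw count) multiplied by IDF part (rarer terms weigh more).
    weighted = [(term_id, count * math.log2(total_docs / dfs[term_id]))
                for term_id, count in doc]
    if normalize:
        # Scale the vector to unit Euclidean length, skipping all-zero vectors.
        length = math.sqrt(sum(w * w for _, w in weighted))
        if length > 0:
            weighted = [(term_id, w / length) for term_id, w in weighted]
    return weighted

dfs = {0: 1, 1: 2, 2: 4}        # made-up document frequencies for three terms
doc = [(0, 1), (1, 2), (2, 3)]  # term 0 once, term 1 twice, term 2 three times
vec = tfidf_weights(doc, dfs, total_docs=4)
```

In this example term 2 occurs in all four documents, so its IDF is zero and it contributes nothing to the result, however often it appears in the document.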

The main methods are:

  1. the constructor, which calculates IDF weights for all terms in the training corpus.
  2. the [] method, which transforms a simple count representation into the TF-IDF space.

>>> tfidf = TfidfModel(corpus)
>>> print(tfidf[some_doc])
>>> tfidf.save('/tmp/foo.tfidf_model')

Model persistence is achieved via the load/save methods.

normalize dictates whether the transformed vectors will be set to unit length.

If dictionary is specified, it must be a corpora.Dictionary object and it will be used to directly construct the inverse document frequency mapping (then corpus, if specified, is ignored).

initialize(corpus)

Compute inverse document frequency weights, which will be used to modify term frequencies for documents.
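Before any inverse document frequency can be computed, the document frequency of each term has to be counted over the training corpus. The sketch below shows one plausible way to do that in a single pass; the helper name `document_frequencies` is hypothetical, and it assumes each term id appears at most once per document, as in a bag-of-words representation.

```python
def document_frequencies(corpus):
    """Return ({term_id: number of documents containing it}, document count).

    Hypothetical helper, not part of the library.
    """
    dfs = {}
    total_docs = 0
    for doc in corpus:
        total_docs += 1
        for term_id, _count in doc:
            # Count each document once per term, regardless of the term count.
            dfs[term_id] = dfs.get(term_id, 0) + 1
    return dfs, total_docs

# Three made-up documents as lists of (term_id, count) pairs.
corpus = [
    [(0, 1), (1, 2)],
    [(1, 1)],
    [(1, 3), (2, 1)],
]
dfs, total_docs = document_frequencies(corpus)
```

Because the corpus is only iterated, never indexed, the same loop would also work on a corpus that is streamed from disk.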

classmethod load(fname)

Load a previously saved object from file (also see save).

save(fname)

Save the object to file via pickling (also see load).

gensim.models.tfidfmodel.dfs2idfs(dfs, totaldocs)

Given a mapping of term->document frequency, construct a mapping of term->inverse document frequency.

gensim.models.tfidfmodel.idfs2dfs(idfs, totaldocs)

Inverse mapping for dfs2idfs.
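The relationship between the two mappings can be sketched as follows. The names `dfs_to_idfs` and `idfs_to_dfs` are hypothetical stand-ins, and the formula (base-2 logarithm of `total_docs / df`) is an assumption; this page does not state the exact formula the library uses.

```python
import math

def dfs_to_idfs(dfs, total_docs):
    """Map term -> document frequency to term -> IDF (assumed: log2(N / df))."""
    return {term_id: math.log2(total_docs / df) for term_id, df in dfs.items()}

def idfs_to_dfs(idfs, total_docs):
    """Invert the mapping above; rounding absorbs floating-point error."""
    return {term_id: int(round(total_docs / 2 ** idf))
            for term_id, idf in idfs.items()}

dfs = {0: 1, 1: 5, 2: 10}  # made-up document frequencies
idfs = dfs_to_idfs(dfs, total_docs=10)
recovered = idfs_to_dfs(idfs, total_docs=10)
```

Under this formula the inverse needs the same `total_docs` value that was used in the forward direction, which is why both functions take it as an argument.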