What’s new?
Version 0.7 is out!
gensim now completes LSI of the English Wikipedia (3.2 million documents) in 5 hours 14 minutes, using a one-pass SVD algorithm, on a single MacBook Pro laptop. Be sure to check out the distributed mode, too.
For an overview of what gensim does (and does not do), go to the introduction.
To download and install gensim, consult the install page.
For examples of how to use it, try the tutorials.
>>> from gensim import corpora, models, similarities
>>>
>>> # load corpus iterator from a Matrix Market file on disk
>>> corpus = corpora.MmCorpus('/path/to/corpus.mm')
>>>
>>> # initialize a transformation (Latent Semantic Indexing with 200 latent dimensions)
>>> lsi = models.LsiModel(corpus, numTopics=200)
>>>
>>> # convert the same corpus to latent space and index it
>>> index = similarities.MatrixSimilarity(lsi[corpus])
>>>
>>> # perform a similarity query against the whole corpus; query is assumed to be a vector already in LSI space
>>> sims = index[query]
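
To make the final query step more concrete, here is a minimal pure-Python sketch of scoring one vector against a small set of indexed vectors, assuming cosine similarity as the measure. It is an illustration only, not gensim's implementation: the helper names `cosine` and `query_index` and the toy two-dimensional vectors are hypothetical, and a real index works on matrices rather than Python lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # An all-zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def query_index(index_vectors, query):
    """Return (document number, similarity) pairs, most similar first."""
    sims = [(docno, cosine(vec, query)) for docno, vec in enumerate(index_vectors)]
    return sorted(sims, key=lambda item: -item[1])


# Hypothetical toy data: three documents in a 2-dimensional latent space.
toy_index = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [2.0, 0.0]

ranked = query_index(toy_index, toy_query)
for docno, sim in ranked:
    print(docno, round(sim, 4))
```

Sorting the results is a choice made here for readability; the scores themselves depend only on the angle between vectors, so scaling the query does not change them.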