Metadata-Version: 2.4
Name: ellama
Version: 0.3.2
Summary: Embeddings interface for Ollama
Author-email: Casper da Costa-Luis <casper.dcl@physics.org>
License-Expression: MPL-2.0
Project-URL: repository, https://github.com/casperdcl/ellama
Keywords: vector,embeddings,pca,t-sne,semantic-search,umap,faiss,langchain,ollama
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: langchain-community
Requires-Dist: langchain-ollama
Requires-Dist: requests
Requires-Dist: tqdm
Provides-Extra: cpu
Requires-Dist: faiss-cpu; extra == "cpu"
Provides-Extra: gpu
Requires-Dist: faiss-gpu; extra == "gpu"
Provides-Extra: dev
Requires-Dist: pytest-cov; extra == "dev"
Provides-Extra: plot
Requires-Dist: matplotlib; extra == "plot"
Requires-Dist: scikit-learn; extra == "plot"
Provides-Extra: umap
Requires-Dist: umap-learn; extra == "umap"

# Ellama

*Embeddings library*

[![ollama](https://img.shields.io/badge/models-ollama-black?logo=ollama)](https://ollama.com/search?c=embedding)
[![faiss](https://img.shields.io/badge/database-faiss-blue?logo=facebook)](https://github.com/facebookresearch/faiss)
[![langchain](https://img.shields.io/badge/glue-langchain-cyan?logo=langchain)](https://github.com/langchain-ai/langchain)

[![pca](https://img.shields.io/badge/projection-PCA-orange?logo=scikit-learn)](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)
[![t-sne](https://img.shields.io/badge/projection-t--SNE-orange?logo=scikit-learn)](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)
[![umap](https://img.shields.io/badge/projection-UMAP-orange)](https://umap-learn.readthedocs.io)
[![plot](https://img.shields.io/badge/plot-matplotlib-green)](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.scatter.html)

[![test](https://github.com/casperdcl/ellama/actions/workflows/test.yml/badge.svg)](https://github.com/casperdcl/ellama/actions/workflows/test.yml)

Unlike the overwhelming majority of alternatives:

- handles long inputs without truncation even if the underlying model has a small context window
- minimal config
- 100% human-written (thus sane & clean) codebase
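A common way to embed text longer than a model's context window, without truncating it, is to split the text into chunks that fit, embed each chunk, and average the chunk vectors weighted by chunk length. The sketch below only illustrates that general technique and makes several assumptions. It is not necessarily what ellama does internally. The character-based `max_chars` limit stands in for a real token limit. `toy_embed` is a hypothetical stand-in for an actual Ollama embedding call.

```python
import math
from typing import Callable, List


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a model call: characters hashed into `dim` buckets, L2-normalised."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def chunk(text: str, max_chars: int) -> List[str]:
    """Split `text` into consecutive pieces of at most `max_chars` characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]


def embed_long(text: str, max_chars: int,
               embed: Callable[[str], List[float]] = toy_embed) -> List[float]:
    """Embed every chunk, then take a length-weighted mean so no input is dropped."""
    pieces = chunk(text, max_chars)
    vectors = [embed(p) for p in pieces]
    weights = [len(p) for p in pieces]
    total = sum(weights) or 1  # avoid division by zero for empty input
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total for d in range(dim)]


vec = embed_long("hello world " * 100, max_chars=64)
print(len(vec))  # 8
```

Weighting by chunk length keeps a short trailing chunk from counting as much as a full one.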

> [!TIP]
> Please open an issue if you know of any better alternatives. I would love to archive this repo.

```py
from ellama import EllamaDB, Document

db = EllamaDB("test")
db.add_documents([
    Document("hello world", id="salutation"),
    Document("goodbye and goodnight", id="farewell")])
docs = db.similarity_search("Greetings, Earth!", k=1)
assert len(docs) == 1
assert docs[0].id == "salutation"
```
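Conceptually, `similarity_search(query, k=1)` embeds the query and returns the `k` stored documents whose vectors lie closest to it. Below is a brute-force sketch of that idea. It assumes Euclidean distance, which is the metric of a flat FAISS L2 index; ellama's actual index and metric are not specified here. Made-up 3-d vectors stand in for real model output:

```python
import math
from typing import Dict, List, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query: Sequence[float], store: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k stored vectors nearest to `query` (brute force)."""
    return sorted(store, key=lambda doc_id: l2(query, store[doc_id]))[:k]


# made-up vectors standing in for real embeddings
store = {
    "salutation": [0.9, 0.1, 0.0],
    "farewell": [0.1, 0.2, 0.9],
}
print(knn([0.8, 0.2, 0.1], store, k=1))  # ['salutation']
```

FAISS does the same job with optimised indexes, which matters as the store grows.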

## plot

Embedding database visualisation:

```py
import matplotlib.pyplot as plt
from ellama import EllamaDB, Document
from sklearn.datasets import fetch_20newsgroups

raw = fetch_20newsgroups(data_home='.cache')
db = EllamaDB("20newsgroups")
db.add_documents([
    Document(raw.data[i], id=str(i), metadata={'name': raw.target_names[raw.target[i]]})
    for i in range(200)])

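for group in ['alt.atheism', 'comp', 'misc.forsale', 'rec', 'rec.sport', 'sci',
              'soc.religion', 'talk.politics', 'talk.religion']:
    # match the group itself (e.g. 'alt.atheism') as well as its subgroups (e.g. 'comp.graphics')
    db.plot('t-SNE', label=group,
            filter=lambda metadata: metadata['name'] == group
            or metadata['name'].startswith(f'{group}.'))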

plt.title(f"Newsgroup {db.embeddings.model} embeddings t-SNE")
plt.legend()
plt.show()
```

![Newsgroup embeddings t-SNE plot](https://img.cdcl.ml/ellama-plot.svg)
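`db.plot('PCA', ...)` has to reduce each high-dimensional embedding to 2-D before scattering it, and `filter` picks which documents to draw. The sketch below shows both steps under several assumptions. It uses a plain PCA via NumPy's SVD, whereas the badges indicate ellama itself uses scikit-learn. Random vectors with made-up newsgroup metadata stand in for a real database. `pca_2d` and `select` are hypothetical helpers, not ellama functions:

```python
import numpy as np


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project rows of `vectors` onto their first two principal components."""
    centred = vectors - vectors.mean(axis=0)
    # rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T


def select(metadatas, predicate) -> np.ndarray:
    """Boolean mask of documents whose metadata satisfies `predicate`."""
    return np.array([predicate(m) for m in metadatas], dtype=bool)


rng = np.random.default_rng(0)
vectors = rng.normal(size=(6, 16))  # 6 made-up 16-d "embeddings"
names = ["sci.space", "sci.med", "rec.autos", "alt.atheism", "sci.crypt", "rec.autos"]
metadatas = [{"name": n} for n in names]

xy = pca_2d(vectors)
mask = select(metadatas, lambda m: m["name"].startswith("sci."))
print(xy[mask].shape)  # (3, 2)
```

The selected points could then be drawn with `plt.scatter(*xy[mask].T, label='sci')`.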

## install

### `pip` (CPU)

```sh
pip install "ellama[cpu]"           # basic
pip install "ellama[cpu,plot]"      # plot('PCA' or 't-SNE')
pip install "ellama[cpu,plot,umap]" # plot('UMAP')
```
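The following standard-library sketch shows which extras are actually importable in the current environment. The extra-to-module mapping is read off the package metadata above and is not an ellama API. `faiss-cpu` and `faiss-gpu` both install the `faiss` module, `scikit-learn` installs `sklearn`, and `umap-learn` installs `umap`:

```python
from importlib.util import find_spec
from typing import Dict, List

# extra -> import names of the packages it pulls in (per the package metadata)
EXTRAS: Dict[str, List[str]] = {
    "cpu": ["faiss"],                   # faiss-cpu
    "plot": ["matplotlib", "sklearn"],  # matplotlib, scikit-learn
    "umap": ["umap"],                   # umap-learn
}


def installed_extras() -> Dict[str, bool]:
    """Map each extra to whether all of its modules can be found."""
    return {extra: all(find_spec(mod) is not None for mod in mods)
            for extra, mods in EXTRAS.items()}


print(installed_extras())
```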

### `conda` (GPU)

```yml
name: ellama
channels: [pytorch, nvidia, conda-forge]
dependencies:
- langchain 1.*
- langchain-community
- faiss-gpu
- requests
- tqdm
#- matplotlib   # ellama plot()
#- scikit-learn # ellama plot()
#- umap-learn   # ellama plot('UMAP')
- pip
- pip:
  - ellama
```
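After creating and activating the environment, you can run a quick check that the GPU build of FAISS is the one being imported. For example, use `conda env create -f environment.yml && conda activate ellama`; the file name `environment.yml` is an assumption. `faiss.get_num_gpus` comes from FAISS's Python bindings. The fallbacks in this sketch return 0 for a missing or CPU-only install:

```python
def faiss_gpu_count() -> int:
    """Number of GPUs visible to FAISS; 0 if faiss is missing or lacks GPU support."""
    try:
        import faiss
    except ImportError:
        return 0
    # guard against builds that do not expose get_num_gpus
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    return int(get_num_gpus()) if get_num_gpus else 0


print(faiss_gpu_count())
```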
