Metadata-Version: 2.4
Name: pyllmsearch
Version: 0.9.5
Summary: LLM Powered Advanced RAG Application
Project-URL: Homepage, https://github.com/snexus/llm-search
Project-URL: Documentation, https://llm-search.readthedocs.io/en/latest/
Keywords: llm,rag,retrieval-augmented-generation,large-language-models,local,splade,hyde,reranking,chroma,openai
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain-community>=0.4.1
Requires-Dist: langchain>=1.2.4
Requires-Dist: langchain-huggingface>=1.2.0
Requires-Dist: langchain-chroma>=1.1.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: loguru>=0.7.3
Requires-Dist: click>=8.3.1
Requires-Dist: openai>=2.15.0
Requires-Dist: streamlit>=1.53.0
Requires-Dist: tenacity>=9.1.2
Requires-Dist: tqdm>=4.67.1
Requires-Dist: gmft==0.2.1
Requires-Dist: pypdf2>=3.0.1
Requires-Dist: pydantic>=2.12.5
Requires-Dist: instructorembedding>=1.0.1
Requires-Dist: unstructured>=0.18.27
Requires-Dist: tiktoken>=0.12.0
Requires-Dist: tokenizers>=0.22.2
Requires-Dist: langchain-openai>=1.1.7
Requires-Dist: python-docx>=1.2.0
Requires-Dist: pymupdf>=1.26.7
Requires-Dist: termcolor>=3.3.0
Requires-Dist: fastapi-mcp>=0.4.0
Requires-Dist: scipy>=1.15.3
Requires-Dist: sentence-transformers>=5.2.0
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: autodoc_pydantic; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: sphinx-markdown-builder; extra == "dev"
Requires-Dist: sphinx_rtd_theme; extra == "dev"
Provides-Extra: azureparser
Requires-Dist: azure-ai-documentintelligence==1.0.0b3; extra == "azureparser"
Requires-Dist: azure-identity==1.17.1; extra == "azureparser"
Provides-Extra: googleparser
Requires-Dist: google-generativeai>=0.8.5; extra == "googleparser"
Dynamic: license-file

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/snexus/llm-search/blob/main/notebooks/llmsearch_google_colab_demo.ipynb)

# pyLLMSearch - Advanced RAG

[Documentation](https://llm-search.readthedocs.io/en/latest/)

This package provides an advanced question-answering (RAG) system for a collection of local documents, driven by a simple YAML-based configuration. It pays special attention to components that go **beyond a basic LLM-based RAG**: better document parsing, hybrid search, HyDE, chat history, deep linking, re-ranking, customizable embeddings, and more. The package is designed to work with custom Large Language Models (LLMs), whether from OpenAI or installed locally.

Interaction with the package is supported through the built-in frontend, or by exposing an MCP server, allowing clients like Cursor, Windsurf or VSCode GH Copilot to interact with the RAG system.

## Features

* Fast parsing and embedding of medium-sized document bases (tested on up to a few gigabytes of Markdown and PDF files).
* Incremental updates for new documents, without the need to re-index the entire document base.
* Supported document formats
    * Built-in parsers:
        * `.md` - Divides files based on logical components such as headings, subheadings, and code blocks. Supports additional features like cleaning image links, adding custom metadata, and more.
        * `.pdf` - MuPDF-based parser.
        * `.docx` - custom parser, supports nested tables.
    * Other common formats are supported by the `Unstructured` pre-processor:
        * For the list of formats, see [here](https://unstructured-io.github.io/unstructured/core/partition.html).
* FastAPI-based API + MCP server, allowing communication with the RAG system via any MCP client, including VSCode, Windsurf, Cursor, and others.

* Deep linking into document sections - jump to an individual PDF page or a header in a markdown file.

* Allows interaction with embedded documents, internally supporting the following models and methods (including locally hosted):
    * OpenAI compatible models and APIs.
    * HuggingFace models.

* Interoperability with LiteLLM + Ollama via OpenAI API, supporting hundreds of different models (see [Model configuration for LiteLLM](sample_templates/llm/litellm.yaml))

* SSE MCP server for interfacing with popular MCP clients.

* Hybrid search and Reranking
    * Generates dense embeddings from a folder of documents and stores them in a vector database ([ChromaDB](https://github.com/chroma-core/chroma)).
    * The following embedding models are supported:
        * Hugging Face embeddings.
        * Sentence-transformers-based models.
        * Instructor-based models.
        * OpenAI embeddings.

    * Sparse embeddings using SPLADE (https://github.com/naver/splade) to enable hybrid search (sparse + dense).

    * Supports the "Retrieve and Re-rank" strategy for semantic search, see [here](https://www.sbert.net/examples/applications/retrieve_rerank/README.html).
        * Besides the original `ms-marco-MiniLM` cross-encoder, the more modern `bge-reranker-v2-m3` and `zerank-2` are supported.


* Support for table parsing via open-source gmft (https://github.com/conjuncts/gmft) or Azure Document Intelligence.

* Optional support for image parsing using Gemini API.


* Supports HyDE (Hypothetical Document Embeddings) - see [here](https://arxiv.org/pdf/2212.10496.pdf).
    * WARNING: Enabling HyDE (via config OR webapp) can significantly alter the quality of the results. Please make sure to read the paper before enabling.
    * From my own experiments, enabling HyDE significantly boosts the quality of the output on topics where the user can't formulate the question using the domain-specific language of the topic, e.g. when learning new topics.

* Support for multi-querying, inspired by `RAG Fusion` - https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1
    * When multi-querying is turned on (via either config or webapp), the original query is replaced by 3 variants of the same query, which helps bridge gaps in terminology and "offer different angles or perspectives", according to the article.

* Supports optional chat history with question contextualization.

* Other features
    * Simple web interfaces.
    * Ability to save responses to an offline database for future analysis.
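
To make the HyDE feature listed above more concrete, here is a minimal, self-contained sketch of the idea: instead of searching with the user's question, the system searches with a hypothetical answer drafted by an LLM, which tends to use the vocabulary of the domain. This is not the package's implementation. The bag-of-words `embed` function, the canned `generate_hypothetical_answer` stub (standing in for a real LLM call), and the two sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a dense model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_hypothetical_answer(question):
    """Hypothetical stand-in for an LLM call; returns a canned draft answer."""
    return ("a memory leak occurs when heap allocations are never "
            "released by the garbage collector")


def search(query_text, documents):
    """Return the document most similar to the query text."""
    query_vec = embed(query_text)
    return max(documents, key=lambda doc: cosine(query_vec, embed(doc)))


documents = [
    "diagnosing a memory leak with heap snapshots and garbage collector logs",
    "why sorting a list of numbers takes n log n time",
]
question = "why does my app get slow the longer it runs"

# Plain search: the lay-language question shares no terms with the relevant document.
plain_hit = search(question, documents)

# HyDE-style search: embed the hypothetical answer instead of the question.
hyde_hit = search(generate_hypothetical_answer(question), documents)

print("plain:", plain_hit)
print("hyde: ", hyde_hit)
```

In this toy setup the plain query matches the sorting document on the word "why" alone, while the hypothetical answer retrieves the memory-leak document. The same mechanism explains the warning above: if the drafted answer is off-topic, retrieval follows it.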
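
As a rough illustration of the multi-querying feature above, the sketch below merges the result lists returned for several query variants using reciprocal rank fusion, the scheme described in the RAG Fusion article. It is not taken from this package's code: the function name, the constant `k=60`, and the document ids are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) for every list it appears in,
    so documents ranked highly by several query variants rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical search results for three variants of the same question.
results_per_variant = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]

fused = reciprocal_rank_fusion(results_per_variant)
print(fused)  # ['doc_b', 'doc_a', 'doc_c', 'doc_d']
```

Because the fusion uses only ranks and not raw scores, the same function could in principle combine lists whose scores are not comparable, such as a sparse and a dense ranking. Whether this package does so is not stated here.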



## Demo

![Demo](media/llmsearch-demo-v2.gif)


## Documentation

[Browse Documentation](https://llm-search.readthedocs.io/en/latest/)
