Metadata-Version: 2.4
Name: deep-semantic-search
Version: 0.1.0
Summary: A library for embedding, indexing, and semantically searching text and image data
Home-page: https://github.com/yourusername/deep-semantic-search
Author: Your Name
Author-email: Your Name <your.email@example.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/yourusername/deep-semantic-search
Project-URL: Bug Tracker, https://github.com/yourusername/deep-semantic-search/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: sentence-transformers>=2.7.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: faiss-cpu>=1.8.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.22.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: beautifulsoup4>=4.9.0
Requires-Dist: kmeans-pytorch>=0.3
Requires-Dist: langchain>=0.1.0
Requires-Dist: langchain-community>=0.0.1
Requires-Dist: pillow>=10.0.0
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# Deep Semantic Search

A Python library for embedding, indexing, and semantically searching text and image data.

## Features

- **Multi-modal Semantic Search**:
  - Embed and index text data with the nli-mpnet-base-v2 model
  - Embed and index image data with the CLIP model
  - Run semantic search over both text and image data
  - Search images by either an image query or a text query

- **Clustering and Image Captioning**:
  - Cluster image embeddings using PyTorch KMeans (with GPU support)
  - Caption images using the BLIP model

- **Retrieval-Augmented Generation (RAG)**:
  - Answer questions based on search results
  - Summarize search results
  - Generate topics for image captions
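
To make the idea behind the search features concrete, here is a minimal, dependency-free sketch of ranking items by cosine similarity between embedding vectors. It is not this library's implementation: the helper names (`cosine_similarity`, `top_n`) and the three-dimensional toy vectors are assumptions made for illustration. Real embeddings come from a model and have hundreds of dimensions, and an index such as FAISS typically replaces the linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_n(query_vec, index, n=2):
    """Return the n (key, score) pairs most similar to query_vec."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical toy "embeddings"; a real model would produce these vectors.
toy_index = {
    "cats.txt": [0.9, 0.1, 0.0],
    "dogs.txt": [0.8, 0.3, 0.1],
    "taxes.txt": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

results = top_n(query, toy_index, n=2)
for name, score in results:
    print(f"{score:.3f}  {name}")
```

The same ranking step applies to text and image search alike: once both the query and the items live in the same vector space, only the model that produces the vectors differs.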

## Installation

```bash
pip install deep-semantic-search
```

## Quick Start

### Text Search

```python
from deep_semantic_search import LoadTextData, TextEmbedder, TextSearch

# Load text data
loader = LoadTextData()
corpus_dict = loader.from_folder("path/to/text/files")

# Embed the text data
embedder = TextEmbedder()
embedder.embed(corpus_dict)

# Search for similar texts
search = TextSearch()
results = search.find_similar("your search query", top_n=5)

for result in results:
    print(f"Score: {result['score']}, Text: {result['text'][:100]}...")
```

### Image Search

```python
from deep_semantic_search import LoadImageData, ImageSearch

# Load image data
loader = LoadImageData()
image_paths = loader.from_folder("path/to/images")

# Set up image search
searcher = ImageSearch(image_paths)

# Search for images similar to a text query
results = searcher.get_similar_images_to_text("cat on a sofa", number_of_images=5)

# Display results
for path, score in results.items():
    print(f"Score: {score}, Image: {path}")
```

### RAG (Retrieval-Augmented Generation)

```python
from deep_semantic_search import ask_question

# Ask a question based on provided text data
texts = ["Text document 1...", "Text document 2..."]
answer = ask_question(texts, "What is the main topic discussed?")
print(answer)
```
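
For readers new to RAG, the sketch below shows the general pattern: retrieved passages are packed into a prompt together with the question before being sent to a language model. This is a generic illustration and not how `ask_question` is implemented. The function name `build_rag_prompt`, the `max_chars` budget, and the prompt wording are all hypothetical.

```python
def build_rag_prompt(passages, question, max_chars=2000):
    """Assemble a prompt from retrieved passages and a question.

    Hypothetical helper: passages are added in order until the
    character budget would be exceeded.
    """
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    ["Text document 1...", "Text document 2..."],
    "What is the main topic discussed?",
)
print(prompt)
```

Numbering the passages lets the model, or a reader inspecting the prompt, refer back to a specific source.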

## Requirements

- Python 3.8+
- PyTorch
- Sentence Transformers
- Hugging Face Transformers
- FAISS
- LangChain
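
As a quick sanity check after installation, the following sketch reports which of the packages listed above cannot be found in the current environment. It uses only the standard library and does not import the packages themselves. The helper name `missing_modules` is an assumption made for this example; the import names (`torch`, `sentence_transformers`, `transformers`, `faiss`, `langchain`) are the usual ones for these packages.

```python
from importlib.util import find_spec

# Display name -> top-level import name
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "Sentence Transformers": "sentence_transformers",
    "Hugging Face Transformers": "transformers",
    "FAISS": "faiss",
    "LangChain": "langchain",
}


def missing_modules(modules):
    """Return display names whose import name cannot be located."""
    return [label for label, name in modules.items() if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All listed requirements were found.")
```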

## License

MIT
