Metadata-Version: 2.1
Name: vector-inspector
Version: 0.7.0
Summary: A comprehensive desktop application for visualizing, querying, and managing vector database data
Author-Email: Anthony Dawson <anthonypdawson+github@gmail.com>
License: MIT
Project-URL: Homepage, https://vector-inspector.divinedevops.com
Project-URL: Source, https://github.com/anthonypdawson/vector-inspector
Project-URL: Issues, https://github.com/anthonypdawson/vector-inspector/issues
Project-URL: Documentation, https://github.com/anthonypdawson/vector-inspector#readme
Requires-Python: >=3.11
Requires-Dist: chromadb>=0.4.22
Requires-Dist: qdrant-client>=1.7.0
Requires-Dist: pyside6>=6.6.0
Requires-Dist: PySide6-Addons>=6.6.3.1
Requires-Dist: pandas>=2.1.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: umap-learn>=0.5.5
Requires-Dist: plotly>=5.18.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: fastembed>=0.7.4
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pinecone>=8.0.0
Requires-Dist: keyring>=25.7.0
Requires-Dist: hf-xet>=1.2.0
Requires-Dist: lancedb>=0.27.0
Requires-Dist: psycopg2-binary>=2.9.11
Requires-Dist: pgvector>=0.4.2
Requires-Dist: pymilvus>=2.6.8
Requires-Dist: hdbscan>=0.8.41
Requires-Dist: weaviate-client>=4.19.2
Requires-Dist: gputil>=1.4.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: Pillow>=10.0.0
Requires-Dist: pypdf>=4.0.0
Requires-Dist: python-docx>=1.1.0
Provides-Extra: llm
Requires-Dist: llama-cpp-python>=0.3.0; extra == "llm"
Description-Content-Type: text/markdown

# Vector Inspector 0.7.0 — April 5, 2026  
A major milestone release introducing **full text and image ingestion**, **multimodal embeddings**, and **inline file previews**.  
You can now build real multimodal collections, inspect their embeddings, and run text→image semantic search directly inside Vector Inspector.

---

# 🚀 Highlights

### **✓ Multimodal Ingestion (Images + Documents)**
Import images, text files, PDFs, Word docs, and source files into any collection.  
Images are embedded with CLIP (512‑dim).  
Documents are chunked and embedded with MiniLM (384‑dim).

### **✓ Text → Image Semantic Search**
You can now type a natural‑language query and retrieve matching images from your collection using CLIP’s shared embedding space.
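CLIP embeds text and images into one shared 512-dim space. As a result, text→image search comes down to ranking the stored image vectors by their similarity to the query vector. The sketch below shows only that ranking step and assumes the embeddings are already computed. `l2_normalize` and `rank_by_cosine` are hypothetical helpers, not Vector Inspector's API, and the 3-dim toy vectors stand in for real CLIP output:

```python
import math
from collections.abc import Mapping, Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a flat vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_by_cosine(
    query: Sequence[float],
    items: Mapping[str, Sequence[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the top_k item ids ranked by cosine similarity to the query."""
    q = l2_normalize(query)
    scored = []
    for item_id, vec in items.items():
        v = l2_normalize(vec)
        # For unit vectors the dot product equals cosine similarity.
        scored.append((item_id, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for a CLIP text embedding and stored image embeddings.
text_query = [0.9, 0.1, 0.0]
images = {
    "cat.jpg": [1.0, 0.2, 0.0],
    "car.png": [0.0, 0.1, 1.0],
    "dog.jpg": [0.7, 0.6, 0.1],
}
results = rank_by_cosine(text_query, images, top_k=2)
print([name for name, _ in results])  # ['cat.jpg', 'dog.jpg']
```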

### **✓ Inline File Previews**
Images and text files now show thumbnails or text snippets directly in the details pane and item dialog, making it easy to verify ingestion and debug metadata.

### **✓ Robust, Production‑Ready Ingestion Pipeline**
Chunking, duplicate detection, partial‑ingest recovery, and detailed logging ensure ingestion is deterministic, resumable, and transparent.
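One way to read the three-way duplicate check is as a comparison between the chunks already stored for a file and the chunks that file should have. The sketch below assumes each stored chunk carries the `parent_id`, `chunk_index`, and `chunk_total` metadata described under Ingestion, and that `chunk_index` is 0-based. `classify_file`, `IngestAction`, and the `stored` mapping are hypothetical, not the pipeline's actual code:

```python
from collections.abc import Mapping, Sequence
from enum import Enum


class IngestAction(Enum):
    INGEST = "ingest"      # file not in the collection yet
    SKIP = "skip"          # every expected chunk is already stored
    REINGEST = "reingest"  # partial ingest: delete stale chunks, then ingest again


def classify_file(
    parent_id: str,
    stored: Mapping[str, Sequence[Mapping[str, int]]],
) -> IngestAction:
    """Decide how to handle a file from the chunk metadata already stored.

    `stored` maps parent_id -> metadata dicts of that file's stored chunks
    (a hypothetical in-memory view; chunk_index is assumed to be 0-based).
    """
    chunks = stored.get(parent_id, [])
    if not chunks:
        return IngestAction.INGEST
    expected = set(range(chunks[0]["chunk_total"]))
    present = {c["chunk_index"] for c in chunks}
    return IngestAction.SKIP if present == expected else IngestAction.REINGEST


stored = {
    "report.pdf": [{"chunk_index": i, "chunk_total": 3} for i in range(3)],
    "notes.txt": [{"chunk_index": 0, "chunk_total": 2}],
}
for name in ("report.pdf", "notes.txt", "new.docx"):
    print(name, classify_file(name, stored).value)
# report.pdf skip
# notes.txt reingest
# new.docx ingest
```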

---

# 🧩 Ingestion

- **Image ingestion pipeline** using CLIP (`openai/clip-vit-base-patch32`, 512‑dim)  
- **Document ingestion pipeline** using sentence-transformers (`all-MiniLM-L6-v2`, 384‑dim)  
- Import via **“Import Images…”** and **“Import Documents…”** in the Tools menu  
- Paragraph‑aware chunking for documents (1000 chars default) with `chunk_index`, `chunk_total`, `parent_id`, and file metadata  
- **Three‑way duplicate detection:**  
  - new files ingested  
  - fully-present files skipped  
  - partially-ingested files automatically cleaned and re‑ingested  
- **Re-ingest file…** context menu option for single-file overwrite  
- Lazy loading of heavy dependencies with clear install guidance  
- Ingestion dialog shows filename + progress (e.g. “3 of 42”)  
- Collections auto-refresh after ingestion  
- Per-file log entries restored (`Ingested image: …`, `Ingested document: …`)  
- Telemetry: `ingestion.started` and `ingestion.completed` with full metrics  
- New collections created at ingestion time via `CollectionService`  
- Backends without a configurable vector size show a read-only dimension label  
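Paragraph-aware chunking that produces the metadata above can be sketched in a few lines. The sketch assumes paragraphs are separated by blank lines and that any paragraph longer than the limit is hard-split. `chunk_document` is a hypothetical name, and the real chunker may behave differently:

```python
def chunk_document(text: str, parent_id: str, max_chars: int = 1000) -> list[dict]:
    """Pack blank-line-separated paragraphs into chunks of <= max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # A paragraph longer than the limit is hard-split into pieces.
        pieces = [para[i : i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return [
        {
            "text": chunk,
            "metadata": {
                "parent_id": parent_id,
                "chunk_index": i,
                "chunk_total": len(chunks),
            },
        }
        for i, chunk in enumerate(chunks)
    ]


doc = "A" * 600 + "\n\n" + "B" * 300 + "\n\n" + "C" * 500
for chunk in chunk_document(doc, parent_id="notes.txt"):
    # index 0: 902 chars (A + B paragraphs), index 1: 500 chars (C paragraph)
    print(chunk["metadata"], len(chunk["text"]))
```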

---

# 🖼️ File Preview

- New **File Preview** section in the inline details pane  
- Image thumbnails:  
  - 160×120 inline  
  - 320×240 in item dialog  
- Text previews:  
  - 30 lines / 2 KB inline  
  - 100 lines / 8 KB in dialog  
- Right-click actions: **Open** and **Reveal in Explorer/Finder/Files**  
- Double-click image → open in OS viewer  
- Metadata table now shows a 📎 icon for rows with previewable files  
- Preview detection via `find_preview_paths()` with safe fallbacks  
- Text detection via `mimetypes.guess_type` + null-byte sniff  
- Collapsed state persisted in settings  
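The text-detection rule and the preview limits can be approximated with the standard library alone. Several parts of the sketch are illustrative assumptions rather than the app's code: the names `looks_like_text` and `preview_text`, the `TEXT_LIKE` set, and the 8 KB sniff size. The defaults mirror the inline limits (30 lines / 2 KB):

```python
import mimetypes
from pathlib import Path

# Non-"text/*" MIME types that are still readable text (assumed list).
TEXT_LIKE = {"application/json", "application/xml", "application/javascript"}


def looks_like_text(path: Path, sniff_bytes: int = 8192) -> bool:
    """MIME guess first, then a null-byte sniff of the file's first bytes."""
    mime, _ = mimetypes.guess_type(path.name)
    if mime is not None and not (mime.startswith("text/") or mime in TEXT_LIKE):
        return False  # known binary type, e.g. image/png or application/pdf
    with path.open("rb") as fh:
        head = fh.read(sniff_bytes)
    # NUL bytes are a strong signal of binary content.
    return b"\x00" not in head


def preview_text(path: Path, max_lines: int = 30, max_bytes: int = 2048) -> str:
    """Read at most max_bytes, then keep at most max_lines lines."""
    with path.open("rb") as fh:
        raw = fh.read(max_bytes)
    # errors="replace" keeps a multi-byte character cut at the limit from raising.
    text = raw.decode("utf-8", errors="replace")
    return "\n".join(text.splitlines()[:max_lines])
```

The item dialog would use the same helper with the larger limits, e.g. `preview_text(path, max_lines=100, max_bytes=8192)`.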

---

# 🛠️ Bug Fixes & Stability

- Fixed `_flush()` in ingestion pipelines to correctly detect and raise on failed writes  
- Fixed CLIP crash on tiny images (<3×3 px) with a clear error message  
- Fixed embedding nesting: `_l2_normalize` now flattens to 1D  
- Truncated long error strings to avoid log flooding  
- Silenced noisy third‑party loggers (`chromadb`, `sentence_transformers`, etc.)  
- Fixed UI crash when metadata contained non‑JSON‑serializable types (e.g. `uuid.UUID`) via a new metadata sanitizer  
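A metadata sanitizer like the one in the last fix could be as small as a recursive conversion to JSON-safe types. The sketch assumes a `str()` fallback for unknown types such as `uuid.UUID`. `sanitize_metadata` is a hypothetical name, not the project's implementation:

```python
import json
import uuid
from datetime import datetime
from typing import Any


def sanitize_metadata(value: Any) -> Any:
    """Recursively convert a metadata value into JSON-serializable types."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, dict):
        return {str(k): sanitize_metadata(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [sanitize_metadata(v) for v in value]
    if isinstance(value, datetime):
        return value.isoformat()
    # Fallback for UUIDs, Decimals, numpy scalars, and other unknown types.
    return str(value)


meta = {"id": uuid.UUID("12345678-1234-5678-1234-567812345678"), "tags": ("a", "b")}
print(json.dumps(sanitize_metadata(meta)))
# {"id": "12345678-1234-5678-1234-567812345678", "tags": ["a", "b"]}
```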

---

# 🎉 Summary

0.7.0 transforms Vector Inspector into a **true multimodal semantic debugging tool**.  
You can now ingest real documents and images, inspect their embeddings, preview their contents, and run text→image semantic search — all with a stable, production‑grade ingestion pipeline.

---

# Vector Inspector

[![CI](https://github.com/anthonypdawson/vector-inspector/actions/workflows/ci-tests.yml/badge.svg?branch=master)](https://github.com/anthonypdawson/vector-inspector/actions/workflows/ci-tests.yml) [![Coverage Status](https://coveralls.io/repos/github/anthonypdawson/vector-inspector/badge.svg?branch=master)](https://coveralls.io/github/anthonypdawson/vector-inspector?branch=master)
[![Publish](https://github.com/anthonypdawson/vector-inspector/actions/workflows/release-and-publish.yml/badge.svg?branch=master)](https://github.com/anthonypdawson/vector-inspector/actions/workflows/release-and-publish.yml)

[![PyPI Version](https://img.shields.io/pypi/v/vector-inspector.svg?cacheSeconds=300)](https://pypi.org/project/vector-inspector/)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/vector-inspector?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/vector-inspector)

The ultimate toolkit for vector databases: a comprehensive desktop app to inspect, query, and visualize your embeddings across Chroma, Qdrant, Weaviate, Pinecone, LanceDB, pgvector, and more.


Similar to SQL viewers (DBeaver/TablePlus) but built for vector databases, Vector Inspector offers an intuitive GUI for exploring embeddings, metadata, similarity search, and CRUD across multiple providers.

<p align="center">
  <a href="site/images/demo.gif" target="_blank">
    <img src="site/images/demo.gif" alt="Vector Inspector Demo" width="600"/>
  </a>
</p>

**Quick Demo:** See Vector Inspector in action!

## Overview

Vector Inspector bridges the gap between vector databases and user-friendly data exploration tools. While vector databases are powerful for semantic search and AI applications, they often lack the intuitive inspection and management tools that traditional SQL databases have. This project aims to provide that missing layer.

---

## Homepage
[https://vector-inspector.divinedevops.com](https://vector-inspector.divinedevops.com)


# 🟦 Installation

## Quick Install (recommended)

These installers work on **macOS, Linux, and Windows (PowerShell or Git Bash)**.

### macOS & Linux
```bash
curl -fsSL https://vector-inspector.divinedevops.com/install.sh | bash
```

### Windows (PowerShell)
```powershell
powershell -c "iwr https://vector-inspector.divinedevops.com/install.ps1 -UseBasicParsing | iex"
```

### Windows (Git Bash)
```bash
curl -fsSL https://vector-inspector.divinedevops.com/install.sh | bash
```

These scripts:

- install Vector Inspector  
- create a desktop shortcut  
- launch the app immediately  

This is the easiest and most reliable way to get started.

## From PyPI

```bash
pip install vector-inspector
vector-inspector
```

## From a Downloaded Wheel or Tarball (e.g., GitHub Release)

Download the `.whl` or `.tar.gz` file from the [GitHub Releases](https://github.com/anthonypdawson/vector-inspector/releases) page, then install with:

```bash
pip install <your-filename.whl>
# or
pip install <your-filename.tar.gz>
```

After installation, run the application with:

```bash
vector-inspector
```
Note: `pip install` does **not** create a desktop shortcut.  
Use the Quick Install script for the full experience.

## From Source

```bash
# Clone the repository
git clone https://github.com/anthonypdawson/vector-inspector.git
cd vector-inspector

# Install dependencies using PDM
pdm install

# Launch application
scripts/run.sh     # Linux/macOS
scripts/run.bat    # Windows
```
---

# 🟩 Running Vector Inspector

```bash
vector-inspector
```
> This opens the full desktop application. The Quick Install script launches it automatically; if you installed via pip or from source, run the command above.

### Optional LLM runtime (llama-cpp-python)
`llama-cpp-python` is optional and only needed for the in-process LLM provider (`llama-cpp`).

- Install via PDM optional-dependency group (developer / recommended):

```bash
pdm install -G llm
```

- Platform-specific pip install (end users / PyPI):

Windows (pre-built CPU wheel index):
```powershell
pip install llama-cpp-python --prefer-binary `
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

Linux / macOS (official wheels / source build):
```bash
pip install llama-cpp-python
```

CUDA / GPU wheels (replace `cu121` with the index matching your CUDA version):
```bash
pip install llama-cpp-python --prefer-binary \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```

- Scripts included in this repo:
  - `scripts/install-llm-windows.ps1` — PowerShell helper for Windows
  - `scripts/install-llm-unix.sh` — bash helper for Linux/macOS

Notes:
- If you cannot build native wheels on Windows, use the Windows pre-built index above.
- **Vector Studio users** can use **Settings → LLM → "Download default model"** to automatically download the default Phi-3-mini GGUF model into the local cache. This button is disabled in the free Vector Inspector tier.
- **Free-tier users** should download a GGUF model manually (or use the scripts above) and set the path in Settings → LLM, or configure Ollama (local server) or an OpenAI-compatible API instead.

---
## Table of Contents

- [Key Features](#key-features)
- [Architecture](#architecture)
- [Use Cases](#use-cases)
- [Feature Access](#feature-access)
- [Roadmap](#roadmap)
- [Configuration](#configuration)
- [Development Setup](#development-setup)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgments](#acknowledgments)

## Key Features

> **Note:** Some features listed below are not yet started or are still in progress.

### 1. **Multi-Provider Support**
- Connect to vector databases:
  - ChromaDB (persistent local storage)
  - Qdrant (remote server or embedded local)
  - Pinecone (cloud-hosted)
  - Milvus (remote server or Milvus Lite; macOS/Linux only, experimental, in progress)
  - LanceDB (persistent local storage; requires `lancedb>=0.27.0` and `pyarrow>=14.0.0`)
  - pgvector/PostgreSQL (remote server)
  - Weaviate (local, remote, or Weaviate Cloud/WCD)
- Unified interface regardless of backend provider
- Automatically saves the last connection configuration
- Secure API key storage for cloud providers

### 2. **Data Visualization**
- **Metadata Explorer**: Browse and filter vector entries by metadata fields
- **Vector Dimensionality Reduction**: Visualize high-dimensional vectors in 2D/3D using:
  - t-SNE
  - UMAP
  - PCA
- **Cluster Visualization**: Color-code vectors by metadata categories or clustering results
- **Interactive Plots**: Zoom, pan, and select vectors for detailed inspection
- **Data Distribution Charts**: Histograms and statistics for metadata fields

### 3. **Search & Query Interface**
- **Similarity Search**: 
  - Text-to-vector search (with embedding model integration)
  - Vector-to-vector search
  - Find similar items to selected entries
  - Adjustable top-k results and similarity thresholds
- **Metadata Filtering**:
  - SQL-like query builder for metadata
  - Combine vector similarity with metadata filters
  - Advanced filtering: ranges, IN clauses, pattern matching
- **Hybrid Search**: Combine semantic search with keyword search
- **Query History**: Save and reuse frequent queries
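Combining vector similarity with metadata filters usually means filtering the candidates first, then ranking the survivors and applying top-k and a score threshold. The in-memory sketch below uses three stand-ins for the filter types above: a range check, an IN-style membership test, and a glob pattern. The record layout and `filtered_search` are hypothetical, and most providers evaluate such filters inside the database instead:

```python
import fnmatch
import math
from collections.abc import Callable
from typing import Any

Record = dict[str, Any]  # {"id": str, "vector": list[float], "metadata": dict}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(
    query: list[float],
    records: list[Record],
    where: Callable[[dict[str, Any]], bool],
    top_k: int = 5,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Apply the metadata filter, score survivors, drop those below min_score."""
    hits = []
    for r in records:
        if not where(r["metadata"]):
            continue
        score = cosine(query, r["vector"])
        if score >= min_score:
            hits.append((r["id"], score))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:top_k]


records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"year": 2021, "lang": "en", "file": "intro.md"}},
    {"id": "b", "vector": [0.8, 0.6], "metadata": {"year": 2023, "lang": "de", "file": "guide.md"}},
    {"id": "c", "vector": [0.6, 0.8], "metadata": {"year": 2022, "lang": "en", "file": "faq.txt"}},
    {"id": "d", "vector": [0.0, 1.0], "metadata": {"year": 2022, "lang": "en", "file": "misc.md"}},
]


def where(m: dict[str, Any]) -> bool:
    # Roughly: year BETWEEN 2020 AND 2023 AND lang IN ('en', 'de') AND file matches '*.md'
    return (
        2020 <= m["year"] <= 2023
        and m["lang"] in {"en", "de"}
        and fnmatch.fnmatch(m["file"], "*.md")
    )


print(filtered_search([1.0, 0.0], records, where, top_k=3, min_score=0.5))
```

Here `c` is excluded by the file pattern and `d` passes the filter but falls below `min_score`.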

### 4. **Data Management**
- **Browse Collections/Indexes**: View all available collections with statistics
- **CRUD Operations**:
  - View individual vectors and their metadata
  - Add new vectors (with auto-embedding options)
  - Update metadata fields
  - Delete vectors (single or batch)
- **Bulk Import/Export**:
  - Import from CSV, JSON, Parquet
  - Export query results to various formats
  - Backup and restore collections
- **Schema Inspector**: View collection configuration, vector dimensions, metadata schema
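Exporting results to CSV mostly means flattening each record's metadata into columns. The stdlib sketch below assumes results are shaped as `id` / `document` / `metadata` dicts, which is a hypothetical shape rather than any provider's response format. Metadata columns get a `meta.` prefix so they cannot collide with `id` or `document`:

```python
import csv
import io
import json
from typing import Any


def to_csv(results: list[dict[str, Any]]) -> str:
    """Flatten results into CSV: id, document, then one meta.<key> column per field."""
    meta_keys = sorted({k for r in results for k in r.get("metadata", {})})
    fieldnames = ["id", "document", *(f"meta.{k}" for k in meta_keys)]
    buf = io.StringIO()
    # Records missing a metadata key get an empty cell (DictWriter's restval).
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for r in results:
        row = {"id": r["id"], "document": r.get("document", "")}
        row.update({f"meta.{k}": v for k, v in r.get("metadata", {}).items()})
        writer.writerow(row)
    return buf.getvalue()


def to_json(results: list[dict[str, Any]]) -> str:
    # default=str turns non-JSON types (UUIDs, datetimes) into strings.
    return json.dumps(results, indent=2, default=str)


results = [
    {"id": "1", "document": "hello", "metadata": {"lang": "en"}},
    {"id": "2", "document": "hola", "metadata": {"lang": "es", "year": 2024}},
]
print(to_csv(results))
```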

### 5. **SQL-Like Experience**
- **Query Console**: Write queries in a familiar SQL-like syntax (where supported)
- **Results Grid**: 
  - Sortable, filterable table view
  - Pagination for large result sets
  - Column customization
- **Data Inspector**: Click any row to see full details including raw vector
- **Query Execution Plans**: Understand how queries are executed
- **Auto-completion**: Intelligent suggestions for collection names, fields, and operations

### 6. **Advanced Features**
- **Embedding Model Integration**:
  - Use OpenAI, Cohere, HuggingFace models for text-to-vector conversion
  - Local model support (sentence-transformers)
  - Custom model integration
- **Vector Analysis**:
  - Compute similarity matrices
  - Identify outliers and anomalies
  - Cluster analysis with k-means, DBSCAN
- **Embedding Inspector**:
  - For similar collections or items, automatically identify which vector dimensions (activations) contribute most to the similarity
  - Map key activations to interpretable concepts (e.g., 'humor', 'sadness', 'anger') using metadata or labels
  - Generate human-readable explanations for why items are similar
- **Performance Monitoring**:
  - Query latency tracking
  - Index performance metrics
  - Connection health monitoring
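For the Embedding Inspector idea, one simple starting point is that the cosine similarity of two L2-normalized vectors equals the sum of their element-wise products. Each term in that sum is therefore one dimension's contribution. The sketch below uses a hypothetical `top_contributing_dims`. Mapping dimensions to concepts like 'humor' would need labeled data on top of this and is not shown:

```python
import math


def top_contributing_dims(
    a: list[float], b: list[float], k: int = 3
) -> list[tuple[int, float]]:
    """Rank dimensions by their share of cos(a, b), largest first."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero vector has no direction")
    # cos(a, b) = sum_i (a_i / |a|) * (b_i / |b|), so each term is one
    # dimension's contribution; negative terms pull the items apart.
    terms = [(i, (x / na) * (y / nb)) for i, (x, y) in enumerate(zip(a, b))]
    terms.sort(key=lambda t: t[1], reverse=True)
    return terms[:k]


a = [0.9, 0.1, 0.4, 0.0]
b = [0.8, -0.2, 0.5, 0.3]
print(top_contributing_dims(a, b, k=2))  # dimensions 0 and 2 dominate
```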

## Architecture

Vector Inspector is built with PySide6 (Qt for Python) for the GUI, providing a native desktop experience. The backend uses Python with support for multiple vector database providers through a unified interface.

For detailed architecture information, see [docs/architecture.md](docs/architecture.md).

## Use Cases

1. **AI/ML Development**: Inspect embeddings generated during model development
2. **RAG System Debugging**: Verify what documents are being retrieved
3. **Data Quality Assurance**: Identify poorly embedded or outlier vectors
4. **Production Monitoring**: Check vector database health and data consistency
5. **Data Migration**: Transfer data between vector database providers
6. **Education**: Learn and experiment with vector databases interactively

## Feature Access

Vector Inspector follows a user-friendly monetization model:

- **All vector database providers are free** — Try the full app with any database
- **Core workflows remain free** — Connect, browse, search, visualize, and manage your data
- **Pro adds power tools** — Advanced analytics, enterprise formats, workflow automation, and collaboration

**Nothing currently in Free will ever move to Pro.** See [FEATURES.md](docs/FEATURES.md) for a detailed comparison.

## Roadmap

**Current Status**: ✅ Phase 2 Complete

See [ROADMAP.md](docs/ROADMAP.md) for the complete development roadmap and planned features.


## Configuration

Relative paths are resolved against the project root (the directory containing `pyproject.toml`). For example, entering `./data/chroma_db` resolves to `<project root>/data/chroma_db`.

The application automatically saves your last connection configuration to `~/.vector-inspector/settings.json`. The next time you launch the application, it will attempt to reconnect using the last saved settings.

Example settings structure:
```json
{
  "last_connection": {
    "provider": "chromadb",
    "connection_type": "persistent",
    "path": "./data/chroma_db"
  }
}
```

## Development Setup

```bash
# Install PDM if you haven't already
pip install pdm

# Install dependencies with development tools (PDM will create venv automatically)
pdm install -d

# Run tests
pdm run pytest

# Run application in development mode
scripts/run.sh     # Linux/macOS
scripts/run.bat    # Windows

# Or run the Python module directly from the src directory:
cd src
pdm run python -m vector_inspector
```

## Contributing

Contributions are welcome! Areas where help is needed:
- Additional vector database provider integrations
- UI/UX improvements
- Performance optimizations
- Documentation
- Test coverage

Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT License - See [LICENSE](LICENSE) file for details.

## Acknowledgments

This project draws inspiration from:
- DBeaver (SQL database viewer)
- MongoDB Compass (NoSQL database GUI)
- Pinecone Console
- Various vector database management tools

---


See [CHANGELOG.md](CHANGELOG.md) for the latest status and what's new in each release.

See [GETTING_STARTED.md](GETTING_STARTED.md) for usage instructions and [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md) for technical details.

**Contact**: Anthony Dawson
