Metadata-Version: 2.4
Name: commitmind
Version: 0.1.0
Summary: Semantic search for Git commit history, powered by TurboQuant vector compression
Project-URL: Homepage, https://github.com/wjddusrb03/commitmind
Project-URL: Repository, https://github.com/wjddusrb03/commitmind
Project-URL: Issues, https://github.com/wjddusrb03/commitmind/issues
Author: wjddusrb03
License: MIT
License-File: LICENSE
Keywords: cli,commit,developer-tools,embeddings,git,langchain,search,semantic-search,turboquant,vector-compression
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Version Control :: Git
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.9
Requires-Dist: click>=8.0
Requires-Dist: gitpython>=3.1.0
Requires-Dist: langchain-turboquant>=0.1.0
Requires-Dist: numpy>=1.21
Requires-Dist: rich>=13.0
Requires-Dist: sentence-transformers>=2.2.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# CommitMind

**Semantic search for Git commit history, powered by TurboQuant vector compression (ICLR 2026).**

> Stop searching by keywords. Search by *meaning*.

[![PyPI version](https://img.shields.io/pypi/v/commitmind)](https://pypi.org/project/commitmind/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

## The Problem

```bash
# Current: keyword matching only
git log --grep="memory leak"     # Only finds commits with exact text "memory leak"
                                  # Misses: "fix kfree_skb double free"
                                  # Misses: "plug UAF in reset path"
                                  # Misses: "resolve dangling pointer"
```

## The Solution

```bash
# CommitMind: semantic search
commitmind search "memory leak"
# >> #1 [0.94] a3f2c1d  Fix kfree_skb double free in netfilter
# >> #2 [0.91] b7e4a2f  Plug use-after-free in device reset path
# >> #3 [0.87] c9d1b3e  Resolve dangling pointer in slab allocator
```

CommitMind understands the **meaning** of your query and finds semantically related commits - even when the exact words don't match.

## How It Works

```
Git commits --> Sentence embeddings --> TurboQuant compression --> Semantic search
                (all-MiniLM-L6-v2)      (7.6x compression)       (asymmetric scoring)
```

1. **Extract** commit messages + file change metadata from git history
2. **Embed** each commit into a 384-dimensional vector (local model, no API needed)
3. **Compress** vectors with TurboQuant (Google's ICLR 2026 algorithm) - 87% memory savings
4. **Search** using asymmetric inner-product estimation (no decompression needed)
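
As a rough illustration of step 1, the sketch below reads commit subjects and changed file names with plain `git log` and turns each commit into one string that could be handed to an embedding model. It is not CommitMind's actual code: the project depends on GitPython, while this example shells out to `git` instead. The names `parse_git_log`, `commit_to_text`, and `read_commits`, and the `subject [files: ...]` text layout, are hypothetical.

```python
import subprocess

RECORD_SEP = "\x1e"  # marks the start of each commit in the git output
FIELD_SEP = "\x1f"   # separates the commit hash from the subject line

# %x1e and %x1f make git print the separator bytes; --name-only lists changed files.
GIT_LOG_ARGS = ["git", "log", "--name-only", "--format=%x1e%H%x1f%s"]


def parse_git_log(raw):
    """Turn raw `git log` output into a list of commit dictionaries."""
    commits = []
    for record in raw.split(RECORD_SEP):
        lines = [line for line in record.split("\n") if line.strip()]
        if not lines:
            continue  # nothing precedes the first separator
        sha, _, subject = lines[0].partition(FIELD_SEP)
        commits.append({"sha": sha, "subject": subject, "files": lines[1:]})
    return commits


def commit_to_text(commit):
    """Build the text for one commit (hypothetical layout, not CommitMind's)."""
    files = ", ".join(commit["files"])
    return f"{commit['subject']} [files: {files}]" if files else commit["subject"]


def read_commits(repo_path="."):
    """Run git inside repo_path and parse its output (requires git on PATH)."""
    result = subprocess.run(
        GIT_LOG_ARGS, cwd=repo_path, capture_output=True, text=True, check=True
    )
    return parse_git_log(result.stdout)
```

Calling `read_commits(".")` inside a repository returns one dictionary per commit. Keeping the parser separate from the `git` call makes it easy to test on a fixed string.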

## Installation

```bash
pip install commitmind
```

Or install from source:

```bash
git clone https://github.com/wjddusrb03/commitmind.git
cd commitmind
pip install -e ".[dev]"
```

## Quick Start

```bash
# 1. Index your repository
cd your-project
commitmind index

# Output:
# Indexing complete!
#   > 3,842 commits indexed
#   > Compressed: 18.2 MB -> 2.4 MB (7.6x)
#   > Saved to .commitmind/index.pkl

# 2. Search by meaning
commitmind search "authentication bug fix"

# 3. View stats
commitmind stats
```

## CLI Commands

| Command | Description |
|---|---|
| `commitmind index` | Index commits with TurboQuant compression |
| `commitmind search "query"` | Semantic search over commits |
| `commitmind stats` | Show index statistics |
| `commitmind update` | Add new commits to an existing index |

### Options

```bash
# Index with options
commitmind index --max-commits 1000    # Limit to the 1000 most recent commits
commitmind index --branch main         # Index a specific branch
commitmind index --bits 2              # Use 2-bit quantization (more compression)

# Search with options
commitmind search "query" -k 10        # Return top 10 results
```
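
To drive these commands from a script, one option is to assemble the argument list in Python and pass it to `subprocess.run`. The sketch below uses only the flags documented above. It assumes that `commitmind` is on `PATH` and that the index flags can be combined in one call; neither is confirmed here. The helper names are hypothetical, and the format of the returned output is not specified.

```python
import subprocess


def build_index_command(max_commits=None, branch=None, bits=None):
    """Assemble a `commitmind index` command from the documented flags."""
    command = ["commitmind", "index"]
    if max_commits is not None:
        command += ["--max-commits", str(max_commits)]
    if branch is not None:
        command += ["--branch", branch]
    if bits is not None:
        command += ["--bits", str(bits)]
    return command


def build_search_command(query, top_k=None):
    """Assemble a `commitmind search` command; -k limits the result count."""
    command = ["commitmind", "search", query]
    if top_k is not None:
        command += ["-k", str(top_k)]
    return command


def run(command, repo_path="."):
    """Run the command inside repo_path and return whatever it printed."""
    result = subprocess.run(
        command, cwd=repo_path, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or quotes.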

## Use Cases

- **New team member**: "What authentication changes were made recently?"
- **Bug tracking**: "Find commits related to network timeout issues"
- **Security audit**: "Show all SQL injection related fixes"
- **Code archaeology**: Search the Linux kernel's 1M+ commits by meaning
- **Cross-language**: Search English commits with Korean queries (and vice versa); quality depends on how well the embedding model handles both languages

## Memory Efficiency

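Thanks to TurboQuant compression, the embedding vectors take about 87% less memory. The uncompressed figures are consistent with storing one 384-dimensional float32 vector (about 1.5 KB) per commit: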

| Commits | Uncompressed | CommitMind | Savings |
|---|---|---|---|
| 1,000 | 1.5 MB | 0.2 MB | 87% |
| 10,000 | 15 MB | 2.0 MB | 87% |
| 100,000 | 150 MB | 20 MB | 87% |
| 1,000,000 | 1.5 GB | 200 MB | 87% |
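
The arithmetic behind the table can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch that assumes float32 storage for the uncompressed vectors and counts only the vectors, not commit metadata. The function name `index_sizes_mb` is made up for this example.

```python
DIMENSIONS = 384          # embedding size stated in "How It Works"
BYTES_PER_FLOAT32 = 4     # assumption: uncompressed vectors are float32
COMPRESSION_RATIO = 7.6   # ratio quoted in this README


def index_sizes_mb(num_commits):
    """Return (uncompressed MB, compressed MB, savings percent) for the vectors."""
    uncompressed = num_commits * DIMENSIONS * BYTES_PER_FLOAT32 / 1_000_000
    compressed = uncompressed / COMPRESSION_RATIO
    savings = (1 - 1 / COMPRESSION_RATIO) * 100
    return uncompressed, compressed, savings


for commits in (1_000, 10_000, 100_000, 1_000_000):
    raw, small, saved = index_sizes_mb(commits)
    print(f"{commits:>9,} commits: {raw:8.1f} MB -> {small:6.1f} MB ({saved:.0f}% saved)")
```

The savings percentage is the same on every row because it depends only on the compression ratio, not on the number of commits.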

## How TurboQuant Works

CommitMind uses [TurboQuant](https://openreview.net/forum?id=mMWatwUUkn) (Google Research, ICLR 2026):

1. **PolarQuant**: Random orthogonal rotation + Lloyd-Max scalar quantization (3-bit)
2. **QJL**: Quantized Johnson-Lindenstrauss residual correction (1-bit)
3. **Asymmetric scoring**: Compute similarity WITHOUT decompressing vectors

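Together, the two stages use about 4 bits per dimension (3 + 1). Compared with 32-bit floats, that is close to 8x, consistent with the ~7.6x compression figure quoted in this README, with minimal accuracy loss.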
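
To give a feel for asymmetric scoring, here is a toy pure-Python sketch of a 1-bit sign-sketch estimator in the spirit of the QJL step. The stored vector keeps only one sign bit per random projection row plus its norm, while the query stays in full precision. This is a simplified illustration under stated assumptions, not the code used by CommitMind or langchain-turboquant: it applies the sketch to the whole vector rather than to a quantization residual, it skips the rotation and Lloyd-Max stage, and it uses far more projection rows than dimensions so that the estimate is visibly close. All function names are hypothetical.

```python
import math
import random


def make_projection(rows, dim, seed=0):
    """Random Gaussian matrix shared by all stored vectors and all queries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def compress(vector, projection):
    """Keep one sign bit per projection row plus the vector's norm."""
    signs = [1 if dot(row, vector) >= 0 else -1 for row in projection]
    return signs, math.sqrt(dot(vector, vector))


def estimate_inner_product(query, compressed, projection):
    """Score a full-precision query against sign bits, without rebuilding the vector."""
    signs, norm = compressed
    total = sum(dot(row, query) * s for row, s in zip(projection, signs))
    # sqrt(pi/2) undoes the shrinkage caused by keeping only the signs.
    return math.sqrt(math.pi / 2) * norm * total / len(projection)


projection = make_projection(rows=4096, dim=8, seed=42)
stored = [0.5, -0.2, 0.1, 0.7, 0.0, -0.4, 0.3, 0.2]
query = [0.4, -0.1, 0.2, 0.6, 0.1, -0.3, 0.2, 0.1]

approx = estimate_inner_product(query, compress(stored, projection), projection)
exact = dot(stored, query)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The scoring is asymmetric because only the stored side is quantized. The query is projected once in full precision and then combined with the stored sign bits, so the original vector is never reconstructed.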

## Requirements

- Python 3.9+
- Git repository
- CPU only (no GPU required)
- ~500 MB disk for embedding model (downloaded once)

## Contributing

Issues and pull requests are welcome! If you find a bug or have suggestions, please [open an issue](https://github.com/wjddusrb03/commitmind/issues).

## License

MIT License

## Citation

If you use CommitMind in your research, please cite it as:

```bibtex
@software{commitmind2026,
  title={CommitMind: Semantic Git Commit Search with TurboQuant Compression},
  author={wjddusrb03},
  year={2026},
  url={https://github.com/wjddusrb03/commitmind}
}
```

## Related

- [langchain-turboquant](https://github.com/wjddusrb03/langchain-turboquant) - LangChain VectorStore with TurboQuant compression
- [TurboQuant paper](https://openreview.net/forum?id=mMWatwUUkn) - Original ICLR 2026 paper by Google Research
