Metadata-Version: 2.4
Name: justembed
Version: 0.1.0a1
Summary: A semantic engine that just works - offline-first semantic search for everyday laptops
Author-email: Krishnamoorthy Sankaran <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/sekarkrishna/justembed
Project-URL: Documentation, https://github.com/sekarkrishna/justembed/tree/main/docs
Project-URL: Repository, https://github.com/sekarkrishna/justembed
Project-URL: Issues, https://github.com/sekarkrishna/justembed/issues
Keywords: semantic-search,embeddings,offline,onnx,nlp,justembed
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: onnxruntime>=1.15.0
Requires-Dist: tokenizers>=0.13.0
Requires-Dist: numpy<2.0.0,>=1.20.0
Requires-Dist: polars>=0.19.0
Requires-Dist: pyarrow>=10.0.0
Requires-Dist: psutil>=5.9.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: hypothesis>=6.82.0; extra == "dev"
Requires-Dist: black>=23.7.0; extra == "dev"
Requires-Dist: ruff>=0.0.285; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Dynamic: license-file

# JustEmbed

**A semantic engine that just works.**

Offline-first semantic search for everyday laptops.

---

## ⚠️ Alpha Release

**This is v0.1.0a1**

Full functionality coming in v0.1.0 (expected: February 2026).

---

## What is JustEmbed?

JustEmbed is an offline-first semantic search library designed for everyday laptops. No cloud. No API keys. No telemetry. Just embed your documents and search.

### Philosophy

- **One model only**: multilingual-e5-small (100+ languages)
- **Offline-first**: Zero network dependencies
- **Just works**: No configuration, no choices, no surprises
- **Hardware-aware**: Automatic limits based on your laptop
- **Privacy-first**: Everything stays on your machine

---

## Planned Features (v0.1.0)

The snippet below shows the API planned for v0.1.0. In v0.1.0a1 these functions are placeholders.

```python
import justembed as je

# Load documents
je.load("docs/")

# Generate embeddings
je.embed()

# Search semantically
results = je.search("fruits that are red in color")

# Check status
je.status()
```

### Core Features

- ✅ Single model (multilingual-e5-small.onnx)
- ✅ Offline-first (zero network dependencies)
- ✅ Python 3.8+ support
- ✅ Polars-based storage (Parquet files)
- ✅ Hardware-aware time limits (2-3s soft, 10s hard)
- ✅ Text-only input
- ✅ Simple API (5 functions)

---

## Installation

```bash
pip install justembed
```

**Note:** v0.1.0a1 is a placeholder release. Full functionality coming soon!
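
Because the current release is a placeholder, it can help to confirm which version pip actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of JustEmbed.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("justembed")
    if version is None:
        print("justembed is not installed")
    else:
        print(f"justembed {version} is installed")
```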

---

## Requirements

- Python 3.8+
- ~340MB disk space (model + dependencies)
- 4GB+ RAM recommended
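
A rough pre-flight check for the first two requirements can be written with the standard library alone. This is a sketch, not something JustEmbed ships: the function name and the returned keys are assumptions. The RAM recommendation is left out because the standard library has no portable way to read total memory; `psutil.virtual_memory().total` could cover it.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)        # minimum version from the requirements list above
MIN_FREE_DISK_MB = 340     # approximate figure from the requirements list above


def check_requirements(path: str = ".") -> dict:
    """Report whether the interpreter and free disk space meet the stated minimums."""
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_mb >= MIN_FREE_DISK_MB,
        "free_disk_mb": free_mb,
    }


if __name__ == "__main__":
    print(check_requirements())
```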

---

## Dependencies

- `onnxruntime` - ONNX inference
- `tokenizers` - Tokenization (standalone, not transformers!)
- `numpy` - Array operations
- `polars` - DataFrame operations
- `pyarrow` - Parquet I/O
- `psutil` - Hardware detection

**No pandas. No transformers. No network dependencies.**
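
To see which of these packages are importable in the current environment, a small standard-library check is enough. The sketch below assumes each distribution's import name matches the name in the list above; `dependency_report` is a hypothetical helper, not a JustEmbed function.

```python
from importlib.util import find_spec

# Import names taken from the dependency list above.
RUNTIME_DEPENDENCIES = ["onnxruntime", "tokenizers", "numpy", "polars", "pyarrow", "psutil"]


def dependency_report(names=RUNTIME_DEPENDENCIES) -> dict:
    """Map each top-level module name to True if it can be imported here."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in dependency_report().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```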

---

## Roadmap

### v0.1.0a1 (Current) - Name Reservation
- ✅ Package name locked on PyPI
- ✅ Basic structure
- ⏳ Placeholder functions

### v0.1.0 (February 2026) - First Release
- ⏳ Full implementation
- ⏳ Working examples
- ⏳ Documentation
- ⏳ Tests

### v0.2.0 (Future)
- ⏳ Query caching
- ⏳ Batch operations
- ⏳ Advanced search options

---

## Why "JustEmbed"?

Because that's all you need to do:

1. **Just embed** your documents
2. **Just search** with natural language
3. **Just works** - no configuration needed

---

## Design Decisions

### One Model Only
We use **multilingual-e5-small.onnx** (384 dimensions, 100+ languages). No model zoo. No choices. No confusion.
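
Once every document is a fixed-length vector, semantic search reduces to comparing the query vector against the document vectors. The sketch below uses cosine similarity, a common choice for embedding models. Treat it as an assumption: this README does not say how JustEmbed scores results. It is written in plain Python for readability, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query: Sequence[float],
    documents: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy 3-dimensional vectors stand in for real 384-dimensional embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
best = rank([1.0, 0.1, 0.0], docs, top_k=2)
print([index for index, _ in best])  # [0, 2]
```

With real embeddings the same ranking is usually done as one NumPy matrix product rather than a Python loop.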

### Offline-First
Zero network dependencies. Everything runs locally. No telemetry. No surprises.

### Hardware-Aware
Limits are set automatically from your laptop's capabilities, with a soft time limit of 2-3s and a hard time limit of 10s.
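
One way to picture a soft and a hard limit is a loop that warns once the soft limit has passed and stops once the hard limit has passed. The sketch below is hypothetical: the README does not describe how limits are derived from the hardware or what they apply to, so it uses the fixed figures quoted above, and every name in it is invented for illustration.

```python
import time
import warnings

SOFT_LIMIT_S = 3.0    # upper end of the 2-3s soft limit quoted above
HARD_LIMIT_S = 10.0   # hard limit quoted above


class HardLimitExceeded(RuntimeError):
    """Raised when work runs past the hard time limit (hypothetical name)."""


def run_with_limits(items, process, soft=SOFT_LIMIT_S, hard=HARD_LIMIT_S, clock=time.monotonic):
    """Process items one at a time, warning once past `soft` seconds and stopping past `hard`."""
    start = clock()
    warned = False
    results = []
    for item in items:
        elapsed = clock() - start
        if elapsed > hard:
            raise HardLimitExceeded(f"stopped after {elapsed:.1f}s (hard limit {hard}s)")
        if elapsed > soft and not warned:
            warnings.warn(f"running for {elapsed:.1f}s (soft limit {soft}s)")
            warned = True
        results.append(process(item))
    return results


if __name__ == "__main__":
    print(run_with_limits(["a", "b"], str.upper))
```

The limits are checked between items only, so a single slow item is never interrupted partway through.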

### Polars, Not Pandas
We use Polars for speed and efficiency. No pandas dependency.

### Tokenizers, Not Transformers
We use the standalone `tokenizers` library (3MB) instead of `transformers` (40MB). That is about 93% smaller!

---

## Target Users

- Non-ML engineers learning AI for the first time
- Business users in security-conscious or restricted environments
- Developers who need offline semantic search
- Anyone who wants a safe sandbox to experiment

---

## License

MIT License - see LICENSE file for details.

---

## Author

Krishnamoorthy Sankaran

---

## Links

- **GitHub**: https://github.com/sekarkrishna/justembed
- **PyPI**: https://pypi.org/project/justembed/
- **Issues**: https://github.com/sekarkrishna/justembed/issues

---

## Status

🚧 **Under Active Development** 🚧

This is an alpha release to reserve the package name. Full functionality coming in v0.1.0.

Stay tuned!

---

**JustEmbed - A semantic engine that just works.**
