Metadata-Version: 2.4
Name: oboyu
Version: 0.1.0a4
Summary: A Japanese-enhanced semantic search system for your local documents.
Author-email: sonesuke <iamsonesuke@gmail.com>
License: MIT
License-File: LICENSE.md
Keywords: document-search,japanese,semantic-search,text-processing,vector-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Japanese
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.13
Requires-Dist: chardet>=5.2.0
Requires-Dist: charset-normalizer>=3.4.2
Requires-Dist: cryptography>=45.0.4
Requires-Dist: duckdb>=1.3.0
Requires-Dist: fasttext>=0.9.3
Requires-Dist: ftfy>=6.1.0
Requires-Dist: fugashi>=1.4.3
Requires-Dist: gitignore-parser>=0.1.12
Requires-Dist: huggingface-hub>=0.27.0
Requires-Dist: instructor>=1.8.0
Requires-Dist: jaconv>=0.4.0
Requires-Dist: llama-cpp-python>=0.3.9
Requires-Dist: mcp[cli]>=1.9.2
Requires-Dist: mojimoji>=0.0.12
Requires-Dist: neologdn>=0.5.2
Requires-Dist: numpy>=2.2.6
Requires-Dist: onnx>=1.18.0
Requires-Dist: onnxruntime>=1.22.0
Requires-Dist: optimum>=1.25.3
Requires-Dist: pandas>=2.2.3
Requires-Dist: protobuf>=6.31.1
Requires-Dist: pydantic>=2.11.5
Requires-Dist: pymupdf4llm>=0.0.25
Requires-Dist: python-frontmatter>=1.1.0
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: rich<14.0.0,>=13.7.0
Requires-Dist: sentence-transformers>=4.1.0
Requires-Dist: sentencepiece>=0.2.0
Requires-Dist: torch>=2.7.0
Requires-Dist: transformers>=4.52.4
Requires-Dist: typer>=0.16.0
Requires-Dist: unidic-lite>=1.0.8
Requires-Dist: xdg-base-dirs>=6.0.2
Description-Content-Type: text/markdown

# Oboyu (覚ゆ)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.13%2B-blue)](https://www.python.org/downloads/)
[![PyPI Version](https://img.shields.io/pypi/v/oboyu.svg)](https://pypi.org/project/oboyu/)

> Lightning-fast semantic search for your local documents with best-in-class Japanese support.

![demo](https://github.com/sonesuke/oboyu/blob/main/docs/assets/demo.gif?raw=true)

## What is Oboyu?

**Oboyu** (覚ゆ - "to remember" in ancient Japanese) is a powerful local semantic search engine that helps you instantly find information in your documents using natural language queries. Unlike traditional keyword search, Oboyu understands the meaning behind your questions, making it perfect for finding relevant content even when you don't know the exact terms.

### Why Oboyu?

- 🚀 **Fast**: Indexes thousands of documents in seconds, searches in milliseconds
- 🎯 **Accurate**: Semantic search finds what you mean, not just what you type
- 🇯🇵 **Japanese Excellence**: First-class support with automatic encoding detection
- 🔒 **Private**: Everything runs locally - your documents never leave your machine
- 🤖 **AI-Ready**: Built-in MCP server for Claude, Cursor, and other AI assistants


## Quick Start

### Prerequisites

- Python 3.13 or higher (the package declares `Requires-Python: >=3.13`)
- pip (latest version recommended)
- Operating System: Linux, macOS, or Windows with WSL
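
pip enforces `Requires-Python: >=3.13`, so checking the interpreter first can save a failed install. The sketch below assumes your interpreter is invoked as `python3`. `version_ge` and `check_python` are illustrative helper names, not part of Oboyu.

```bash
#!/usr/bin/env bash
# Verify that python3 meets Oboyu's Requires-Python (>=3.13).
# Assumes the interpreter is called `python3`; adjust if yours differs.

# version_ge A B: succeed if dotted version A >= B, comparing numerically
# component by component (missing components count as 0).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# check_python: report the local Python version and fail if it is too old.
check_python() {
  local current
  current="$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')" || return 1
  if version_ge "$current" "3.13"; then
    echo "OK: Python $current"
  else
    echo "Python $current found; Oboyu requires 3.13 or higher" >&2
    return 1
  fi
}

# Usage: check_python && pip install oboyu
```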

#### System Dependencies (for building from source)

**Linux (Ubuntu/Debian):**
```bash
sudo apt-get install -y \
    git \
    curl \
    build-essential \
    cmake \
    pkg-config \
    libfreetype6-dev \
    libfontconfig1-dev \
    libjpeg-dev \
    libpng-dev \
    zlib1g-dev \
    libssl-dev
```

**Linux (CentOS/RHEL):**
```bash
sudo yum install -y \
    git \
    curl \
    gcc-c++ \
    cmake \
    pkg-config \
    freetype-devel \
    fontconfig-devel \
    libjpeg-devel \
    libpng-devel \
    zlib-devel \
    openssl-devel
```

**macOS:**
```bash
# Install Xcode Command Line Tools
xcode-select --install

# Install additional dependencies via Homebrew
brew install cmake pkg-config
```
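
Before building from source, you can spot missing command-line tools from the lists above with a quick sketch like the one below. `missing_tools` is a hypothetical helper. It only checks executables on `PATH`, so you still need to verify the `-dev`/`-devel` library packages with your package manager.

```bash
#!/usr/bin/env bash
# Report which command-line build tools are not on PATH.
# Only executables can be checked this way; library packages such as
# freetype, zlib or OpenSSL headers must be verified with the package manager.

# missing_tools TOOL...: print each TOOL not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
#   absent="$(missing_tools git curl cmake pkg-config c++)"
#   [ -z "$absent" ] || { echo "Install first:"; echo "$absent"; }
```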

### Install and Search

Get up and running in under 5 minutes:

```bash
# Install Oboyu
pip install oboyu

# Index your documents
oboyu index ~/Documents

# Search your documents
oboyu search "your search term"
```

That's it! See our [Documentation](https://sonesuke.github.io/oboyu/) for complete guides and examples.

## Key Features

### 🔍 Advanced Search Capabilities
- **Hybrid Search**: Combines semantic understanding with keyword matching for best results
- **Multiple Modes**: Switch between semantic, keyword, or hybrid search modes
- **Smart Reranking**: Built-in AI reranker improves result accuracy
- **Flexible Querying**: Command-line search with various output formats

### 📚 Document Support
- **Rich Format Support**: PDF documents, plain text (.txt), Markdown (.md), HTML (.html), and source code files (.py, .java, etc.)
- **PDF Processing**: Full text extraction with metadata preservation from PDF documents
- **Incremental Indexing**: Only process new or changed files for lightning-fast updates
- **Smart Chunking**: Intelligent document splitting for optimal search results
- **Automatic Encoding**: Handles various text encodings seamlessly (UTF-8, Shift-JIS, EUC-JP, and more)

### 🇯🇵 Japanese Language Excellence
- **Native Support**: Purpose-built for Japanese text processing
- **Automatic Detection**: Detects and handles Shift-JIS, EUC-JP, and UTF-8
- **Specialized Models**: Optimized embedding models for Japanese content
- **Mixed Language**: Seamlessly handles Japanese and English in the same document
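
To see automatic encoding detection in action, you can generate the same sentence in all three encodings and index the result. This sketch assumes `iconv` is installed. `make_encoding_samples` and the sample sentence are illustrative, not part of Oboyu.

```bash
#!/usr/bin/env bash
# Build a tiny corpus with one Japanese sentence stored as UTF-8,
# Shift-JIS and EUC-JP. Requires `iconv` (glibc or GNU libiconv).

make_encoding_samples() {
  local dir=$1 text='日本語の検索テスト'
  mkdir -p "$dir" || return 1
  printf '%s\n' "$text" > "$dir/utf8.txt"
  printf '%s\n' "$text" | iconv -f UTF-8 -t SHIFT_JIS > "$dir/sjis.txt" || return 1
  printf '%s\n' "$text" | iconv -f UTF-8 -t EUC-JP > "$dir/eucjp.txt" || return 1
}

# Usage (the directory name is only an example):
#   make_encoding_samples /tmp/oboyu-encoding-test
#   oboyu index /tmp/oboyu-encoding-test
#   oboyu search "検索テスト"
```

If detection works as described, the search should return matches from all three files.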

### 🚀 Performance & Integration
- **ONNX Acceleration**: 2-4x faster with automatic model optimization
- **MCP Server**: Direct integration with Claude Desktop and AI coding assistants
- **Rich CLI**: Beautiful terminal interface with progress tracking
- **Low Memory**: Efficient processing even on modest hardware

## Installation

### Using uv (Recommended)
```bash
uv tool install oboyu
```

### Using pip
```bash
pip install oboyu
```

### From Source
```bash
git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .
```

### System Requirements

- **Python**: 3.13 or higher
- **OS**: macOS, Linux (Windows via WSL)
- **Memory**: 2GB RAM minimum (4GB recommended)
- **Storage**: 1GB for models and index
- **Build Tools**: See system dependencies above if building from source

> **Note**: Models are automatically downloaded on first use (~90MB).
> For installation from PyPI, most system dependencies are not required as we provide pre-built wheels.
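
You can check the storage guideline with POSIX `df`. The sketch assumes models and the index live under `$HOME`, which may not match where Oboyu actually stores them on your system. `has_free_mb` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Check free disk space against the ~1GB guideline for models and index.
# `df -Pk` prints one data line per filesystem; field 4 is the space
# available in 1024-byte blocks. Checking $HOME is an assumption.

# has_free_mb PATH MB: succeed if PATH's filesystem has at least MB MiB free.
has_free_mb() {
  local path=$1 need_mb=$2 avail_kb
  avail_kb="$(df -Pk "$path" 2>/dev/null | awk 'NR == 2 { print $4 }')"
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_mb * 1024)) ]
}

# Usage: has_free_mb "$HOME" 1024 || echo "Less than 1GB free under $HOME" >&2
```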

## Usage Examples

### Basic Usage

```bash
# Index a directory
oboyu index ~/Documents/notes

# Search your documents
oboyu search "machine learning optimization techniques"

# Get results in JSON format for processing
oboyu search "machine learning" --format json
```

### Advanced Examples

```bash
# Index only specific file types
oboyu index ~/projects --include-patterns "*.md,*.txt"

# Search using vector (semantic) similarity only
oboyu search "API design" --mode vector

# Combine semantic and keyword matching
oboyu search "concepts similar to dependency injection" --mode hybrid

# Enable reranking for better accuracy
oboyu search "complex technical topic" --rerank
```

### MCP Server for AI Assistants

```bash
# Start MCP server
oboyu mcp

# Or register `oboyu mcp` as an MCP server in Claude Desktop's configuration file
```

See our [MCP Integration Guide](https://sonesuke.github.io/oboyu/integration/mcp-integration) for detailed setup instructions.
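
If you prefer to edit Claude Desktop's `claude_desktop_config.json` directly (on macOS it lives in `~/Library/Application Support/Claude/`), the sketch below prints a server entry that launches `oboyu mcp`. The key name `oboyu` is an arbitrary choice and `oboyu_mcp_config` is a hypothetical helper. The integration guide remains the authoritative setup reference.

```bash
#!/usr/bin/env bash
# Print an "mcpServers" entry for Claude Desktop that launches `oboyu mcp`.
# The executable path is resolved now because GUI apps may not inherit your
# shell's PATH. The path is inserted verbatim, so it must not contain " or \.

oboyu_mcp_config() {
  local cmd
  cmd="$(command -v oboyu 2>/dev/null)" || cmd=oboyu
  cat <<EOF
{
  "mcpServers": {
    "oboyu": {
      "command": "$cmd",
      "args": ["mcp"]
    }
  }
}
EOF
}

# Usage: oboyu_mcp_config   # then merge the output into claude_desktop_config.json
```

Merge the printed entry into any existing `mcpServers` object rather than overwriting the file, then restart Claude Desktop so it picks up the change.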

## Documentation

### 🚀 Getting Started
- [**Installation**](https://sonesuke.github.io/oboyu/getting-started/installation) - Install and verify setup
- [**Your First Index**](https://sonesuke.github.io/oboyu/getting-started/first-index) - Create your first searchable index
- [**Your First Search**](https://sonesuke.github.io/oboyu/getting-started/first-search) - Learn to search effectively

### 💼 Real-world Usage
- [**Daily Workflows**](https://sonesuke.github.io/oboyu/usage-examples/basic-workflow) - Essential daily patterns
- [**Technical Documentation**](https://sonesuke.github.io/oboyu/real-world-scenarios/technical-docs) - Code and API docs
- [**Meeting Notes**](https://sonesuke.github.io/oboyu/real-world-scenarios/meeting-notes) - Track decisions and actions
- [**Research Papers**](https://sonesuke.github.io/oboyu/real-world-scenarios/research-papers) - Academic content search

### ⚙️ Configuration & Optimization
- [**Configuration Guide**](https://sonesuke.github.io/oboyu/configuration-optimization/configuration) - Customize for your needs
- [**Performance Tuning**](https://sonesuke.github.io/oboyu/configuration-optimization/performance-tuning) - Optimize speed and quality
- [**Japanese Support**](https://sonesuke.github.io/oboyu/reference-troubleshooting/japanese-support) - Japanese language features

### 🔗 Integration & Reference
- [**Claude MCP Integration**](https://sonesuke.github.io/oboyu/integration/mcp-integration) - AI-powered search
- [**CLI Reference**](https://sonesuke.github.io/oboyu/reference-troubleshooting/cli-reference) - All commands and options
- [**Troubleshooting**](https://sonesuke.github.io/oboyu/reference-troubleshooting/troubleshooting) - Solutions to common issues

**[📖 View Full Documentation →](https://sonesuke.github.io/oboyu/)**

## Common Use Cases

### 📚 Academic Research
Index and search through research notes and references:
```bash
oboyu index ~/research --include-patterns "*.md,*.txt"
oboyu search "transformer architecture improvements"
```

### 💻 Code Documentation
Search through project documentation and code comments:
```bash
oboyu index ~/projects/myapp --include-patterns "*.md,*.py"
oboyu search "authentication implementation"
```

### 📝 Personal Knowledge Base
Organize and search your notes and documents:
```bash
oboyu index ~/Documents/notes
oboyu search "meeting notes from last week"
```
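
Because indexing is incremental, keeping several folders current can be as simple as re-running `oboyu index` on each of them. The wrapper below is a hypothetical convenience function, not an Oboyu command. It skips paths that are not directories and returns non-zero if any run fails.

```bash
#!/usr/bin/env bash
# Re-index a list of directories in one go, e.g. from a scheduled job.
# Relies on Oboyu's incremental indexing to skip unchanged files; calls
# `oboyu index` once per directory and reports failure if any run fails.

reindex_all() {
  local dir status=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "skip: $dir is not a directory" >&2
      continue
    fi
    oboyu index "$dir" || status=1
  done
  return "$status"
}

# Usage (example paths): reindex_all ~/Documents/notes ~/research
```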

### 🌏 Multilingual Documents
Perfect for mixed Japanese and English content:
```bash
oboyu index ~/Documents/bilingual
oboyu search "プロジェクト管理 best practices"
```

## Testing

### Unit and Integration Tests

```bash
# Run fast tests (recommended for development)
uv run pytest -m "not slow"

# Run all tests with coverage
uv run pytest --cov=src
```

### E2E Display Testing

Oboyu includes end-to-end (E2E) display tests driven by the Claude Code SDK:

```bash
# Run all E2E display tests
python e2e/run_tests.py

# Run specific test category
python e2e/run_tests.py --test search
```

See our [Full Documentation](https://sonesuke.github.io/oboyu/) for more details.

## Contributing

We welcome contributions! See our [Contributing Guidelines](CONTRIBUTING.md) for details.

```bash
# Quick start for contributors (replace YOUR_USERNAME with the owner of your fork)
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"
```

## Support

- 📋 [GitHub Issues](https://github.com/sonesuke/oboyu/issues) - Report bugs or request features
- 📖 [Documentation](https://sonesuke.github.io/oboyu/) - Comprehensive guides and references
- 💬 [Discussions](https://github.com/sonesuke/oboyu/discussions) - Ask questions and share ideas

## License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.

## Acknowledgments

- The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
- Built with ❤️ for the Japanese NLP community
- Inspired by the goal of making knowledge accessible across languages

---

<p align="center">
  Made with 🇯🇵 by <a href="https://github.com/sonesuke">sonesuke</a>
</p>
