Metadata-Version: 2.4
Name: flamehaven-filesearch
Version: 1.2.0
Summary: FLAMEHAVEN FileSearch - Open source semantic document search with API authentication powered by Google Gemini
Author-email: FLAMEHAVEN <info@flamehaven.space>
Project-URL: Homepage, https://github.com/flamehaven01/Flamehaven-Filesearch
Project-URL: Documentation, https://github.com/flamehaven01/Flamehaven-Filesearch#readme
Project-URL: Repository, https://github.com/flamehaven01/Flamehaven-Filesearch
Project-URL: Bug Tracker, https://github.com/flamehaven01/Flamehaven-Filesearch/issues
Keywords: flamehaven,file-search,document-search,RAG,gemini,google-ai,file-retrieval,semantic-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Provides-Extra: api
Requires-Dist: fastapi>=0.121.1; extra == "api"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: slowapi>=0.1.9; extra == "api"
Requires-Dist: requests>=2.31.0; extra == "api"
Requires-Dist: psutil>=5.9.0; extra == "api"
Requires-Dist: python-json-logger>=2.0.0; extra == "api"
Requires-Dist: cachetools>=5.3.0; extra == "api"
Requires-Dist: prometheus-client>=0.19.0; extra == "api"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: httpx>=0.24.1; extra == "dev"
Provides-Extra: google
Requires-Dist: google-genai>=0.2.0; extra == "google"
Provides-Extra: security
Requires-Dist: bandit>=1.7.0; extra == "security"
Requires-Dist: safety>=3.0.0; extra == "security"
Provides-Extra: all
Requires-Dist: flamehaven-filesearch[api,dev,google]; extra == "all"
Dynamic: license-file

# FLAMEHAVEN FileSearch

> **Your documents. Searchable in minutes. No infrastructure needed.**

<div align="center">

**Search your local documents with RAG instantly**

[![CI/CD](https://img.shields.io/badge/Status-Production%20Ready-brightgreen)](https://github.com/flamehaven01/Flamehaven-Filesearch)
[![Latest Version](https://img.shields.io/badge/Version-v1.2.0-blue)](CHANGELOG.md)
[![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-blue)](https://www.python.org/)
[![MIT License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)

[Quick Start (3 min)](#-3-minute-quick-start) • [Documentation](DEPLOYMENT_GUIDE_v1.2.0.md) • [Roadmap](#-roadmap) • [Contributing](CONTRIBUTING.md)

</div>

---

## Problem: Your Situation

```
✗ You have PDF, Word, and text documents scattered across your local drive
✗ You want to search them intelligently, not keyword-by-keyword
✗ You don't want to upload your data to external services (Pinecone, Cloudflare, etc.)
✗ You need production-ready security without complex setup
✗ You want zero infrastructure costs for prototype phase
```

---

## Solution: FLAMEHAVEN FileSearch

```
✓ Local RAG search engine in 5 minutes
✓ 100% self-hosted (your data stays yours)
✓ Single Docker command deployment
✓ Free tier Google Gemini (up to 1500 queries/month)
✓ v1.2.0: Enterprise-grade authentication & multi-user support
✓ Batch search API for 1-100 queries per request
✓ Optional Redis for distributed caching across workers
```

---

## [>] 3-Minute Quick Start

### 1. Docker (No Setup)

```bash
# Start with one command
docker run -d \
  -e GEMINI_API_KEY="your_gemini_api_key" \
  -p 8000:8000 \
  -v $(pwd)/data:/app/data \
  flamehaven-filesearch:1.2.0

# Available at http://localhost:8000 in 3 seconds
```

### 2. Your First Search (cURL)

```bash
# Step 1: Generate API key (v1.2.0 requirement)
curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: your_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My First Key",
    "permissions": ["upload", "search"]
  }'

# Response: {"key_id": "...", "plain_key": "sk_live_abc123..."}
# SAVE this key - it won't be shown again!

# Step 2: Upload a document
curl -X POST http://localhost:8000/api/upload/single \
  -H "Authorization: Bearer sk_live_abc123..." \
  -F "file=@sample.pdf" \
  -F "store=documents"

# Step 3: Search
curl -X POST http://localhost:8000/api/search \
  -H "Authorization: Bearer sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main points in this document?",
    "store": "documents"
  }'

# Response:
# {
#   "answer": "The main points are...",
#   "sources": [
#     {
#       "file": "sample.pdf",
#       "page": 3,
#       "excerpt": "..."
#     }
#   ]
# }
```

### 3. Python Code Example

```python
from flamehaven_filesearch import FlamehavenFileSearch, FileSearchConfig

# Configuration
config = FileSearchConfig(
    google_api_key="your_gemini_api_key",
    environment="offline"  # or "remote" for online
)

# Initialize
searcher = FlamehavenFileSearch(config)

# Create document store
searcher.create_store("my_documents")

# Upload files
searcher.upload_file("path/to/document.pdf", "my_documents")
searcher.upload_file("path/to/document.docx", "my_documents")

# Search with RAG
result = searcher.search(
    "Summarize the key findings",
    store="my_documents"
)

print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")
```

---

## Features

### Core Features (v1.1.0+)

| Feature | Description | Benefit |
|---------|-------------|---------|
| **Multi-format Support** | PDF, DOCX, MD, TXT (up to 50MB) | Works with all document types |
| **Semantic Search** | AI-powered natural language queries | Far more accurate than keyword search |
| **Source Attribution** | Original documents linked to answers | Transparency & trustworthiness |
| **Store Management** | Organize documents into collections | Structured search across categories |
| **REST API + Python SDK** | Two ways to integrate | Flexible deployment options |
| **LRU Cache** | 1-hour TTL, 1000 items | <10ms response on cache hits |
| **Prometheus Metrics** | 17 monitoring metrics | Operations visibility |
| **Security Headers** | OWASP-compliant | Enterprise-ready |

### New in v1.2.0 (Enterprise-Grade)

| Feature | Description | Use Case |
|---------|-------------|----------|
| **[*] API Key Authentication** | Bearer token OAuth2-compatible | Multi-user environments |
| **[*] Key Management API** | Programmatic key creation/revocation | Automated user provisioning |
| **[*] Audit Logging** | Complete request audit trail | Compliance & security |
| **[*] Per-Key Rate Limiting** | Customizable limits per API key | Fair resource allocation |
| **[*] Admin Dashboard** | Web UI for key management | User-friendly operations |
| **[&] Batch Search API** | Process 1-100 queries per request | High-throughput scenarios |
| **[&] Redis Cache Backend** | Distributed caching | Multi-worker deployments |

---

## Installation

### Option 1: Pip (Recommended for Development)

```bash
# Core functionality
pip install flamehaven-filesearch

# With API server
pip install flamehaven-filesearch[api]

# With Redis support
pip install flamehaven-filesearch[api,redis]

# Everything (for development)
pip install flamehaven-filesearch[all]
```

### Option 2: Docker (Recommended for Production)

```bash
# Build image
docker build -t flamehaven-filesearch:1.2.0 .

# Run standalone
docker run \
  -e GEMINI_API_KEY="your_key" \
  -e FLAMEHAVEN_ADMIN_KEY="your_admin_key" \
  -p 8000:8000 \
  flamehaven-filesearch:1.2.0

# With Docker Compose (includes Redis)
docker-compose up -d
```

### Option 3: Kubernetes (For High Availability)

See [DEPLOYMENT_GUIDE_v1.2.0.md](DEPLOYMENT_GUIDE_v1.2.0.md) for complete Kubernetes manifests with StatefulSets, ConfigMaps, and Secrets.

---

## Configuration

### Required Environment Variables

```bash
# Google Gemini API key (get free tier at ai.google.dev)
export GEMINI_API_KEY="your_api_key"

# Admin key for creating API keys in v1.2.0
export FLAMEHAVEN_ADMIN_KEY="your_secure_admin_password"
```

### Optional Environment Variables

```bash
# Server settings
export HOST="0.0.0.0"              # Default: 127.0.0.1
export PORT="8000"                  # Default: 8000
export ENVIRONMENT="production"     # Default: development

# Database location
export FLAMEHAVEN_API_KEYS_DB="/path/to/api_keys.db"

# Redis (for multi-worker deployments)
export REDIS_HOST="localhost"       # Default: localhost
export REDIS_PORT="6379"            # Default: 6379
export REDIS_PASSWORD="password"    # Optional

# API limits
export MAX_FILE_SIZE_MB="50"        # Default: 50
```

---

## API Key Management (v1.2.0)

### Generate API Key

```bash
curl -X POST http://localhost:8000/api/admin/keys \
  -H "X-Admin-Key: $FLAMEHAVEN_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production API Key",
    "permissions": ["upload", "search", "stores", "delete"],
    "rate_limit_per_minute": 100
  }'
```

### List Your Keys

```bash
curl http://localhost:8000/api/admin/keys \
  -H "Authorization: Bearer sk_live_your_key..."
```

### Revoke a Key

```bash
curl -X DELETE http://localhost:8000/api/admin/keys/{key_id} \
  -H "Authorization: Bearer sk_live_your_key..."
```

### View Usage Statistics

```bash
curl "http://localhost:8000/api/admin/usage?days=7" \
  -H "Authorization: Bearer sk_live_your_key..."
```

---

## Performance

| Operation | Response Time | Throughput |
|-----------|---------------|------------|
| Health check | <10ms | - |
| Search (cache hit) | <10ms | >2 searches/sec |
| Search (cache miss) | 500ms - 3s | Depends on Gemini API |
| Batch search (10 queries) | 2-5s | >1 batch/sec |
| File upload | 1-5s | >1 file/sec |

### Resource Usage

- **Memory:** ~200MB baseline + 50MB per 1000 cached queries
- **CPU:** <5% idle, <50% under sustained load
- **Disk:** 10MB per 100,000 cached queries

---

## Security Features (v1.2.0)

- **[#] API Key Encryption:** SHA256 hashing (plain keys never stored)
- **[#] Rate Limiting:** Per-API-key customizable limits (default 100/min)
- **[#] Permission Control:** Fine-grained access (upload, search, stores, delete)
- **[#] Audit Logging:** Complete request trail with timestamps
- **[#] OWASP Headers:** Security headers on all responses
- **[#] Request Tracking:** Unique request IDs for debugging
- **[#] Key Expiration:** Optional time-limited API keys

---

## Roadmap

### v1.2.1 (Q4 2025)
- [ ] Improved admin authentication (IAM integration)
- [ ] Redis UI configuration in dashboard
- [ ] Encryption at rest for sensitive data
- [ ] Fix deprecated FastAPI `on_event` decorators

### v1.3.0 (Q1 2026)
- [ ] OAuth2/OIDC integration
- [ ] API key rotation
- [ ] Billing/metering system
- [ ] Advanced analytics dashboard

### v2.0.0 (Q2 2026)
- [ ] Multi-language support
- [ ] Enhanced file type support (XLSX, PPTX, RTF)
- [ ] Export search results (JSON, CSV, PDF)
- [ ] WebSocket streaming support

---

## Good First Issues (Contributions Welcome!)

Looking to contribute? Start with these issues:

1. **Add XLSX Support** (Difficulty: Medium)
   - Extend file upload to handle Excel spreadsheets
   - Reference: `flamehaven_filesearch/loaders.py`
   - Estimated time: 2-3 hours

2. **Implement Search Result Caching UI** (Difficulty: Easy)
   - Display cache hit rate in admin dashboard
   - Reference: `flamehaven_filesearch/dashboard.py`
   - Estimated time: 1-2 hours

3. **Add Batch Search Progress Tracking** (Difficulty: Medium)
   - Implement WebSocket endpoint for real-time batch progress
   - Reference: `flamehaven_filesearch/batch_routes.py`
   - Estimated time: 3-4 hours

4. **Create Integration Tests** (Difficulty: Easy)
   - Write end-to-end tests for common workflows
   - Reference: `tests/test_api_integration.py`
   - Estimated time: 2-3 hours

5. **Add Dark Mode to Admin Dashboard** (Difficulty: Easy)
   - Extend dashboard.py with CSS dark theme toggle
   - Reference: `flamehaven_filesearch/dashboard.py`
   - Estimated time: 1-2 hours

See [CONTRIBUTING.md](CONTRIBUTING.md) for complete contribution guidelines.

---

## Troubleshooting

### Issue: 401 Unauthorized on API requests

**Solution:**
- Verify FLAMEHAVEN_ADMIN_KEY environment variable is set
- Check Authorization header format: `Authorization: Bearer sk_live_your_key`
- Ensure API key hasn't expired

### Issue: "Redis connection refused"

**Solution:**
- Verify Redis is running: `redis-cli ping`
- Check REDIS_HOST and REDIS_PORT environment variables
- Remove Redis if not needed (system falls back to LRU cache)

### Issue: High memory usage

**Solution:**
- Configure Redis eviction policy: `maxmemory-policy allkeys-lru`
- Reduce cache TTL in configuration
- Monitor cache metrics: `curl http://localhost:8000/prometheus | grep cache`

### Issue: Slow searches

**Solution:**
- Check if caching is working (view `cache_hits_total` metric)
- Verify Gemini API is responsive
- Check network latency to Redis instance

---

## Migration from v1.1.0 to v1.2.0

### Breaking Changes

**All protected endpoints now require API key authentication.**

### Migration Steps

1. **Update dependencies:**
   ```bash
   pip install -U flamehaven-filesearch[api]
   ```

2. **Set admin key:**
   ```bash
   export FLAMEHAVEN_ADMIN_KEY="your_secure_admin_key"
   ```

3. **Generate first API key:**
   ```bash
   curl -X POST http://localhost:8000/api/admin/keys \
     -H "X-Admin-Key: your_admin_key" \
     -H "Content-Type: application/json" \
     -d '{"name":"Production","permissions":["upload","search","stores","delete"]}'
   ```

4. **Update application code:**
   ```python
   # Old (v1.1.0)
   requests.post("http://localhost:8000/api/search", json={"query": "test"})

   # New (v1.2.0)
   api_key = "sk_live_your_key"
   requests.post(
       "http://localhost:8000/api/search",
       json={"query": "test"},
       headers={"Authorization": f"Bearer {api_key}"}
   )
   ```

5. **Test before production deployment**

### Rollback Plan

If issues occur, v1.1.0 remains available. No data loss on downgrade.

---

## Support & Community

- **Documentation:** [Deployment Guide](DEPLOYMENT_GUIDE_v1.2.0.md) | [Release Notes](RELEASE_NOTES_v1.2.0.md)
- **Issues & Bugs:** [GitHub Issues](https://github.com/flamehaven01/Flamehaven-Filesearch/issues)
- **Discussions:** [GitHub Discussions](https://github.com/flamehaven01/Flamehaven-Filesearch/discussions)
- **Security Issues:** security@flamehaven.space

---

## Architecture Overview

```
[Your Documents]
       |
       v
[File Upload Endpoint] ---> [File Parser] ---> [Store Manager]
       |                         |                    |
       +---- (REST API) --------+---- (SQLite DB)----+
       |
[Search Endpoint] ---> [Semantic Search] ---> [Gemini API]
       |                   |
       +--- (Cache) -------+
       |
[Prometheus Metrics] <--- [Audit Log]
```

---

## Performance Metrics

Recent v1.2.0 benchmark (Docker on M1 Mac):

```
Health Check:           8ms
Search (cache hit):     9ms
Search (cache miss):    1250ms
Batch Search (10 queries, parallel): 2500ms
Upload (50MB file):     3200ms
```

---

## License

FLAMEHAVEN FileSearch is released under the MIT License. See [LICENSE](LICENSE) file for details.

---

## Acknowledgments

Built with:
- **FastAPI** - Modern Python web framework
- **Google Gemini API** - Semantic understanding
- **SQLite** - Lightweight database
- **Redis** (optional) - Distributed caching

---

**Questions? Open an issue or email support@flamehaven.space**

**Last Updated:** November 16, 2025 (v1.2.0)
