Metadata-Version: 2.4
Name: clawbox
Version: 1.0.0
Summary: A minimal cloud file system for agents with semantic search capabilities
Author: Dayuan Tang, Xinhao Jiang
License: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: rich>=13.0.0
Provides-Extra: server
Requires-Dist: alembic>=1.14.0; extra == "server"
Requires-Dist: fastapi>=0.109.0; extra == "server"
Requires-Dist: uvicorn[standard]>=0.27.0; extra == "server"
Requires-Dist: sqlalchemy>=2.0.0; extra == "server"
Requires-Dist: psycopg2-binary>=2.9.9; extra == "server"
Requires-Dist: pgvector>=0.2.4; extra == "server"
Requires-Dist: python-multipart>=0.0.6; extra == "server"
Requires-Dist: openai>=1.12.0; extra == "server"
Requires-Dist: pydantic>=2.5.0; extra == "server"
Requires-Dist: pydantic-settings>=2.1.0; extra == "server"
Requires-Dist: python-jose[cryptography]>=3.3.0; extra == "server"
Requires-Dist: aiofiles>=23.2.1; extra == "server"
Requires-Dist: boto3>=1.34.0; extra == "server"
Requires-Dist: pdfplumber>=0.10.0; extra == "server"
Requires-Dist: python-docx>=1.1.0; extra == "server"
Requires-Dist: authlib>=1.3.0; extra == "server"
Requires-Dist: itsdangerous>=2.1.0; extra == "server"
Requires-Dist: openpyxl>=3.1.0; extra == "server"
Requires-Dist: python-pptx>=1.0.0; extra == "server"
Requires-Dist: google-genai>=1.0.0; extra == "server"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: httpx>=0.25.0; extra == "dev"
Dynamic: license-file

<p align="center">
  <h1 align="center">ClawBox</h1>
  <p align="center">
    Open-source cloud file system for AI agents<br>
    Semantic search &bull; Folders &bull; File sharing &bull; Self-hostable
  </p>
  <p align="center">
    <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
    <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg" alt="Python 3.10+"></a>
    <a href="https://clawbox.ink"><img src="https://img.shields.io/badge/demo-clawbox.ink-00d4ff.svg" alt="Live Demo"></a>
    <a href="docs/self-hosting.md"><img src="https://img.shields.io/badge/self--host-docker%20compose-green.svg" alt="Self-Host"></a>
  </p>
  <p align="center">
    <a href="https://clawbox.ink">Live Demo</a> &bull;
    <a href="docs/self-hosting.md">Self-Host Guide</a>
  </p>
</p>

---

## What is ClawBox?

ClawBox is an open-source file storage platform built for AI agents. Upload files, search by meaning, organize into folders, and share with anyone &mdash; all via API or CLI.

**For agents:** Store files, search by meaning, organize into folders &mdash; all via API.

**For humans:** Upload files, get a share link, search across documents. Like a smarter Dropbox with an API-first design.

### Key Features

| Feature | Description |
|---------|-------------|
| **Semantic Search** | Search files by meaning, not keywords. Powered by Gemini embeddings. |
| **Multimodal** | Index text, PDF, Word, Excel, PowerPoint, CSV, images, audio, video. |
| **Virtual Folders** | Organize files with paths like `/docs/reports/q1.pdf`. |
| **File Sharing** | Generate share links &mdash; anyone with the link can download. |
| **Google Login** | Sign in with Google for 10 GB storage (1 GB free without login). |
| **Self-Hostable** | Docker Compose and cloud-friendly deployment options for local or hosted setups. |

---

## Quick Start

### Option 1: Docker

```bash
git clone https://github.com/Alfra-AI/clawbox.git
cd clawbox
cp .env.example .env       # Edit to add your Google API key for search
docker compose up -d
docker compose exec app alembic upgrade head
```

ClawBox is then available at **http://localhost:8000**.

### Option 2: Use the hosted version

No setup needed &mdash; just visit **[clawbox.ink](https://clawbox.ink)**.

### Option 3: CLI

```bash
pip install .
clawbox init                    # Get a token
clawbox upload report.pdf       # Upload
clawbox search "quarterly revenue"  # Semantic search
```

If you want to connect ClawBox to a coding or task agent, see [`ClawBoxSkill/SKILL.md`](ClawBoxSkill/SKILL.md).

---

## API

All endpoints (except public ones) require `Authorization: Bearer <token>`.

### Core

| Endpoint | Method | Auth | Description |
|----------|--------|------|-------------|
| `POST /get_token` | POST | No | Get a free token (1 GB storage) |
| `POST /files/upload` | POST | Yes | Upload a file (with optional `path` for folders) |
| `GET /files` | GET | Yes | List files (filter by `folder`, `recursive`) |
| `GET /files/{id}` | GET | Yes | Download a file |
| `PATCH /files/{id}` | PATCH | Yes | Move/rename a file |
| `DELETE /files/{id}` | DELETE | Yes | Delete a file |
| `POST /search` | POST | Yes | Semantic search across files |
| `POST /files/embed` | POST | Yes | Generate/retry embeddings |

### Sharing

| Endpoint | Method | Auth | Description |
|----------|--------|------|-------------|
| `POST /files/{id}/share` | POST | Yes | Create a share link |
| `GET /s/{code}` | GET | No | Download via share link |

### Auth

| Endpoint | Method | Description |
|----------|--------|-------------|
| `GET /auth/google` | GET | Start Google OAuth login |
| `GET /auth/me` | GET | Get current user info |

---

## Supported File Formats

| Format | Search Support |
|--------|---------------|
| Text, Markdown, JSON, XML, CSV | Full text extraction |
| PDF | pdfplumber |
| Word (.docx) | python-docx |
| Excel (.xlsx) | openpyxl |
| PowerPoint (.pptx) | python-pptx |
| Images (PNG, JPEG, GIF, WebP) | Gemini multimodal + captioning |
| Audio, Video | Gemini multimodal embedding |

---

## Self-Hosting

### Single Server

```bash
docker compose up -d
docker compose exec app alembic upgrade head
```

### With MinIO (S3-Compatible Storage)

```bash
docker compose -f docker-compose.cluster.yml up -d
docker compose -f docker-compose.cluster.yml exec app alembic upgrade head
```

### Any Cloud

ClawBox works with any S3-compatible storage. Set `S3_ENDPOINT_URL`:

| Provider | S3_ENDPOINT_URL |
|----------|-----------------|
| AWS S3 | (leave empty) |
| MinIO | `http://minio:9000` |
| GCS | `https://storage.googleapis.com` |
| DigitalOcean Spaces | `https://{region}.digitaloceanspaces.com` |
| Cloudflare R2 | `https://{account_id}.r2.cloudflarestorage.com` |

See [Self-Hosting Guide](docs/self-hosting.md) for the full local, cluster, and cloud setup flow.

---

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | `postgresql://...` | PostgreSQL connection |
| `STORAGE_BACKEND` | `local` | `local` or `s3` |
| `S3_ENDPOINT_URL` | (empty) | S3-compatible endpoint |
| `GOOGLE_API_KEY` | (empty) | Gemini API key (enables search) |
| `GOOGLE_CLIENT_ID` | (empty) | Google OAuth (enables login) |
| `SESSION_SECRET_KEY` | `change-me` | Session signing (change in prod) |
| `APP_URL` | `http://localhost:8000` | Public URL for share links |

See [.env.example](.env.example) for all options.

---

## Architecture

```
┌─────────────────────────┐
│     Web UI / CLI        │
└────────┬────────────────┘
         │ HTTP API
┌────────┴────────────────┐
│     FastAPI Server      │
│  (stateless, scalable)  │
└────────┬────────────────┘
         │
    ┌────┴────┐
    │         │
┌───┴───┐ ┌──┴──────────┐
│ PostgreSQL │ │ Object Storage │
│ (pgvector) │ │ (Local/S3/MinIO)│
└────────┘ └─────────────┘
```

- **PostgreSQL + pgvector** &mdash; metadata, users, embeddings, search
- **Object storage** &mdash; file content (local filesystem, S3, MinIO, GCS, etc.)
- **Gemini** &mdash; embeddings + multimodal indexing (optional)

---

## Project Structure

```
src/
├── main.py           # FastAPI app, routing, middleware
├── config.py         # Settings from environment
├── models.py         # SQLAlchemy models
├── auth.py           # Bearer token authentication
├── storage.py        # Storage backend (local/S3)
├── embeddings.py     # Gemini embeddings + text extraction
├── database.py       # Database connection
├── cli.py            # CLI tool (clawbox command)
├── routes/
│   ├── tokens.py     # Token management + settings
│   ├── files.py      # File CRUD + sharing + folders
│   ├── search.py     # Semantic search
│   └── oauth.py      # Google OAuth
└── static/
    └── index.html    # Web UI
```

---

## Contributing

Contributions are welcome! Please open an issue or submit a PR.

---

## License

[MIT](LICENSE)
