Metadata-Version: 2.4
Name: agentkb
Version: 0.1.0
Summary: A unified search and knowledge tool for AI agents and developers
License-Expression: MIT
Requires-Python: >=3.11,<3.14
Requires-Dist: click>=8.0
Requires-Dist: numpy>=1.24
Requires-Dist: pylate>=1.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: tree-sitter-c
Requires-Dist: tree-sitter-go
Requires-Dist: tree-sitter-java
Requires-Dist: tree-sitter-javascript
Requires-Dist: tree-sitter-python
Requires-Dist: tree-sitter-ruby
Requires-Dist: tree-sitter-rust
Requires-Dist: tree-sitter-typescript
Requires-Dist: tree-sitter>=0.24
Provides-Extra: web
Requires-Dist: beautifulsoup4>=4.12; extra == 'web'
Requires-Dist: requests>=2.28; extra == 'web'
Description-Content-Type: text/markdown

# Plait

A unified search and knowledge tool for AI agents and developers.

Plait braids together two capabilities into a single tool: **semantic code search** and a **persistent, LLM-maintained knowledge base**. Agents get one interface to search code, search accumulated knowledge, and build up project understanding over time.

## Install

```bash
pip install plait

# Optional: URL ingestion support
pip install plait[web]
```

## Quick Start

```bash
# Index your code
plait index

# Search semantically
plait search "database connection pooling"

# Search with regex pre-filter + semantic ranking
plait search -e "async def" "error handling patterns"

# Search with file filtering
plait search --include="*.py" "authentication"
plait search --exclude-dir=tests "config parsing"
```

## Code Search

Plait indexes code with tree-sitter (Python, JS, TS, Rust, Go, Java, C, Ruby) and searches it using ColBERT multi-vector embeddings with hybrid semantic + keyword ranking.

```bash
# Build/update the index (incremental — skips unchanged files)
plait index

# Semantic search
plait search "retry logic with exponential backoff"

# Regex pre-filter + semantic ranking
plait search -e "class.*Error" "custom exception handling"

# Fixed string match
plait search -F "TODO" "incomplete implementations"

# Word boundary match
plait search -w -e "test" "unit testing patterns"

# Semantic only (skip keyword search)
plait search --semantic-only "dependency injection"

# Filter by file type
plait search --include="*.rs" "memory allocation"

# Exclude directories
plait search --exclude-dir=vendor --exclude-dir=generated "validation"

# Top-k results (default: 15)
plait search -k 5 "logging"

# Files only (no content)
plait search -l "authentication"

# Full content output
plait search -c "main entry point"

# JSON output (for programmatic use)
plait search --json "error handling"
```

## Knowledge Base

The knowledge base is a directory of markdown files that the LLM writes, maintains, and searches. It compounds over time — every source ingested and every question answered can enrich it.

```bash
# Initialize a project KB
plait kb init

# Initialize a global KB (cross-project knowledge)
plait kb init --global

# Ingest a source document
plait kb ingest ./notes/architecture-decisions.md

# Ingest a URL (requires plait[web])
plait kb ingest https://some-blog.com/auth-best-practices

# Search the KB
plait search -s kb "why did we choose JWT over sessions"

# Search code + KB together
plait search -s all "authentication flow"

# List all wiki pages
plait kb list

# Check KB health
plait kb lint

# Show KB status
plait kb status
```

### KB Structure

```
~/.plait/kb/{project}/
  wiki/         LLM-generated pages (entity, concept, decision, synthesis, source-summary)
  sources/      Raw input documents (articles, RFCs, meeting notes)
  schema.md     Page conventions — how to structure pages, frontmatter, wikilinks
  index.md      Content catalog — every page listed with summary
  log.md        Chronological operation log
```

### Writing KB Pages

Pages are markdown with YAML frontmatter. The agent writes them directly using its file tools:

```markdown
---
title: Auth Token Lifecycle
type: entity
tags: [auth, oauth, tokens]
sources: [oauth-rfc.md]
created: 2026-04-04
updated: 2026-04-04
---

## Overview

Tokens are refreshed via the OAuth2 refresh_token grant...

See also: [[OAuth Configuration]], [[Session Management]]
```

Page types: `entity`, `concept`, `decision`, `synthesis`, `source-summary`. See `schema.md` in your KB for full conventions.
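
Since the frontmatter is a simple flat block, splitting it off mechanically is straightforward. Here is a minimal stdlib-only sketch of the idea (illustrative only, not Plait's actual parser; the real tool depends on PyYAML, and this hand-rolled version handles only flat `key: value` pairs and inline `[a, b]` lists):

```python
def split_frontmatter(page: str) -> tuple[dict, str]:
    """Split a KB page into (frontmatter dict, markdown body).

    Illustrative only: handles flat `key: value` pairs and inline
    [a, b] lists, not full YAML.
    """
    meta: dict = {}
    if not page.startswith("---\n"):
        return meta, page
    header, _, body = page[4:].partition("\n---\n")
    for line in header.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",")]
        else:
            meta[key.strip()] = value
    return meta, body.lstrip("\n")

page = """---
title: Auth Token Lifecycle
type: entity
tags: [auth, oauth, tokens]
---

## Overview
"""
meta, body = split_frontmatter(page)
print(meta["title"])   # Auth Token Lifecycle
print(meta["tags"])    # ['auth', 'oauth', 'tokens']
```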

## Search Scopes

| Flag | Searches |
|------|----------|
| `-s code` (default) | Code index only |
| `-s kb` | Project KB + global KB |
| `-s all` | Everything |

Results are tagged with their source: `[code]`, `[kb]`, `[kb:source]`.

## Agent Integration

Plait includes hooks for Claude Code that inject context at session start and remind agents about semantic search.

```bash
# Install hooks into .claude/settings.json
plait hooks install

# What the hooks do:
plait hooks session-start    # Shows index stats, KB summary, usage instructions
plait hooks pre-tool-use     # Reminds about plait search on Grep/Glob
```

### Agent Workflow

1. Session starts. Hook tells the agent what indexes and KB pages are available.
2. Agent searches code: `plait search "token refresh logic"`
3. Agent searches KB: `plait search -s kb "why refresh tokens expire after 30 days"`
4. Agent learns something worth keeping: writes a wiki page to the KB.
5. Agent searches again — new page is indexed automatically on next search.

## Configuration

```bash
# View all settings
plait settings

# Set a value
plait settings set code_model lightonai/LateOn-Code-edge
plait settings set kb_model answerdotai/answerai-colbert-small-v1
plait settings set top_k 20
```

Settings are stored in `~/.plait/config.json`.

| Setting | Default | Description |
|---------|---------|-------------|
| `code_model` | `answerdotai/answerai-colbert-small-v1` | ColBERT model for code indexing |
| `kb_model` | `answerdotai/answerai-colbert-small-v1` | ColBERT model for KB indexing |
| `default_scope` | `code` | Default search scope |
| `top_k` | `15` | Default number of results |
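
Put together, `~/.plait/config.json` is a flat JSON object keyed by the setting names above. A hypothetical example with two non-default values (the exact on-disk schema is Plait's own, so treat the shape as an assumption):

```json
{
  "code_model": "answerdotai/answerai-colbert-small-v1",
  "kb_model": "answerdotai/answerai-colbert-small-v1",
  "default_scope": "all",
  "top_k": 20
}
```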

## Other Commands

```bash
# Show project status (index stats, KB summary)
plait status

# Delete index for current project
plait clear

# Delete all indexes
plait clear --all

# Check for updates
plait update
```

## How It Works

Plait uses a hybrid search pipeline:

1. **ColBERT encoding** — queries and documents are encoded into per-token multi-vector embeddings
2. **PLAID search** — late-interaction scoring finds semantically relevant results
3. **FTS5 keyword search** — BM25 ranking catches exact matches semantic search might miss
4. **RRF fusion** — reciprocal rank fusion merges both rankings (semantic weighted 3x higher)
5. **Post-filtering** — regex patterns and glob filters applied to results
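
The fusion step (4) fits in a few lines. This is a generic reciprocal rank fusion sketch with a 3x semantic weight, not Plait's actual implementation:

```python
def rrf_fuse(semantic: list[str], keyword: list[str],
             k: int = 60, semantic_weight: float = 3.0) -> list[str]:
    """Merge two ranked result lists with reciprocal rank fusion.

    Each document scores weight / (k + rank) per list it appears in;
    k=60 is the conventional damping constant for RRF.
    """
    scores: dict[str, float] = {}
    for rank, doc in enumerate(semantic, start=1):
        scores[doc] = scores.get(doc, 0.0) + semantic_weight / (k + rank)
    for rank, doc in enumerate(keyword, start=1):
        scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["auth.py:login", "db.py:connect", "util.py:retry"]
keyword = ["db.py:connect", "auth.py:login", "log.py:emit"]
print(rrf_fuse(semantic, keyword))
# ['auth.py:login', 'db.py:connect', 'util.py:retry', 'log.py:emit']
```

Because the semantic list is weighted 3x, a document ranked first semantically stays ahead of one ranked first by keywords alone.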

Code is parsed with tree-sitter into structured units (functions, classes, methods) with call graph and signature information baked into the embedding text. KB pages are chunked at heading boundaries with frontmatter metadata.
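
Heading-boundary chunking can be approximated like this (a sketch of the idea only; Plait's real chunker also attaches frontmatter metadata, and a robust version would skip `#` lines inside fenced code blocks):

```python
def chunk_by_headings(markdown: str) -> list[str]:
    """Split markdown into chunks, starting a new chunk at each heading line."""
    chunks: list[list[str]] = [[]]
    for line in markdown.splitlines():
        if line.startswith("#") and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    return ["\n".join(c) for c in chunks if any(s.strip() for s in c)]

doc = "intro text\n## Setup\nsteps here\n## Usage\nmore text"
print(chunk_by_headings(doc))
# ['intro text', '## Setup\nsteps here', '## Usage\nmore text']
```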

Indexes are incremental — only changed files are re-encoded on subsequent runs.
