Metadata-Version: 2.4
Name: knowledge_base_mcp
Version: 0.6.1
Summary: Knowledge Base MCP
Requires-Python: >=3.13
Requires-Dist: aiohttp[speedups]>=3.12.13
Requires-Dist: asyncclick>=8.1.8
Requires-Dist: docling-core[chunking]>=2.37.0
Requires-Dist: docling==2.42.0
Requires-Dist: duckdb>=1.3.0
Requires-Dist: fastmcp>=2.3.5
Requires-Dist: flashrank>=0.2.10
Requires-Dist: githubkit>=0.12.15
Requires-Dist: gitpython>=3.1.44
Requires-Dist: llama-index-core>=0.12.45
Requires-Dist: llama-index-embeddings-fastembed>=0.3.2
Requires-Dist: llama-index-embeddings-huggingface>=0.5.4
Requires-Dist: llama-index-node-parser-docling>=0.3.2
Requires-Dist: llama-index-postprocessor-flashrank-rerank>=0.1.0
Requires-Dist: llama-index-readers-docling>=0.3.3
Requires-Dist: llama-index-readers-github>=0.6.1
Requires-Dist: llama-index-readers-slack>=0.3.1
Requires-Dist: llama-index-storage-docstore-duckdb>=0.1.0
Requires-Dist: llama-index-storage-docstore-elasticsearch>=0.3.0
Requires-Dist: llama-index-storage-index-store-duckdb>=0.1.0
Requires-Dist: llama-index-storage-index-store-elasticsearch>=0.4.0
Requires-Dist: llama-index-storage-kvstore-duckdb>=0.1.1
Requires-Dist: llama-index-vector-stores-duckdb>=0.1.0
Requires-Dist: llama-index-vector-stores-elasticsearch>=0.4.3
Requires-Dist: llama-index>=0.12.45
Requires-Dist: mcp>=1.9.0
Requires-Dist: mistune>=3.1.3
Requires-Dist: rich>=14.0.0
Requires-Dist: sentence-transformers>=4.1.0
Requires-Dist: syrupy>=4.9.1
Requires-Dist: torch>=2.7.1
Requires-Dist: tree-sitter-language-pack>=0.8.0
Requires-Dist: tree-sitter>=0.24.0
Requires-Dist: types-lxml>=2025.3.30
Description-Content-Type: text/markdown

# Knowledge Base MCP

An MCP Server for creating Knowledge Bases and searching them!

## 100% Local, Fast, and Easy to Use

The MCP Server uses DuckDB by default which runs in-process and persists to disk. Everything from the vector store, to the embeddings, to the reranker, crawler, etc is all local. It does not require any cloud services or external dependencies.

Even more important, the Knowledge Base MCP Server is very fast at indexing, embedding, searching, and reranking compared to other solutions.

## Quick Start

Just add the following to your MCP Server configuration!

```json
"Knowledge Base": {
    "command": "uvx",
    "args": [
        "knowledge_base_mcp",
        "duckdb",
        "persistent",
        "run"
    ]
}
```

# Features

## Search Features

- **Knowledge Base Management**: Organize your data into knowledge bases.
- **Semantic Search**: Query across one or more knowledge bases using semantic search.
- **Multiple Vector Store Backends**: Works with DuckDB and Elasticsearch.

Search results are automatically reranked using a reranker model to try to achieve the highest quality results.

## Document Ingestion Features

- **Website Ingestion**: Crawl websites into a knowledge base.
- **Git Repository Ingestion**: Load and ingest a Git repository into a knowledge base.
- **Directory Ingestion**: Load and ingest a directory into a knowledge base.

The Knowledge Base MCP Server uses advanced hierarchical + semantic chunking techniques to create embeddings for your data.

## Usage

### Command-Line Interface

The CLI supports both DuckDB (in-memory or persistent) and Elasticsearch as vector store backends.

#### DuckDB (Persistent)

To run the server with a persistent DuckDB store, which will save your knowledge bases to disk:

```bash
uv run knowledge_base_mcp duckdb persistent --docs-db-dir ./storage --docs-db-name documents.duckdb --code-db-dir ./storage --code-db-name code.duckdb run
```

- `--docs-db-dir`: Directory to store document knowledge base. Defaults to `./storage`.
- `--docs-db-name`: Filename for the document knowledge base. Defaults to `documents.duckdb`.
- `--code-db-dir`: Directory to store code knowledge base. Defaults to `./storage`.
- `--code-db-name`: Filename for the code knowledge base. Defaults to `code.duckdb`.

#### DuckDB (In-Memory)

To run the server with an in-memory DuckDB store (data will not be persisted after the server stops):

```bash
uv run knowledge_base_mcp duckdb memory run
```

#### Elasticsearch

To run the server with Elasticsearch as the backend, ensure you have an Elasticsearch instance running and accessible.

```bash
uv run knowledge_base_mcp elasticsearch --url http://localhost:9200 --index-docs-vectors kbmcp-docs-vectors --index-docs-kv kbmcp-docs-kv --index-code-vectors kbmcp-code-vectors --index-code-kv kbmcp-code-kv run
```

You can also set Elasticsearch options via environment variables:
- `ES_URL`: Elasticsearch instance URL (e.g., `http://localhost:9200`).
- `ES_INDEX_DOCS_VECTORS`: Index name for document vectors. Defaults to `kbmcp-docs-vectors`.
- `ES_INDEX_DOCS_KV`: Index name for document key-value store. Defaults to `kbmcp-docs-kv`.
- `ES_INDEX_CODE_VECTORS`: Index name for code vectors. Defaults to `kbmcp-code-vectors`.
- `ES_INDEX_CODE_KV`: Index name for code key-value store. Defaults to `kbmcp-code-kv`.
- `ES_USERNAME`: Username for Elasticsearch authentication.
- `ES_PASSWORD`: Password for Elasticsearch authentication.
- `ES_API_KEY`: API Key for Elasticsearch authentication.

### Main Server Tools/Endpoints

When running, the MCP server exposes the following tools:

#### Ingestion Tools (from `IngestServer`)

- **load_website**: Create a new knowledge base from a website by crawling seed URLs. If the knowledge base already exists, it will be replaced.
  - Parameters: `knowledge_base` (name for the new KB), `seed_urls` (list of URLs to start crawling), `url_exclusions` (optional list of URLs to exclude), `max_pages` (optional maximum number of pages to crawl).
- **load_directory**: Create a new knowledge base from files within a local directory.
  - Parameters: `knowledge_base` (name for the new KB), `path` (path to the directory), `exclude` (optional file path globs to exclude), `extensions` (optional list of file extensions to include, defaults to Markdown and AsciiDoc), `recursive` (optional, whether to recursively gather files, defaults to True).
- **load_git_repository**: Create a new knowledge base from a Git repository by cloning it and ingesting its contents.
  - Parameters: `knowledge_base` (name for the new KB), `repository_url` (URL of the Git repository), `branch` (branch to clone), `path` (path within the repository to ingest), `exclude` (optional file path globs to exclude), `extensions` (optional list of file extensions to include, defaults to Markdown and AsciiDoc).

#### Search Tools (from `KnowledgeBaseSearchServer`)

- **query**: Search across one or more knowledge bases using a natural language question.
  - Parameters: `query` (the plain language query string), `knowledge_bases` (optional list of knowledge base names to restrict the search to; searches all if not provided).
- **get_document**: Retrieve a specific document from a knowledge base by its title.
  - Parameters: `knowledge_base` (the name of the knowledge base), `title` (the title of the document).

#### Management Tools (from `KnowledgeBaseManagementServer`)

- **get_knowledge_bases**: List all available knowledge bases and their document counts.
- **delete_knowledge_base**: Remove a specific knowledge base by its name.
  - Parameters: `knowledge_base` (the name of the knowledge base to delete).
- **delete_all_knowledge_bases**: Remove all knowledge bases from the vector store.
- **get_knowledge_base_stats**: Get detailed statistics for a specific knowledge base.
  - Parameters: `knowledge_base` (the name of the knowledge base).

## VS Code McpServer Usage

1. Open the command palette (Ctrl+Shift+P or Cmd+Shift+P).
2. Type "Settings" and select "Preferences: Open User Settings (JSON)".
3. Add the following MCP Server configuration:

```json
{
    "mcp": {
        "servers": {
            "Knowledge Base": {
                "command": "uvx",
                "args": [
                    "knowledge_base_mcp",
                    "duckdb",
                    "persistent",
                    "run"
                ]
            }
        }
    }
}
```

## Roo Code


```json
{
    "mcpServers": {
      "Knowledge Base": {
          "command": "uvx",
          "args": [
              "knowledge_base_mcp",
              "duckdb",
              "persistent",
              "run"
          ]
      }
    }
}
```

## License

See [LICENSE](LICENSE).