Metadata-Version: 2.4
Name: maris
Version: 0.1.0
Summary: Local Multi-Agent Repository Intelligence System
Author-email: Rohin Patel <rohin.patel@outlook.com>
License: MIT
Project-URL: Homepage, https://github.com/rohinp/maris
Project-URL: Documentation, https://github.com/rohinp/maris/docs
Project-URL: Repository, https://github.com/rohinp/maris
Project-URL: Issues, https://github.com/rohinp/maris/issues
Keywords: repository,intelligence,llm,code-analysis,local-first
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tree-sitter>=0.21.0
Requires-Dist: duckdb>=1.0.0
Requires-Dist: lancedb>=0.5.0
Requires-Dist: pyarrow>=17.0.0
Requires-Dist: langchain>=0.1.0
Requires-Dist: langchain-community>=0.0.20
Requires-Dist: langgraph>=0.0.20
Requires-Dist: ollama>=0.1.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: watchdog>=4.0.0
Requires-Dist: gitpython>=3.1.0
Requires-Dist: rich>=13.7.0
Requires-Dist: click>=8.1.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: black>=24.0.0; extra == "dev"
Requires-Dist: ruff>=0.2.0; extra == "dev"
Requires-Dist: mypy>=1.8.0; extra == "dev"
Requires-Dist: pre-commit>=3.6.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Dynamic: license-file

# Local Multi-Agent Repository Intelligence System

## Vision

Build a fully local, privacy-first repository intelligence platform that helps developers understand, navigate, document, analyze, and reason about source code.

The goal is **not** to compete with cloud coding assistants such as Claude Code, Cursor, GitHub Copilot, or OpenAI Codex.

The system will:

* Run locally
* Use local LLMs
* Never require source code to leave the machine
* Focus on understanding rather than code generation
* Be language-aware through AST parsing
* Maintain a continuously updated repository knowledge graph
* Support multiple specialized agents

The primary objective is to become a "repository expert" capable of answering questions, generating documentation, explaining architecture, performing impact analysis, and understanding code evolution over time.

---

# Core Principles

## 1. Retrieval First

The quality of answers depends on retrieval quality.

The system should prioritize:

* AST-aware indexing
* Symbol-aware retrieval
* Dependency-aware retrieval

over generic vector similarity search.

---

## 2. Code is a Graph

A repository is not a collection of files.

A repository is a graph of:

* Packages
* Modules
* Classes
* Traits
* Interfaces
* Functions
* Methods
* Dependencies
* Imports
* Call relationships

The system should maintain this graph as a first-class entity.
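
As a rough illustration of treating the graph as a first-class entity, the sketch below keeps typed nodes and labelled edges in memory. The `Node` and `RepoGraph` names, the relation labels (`contains`, `calls`), and the qualified form of `attemptExecuteNode` are assumptions made for this example, not part of the existing codebase.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str  # fully qualified symbol name
    kind: str  # e.g. "package", "class", "method"


class RepoGraph:
    """Typed nodes plus labelled, directed edges between them (hypothetical)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        # relation -> source name -> set of target names
        self.edges: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = Node(name, kind)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges[relation][source].add(target)

    def neighbors(self, source: str, relation: str) -> set[str]:
        # .get avoids creating empty entries on read-only lookups.
        return set(self.edges.get(relation, {}).get(source, set()))


graph = RepoGraph()
graph.add_node("GraphRunner", "class")
graph.add_node("GraphRunner.retryExecuteNode", "method")
graph.add_node("GraphRunner.attemptExecuteNode", "method")
graph.add_edge("GraphRunner", "contains", "GraphRunner.retryExecuteNode")
graph.add_edge("GraphRunner", "contains", "GraphRunner.attemptExecuteNode")
graph.add_edge("GraphRunner.retryExecuteNode", "calls", "GraphRunner.attemptExecuteNode")
```

Keeping the relation label on every edge lets the same structure answer both "what does this class contain?" and "what does this method call?" without separate stores.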

---

## 3. Local First

All processing should happen locally:

* Parsing
* Embedding generation
* Retrieval
* Reasoning

No external APIs are required.

---

## 4. Specialized Agents

Each agent should have a single responsibility.

Avoid creating one large autonomous agent.

Instead create multiple focused agents sharing a common knowledge layer.

---

# High-Level Architecture

```text
Repository

    │

    ▼

Indexing Agent

    │

    ▼

Repository Knowledge Layer

    ├── Symbol Store
    ├── Dependency Graph
    ├── Vector Store
    ├── Commit History
    └── Metadata

    │

    ▼

Agents

    ├── Documentation Agent ✅
    ├── Q&A Agent ✅
    ├── Git Agent ✅
    ├── Impact Analysis Agent ✅
    ├── Git Archaeology Agent (Planned)
    └── Future Agents
```

---

# Technology Choices

## Parsing

Use Tree-sitter.

Reason:

* Mature ecosystem
* Multi-language support
* Incremental parsing
* Existing grammars

Supported languages for MVP:

* Scala
* Java
* Python

Future:

* Go
* Rust
* Kotlin
* C++
* C#
* TypeScript

---

## Local LLM Runtime

Use Ollama.

Candidate models:

### MVP

* Qwen3 8B
* Gemma 3 12B

### Recommended

* Qwen3 32B

### Future

* Qwen3 72B
* DeepSeek R1 Distill

---

## Embeddings

Candidate models:

* nomic-embed-text
* bge-large
* gte-large

Embeddings should only assist retrieval.

They must not become the primary retrieval mechanism.

---

## Agent Orchestration

Use LangGraph.

Reason:

* Explicit workflows
* State management
* Tool orchestration
* Easy future expansion

Avoid autonomous agent loops.

Prefer deterministic workflows.

---

## Storage

### Metadata Store

DuckDB

Stores:

* symbols
* files
* relationships
* commits
* documentation
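
One possible shape for part of this store is sketched below. To stay runnable with only the standard library it uses `sqlite3` as a stand-in for DuckDB; the table and column names are assumptions for illustration, the `commits` and `documentation` tables are omitted, and the DDL is kept plain so that it should port to DuckDB with little or no change.

```python
import sqlite3

# Hypothetical schema covering files, symbols, and relationships only.
SCHEMA = """
CREATE TABLE files (
    path     TEXT PRIMARY KEY,
    language TEXT NOT NULL
);
CREATE TABLE symbols (
    name TEXT PRIMARY KEY,          -- qualified symbol name
    kind TEXT NOT NULL,             -- e.g. 'class', 'method'
    file TEXT NOT NULL REFERENCES files(path)
);
CREATE TABLE relationships (
    source   TEXT NOT NULL REFERENCES symbols(name),
    relation TEXT NOT NULL,         -- e.g. 'calls', 'imports'
    target   TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")  # stand-in for a DuckDB connection
conn.executescript(SCHEMA)
conn.execute("INSERT INTO files VALUES (?, ?)", ("GraphRunner.scala", "scala"))
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?)",
    [
        ("GraphRunner.retryExecuteNode", "method", "GraphRunner.scala"),
        ("GraphRunner.attemptExecuteNode", "method", "GraphRunner.scala"),
    ],
)
conn.execute(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    ("GraphRunner.retryExecuteNode", "calls", "GraphRunner.attemptExecuteNode"),
)


def callers_of(symbol: str) -> list[str]:
    """Return the direct callers of `symbol`, sorted by name."""
    rows = conn.execute(
        "SELECT source FROM relationships "
        "WHERE relation = 'calls' AND target = ? ORDER BY source",
        (symbol,),
    )
    return [row[0] for row in rows]
```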

---

### Vector Store

LanceDB

Stores:

* embeddings
* semantic search index

Alternative:

* Qdrant

---

### Future Graph Database

Optional.

Candidates:

* KuzuDB
* Neo4j

Do not introduce graph databases during MVP.

---

# Repository Knowledge Layer

This is the most important component.

All agents interact through this layer.

Responsibilities:

* Symbol lookup
* Dependency traversal
* Semantic retrieval
* Impact analysis support
* Commit history lookup

Example interface:

```scala
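// Return types are illustrative. `Symbol` stands for the project's own
// symbol model, not `scala.Symbol`.
trait RepositoryKnowledgeService {

  def findSymbol(name: String): Option[Symbol]

  def findCallers(symbol: Symbol): Seq[Symbol]

  def findCallees(symbol: Symbol): Seq[Symbol]

  def retrieveContext(question: String): Seq[Symbol]

  def impactedSymbols(symbol: Symbol): Seq[Symbol]

}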
```

This layer becomes the foundation of the entire platform.
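
To make the caller, callee, and impact queries concrete, here is a minimal in-memory sketch in Python. The class name `InMemoryKnowledgeService`, the snake_case method names, and the `run` symbol are hypothetical; a real implementation would read from the stores described above rather than from a dictionary.

```python
from collections import defaultdict, deque


class InMemoryKnowledgeService:
    """Toy stand-in for the knowledge layer, backed by a call graph."""

    def __init__(self, calls: dict[str, list[str]]) -> None:
        self._callees: dict[str, set[str]] = defaultdict(set)
        self._callers: dict[str, set[str]] = defaultdict(set)
        for caller, callees in calls.items():
            for callee in callees:
                self._callees[caller].add(callee)
                self._callers[callee].add(caller)

    def find_callers(self, symbol: str) -> set[str]:
        return set(self._callers.get(symbol, ()))

    def find_callees(self, symbol: str) -> set[str]:
        return set(self._callees.get(symbol, ()))

    def impacted_symbols(self, symbol: str) -> set[str]:
        """All direct and indirect callers, found by breadth-first search."""
        seen: set[str] = set()
        queue = deque([symbol])
        while queue:
            current = queue.popleft()
            for caller in self._callers.get(current, ()):
                # The `seen` set also guards against cycles in the call graph.
                if caller not in seen and caller != symbol:
                    seen.add(caller)
                    queue.append(caller)
        return seen


service = InMemoryKnowledgeService({
    "run": ["retryExecuteNode"],  # `run` is a made-up entry point
    "retryExecuteNode": ["attemptExecuteNode"],
    "attemptExecuteNode": [],
})
```

With this data, changing `attemptExecuteNode` reaches both `retryExecuteNode` (direct caller) and `run` (indirect caller), which is the kind of answer an impact query needs.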

---

# MVP

## Agent 1: Repository Indexing Agent

### Responsibilities

Convert source code into structured knowledge.

### Workflow

Repository

↓

Tree-sitter AST

↓

Symbol Extraction

↓

Dependency Extraction

↓

Embedding Generation

↓

Storage

### Extracted Metadata

For every symbol:

```json
{
  "symbol": "GraphRunner.retryExecuteNode",
  "type": "method",
  "file": "GraphRunner.scala",
  "language": "scala",
  "calls": [
    "attemptExecuteNode"
  ]
}
```
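
The sketch below shows how a record of this shape could be produced for Python sources. It uses the standard-library `ast` module as a stand-in for Tree-sitter so that it runs without extra dependencies; the function names `extract_symbols` and `_call_name` and the sample class are assumptions for illustration.

```python
import ast


def extract_symbols(source: str, file: str) -> list[dict]:
    """Return one metadata record per top-level function or class method."""
    records = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                visit(child, f"{prefix}{child.name}.")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Collect the names of everything called inside this function.
                calls = sorted({
                    _call_name(call.func)
                    for call in ast.walk(child)
                    if isinstance(call, ast.Call) and _call_name(call.func)
                })
                records.append({
                    "symbol": f"{prefix}{child.name}",
                    "type": "method" if prefix else "function",
                    "file": file,
                    "language": "python",
                    "calls": calls,
                })

    visit(ast.parse(source), "")
    return records


def _call_name(func: ast.expr) -> str:
    # `foo()` -> "foo", `self.foo()` -> "foo"; other call shapes are skipped.
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


SAMPLE = '''
class GraphRunner:
    def retry_execute_node(self, node):
        return self.attempt_execute_node(node)

    def attempt_execute_node(self, node):
        return node
'''

records = extract_symbols(SAMPLE, "graph_runner.py")
```

Call names are unresolved here: `self.attempt_execute_node` is recorded as `attempt_execute_node` without knowing which class defines it. Resolving names to qualified symbols would be a separate step.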

### Incremental Updates

✅ **Implemented via Git Agent**

The system now includes a Git Agent that:

* Detects changes via `git diff`
* Tracks the last indexed commit
* Re-indexes only changed files
* Supports incremental indexing via CLI: `maris index --incremental`

Because unchanged files are skipped, re-indexing a large repository takes far less time than a full index.

See [Git Agent Documentation](docs/GIT_AGENT.md) for details.
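
As a sketch of the change-detection step, the code below parses the tab-separated output of `git diff --name-status` into categories and derives the list of files to re-index. `ChangeSet` and `parse_name_status` are hypothetical names and do not describe the Git Agent's actual API; status letters other than `A`, `M`, `D`, and `R` are ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)
    renamed: list[tuple[str, str]] = field(default_factory=list)  # (old, new)

    def files_to_reindex(self) -> list[str]:
        # Deleted files need their symbols removed, not re-parsed.
        return sorted(self.added + self.modified + [new for _, new in self.renamed])


def parse_name_status(output: str) -> ChangeSet:
    """Parse the tab-separated output of `git diff --name-status`."""
    changes = ChangeSet()
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        if status == "A":
            changes.added.append(paths[0])
        elif status == "M":
            changes.modified.append(paths[0])
        elif status == "D":
            changes.deleted.append(paths[0])
        elif status.startswith("R"):  # e.g. "R100": rename with 100% similarity
            changes.renamed.append((paths[0], paths[1]))
    return changes


sample = "M\tsrc/a.py\nA\tsrc/b.py\nD\tsrc/c.py\nR100\tsrc/old.py\tsrc/new.py\n"
changes = parse_name_status(sample)
```

In practice the diff would be taken between the last indexed commit and `HEAD`, with the output passed to `parse_name_status`.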

---

## Agent 2: Documentation Agent

### Responsibilities

Generate repository documentation.

### Output

* Architecture overview
* Component documentation
* Module descriptions
* Dependency diagrams
* Data flow descriptions

### Important Rule

Never generate documentation directly from raw files.

Always use indexed symbols and repository graph data.

---

## Agent 3: Repository Q&A Agent

### Responsibilities

Answer questions about code.

Examples:

* Explain GraphRunner
* How does retry work?
* Where is reducer used?
* What happens when training starts?

### Workflow

Question

↓

Retrieve Symbols

↓

Expand Dependencies

↓

Build Context

↓

LLM Reasoning

↓

Answer

### Goal

Context should consist of relevant symbols.

Not arbitrary chunks.

---

# Roadmap

## Agent 4: Git Agent

✅ **Implemented** (June 2026)

Purpose:

Track repository changes and enable incremental indexing.

Capabilities:

* Detect changes since last indexing
* Categorize changes (added/modified/deleted/renamed)
* Enable efficient incremental re-indexing
* Track commit history

See [Git Agent Documentation](docs/GIT_AGENT.md) for details.

---

## Agent 5: Impact Analysis Agent

✅ **Implemented** (June 2026)

Purpose:

Analyze the impact of code changes and help developers understand what will be affected by modifications.

Capabilities:

* **Dependency analysis**: Find direct and indirect callers, callees, and affected files
* **Test discovery**: Identify tests covering symbols and suggest missing scenarios
* **Edge case detection**: Detect missing null checks, error handling, and boundary conditions
* **Breaking change detection**: Identify potential breaking changes and affected callers
* **Recommendations**: Generate actionable recommendations based on analysis

Integration:

* **Auto-routing**: Orchestrator automatically routes impact-related questions (keywords: "impact", "affect", "break", "edge case", "test coverage")
* **Explicit CLI**:
  - `maris impact analyze --symbol "SymbolName"`
  - `maris impact edge-cases --file "path/to/file.py"`
  - `maris impact tests --symbol "SymbolName"`
  - `maris impact breaking-changes --symbol "SymbolName"`
* **Implicit via ask**: `maris ask "What will be affected if I change X?"`

Example:

```bash
# Auto-routed to Impact Analysis Agent
maris ask "What will be affected if I change GitAgent?"

# Explicit impact analysis
maris impact analyze --symbol "GitAgent.detect_changes"
maris impact edge-cases --file "src/maris/agents/git_agent.py"
maris impact tests --symbol "QAAgent.answer_question"
```

See [Impact Analysis Agent Documentation](docs/IMPACT_ANALYSIS_AGENT.md) for details.
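
The keyword-based auto-routing described above could look roughly like the following. This is a minimal sketch, not the orchestrator's real code: the function name `route` and the agent labels `impact_analysis` and `qa` are assumptions, while the keyword list is taken from this document.

```python
# Keywords that send a question to the Impact Analysis Agent.
IMPACT_KEYWORDS = ("impact", "affect", "break", "edge case", "test coverage")


def route(question: str) -> str:
    """Return the (hypothetical) name of the agent that should handle `question`."""
    text = question.lower()
    if any(keyword in text for keyword in IMPACT_KEYWORDS):
        return "impact_analysis"
    return "qa"
```

Plain substring matching is deliberately simple and deterministic, but it also matches unrelated words such as "breakpoint". A stricter router would match on word boundaries.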

---

## Agent 6: Git Archaeology Agent

Purpose:

Understand historical code evolution.

Questions:

* When was this bug introduced?
* Who changed this logic?
* Why was this method added?

Data Sources:

* git log
* git blame
* commit metadata

Capabilities:

* commit timeline generation
* code evolution summaries
* regression identification

---

## Agent 7: Test Suggestion Agent

Purpose:

Suggest tests based on modifications.

Inputs:

* changed symbols
* dependency graph
* historical bugs

Outputs:

* missing tests
* edge cases
* regression scenarios

---

## Agent 8: Architecture Evolution Agent

Purpose:

Track architecture changes over time.

Capabilities:

* detect coupling growth
* detect module boundaries
* identify hotspots
* detect architectural drift

---

# Retrieval Strategy

## Do Not

Generic chunking:

```text
1000 token chunks
```

Fixed-size chunks cut across function and class boundaries, so the code's structure is lost.

---

## Preferred

AST-based symbol chunking.

Example:

```text
Package

  ├── Class

        ├── Method

        ├── Method

        └── Method
```

Each symbol becomes a retrievable unit.
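
A minimal sketch of symbol-level chunking for Python sources follows. It again uses the standard-library `ast` module in place of Tree-sitter, and `symbol_chunks` is a hypothetical helper. Each chunk is the exact source text of one function or method, keyed by its qualified name.

```python
import ast


def symbol_chunks(source: str) -> dict[str, str]:
    """Map each qualified function or method name to its exact source text."""
    chunks: dict[str, str] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                visit(child, f"{prefix}{child.name}.")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # The segment spans the whole definition, never a partial body.
                chunks[f"{prefix}{child.name}"] = ast.get_source_segment(source, child)

    visit(ast.parse(source), "")
    return chunks


SOURCE = (
    "class Reducer:\n"
    "    def reduce(self, items):\n"
    "        return sum(items)\n"
    "\n"
    "def main():\n"
    "    return Reducer().reduce([1, 2])\n"
)

chunks = symbol_chunks(SOURCE)
```

Unlike fixed-size chunks, every entry here starts at a `def` and ends at the end of that definition, so a retrieved unit is always a complete symbol.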

---

## Retrieval Pipeline

Question

↓

Vector Search

↓

Symbol Expansion

↓

Dependency Expansion

↓

Context Assembly

↓

Reasoning

This combines semantic search with graph traversal.
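
The pipeline can be sketched end to end with toy data. Everything below is an assumption made for illustration: the two-dimensional vectors stand in for real embeddings, the `retrieve` function and its parameters are hypothetical, symbol expansion is trivial because embeddings are keyed by symbol name, and the final reasoning step (the LLM call) is left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, embeddings, calls, sources, top_k=1, max_symbols=3):
    # 1. Vector search: rank symbols by similarity to the query embedding.
    ranked = sorted(embeddings, key=lambda s: cosine(query_vec, embeddings[s]), reverse=True)
    selected = ranked[:top_k]
    # 2. Dependency expansion: pull in the direct callees of each hit (one hop).
    for symbol in list(selected):
        for callee in calls.get(symbol, []):
            if callee not in selected:
                selected.append(callee)
    # 3. Context assembly: concatenate whole symbols, never partial chunks.
    selected = selected[:max_symbols]
    return selected, "\n\n".join(sources[s] for s in selected)


# Toy data: three symbols with made-up embeddings and source text.
embeddings = {"retry": [1.0, 0.0], "attempt": [0.6, 0.8], "log": [0.0, 1.0]}
calls = {"retry": ["attempt"], "attempt": ["log"]}
sources = {
    "retry": "def retry(): ...",
    "attempt": "def attempt(): ...",
    "log": "def log(): ...",
}

symbols, context = retrieve([0.9, 0.1], embeddings, calls, sources)
```

Here the vector search only selects the entry point (`retry`); the call graph then decides what else belongs in the context (`attempt`), which keeps embeddings in a supporting role.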

---

# Non-Goals

The system is NOT intended to:

* Generate PRs
* Automatically modify code
* Replace developers
* Act autonomously
* Execute arbitrary repository changes

The system is designed to help developers understand software.

---

# Success Criteria

MVP is successful when:

1. ✅ Repository indexing works incrementally (Git Agent)
2. ✅ Symbols can be queried accurately
3. ✅ Documentation can be generated automatically
4. ✅ Q&A answers are grounded in repository knowledge
5. ✅ Entire workflow runs locally
6. ✅ No external API dependencies are required

**MVP Complete!** All success criteria have been met.

---

# Long-Term Goal

Become a local repository intelligence platform capable of understanding large codebases as well as experienced maintainers, while remaining privacy-first, language-aware, and fully developer-controlled.

