Metadata-Version: 2.4
Name: ai-prishtina-agentic-kag
Version: 0.1.0
Summary: Professional Knowledge-Augmented Generation (KAG) library for production services.
Author: ai-prishtina-agentic-kag contributors
License: MIT
Project-URL: Homepage, https://github.com/albanmaxhuni/ai-prishtina-agentic-kag
Project-URL: Repository, https://github.com/albanmaxhuni/ai-prishtina-agentic-kag
Project-URL: Issues, https://github.com/albanmaxhuni/ai-prishtina-agentic-kag/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0
Requires-Dist: networkx>=3.0
Requires-Dist: numpy>=1.24
Requires-Dist: openai>=1.0
Requires-Dist: anthropic>=0.10
Requires-Dist: tqdm>=4.65
Requires-Dist: python-dotenv>=1.0
Requires-Dist: typing-extensions>=4.5
Requires-Dist: faiss-cpu>=1.7.4
Requires-Dist: tiktoken>=0.5.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: isort>=5.12; extra == "dev"
Requires-Dist: mypy>=1.3; extra == "dev"
Requires-Dist: ruff>=0.0.270; extra == "dev"
Dynamic: license-file
Dynamic: requires-python

# AI Prishtina · Agentic KAG

<div align="center">

<img src="https://cdn.buymeacoffee.com/uploads/cover_images/2025/07/Z2zeOfmCLnzhA78njCsfaRNA0ebw0dlXo53HmGhq.jpg@2560w_0e.webp" alt="AI-Prishtina Logo" width="100%">

<p align="center">
  <a href="https://pypi.org/project/ai-prishtina-agentic-kag/"><img src="https://img.shields.io/pypi/v/ai-prishtina-agentic-kag.svg" alt="PyPI version"></a>
  <a href="https://pepy.tech/project/ai-prishtina-agentic-kag"><img src="https://static.pepy.tech/badge/ai-prishtina-agentic-kag" alt="Total Downloads"></a>
  <a href="https://pepy.tech/project/ai-prishtina-agentic-kag/month"><img src="https://static.pepy.tech/badge/ai-prishtina-agentic-kag/month" alt="Monthly Downloads"></a>
  <a href="https://pepy.tech/project/ai-prishtina-agentic-kag/week"><img src="https://static.pepy.tech/badge/ai-prishtina-agentic-kag/week" alt="Weekly Downloads"></a>
  <a href="https://pypi.org/project/ai-prishtina-agentic-kag/"><img src="https://img.shields.io/pypi/dm/ai-prishtina-agentic-kag.svg" alt="PyPI Downloads"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg" alt="Python 3.10+"></a>
  <a href="https://albanmaxhuni.com"><img src="https://img.shields.io/badge/License-Commercial-green.svg" alt="License: Commercial"></a>
  <a href="https://github.com/albanmaxhuni/ai-prishtina-agentic-kag/actions"><img src="https://img.shields.io/github/actions/workflow/status/albanmaxhuni/ai-prishtina-agentic-kag/ci.yml?label=tests" alt="Tests"></a>
  <a href="./htmlcov/index.html"><img src="https://img.shields.io/badge/coverage-96%25-brightgreen.svg" alt="Coverage"></a>
</p>

**Your documents had one job. Now they have a graph.**

A Python library for **Knowledge-Augmented Generation (KAG)** that turns your documents into a self-improving, graph-native knowledge system. Ships with NetworkX-powered reasoning, multi-strategy retrieval, and integrations for OpenAI, Anthropic, and other LLM providers. Current release: **v0.1.0**.

</div>

## Table of contents

- [Why this library?](#why-this-library)
- [Architecture](#architecture)
- [Key features](#key-features)
  - [Graph-Native Memory](#the-brain--graph-native-memory)
  - [Multi-Strategy Retrieval](#the-secret-sauce--multi-strategy-retrieval)
  - [Agentic Reasoning](#the-logic--agentic-reasoning)
  - [Production Infrastructure](#the-suit--production-grade-infrastructure)
- [Extensive Provider Ecosystem](#extensive-provider-ecosystem)
- [Package structure](#package-structure)
- [Quick start](#quick-start)
  - [Installation](#installation)
  - [Environment setup](#environment-setup)
- [The full cookbook](#the-full-cookbook)
  - [Recipe 1 — Hello, KAG world](#recipe-1--hello-kag-world)
  - [Recipe 2 — Graph reasoning](#recipe-2--graph-reasoning)
  - [Recipe 3 — Multi-strategy retrieval](#recipe-3--multi-strategy-retrieval)
  - [Recipe 4 — Production observability](#recipe-4--production-observability)
  - [Recipe 5 — Advanced ingestion pipeline](#recipe-5--advanced-ingestion-pipeline)
  - [Recipe 6 — High-precision retrieval (Reranking)](#recipe-6--high-precision-retrieval-reranking)
  - [Recipe 7 — Security and Analytics](#recipe-7--security-and-analytics)
  - [Recipe 8 — Async & Streaming (Web Services)](#recipe-8--async--streaming-web-services)
  - [Recipe 9 — Custom Provider Extension](#recipe-9--custom-provider-extension)
  - [Recipe 10 — High-Level Orchestrator (AgenticKAG)](#recipe-10--high-level-orchestrator-agentickag)
  - [Recipe 11 — Novel Features: Self-Correction & Semantic Discovery](#recipe-11--novel-features-self-correction--semantic-discovery)
- [Deployment Strategies](#deployment-strategies-for-the-professional-environment)
- [Contribution Guidelines](#the-collective-effort-contribution-guidelines)
- [FAQ](#when-things-dont-go-as-planned-faq)
- [Testing](#testing)
- [License](#license)
- [Contact and links](#contact-and-links)



---

## Why this library?

While standard RAG (Retrieval-Augmented Generation) typically relies on flat vector similarity, Agentic KAG treats knowledge as an interconnected graph. By combining semantic search with graph-aware reasoning (such as PageRank and community detection), the library produces highly contextual answers that can be traced back to the graph nodes they came from, which is what enterprise environments demand.

```python
# The TL;DR version: from "I have documents" to "I have actual answers"
import asyncio
from agentic_kag import KAGPipeline, KnowledgeGraph, OpenAIGenerator

graph = KnowledgeGraph()  # populate it with nodes and edges first (see Recipe 1)
pipeline = KAGPipeline(graph=graph, generator=OpenAIGenerator())
response = asyncio.run(pipeline.run_async("Summarize Q4 revenue trends"))
```

---

## Architecture

Agentic KAG follows a modular, four-layer architecture that moves data from ingestion through retrieval and graph reasoning to generation.

```mermaid
graph TD
    subgraph Ingestion ["1. Ingestion Layer"]
        Docs[Raw Documents] --> Loader[Document Loader]
        Loader --> Splitter[Recursive Splitter]
        Splitter --> Extractor[Metadata Extractor]
        Extractor --> Graph[(Knowledge Graph)]
        Extractor --> VStore[(Vector Store)]
    end

    subgraph Retrieval ["2. Retrieval Layer"]
        Query[User Query] --> MQ[Multi-Query / HyDE]
        MQ --> Hybrid[Hybrid Retriever]
        VStore -.-> Hybrid
        Graph -.-> Hybrid
        Hybrid --> Rerank[Cross-Encoder Reranker]
    end

    subgraph Reasoning ["3. Reasoning Layer"]
        Rerank --> Seeds[Seed Nodes]
        Seeds --> Reasoner{Agentic Reasoner}
        Reasoner --> ToT[Tree of Thoughts]
        Reasoner --> Refl[Reflection]
        Reasoner --> Path[Path Discovery]
        ToT & Refl & Path --> Context[Context Constructor]
    end

    subgraph Generation ["4. Generation Layer"]
        Context --> LLM[LLM Generator]
        Query --> LLM
        LLM --> Answer[Final Answer]
    end

    style Ingestion fill:#f9f,stroke:#333,stroke-width:2px
    style Retrieval fill:#bbf,stroke:#333,stroke-width:2px
    style Reasoning fill:#dfd,stroke:#333,stroke-width:2px
    style Generation fill:#fdd,stroke:#333,stroke-width:2px
```

---

## Key features

### The Brain — Graph-Native Memory

We don't just store text chunks; we store `KnowledgeNodes` and `KnowledgeEdges`. This allows the LLM to traverse relationships, not just match keywords. It's like giving your AI a long-term memory that actually makes sense.

| Component | What it does | Module |
|-----------|-------------|--------|
| **KnowledgeGraph** | NetworkX-backed directed graph with JSON persistence and analytics. | `graph/` |
| **KnowledgeNode** | Pydantic model for nodes with content, metadata, and embeddings. | `core/` |
| **KnowledgeEdge** | Pydantic model for directed, weighted relations between nodes. | `core/` |

### The Secret Sauce — Multi-Strategy Retrieval

Seamlessly fuse Keyword, Vector, and Hybrid retrieval strategies to eliminate the blind spots of pure vector search. Now with advanced reranking and query expansion.

| Feature | What it does (in plain English) | Module |
|---------|------------------------------|--------|
| **HybridRetriever** | Optimized combination of keyword and vector strategies using weighted scoring. | `retrieval/` |
| **CrossEncoderReranker** | High-precision re-scoring of candidates using Cross-Encoder models. | `retrieval/` |
| **MultiQueryRetriever** | Improves recall by generating and executing multiple query variations. | `retrieval/` |
| **HyDERetriever** | Uses hypothetical document embeddings to bridge semantic gaps. | `retrieval/` |
| **KeywordRetriever** | Fast term-overlap scoring using Jaccard similarity. | `retrieval/` |
| **VectorRetriever** | Semantic search using cosine similarity and embeddings. | `retrieval/` |
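
The two base scores named in the last two rows can be sketched in plain Python. This is a minimal illustration that assumes naive whitespace tokenization and embeddings passed as plain lists of floats; it is not the library's internal implementation.

```python
import math


def jaccard_score(query: str, text: str) -> float:
    """Term-overlap score |Q & D| / |Q | D| over lowercase whitespace-separated words."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    if not q_terms or not d_terms:
        return 0.0
    return len(q_terms & d_terms) / len(q_terms | d_terms)


def cosine_score(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `jaccard_score("capital of kosovo", "Prishtina is the capital of Kosovo")` shares 3 of 6 distinct terms and scores 0.5. Keyword matching rewards literal overlap, and cosine similarity rewards closeness in embedding space, which is why the hybrid strategy combines them.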

### The Logic — Agentic Reasoning

Before generating an answer, the pipeline "reasons" over the graph. It expands the context to include highly relevant, interconnected nodes that a simple vector search would miss.

| Component | What it does | Module |
|-----------|-------------|--------|
| **TreeOfThoughtsReasoner** | Systematic tree-based exploration and pruning of graph paths. | `reasoning/` |
| **ReflectionReasoner** | High-precision LLM-guided graph exploration and filtering. | `reasoning/` |
| **PathReasoner** | Narrative path discovery connecting disparate knowledge seeds. | `reasoning/` |
| **PageRankReasoner** | Importance-based candidate ranking using Personalized PageRank. | `reasoning/` |
| **CommunityReasoner** | Leverages graph clusters for thematic context discovery. | `reasoning/` |
| **RuleBasedReasoner** | Bounded neighbor traversal for strict graph expansion. | `reasoning/` |
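
The bounded traversal attributed to `RuleBasedReasoner` is essentially a breadth-first search with a hop limit. Here is a minimal sketch over a plain adjacency dict, which stands in for the NetworkX graph. The function name and the `max_hops` parameter are illustrative assumptions, not the class's actual code.

```python
from collections import deque


def expand_neighbors(
    adjacency: dict[str, list[str]], seeds: list[str], max_hops: int = 1
) -> set[str]:
    """Return every node reachable from the seeds within max_hops directed edges."""
    visited = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget spent: do not expand this node further
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited
```

Because the hop limit caps the search depth, the expanded context stays small and predictable even on densely connected graphs.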

### The Suit — Production-Grade Infrastructure

Enterprise features that keep your KAG system running in production without embarrassing itself.

| Feature | What it does (in plain English) | Module |
|---------|------------------------------|--------|
| **SelfCorrectionPipeline** | **Novel**: Automatically evaluates and retries queries with deep reasoning if initial results are weak. | `core/` |
| **SemanticEdgeDiscoverer** | **Novel**: Uses LLMs to find hidden relationships between nodes that aren't explicitly linked in text. | `ingestion/` |
| **GraphContextCompressor** | **Novel**: Intelligently reduces context size using graph topology and node centrality. | `utils/` |
| **MetadataExtractor** | Automated extraction of entities and summaries using LLMs. | `ingestion/` |
| **BatchProcessor** | High-performance parallel processing for large-scale ingestion. | `ingestion/` |
| **RecursiveCharacterTextSplitter** | Intelligent text chunking that preserves structural integrity. | `ingestion/` |
| **AccessControl** | Permission-based security (READ/WRITE/ADMIN) for graph resources. | `utils/` |
| **Observability** | Built-in `CallbackManager` for granular lifecycle tracing. | `core/` |
| **Standardized Errors** | Comprehensive exception hierarchy for robust error handling. | `utils/` |
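
One way to picture topology-aware context compression is to rank nodes by degree, a simple centrality measure, and then greedily keep the most connected ones until a budget is reached. This is a hypothetical sketch, not how `GraphContextCompressor` is implemented. It approximates tokens by word count, and `compress_by_centrality` is an illustrative name.

```python
def compress_by_centrality(
    contents: dict[str, str],
    edges: list[tuple[str, str]],
    max_tokens: int,
) -> list[str]:
    """Keep the most connected nodes whose combined word count fits max_tokens."""
    # Degree centrality proxy: count in-edges plus out-edges for each node.
    degree = {node_id: 0 for node_id in contents}
    for source, target in edges:
        if source in degree:
            degree[source] += 1
        if target in degree:
            degree[target] += 1

    # Visit nodes from most to least connected (ties broken by id for determinism).
    ranked = sorted(contents, key=lambda n: (-degree[n], n))
    kept: list[str] = []
    used = 0
    for node_id in ranked:
        cost = len(contents[node_id].split())  # word count as a rough token proxy
        if used + cost <= max_tokens:
            kept.append(node_id)
            used += cost
    return kept
```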

---

## Extensive Provider Ecosystem

We don't believe in vendor lock-in. Agentic KAG supports a wide array of local and remote providers out of the box.

*   **LLMs**: OpenAI, Anthropic, Google Gemini, Ollama (Local), llama.cpp (Local GGUF).
*   **Embeddings**: OpenAI, HuggingFace (Local), Ollama, Google, Cohere, Voyage AI.
*   **Vector Stores**: FAISS, ChromaDB, Qdrant, Milvus, Pinecone.

---

## Package structure

```text
agentic_kag/
├── base/               # Abstract base classes (Generator, Retriever, Reasoner, etc.)
├── core/               # Pipeline orchestrator, config, and domain models (types)
├── graph/              # Knowledge graph implementation and persistence
├── ingestion/          # Document loading, recursive splitting, and metadata extraction
├── providers/          # Concrete implementations for external services
│   ├── llm/            # OpenAI, Anthropic, Google, Ollama, llama.cpp
│   ├── embeddings/     # Local (HF) and Remote (OpenAI, Cohere, etc.) models
│   └── vector_stores/  # FAISS, Chroma, Qdrant, Milvus, Pinecone
├── reasoning/          # Graph expansion (ToT, Reflection, Path, PageRank, etc.)
├── retrieval/          # Search strategies (Hybrid, Reranker, Multi-Query, HyDE)
├── utils/              # Logging, Security, Context construction, and Exceptions
└── __init__.py         # Clean public API surface
```

---

## Quick start

### Installation

```bash
# The standard entry point
pip install ai-prishtina-agentic-kag

# For development: editable install with dev tools (run from a clone of the repository)
pip install -e ".[dev]"
```

### Environment setup

```bash
# Required: set the key for the provider(s) you use
export OPENAI_API_KEY=sk-your-key-here
export ANTHROPIC_API_KEY=sk-ant-...
```
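
Checking at startup that the key exists gives you a clear error, rather than an authentication failure deep inside the first request. Here is a minimal sketch. The variable names come from the block above, while the `PROVIDER_KEYS` mapping and `require_api_key` are hypothetical helpers, not part of the library.

```python
import os

# Hypothetical mapping from provider name to the environment variable it reads.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def require_api_key(provider: str) -> str:
    """Return the API key for `provider`, failing fast with a clear message if unset."""
    var_name = PROVIDER_KEYS[provider]
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting the app.")
    return key
```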

---

## The full cookbook

<details>
<summary><strong>Recipe 1 — Hello, KAG world</strong></summary>

Get a local test instance running and complete a sample workflow in under 10 minutes. No specialized degree required.

```python
import os
import asyncio
from agentic_kag import (
    KAGPipeline, KnowledgeEdge, KnowledgeGraph, KnowledgeNode,
    OpenAIGenerator, VectorRetriever, OpenAIEmbedding, FAISSVectorStore
)

async def main():
    # Read credentials from the environment instead of hardcoding them
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("Set OPENAI_API_KEY before running this example.")

    # 1. Initialize the Graph & Vector Store
    graph = KnowledgeGraph()
    embedding_model = OpenAIEmbedding(model="text-embedding-3-small")
    vector_store = FAISSVectorStore(embedding_model=embedding_model)

    # 2. Ingest Knowledge
    node_a = KnowledgeNode(node_id="a", content="Prishtina is the capital of Kosovo.")
    node_b = KnowledgeNode(node_id="b", content="Kosovo is located in the Balkans.")
    
    graph.add_node(node_a)
    graph.add_node(node_b)
    graph.add_edge(KnowledgeEdge(source_id="a", target_id="b", relation="located_in"))
    
    # Add to vector store to generate embeddings
    vector_store.add_nodes([node_a, node_b])

    # 3. Configure the Pipeline
    generator = OpenAIGenerator(model="gpt-4o")
    retriever = VectorRetriever(graph=graph, embedding_fn=embedding_model.embed_query)
    
    pipeline = KAGPipeline(graph=graph, generator=generator, retriever=retriever)

    # 4. Run the Agentic Query
    print("The system is contemplating your inquiry...")
    result = await pipeline.run_async("Where is Prishtina located?")
    
    print("\nThe Oracle Declares:", result.answer)
    print("Canonical Evidence:", result.citations)

if __name__ == "__main__":
    asyncio.run(main())
```

</details>

<details>
<summary><strong>Recipe 2 — Graph reasoning</strong></summary>

Leverage the power of NetworkX to reason over your knowledge graph. Because flat lists are so last season.

```python
from agentic_kag import (
    CommunityReasoner, KAGPipeline, OpenAIGenerator, PageRankReasoner,
    ReflectionReasoner, TreeOfThoughtsReasoner,
)

# `graph` is a populated KnowledgeGraph (see Recipe 1)

# Use PageRank to find influential nodes
reasoner = PageRankReasoner(graph)
pipeline = KAGPipeline(graph=graph, reasoner=reasoner)

# Or use Community Detection to pull in related clusters
reasoner = CommunityReasoner(graph)
pipeline = KAGPipeline(graph=graph, reasoner=reasoner)

# For high precision, use Reflection (requires a generator)
reasoner = ReflectionReasoner(graph, generator=OpenAIGenerator())
pipeline = KAGPipeline(graph=graph, reasoner=reasoner)

# For complex multi-hop queries, use Tree-of-Thoughts (also requires a generator)
reasoner = TreeOfThoughtsReasoner(graph, generator=OpenAIGenerator())
pipeline = KAGPipeline(graph=graph, reasoner=reasoner)
```
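
To build some intuition for what "influential nodes" means, Personalized PageRank can be sketched with power iteration. A random walker follows an outgoing edge with probability `damping` and otherwise jumps back to a seed node, so nodes close to the seeds score highest. This stdlib sketch assumes every edge target also appears as a key in `adjacency` and that `seeds` is non-empty. It illustrates the technique and is not `PageRankReasoner`'s code. With NetworkX, `nx.pagerank(G, personalization=...)` computes the same quantity.

```python
def personalized_pagerank(
    adjacency: dict[str, list[str]],
    seeds: set[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Power-iteration Personalized PageRank: random walks restart at the seed nodes."""
    nodes = list(adjacency)
    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            targets = adjacency[node]
            if targets:
                # Spread this node's damped mass evenly over its out-edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: send its damped mass back to the seeds.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * restart[n]
        rank = new_rank
    return rank
```

Because the total mass stays at 1 on every iteration, the scores can be read as a probability distribution over the graph. A node that cannot be reached from any seed ends with a score of 0.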

</details>

<details>
<summary><strong>Recipe 3 — Multi-strategy retrieval</strong></summary>

Combine keyword and vector search for the best of both worlds. Fusing semantic depth with literal precision.

```python
from agentic_kag import HybridRetriever, KAGPipeline, KeywordRetriever, VectorRetriever

# `graph` and `embedding_model` come from Recipe 1
keyword = KeywordRetriever(graph)
vector = VectorRetriever(graph, embedding_fn=embedding_model.embed_query)

hybrid = HybridRetriever(
    graph=graph,
    vector_retriever=vector,
    keyword_retriever=keyword,
    alpha=0.5,  # Balance between vector and keyword scores
)

pipeline = KAGPipeline(graph=graph, retriever=hybrid)
```
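
Weighted hybrid scoring can be pictured as a convex blend of the two score lists. In this sketch `alpha` weights the vector side, which is an assumption (check the `HybridRetriever` docstring for which side `alpha` applies to). Min-max normalization is one common choice for putting both scores on the same scale, not necessarily the library's. `fuse_scores` is a hypothetical helper.

```python
def fuse_scores(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Blend two score maps as alpha * vector + (1 - alpha) * keyword, best first."""

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Min-max scale to [0, 1] so the two strategies are comparable.
        if not scores:
            return {}
        low, high = min(scores.values()), max(scores.values())
        span = high - low
        return {k: (v - low) / span if span else 1.0 for k, v in scores.items()}

    vec, kw = normalize(vector_scores), normalize(keyword_scores)
    node_ids = set(vec) | set(kw)
    # A node missing from one strategy simply contributes 0 from that side.
    fused = {n: alpha * vec.get(n, 0.0) + (1 - alpha) * kw.get(n, 0.0) for n in node_ids}
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Setting `alpha` to 1.0 reduces this to pure vector ranking, and setting it to 0.0 reduces it to pure keyword ranking.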

</details>

<details>
<summary><strong>Recipe 4 — Production observability</strong></summary>

Never have a "black box" moment in production. Watch your pipeline's thought process in real-time.

```python
import asyncio
from agentic_kag import CallbackManager, KAGPipeline, LoggingCallbackHandler

# Add built-in logging or create your own handler
callbacks = CallbackManager([LoggingCallbackHandler()])

pipeline = KAGPipeline(
    graph=graph,  # a populated KnowledgeGraph (see Recipe 1)
    callbacks=callbacks
)

# You will now see detailed logs for retrieval, reasoning, and generation
# (in a notebook, use `await pipeline.run_async(...)` instead)
asyncio.run(pipeline.run_async("What is KAG?"))
```

</details>

<details>
<summary><strong>Recipe 5 — Advanced ingestion pipeline</strong></summary>

Automate the process of turning raw text files into structured, metadata-rich knowledge nodes.

```python
from agentic_kag import (
    BatchProcessor, DocumentLoader, KnowledgeNode,
    MetadataExtractor, OpenAIGenerator, RecursiveCharacterTextSplitter,
)

# 1. Load raw documents
loader = DocumentLoader()
raw_nodes = loader.load_directory("./my_docs", glob="*.txt")

# 2. Split into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunked_nodes = splitter.split_documents(raw_nodes)

# 3. Enrich with AI-generated metadata (Summary, Entities)
extractor = MetadataExtractor(generator=OpenAIGenerator())
processor = BatchProcessor(max_workers=4)

# Process in parallel for speed: rebuild each node with its original
# metadata merged with the extracted fields
enriched_nodes = processor.map(
    lambda node: KnowledgeNode(
        **node.model_dump(exclude={"metadata"}),
        metadata={**node.metadata, **extractor.extract(node)},
    ),
    chunked_nodes,
)

# Now add enriched_nodes to your KnowledgeGraph and VectorStore
```
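
The idea behind recursive character splitting works as follows. The splitter tries the coarsest separator first (paragraphs) and falls back to finer ones (lines, then words, then single characters) only for pieces that are still too long. It then packs the pieces into chunks with a character overlap. This stdlib sketch shows that technique. It is not `RecursiveCharacterTextSplitter`'s source, and the default separator list is an assumption. The overlap is measured in characters, so a chunk may start mid-word.

```python
def recursive_split(
    text: str,
    chunk_size: int = 800,
    chunk_overlap: int = 100,
    separators: tuple[str, ...] = ("\n\n", "\n", " ", ""),
) -> list[str]:
    """Split text into chunks of at most chunk_size characters, with overlap."""

    def pieces(segment: str, seps: tuple[str, ...]) -> list[str]:
        # Small enough already: keep it whole.
        if len(segment) <= chunk_size:
            return [segment]
        if not seps or seps[0] == "":
            # Last resort: hard cut every chunk_size characters.
            return [segment[i : i + chunk_size] for i in range(0, len(segment), chunk_size)]
        sep, rest = seps[0], seps[1:]
        parts = segment.split(sep)
        # Re-attach the separator so no characters are lost.
        parts = [p + sep for p in parts[:-1]] + [parts[-1]]
        out: list[str] = []
        for part in parts:
            out.extend(pieces(part, rest))
        return out

    chunks: list[str] = []
    current = ""
    for piece in pieces(text, separators):
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            # Seed the next chunk with the previous chunk's tail, trimmed so it still fits.
            tail_len = min(chunk_overlap, chunk_size - len(piece))
            current = current[-tail_len:] if tail_len > 0 else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

With `chunk_overlap=0`, joining the chunks reproduces the input exactly. With a positive overlap, each chunk begins with the tail of the one before it, so a sentence cut at a boundary still appears whole in at least one chunk.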

</details>

<details>
<summary><strong>Recipe 6 — High-precision retrieval (Reranking)</strong></summary>

Use a two-stage retrieval process: fast initial recall followed by expensive, high-precision reranking.

```python
from agentic_kag import (
    VectorRetriever, CrossEncoderReranker,
    KAGPipeline, OpenAIEmbedding
)

# `graph` is a populated KnowledgeGraph (see Recipe 1)

# Stage 1: fast, recall-oriented retriever
embeddings = OpenAIEmbedding()
vector_retriever = VectorRetriever(graph, embedding_fn=embeddings.embed_query)

# Stage 2: high-precision reranker (Cross-Encoder)
reranker = CrossEncoderReranker(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")

# Build the pipeline once and reuse it across queries
pipeline = KAGPipeline(graph=graph, retriever=vector_retriever)

# Custom pipeline execution with reranking
async def run_with_rerank(query):
    # 1. Initial retrieval: cast a wide net with 20 candidates
    initial_results = vector_retriever.retrieve(query, top_k=20)

    # 2. Rerank the 20 candidates and keep the best 5
    reranked_results = reranker.rerank(query, initial_results, graph)
    top_5 = reranked_results[:5]

    # 3. Proceed with standard pipeline reasoning/generation using top_5
    # ...
```
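
Here is the two-stage pattern of this recipe, independent of any model. A cheap scorer narrows the corpus to `recall_k` candidates, and an expensive scorer (a cross-encoder in the recipe) reorders only those. In this sketch both scorers are plain callables that you pass in. The function and its parameter names are hypothetical.

```python
from typing import Callable


def retrieve_then_rerank(
    query: str,
    corpus: dict[str, str],
    cheap_score: Callable[[str, str], float],
    precise_score: Callable[[str, str], float],
    recall_k: int = 20,
    final_k: int = 5,
) -> list[str]:
    """Return the ids of the final_k best documents after two scoring stages."""
    # Stage 1: score every document with the cheap scorer and keep recall_k candidates.
    by_cheap = sorted(corpus, key=lambda doc_id: cheap_score(query, corpus[doc_id]), reverse=True)
    candidates = by_cheap[:recall_k]
    # Stage 2: the expensive scorer only sees the recall_k candidates.
    by_precise = sorted(
        candidates, key=lambda doc_id: precise_score(query, corpus[doc_id]), reverse=True
    )
    return by_precise[:final_k]
```

The cost of the expensive stage is bounded by `recall_k`, not by the corpus size. That bound is what makes cross-encoder reranking affordable.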

</details>

<details>
<summary><strong>Recipe 7 — Security and Analytics</strong></summary>

Enforce access control on your knowledge assets and monitor graph health.

```python
from agentic_kag import AccessControl, Permission, KnowledgeGraph

# 1. Enforce Permissions
security = AccessControl(permissions={Permission.READ})
security.require_permission(Permission.READ)  # Passes
# security.require_permission(Permission.WRITE) # Raises KAGError

# 2. Analyze Graph Structure
graph = KnowledgeGraph()
# ... add nodes and edges ...
stats = graph.get_analytics()
print(f"Graph Density: {stats['density']:.4f}")
print(f"Average Degree: {stats['average_degree']:.2f}")
```
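
To sanity-check the numbers `get_analytics()` reports, here are the conventional definitions for a simple directed graph. NetworkX's `nx.density` uses m / (n(n-1)) for directed graphs. Whether the library's `average_degree` counts in-edges plus out-edges (2m/n, as below) or only out-edges (m/n) is an assumption you should verify. `graph_stats` is a hypothetical helper.

```python
def graph_stats(num_nodes: int, edges: list[tuple[str, str]]) -> dict[str, float]:
    """Density and average degree of a simple directed graph."""
    m = len(edges)
    n = num_nodes
    # Directed density: existing edges / possible ordered pairs, m / (n * (n - 1)).
    density = m / (n * (n - 1)) if n > 1 else 0.0
    # Each directed edge adds one out-degree and one in-degree, so total degree = 2m.
    average_degree = 2 * m / n if n else 0.0
    return {"density": density, "average_degree": average_degree}
```

For a three-node chain a → b → c, the density is 2 / 6 ≈ 0.3333 and the average degree is 4 / 3 ≈ 1.33.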

</details>

<details>
<summary><strong>Recipe 8 — Async & Streaming (Web Services)</strong></summary>

Perfect for FastAPI or other high-concurrency web frameworks.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from agentic_kag import KAGPipeline, OpenAIGenerator, KnowledgeGraph

app = FastAPI()
pipeline = KAGPipeline(graph=KnowledgeGraph(), generator=OpenAIGenerator())

@app.get("/chat")
async def chat(query: str):
    # 1. Full async execution
    # result = await pipeline.run_async(query)
    # return {"answer": result.answer}

    # 2. Real-time token streaming
    async def stream_generator():
        # Note: In a real app, you'd perform retrieval/reasoning first
        # to get the context, then stream the generator.
        context = "..."  # Pre-retrieved context
        # `_generator` is a private attribute and may change between releases
        async for chunk in pipeline._generator.astream(query, context):
            yield f"data: {chunk}\n\n"

    return StreamingResponse(stream_generator(), media_type="text/event-stream")
```
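
The `f"data: {chunk}\n\n"` framing above works for single-line chunks. In the Server-Sent Events format, however, a line break inside the payload ends the `data:` field early, and the client silently drops the remainder. Here is a small framing helper that follows the SSE line rules. `to_sse` is a hypothetical name, not part of the library.

```python
def to_sse(chunk: str) -> str:
    """Frame one text chunk as a single Server-Sent Events message."""
    # SSE treats CRLF, LF, and CR all as line breaks; normalize to LF first.
    normalized = chunk.replace("\r\n", "\n").replace("\r", "\n")
    # Every payload line gets its own "data: " field; the client rejoins them with "\n".
    body = "".join(f"data: {line}\n" for line in normalized.split("\n"))
    # A blank line terminates the event.
    return body + "\n"
```

Inside `stream_generator`, `yield to_sse(chunk)` would replace the inline f-string.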

</details>

<details>
<summary><strong>Recipe 9 — Custom Provider Extension</strong></summary>

Extend the library with your own proprietary LLM or Vector Store.

```python
from agentic_kag import KAGPipeline
from agentic_kag.base.generator import Generator

class MyCustomLLM(Generator):
    def generate(self, query: str, context: str) -> str:
        # Your custom logic here
        return f"Custom response to {query}"

    async def agenerate(self, query: str, context: str) -> str:
        return self.generate(query, context)

    # For streaming support, also implement stream() and astream() (see the FAQ)

# Use it in the pipeline (`graph` as in Recipe 1)
pipeline = KAGPipeline(graph=graph, generator=MyCustomLLM())
```

</details>

<details>
<summary><strong>Recipe 10 — High-Level Orchestrator (AgenticKAG)</strong></summary>

The easiest way to get started with a simple document-based KAG system.

```python
from agentic_kag import AgenticKAG, Document, InMemoryKnowledgeBase

# 1. Setup simple KB
kb = InMemoryKnowledgeBase()
kb.add_documents([
    Document(id="doc1", content="The project is 100% complete."),
    Document(id="doc2", content="AI Prishtina is the maintainer.")
])

# 2. Initialize Orchestrator with a simple generator function
def my_llm_call(prompt, **kwargs):
    return "The system says: " + prompt[:20]

agent = AgenticKAG(generator=my_llm_call, knowledge_base=kb)

# 3. Run
response = agent.run("What is the status?")
print(response.answer)
```

</details>

<details>
<summary><strong>Recipe 11 — Novel Features: Self-Correction & Semantic Discovery</strong></summary>

Leverage the unique capabilities of Agentic KAG to build a self-improving knowledge system.

```python
from agentic_kag import (
    SelfCorrectionPipeline, SemanticEdgeDiscoverer, 
    GraphContextCompressor, KnowledgeGraph, OpenAIGenerator
)

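# Assumes `graph` is an existing KnowledgeGraph and `new_nodes` is a list of
# freshly ingested KnowledgeNode objects.

# 1. Dynamic Semantic Edge Discovery
# Find hidden links between nodes during ingestion
discoverer = SemanticEdgeDiscoverer(generator=OpenAIGenerator())
new_edges = discoverer.discover_edges(new_nodes, graph)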
for edge in new_edges:
    graph.add_edge(edge)

# 2. Self-Correcting Pipeline
# Automatically retry with "Deep Reasoning" if the first answer is weak
pipeline = SelfCorrectionPipeline(graph=graph, generator=OpenAIGenerator())
result = pipeline.run("Explain the complex relationship between X and Y")

# 3. Graph-Aware Context Compression
# Intelligently shrink context using graph topology
compressor = GraphContextCompressor(graph)
compressed_context = compressor.compress(result.citations, max_tokens=500)
```
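
The evaluate-and-retry idea behind `SelfCorrectionPipeline` can be reduced to a loop. The loop produces an answer, scores it, and escalates the reasoning depth if the score falls below a threshold. This is a hypothetical sketch. `answer_fn`, `score_fn`, `threshold`, and the depth knob are placeholders, and the README does not document how the library actually judges an answer as "weak".

```python
from typing import Callable


def answer_with_retry(
    query: str,
    answer_fn: Callable[[str, int], str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.7,
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Generate, evaluate, and retry with a deeper reasoning budget until good enough."""
    best_answer, best_score = "", float("-inf")
    for depth in range(1, max_attempts + 1):
        answer = answer_fn(query, depth)  # depth: hypothetical "how hard to reason" knob
        score = score_fn(query, answer)
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= threshold:
            return answer, depth  # good enough: stop early
    # No attempt cleared the bar: fall back to the best answer seen.
    return best_answer, max_attempts
```

Returning the best attempt, rather than the last one, means that a deeper retry which happens to score worse never replaces a better earlier answer.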

</details>

---

## Deployment Strategies for the Professional Environment

Deploying Agentic KAG in an enterprise setting requires a level of care usually reserved for fine china.

### Environment Prerequisites
- **Compute Resources**: Minimum 2 vCPUs, 4GB RAM (FAISS runs entirely in-memory; scale your RAM according to your ambitions and document count).
- **Python Runtime**: 3.10+ in a containerized environment (Docker is strongly recommended for sanity maintenance).
- **Network Connectivity**: Outbound internet access to your intelligence providers of choice (OpenAI or Anthropic).

### Infrastructure Configuration
1. **Vector Store Persistence**: The default `FAISSVectorStore` lives a transient life in-memory. For production, extend this class to use `faiss.write_index` and `faiss.read_index` to persist your index to an S3 bucket or persistent volume.
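2. **Graph Persistence**: Persist your `KnowledgeGraph` for stateful deployments. You can store it as JSON on a persistent volume or in a relational database column (for the underlying NetworkX graph, `nx.node_link_data` produces a JSON-serializable dict), or load it into a graph database such as Neo4j.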
3. **Concurrency Management**: Utilize the `run_async` and `astream` methods behind an ASGI server (like Uvicorn/FastAPI) to handle high concurrent user loads without causing your event loop to question its existence.
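
The node-link layout is plain JSON: a list of nodes and a list of links. Here is a stdlib sketch of a round trip in that style, suitable for writing to a persistent volume or a JSON column. With NetworkX available, `nx.node_link_data` and `nx.node_link_graph` do this for a real graph. The helper names below are hypothetical, and the exact keys NetworkX emits can vary between versions.

```python
import json


def graph_to_json(nodes: dict[str, dict], edges: list[dict]) -> str:
    """Serialize a directed graph to a node-link style JSON document."""
    data = {
        "directed": True,
        "nodes": [{"id": node_id, **attrs} for node_id, attrs in nodes.items()],
        "links": edges,  # each edge: {"source": ..., "target": ..., plus attributes}
    }
    return json.dumps(data)


def graph_from_json(payload: str) -> tuple[dict[str, dict], list[dict]]:
    """Rebuild the node and edge collections from graph_to_json output."""
    data = json.loads(payload)
    nodes = {}
    for node in data["nodes"]:
        attrs = dict(node)
        node_id = attrs.pop("id")  # the id becomes the dict key again
        nodes[node_id] = attrs
    return nodes, data["links"]
```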

### Security Hardening
- **Secret Management**: NEVER hardcode API keys unless you enjoy involuntary security audits. Inject `OPENAI_API_KEY` via secure secret managers.
- **Prompt Injection Mitigation**: The `ContextConstructor` is strict, but you should be stricter. Ensure user queries are sanitized before they reach the pipeline.
- **Data Privacy**: If processing sensitive information, consider local embedding models and local LLMs by extending our abstract base classes.
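
Here is one possible pre-processing step for untrusted queries. It applies Unicode normalization, removes invisible control and format characters, collapses whitespace, and enforces a length cap. `sanitize_query` and `MAX_QUERY_CHARS` are hypothetical and not part of the library. This covers only one layer of a prompt-injection defense, because it does nothing about instructions written in plain text.

```python
import unicodedata

MAX_QUERY_CHARS = 2000  # hypothetical limit; tune for your application


def sanitize_query(query: str, max_chars: int = MAX_QUERY_CHARS) -> str:
    """Normalize a user query before it reaches the pipeline (a first line of defense only)."""
    # Fold compatibility characters (e.g. full-width letters) into their canonical form.
    text = unicodedata.normalize("NFKC", query)
    # Drop control and format characters (Unicode category C*) except ordinary whitespace.
    text = "".join(
        ch for ch in text if ch in "\n\t " or not unicodedata.category(ch).startswith("C")
    )
    # Collapse runs of whitespace, then enforce the length cap.
    text = " ".join(text.split())
    if not text:
        raise ValueError("Empty query after sanitization.")
    return text[:max_chars]
```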

---

## The Collective Effort: Contribution Guidelines

We welcome contributions from those who value code quality over code quantity.

### 1. Local Setup
```bash
git clone https://github.com/albanmaxhuni/ai-prishtina-agentic-kag.git
cd ai-prishtina-agentic-kag
# Install with the tools of the trade
pip install -e ".[dev]"
```

### 2. Code Style Mandates
- **Formatting**: We use `black` (line length 100).
- **Linting**: We use `ruff`. It's fast, and so are we.
- **Type Checking**: We use `mypy`. All new functions MUST have complete type hints. Guessing is for games, not production code.
- **Docstrings**: Follow the Google Docstring format.

### 3. Testing Mandates
- All new features must include unit tests. No tests, no merge.
- Run the test suite before submitting a PR to avoid the walk of shame:
  ```bash
  export PYTHONPATH=$PYTHONPATH:$(pwd)
  pytest tests/unit
  ```
- Coverage must not decrease. We aim for the stars, or at least 95%.

### 4. PR Submission Workflow
1. Fork the repository and create a feature branch.
2. Commit your changes with messages that would make your mother proud.
3. Ensure all tests and linters pass.
4. Open a Pull Request. Include a summary of changes and link any relevant issues.

---

## When Things Don't Go As Planned: FAQ

Here are the 10 most common issues and their actionable resolutions.

**1. `ImportError: Please install 'faiss-cpu' to use FAISSVectorStore.`**
*   **Resolution**: You are missing the FAISS dependency. Run `pip install faiss-cpu` and try to contain your excitement.

**2. `ImportError: Please install the 'openai' package...`**
*   **Resolution**: The OpenAI SDK is required for `OpenAIGenerator`. Run `pip install openai`.

**3. The pipeline returns "No supporting knowledge found for query".**
*   **Resolution**: Your retrieval `min_score` might be too optimistic, or your graph is a ghost town. Lower `min_score` to `0.0` to debug.

**4. `openai.AuthenticationError: Incorrect API key provided.`**
*   **Resolution**: Double-check your environment variables. `export OPENAI_API_KEY="sk-..."`.

**5. `RuntimeError: asyncio.run() cannot be called from a running event loop.`**
*   **Resolution**: If you are in a Jupyter Notebook, the event loop is already active. Use `await pipeline.run_async(query)` directly.

**6. PageRankReasoner is taking its time.**
*   **Resolution**: Your graph might be too interconnected. Reduce `max_hops` or switch to `RuleBasedReasoner` for a more focused search.

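**7. `pydantic.ValidationError: 1 validation error for KnowledgeNode`**
*   **Resolution**: A required field is missing or has the wrong type. Pass fields as keyword arguments (for example `KnowledgeNode(node_id="a", content="...")`) and check their types; Pydantic models do not accept positional arguments.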

**8. How do I see what the pipeline is doing under the hood?**
*   **Resolution**: Use the `LoggingCallbackHandler`. It's like an X-ray for your pipeline.

**9. My custom Generator isn't working with streaming.**
*   **Resolution**: Ensure you've implemented `stream()` and `astream()` methods. Generators should actually generate.

**10. Tests fail with `ModuleNotFoundError: No module named 'agentic_kag'`**
*   **Resolution**: Your Python path is feeling neglected. Run `export PYTHONPATH=$PYTHONPATH:$(pwd)` from the project root.

---

## Testing

85+ tests and growing. Run them all or pick your battles:

```bash
# Run everything (grab a coffee)
pytest

# Run with coverage (grab two coffees)
pytest --cov=agentic_kag --cov-report=html

# Pick your battles
pytest tests/unit/test_core.py              # Pipeline orchestrator
pytest tests/unit/test_reasoning.py         # Graph reasoning strategies
pytest tests/unit/test_retrieval.py         # Search & retrieval modules
pytest tests/unit/test_graph.py             # Knowledge graph operations
pytest tests/unit/test_ingestion.py         # Document processing
pytest tests/unit/test_providers_functional.py # Provider integrations
pytest tests/unit/test_security.py          # Access control
```

## Intellectual Property Protection

This library supports targeted code obfuscation to protect core business logic while maintaining API usability.


## Contributing

We don't bite. Contributions are welcome — from typo fixes to new vector store backends.

```bash
git clone https://github.com/albanmaxhuni/ai-prishtina-agentic-kag.git
cd ai-prishtina-agentic-kag
pip install -e ".[dev]"
pytest  # Make sure everything passes before you start
```

**Code quality:** Black + isort (formatting), Ruff (linting), mypy (type checking), pytest (tests).

## License

**Dual-licensed:**

- **AGPLv3+** — Free for open source. Copyleft applies. Network use requires source disclosure.
- **Commercial** — For proprietary use without copyleft obligations. Contact info@albanmaxhuni.com or alban.q.maxhuni@gmail.com

See the [LICENSE](LICENSE) for complete details.

---

## Contact and links

| Resource | URL |
|----------|-----|
| Documentation | [ai-prishtina-agentic-kag.readthedocs.io](https://ai-prishtina-agentic-kag.readthedocs.io) |
| Issue tracker | [GitHub Issues](https://github.com/albanmaxhuni/ai-prishtina-agentic-kag/issues) |
| Discussions | [GitHub Discussions](https://github.com/albanmaxhuni/ai-prishtina-agentic-kag/discussions) |
| Email | info@albanmaxhuni.com |

Maintained by the **AI Prishtina** project. Built on the shoulders of open-source giants and published KAG research.

**Sponsor ongoing development:**

- [coff.ee/albanmaxhuni](https://coff.ee/albanmaxhuni)
- **BTC:** `3BfwQJ2dNTWDn98H5SggNC47fNX8HeWshP`

