Metadata-Version: 2.4
Name: hyperextract
Version: 0.2.0
Summary: An intelligent, LLM-powered knowledge extraction and evolution framework with semantic search capabilities
Project-URL: Homepage, https://github.com/yifanfeng97/hyper-extract
Project-URL: Repository, https://github.com/yifanfeng97/hyper-extract
Project-URL: Issues, https://github.com/yifanfeng97/hyper-extract/issues
Author-email: Yifan Feng <evanfeng97@gmail.com>
License: Apache-2.0
License-File: LICENSE
Keywords: faiss,knowledge-extraction,langchain,llm,nlp,pydantic,rag,semantic-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: faiss-cpu>=1.13.2
Requires-Dist: langchain-community>=0.4.1
Requires-Dist: langchain-openai>=1.1.7
Requires-Dist: langchain>=1.2.6
Requires-Dist: ontomem>=0.2.3
Requires-Dist: ontosight>=0.1.8
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: rich>=13.7.0
Requires-Dist: semhash>=0.4.1
Requires-Dist: structlog>=25.5.0
Requires-Dist: tomli-w>=1.0.0
Requires-Dist: typer>=0.13.0
Provides-Extra: all
Requires-Dist: langchain-anthropic>=0.3.0; extra == 'all'
Requires-Dist: langchain-google-genai>=2.1.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: langchain-anthropic>=0.3.0; extra == 'anthropic'
Provides-Extra: google
Requires-Dist: langchain-google-genai>=2.1.0; extra == 'google'
Description-Content-Type: text/markdown

<div align="center">

<a href="https://yifanfeng97.github.io/Hyper-Extract/latest/">
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="docs/assets/logo/logo-horizontal-dark.svg">
  <source media="(prefers-color-scheme: light)" srcset="docs/assets/logo/logo-horizontal.svg">
  <img alt="Hyper-Extract Logo" src="docs/assets/logo/logo-horizontal.svg" width="600">
</picture>
</a>

<br/>
<br/>

**Smart Knowledge Extraction CLI**

**Transform documents into structured knowledge with one command.**

[📖 English Version](./README.md) · [中文版](./README_ZH.md)

[![PyPI Version](https://img.shields.io/pypi/v/hyperextract)](https://pypi.org/project/hyperextract/)
[![Python Version](https://img.shields.io/badge/python-3.11%2B-blue)](https://python.org)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)
[![Status](https://img.shields.io/badge/status-active-success)]()
[![Docs](https://img.shields.io/badge/docs-online-blue)](https://yifanfeng97.github.io/Hyper-Extract/latest/)

<br/>

> **"Stop reading. Start understanding."**  
> *"告别文档焦虑，让信息一目了然"*

<br/>

<img src="docs/assets/hero.jpg" alt="Hero & Workflow" width="800" style="max-width: 100%;">

<br/>
</div>

Hyper-Extract is an LLM-powered knowledge extraction and evolution framework. It transforms unstructured text into persistent, predictable, and strongly typed **Knowledge Abstracts**. Supported output structures range from simple **Collections** (lists and sets) and **Pydantic Models** to **Knowledge Graphs**, **Hypergraphs**, and **Spatio-Temporal Graphs**.

## ✨ Core Features

- 🔷 **8 Auto-Types:** From basic `AutoModel`/`AutoList` to advanced `AutoGraph`, `AutoHypergraph`, and `AutoSpatioTemporalGraph`.
- 🧠 **10+ Extraction Engines:** Built-in support for extraction methods such as `GraphRAG`, `LightRAG`, `Hyper-RAG`, and `KG-Gen`.
- 📝 **Declarative YAML Templates:** Zero-code extraction definition. Includes 80+ presets across 6 domains.
- 🔄 **Incremental Evolution:** Feed in new documents at any time to expand the knowledge already extracted.

***

## ⚡ Quick Start

### 1. Installation

**For CLI Users** (installs the `he` command globally):

```bash
uv tool install hyperextract
```

**For Python Developers** (use as a library):

```bash
uv pip install hyperextract
```

### 2. The Command Line Way

Extract, search, and manage knowledge directly from the CLI.

> By default, the CLI uses `gpt-4o-mini` and `text-embedding-3-small`.

```bash
# Configure OpenAI API Key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query the knowledge abstract
he search ./output/ "What are Tesla's major achievements?"

# Visualize the knowledge graph
he show ./output/

# Incrementally supplement knowledge
he feed ./output/ examples/en/tesla_question.md

# Show the updated knowledge graph
he show ./output/
```
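
To run the same `he parse` step over several files, the command can be assembled in Python and handed to `subprocess`. This is a minimal sketch, assuming `he` is on your `PATH`. The helper names `build_parse_command` and `parse_all`, and the one-output-directory-per-file layout, are illustrative choices rather than part of Hyper-Extract. Only the flags shown above (`-t`, `-o`, `-l`) are used.

```python
import subprocess
from pathlib import Path


def build_parse_command(doc, template, out_root, lang="en"):
    """Build the argv list for one `he parse` call (hypothetical helper).

    Each document gets its own output directory named after the file stem.
    That layout is an assumption of this sketch, not a Hyper-Extract rule.
    """
    doc = Path(doc)
    out_dir = Path(out_root) / doc.stem
    return ["he", "parse", str(doc), "-t", template, "-o", str(out_dir), "-l", lang]


def parse_all(docs, template, out_root, lang="en", dry_run=False):
    """Run `he parse` once per document.

    With dry_run=True the commands are only built and returned, not executed.
    """
    commands = [build_parse_command(d, template, out_root, lang) for d in docs]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the CLI exits non-zero
            subprocess.run(cmd, check=True)
    return commands
```

Calling `parse_all(paths, "general/biography_graph", "./output", dry_run=True)` returns the commands without running them, which is a cheap way to check paths before spending API calls.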

<details>
<summary><b>🐍 The Python API Way</b> (click to expand)</summary>
<br>

### Installation

```bash
# Clone the repository
git clone https://github.com/yifanfeng97/hyper-extract.git
cd hyper-extract

# Install dependencies
uv sync
```

### Configuration

```bash
# Copy the example env file
cp .env.example .env

# Edit .env with your API key and base URL
# OPENAI_API_KEY=your-api-key
# OPENAI_BASE_URL=https://api.openai.com/v1
```

### Usage

```python
from dotenv import load_dotenv

# Load environment variables (such as OPENAI_API_KEY) from the .env file
load_dotenv()

from hyperextract import Template

# Create a template
ka = Template.create("general/biography_graph")

# Parse a document
with open("examples/en/tesla.md", "r", encoding="utf-8") as f:
    text = f.read()
result = ka.parse(text)

# Visualize the knowledge graph
ka.show(result)

# Incrementally supplement knowledge
with open("examples/en/tesla_question.md", "r", encoding="utf-8") as f:
    new_text = f.read()
ka.feed(result, new_text)

# Show the updated knowledge graph
ka.show(result)
```
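
The parse-then-feed pattern above extends to a list of documents. The sketch below relies only on what the snippet shows: the template object exposes `parse(text)` and `feed(result, text)`, and `feed` appears to update `result` in place. `build_incrementally` is a hypothetical helper, not part of the library.

```python
from pathlib import Path


def build_incrementally(ka, paths):
    """Parse the first document, then feed the rest into the same result.

    `ka` is any object with `parse(text)` and `feed(result, text)` methods,
    for example the object returned by `Template.create(...)` above.
    Assumes `feed` mutates `result` in place, as that snippet implies.
    """
    paths = [Path(p) for p in paths]
    if not paths:
        raise ValueError("at least one document path is required")

    result = ka.parse(paths[0].read_text(encoding="utf-8"))
    for path in paths[1:]:
        ka.feed(result, path.read_text(encoding="utf-8"))
    return result
```

With the objects from the snippet above, `build_incrementally(ka, ["examples/en/tesla.md", "examples/en/tesla_question.md"])` would reproduce the same two-step flow.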

> 🔗 For complete examples, see [examples/en](./examples/en/)

</details>

<br>

**Installation Comparison:**

| Use Case | Command | Purpose |
|----------|---------|---------|
| CLI Tool | `uv tool install hyperextract` | Install `he` command globally |
| Python Library | `uv pip install hyperextract` | Use in Python code |

## 🧩 Deep Dive: The 8 Auto-Types

Hyper-Extract handles complex knowledge structures without requiring you to write boilerplate code.

<img src="docs/assets/autotypes.jpg" alt="Knowledge Structures Matrix" width="750" style="max-width: 100%;">

### Example: AutoGraph Visualization

Here is the knowledge graph visualization after `AutoGraph` extraction:

<img src="docs/assets/en_show.jpg" alt="AutoGraph Visualization" width="750" style="max-width: 100%; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.1);">

## 🛠️ Architecture Overview

Hyper-Extract follows a **three-layer architecture**:

- **Auto-Types** define the data structures for knowledge extraction. The 8 strongly typed structures (AutoModel, AutoList, AutoSet, AutoGraph, AutoHypergraph, AutoTemporalGraph, AutoSpatialGraph, AutoSpatioTemporalGraph) serve as the output format for all extractions.

- **Methods** provide extraction algorithms built on Auto-Types. These include Typical methods (KG-Gen, iText2KG, iText2KG*) and RAG-based methods (GraphRAG, LightRAG, Hyper-RAG, HypergraphRAG, Cog-RAG).

- **Templates** offer domain-specific configurations with ready-to-use prompts and data structures. The 80+ preset templates cover 6 domains (Finance, Legal, Medical, TCM, Industry, General), so users can extract knowledge without dealing with Auto-Types or Methods directly.

Use Hyper-Extract via the **CLI** (`he parse`, `he search`, `he show`, ...) or the **Python API** (`Template.create()`).
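
To make the difference between the graph-shaped Auto-Types concrete, here is a plain-Python sketch using `dataclasses`. The classes are illustrative stand-ins, not Hyper-Extract's actual `AutoGraph`, `AutoHypergraph`, or `AutoTemporalGraph` definitions, and the entity names are toy placeholders. The point is only the shape: a graph edge links exactly two entities, a hyperedge links any number, and a temporal edge carries an extra time field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Binary relation: exactly one source and one target."""
    source: str
    target: str
    type: str


@dataclass(frozen=True)
class HyperEdge:
    """N-ary relation: one edge connects any number of entities."""
    members: frozenset
    type: str


@dataclass(frozen=True)
class TemporalEdge(Edge):
    """Binary relation annotated with when it holds (a free-form string here)."""
    time: str = ""


def arity(edge):
    """Return the number of entities an edge connects."""
    if isinstance(edge, HyperEdge):
        return len(edge.members)
    return 2


pair = Edge("alice", "acme", "works_at")
group = HyperEdge(frozenset({"alice", "bob", "carol"}), "co_authored")
print(arity(pair), arity(group))  # 2 3
```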

<img src="docs/assets/arch.jpg" alt="Architecture" width="750" style="max-width: 100%;">

### 📚 Related Documentation

- **Preset Templates**: Browse [80+ ready-to-use templates](./hyperextract/templates/presets/) across 6 domains
- **Design Guide**: Learn how to [create custom templates](./hyperextract/templates/DESIGN_GUIDE.md)

<details>
<summary><b>📋 Template Structure Example (Graph Type)</b></summary>

Here's a complete YAML template example for **Graph** type extraction (entity-relationship extraction):

```yaml
language: en

name: Knowledge Graph
type: graph
tags: [general]

description: 'Extract entities and their relationships to construct a knowledge graph.'

output:
  entities:
    fields:
    - name: name
      type: str
      description: 'Entity name'
    - name: type
      type: str
      description: 'Entity type: e.g., person, organization, event'
    - name: description
      type: str
      description: 'Entity description'
  relations:
    fields:
    - name: source
      type: str
      description: 'Source entity'
    - name: target
      type: str
      description: 'Target entity'
    - name: type
      type: str
      description: 'Relation type: e.g., invention, collaboration, competition'
    - name: description
      type: str
      description: 'Relation description'

guideline:
  target: 'Extract entities and their relationships from the text.'
  rules_for_entities:
    - 'Extract meaningful entities'
    - 'Maintain consistent naming'
  rules_for_relations:
    - 'Create relations only when explicitly expressed in the text'

identifiers:
  entity_id: name
  relation_id: '{source}|{type}|{target}'
  relation_members:
    source: source
    target: target

display:
  entity_label: '{name} ({type})'
  relation_label: '{type}'
```
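
One plausible reading of the `identifiers` and `display` sections is that each pattern is a format string filled from a record's own fields, and that records sharing an identifier are treated as the same item. The sketch below illustrates that reading with `str.format`. It is an assumption about the semantics, not Hyper-Extract's implementation. `relation_key` and `merge_relations` are hypothetical helpers, and the records are placeholders.

```python
RELATION_ID = "{source}|{type}|{target}"  # mirrors identifiers.relation_id
ENTITY_LABEL = "{name} ({type})"          # mirrors display.entity_label


def relation_key(relation):
    """Fill the relation_id pattern from the relation's fields.

    Extra fields such as `description` are ignored by str.format.
    """
    return RELATION_ID.format(**relation)


def merge_relations(existing, incoming):
    """Keep one relation per key; later records overwrite earlier ones.

    Overwriting is a choice made for this sketch; a real merge strategy
    may combine descriptions instead.
    """
    merged = {relation_key(r): r for r in existing}
    for r in incoming:
        merged[relation_key(r)] = r
    return list(merged.values())


old = [{"source": "A", "type": "collaboration", "target": "B", "description": "first mention"}]
new = [
    {"source": "A", "type": "collaboration", "target": "B", "description": "more detail"},
    {"source": "A", "type": "competition", "target": "C", "description": "rivalry"},
]
merged = merge_relations(old, new)
print(len(merged))                                   # 2
print(ENTITY_LABEL.format(name="A", type="person"))  # A (person)
```

Under this reading, a composite key such as `{source}|{type}|{target}` lets two entities hold several relations of different types while a repeated mention collapses into one record.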

</details>

## 📈 Comparison with Other Libraries

| Feature          | GraphRAG | LightRAG | KG-Gen | ATOM | **Hyper-Extract** |
| ---------------- | :------: | :------: | :----: | :--: | :---------------: |
| Knowledge Graph  |     ✅    |     ✅    |    ✅   |   ✅  |         ✅         |
| Temporal Graph   |     ✅    |     ❌    |    ❌   |   ✅  |         ✅         |
| Spatial Graph    |     ❌    |     ❌    |    ❌   |   ❌  |         ✅         |
| Hypergraph       |     ❌    |     ❌    |    ❌   |   ❌  |         ✅         |
| Domain Templates |     ❌    |     ❌    |    ❌   |   ❌  |         ✅         |
| CLI Tool         |     ✅    |     ❌    |    ❌   |   ❌  |         ✅         |
| Multi-language   |     ✅    |     ❌    |    ❌   |   ❌  |         ✅         |

## 🤖 Model Compatibility

Hyper-Extract relies on the model's structured output capability (`json_schema` or Function Calling).

**Verified compatible**: OpenAI GPT series, Bailian qwen-plus / qwen-turbo, local vLLM (Qwen3.5-9B GPTQ-Marlin), and more.
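
To show what the `json_schema` capability looks like in practice, the sketch below turns a template-style field list into a JSON Schema and wraps it in the `response_format` shape accepted by OpenAI-compatible chat APIs. It is illustrative only: Hyper-Extract builds its requests internally, the `str`-to-`string` type mapping is an assumption, and `fields_to_schema` and `response_format` are hypothetical helper names.

```python
import json

# Assumed mapping from template type names to JSON Schema types.
_TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def fields_to_schema(fields):
    """Turn a list of {name, type, description} dicts into a JSON Schema object."""
    properties = {
        f["name"]: {"type": _TYPE_MAP[f["type"]], "description": f["description"]}
        for f in fields
    }
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }


def response_format(name, fields):
    """Wrap the schema in the `json_schema` response_format shape."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": fields_to_schema(fields)},
    }


entity_fields = [
    {"name": "name", "type": "str", "description": "Entity name"},
    {"name": "type", "type": "str", "description": "Entity type"},
]
print(json.dumps(response_format("entity", entity_fields), indent=2))
```

A model or serving stack that rejects a payload of this shape, and also lacks function calling, is unlikely to work with schema-driven extraction.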

> For the full model compatibility list, see [Provider System & Local Model Support](https://yifanfeng97.github.io/Hyper-Extract/latest/concepts/provider-system/).

## 📚 Related Documentation

- [Documentation](https://yifanfeng97.github.io/Hyper-Extract/latest/) - Complete documentation site
- [CLI Guide](https://yifanfeng97.github.io/Hyper-Extract/latest/cli/) - Command-line interface
- [Template Gallery](./hyperextract/templates/presets/) - Available templates
- [Example Code](./examples/) - Working examples

## 🤝 Contributing & License

Contributions are welcome! Please submit Issues and PRs.
Licensed under **Apache-2.0**.

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=yifanfeng97/Hyper-Extract&type=Date)](https://star-history.com/#yifanfeng97/Hyper-Extract&Date)
