Metadata-Version: 2.4
Name: datahub-agent-context
Version: 1.5.0.19rc5
Summary: DataHub Agent Context - MCP Tools for AI Agents
Home-page: https://datahub.io/
License: Apache License 2.0
Project-URL: Documentation, https://datahubproject.io/docs/
Project-URL: Source, https://github.com/datahub-project/datahub
Project-URL: Changelog, https://github.com/datahub-project/datahub/releases
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Topic :: Software Development
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: google-re2<2.0,>=1.0
Requires-Dist: cachetools<7.0.0,>=5.0.0
Requires-Dist: jmespath<2.0.0,>=1.0.0
Requires-Dist: acryl-datahub[datahub-rest]==1.5.0.19rc5
Requires-Dist: httpcore<2.0,>=1.0.9
Requires-Dist: h11<1.0,>=0.16
Requires-Dist: json-repair<1.0.0,>=0.25.0
Provides-Extra: dev
Requires-Dist: types-jmespath<2.0.0,>=1.0.0; extra == "dev"
Requires-Dist: snowflake-connector-python<5.0.0,>=4.0.0; extra == "dev"
Requires-Dist: pytest<9.0.0,>=8.3.4; extra == "dev"
Requires-Dist: mypy==1.17.1; extra == "dev"
Requires-Dist: types-toml<1.0.0,>=0.10.0; extra == "dev"
Requires-Dist: types-requests<3.0.0,>=2.0.0; extra == "dev"
Requires-Dist: langchain-core<2.0.0,>=1.0.0; extra == "dev"
Requires-Dist: tox<5.0.0,>=4.0.0; extra == "dev"
Requires-Dist: ruff==0.11.7; extra == "dev"
Requires-Dist: types-PyYAML<7.0.0,>=6.0.0; extra == "dev"
Requires-Dist: langchain-mcp-adapters<1.0.0,>=0.1.0; extra == "dev"
Requires-Dist: google-adk<2.0.0,>=1.0.0; extra == "dev"
Requires-Dist: pytest-cov<7.0.0,>=2.8.0; extra == "dev"
Requires-Dist: click<9.0.0,>=8.0.0; extra == "dev"
Requires-Dist: types-cachetools<7.0.0,>=5.0.0; extra == "dev"
Requires-Dist: langchain<2.0.0,>=1.0.0; extra == "dev"
Provides-Extra: langchain
Requires-Dist: langchain-mcp-adapters<1.0.0,>=0.1.0; extra == "langchain"
Requires-Dist: langchain-core<2.0.0,>=1.0.0; extra == "langchain"
Requires-Dist: langchain<2.0.0,>=1.0.0; extra == "langchain"
Provides-Extra: google-adk
Requires-Dist: google-adk<2.0.0,>=1.0.0; extra == "google-adk"
Provides-Extra: snowflake
Requires-Dist: snowflake-connector-python<5.0.0,>=4.0.0; extra == "snowflake"
Requires-Dist: click<9.0.0,>=8.0.0; extra == "snowflake"
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# DataHub Agent Context

**DataHub Agent Context** is a collection of tools and utilities for building AI agents that interact with DataHub metadata. The package contains MCP (Model Context Protocol) tools that let AI agents search, retrieve, and modify metadata in DataHub. You can use the tools directly when building an agent, or expose them through an MCP server such as DataHub's open source MCP server.

## Features

## Installation

### Base Installation

```shell
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade datahub-agent-context
```

### With LangChain Support

For building LangChain agents with pre-built tools:

```shell
python3 -m pip install --upgrade "datahub-agent-context[langchain]"
```

## Prerequisites

This package requires:

- Python 3.9 or higher
- `acryl-datahub` package
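
A quick way to confirm both prerequisites from Python is sketched below. It uses only the standard library; the helper name `check_prerequisites` is illustrative and is not part of this package.

```python
import sys
from importlib import metadata
from typing import Optional


def check_prerequisites(min_python=(3, 9), dist_name="acryl-datahub"):
    """Report whether the interpreter and the DataHub SDK meet the requirements."""
    try:
        # Installed version string of the distribution, if present
        sdk_version: Optional[str] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        sdk_version = None  # distribution is not installed in this environment
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "sdk_version": sdk_version,
    }


print(check_prerequisites())
```

A `sdk_version` of `None` means the `acryl-datahub` distribution was not found in the active environment.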

## Quick Start

### Basic Example

These tools are designed for use by an AI agent, with responses passed directly to an LLM, so each tool returns a plain dict. They can also be called independently if desired.

```python
from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.mcp_tools.search import search
from datahub_agent_context.mcp_tools.entities import get_entities

# Initialize the DataHub client from environment configuration
client = DataHubClient.from_env()

# Search for datasets
with client as client:
    results = search(
        query="user_data",
        filters={"entity_type": ["dataset"]},
        num_results=10
    )

# Get detailed entity information
with client as client:
    entities = get_entities(
        urns=[result["entity"]["urn"] for result in results["searchResults"]]
    )
```
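
Because the tools return plain dicts, it can help to read them defensively rather than index straight into nested keys. The sketch below assumes only the response shape used in the example above (`searchResults` → `entity` → `urn`); the helper `extract_urns` and the sample data are hypothetical and not part of the package.

```python
from typing import Any, Dict, List


def extract_urns(results: Dict[str, Any]) -> List[str]:
    """Collect entity URNs from a search response, skipping malformed hits."""
    urns: List[str] = []
    for hit in results.get("searchResults") or []:
        urn = (hit.get("entity") or {}).get("urn")
        if urn:
            urns.append(urn)
    return urns


# Hand-written sample that mimics the shape used in the example above
sample = {
    "searchResults": [
        {"entity": {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,user_data,PROD)"}},
        {"entity": {}},  # a hit without a URN is skipped
    ]
}
print(extract_urns(sample))
```

The resulting list can be passed as the `urns` argument of `get_entities()`.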

### LangChain Integration

Build AI agents with pre-built LangChain tools:

```python
from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.langchain_tools import build_langchain_tools
from langchain.agents import create_agent

# Initialize DataHub client
client = DataHubClient.from_env()

# Build all tools (read-only by default)
tools = build_langchain_tools(client, include_mutations=False)

# Or include mutation tools for tagging, descriptions, etc.
tools = build_langchain_tools(client, include_mutations=True)

# Create agent (`model` is a LangChain chat model you have configured)
agent = create_agent(model, tools=tools, system_prompt="...")
```
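
If an agent should see only some of the tools, one option is to filter the list by name before creating the agent. The sketch below uses a stand-in class so it runs without LangChain installed; real LangChain tools also expose a `name` attribute. It assumes the tool names match the function names listed under Available Tools, which is not confirmed here, and `select_tools` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class FakeTool:
    """Stand-in for a LangChain tool; only the `name` attribute is used."""
    name: str


def select_tools(tools: Iterable, allowed: Set[str]) -> List:
    """Keep only the tools whose name is in the allow-list, preserving order."""
    return [tool for tool in tools if tool.name in allowed]


# Assumed tool names, for illustration only
all_tools = [FakeTool("search"), FakeTool("get_entities"), FakeTool("add_tags")]
read_only = select_tools(all_tools, {"search", "get_entities"})
print([tool.name for tool in read_only])
```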

#### DataHub Cloud Tools

If you're connected to a **DataHub Cloud** instance, you can add Cloud-only tools
such as Ask DataHub (an AI-powered data assistant):

```python
from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.langchain_tools import build_langchain_tools, build_langchain_cloud_tools

client = DataHubClient.from_env()

# Base tools (works on any DataHub instance)
tools = build_langchain_tools(client, include_mutations=True)

# Add Cloud-only tools (requires DataHub Cloud)
tools += build_langchain_cloud_tools(client, ask_datahub=True)
```

The same pattern works for Google ADK:

```python
from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.google_adk_tools import build_google_adk_tools, build_google_adk_cloud_tools

client = DataHubClient.from_env()

tools = build_google_adk_tools(client, include_mutations=True)
tools += build_google_adk_cloud_tools(client, ask_datahub=True)
```

**See [examples/langchain/](examples/langchain/)** for complete LangChain agent examples including:

- [simple_search.py](examples/langchain/simple_search.py) - Minimal example with AWS Bedrock

### Available Tools

#### Search Tools

- `search()` - Search across all entity types with filters and sorting
- `search_documents()` - Search specifically for Document entities
- `grep_documents()` - Grep for patterns in document content

#### Entity Tools

- `get_entities()` - Get detailed information about entities by URN
- `list_schema_fields()` - List and filter schema fields for datasets

#### Lineage Tools

- `get_lineage()` - Get upstream or downstream lineage
- `get_lineage_paths_between()` - Get detailed paths between two entities

#### Query Tools

- `get_dataset_queries()` - Get SQL queries for datasets or columns

#### Mutation Tools

- `add_tags()`, `remove_tags()` - Manage tags
- `update_description()` - Update entity descriptions
- `set_domains()`, `remove_domains()` - Manage domains
- `add_owners()`, `remove_owners()` - Manage owners
- `add_glossary_terms()`, `remove_glossary_terms()` - Manage glossary terms
- `add_structured_properties()`, `remove_structured_properties()` - Manage structured properties
- `save_document()` - Save or update a Document

#### User Tools

- `get_me()` - Get information about the authenticated user

#### Cloud-Only Tools (DataHub Cloud)

- `ask_datahub_chat()` - Ask the DataHub AI assistant a question about your data catalog
- `get_datahub_chat()` - Retrieve messages and status from an Ask DataHub conversation

## Architecture

The package is organized into the following modules:

- `mcp_tools/` - Core MCP tool implementations
  - `base.py` - Base GraphQL execution and response cleaning
  - `search.py` - Search functionality
  - `documents.py` - Document search and grep
  - `entities.py` - Entity retrieval
  - `lineage.py` - Lineage querying
  - `queries.py` - Query retrieval
  - `tags.py`, `descriptions.py`, `domains.py`, etc. - Mutation tools
  - `helpers.py` - Shared utility functions
  - `gql/` - GraphQL query definitions
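
To illustrate what "response cleaning" can mean for LLM-facing tools, the sketch below strips GraphQL noise (null values, empty containers, and `__typename` keys) so fewer tokens reach the model. This is a hypothetical illustration of the idea, not the implementation in `base.py`; the function name `clean_response` and the sample payload are assumptions.

```python
from typing import Any


def clean_response(value: Any) -> Any:
    """Recursively drop None values, empty containers, and `__typename` keys."""
    if isinstance(value, dict):
        cleaned = {
            key: clean_response(item)
            for key, item in value.items()
            if key != "__typename"
        }
        return {k: v for k, v in cleaned.items() if v not in (None, {}, [])}
    if isinstance(value, list):
        items = [clean_response(item) for item in value]
        return [item for item in items if item not in (None, {}, [])]
    return value


# Hand-written payload for illustration only
raw = {
    "__typename": "Dataset",
    "urn": "urn:li:dataset:1",
    "description": None,
    "tags": {"tags": []},
    "owners": [{"__typename": "Owner", "name": "alice"}],
}
print(clean_response(raw))
```

Values such as `0`, `False`, and empty strings are kept, since they can carry meaning in metadata.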

## Development

### Setup

```shell
# Clone the repository and stay at its root, where the Gradle wrapper lives
git clone https://github.com/datahub-project/datahub.git
cd datahub

# Set up development environment
./gradlew :datahub-agent-context:installDev

# Run tests
./gradlew :datahub-agent-context:testFull

# Run linting
./gradlew :datahub-agent-context:lintFix
```

### Testing

The package includes comprehensive unit tests for all tools:

```shell
# Run full test suite
./gradlew :datahub-agent-context:testFull
```

## Support

- [Documentation](https://datahubproject.io/docs/)
- [Slack Community](https://datahub.com/slack)
- [GitHub Issues](https://github.com/datahub-project/datahub/issues)
