Metadata-Version: 2.4
Name: langchain-iceberg
Version: 0.1.2
Summary: LangChain integration for Apache Iceberg with native PyIceberg API support
Author: Vipin Kataria
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/langchain-ai/langchain-iceberg
Project-URL: Documentation, https://github.com/langchain-ai/langchain-iceberg#readme
Project-URL: Repository, https://github.com/langchain-ai/langchain-iceberg
Project-URL: Issues, https://github.com/langchain-ai/langchain-iceberg/issues
Keywords: langchain,iceberg,data-lake,sql,ai,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: pyiceberg>=0.7.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pandas>=2.0
Requires-Dist: pyarrow>=14.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: duckdb>=0.10.0
Requires-Dist: sqlparse>=0.4.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: langchain-openai>=1.1.6
Requires-Dist: langchain>=1.2.0
Requires-Dist: langchainhub>=0.1.21
Provides-Extra: semantic
Requires-Dist: pyyaml>=6.0; extra == "semantic"
Provides-Extra: redis
Requires-Dist: redis>=5.0; extra == "redis"
Provides-Extra: dev
Requires-Dist: langchain-tests>=0.0.1; extra == "dev"
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: types-pyyaml>=6.0; extra == "dev"
Provides-Extra: test
Requires-Dist: langchain-tests>=0.0.1; extra == "test"
Requires-Dist: pytest>=7.4.0; extra == "test"
Requires-Dist: pytest-cov>=4.1.0; extra == "test"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "test"
Requires-Dist: pytest-socket>=0.7.0; extra == "test"
Provides-Extra: test-integration
Requires-Dist: pytest-xdist>=3.0.0; extra == "test-integration"
Dynamic: license-file

# LangChain Iceberg Toolkit

[![PyPI version](https://badge.fury.io/py/langchain-iceberg.svg)](https://badge.fury.io/py/langchain-iceberg)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A native LangChain integration for Apache Iceberg that enables AI-powered natural language queries over your data lakes. Built with PyIceberg for direct API access (not SQL strings) and featuring Iceberg-specific capabilities like time-travel, snapshots, and partition-aware queries.

## Features

- 🚀 **Native PyIceberg Integration** - Direct API access, not SQL strings
- 🔍 **Iceberg-Specific Tools** - Snapshots, time-travel, partition-aware queries
- 📊 **Optional Semantic Layer** - YAML-driven metrics and dimensions
- 💬 **Zero SQL Required** - Natural language to Iceberg queries
- 🏢 **Enterprise-Ready** - Query limits and timeout protection

## Installation

### Using pip (standard)

```bash
pip install langchain-iceberg
```

For semantic layer support:

```bash
pip install langchain-iceberg[semantic]
```

### Using uv (recommended by LangChain)

[uv](https://github.com/astral-sh/uv) is a fast Python package installer recommended by LangChain:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install package
uv pip install langchain-iceberg

# With semantic layer
uv pip install "langchain-iceberg[semantic]"
```

See [INSTALL_WITH_UV.md](INSTALL_WITH_UV.md) for more details.

## Quick Start

### Basic Usage

```python
from langchain_iceberg import IcebergToolkit
from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent

# Initialize toolkit
toolkit = IcebergToolkit(
    catalog_name="prod",
    catalog_config={
        "type": "rest",
        "uri": "http://localhost:8181",
        "warehouse": "s3://my-warehouse"
    }
)

# Get tools
tools = toolkit.get_tools()

# Create agent
llm = ChatOpenAI(model="gpt-4")
agent = create_react_agent(llm, tools)

# Query with natural language
result = agent.invoke({
    "input": "Show me the top 10 orders by amount from the sales.orders table"
})

print(result)
```

### Direct Tool Usage

```python
from langchain_iceberg import IcebergToolkit

toolkit = IcebergToolkit(
    catalog_name="rest",
    catalog_config={
        "type": "rest",
        "uri": "http://localhost:8181",
        "warehouse": "s3://warehouse/wh/"
    }
)

tools = toolkit.get_tools()

# Use tools directly
list_ns = next(t for t in tools if t.name == "iceberg_list_namespaces")
namespaces = list_ns.run({})
print(namespaces)

query = next(t for t in tools if t.name == "iceberg_query")
results = query.run({
    "table_id": "test.orders",
    "filters": "status = 'completed'",
    "limit": 10
})
print(results)
```

### With Semantic Layer

```python
# Load semantic YAML for business-friendly metrics
toolkit = IcebergToolkit(
    catalog_name="prod",
    catalog_config={...},
    semantic_yaml="s3://bucket/semantic.yaml"
)

tools = toolkit.get_tools()
# Now includes auto-generated metric tools like get_total_revenue, get_order_count, etc.

agent = create_react_agent(llm, tools)

# Business question (no SQL needed!)
result = agent.invoke({
    "input": "What was Q4 2024 revenue by customer segment?"
})
```

### Time-Travel Queries

```python
# Query historical data
result = agent.invoke({
    "input": "Compare this month's revenue to the same period last year using time-travel"
})
```

## Available Tools

The toolkit provides the following tools:

### Catalog Exploration
- `iceberg_list_namespaces` - List all namespaces in the catalog
- `iceberg_list_tables` - List tables in a namespace
- `iceberg_get_schema` - Get table schema with sample data

### Query Execution
- `iceberg_query` - Execute queries with filters and column selection
- `iceberg_plan_query` - LLM-assisted query planning

### Time-Travel (Iceberg-Specific)
- `iceberg_snapshots` - List table snapshots
- `iceberg_time_travel` - Query data at a specific point in time

### Semantic Layer (Auto-Generated)
- `get_{metric_name}` - Auto-generated tools from YAML metrics

## Documentation

- [Installation Guide](docs/installation.md)
- [Quick Start Guide](docs/quickstart.md)
- [Semantic Layer Guide](docs/semantic_layer.md)
- [API Reference](docs/api_reference.md)

## Requirements

- Python 3.10+
- Apache Iceberg catalog (REST, Hive, Glue, or Nessie)
- Cloud storage (S3, ADLS, or GCS)

## Contributing

Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

## License

Apache 2.0 License - see [LICENSE](LICENSE) file for details.

## Support

- GitHub Issues: [Report a bug or request a feature](https://github.com/langchain-ai/langchain-iceberg/issues)
- Documentation: [Full documentation](https://github.com/langchain-ai/langchain-iceberg#readme)

## Roadmap

- [x] Core toolkit with catalog exploration
- [x] Query execution tools
- [ ] Time-travel and snapshot tools
- [ ] Semantic layer with YAML support
- [ ] Governance features (access control, PII protection) - Planned for future release
- [ ] Query planner tool

