Metadata-Version: 2.4
Name: causaliq-knowledge
Version: 0.5.0
Summary: Incorporating LLM and human knowledge into causal discovery
Author-email: CausalIQ <info@causaliq.com>
Maintainer-email: CausalIQ <info@causaliq.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/causaliq/causaliq-knowledge
Project-URL: Documentation, https://github.com/causaliq/causaliq-knowledge#readme
Project-URL: Repository, https://github.com/causaliq/causaliq-knowledge
Project-URL: Bug Tracker, https://github.com/causaliq/causaliq-knowledge/issues
Keywords: causaliq
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: causaliq-workflow>=0.2.0
Requires-Dist: click>=8.0.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: causaliq-core>=0.4.0; extra == "dev"
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: black<26.0.0,>=25.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-requests>=2.32.0; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Requires-Dist: build>=0.8.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Provides-Extra: test
Requires-Dist: causaliq-core>=0.4.0; extra == "test"
Requires-Dist: causaliq-workflow>=0.2.0; extra == "test"
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.10.0; extra == "test"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.0.0; extra == "docs"
Requires-Dist: mkdocstrings==0.30.1; extra == "docs"
Requires-Dist: mkdocstrings-python==1.18.2; extra == "docs"
Dynamic: license-file

# causaliq-knowledge

![Python Versions](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)

The CausalIQ Knowledge project represents a novel approach to causal discovery by combining the traditional statistical structure learning algorithms with the contextual understanding and reasoning capabilities of Large Language Models. This integration enables more interpretable, domain-aware, and human-friendly causal discovery workflows. It is part of the [CausalIQ ecosystem](https://causaliq.org/) for intelligent causal discovery.

## Status

🚧 **Active Development** - this repository is currently in active development, which involves:

- Adding new knowledge features, in particular knowledge from LLMs
- Migrating functionality which provides knowledge based on standard reference networks from the legacy monolithic discovery repo
- Ensuring CausalIQ development standards are met


## Quick Start

```python
from causaliq_knowledge.llm import LLMKnowledge

# Query an LLM about a potential causal relationship
knowledge = LLMKnowledge(models=["groq/llama-3.1-8b-instant"])
result = knowledge.query_edge("smoking", "lung_cancer")

print(f"Exists: {result.exists}, Direction: {result.direction}")
print(f"Confidence: {result.confidence}")
print(f"Reasoning: {result.reasoning}")
```

## Features

## ✅ Currently implemented releases:

- **v0.1.0 - Foundation LLM** [January 2026]: Foundation release establishing LLM client infrastructure for causal graph
generation.

- **v0.2.0 - Additional LLMs** [January 2026]: Expanded LLM provider support from 2 to 7 providers.

- **v0.3.0 - LLM Caching** [January 2026]: SQLite-based response caching with CLI tools for cache management.

- **v0.4.0 - Graph Generation** [February 2026]: CLI tools and CausalIQ workflows for LLM-generated causal graphs.

- **v0.5.0 - Workflow Integration** [February 2026]: Integration into CausalIQ Workflows including writing results to cache.

## 🛣️ Upcoming Releases (speculative)

- **Release v0.6.0 - Statistical Fusion**: Support knowledge requirements for fusing LLM knowledge and statistical graph averaging.

- **Release v0.7.0 - LLM Provider Cost Tracking**: Query LLM provider APIs for usage and cost statistics.

- **Release v0.8.0 - Enhanced LLM Context**: Background literature supplied to LLMs

- **Release v0.9.0 - Legacy Reference**: Support for deriving knowledge from reference networks and migration of functionality from legacy discovery repo

## Implementation Approach

### Technology Stack

- **Vendor-Specific API Clients**: Direct integration with LLM providers using httpx
- **[Pydantic](https://docs.pydantic.dev/)**: Structured response validation
- **[Click](https://click.palletsprojects.com/)**: Command-line interface

### Why Vendor-Specific APIs (not LiteLLM/LangChain)?

We use **direct vendor-specific API clients** rather than wrapper libraries:

| Aspect | Direct APIs | Wrapper Libraries |
|--------|-------------|-------------------|
| Reliability | ✅ Full control | ❌ Wrapper bugs |
| Dependencies | ✅ Minimal (httpx) | ❌ Heavy (~50-100MB) |
| Debugging | ✅ Clear traces | ❌ Abstraction layers |
| Maintenance | ✅ We control | ❌ Wait for updates |

This approach keeps the package lightweight, reliable, and easy to debug.

### Supported LLM Providers

| Provider | Client | Models | Free Tier |
|----------|--------|--------|-----------|
| **Groq** | `GroqClient` | llama-3.1-8b-instant | ✅ Generous |
| **Google Gemini** | `GeminiClient` | gemini-2.5-flash | ✅ Generous |
| **OpenAI** | `OpenAIClient` | gpt-4o-mini | ❌ Paid |
| **Anthropic** | `AnthropicClient` | claude-sonnet-4-20250514 | ❌ Paid |
| **DeepSeek** | `DeepSeekClient` | deepseek-chat | ✅ Low cost |
| **Mistral** | `MistralClient` | mistral-small-latest | ❌ Paid |
| **Ollama** | `OllamaClient` | llama3 | ✅ Free (local) |

## Upcoming Key Innovations

### 🧠 LLMs support Causal Discovery and Inference

- Initially LLM will work with **graph averaging** to resolve uncertain edges (use entropy to decide edges with uncertain existence or direction)
- Integration into **structure learning** algorithms to provide knowledge for "uncertain" areas of the graph
- LLMs analyse learning process and errors to **suggest improved algorithms**
- LLMs used to preprocess **text and visual data** so they can be used as inputs to structure learning

### 🤝 Human Engagement

- **Natural language constraints**: Specify domain knowledge in plain English
- **Expert knowledge incorporation** by converting expert understanding into algorithmic constraints
- LLMs convert **natural language questions** to causal queries
- **Interactive causal discovery** where structure learning or LLMs identify areas of causal uncertainty and can test causal hypotheses through dialogue

### 🪟 Transparency and interpretability

- LLMs **interpret structure learning process** and outputs, including their uncertainties
- LLMs **interpret causal inference** results including uncertainties
- **Contextual graph interpretation** to explain variable meanings and relationships
- **Uncertainty communication** with clear explanation of confidence levels and limitations
- **Report generation** including automated research summaries and methodology descriptions

### 🔒 Stability and reproducibility

- **Cache queries and responses** so that experiments are stable and repeatable even if LLMs themselves are not
- **Stable randomisation** of e.g. data sub-sampling

### 💰 Efficient use of LLM resources (important as an independent researcher)

- **Cache queries and results** so that knowledge can be re-used
- Evaluation and development of **simple context-adapted LLMs**


## Upcoming Integration with CausalIQ Ecosystem

- 🔍 CausalIQ Discovery makes use of this package to learn more accurate graphs.
- 🧪 CausalIQ Analysis uses this package to explain the learning process, intelligently combine and explain results.
- 🔮 CausalIQ Predict uses this package to explain predictions made by learnt models.

## Documentation

- [User Guide](docs/userguide/introduction.md) - Getting started
- [Architecture Overview](docs/architecture/overview.md) - Design and components
- [LLM Integration Design](docs/architecture/llm_integration.md) - Detailed LLM design
- [Roadmap](docs/roadmap.md) - Release planning

---

**Supported Python Versions**: 3.9, 3.10, 3.11, 3.12, 3.13  
**Default Python Version**: 3.11
