Metadata-Version: 2.4
Name: linkml-store
Version: 0.3.0rc3
Summary: linkml-store
Author-email: Author 1 <author@org.org>
License-Expression: MIT
License-File: LICENSE
Requires-Python: <4.0,>=3.10
Requires-Dist: click
Requires-Dist: duckdb-engine>=0.11.2
Requires-Dist: duckdb>=0.10.1
Requires-Dist: google-cloud-bigquery
Requires-Dist: jinja2>=3.1.4
Requires-Dist: jsonlines>=4.0.0
Requires-Dist: jsonpatch>=1.33
Requires-Dist: jsonpath-ng
Requires-Dist: linkml-runtime>=1.8.0
Requires-Dist: multipledispatch
Requires-Dist: pandas>=2.2.1
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pymongo>=4.11
Requires-Dist: pystow>=0.5.4
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: sqlalchemy
Requires-Dist: tabulate
Requires-Dist: xmltodict>=0.13.0
Provides-Extra: all
Requires-Dist: google-cloud-bigquery; extra == 'all'
Requires-Dist: linkml-map; extra == 'all'
Requires-Dist: linkml-renderer; extra == 'all'
Requires-Dist: linkml>=1.8.0; extra == 'all'
Requires-Dist: llm; extra == 'all'
Requires-Dist: neo4j; extra == 'all'
Requires-Dist: networkx; extra == 'all'
Requires-Dist: py2neo; extra == 'all'
Requires-Dist: tiktoken; extra == 'all'
Provides-Extra: analytics
Requires-Dist: matplotlib; extra == 'analytics'
Requires-Dist: pandas; extra == 'analytics'
Requires-Dist: plotly; extra == 'analytics'
Requires-Dist: seaborn; extra == 'analytics'
Provides-Extra: app
Requires-Dist: streamlit>=1.32.2; extra == 'app'
Provides-Extra: bigquery
Requires-Dist: google-cloud-bigquery; extra == 'bigquery'
Provides-Extra: dremio
Requires-Dist: pyarrow; extra == 'dremio'
Provides-Extra: fastapi
Requires-Dist: fastapi; extra == 'fastapi'
Requires-Dist: uvicorn; extra == 'fastapi'
Provides-Extra: frictionless
Requires-Dist: frictionless; extra == 'frictionless'
Provides-Extra: h5py
Requires-Dist: h5py; extra == 'h5py'
Provides-Extra: ibis
Requires-Dist: gcsfs; extra == 'ibis'
Requires-Dist: ibis-framework[duckdb,examples]>=9.3.0; extra == 'ibis'
Requires-Dist: multipledispatch; extra == 'ibis'
Provides-Extra: llm
Requires-Dist: llm; extra == 'llm'
Requires-Dist: tiktoken; extra == 'llm'
Provides-Extra: map
Requires-Dist: linkml-map>=0.3.9; extra == 'map'
Requires-Dist: ucumvert>=0.2.0; extra == 'map'
Provides-Extra: mongodb
Requires-Dist: pymongo; extra == 'mongodb'
Provides-Extra: neo4j
Requires-Dist: neo4j; extra == 'neo4j'
Requires-Dist: networkx; extra == 'neo4j'
Requires-Dist: py2neo; extra == 'neo4j'
Provides-Extra: pyarrow
Requires-Dist: pyarrow; extra == 'pyarrow'
Provides-Extra: pyreadr
Requires-Dist: pyreadr; extra == 'pyreadr'
Provides-Extra: rdf
Requires-Dist: lightrdf; extra == 'rdf'
Provides-Extra: renderer
Requires-Dist: linkml-renderer; extra == 'renderer'
Provides-Extra: scipy
Requires-Dist: scikit-learn; extra == 'scipy'
Requires-Dist: scipy; extra == 'scipy'
Provides-Extra: tests
Requires-Dist: black>=24.0.0; extra == 'tests'
Requires-Dist: ruff>=0.6.2; extra == 'tests'
Provides-Extra: validation
Requires-Dist: linkml>=1.8.0; extra == 'validation'
Description-Content-Type: text/markdown

# linkml-store

An AI-ready data management and integration platform. LinkML-Store
provides an abstraction layer over multiple different backends
(including DuckDB, MongoDB, Neo4j, and local filesystems), allowing for
common query, index, and storage operations.

For full documentation, see [https://linkml.io/linkml-store/](https://linkml.io/linkml-store/)

See [these slides](https://docs.google.com/presentation/d/e/2PACX-1vSgtWUNUW0qNO_ZhMAGQ6fYhlXZJjBNMYT0OiZz8DDx8oj7iG9KofRs6SeaMXBBOICGknoyMG2zaHnm/embed?start=false&loop=false&delayms=3000) for a high-level overview.

__Warning__: LinkML-Store is still undergoing changes and refactoring;
APIs and command-line options are subject to change!

## Quick Start

Install, add data, query it:

```
pip install "linkml-store[all]"
linkml-store -d duckdb:///db/my.db -c persons insert data/*.json
linkml-store -d duckdb:///db/my.db -c persons query -w "occupation: Bricklayer"
```

Index it, search it:

```
linkml-store -d duckdb:///db/my.db -c persons index -t llm
linkml-store -d duckdb:///db/my.db -c persons search "all persons employed in construction"
```

Validate it:

```
linkml-store -d duckdb:///db/my.db -c persons validate
```

## Basic usage

* [Command Line](https://linkml.io/linkml-store/tutorials/Command-Line-Tutorial.html)
* [Python](https://linkml.io/linkml-store/tutorials/Python-Tutorial.html)
* API
* Streamlit applications

## The CRUDSI pattern

Most database APIs implement the **CRUD** pattern: Create, Read, Update, Delete.
LinkML-Store adds **Search** and **Inference** to this pattern, making it **CRUDSI**.

The notions of "Search" and "Inference" are intended to be flexible and extensible,
including:

* Search
   * Traditional keyword search
   * Search using LLM vector embeddings (*without* a dedicated vector database)
   * Pluggable specialized search, e.g. genomic sequence (not yet implemented)
* Inference (encompassing *validation*, *repair*, and inference of missing data)
   * Classic rule-based inference
   * Inference using LLM Retrieval Augmented Generation (RAG)
   * Statistical/ML inference
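
The six CRUDSI operations can be pictured with a toy in-memory collection. This is only a sketch under stated assumptions: `ToyCollection` and all of its method names are hypothetical and are not the LinkML-Store API. Search is reduced to naive keyword matching and inference to a single caller-supplied rule.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Row = dict[str, Any]


@dataclass
class ToyCollection:
    """Hypothetical in-memory collection; NOT the LinkML-Store API."""

    rows: list[Row] = field(default_factory=list)

    # Create: store copies of the incoming rows
    def insert(self, new_rows: list[Row]) -> None:
        self.rows.extend(dict(r) for r in new_rows)

    # Read: exact match on every key in `where` (no filter returns all rows)
    def find(self, where: Optional[Row] = None) -> list[Row]:
        where = where or {}
        return [r for r in self.rows if all(r.get(k) == v for k, v in where.items())]

    # Update: apply `changes` to matching rows, return how many were changed
    def update(self, where: Row, changes: Row) -> int:
        matched = self.find(where)
        for r in matched:
            r.update(changes)
        return len(matched)

    # Delete: drop matching rows, return how many were removed
    def delete(self, where: Row) -> int:
        keep = [r for r in self.rows if any(r.get(k) != v for k, v in where.items())]
        removed = len(self.rows) - len(keep)
        self.rows = keep
        return removed

    # Search: rank rows by how many query terms occur in their values
    def search(self, query: str) -> list[Row]:
        terms = query.lower().split()

        def score(r: Row) -> int:
            text = " ".join(str(v) for v in r.values()).lower()
            return sum(t in text for t in terms)

        return sorted((r for r in self.rows if score(r) > 0), key=score, reverse=True)

    # Inference: fill a missing `target` value using a rule; stored rows are untouched
    def infer(self, target: str, rule: Callable[[Row], Any]) -> list[Row]:
        return [r if r.get(target) is not None else {**r, target: rule(r)} for r in self.rows]


people = ToyCollection()
people.insert([
    {"id": "P1", "name": "Ann", "occupation": "Bricklayer", "sector": None},
    {"id": "P2", "name": "Bob", "occupation": "Welder", "sector": "Manufacturing"},
])
bricklayers = people.find({"occupation": "Bricklayer"})
people.update({"id": "P2"}, {"occupation": "Pipe welder"})
hits = people.search("welder")
completed = people.infer(
    "sector", lambda r: "Construction" if r["occupation"] == "Bricklayer" else "Unknown"
)
removed = people.delete({"id": "P2"})
```

In this sketch, search and inference sit beside the CRUD methods on the same object, which is the point of the pattern: callers do not need a separate service to rank rows or fill in missing values.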

## Features

### Multiple Adapters

LinkML-Store is designed to work with multiple backends, giving a common abstraction layer:

* [MongoDB](https://linkml.io/linkml-store/how-to/Use-MongoDB.html)
* [DuckDB](https://linkml.io/linkml-store/tutorials/Python-Tutorial.html)
* [Solr](https://linkml.io/linkml-store/how-to/Query-Solr-using-CLI.html)
* [Neo4j](https://linkml.io/linkml-store/how-to/Use-Neo4j.html)
* **Ibis** - Universal database adapter supporting DuckDB, PostgreSQL, SQLite, BigQuery, Snowflake, and many more
* Filesystem

Coming soon: any RDBMS, any triplestore, HDF5-based stores, ChromaDB/vector databases ...

The intent is to give a union of all features of each backend. For
example, analytic faceted queries are provided for *all* backends, not
just Solr.
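
As an illustration of how a faceted query can be offered regardless of backend, facet counts can be computed over any iterable of rows once they have been read. This is a minimal sketch, not a description of how LinkML-Store implements faceting; `facet_counts` is a hypothetical helper name.

```python
from collections import Counter
from typing import Any, Iterable


def facet_counts(rows: Iterable[dict[str, Any]], field: str) -> list[tuple[Any, int]]:
    """Count how often each value of `field` occurs, most frequent first.

    Rows that lack the field are counted under None.
    """
    counts = Counter(row.get(field) for row in rows)
    return counts.most_common()


# Rows could come from DuckDB, MongoDB, a JSON file, or any other source
rows = [
    {"id": "P1", "occupation": "Bricklayer"},
    {"id": "P2", "occupation": "Welder"},
    {"id": "P3", "occupation": "Bricklayer"},
]
facets = facet_counts(rows, "occupation")
```

A backend with native faceting, such as Solr, would normally compute these counts server-side; the client-side version is just the fallback idea in its simplest form.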

### Composable indexes

Many backends come with their own indexing and search
schemes. Classically this meant Lucene-based indexes; now it means semantic
search using LLM embeddings.

LinkML-Store treats indexing as an orthogonal concern - you can
compose different indexing schemes with different backends. You don't
need to have a vector database to run embedding search!
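
The idea of an index that lives apart from the storage backend can be sketched as follows: vectors are kept in a plain dictionary keyed by row id, and search is a brute-force cosine comparison. Everything here is an assumption for illustration. The function names are hypothetical, and the character-trigram `embed` function is a stand-in for a real LLM embedding call.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: character-trigram counts (an LLM embedding would go here)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i : i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(rows, text_field):
    """Map each row id to its vector; the rows themselves stay in their backend."""
    return {row["id"]: embed(row[text_field]) for row in rows}


def search(index, query, limit=3):
    """Return (score, row id) pairs, best match first, by scanning every vector."""
    q = embed(query)
    ranked = sorted(((cosine(q, vec), row_id) for row_id, vec in index.items()), reverse=True)
    return ranked[:limit]


rows = [
    {"id": "P1", "occupation": "Bricklayer"},
    {"id": "P2", "occupation": "Software engineer"},
    {"id": "P3", "occupation": "Bricklaying apprentice"},
]
index = build_index(rows, "occupation")
results = search(index, "bricklayer")
```

Because the index only maps ids to vectors, the same code works whether the rows are held in DuckDB, MongoDB, or a file. A linear scan like this is reasonable for small collections; it is the trade-off made in exchange for not running a dedicated vector database.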

See [How to Use Semantic Search](https://linkml.io/linkml-store/how-to/Use-Semantic-Search.html)

### Use with LLMs

TODO - docs

### Validation

LinkML-Store is backed by [LinkML](https://linkml.io), which allows
for powerful, expressive structural and semantic constraints.

See [Indexing JSON](https://linkml.io/linkml-store/how-to/Index-Phenopackets.html) and [Referential Integrity](https://linkml.io/linkml-store/how-to/Check-Referential-Integrity.html).

## Web API

There is a preliminary API, implemented using FastAPI, that follows HATEOAS principles.

To start, first create a config file, e.g. `db/conf.yaml`.

Then run:

```
export LINKML_STORE_CONFIG=./db/conf.yaml
make api
```

The API returns links as well as data objects; a Chrome plugin for JSON viewing is recommended
for exploring the API. TODO: add docs here.

The main endpoints are:

* `http://localhost:8000/` - the root of the API
* `http://localhost:8000/pages/` - browse the API via HTML
* `http://localhost:8000/docs` - the Swagger UI

## Streamlit app

```
make app
```

## Background

See [these slides](https://docs.google.com/presentation/d/e/2PACX-1vSgtWUNUW0qNO_ZhMAGQ6fYhlXZJjBNMYT0OiZz8DDx8oj7iG9KofRs6SeaMXBBOICGknoyMG2zaHnm/embed?start=false&loop=false&delayms=3000) for more details.

