Metadata-Version: 2.1
Name: cpcat
Version: 0.1
Summary: A portable, scalable, and fast AI Data Lakehouse.
Home-page: https://github.com/ray-project/deltacat
Author: Patina Software Foundation
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aws-embedded-metrics ==3.2.0
Requires-Dist: boto3 ~=1.34
Requires-Dist: getdaft ==0.3.6
Requires-Dist: intervaltree ==3.1.0
Requires-Dist: numpy ==1.21.5
Requires-Dist: pandas ==1.3.5
Requires-Dist: pyarrow ==17.0.0
Requires-Dist: pydantic ==1.10.4
Requires-Dist: pymemcache ==4.0.0
Requires-Dist: ray >=2.20.0
Requires-Dist: s3fs ==2024.5.0
Requires-Dist: tenacity ==8.2.3
Requires-Dist: typing-extensions ==4.6.1
Requires-Dist: redis ==4.6.0
Requires-Dist: schedule ==1.2.0
Provides-Extra: iceberg
Requires-Dist: pyiceberg[glue] >=0.6.0 ; extra == 'iceberg'

![deltacat-header-logo](media/deltacat-logo-alpha.png)

DeltaCAT is a portable multimodal data lakehouse powered by [Ray](https://github.com/ray-project/ray). It lets you define and manage
fast, scalable, ACID-compliant multimodal data lakes, and has been used to [successfully manage exabyte-scale enterprise
data lakes](https://aws.amazon.com/blogs/opensource/amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-amazon-ec2/).

It uses the Ray distributed compute framework together with [Apache Arrow](https://github.com/apache/arrow) and
[Daft](https://github.com/Eventual-Inc/Daft) to efficiently scale common table management tasks, like petabyte-scale
merge-on-read and copy-on-write operations.

DeltaCAT provides four high-level components:
1. **Catalog**: High-level APIs to create, discover, organize, and manage datasets.
2. **Compute**: Distributed data management jobs to read, write, and optimize datasets.
3. **Storage**: In-memory and on-disk multimodal dataset formats.
4. **Sync**: Synchronize DeltaCAT datasets with other data warehouses and table formats.


## Getting Started

DeltaCAT is rapidly evolving. Usage instructions will be posted here soon!
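In the meantime, the distribution metadata above implies an installation path along these lines (the package name `cpcat` is taken from this build's metadata; the home page points at the `deltacat` project, which may be published under a different name):

```shell
# Requires Python >= 3.9 (per Requires-Python above)
pip install cpcat

# With optional Apache Iceberg support (pulls in pyiceberg[glue]):
pip install 'cpcat[iceberg]'
```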

For now, feel free to peruse some of our examples:
* https://github.com/ray-project/deltacat/tree/2.0/deltacat/examples/rivulet
* https://github.com/ray-project/deltacat/tree/2.0/deltacat/examples/iceberg


