Metadata-Version: 2.4
Name: orion_py_client
Version: 0.1.13
Summary: Python Client for Orion Feature Store to push/produce Model Features and get features' metadata
Project-URL: Homepage, https://github.com/Meesho/orion/orion_py_client
Project-URL: Bug Tracker, https://github.com/Meesho/orion/issues
Project-URL: Source, https://github.com/meesho/orion/orion_py_client
Author-email: Jaya Kommuru <jaya.kommuru@meesho.com>
License: Apache-2.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Requires-Dist: cloudpathlib>=0.17.0
Requires-Dist: protobuf>=4.24.1
Description-Content-Type: text/markdown

# Orion Python Client

A lightweight Python client for interacting with [Orion Feature Store](https://github.com/Meesho/orion). This client provides functionality for feature metadata retrieval, protobuf serialization, and Kafka integration. 🚀 

This client helps in pushing ML model's features stored in offline sources (like tables, Cloud storage objects in parquet/delta format, etc) to Orion Feature Store

## Key Features

- Feature metadata retrieval
- Protobuf serialization of feature values and produce to Apache Kafka
- Support for features of different various data types:
  - Scalar types (FP32, FP64, Int32, Int64, UInt32, UInt64, String, Bool)
  - Vector types (Vectors of each of the above Scalar Types)
- Kafka integration with configurable settings

## 📥 Installation

```bash
pip install orion-py-client==0.1.1
```

## Prerequisites

- Python 3.7+
- (Optional) Apache Spark 3.0+ & spark-sql-kafka for Kafka feature push functionality

## Usage

### Basic Usage

```python
from orion_py_client import OrionPyClient

# Initialize the client
client = OrionPyClient(
    features_metadata_source_url="your_features_metadata_source_url",
    job_id="your_job_id",
    job_token="your_job_token"
)

# Get feature details
(
    offline_src_type_columns,
    offline_col_to_default_values_map,
    entity_column_names
) = opy_client.get_features_details()
```


### Push Feature Values from Offline sources to Orion via Spark -> Kafka

#### Supported Offline Sources
1. Table (Hive/Delta)
2. Parquet folder stored in Cloud Storage (AWS/GCS/ADLS)
3. Delta folder stored in Cloud Storage (AWS/GCS/ADLS)


Refer to the [examples](https://github.com/Meesho/orion/tree/main/examples/notebook) for detailed example of how to configure a job and push the feature values

Followng is a simple flow / outline of the steps involved in above example

```python
# create a new orion client
opy_client = OrionPyClient(features_metadata_source_url, job_id, job_token) 

# get the features details
feature_mapping, offline_col_to_default_values_map, onfs_fg_to_onfs_feat_map, onfs_fg_to_ofs_feat_map, fg_to_datatype_map, entity_label, entity_column_names = opy_client.get_features_details(fgs_to_consider)

# read the data from different sources
df = get_features_from_all_sources(spark, entity_column_names, feature_mapping, offline_col_to_default_values_map)

# serialize of protobuf binary
proto_df = opy_client.generate_df_with_protobuf_messages(df, intra_batch_size=20) 

# Produce data to kafka so that consumers write features to Orion Feature Store
opy_client.write_protobuf_df_to_kafka(proto_df, kafka_bootstrap_servers, kafka_topic, additional_options)
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Support

For support, please create an issue
