Metadata-Version: 2.3
Name: redis-featureform
Version: 3.0.0
Summary: Python SDK and ff CLI for Redis Feature Form
Author: Simba Khadder
Author-email: Simba Khadder <simba.khadder@redis.com>
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Dist: cloudpickle>=3.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: keyring>=25.0.0
Requires-Dist: packaging>=24.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: structlog>=24.0.0
Requires-Dist: typer>=0.12.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: grpcio>=1.60.0
Requires-Dist: grpcio-health-checking>=1.60.0
Requires-Dist: protobuf>=4.25.0
Requires-Dist: pyarrow>=17.0.0
Requires-Dist: pandas>=2.3.3
Requires-Dist: tomli>=2.0.0 ; python_full_version < '3.11'
Requires-Dist: pytest>=9.0.3 ; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0 ; extra == 'dev'
Requires-Dist: pytest-httpx>=0.30.0 ; extra == 'dev'
Requires-Dist: psycopg[binary]>=3.2.0 ; extra == 'dev'
Requires-Dist: snowflake-connector-python>=3.12.0 ; extra == 'dev'
Requires-Dist: requests>=2.33.0 ; extra == 'dev'
Requires-Dist: redis>=5.0.0 ; extra == 'dev'
Requires-Dist: schemathesis>=4 ; extra == 'dev'
Requires-Dist: polars>=1.30.0 ; extra == 'dev'
Requires-Dist: ruff>=0.8.0 ; extra == 'dev'
Requires-Dist: mypy>=1.8.0 ; extra == 'dev'
Requires-Dist: pyright>=1.1.350 ; extra == 'dev'
Requires-Dist: datamodel-code-generator>=0.55.0 ; extra == 'dev'
Requires-Dist: opentelemetry-api>=1.20.0 ; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.20.0 ; extra == 'otel'
Requires-Dist: pyspark>=3.5,<5 ; extra == 'pyspark'
Requires-Python: >=3.10
Project-URL: Documentation, https://redis.io/docs/latest/develop/ai/featureform/
Project-URL: Overview, https://redis.io/docs/latest/develop/ai/featureform/overview/
Project-URL: Quickstart, https://redis.io/docs/latest/develop/ai/featureform/quickstart/
Provides-Extra: dev
Provides-Extra: otel
Provides-Extra: pyspark
Description-Content-Type: text/markdown

# Redis Feature Form Python SDK

`redis-featureform` is the Python SDK and `ff` CLI for Redis Feature Form.

Redis Feature Form gives data and ML teams a declarative workflow for defining
providers, datasets, transformations, entities, features, labels, training sets,
and feature views in Python while keeping existing offline data systems in place.
Redis is the low-latency online store for feature serving.

Install from PyPI as `redis-featureform`, then import it in Python as
`featureform`.

Official Redis Feature Form docs:

- Overview: `https://redis.io/docs/latest/develop/ai/featureform/overview/`
- Quickstart: `https://redis.io/docs/latest/develop/ai/featureform/quickstart/`
- Documentation hub: `https://redis.io/docs/latest/develop/ai/featureform/`

## Install

```bash
pip install redis-featureform
```

Optional extras:

```bash
pip install "redis-featureform[otel]"
pip install "redis-featureform[pyspark]"
```

## What Redis Feature Form Does

Redis Feature Form helps you:

- register offline systems and Redis as providers
- define feature engineering resources as Python code
- submit a complete resource graph into a workspace with `ff apply` or `client.apply()`
- inspect the resulting plan before execution
- query datasets and training sets over Arrow Flight
- serve online feature values from Redis-backed feature views

## Core Workflow

The Redis Feature Form workflow is:

1. Create a workspace.
2. Register providers for offline systems and the Redis online store.
3. Author resources in Python.
4. Apply or merge the resource graph into the workspace.
5. Let Redis Feature Form plan and run the required execution work.
6. Query offline results for training or fetch online features for inference.

## Core Concepts

- Workspace: logical boundary for a Redis Feature Form resource graph
- Provider: connection to an external system such as Postgres, Snowflake, Spark, S3, Iceberg, or Redis
- Resource: Python declaration of an entity, dataset, transformation, feature, label, training set, or feature view
- Apply: single-shot workspace update that submits one combined resource set, computes a delta, and optionally runs an apply job
- Merge: apply strategy that preserves omitted resources already present in the workspace
- Execution mode: planning mode for runtime work, currently `NORMAL`, `UPDATE`, or `FULL_REMATERIALIZE`
- Training set: offline, model-ready dataset composed from features and labels
- Feature view: online-serving surface backed by Redis for inference-time lookup
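As an illustration of how these concepts relate (plain dataclasses standing in for the real SDK types in `featureform.resources` and `featureform.types`), a workspace ties providers, features, and feature views together into one graph:

```python
from dataclasses import dataclass, field

# Illustrative stand-ins only; the real SDK types live in
# featureform.resources and featureform.types.
@dataclass
class Provider:
    name: str
    type: str  # e.g. "postgres" or "redis"

@dataclass
class Feature:
    name: str
    provider: str  # offline provider that computes the value

@dataclass
class FeatureView:
    name: str
    entity: str
    features: list[str]
    inference_store: str  # Redis provider used for online serving

@dataclass
class Workspace:
    name: str
    providers: dict[str, Provider] = field(default_factory=dict)
    features: dict[str, Feature] = field(default_factory=dict)
    views: dict[str, FeatureView] = field(default_factory=dict)

ws = Workspace("fraud-detection")
ws.providers["analytics-postgres"] = Provider("analytics-postgres", "postgres")
ws.providers["online-store"] = Provider("online-store", "redis")
ws.features["customer_amount_7d"] = Feature("customer_amount_7d", "analytics-postgres")
ws.views["customer_features"] = FeatureView(
    "customer_features", "customer", ["customer_amount_7d"], "online-store"
)

# Every feature referenced by a view should exist in the workspace graph.
assert all(f in ws.features for v in ws.views.values() for f in v.features)
```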

## Connect And Register Providers

The CLI is usually the fastest way to bootstrap Redis Feature Form resources:

```bash
ff --server https://api.example.com --transport rest auth login

ff --server https://api.example.com --transport rest \
  workspace create fraud-detection

ff --server https://api.example.com --transport rest \
  provider register analytics-postgres \
  --workspace fraud-detection \
  --type postgres \
  --pg-host postgres.example.com \
  --pg-port 5432 \
  --pg-database analytics \
  --pg-user ff_user \
  --pg-password-secret env:PG_PASSWORD

ff --server https://api.example.com --transport rest \
  provider register online-store \
  --workspace fraud-detection \
  --type redis \
  --redis-host redis.example.com \
  --redis-port 6379
```

Providers are registered separately from apply. Resource definitions then refer
to those provider names when you define datasets, transformations, training sets,
and Redis-backed feature views.

## What `ff apply` Does

`ff apply` is the main authoring workflow in Redis Feature Form.

At a high level it:

1. loads one Python entry file or package directory
2. collects resources from either an explicit `resources = [...]` list or the global registry
3. serializes the full resource graph into one request for one workspace
4. compares the submitted graph with the current workspace graph
5. returns a dry-run plan or commits the new version and optionally waits for the apply job

`ff apply` is graph-wide. It does not patch one object at a time.

### Example Apply File

```python
from datetime import timedelta

import featureform as ff
from featureform.types.resource import FeatureView, MaterializationEngine, TrainingSetType

postgres = ff.get_postgres("analytics-postgres")

customer = ff.Entity(
    name="customer",
    description="Customer entity for fraud models",
)

transactions = postgres.dataset(
    name="transactions",
    schema="public",
    table="transactions",
    timestamp_column="event_ts",
    description="Raw payment events",
)


@postgres.sql_transformation(
    name="customer_daily_rollups",
    description="Daily transaction totals per customer",
    inputs=[transactions],
)
def customer_daily_rollups() -> str:
    return """
        SELECT
            customer_id,
            date_trunc('day', event_ts) AS event_day,
            SUM(amount) AS total_amount,
            COUNT(*) AS transaction_count
        FROM {{transactions}}
        GROUP BY 1, 2
    """


customer_amount_7d = (
    ff.Feature(name="customer_amount_7d")
    .from_dataset(
        customer_daily_rollups,
        entity="customer",
        entity_column="customer_id",
        value="total_amount",
        timestamp="event_day",
    )
    .with_provider("analytics-postgres")
    .aggregate(function=ff.AggregateFunction.SUM, window=timedelta(days=7))
)

fraud_label = (
    ff.Label()
    .from_dataset(transactions)
    .value("is_fraud")
    .timestamp("event_ts")
    .entity("customer", column="customer_id")
    .build()
)

fraud_training = ff.TrainingSet(
    name="fraud_training",
    provider="analytics-postgres",
    features=[customer_amount_7d],
    label=fraud_label,
    type=TrainingSetType.STATIC,
)

customer_features = FeatureView(
    name="customer_features",
    entity="customer",
    features=[customer_amount_7d.name],
    inference_store="online-store",
    materialization_engine=MaterializationEngine.K8S,
)

resources = [
    customer,
    transactions,
    customer_daily_rollups,
    customer_amount_7d,
    fraud_label,
    fraud_training,
    customer_features,
]
```
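The `{{transactions}}` placeholder in the SQL transformation above is resolved against the transformation's declared `inputs`. A minimal sketch of that kind of name-based substitution (illustrative only; the server performs the real resolution):

```python
def render_sql(template: str, inputs: dict[str, str]) -> str:
    """Replace {{name}} placeholders with fully qualified table names."""
    rendered = template
    for name, table in inputs.items():
        rendered = rendered.replace("{{" + name + "}}", table)
    return rendered

sql = "SELECT customer_id, SUM(amount) FROM {{transactions}} GROUP BY 1"
print(render_sql(sql, {"transactions": "public.transactions"}))
# SELECT customer_id, SUM(amount) FROM public.transactions GROUP BY 1
```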

## Apply Strategies And Execution Modes

Redis Feature Form now exposes apply strategy and execution mode as separate axes.

Apply strategies:

- default apply: submitted resources replace the desired graph for the workspace
- merge: submitted resources merge into the workspace and omitted resources are retained

Execution modes:

- `NORMAL`: standard diff-based planning
- `UPDATE`: revisit the target graph in update mode
- `FULL_REMATERIALIZE`: revisit the target graph in full-rematerialize mode

Every combination of strategy and execution mode is valid:

- apply + normal
- apply + update
- apply + full rematerialize
- merge + normal
- merge + update
- merge + full rematerialize
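The difference between the two strategies comes down to how omitted resources are treated. A plain-dict sketch of the desired-graph update (illustrative only; the server computes the real delta):

```python
def replace_graph(current: dict, submitted: dict) -> dict:
    # Default apply: the submitted set becomes the whole desired graph,
    # so omitted resources are dropped.
    return dict(submitted)

def merge_graph(current: dict, submitted: dict) -> dict:
    # Merge: omitted resources already in the workspace are retained.
    return {**current, **submitted}

current = {"transactions": "v1", "fraud_training": "v1"}
submitted = {"transactions": "v2"}

print(replace_graph(current, submitted))
# {'transactions': 'v2'}
print(merge_graph(current, submitted))
# {'transactions': 'v2', 'fraud_training': 'v1'}
```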

### Apply From The CLI

Preview the graph delta without changing the workspace:

```bash
ff --server https://api.example.com --transport rest \
  apply -f resources.py --workspace fraud-detection --plan
```

Apply the submitted graph and wait for completion:

```bash
ff --server https://api.example.com --transport rest \
  apply -f resources.py \
  --workspace fraud-detection \
  --message "initial fraud pipeline" \
  --wait --wait-for finished
```

Merge into the existing workspace instead of replacing omitted resources:

```bash
ff --server https://api.example.com --transport rest \
  apply -f resources.py \
  --workspace fraud-detection \
  --merge \
  --wait --wait-for finished
```

Run a plan in update mode:

```bash
ff --server https://api.example.com --transport rest \
  apply -f resources.py \
  --workspace fraud-detection \
  --plan \
  --update
```

Request a merge with full rematerialization semantics:

```bash
ff --server https://api.example.com --transport rest \
  apply -f resources.py \
  --workspace fraud-detection \
  --merge \
  --full-rematerialize
```

Useful flags:

- `--plan`: show what would change without applying
- `--merge`: keep omitted resources already present in the workspace
- `--update`: use update execution mode
- `--full-rematerialize`: use full rematerialization execution mode
- `--wait --wait-for finished`: block until the apply job completes and the workspace version is visible

### Apply From Python

Use `client.apply()` for standard apply semantics:

```python
import os

import featureform as ff
import resources

client = ff.Client(
    base_url="grpc.example.com:443",
    transport="grpc",
    token=os.environ["FEATUREFORM_TOKEN"],
)

result = client.apply(
    resources=resources.resources,
    workspace="fraud-detection",
    message="initial fraud pipeline",
    wait=True,
)

print(result.version)
for change in result.changes:
    print(change.resource_type, change.resource_name, change.action)
```

Use execution mode directly from Python:

```python
plan = client.apply(
    resources=resources.resources,
    workspace="fraud-detection",
    dry_run=True,
    execution_mode="UPDATE",
)
```

Use merge semantics without deleting omitted resources:

```python
result = client.merge(
    resources=resources.resources,
    workspace="fraud-detection",
    execution_mode="FULL_REMATERIALIZE",
    wait=True,
)
```

If `resources` is omitted, `client.apply(...)` and `client.merge(...)` use the
global resource registry. That works well when definition modules register
resources during import.
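A minimal sketch of the registration-on-import pattern that makes this work (not the SDK's actual registry implementation):

```python
# Hypothetical stand-in for a module-level resource registry.
_REGISTRY: list = []

def register(resource):
    """Record a resource at definition time and return it unchanged."""
    _REGISTRY.append(resource)
    return resource

# A definitions module registers resources as a side effect of import:
customer = register({"kind": "entity", "name": "customer"})
transactions = register({"kind": "dataset", "name": "transactions"})

# apply()-style consumers can then fall back to the registry when no
# explicit resource list is passed.
def collect_resources(explicit=None):
    return list(explicit) if explicit is not None else list(_REGISTRY)

print([r["name"] for r in collect_resources()])
# ['customer', 'transactions']
```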

## Query Offline Data

Use Arrow Flight helpers to pull datasets, training sets, or feature-view
snapshots into pandas or stream them incrementally.

```python
training = client.query_training_set(
    "fraud_training",
    workspace="fraud-detection",
    flight_server="grpc.example.com:9090",
).to_pandas()

print(training.head())
```

You can also query datasets directly:

```python
transactions = client.query_dataset(
    "transactions",
    workspace="fraud-detection",
    limit=1000,
).to_pandas()
```

## Serve Online Features From Redis

Once a feature view is materialized, Redis Feature Form serves online values from
Redis with `client.serve(...)`.

```python
features = client.serve(
    "customer_features",
    entity="customer_123",
    workspace="fraud-detection",
)

print(features["customer_amount_7d"])
```

Batch serving is supported as well:

```python
result = client.serve(
    "customer_features",
    entities=["customer_123", "customer_456"],
    workspace="fraud-detection",
    return_type="pandas",
)
```
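Conceptually, a materialized feature view behaves like a per-entity hash lookup keyed by entity ID. A plain-dict sketch of that lookup shape, with hypothetical values (illustrative only; `client.serve(...)` is the real API and Redis is the real store):

```python
# Hypothetical materialized view: entity key -> feature name -> value.
online_store = {
    "customer_123": {"customer_amount_7d": 412.50},
    "customer_456": {"customer_amount_7d": 87.25},
}

def serve(view: dict, entity: str) -> dict:
    """Return the feature values materialized for one entity."""
    return view[entity]

def serve_batch(view: dict, entities: list[str]) -> list[dict]:
    """Return feature values for several entities in request order."""
    return [view[e] for e in entities]

print(serve(online_store, "customer_123")["customer_amount_7d"])
# 412.5
```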

## Package Surface

- `featureform.Client`: client for workspaces, providers, apply and merge, offline queries, and online serving
- `ff`: CLI for auth, apply, workspace, provider, catalog, graph, scheduler, and RBAC operations
- `featureform.resources`: typed resource DSL for authoring Redis Feature Form resource graphs
- `featureform.types`: request and response models used by the client surface
