Metadata-Version: 2.4
Name: pyspark-fluvius
Version: 0.1.0
Summary: PySpark custom data source for Fluvius Energy API
Project-URL: Homepage, https://github.com/warreee/spark-fluvius
Project-URL: Repository, https://github.com/warreee/spark-fluvius
Project-URL: Issues, https://github.com/warreee/spark-fluvius/issues
Author: Ward Schodts
License-Expression: AGPL-3.0-or-later
License-File: LICENSE
Keywords: api,data-source,energy,fluvius,pyspark,spark
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.13
Requires-Dist: fluvius-energy-api>=0.1.1
Requires-Dist: pyarrow>=15.0.0
Requires-Dist: pyspark>=4.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.9.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# spark-fluvius

PySpark custom data sources for the [Fluvius Energy API](https://github.com/warreee/fluvius-energy-api).

Read energy measurements and mandates directly into Spark DataFrames.

## Installation

```bash
pip install pyspark-fluvius
```

## Quick Start

```python
from pyspark.sql import SparkSession
from pyspark_fluvius import register_datasources

# Create SparkSession
spark = SparkSession.builder.appName("MyApp").getOrCreate()

# Register Fluvius data sources
register_datasources()

# Read mandates
mandates_df = spark.read.format("fluvius.mandates") \
    .option("status", "Approved") \
    .load()

# Read energy data
energy_df = spark.read.format("fluvius.energy") \
    .option("ean", "541234567890123456") \
    .option("period_type", "readTime") \
    .option("granularity", "daily") \
    .option("from_date", "2024-01-01") \
    .option("to_date", "2024-01-31") \
    .load()
```

## Authentication

Credentials can be provided via environment variables or Spark options.

### Environment Variables

```bash
# Required
export FLUVIUS_SUBSCRIPTION_KEY="your-subscription-key"
export FLUVIUS_CLIENT_ID="your-client-id"
export FLUVIUS_TENANT_ID="your-tenant-id"
export FLUVIUS_SCOPE="your-scope"
export FLUVIUS_DATA_ACCESS_CONTRACT_NUMBER="your-contract-number"

# For sandbox (client secret auth)
export FLUVIUS_CLIENT_SECRET="your-client-secret"

# For production (certificate auth)
export FLUVIUS_CERTIFICATE_THUMBPRINT="your-thumbprint"
export FLUVIUS_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----..."
# Or use a file path:
export FLUVIUS_PRIVATE_KEY_PATH="/path/to/private_key.pem"
```
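
Before starting a job it can help to check that the variables above are actually set. The following is a minimal sketch, not part of `pyspark-fluvius`: `check_fluvius_env` is a hypothetical helper, and the rule that certificate settings take precedence over a client secret when both are present is an assumption made for illustration only.

```python
import os

# Variable names copied from the list above.
REQUIRED_VARS = (
    "FLUVIUS_SUBSCRIPTION_KEY",
    "FLUVIUS_CLIENT_ID",
    "FLUVIUS_TENANT_ID",
    "FLUVIUS_SCOPE",
    "FLUVIUS_DATA_ACCESS_CONTRACT_NUMBER",
)


def check_fluvius_env(env=None):
    """Hypothetical helper: return (missing_required_names, auth_mode).

    auth_mode is "certificate", "client_secret", or None when neither
    set of credentials is complete.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]

    # Certificate auth needs a thumbprint plus a key (inline or file path).
    has_key = bool(
        env.get("FLUVIUS_PRIVATE_KEY") or env.get("FLUVIUS_PRIVATE_KEY_PATH")
    )
    if env.get("FLUVIUS_CERTIFICATE_THUMBPRINT") and has_key:
        auth_mode = "certificate"  # assumed precedence, see lead-in
    elif env.get("FLUVIUS_CLIENT_SECRET"):
        auth_mode = "client_secret"
    else:
        auth_mode = None
    return missing, auth_mode
```

Calling `check_fluvius_env()` with no arguments inspects `os.environ`; passing a dict makes the check easy to unit test.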

### Spark Options

```python
df = spark.read.format("fluvius.mandates") \
    .option("subscription_key", "...") \
    .option("client_id", "...") \
    .option("tenant_id", "...") \
    .option("scope", "...") \
    .option("data_access_contract_number", "...") \
    .option("client_secret", "...") \
    .load()
```
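
To keep secrets out of source code, the options can be collected in a dict and passed in one call with `DataFrameReader.options(**kwargs)`. This is a sketch under assumptions: `credential_options` is a hypothetical helper, and `secrets` stands for whatever mapping your secret store returns. Only the six option names shown above are recognized.

```python
# Option names copied from the example above.
CREDENTIAL_OPTION_NAMES = (
    "subscription_key",
    "client_id",
    "tenant_id",
    "scope",
    "data_access_contract_number",
    "client_secret",
)


def credential_options(secrets):
    """Hypothetical helper: keep only known, non-empty credential options."""
    return {
        name: secrets[name]
        for name in CREDENTIAL_OPTION_NAMES
        if secrets.get(name)
    }


# Usage (requires a SparkSession and registered data sources):
# df = spark.read.format("fluvius.mandates") \
#     .options(**credential_options(secrets)) \
#     .load()
```

Unrelated keys in `secrets` are ignored rather than forwarded, so the same mapping can hold other settings.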

## Data Sources

### fluvius.mandates

Read mandate data from the Fluvius API.

**Options:**

| Option | Description |
|--------|-------------|
| `reference_number` | Filter by custom reference number |
| `ean` | Filter by GSRN EAN-code |
| `data_service_types` | Comma-separated list (e.g., "VH_dag,VH_kwartier_uur") |
| `energy_type` | "E" (electricity) or "G" (gas) |
| `status` | Requested, Approved, Rejected, or Finished |
| `mandate_expiration_date` | ISO format date filter |
| `renewal_status` | ToBeRenewed, RenewalRequested, or Expired |
| `last_updated_from` | ISO format datetime |
| `last_updated_to` | ISO format datetime |
| `environment` | "sandbox" (default) or "production" |
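
Because every option is passed as a string, a typo in a filter value is easy to miss. The sketch below builds a few of the options in the table with basic validation. `mandate_options` is a hypothetical helper, not part of the package, and it assumes the values must be spelled exactly as documented above.

```python
# Allowed values copied from the options table above.
VALID_STATUSES = {"Requested", "Approved", "Rejected", "Finished"}
VALID_ENERGY_TYPES = {"E", "G"}


def mandate_options(status=None, energy_type=None, data_service_types=()):
    """Hypothetical helper: build an options dict for fluvius.mandates."""
    options = {}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        options["status"] = status
    if energy_type is not None:
        if energy_type not in VALID_ENERGY_TYPES:
            raise ValueError(f"unknown energy_type: {energy_type!r}")
        options["energy_type"] = energy_type
    if data_service_types:
        # The option expects a single comma-separated string.
        options["data_service_types"] = ",".join(data_service_types)
    return options


# Usage (requires a SparkSession and registered data sources):
# df = spark.read.format("fluvius.mandates") \
#     .options(**mandate_options(status="Approved", energy_type="E")) \
#     .load()
```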

**Schema:**

| Column | Type |
|--------|------|
| reference_number | string |
| status | string |
| ean | string |
| energy_type | string |
| data_period_from | timestamp |
| data_period_to | timestamp |
| data_service_type | string |
| mandate_expiration_date | timestamp |
| renewal_status | string |

### fluvius.energy

Read energy measurement data from the Fluvius API.

**Required Options:**

| Option | Description |
|--------|-------------|
| `ean` | GSRN EAN-code |
| `period_type` | "readTime" or "insertTime" |

**Optional Options:**

| Option | Description |
|--------|-------------|
| `reference_number` | Custom reference number |
| `granularity` | e.g., "daily", "hourly_quarterhourly" |
| `complex_energy_types` | e.g., "active,reactive" |
| `from_date` | ISO format date (e.g., "2024-01-01") |
| `to_date` | ISO format date (e.g., "2024-01-31") |
| `environment` | "sandbox" (default) or "production" |
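
The required options and the date range can be checked locally before any request is made. This is a minimal sketch: `energy_options` is a hypothetical helper, and the check that `from_date` must not be later than `to_date` is an assumption about what a sensible request looks like, not documented API behavior.

```python
from datetime import date

# Allowed values copied from the required options table above.
VALID_PERIOD_TYPES = {"readTime", "insertTime"}


def energy_options(ean, period_type, from_date=None, to_date=None, **extra):
    """Hypothetical helper: build an options dict for fluvius.energy."""
    if not ean:
        raise ValueError("ean is required")
    if period_type not in VALID_PERIOD_TYPES:
        raise ValueError(f"unknown period_type: {period_type!r}")

    options = {"ean": ean, "period_type": period_type}
    parsed = {}
    for key, value in (("from_date", from_date), ("to_date", to_date)):
        if value is not None:
            # date.fromisoformat raises ValueError on malformed input.
            parsed[key] = date.fromisoformat(value)
            options[key] = value
    if len(parsed) == 2 and parsed["from_date"] > parsed["to_date"]:
        raise ValueError("from_date must not be later than to_date")

    # Remaining options (granularity, environment, ...) pass through as-is.
    options.update(extra)
    return options


# Usage (requires a SparkSession and registered data sources):
# df = spark.read.format("fluvius.energy") \
#     .options(**energy_options("541234567890123456", "readTime",
#                               from_date="2024-01-01", to_date="2024-01-31")) \
#     .load()
```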

**Schema:**

| Column | Type | Description |
|--------|------|-------------|
| ean | string | EAN code of the installation |
| energy_type | string | "E" or "G" |
| metering_type | string | Type of metering installation |
| measurement_start | timestamp | Start of measurement period |
| measurement_end | timestamp | End of measurement period |
| granularity | string | daily, hourly, or quarter_hourly |
| meter_seq_number | string | Physical meter sequence (if applicable) |
| meter_id | string | Physical meter ID (if applicable) |
| subheadpoint_ean | string | Subheadpoint EAN (for submetering) |
| subheadpoint_type | string | auxiliary, offtake, or production |
| subheadpoint_seq_number | string | Subheadpoint sequence number |
| offtake_total_value | double | Offtake measurement value |
| offtake_total_unit | string | Measurement unit (e.g., kWh) |
| offtake_total_validation_state | string | READ, EST, VAL, or NVAL |
| offtake_total_gas_conversion_factor | string | P, D, or C (gas only) |
| offtake_day_value | double | Day tariff offtake |
| offtake_day_unit | string | Unit |
| offtake_day_validation_state | string | Validation state |
| offtake_night_value | double | Night tariff offtake |
| offtake_night_unit | string | Unit |
| offtake_night_validation_state | string | Validation state |
| injection_total_value | double | Injection measurement |
| injection_total_unit | string | Unit |
| injection_total_validation_state | string | Validation state |
| injection_day_value | double | Day tariff injection |
| injection_day_unit | string | Unit |
| injection_day_validation_state | string | Validation state |
| injection_night_value | double | Night tariff injection |
| injection_night_unit | string | Unit |
| injection_night_validation_state | string | Validation state |
| production_total_value | double | Production measurement |
| production_total_unit | string | Unit |
| production_total_validation_state | string | Validation state |
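
The measurement columns follow a `<direction>_<tariff>_<field>` naming pattern, which makes it straightforward to select a group of them programmatically. The sketch below derives the names from the table above. `measurement_columns` is a hypothetical helper, not part of the package.

```python
# Direction/tariff combinations that appear in the schema table above.
TARIFFS_BY_DIRECTION = {
    "offtake": ("total", "day", "night"),
    "injection": ("total", "day", "night"),
    "production": ("total",),
}
FIELDS = ("value", "unit", "validation_state")


def measurement_columns(direction, tariff="total"):
    """Hypothetical helper: list the column names for one measurement group."""
    if tariff not in TARIFFS_BY_DIRECTION.get(direction, ()):
        raise ValueError(f"no columns for {direction!r} / {tariff!r}")
    prefix = f"{direction}_{tariff}"
    columns = [f"{prefix}_{field}" for field in FIELDS]
    if prefix == "offtake_total":
        # Only the offtake total group has a gas conversion factor column.
        columns.append("offtake_total_gas_conversion_factor")
    return columns


# Usage with a DataFrame loaded from fluvius.energy:
# energy_df.select("ean", "measurement_start", *measurement_columns("offtake"))
```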

## Requirements

- Python 3.13+
- PySpark 4.0+
- pyarrow 15.0+
- fluvius-energy-api 0.1.1+

## License

AGPL-3.0-or-later
