Metadata-Version: 2.4
Name: timedb
Version: 0.1.1
Summary: timedb — opinionated schema & API for time series
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: click>=8.0
Requires-Dist: psycopg[binary]>=3.1
Requires-Dist: python-dotenv>=0.21

# timedb
## TL;DR
**timedb** is an opinionated schema and API built on top of PostgreSQL, designed to handle overlapping time series revisions and auditable human-in-the-loop updates.

Most time series systems assume a single immutable value per timestamp. **timedb** is built for domains where data is revised, forecasted, reviewed, and corrected over time.

**timedb** lets you: 

- ⏱️ Retain "time-of-knowledge" history through a three-dimensional time series data model;
- ✍️ Make versioned ad-hoc updates to the time series data with comments and tags; and 
- 🔀 Represent both timestamp and time-interval time series simultaneously.

## Why timedb? 
Most time series systems assume:
- one value per timestamp;
- immutable historical data; and
- no distinction between when something was true vs when it was known.

These assumptions are a major drawback in situations such as:
- forecasting, where multiple forecast revisions predict the same timestamp;
- backtesting, where "time-of-knowledge" history is required by algorithms;
- data communication, where an auditable history of updates is required;
- human review and correction, where values are manually adjusted, annotated, or validated over time; and
- late-arriving data and backfills, where new information must be incorporated without rewriting history.

In practice, teams work around these limitations by overwriting data, duplicating tables and columns, or encoding semantics in column names — making systems fragile, opaque, and hard to reason about.

**timedb** addresses this by making revisions, provenance, and temporal semantics explicit in the data model, rather than treating them as edge cases.

## Installation
```bash
pip install timedb
```

## Basic usage
TBD

## Tables
### runs_table

| Field | Type | Purpose |
|---|---|---|
| `run_id` **(primary key)** | attribute | Unique identifier for the run (generated by the API) |
| `workflow_id` | attribute | Identifier for the workflow that produced this run |
| `run_start_time` | time dimension | When the workflow started |
| `run_finish_time` **(optional)** | time dimension | When the workflow finished |
| `run_params` **(optional)** | attribute | Parameters/configuration used for this run (JSON object) |
| `inserted_at` | time dimension | When the row was inserted (default `now()`) |

---

### values_table

| Field | Type | Purpose |
|---|---|---|
| `value_id` **(primary key)** | attribute | Unique identifier for each version of a value |
| `run_id` **(foreign key)** | attribute | References the run that produced this value (`runs_table.run_id`) |
| `valid_time` | time dimension | Timestamp the value is valid for |
| `valid_time_end` **(optional)** | time dimension | Optional interval end time; NULL means point-in-time at `valid_time` |
| `value_key` | attribute | What the value represents (e.g. `mean`, `quantile:0.5`, `scenario:1`) |
| `value` **(optional)** | measure | The numeric value (nullable; NULL can be a valid stored value) |
| `comment` **(optional)** | attribute | Optional human annotation (whitespace-only disallowed) |
| `tags` **(optional)** | attribute | Optional semantic labels / quality flags (empty arrays disallowed; use NULL) |
| `changed_by` **(optional)** | attribute | User or service responsible for the change |
| `change_time` | time dimension | When this version row was created (default `now()`) |
| `is_current` | attribute | Whether this row is the active version for its key (default `true`) |
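
The versioning behaviour implied by `change_time` and `is_current` can be illustrated with a minimal in-memory sketch. This is not the timedb API: `ValueRow` and `apply_update` are hypothetical names, the version key is assumed to be `(run_id, valid_time, value_key)`, and sequential `value_id` generation is a simplification. The idea is that an update never overwrites a row; it retires the current version and appends a new one.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ValueRow:
    """Hypothetical in-memory stand-in for a values_table row (subset of columns)."""
    value_id: int
    run_id: str
    valid_time: datetime
    value_key: str
    value: Optional[float]
    change_time: datetime
    comment: Optional[str] = None
    tags: Optional[Tuple[str, ...]] = None
    changed_by: Optional[str] = None
    is_current: bool = True


def apply_update(
    rows: List[ValueRow],
    run_id: str,
    valid_time: datetime,
    value_key: str,
    new_value: Optional[float],
    changed_by: str,
    change_time: datetime,
    comment: Optional[str] = None,
    tags: Optional[Tuple[str, ...]] = None,
) -> List[ValueRow]:
    """Retire the current version for a key and append a new current version."""
    # Mirror the constraints listed in the table; empty strings are treated
    # like whitespace-only comments here.
    if comment is not None and not comment.strip():
        raise ValueError("comment must not be whitespace-only; use None")
    if tags is not None and len(tags) == 0:
        raise ValueError("tags must not be empty; use None")

    key = (run_id, valid_time, value_key)  # assumed version key
    updated = [
        replace(row, is_current=False)
        if row.is_current and (row.run_id, row.valid_time, row.value_key) == key
        else row
        for row in rows
    ]
    next_id = max((row.value_id for row in rows), default=0) + 1
    updated.append(
        ValueRow(
            value_id=next_id, run_id=run_id, valid_time=valid_time,
            value_key=value_key, value=new_value, change_time=change_time,
            comment=comment, tags=tags, changed_by=changed_by,
        )
    )
    return updated


t = datetime(2025, 1, 1, 12, tzinfo=timezone.utc)
history = [ValueRow(1, "run-1", t, "mean", 100.0, change_time=t)]
history = apply_update(
    history, "run-1", t, "mean", 95.0,
    changed_by="analyst",
    change_time=datetime(2025, 1, 2, 9, tzinfo=timezone.utc),
    comment="Manual correction after review",
    tags=("reviewed",),
)
```

After the update, `history` holds both versions: the original row with `is_current=False` and the corrected row with `is_current=True`, so the audit trail is preserved.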

---

### metadata_table

| Field | Type | Purpose |
|---|---|---|
| `metadata_id` **(primary key)** | attribute | Surrogate primary key for metadata rows |
| `run_id` **(foreign key)** | attribute | References run context (`runs_table.run_id`) |
| `valid_time` | time dimension | Time context for the metadata (joins onto values via `(run_id, valid_time)`) |
| `metadata_key` | attribute | Name of the metadata field (e.g. `contractId`, `deliveryStart`) |
| `value_number` **(optional)** | attribute | Numeric metadata value (exactly one typed value must be set per row) |
| `value_string` **(optional)** | attribute | String metadata value (mutually exclusive with other typed values) |
| `value_bool` **(optional)** | attribute | Boolean metadata value (mutually exclusive with other typed values) |
| `value_time` **(optional)** | attribute | Timestamp metadata value (mutually exclusive with other typed values) |
| `value_json` **(optional)** | attribute | JSON metadata value (mutually exclusive with other typed values) |
| `inserted_at` | time dimension | When the metadata row was inserted (default `now()`) |
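
The "exactly one typed value per row" rule can be checked before a row is written. The sketch below assumes a metadata row is represented as a plain dictionary keyed by column name; `typed_column_set` is a hypothetical helper, not part of timedb, and the database may enforce the same rule with its own constraint.

```python
from typing import Any, Dict

# The five mutually exclusive typed value columns of metadata_table.
TYPED_COLUMNS = (
    "value_number",
    "value_string",
    "value_bool",
    "value_time",
    "value_json",
)


def typed_column_set(row: Dict[str, Any]) -> str:
    """Return the name of the single typed column that is set, or raise."""
    # A column counts as set when it is present and not None, so a stored
    # False in value_bool or 0 in value_number is still a valid value.
    set_columns = [name for name in TYPED_COLUMNS if row.get(name) is not None]
    if len(set_columns) != 1:
        raise ValueError(
            f"expected exactly one typed value, found {len(set_columns)}: {set_columns}"
        )
    return set_columns[0]


contract = {"metadata_key": "contractId", "value_string": "C-42"}
column = typed_column_set(contract)
```

Here `column` is `"value_string"`. A row with no typed value, or with two of them, raises `ValueError`.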

## Data model
### Three-dimensional time series data model
Every time series value is described using three independent timelines:

| Time dimension    | Description                                  |
| ----------------- | -------------------------------------------- |
| `knowledge_time`  | The time when the value was known            |
| `valid_time`      | The time the value is valid for              |
| `change_time`     | The time when the value was changed          |
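
To show why `knowledge_time` and `valid_time` are kept separate, here is a minimal "as-of" lookup over in-memory rows. It is a sketch under stated assumptions: `Forecast` and `as_of` are hypothetical names, and each row is assumed to carry its own `knowledge_time` (how that value is derived from the stored runs is not specified here). For every `valid_time`, the function returns the value from the latest revision that was known at or before a cutoff, which is the view a backtest would need.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple


class Forecast(NamedTuple):
    """Hypothetical row carrying two of the three time dimensions."""
    knowledge_time: datetime
    valid_time: datetime
    value: float


def as_of(rows: List[Forecast], cutoff: datetime) -> Dict[datetime, float]:
    """For each valid_time, return the latest value known at or before cutoff."""
    latest: Dict[datetime, Forecast] = {}
    for row in rows:
        if row.knowledge_time > cutoff:
            continue  # not yet known at the cutoff
        best = latest.get(row.valid_time)
        if best is None or row.knowledge_time > best.knowledge_time:
            latest[row.valid_time] = row
    return {valid_time: row.value for valid_time, row in latest.items()}


def utc(hour: int) -> datetime:
    return datetime(2025, 1, 1, hour, tzinfo=timezone.utc)


target = utc(18)  # the timestamp being forecast
rows = [
    Forecast(knowledge_time=utc(6), valid_time=target, value=100.0),
    Forecast(knowledge_time=utc(12), valid_time=target, value=95.0),
]

morning_view = as_of(rows, utc(9))     # only the 06:00 revision is known
afternoon_view = as_of(rows, utc(13))  # the 12:00 revision supersedes it
```

Both revisions stay stored. Which one is returned depends only on the cutoff, so a backtest run "as of 09:00" never sees the later revision.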


### Additional attributes 
The schema columns provide additional attributes for each value:

| Column name   | Description                                                     |
| ------------- | --------------------------------------------------------------- |
| `value`       | The numeric value being stored (may be NULL)                    |
| `value_key`   | Identifies what the value represents (e.g. mean, quantile, scenario) |
| `tags`        | Semantic labels and quality flags applied to the value           |
| `comment`     | Optional human annotation explaining the value or change         |
| `run_id`      | Reference to the workflow run that produced the value            |
| `run_params`  | Parameters and configuration associated with the producing run  |
| `is_current`  | Indicates whether this row is the active version for its key     |
| `changed_by`  | User or service responsible for the change                       |

## Roadmap
- [ ] Decouple the knowledge time from the run_time
- [ ] RESTful API layer that serves data to users
- [ ] Handle different time zones in the API layer while always storing in UTC in the database
- [ ] Support for PostgreSQL time intervals (tsrange/tstzrange)
- [ ] Built-in data retention, TTL, and archiving
- [ ] Support for subscribing to database updates through the API 
- [ ] Python SDK that allows time series data manipulations, reads and writes
- [ ] Unit handling (e.g. MW, kW)
