Metadata-Version: 2.2
Name: rindle
Version: 1.0.1
Summary: Dataset preparation library with Python bindings for sliding window tensors
Keywords: time series,windowing,sliding window,dataset builder,feature engineering,ML,forecasting,finance,quant,trading,stocks,normalization,scaling,numpy,pandas
Author-Email: Eric Gilerson <ericgilerson@gmail.com>
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C++
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Project-URL: Homepage, https://github.com/EricGilerson/rindle
Project-URL: Repository, https://github.com/EricGilerson/rindle
Project-URL: Issues, https://github.com/EricGilerson/rindle/issues
Requires-Python: >=3.9
Requires-Dist: numpy>=1.23
Provides-Extra: dev
Requires-Dist: scikit-build-core>=0.5; extra == "dev"
Requires-Dist: build; extra == "dev"
Description-Content-Type: text/markdown

# Rindle for Python

Rindle turns collections of per-ticker CSV files into contiguous sliding-window
tensors that are ready for deep learning workflows. The Python extension wraps
the C++20 data preparation engine behind a small, NumPy-friendly API so you can
configure builds, materialize datasets, and recover fitted scalers directly
from notebooks or training scripts.

## Highlights

- **Deterministic dataset builds** – declare the window geometry, scaler, and
  input schema with `rindle.create_config` and let the engine emit consistent
  results across runs.
- **Multi-threaded** – both `build_dataset` and `get_dataset` parallelize work
  across tickers and windows using a C++ thread pool. An optional `thread_count`
  parameter (default `0` = auto) gives explicit control. The Python GIL is
  released during these calls so other threads stay responsive.
- **Manifest-driven reloads** – rehydrate tensors on demand with
  `rindle.get_dataset` using the in-memory manifest returned by a build or a
  saved `manifest.json` file.
- **NumPy integration** – feature (`Dataset.X`) and target (`Dataset.Y`) tensors
  are exposed as NumPy arrays with shape `(windows, sequence_length, features)`
  and `float32` precision for direct use with frameworks such as PyTorch or
  TensorFlow.
- **Scaler introspection** – fetch the fitted scaler for any ticker/feature pair
  to invert predictions or understand the normalization that was applied.

## Installation

Install a pre-built wheel where one is available for your platform, or compile
the package locally with a C++20 toolchain.

```bash
pip install rindle
```

Building from source requires a compiler with C++20 support, CMake 3.18+, and
Python 3.9 or newer. When working from a clone of the repository:

```bash
python -m pip install --upgrade pip
python -m pip install build
python -m build
python -m pip install dist/rindle-*.whl
```

## Quickstart

```python
from pathlib import Path
import rindle

config = rindle.create_config(
    input_dir=Path("data/raw_prices"),
    output_dir=Path("data/processed"),
    feature_columns=["Open", "High", "Low", "Close", "Volume"],
    seq_length=64,
    future_horizon=8,
    target_column="Close",
    time_mode=rindle.TimeMode.UTC_NS,
    row_major=False,
    scaler_kind=rindle.ScalerKind.Standard,
)

manifest = rindle.build_dataset(config)  # parallelized across tickers

# Load full dataset (default)
dataset = rindle.get_dataset(manifest)  # parallelized tensor fill

# Load a random 10% sample (maintains ticker distribution)
dataset_small = rindle.get_dataset(manifest, percentage=0.1)

# Explicit thread count (0 = auto-detect)
dataset = rindle.get_dataset(manifest, thread_count=4)

X = dataset.X  # NumPy array: (windows, seq_length, n_features), dtype=float32
Y = dataset.Y  # NumPy array aligned with X when targets are enabled
meta = dataset.meta  # List of WindowMeta objects with ticker provenance
print("total windows:", dataset.n_windows())
```

The manifest stores the configuration, aggregate statistics, and ticker-level
metadata. A copy is written to `<output_dir>/manifest.json` during the build so
you can reload tensors later without repeating the pipeline:

```python
from pathlib import Path
import rindle

# `config` is the DatasetConfig created in the Quickstart.
manifest_path = Path(config.output_dir) / "manifest.json"
reloaded = rindle.get_dataset(manifest_path)
```

## Inspecting manifests and scalers

Each `ManifestContent` instance exposes the fields captured during the build,
including `feature_columns`, `total_windows`, and `ticker_stats`. The helper
method `find_stats("AAPL")` returns the `TickerStats` record for a ticker. If
you mutate `ticker_stats` manually, call `build_ticker_index()` afterwards.

To invert normalized values or apply identical scaling elsewhere:

```python
scaler = rindle.get_feature_scaler(manifest, ticker="AAPL", feature="Close")
original_value = rindle.inverse_transform_value(scaler, value=0.42)
```

The returned `FittedScaler` exposes `transform` and `inverse_transform` methods
as well as a `params` property that includes summary statistics (mean, standard
deviation, quartiles, and min/max bounds).
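
To make the transform/inverse relationship concrete, here is a minimal sketch of standard (z-score) scaling in plain Python. It is not rindle's implementation: `StandardScalerSketch` is a hypothetical stand-in, and the use of the population standard deviation and the constant-series fallback are assumptions of this example.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass(frozen=True)
class StandardScalerSketch:
    """Hypothetical stand-in for a fitted standard scaler (not rindle's class)."""

    mean: float
    std: float

    @classmethod
    def fit(cls, values):
        # Population standard deviation (assumed); fall back to 1.0 for a
        # constant series so that transform never divides by zero.
        std = pstdev(values)
        return cls(mean=fmean(values), std=std if std > 0 else 1.0)

    def transform(self, x):
        # z = (x - mean) / std
        return (x - self.mean) / self.std

    def inverse_transform(self, z):
        # x = z * std + mean, the exact inverse of transform
        return z * self.std + self.mean


closes = [101.0, 103.0, 102.0, 106.0, 108.0]
scaler_sketch = StandardScalerSketch.fit(closes)
scaled = [scaler_sketch.transform(c) for c in closes]
restored = [scaler_sketch.inverse_transform(z) for z in scaled]
```

Applying `inverse_transform` to a model's scaled prediction in the same way recovers a value in the original price units, provided the scaler was fitted on the same ticker/feature pair.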

## Data layout

- `Dataset.X` and `Dataset.Y` are three-dimensional NumPy arrays backed by the
  underlying C++ tensors (`float32`). When `row_major=False` (the default), the
  layout is `[window][time][feature]` with contiguous storage, which matches the
  batch-first `(batch, time, features)` convention used by sequence models.
- `Dataset.meta` is a list of `WindowMeta` objects describing where each window
  originated. Fields include `ticker`, `start_row`, `end_row`, and optional
  `target_start` / `target_end` indices.
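
As an illustration of how the `WindowMeta` row indices and the `[window][time][feature]` layout could fit together, the sketch below enumerates windows for one ticker and computes flat buffer offsets. It is not rindle's implementation: `WindowSpan`, `enumerate_windows`, and `flat_offset` are hypothetical names, and the stride of 1, the half-open row ranges, and the feature-fastest ordering are assumptions of this example.

```python
from typing import List, NamedTuple


class WindowSpan(NamedTuple):
    """Hypothetical record mirroring the WindowMeta field names."""

    ticker: str
    start_row: int     # first input row (inclusive)
    end_row: int       # one past the last input row (exclusive, assumed)
    target_start: int  # first target row (inclusive)
    target_end: int    # one past the last target row (exclusive, assumed)


def enumerate_windows(ticker: str, n_rows: int, seq_length: int,
                      future_horizon: int) -> List[WindowSpan]:
    """List stride-1 windows whose inputs and targets fit inside n_rows rows."""
    n_windows = max(0, n_rows - seq_length - future_horizon + 1)
    return [
        WindowSpan(ticker, s, s + seq_length,
                   s + seq_length, s + seq_length + future_horizon)
        for s in range(n_windows)
    ]


def flat_offset(window: int, t: int, feature: int,
                seq_length: int, n_features: int) -> int:
    """Offset of element [window][t][feature] in one contiguous buffer."""
    return (window * seq_length + t) * n_features + feature


# 100 rows with the Quickstart geometry (seq_length=64, future_horizon=8).
spans = enumerate_windows("AAPL", n_rows=100, seq_length=64, future_horizon=8)
```

Under these assumptions a ticker with fewer than `seq_length + future_horizon` rows yields no windows, and consecutive time steps of one window sit `n_features` elements apart in memory.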

## API reference snapshot

| Function | Description |
| --- | --- |
| `rindle.create_config(...)` | Validate paths, choose feature columns, configure window geometry and scaling. Returns a `DatasetConfig`. |
| `rindle.build_dataset(config, thread_count=0)` | Run discovery → scaling → windowing in parallel and return a `ManifestContent`. |
| `rindle.get_dataset(manifest_or_path, percentage=1.0, thread_count=0)` | Load feature/target tensors in parallel. Optional `percentage` (0.0 < p <= 1.0) loads a random subset of windows per ticker. |
| `rindle.get_feature_scaler(manifest_or_path, ticker, feature)` | Retrieve the fitted scaler for a ticker/feature pair to apply or invert scaling. |
| `rindle.inverse_transform_value(scaler, value)` | Convenience helper to undo scaling with a `FittedScaler`. |

Additional classes such as `DatasetConfig`, `ManifestContent`, `Dataset`, and
`TickerStats` expose their fields as Python attributes for straightforward
inspection or serialization.
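
The `percentage` argument of `get_dataset` is described as loading a random subset of windows per ticker. One way to picture that behaviour is per-ticker (stratified) sampling, sketched below in plain Python. This is an illustration only: `sample_per_ticker`, its `seed` parameter, the rounding rule, and the choice to keep at least one window per ticker are assumptions, not rindle's documented behaviour.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_per_ticker(window_tickers: Sequence[str], percentage: float,
                      seed: int = 0) -> List[int]:
    """Return sorted window indices, sampling the same fraction per ticker."""
    if not 0.0 < percentage <= 1.0:
        raise ValueError("percentage must satisfy 0.0 < p <= 1.0")

    # Group window indices by the ticker each window came from.
    by_ticker: Dict[str, List[int]] = defaultdict(list)
    for index, ticker in enumerate(window_tickers):
        by_ticker[ticker].append(index)

    rng = random.Random(seed)
    chosen: List[int] = []
    for indices in by_ticker.values():
        # Keep at least one window per ticker (an assumption of this sketch).
        k = max(1, round(len(indices) * percentage))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# 200 windows from one ticker followed by 100 from another.
tickers = ["AAPL"] * 200 + ["MSFT"] * 100
subset = sample_per_ticker(tickers, percentage=0.1)
```

Sampling within each ticker rather than across the pooled windows keeps the ticker proportions of the subset close to those of the full dataset.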

## Project resources

- Source repository: <https://github.com/EricGilerson/rindle>
- Issue tracker: <https://github.com/EricGilerson/rindle/issues>

Although the core engine is implemented in C++, the Python package provides a
self-contained workflow for assembling time-series datasets without leaving the
Python ecosystem.
