GPU Neural Networks via Vulkan

Datasets and DataLoaders

Wrap your data, split train/val, apply transforms, and iterate batches efficiently.

TensorDataset

TensorDataset is the simplest way to wrap NumPy arrays into a dataset. Every array must share the same first dimension (the number of samples):

python
from grilly.utils.data import TensorDataset, DataLoader
import numpy as np

X = np.random.randn(1000, 64).astype(np.float32)
y = np.random.randint(0, 10, 1000).astype(np.int64)

dataset = TensorDataset(X, y)
print(f"Dataset size: {len(dataset)}")
print(f"First sample: X={dataset[0][0].shape}, y={dataset[0][1]}")
Output
Dataset size: 1000
First sample: X=(64,), y=7
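Conceptually, a TensorDataset just stores the arrays and indexes them in parallel. A minimal pure-NumPy sketch of the idea (not grilly's actual implementation):

```python
import numpy as np

class MiniTensorDataset:
    """Sketch of a TensorDataset: index several arrays in parallel."""
    def __init__(self, *arrays):
        # All arrays must agree on the number of samples (first dimension)
        assert all(len(a) == len(arrays[0]) for a in arrays)
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays[0])

    def __getitem__(self, idx):
        # Return the idx-th row of every array as a tuple
        return tuple(a[idx] for a in self.arrays)

X = np.arange(12, dtype=np.float32).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
ds = MiniTensorDataset(X, y)
print(len(ds))   # 6
print(ds[2])     # (array([4., 5.], dtype=float32), 0)
```

This is why indexing returns a tuple: one element per wrapped array, in the order they were passed in.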

Custom Dataset

For more control, subclass Dataset and implement __len__ and __getitem__:

python
from grilly.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = np.array([
            [1.0, 2.0],
            [3.0, 4.0],
            [5.0, 6.0],
        ], dtype=np.float32)
        self.labels = np.array([0, 1, 0])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = MyDataset()
print(f"Sample 1: {dataset[1]}")
Output
Sample 1: (array([3., 4.], dtype=float32), 1)

DataLoader: Batching and Shuffling

DataLoader iterates through a dataset in batches, optionally shuffling each epoch:

python
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch_data, batch_labels in loader:
    print("Data:", batch_data)
    print("Labels:", batch_labels)
    print()
Output
Data: [[5. 6.]
 [1. 2.]]
Labels: [0 0]

Data: [[3. 4.]]
Labels: [1]
vs PyTorch: The API is identical: DataLoader(dataset, batch_size, shuffle). The num_workers parameter is accepted for compatibility but currently ignored (all loading is single-process).
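The 3-sample example above shows a final partial batch of size 1, so the loader evidently keeps incomplete batches. Assuming that behavior, the batch count per epoch follows the usual ceiling rule:

```python
import math

def num_batches(num_samples, batch_size):
    # Ceiling division: a partial final batch still counts as one batch
    return math.ceil(num_samples / batch_size)

print(num_batches(3, 2))      # 2 -> one batch of 2, one of 1, as above
print(num_batches(1000, 32))  # 32 -> 31 full batches plus one of 8
```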

Train/Validation Split

Use random_split to divide a dataset into train and validation subsets:

python
from grilly.utils.data import random_split

# Split 1000 samples into 800 train + 200 val
full_dataset = TensorDataset(X, y)
train_set, val_set = random_split(full_dataset, [800, 200])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader   = DataLoader(val_set,   batch_size=64, shuffle=False)

print(f"Train: {len(train_set)} samples, {len(train_loader)} batches")
print(f"Val:   {len(val_set)} samples, {len(val_loader)} batches")
Output
Train: 800 samples, 25 batches
Val:   200 samples, 4 batches
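A random split of this kind typically permutes the indices once and hands each subset a contiguous slice of the permutation, so every sample lands in exactly one subset. A pure-NumPy sketch of that idea (not grilly's exact implementation):

```python
import numpy as np

def sketch_random_split(n, lengths, seed=0):
    """Return one index array per split; lengths must sum to n."""
    assert sum(lengths) == n
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # shuffle all indices once
    splits, start = [], 0
    for length in lengths:
        splits.append(perm[start:start + length])
        start += length
    return splits

train_idx, val_idx = sketch_random_split(1000, [800, 200])
print(len(train_idx), len(val_idx))        # 800 200
# Disjoint and exhaustive: together they cover all 1000 indices
print(len(set(train_idx) | set(val_idx)))  # 1000
```

Each subset then serves items by looking up its indices in the parent dataset, which is why random_split is cheap: no data is copied.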

Transforms

Chain preprocessing steps with Compose. Transforms are applied per-sample when using ArrayDataset:

python
from grilly.utils.data import (
    ArrayDataset, Compose, ToFloat32,
    Normalize, Flatten, RandomNoise, RandomFlip,
)

# Define a transform pipeline
transform = Compose([
    ToFloat32(scale=1.0/255.0),       # uint8 -> float32, normalize to [0,1]
    Normalize(mean=0.5, std=0.5),      # center to [-1, 1]
    RandomNoise(std=0.01),              # data augmentation
    Flatten(),                           # flatten to 1D
])

# Wrap data with transforms
images = np.random.randint(0, 256, (500, 28, 28)).astype(np.uint8)
labels = np.random.randint(0, 10, 500)

dataset = ArrayDataset(
    data=images,
    labels=labels,
    transform=transform,
)

sample, label = dataset[0]
print(f"Transformed sample shape: {sample.shape}")
print(f"Value range: [{sample.min():.2f}, {sample.max():.2f}]")
Output
Transformed sample shape: (784,)
Value range: [-1.02, 1.01]
Tip: Use RandomFlip(p=0.5) for image augmentation and OneHot(num_classes) for label encoding. The Lambda(fn) transform lets you apply any custom function.
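Compose itself is just function composition applied left to right. A minimal sketch of the mechanism, with the pipeline stages written as plain lambdas standing in for ToFloat32, Normalize, and Flatten:

```python
import numpy as np

def compose(transforms):
    """Apply each transform in order, like Compose([...])."""
    def apply(x):
        for t in transforms:
            x = t(x)
        return x
    return apply

# Lambda-style stand-ins for the real transform objects
pipeline = compose([
    lambda x: x.astype(np.float32) / 255.0,  # scale uint8 to [0, 1]
    lambda x: (x - 0.5) / 0.5,               # center to [-1, 1]
    lambda x: x.reshape(-1),                 # flatten to 1D
])

img = np.full((28, 28), 255, dtype=np.uint8)
out = pipeline(img)
print(out.shape, out.min(), out.max())  # (784,) 1.0 1.0
```

Because composition is just a loop over callables, a Lambda(fn) transform slots into the same pipeline as the built-in transform objects.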

Complete Data Pipeline

Putting it all together — dataset, transforms, split, and batched training:

python
import grilly.nn as nn
import grilly.optim as optim
from grilly.utils.data import TensorDataset, DataLoader, random_split
import numpy as np

# Create dataset
X = np.random.randn(1000, 64).astype(np.float32)
y = np.random.randint(0, 10, 1000).astype(np.int64)
dataset = TensorDataset(X, y)

# Split and create loaders
train_set, val_set = random_split(dataset, [800, 200])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Model + optimizer
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop with batches
for epoch in range(5):
    model.train()
    total_loss = 0.0
    for X_batch, y_batch in train_loader:
        output = model(X_batch)
        loss = loss_fn(output, y_batch)
        grad = loss_fn.backward(np.ones_like(loss), output, y_batch)
        model.zero_grad()
        model.backward(grad)
        optimizer.step()
        total_loss += float(np.mean(loss))
    print(f"Epoch {epoch+1}: avg_loss={total_loss/len(train_loader):.4f}")
Output
Epoch 1: avg_loss=2.3142
Epoch 2: avg_loss=2.2801
Epoch 3: avg_loss=2.2453
Epoch 4: avg_loss=2.2089
Epoch 5: avg_loss=2.1704
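The val_loader built by the split is not used in the loop above; evaluation amounts to running the model without training steps and comparing argmax predictions to labels. Assuming model(X_batch) returns a NumPy array of logits, the per-batch accuracy computation is plain NumPy:

```python
import numpy as np

def batch_accuracy(logits, labels):
    # Predicted class is the index of the largest logit per row
    preds = np.argmax(logits, axis=1)
    return float(np.mean(preds == labels))

logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 3.0]])
labels = np.array([1, 0, 1])
print(batch_accuracy(logits, labels))  # 2 of 3 correct
```

In a full validation pass you would average batch_accuracy over val_loader, after switching off training mode (if grilly mirrors PyTorch's model.eval(), as the model.train() call above suggests).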