Tensors and Operations
Grilly uses plain numpy arrays as its core data format. No custom tensor class needed for most work.
The Data Format
Unlike PyTorch's torch.Tensor, grilly operates directly on numpy float32 arrays. Every layer accepts and returns numpy arrays. The framework also provides PyTorch-style factory functions for convenience:
import grilly
import numpy as np
# 1D tensor (vector)
x = grilly.tensor([1.0, 2.0, 3.0])
print("1D:", x, x.dtype)
# 2D tensor (matrix)
m = grilly.tensor([[1, 2], [3, 4]])
print("2D:", m)
# Factory functions
z = grilly.zeros(2, 3) # shape (2, 3), all zeros
o = grilly.ones(2, 3) # shape (2, 3), all ones
r = grilly.randn(2, 3) # shape (2, 3), random normal
print("Zeros:", z)
print("Ones:", o)
print("Random:", r)
1D: [1. 2. 3.] float32
2D: [[1. 2.]
 [3. 4.]]
Zeros: [[0. 0. 0.]
 [0. 0. 0.]]
Ones: [[1. 1. 1.]
 [1. 1. 1.]]
Random: [[-0.4532  1.2041  0.0893]
 [ 0.7612 -0.3287  1.5140]]
Every factory function returns a plain np.ndarray with dtype=float32. There is no .cuda() or .to(device) call needed for basic operations; data is always numpy until you opt into GPU mode with VulkanTensor.
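One practical consequence of the float32 convention, shown here in plain numpy (no grilly-specific API involved): numpy defaults to float64, so arrays you build by hand should be cast to float32 before handing them to a layer, and mixing dtypes silently upcasts.

```python
import numpy as np

# numpy's default floating dtype is float64, not the float32 grilly uses.
a = np.array([1.0, 2.0, 3.0])
print(a.dtype)  # float64

# Cast explicitly when constructing inputs by hand.
b = np.array([1.0, 2.0, 3.0], dtype=np.float32)
c = a.astype(np.float32)
print(b.dtype, c.dtype)  # float32 float32

# Mixing float32 and float64 silently upcasts back to float64.
print((b + a).dtype)  # float64
```

Casting once at the boundary (data loading) keeps everything downstream in float32.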
Indexing and Slicing
Since grilly tensors are numpy arrays, you get full numpy indexing, slicing, and fancy indexing for free:
import numpy as np
tensor = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)
# Single element
element = tensor[1, 0]
print(f"Element [1,0]: {element}")
# Slicing: first two rows
sliced = tensor[:2, :]
print(f"First two rows:\n{sliced}")
# Boolean mask
mask = tensor > 3
print(f"Elements > 3: {tensor[mask]}")
# Fancy indexing
rows = np.array([0, 2])
print(f"Rows 0 and 2:\n{tensor[rows]}")
Element [1,0]: 3.0
First two rows:
[[1. 2.]
 [3. 4.]]
Elements > 3: [4. 5. 6.]
Rows 0 and 2:
[[1. 2.]
 [5. 6.]]
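One subtlety worth knowing, and it is standard numpy behavior rather than anything grilly-specific: basic slices are views into the original array, while boolean and fancy indexing return copies.

```python
import numpy as np

tensor = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)

# Basic slicing returns a view: writing through it mutates the original.
view = tensor[:2, :]
view[0, 0] = 99.0
print(tensor[0, 0])  # 99.0 -- the original changed

# Boolean (and fancy) indexing returns a copy: the original is untouched.
selected = tensor[tensor > 3]
selected[:] = 0.0
print(tensor[1, 1])  # 4.0 -- unchanged
```

This matters when you slice an activation and modify it in place: with a basic slice you are editing the layer's output buffer.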
Reshaping
Reshape arrays without copying data where possible (numpy returns a view when the memory layout allows; flatten always copies). This is essential for adapting tensor dimensions between layers:
tensor = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)
# Reshape (3, 2) -> (2, 3)
reshaped = tensor.reshape(2, 3)
print(f"Reshaped (2x3):\n{reshaped}")
# Flatten to 1D
flat = tensor.flatten()
print(f"Flat: {flat}")
# Transpose
t = tensor.T
print(f"Transposed (2x3):\n{t}")
# Add/remove dimensions
expanded = np.expand_dims(tensor, axis=0) # (1, 3, 2)
squeezed = np.squeeze(expanded) # (3, 2)
print(f"Expanded shape: {expanded.shape}")
print(f"Squeezed shape: {squeezed.shape}")
Reshaped (2x3):
[[1. 2. 3.]
 [4. 5. 6.]]
Flat: [1. 2. 3. 4. 5. 6.]
Transposed (2x3):
[[1. 3. 5.]
 [2. 4. 6.]]
Expanded shape: (1, 3, 2)
Squeezed shape: (3, 2)
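A pattern that comes up constantly between layers, again in pure numpy: let reshape infer one dimension with -1, for example flattening per-sample feature maps before a dense layer.

```python
import numpy as np

# A batch of 4 samples, each with 3 channels of (8, 8) feature maps.
batch = np.random.randn(4, 3, 8, 8).astype(np.float32)

# -1 tells reshape to infer that dimension: keep the batch axis,
# flatten everything else (3 * 8 * 8 = 192 features per sample).
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (4, 192)

# The array was contiguous, so this reshape is a view, not a copy.
print(flat.base is batch)  # True
```

After a transpose the data is no longer contiguous and the same reshape would have to copy, so transpose-then-reshape is where hidden copies sneak in.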
Broadcasting and Matrix Multiplication
Broadcasting lets you combine arrays of different shapes. Matrix multiplication is at the heart of every neural network layer:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
b = np.array([[10, 20, 30]], dtype=np.float32)
# Broadcasting: (2, 3) + (1, 3) -> (2, 3)
result = a + b
print(f"Broadcast add:\n{result}")
# Matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)
matmul = a @ a.T
print(f"A @ A^T:\n{matmul}")
# Element-wise operations
print(f"a * 2:\n{a * 2}")
print(f"np.sqrt(a):\n{np.sqrt(a)}")
Broadcast add:
[[11. 22. 33.]
 [14. 25. 36.]]
A @ A^T:
[[14. 32.]
 [32. 77.]]
a * 2:
[[ 2.  4.  6.]
 [ 8. 10. 12.]]
np.sqrt(a):
[[1.    1.414 1.732]
 [2.    2.236 2.449]]
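These two operations together are exactly a dense layer's forward pass: a matmul against the weight matrix plus a bias broadcast across the batch. A minimal sketch in plain numpy (the function and parameter names here are illustrative, not grilly's API):

```python
import numpy as np

def linear_forward(x, W, b):
    """y = x @ W + b, with b broadcast from (out,) across the batch axis."""
    return x @ W + b

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3)).astype(np.float32)  # batch of 4, 3 features
W = rng.standard_normal((3, 2)).astype(np.float32)  # 3 inputs -> 2 outputs
b = np.zeros(2, dtype=np.float32)                   # (2,) broadcasts to (4, 2)

y = linear_forward(x, W, b)
print(y.shape)  # (4, 2)
```

The bias is stored once as shape (2,) and broadcasting stretches it over all 4 rows, so no tiled copy is ever materialized.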
VulkanTensor: GPU-Resident Data
VulkanTensor wraps a numpy array with an optional GPU buffer. When modules run in gpu_mode, data stays in VRAM between layers, avoiding PCIe round-trips:
from grilly import VulkanTensor
import numpy as np
# Create from numpy
data = np.random.randn(32, 128).astype(np.float32)
vt = VulkanTensor(data)
# Factory methods
vt_zeros = VulkanTensor.zeros((32, 128))
vt_ones = VulkanTensor.ones((16, 64))
vt_empty = VulkanTensor.empty((8, 256))
# Properties
print(vt.shape) # (32, 128)
print(vt.dtype) # float32
print(vt.on_gpu) # True or False
# Transfer between CPU and GPU
vt.upload() # force upload to GPU
arr = vt.numpy() # download to CPU numpy
arr = vt.cpu() # alias for numpy()
VulkanTensor requires Vulkan drivers to be installed. Check grilly.VULKAN_AVAILABLE at runtime. Without Vulkan, grilly still works — it falls back to CPU numpy automatically.
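A defensive pattern that follows from this runtime check (make_tensor is a hypothetical helper for illustration, not part of grilly): probe for Vulkan when the program starts and fall back to plain numpy when it is absent.

```python
import numpy as np

def make_tensor(data):
    # Hypothetical helper: prefer a GPU-resident VulkanTensor when the
    # grilly package is importable and reports Vulkan support, otherwise
    # return a plain float32 numpy array (grilly's CPU path accepts these).
    try:
        import grilly
        if getattr(grilly, "VULKAN_AVAILABLE", False):
            return grilly.VulkanTensor(np.asarray(data, dtype=np.float32))
    except ImportError:
        pass  # grilly not installed in this environment
    return np.asarray(data, dtype=np.float32)

x = make_tensor(np.random.randn(4, 8))
print(type(x).__name__, tuple(x.shape))
```

Either branch yields an object with the same shape and float32 data, so downstream code can stay agnostic about where the buffer lives.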
Autograd Variable
For automatic differentiation, grilly provides Variable — a wrapper that tracks operations and computes gradients via backpropagation:
from grilly.nn import Variable, no_grad
import numpy as np
# Create differentiable variables
x = Variable(np.array([1.0, 2.0, 3.0]), requires_grad=True)
w = Variable(np.array([0.5, -0.3, 0.8]), requires_grad=True)
# Forward: build computation graph
y = x * w
loss = y.sum()
# Backward: compute gradients
loss.backward()
print("x.grad:", x.grad) # d(loss)/dx = w
print("w.grad:", w.grad) # d(loss)/dw = x
# Disable gradient tracking
with no_grad():
    val = x * 2  # no graph, no gradient
x.grad: [ 0.5 -0.3  0.8]
w.grad: [1. 2. 3.]
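The gradients above can be verified by hand in plain numpy: for loss = sum(x * w), the analytic gradients are exactly w and x, and a finite-difference check agrees.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.3, 0.8])

def loss(x, w):
    return np.sum(x * w)

# Analytic gradients of sum(x * w).
grad_x = w.copy()  # d(loss)/dx = w
grad_w = x.copy()  # d(loss)/dw = x

# Finite-difference check on the first component of x.
eps = 1e-6
xp = x.copy()
xp[0] += eps
numeric = (loss(xp, w) - loss(x, w)) / eps
print(grad_x, grad_w)
print(abs(numeric - grad_x[0]) < 1e-4)  # True
```

This is the same result Variable.backward() produced above, derived without any graph machinery.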
As an alternative to Variable, grilly's nn.Module layers expose backward methods with explicit gradient passing, which is faster for standard architectures.