Metadata-Version: 2.4
Name: oven-tensor
Version: 0.1.1
Summary: PyTorch-style tensor operations with CUDA kernels compiled by oven-compiler
Author-email: Sinjin Jeong <sjjeong94@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/yourusername/oven-tensor
Project-URL: Bug Reports, https://github.com/yourusername/oven-tensor/issues
Project-URL: Source, https://github.com/yourusername/oven-tensor
Keywords: pytorch,cuda,gpu,tensor,machine-learning,deep-learning,oven
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0,>=1.19.0
Requires-Dist: pycuda>=2021.1
Requires-Dist: oven-compiler>=0.1.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Dynamic: license-file

# Oven-Tensor

A PyTorch-style tensor library with GPU acceleration using CUDA kernels compiled by oven-compiler.

## ✨ Features

- 🚀 **PyTorch-like Interface**: Familiar tensor operations with `.cpu()`, `.gpu()`, `@` operator
- ⚡ **Automatic Kernel Compilation**: Python kernels compiled to PTX using oven-compiler
- 💾 **Smart Caching**: Compiled kernels cached for fast subsequent loads
- 🔄 **CPU/GPU Hybrid**: Seamless switching between NumPy (CPU) and CUDA (GPU)
- 🔧 **Custom Kernels**: Easy to add and execute custom CUDA kernels
- 🎯 **Dynamic Registration**: Register kernels at runtime from code or files

## 📦 Installation

```bash
pip install oven-tensor
```

**Requirements:**
- Python 3.7+
- CUDA-capable GPU
- [oven-compiler](https://github.com/oven-lang/oven) available on your `PATH`
- PyCUDA

## 🚀 Quick Start

```python
import oven_tensor as ot

# Create tensors
x = ot.tensor([1.0, 2.0, 3.0, 4.0])
y = ot.tensor([2.0, 3.0, 4.0, 5.0])

# CPU operations (NumPy backend)
z_cpu = x + y
print(z_cpu)  # Tensor([3. 5. 7. 9.], device=cpu)

# GPU operations (CUDA kernels)
x_gpu = x.gpu()
y_gpu = y.gpu()
z_gpu = x_gpu + y_gpu
print(z_gpu.cpu())  # Tensor([3. 5. 7. 9.], device=cpu)

# Matrix multiplication
A = ot.tensor([[1.0, 2.0], [3.0, 4.0]])
B = ot.tensor([[5.0, 6.0], [7.0, 8.0]])
C = A @ B
print(C)  # Tensor([[19. 22.], [43. 50.]], device=cpu)
```
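
The expected values in the comments above can be checked without a GPU. The sketch below is plain Python and makes no oven-tensor calls; `matmul_reference` is a hypothetical helper name used only for this illustration.

```python
def matmul_reference(a, b):
    """Naive row-by-column matrix product on nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# Same inputs as the Quick Start example
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 4.0, 5.0]
elementwise_sum = [xi + yi for xi, yi in zip(x, y)]
print(elementwise_sum)  # [3.0, 5.0, 7.0, 9.0]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
product = matmul_reference(A, B)
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
```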

## 📚 API Reference

### Tensor Creation
```python
ot.tensor([1, 2, 3])      # From data
ot.zeros((2, 3))          # Zero tensor
ot.ones((2, 3))           # Ones tensor
ot.randn((2, 3))          # Random normal
ot.linspace(0, 10, 5)     # Evenly spaced values
```

### Basic Operations
```python
# Unary operations
x.sigmoid(), x.exp(), x.sqrt(), x.abs()
x.sin(), x.cos(), x.log(), x.tanh()

# Binary operations
x + y, x - y, x * y, x / y, x ** y, x % y

# Matrix operations
A @ B, A.matmul(B), ot.matmul(A, B)
```

### Device Management
```python
x.gpu()                   # Move to GPU
x.cpu()                   # Move to CPU
x.to(ot.device('gpu'))    # Explicit device transfer
```

## 🔧 Custom Kernels

### Basic Usage
```python
# Execute custom kernels that ship with the package
x = ot.tensor([1, 2, 3, 4]).gpu()
result = ot.zeros((4,)).gpu()  # output buffer with the same shape as x

ot.execute_kernel("vector_scale", x, result, scale=2.0)
ot.execute_kernel("vector_relu", x, result)
```

### Writing Custom Kernels
Define kernels as Python functions in a module under `oven_tensor/kernels/`:

```python
# my_kernels.py
import oven.language as ol

def my_kernel(x_ptr: ol.ptr, y_ptr: ol.ptr, factor: float):
    """Scale each element of x by factor and store the result in y."""
    idx = ol.get_global_id()
    x_val = ol.load(x_ptr, idx)
    y_val = x_val * factor
    ol.store(y_val, y_ptr, idx)
```
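
As a mental model, the kernel body runs once per GPU thread, and the index returned by `ol.get_global_id()` selects the element that thread handles. The sketch below emulates that on the CPU with an ordinary loop. It is plain Python and does not use oven-compiler; `emulate_my_kernel` and its `n` parameter are hypothetical names for illustration, and how real kernels handle out-of-range indices is not covered here.

```python
def emulate_my_kernel(x, y, factor, n):
    """CPU stand-in for my_kernel: one loop iteration per GPU thread."""
    for idx in range(n):          # idx plays the role of ol.get_global_id()
        x_val = x[idx]            # corresponds to ol.load(x_ptr, idx)
        y_val = x_val * factor    # same arithmetic as the kernel body
        y[idx] = y_val            # corresponds to ol.store(y_val, y_ptr, idx)


x = [1.0, 2.0, 3.0, 4.0]
y = [0.0] * len(x)
emulate_my_kernel(x, y, factor=2.0, n=len(x))
print(y)  # [2.0, 4.0, 6.0, 8.0]
```

A reference like this can double as the expected value when testing the GPU version of a kernel.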

### Dynamic Registration

Register kernels at runtime:

```python
# From code string
kernel_code = '''
import oven.language as ol

def runtime_kernel(x_ptr: ol.ptr, y_ptr: ol.ptr, factor: float):
    idx = ol.get_global_id()
    x_val = ol.load(x_ptr, idx)
    y_val = x_val * factor
    ol.store(y_val, y_ptr, idx)
'''

functions = ot.register_kernel_from_code(kernel_code, "my_module", ["runtime_kernel"])
# x and result are the GPU tensors created under "Basic Usage"
ot.execute_kernel("runtime_kernel", x, result, factor=3.0)

# From file
functions = ot.register_kernel_from_file("my_kernels.py")

# Cleanup
ot.unregister_kernel("runtime_kernel")
```

## 🎛️ Cache Management

```bash
# Command-line tool
oven-tensor-cache list    # List cached functions
oven-tensor-cache clear   # Clear cache
oven-tensor-cache info    # Show cache info
```

```python
# Python API
ot.clear_kernel_cache()           # Clear cache
ot.reload_kernels()               # Reload kernels
ot.list_available_functions()     # List functions
```
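
The on-disk layout of oven-tensor's cache is not described here. As a general illustration of why kernel caching speeds up later loads, the sketch below keys compiled output by a hash of the kernel source, so unchanged source is a cache hit and edited source triggers a recompile. All names (`cache_key`, `load_or_compile`, `fake_compile`) are hypothetical, and `fake_compile` merely stands in for a real compiler invocation.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(source):
    """Derive a stable key from the kernel source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def load_or_compile(source, cache_dir, compile_fn):
    """Return (artifact, was_cached), compiling only on a cache miss."""
    path = Path(cache_dir) / f"{cache_key(source)}.ptx"
    if path.exists():
        return path.read_text(), True
    artifact = compile_fn(source)
    path.write_text(artifact)
    return artifact, False


def fake_compile(source):
    # Placeholder for a real compiler call; returns dummy text.
    return f"// placeholder PTX for {len(source)} bytes of source"


with tempfile.TemporaryDirectory() as cache_dir:
    src = "def k(x_ptr, y_ptr): ..."
    first = load_or_compile(src, cache_dir, fake_compile)
    second = load_or_compile(src, cache_dir, fake_compile)
    third = load_or_compile(src + "\n# edited", cache_dir, fake_compile)

print(first[1], second[1], third[1])  # False True False
```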

## 🧪 Testing

```bash
# Run all tests
./scripts/run_tests.sh

# Specific categories
pytest tests/ -m "not gpu"       # Skip GPU tests
pytest tests/ -m "not slow"      # Skip slow tests
pytest tests/ --cov=oven_tensor  # With coverage
```

## 📄 License

MIT License
