Metadata-Version: 2.4
Name: feathertail
Version: 0.6.1
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Typing :: Typed
License-File: LICENSE
Summary: A tiny, fast, Rust-backed transformation core for Python table data
Author-email: Odos Matthews <odosmatthews@gmail.com>
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/eddiethedean/feathertail#readme
Project-URL: Homepage, https://github.com/eddiethedean/feathertail
Project-URL: Issues, https://github.com/eddiethedean/feathertail/issues
Project-URL: Repository, https://github.com/eddiethedean/feathertail

# 🪶 feathertail

A high-performance Python DataFrame library powered by Rust, designed for flexibility, speed, and intelligent type handling. Beyond core table operations it offers joins, group-by aggregations, window functions, descriptive analytics, string and time-series helpers, and data validation.

---

## ✨ Key Features

### 🚀 **Core DataFrame Operations**
- ✅ Build `TinyFrame` from Python dict records (`from_dicts`)
- ✅ Automatic type inference, including mixed-type and optional columns
- ✅ Intelligent fallback to Python objects when Rust-native types aren't possible (objects are stored by runtime pointer identity for the lifetime of the frame, so keep references to them alive while using the `TinyFrame`)
- ✅ Flexible `fillna` to handle missing data
- ✅ Powerful `cast_column` to convert columns between types
- ✅ Smart `edit_column`: edits that automatically adjust column type if needed
- ✅ Drop or rename columns easily
- ✅ Export back to Python dicts (`to_dicts`)

### 🔗 **Advanced Data Operations**
- ✅ **Join Operations**: Inner, left, right, outer, and cross joins
- ✅ **Filtering & Sorting**: Advanced filtering with multiple conditions and multi-column sorting
- ✅ **GroupBy Aggregations**: `TinyGroupBy` with **string key columns** — sum, mean, min, max, std, var, median, first, last, count, size (call each aggregation separately)
- ✅ **Window Functions**: Rolling and expanding window operations
- ✅ **Ranking Functions**: Rank calculation with multiple methods and percentage change

### 📊 **Advanced Analytics**
- ✅ **Descriptive Statistics**: `describe()`, `skew()`, `kurtosis()`, `quantile()`, `mode()`, `nunique()`
- ✅ **Correlation & Covariance**: Full correlation/covariance matrices and pairwise calculations
- ✅ **Time Series Operations**: DateTime parsing (strict: invalid or empty strings raise `ValueError`), component extraction, time differences, and shifting
- ✅ **String Operations**: Case conversion, whitespace removal, replacement, splitting, pattern matching, length, and concatenation
- ✅ **Data Validation**: Not null, range, pattern, uniqueness validation with comprehensive reporting

### ⚡ **Performance & Optimization**
- ✅ **SIMD Operations**: SIMD-optimized numerical kernels on x86_64
- ✅ **Parallel Processing**: Multi-core operations using Rayon for GroupBy, filtering, and sorting
- ✅ **Memory Optimization**: String interning, lazy evaluation, and copy-on-write optimizations
- ✅ **Chunked Processing**: Handle large datasets efficiently with streaming operations
- ✅ **Rust-backed Core**: Lightweight, fast, and dependency-light
- ✅ **Cross-Platform Builds**: Automated CI/CD with pre-built wheels for Linux (x86_64), macOS (ARM64), and Windows (x86_64)

### 🛠️ **Developer Experience**
- ✅ **Comprehensive Documentation**: Sphinx-generated API docs with tutorials and guides
- ✅ **Logging & Debugging**: Built-in logging system with performance monitoring
- ✅ **Profiling Tools**: Performance profiling and optimization insights
- ✅ **Development Tools**: Pre-commit hooks, automated testing, and development scripts
- ✅ **250+ Tests**: Python unit tests plus Rust tests, running in well under one second locally

---

## 📦 Installation

```bash
pip install feathertail
```

> **✅ Cross-Platform Support**: Pre-built wheels are available for Python 3.8–3.12 on:
> - **Linux** (x86_64)
> - **macOS** (ARM64/aarch64) 
> - **Windows** (x86_64)

### Building from Source

```bash
# Clone the repository
git clone https://github.com/eddiethedean/feathertail.git
cd feathertail

# Build and install into the active virtual environment (maturin develop requires one)
pip install maturin
maturin develop --release

# Alternatively, let pip build and install the package in editable mode
pip install -e .
```

---

## 🧑‍💻 Quickstart

### Basic DataFrame Operations

```python
import feathertail as ft

records = [
    {"name": "Alice", "age": 30, "city": "New York", "score": 95.5},
    {"name": "Bob", "age": None, "city": "Paris", "score": 85.0},
    {"name": "Charlie", "age": 25, "city": "New York", "score": None},
]

frame = ft.TinyFrame.from_dicts(records)
print(frame)
```

**Output:**
```
TinyFrame(rows=3, columns=4, cols={ 'name': 'Str', 'age': 'OptInt', 'city': 'Str', 'score': 'OptFloat' })
```
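
The `OptInt` and `OptFloat` variants above come from the `None` cells in `age` and `score`. As a rough mental model, the pure-Python sketch below reproduces that schema under assumed rules: one exact Python type per column maps to that variant, any `None` adds the `Opt` prefix, and anything else becomes `Mixed`. `infer_column_type` and `infer_schema` are hypothetical helpers, not part of feathertail, and the library's real inference may handle cases such as int/float mixes differently.

```python
from typing import Any, Dict, List

# Assumed mapping from exact Python types to feathertail-style variant names.
_BASE_NAMES = {bool: "Bool", int: "Int", float: "Float", str: "Str"}


def infer_column_type(values: List[Any]) -> str:
    """Return a variant name such as 'Int' or 'OptFloat' for one column (hypothetical rules)."""
    non_null = [v for v in values if v is not None]
    has_null = len(non_null) < len(values)
    # Exact types keep bool apart from int (bool is a subclass of int in Python).
    kinds = {type(v) for v in non_null}
    if len(kinds) == 1 and next(iter(kinds)) in _BASE_NAMES:
        base = _BASE_NAMES[next(iter(kinds))]
    else:
        # Several types (or an unsupported one) fall back to Python objects.
        base = "Mixed"
    return "Opt" + base if has_null else base


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Infer one variant per key, in first-seen order."""
    keys: List[str] = []
    for record in records:
        for key in record:
            if key not in keys:
                keys.append(key)
    # A key missing from a record counts as None for that row.
    return {key: infer_column_type([r.get(key) for r in records]) for key in keys}
```

Running `infer_schema(records)` on the quickstart records yields `{'name': 'Str', 'age': 'OptInt', 'city': 'Str', 'score': 'OptFloat'}`, matching the printed frame.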

### Advanced Filtering and Sorting

```python
# Filter and sort data
filtered = frame.filter("age", ">", 25)
sorted_frame = frame.sort_values(["city", "age"], ascending=[True, False])
print(sorted_frame.to_dicts())
```

### GroupBy Aggregations

```python
# Group keys must be string columns. Build TinyGroupBy, then aggregate with the frame:
gb = ft.TinyGroupBy(frame, ["city"])
mean_age = gb.mean(frame, "age")
max_score = gb.max(frame, "score")
row_counts = gb.count(frame)
```
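
To make the aggregation concrete, here is a pure-Python sketch of what `gb.mean(frame, "age")` computes for the quickstart `records`. It assumes missing values are skipped (as pandas does) and that a group with no non-null values yields `None`; `group_mean` is a hypothetical helper, and feathertail's actual result type and null handling may differ.

```python
from typing import Any, Dict, List, Optional


def group_mean(records: List[Dict[str, Any]], key: str, column: str) -> Dict[str, Optional[float]]:
    """Mean of `column` for each distinct string in `key`, skipping None cells."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for record in records:
        group = record[key]
        # Mirrors the README's rule that group keys must be string columns.
        if not isinstance(group, str):
            raise TypeError(f"group key {key!r} must hold strings, got {group!r}")
        sums.setdefault(group, 0.0)
        counts.setdefault(group, 0)
        value = record.get(column)
        if value is not None:
            sums[group] += value
            counts[group] += 1
    # Assumption: a group whose values are all missing reports None.
    return {group: (sums[group] / counts[group] if counts[group] else None) for group in sums}
```

With the quickstart data, New York averages ages 30 and 25 to `27.5`, while Paris has only a missing age and therefore reports `None` under this sketch's assumption.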

### Join Operations

```python
# Inner join with another DataFrame
other_data = [
    {"city": "New York", "population": 8_000_000},
    {"city": "Paris", "population": 2_000_000},
]
other_frame = ft.TinyFrame.from_dicts(other_data)

joined = frame.join(other_frame, "city", "city", "inner")
print(joined.to_dicts())
```

#### Join semantics

- **Composite keys**: A row is used for matching only if **every** join-key column is non-null (SQL-style). If any key component is null, the row is left out of the join-key index and so never matches a row from the other frame.
- **Column names**: The result keeps one name per logical column. If the **same basename** appears as a non-join column on both sides, or if a join-key name on one frame collides with a **non-key** column on the other, feathertail raises `ValueError`—rename on one frame first. Automatic `_x` / `_y` suffixing (pandas-style) is not implemented yet.
- **`cross_join`**: Left and right must have **disjoint** column names; overlaps raise `ValueError`.
- **Python object fallback**: Fallback object storage from **both** sides is merged on join outputs so `to_dicts()` can resolve references. Conflicting reuse of the same internal id for different objects raises a runtime error (very rare).
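
The null-key and name-collision rules above can be sketched as a plain hash join. This is a pure-Python illustration of the stated semantics, not feathertail's implementation: `inner_join` is hypothetical, it expects the key arguments as lists, it assumes every row has the same columns as the first row, and it assumes right-hand key columns are dropped from the output while left-hand key names are kept.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row],
               left_on: Sequence[str], right_on: Sequence[str]) -> List[Row]:
    """Hash inner join following the null-key and name-collision rules above."""
    left_cols = set(left[0]) if left else set()
    right_cols = set(right[0]) if right else set()
    left_extra = left_cols - set(left_on)
    right_extra = right_cols - set(right_on)
    # Reject clashes: non-key vs non-key, or a key name vs a non-key name on the other side.
    clashes = (right_extra & left_cols) | (left_extra & set(right_on))
    if clashes:
        raise ValueError(f"overlapping column names {sorted(clashes)}; rename one side first")

    def key_of(row: Row, cols: Sequence[str]) -> Optional[Tuple[Any, ...]]:
        key = tuple(row.get(c) for c in cols)
        # SQL-style: a key with any null component never enters the index or matches.
        return None if any(part is None for part in key) else key

    # Build phase: index the right frame by its (non-null) composite key.
    index: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in right:
        key = key_of(row, right_on)
        if key is not None:
            index.setdefault(key, []).append(row)

    # Probe phase: emit one merged row per matching pair.
    result: List[Row] = []
    for row in left:
        key = key_of(row, left_on)
        if key is None:
            continue
        for match in index.get(key, []):
            merged = dict(row)
            # Assumption: right-hand key columns are dropped; left-hand key names are kept.
            merged.update({c: v for c, v in match.items() if c not in right_on})
            result.append(merged)
    return result
```

Building the index on one side and probing with the other keeps the join linear in the total row count, which is the usual reason hash joins are preferred over nested loops.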

### Advanced Analytics

```python
# Descriptive statistics
description = frame.describe("score")
print(description.to_dicts())

# Correlation analysis
correlation = frame.corr("age", "score")
print(f"Age-Score correlation: {correlation}")

# Time series operations
time_data = [
    {"timestamp": "2023-01-01 10:00:00", "value": 100},
    {"timestamp": "2023-01-01 11:00:00", "value": 120},
]
time_frame = ft.TinyFrame.from_dicts(time_data)
time_frame = time_frame.to_timestamps("timestamp")  # adds `timestamp_timestamp` (Unix seconds)
time_frame = time_frame.dt_year("timestamp")      # still parses from original string column
print(time_frame.to_dicts())
```
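
The comments above say the new column holds Unix seconds and that parsing is strict. The standard-library sketch below shows both properties for the example strings. It assumes the `%Y-%m-%d %H:%M:%S` format and interprets the naive times as UTC; feathertail may accept other formats or apply a different time-zone rule, and `parse_strict`, `to_unix_seconds`, and `extract_year` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import List

# Assumed input format, matching the example strings above.
FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_strict(text: str) -> datetime:
    """Parse one timestamp as UTC; invalid or empty input raises ValueError."""
    return datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)


def to_unix_seconds(values: List[str]) -> List[int]:
    """Convert each string to whole seconds since 1970-01-01 00:00:00 UTC."""
    return [int(parse_strict(text).timestamp()) for text in values]


def extract_year(values: List[str]) -> List[int]:
    """Pull the calendar year out of each timestamp string."""
    return [parse_strict(text).year for text in values]
```

Under the UTC assumption, the two example rows become `1672567200` and `1672570800`, one hour (3600 seconds) apart, and `parse_strict("")` raises `ValueError`, consistent with the strict parsing noted in the feature list.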

### Window Functions

```python
# Rolling window operations
data = [{"value": i} for i in range(1, 11)]
window_frame = ft.TinyFrame.from_dicts(data)
rolling_mean = window_frame.rolling_mean("value", 3)
print(rolling_mean.to_dicts())
```
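
A trailing rolling mean with window 3 averages each value with the two before it. The sketch below assumes pandas-like behavior where the first `window - 1` positions have no complete window and yield `None`; feathertail may fill those positions differently (for example with `NaN`), and `rolling_mean` here is a hypothetical pure-Python helper that does not handle missing values.

```python
from typing import List, Optional, Sequence


def rolling_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing-window mean; positions with fewer than `window` values yield None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that just left the window
        out.append(running / window if i >= window - 1 else None)
    return out
```

For the values 1 through 10 with a window of 3 this gives `[None, None, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]`. Keeping a running sum makes each step O(1) instead of re-summing the window, at the cost of possible floating-point drift on long float series.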

### String Operations

```python
# String manipulation
text_data = [{"text": "  hello world  "}, {"text": "foo bar"}]
text_frame = ft.TinyFrame.from_dicts(text_data)
processed = text_frame.str_upper("text").str_strip("text")
print(processed.to_dicts())
```
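
Because the calls are chained, the column is uppercased first and then trimmed. The plain-Python equivalent below shows the values to expect from the example; `upper_then_strip` is a hypothetical helper, and passing `None` through untouched is an assumption about how optional string columns are treated.

```python
from typing import List, Optional


def upper_then_strip(values: List[Optional[str]]) -> List[Optional[str]]:
    """Uppercase each string, then remove leading/trailing whitespace; None passes through."""
    return [v.upper().strip() if v is not None else None for v in values]
```

Applied to the example column, this produces `["HELLO WORLD", "FOO BAR"]`; the inner space in each value is kept because stripping only touches the ends.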

### Data Validation

```python
# Data quality checks
validation = frame.validate_not_null("age")
validation_summary = frame.validation_summary("age")
print(f"Validation summary: {validation_summary}")
```
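
A not-null check boils down to finding the rows whose cell is missing. The sketch below does that for a list of records; the report layout (`rows`, `nulls`, `null_rows`, `passed`) is a hypothetical format chosen for illustration and is not the structure returned by `validate_not_null` or `validation_summary`.

```python
from typing import Any, Dict, List


def not_null_report(records: List[Dict[str, Any]], column: str) -> Dict[str, Any]:
    """Report which rows have a missing (None or absent) value in `column`."""
    null_rows = [i for i, record in enumerate(records) if record.get(column) is None]
    return {
        "column": column,
        "rows": len(records),
        "nulls": len(null_rows),
        "null_rows": null_rows,
        "passed": not null_rows,
    }
```

For the quickstart `records`, the `age` column fails the check with one null at row index 1 (Bob).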

---

## 🚀 Performance Features

### SIMD-Accelerated Operations
```python
# Automatic SIMD optimization for numerical operations
large_data = [{"category": "A" if i % 2 == 0 else "B", "value": i * 1.5} for i in range(100000)]
large_frame = ft.TinyFrame.from_dicts(large_data)

# TinyGroupBy keys must be string columns; aggregates run over numeric columns
gb = ft.TinyGroupBy(large_frame, ["category"])
sum_result = gb.sum(large_frame, "value")
```

### Parallel Processing
```python
# Multi-core operations for large datasets
# Automatically uses all available CPU cores
filtered = large_frame.filter("value", ">", 50000)
sorted_data = large_frame.sort_values("value")
```

### Memory Optimization
```python
# String interning and lazy evaluation are applied automatically;
# no extra configuration is needed
frame = ft.TinyFrame.from_dicts(records)
```
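
One common way to intern a string column is dictionary encoding: keep each distinct string once and store a small integer code per row. The sketch below illustrates that general technique in pure Python; it is not a description of feathertail's internal layout, and `dictionary_encode` / `dictionary_decode` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct string once and replace every cell with an integer code."""
    lookup: Dict[str, int] = {}
    uniques: List[str] = []
    codes: List[int] = []
    for value in values:
        code = lookup.get(value)
        if code is None:
            # First time this string is seen: assign the next code.
            code = len(uniques)
            lookup[value] = code
            uniques.append(value)
        codes.append(code)
    return uniques, codes


def dictionary_decode(uniques: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the unique strings and per-row codes."""
    return [uniques[code] for code in codes]
```

For the quickstart `city` column, `["New York", "Paris", "New York"]` encodes to the uniques `["New York", "Paris"]` and codes `[0, 1, 0]`, so the repeated city is stored only once.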

---

## 🛠️ Developer Tools

### Logging and Debugging
```python
# Enable comprehensive logging
ft.init_logging_with_config("info", log_memory=True, log_performance=True, log_operations=True)

# Enable debug mode
ft.enable_debug()

# Enable profiling
ft.enable_profiling()

# Your operations will be logged and profiled
frame = ft.TinyFrame.from_dicts(records)
result = frame.filter("age", ">", 25)

# View profiling report
ft.print_profiling_report()
```

### Performance Monitoring
```python
# Get operation statistics
stats = ft.get_operation_stats("filter")
print(f"Filter operations: {stats}")

# Get overall performance metrics
overall_stats = ft.get_overall_stats()
print(f"Total operations: {overall_stats['total_operations']}")
```

---

## ⚙️ Supported Types

| Type      | Column variants    | Description |
|-----------|-------------------|-------------|
| int       | `Int`, `OptInt`    | 64-bit integers with optional null support |
| float     | `Float`, `OptFloat` | 64-bit floats with optional null support |
| bool      | `Bool`, `OptBool`  | Boolean values with optional null support |
| str       | `Str`, `OptStr`    | UTF-8 strings with optional null support |
| mixed     | `Mixed`, `OptMixed` | Mixed types with automatic Python object fallback |

**`cast_column` and strings.** Casting a non-optional `Str` column to `int` or `float` is strict: each cell must parse; otherwise `ValueError` is raised (values are **not** coerced to `0`). Casting optional `OptStr` to numeric optional types maps unparseable strings to missing values where applicable.
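
The strict versus lenient behavior can be modeled with Python's `int()`. This is an approximation: `cast_strict` and `cast_lenient` are hypothetical helpers, and Python's `int()` accepts inputs such as `" 42 "` or `"1_000"` that a Rust integer parser may reject, so edge cases can differ from feathertail.

```python
from typing import List, Optional


def cast_strict(values: List[str]) -> List[int]:
    """Str -> Int: every cell must parse; the first bad cell raises ValueError."""
    return [int(v) for v in values]


def cast_lenient(values: List[Optional[str]]) -> List[Optional[int]]:
    """OptStr -> OptInt: None stays None and unparseable strings become None."""
    out: List[Optional[int]] = []
    for v in values:
        if v is None:
            out.append(None)
            continue
        try:
            out.append(int(v))
        except ValueError:
            out.append(None)  # a missing value, not a coerced 0
    return out
```

So `cast_lenient(["7", "abc", None])` gives `[7, None, None]`, while `cast_strict(["1", "x"])` raises `ValueError` instead of silently producing `0`.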

---

## 📚 Documentation

- **[Getting Started Guide](docs/getting_started.md)** - Learn the basics
- **[Advanced Usage](docs/advanced_usage.md)** - Complex operations and patterns
- **[API Reference](docs/api/index.rst)** - Complete API documentation
- **[Tutorials](docs/tutorials/index.md)** - Step-by-step learning guides
- **[Contributing](CONTRIBUTING.md)** - How to contribute to the project

---

## 🏗️ Build System & CI/CD

### Automated Cross-Platform Builds
feathertail uses GitHub Actions to automatically build and test wheels for Linux, macOS, and Windows:

- **15 build configurations**: 5 Python versions (3.8–3.12) on each of 3 operating systems
- **3 operating systems**: Linux (Ubuntu), macOS (ARM64), Windows
- **Automated testing** with wheel installation verification
- **Artifact management** with 30-day retention
- **PyPI deployment** on version tags

### Build Matrix
| Platform | Python Versions | Architecture |
|----------|----------------|--------------|
| Ubuntu   | 3.8, 3.9, 3.10, 3.11, 3.12 | x86_64 |
| macOS    | 3.8, 3.9, 3.10, 3.11, 3.12 | ARM64 (aarch64) |
| Windows  | 3.8, 3.9, 3.10, 3.11, 3.12 | x86_64 |

### Quality Assurance
- ✅ **Rust compilation** with proper target architecture
- ✅ **Python wheel building** with maturin
- ✅ **CI test matrix**: `pytest` on Python **3.8–3.12** on Ubuntu, macOS, and Windows (release wheels built for the same range)
- ✅ **Installation testing** from temp directories
- ✅ **Import verification** to ensure module works correctly
- ✅ **Cross-platform compatibility** testing

---

## 🧪 Testing

```bash
# Run all tests (250+ Python unit tests plus the Rust test suite)
make test

# Run specific test categories
python -m pytest tests/python/unit/test_tinyframe.py
python -m pytest tests/python/unit/test_joins.py
python -m pytest tests/python/unit/test_analytics.py
```

---

## 🏗️ Development Workflow

```bash
# Clone the repository (or your fork)
git clone https://github.com/eddiethedean/feathertail.git
cd feathertail

# Set up development environment
make dev

# Build the package
make build

# Run tests
make test

# Build documentation
make docs
```

---

## 🐉 Why "feathertail"?

In *Fourth Wing*, a "feathertail" is a juvenile dragon — small, golden, and nonviolent, known for grace rather than brute force.  

This library follows the same spirit: gentle on dependencies, elegant in design, and capable of handling complex data types with ease — but with the power and performance of a full-grown dragon when you need it.

---

## 📊 Performance Highlights

- **250+ Python unit tests** plus Rust tests run in well under one second locally
- **SIMD-accelerated** numerical operations
- **Parallel processing** for multi-core performance
- **Memory-optimized** with string interning and lazy evaluation
- **Production-ready** with comprehensive error handling and logging

---

## ❤️ Contributing

Contributions, ideas, and feedback are always welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

---

## 📄 License

MIT

---

## 🎯 Roadmap

- [x] **Cross-platform PyPI builds** - ✅ Automated builds for Linux, macOS, and Windows
- [ ] Additional time series functions
- [ ] More statistical distributions
- [ ] Enhanced plotting integration
- [ ] Database connectors
- [ ] Arrow/Parquet integration

---

*Built with ❤️ using Rust and Python*

