Metadata-Version: 2.4
Name: heatmap-sketch
Version: 1.1.0
Summary: A probabilistic data structure for time-decayed popularity tracking
Author-email: Ravi Teja Prabhala Venkata <raviteja.prabhala@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/teja/heatmap-sketch
Project-URL: Documentation, https://github.com/teja/heatmap-sketch#readme
Project-URL: Repository, https://github.com/teja/heatmap-sketch
Project-URL: Issues, https://github.com/teja/heatmap-sketch/issues
Keywords: data-structures,bloom-filter,streaming,probabilistic,trending,decay,heatmap
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: numba
Requires-Dist: numba>=0.56.0; extra == "numba"
Dynamic: license-file

# HeatMap Sketch

[![PyPI version](https://badge.fury.io/py/heatmap-sketch.svg)](https://badge.fury.io/py/heatmap-sketch)
[![Python versions](https://img.shields.io/pypi/pyversions/heatmap-sketch.svg)](https://pypi.org/project/heatmap-sketch/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/thecliffhanger/heatmap-sketch/actions/workflows/tests.yml/badge.svg)](https://github.com/thecliffhanger/heatmap-sketch/actions)

**A probabilistic data structure for time-decayed popularity.**

> **Paper forthcoming on arXiv** — Full theoretical analysis and experimental validation of the HeatMap sketch data structure, including decay accuracy bounds, space-error tradeoffs, and comparisons with Count-Min Sketch, Space-Saving, and LFU cache.

## The Problem

You need to track what's trending **right now**, not what was popular historically.

- Counting gives you total frequency (all-time popular)
- Bloom filters tell you if something exists
- LRU tells you what was accessed recently

**None of these tell you "what's hot NOW".**

## The Solution: HeatMap

HeatMap tracks **time-decayed popularity** — items have "heat" that automatically cools over time.

```python
from heatmap_sketch import HeatMap

# Create a HeatMap with 1-minute half-life
hm = HeatMap(half_life_seconds=60.0)

# Add heat to items
hm.add("#python")
hm.add("#ai", amount=10.0)

# Check current heat
print(hm.check("#ai"))  # 10.0

# Get top trending
for item, heat in hm.top(10):
    print(f"{item}: {heat}")

# Clean up
hm.stop()
```

## How It Works

HeatMap implements exponential decay as a differential equation:

```
dH/dt = -λ·H + δ(item)
```

Every contribution has a **half-life**. Old contributions fade. New ones stay hot.

**Key properties:**
- **O(k)** add and check operations (k = number of hash functions)
- **O(m)** space (m = number of slots, typically ~1M)
- **Probabilistic** — may have false positives, but no false negatives
- **Thread-safe** — can be used from multiple threads
- **Auto-decay** — background thread handles time decay

## Installation

```bash
pip install heatmap-sketch
```

**Optional dependencies:**
```bash
# For development
pip install heatmap-sketch[dev]

# For accelerated performance (Numba JIT)
pip install heatmap-sketch[numba]
```

## Quick Start

### Basic Usage

```python
from heatmap_sketch import HeatMap

# Create HeatMap
hm = HeatMap(
    capacity=1_000_000,      # Expected unique items
    error_rate=0.01,         # 1% false positive rate
    half_life_seconds=60.0,  # Heat halves every minute
    decay_interval_seconds=1.0  # Auto-decay every second
)

# Add items
hm.add("#python")
hm.add("#ai", amount=10.0)

# Check heat
heat = hm.check("#ai")  # Returns current heat

# Get trending
trending = hm.top(10)  # Returns top 10 items

# Check if above threshold
if hm.is_hot("#ai", threshold=5.0):
    print("#ai is trending!")

# Always stop when done
hm.stop()
```

### Context Manager

```python
from heatmap_sketch import HeatMap

with HeatMap(half_life_seconds=60.0) as hm:
    hm.add("#python")
    print(hm.check("#python"))
# Automatically stops
```

### Hierarchical HeatMap (Multiple Time Scales)

```python
from heatmap_sketch import HierarchicalHeatMap

# Track at multiple time scales
hm = HierarchicalHeatMap(
    half_lives=(10.0, 60.0, 300.0),  # 10s, 1min, 5min
    weights=(0.5, 0.3, 0.2)          # Weight each scale
)

hm.add("#viral")

# Check heat at each level
levels = hm.check_levels("#viral")
print(f"Short-term: {levels['short']}")
print(f"Long-term: {levels['long']}")

# Combined score
combined = hm.check("#viral")

hm.stop()
```

## API Reference

### `HeatMap`

```python
HeatMap(
    capacity=1_000_000,       # Expected unique items
    error_rate=0.01,          # False positive rate (0-0.5)
    half_life_seconds=60.0,   # Time for heat to halve
    decay_interval_seconds=1.0,  # Auto-decay interval (0 = manual)
    track_candidates=True,    # Enable top-k queries
    seed=None                 # Random seed for reproducibility
)
```

**Methods:**

| Method | Description | Returns |
|--------|-------------|---------|
| `add(item, amount=1.0)` | Add heat to item | None |
| `add_many(items, amount=1.0)` | Bulk add | None |
| `check(item)` | Get current heat | float |
| `top(k)` | Get top-k hottest | `List[Tuple[str, float]]` |
| `is_hot(item, threshold)` | Check if above threshold | bool |
| `decay()` | Manual decay | None |
| `stats()` | Get statistics | Dict |
| `reset()` | Clear all | None |
| `stop()` | Stop auto-decay | None |

**Properties:**
- `m` — Number of slots
- `k` — Number of hash functions
- `half_life` — Half-life in seconds
- `decay_interval` — Decay interval in seconds

### `HierarchicalHeatMap`

```python
HierarchicalHeatMap(
    capacity=1_000_000,
    error_rate=0.01,
    half_lives=(10.0, 60.0, 300.0),  # Multiple half-lives
    weights=None,  # Equal weights by default
    decay_interval_seconds=1.0,
    track_candidates=True,
    seed=None
)
```

**Methods:**
- `add(item, amount=1.0)` — Add to all levels
- `check(item)` — Get combined heat
- `check_levels(item)` — Get heat at each level
- `top(k)` — Get top-k by combined heat

## Comparison to Other Data Structures

| Structure | Question | Decay | Top-k | Space |
|-----------|----------|-------|-------|-------|
| Bloom Filter | Is X in set? | ❌ | ❌ | O(m) |
| Count-Min Sketch | How many X? | ❌ | ❌ | O(m) |
| HyperLogLog | How many unique? | ❌ | ❌ | O(m) |
| Space-Saving | Top-k all-time? | ❌ | ✅ | O(k) |
| LFU Cache | Least frequently used? | ❌ | ❌ | O(n) |
| **HeatMap** | **What's hot now?** | ✅ | ✅ | O(m) |

## Performance

On Apple M4 (8 cores):

| Operation | Rate |
|-----------|------|
| Add (single) | 3,500/sec |
| Add (bulk, 1000 items) | 380/sec |
| Check | 5,400/sec |
| Top-10 (1000 items) | 36,000/sec |

**Memory:** ~2 MB for 100K expected items

## Use Cases

### 1. Social Media Trending

```python
hm = HeatMap(half_life_seconds=300.0)  # 5-minute half-life

for tweet in tweet_stream:
    for hashtag in tweet.hashtags:
        hm.add(hashtag)

# Get trending hashtags
trending = hm.top(10)
```

### 2. Network Monitoring / DDoS Detection

```python
hm = HeatMap(half_life_seconds=10.0)  # 10-second half-life

for packet in network_stream:
    hm.add(packet.source_ip)

# Detect hot IPs (potential attack)
for ip, heat in hm.top(100):
    if heat > THRESHOLD:
        alert(f"Suspicious traffic from {ip}")
```

### 3. E-commerce Trending Products

```python
hm = HeatMap(half_life_seconds=3600.0)  # 1-hour half-life

for page_view in page_views:
    hm.add(page_view.product_id)

# Show trending on homepage
trending_products = [item for item, _ in hm.top(20)]
```

### 4. Rate Limiting

```python
hm = HeatMap(half_life_seconds=60.0)  # 1-minute window

def check_rate_limit(user_id):
    hm.add(user_id)
    _, heat = hm.check(user_id)
    
    if heat > 100:  # Max 100 requests per minute
        raise RateLimitExceeded()
```

## Theoretical Guarantees

### Theorem 1: Decay Accuracy

The discrete approximation converges to the continuous solution:

```
H_discrete(t) → H_continuous(t) as decay_interval → 0
```

### Theorem 2: Space-Error Tradeoff

With m slots and k hash functions for n items:

```
P(false positive) ≤ (1 - e^(-kn/m))^k
```

### Theorem 3: Trend Detection

If item x is truly among the top-k and has heat H_x > θ:

```
P(x ∈ top-k result) ≥ 1 - ε
```

Where ε depends on error_rate and decay_factor.

## Development

```bash
# Clone repo
git clone https://github.com/thecliffhanger/heatmap-sketch.git
cd heatmap-sketch

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dev dependencies
pip install -e ".[dev]"

# Run tests (168 tests)
pytest tests/ -v

# Run benchmarks
pytest tests/ --benchmark-only
```

## Citation

If you use HeatMap in your research, please cite:

```bibtex
@software{heatmap_sketch2026,
  title={HeatMap Sketch: A Data Structure for Time-Decayed Popularity},
  author={Teja},
  year={2026},
  url={https://github.com/thecliffhanger/heatmap-sketch}
}
```

## License

MIT

## Author

**Teja**

---

**HeatMap: Because "what's trending" ≠ "what's frequent."**
