Metadata-Version: 2.4
Name: joyfuljay
Version: 0.1.5
Summary: Python library for extracting ML-ready features from encrypted network traffic
Project-URL: Homepage, https://github.com/cenab/joyfuljay
Project-URL: Documentation, https://joyfuljay.readthedocs.io
Project-URL: Repository, https://github.com/cenab/joyfuljay
Project-URL: Issues, https://github.com/cenab/joyfuljay/issues
Project-URL: Changelog, https://github.com/cenab/joyfuljay/blob/main/CHANGELOG.md
Author: JoyfulJay Contributors
License: MIT License
        
        Copyright (c) 2025 JoyfulJay Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: QUIC,TLS,encrypted,features,machine-learning,network,pcap,security,traffic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Classifier: Topic :: System :: Networking :: Monitoring
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: click>=8.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: scapy>=2.5.0
Provides-Extra: accelerated
Requires-Dist: cython>=3.0.0; extra == 'accelerated'
Provides-Extra: all
Requires-Dist: cryptography>=41.0; extra == 'all'
Requires-Dist: cython>=3.0.0; extra == 'all'
Requires-Dist: dpkt>=1.9.8; extra == 'all'
Requires-Dist: ipython>=8.0; extra == 'all'
Requires-Dist: ipywidgets>=8.0; extra == 'all'
Requires-Dist: kafka-python>=2.0; extra == 'all'
Requires-Dist: lz4>=4.0; extra == 'all'
Requires-Dist: matplotlib>=3.7; extra == 'all'
Requires-Dist: msgpack>=1.0.0; extra == 'all'
Requires-Dist: networkx>=3.0; extra == 'all'
Requires-Dist: plotly>=5.0; extra == 'all'
Requires-Dist: prometheus-client>=0.17.0; extra == 'all'
Requires-Dist: psutil>=5.9; extra == 'all'
Requires-Dist: psycopg>=3.1; extra == 'all'
Requires-Dist: pyarrow>=12.0; extra == 'all'
Requires-Dist: pyyaml>=6.0; extra == 'all'
Requires-Dist: rich>=13.0; extra == 'all'
Requires-Dist: watchdog>=3.0; extra == 'all'
Requires-Dist: websockets>=12.0; extra == 'all'
Requires-Dist: zeroconf>=0.131.0; extra == 'all'
Provides-Extra: arrow
Requires-Dist: pyarrow>=12.0; extra == 'arrow'
Provides-Extra: compression
Requires-Dist: lz4>=4.0; extra == 'compression'
Provides-Extra: crypto
Requires-Dist: cryptography>=41.0; extra == 'crypto'
Provides-Extra: db
Requires-Dist: psycopg>=3.1; extra == 'db'
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: cryptography>=41.0; extra == 'dev'
Requires-Dist: cython>=3.0.0; extra == 'dev'
Requires-Dist: hypothesis>=6.0; extra == 'dev'
Requires-Dist: ipython>=8.0; extra == 'dev'
Requires-Dist: ipywidgets>=8.0; extra == 'dev'
Requires-Dist: lz4>=4.0; extra == 'dev'
Requires-Dist: matplotlib>=3.7; extra == 'dev'
Requires-Dist: msgpack>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: networkx>=3.0; extra == 'dev'
Requires-Dist: pandas-stubs>=2.0; extra == 'dev'
Requires-Dist: plotly>=5.0; extra == 'dev'
Requires-Dist: prometheus-client>=0.17.0; extra == 'dev'
Requires-Dist: psutil>=5.9; extra == 'dev'
Requires-Dist: pyarrow>=12.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: pyyaml>=6.0; extra == 'dev'
Requires-Dist: rich>=13.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: setuptools>=65.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Requires-Dist: types-click>=7.1; extra == 'dev'
Requires-Dist: types-psutil>=5.9.0; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0.0; extra == 'dev'
Requires-Dist: types-setuptools>=65.0.0; extra == 'dev'
Requires-Dist: watchdog>=3.0; extra == 'dev'
Requires-Dist: websockets>=12.0; extra == 'dev'
Provides-Extra: discovery
Requires-Dist: zeroconf>=0.131.0; extra == 'discovery'
Provides-Extra: docs
Requires-Dist: mkdocs-git-revision-date-localized-plugin>=1.2; extra == 'docs'
Requires-Dist: mkdocs-glightbox>=0.3; extra == 'docs'
Requires-Dist: mkdocs-material>=9.0; extra == 'docs'
Requires-Dist: mkdocs-minify-plugin>=0.7; extra == 'docs'
Requires-Dist: mkdocs>=1.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Provides-Extra: dpkt
Requires-Dist: dpkt>=1.9.8; extra == 'dpkt'
Provides-Extra: fast
Requires-Dist: dpkt>=1.9.8; extra == 'fast'
Provides-Extra: graphs
Requires-Dist: networkx>=3.0; extra == 'graphs'
Provides-Extra: jupyter
Requires-Dist: ipython>=8.0; extra == 'jupyter'
Requires-Dist: ipywidgets>=8.0; extra == 'jupyter'
Requires-Dist: matplotlib>=3.7; extra == 'jupyter'
Requires-Dist: plotly>=5.0; extra == 'jupyter'
Provides-Extra: kafka
Requires-Dist: kafka-python>=2.0; extra == 'kafka'
Provides-Extra: libpcap
Requires-Dist: python-libpcap>=0.4.0; extra == 'libpcap'
Provides-Extra: monitoring
Requires-Dist: prometheus-client>=0.17.0; extra == 'monitoring'
Provides-Extra: pid
Requires-Dist: psutil>=5.9; extra == 'pid'
Provides-Extra: postgres
Requires-Dist: psycopg>=3.1; extra == 'postgres'
Provides-Extra: progress
Requires-Dist: rich>=13.0; extra == 'progress'
Provides-Extra: remote
Requires-Dist: msgpack>=1.0.0; extra == 'remote'
Requires-Dist: websockets>=12.0; extra == 'remote'
Provides-Extra: sqlite
Provides-Extra: watch
Requires-Dist: watchdog>=3.0; extra == 'watch'
Provides-Extra: yaml
Requires-Dist: pyyaml>=6.0; extra == 'yaml'
Description-Content-Type: text/markdown

<div align="center">

<img src="docs/assets/images/logo.png" alt="JoyfulJay Logo" width="200">

# JoyfulJay - Encrypted Traffic Feature Extraction

[![CI](https://github.com/cenab/joyfuljay/actions/workflows/ci.yml/badge.svg)](https://github.com/cenab/joyfuljay/actions/workflows/ci.yml)
[![PyPI version](https://badge.fury.io/py/joyfuljay.svg)](https://badge.fury.io/py/joyfuljay)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

![JoyfulJay](https://img.shields.io/badge/JoyfulJay-387%20Features-blue?style=flat-square)
![ML Ready](https://img.shields.io/badge/ML-Research%20Ready-22D3EE?style=flat-square)
![Encrypted Traffic](https://img.shields.io/badge/Encrypted-TLS%20%2F%20QUIC-success?style=flat-square)
![Research Tool](https://img.shields.io/badge/Use-Academic%20Research-informational?style=flat-square)

</div>

**JoyfulJay** is a Python library for extracting standardized, ML-ready features from encrypted network traffic. It operates on PCAP files and live network interfaces, producing feature vectors that capture timing, size, and protocol metadata patterns - all without decrypting any traffic.

## Features

- **Encrypted Traffic Focus**: Extract features proven effective for classifying TLS, QUIC, VPN, and Tor traffic
- **ML-Ready Output**: Pandas DataFrames, NumPy arrays, CSV, JSON, or Parquet - ready for scikit-learn, PyTorch, etc.
- **Streaming Architecture**: Process multi-GB PCAPs without loading them into memory
- **Live Capture**: Real-time feature extraction from network interfaces
- **Remote Capture**: Stream packets from remote devices over secure WebSocket (TLS/WSS)
- **Protocol Metadata**: TLS handshake parsing, JA3/JA3S fingerprints, QUIC metadata
- **Traffic Fingerprinting**: Detect Tor, VPN, and DoH traffic patterns
- **Tranalyzer Compatible**: 387 features across 21 extractors, matching research-grade tools
- **Enterprise Ready**: Kafka streaming, Prometheus metrics, mDNS discovery

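For readers unfamiliar with the JA3 fingerprints mentioned above, the following is a minimal sketch of how a JA3 hash is commonly derived from ClientHello fields. It follows the publicly documented JA3 method (five comma-separated fields, values joined by `-`, GREASE values dropped, MD5 of the result). The function names `ja3_string` and `ja3_hash` are hypothetical and are not part of JoyfulJay's API; the library's own implementation may differ.

```python
import hashlib

# GREASE values (RFC 8701: 0x0A0A, 0x1A1A, ..., 0xFAFA) are skipped so that
# randomized placeholder values do not change the fingerprint.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from ClientHello fields given as integers."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the 32-character hex MD5 digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Illustrative ClientHello values (not taken from a real capture).
fields = (771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0])
print(ja3_string(*fields))  # 771,4865-4866,0-10-11,29-23,0
print(ja3_hash(*fields))
```

Because every input is visible in the unencrypted ClientHello, a fingerprint of this kind can be computed without decrypting any traffic.
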
## Installation

```bash
pip install joyfuljay
# or
uv pip install joyfuljay
```

For optional features (same syntax works with `uv pip`):

```bash
# Fast parsing with dpkt
pip install "joyfuljay[fast]"

# High-speed capture with libpcap
pip install "joyfuljay[libpcap]"

# Kafka streaming output
pip install "joyfuljay[kafka]"

# Prometheus metrics
pip install "joyfuljay[monitoring]"

# mDNS server discovery
pip install "joyfuljay[discovery]"

# Connection graph analysis
pip install "joyfuljay[graphs]"

# All optional features (the quotes keep shells such as zsh from expanding the brackets)
pip install "joyfuljay[all]"
```

## Quick Start

### Python API

```python
from joyfuljay import extract_features_from_pcap

# Extract features from a PCAP file
features_df = extract_features_from_pcap("capture.pcap")

print(features_df.shape)
print(features_df.columns.tolist())
print(features_df.head())
```

### Command Line

```bash
# Extract features to CSV
jj extract capture.pcap -o features.csv

# Live capture for 60 seconds
jj live eth0 --duration 60 -o live_features.csv

# Output as JSON
jj extract capture.pcap -o features.json --format json
```

## Feature Groups

| Group | Features |
|-------|----------|
| **Flow Metadata** | 5-tuple, duration, packet/byte counts |
| **Timing** | Inter-arrival time statistics, burst metrics |
| **Size** | Packet length statistics, payload bytes |
| **TLS** | Version, cipher suite, SNI, JA3/JA3S fingerprints |
| **QUIC** | Version, ALPN, connection IDs |
| **Padding** | Fixed-size detection, constant-rate detection |
| **Fingerprint** | Tor/VPN/DoH classification |
| **TCP Analysis** | Flags, handshake, sequence/window analysis |
| **MAC/Layer 2** | Source/dest MAC, VLAN, Ethernet type |
| **ICMP** | Type/code, echo success ratio |
| **Connection Graphs** | Fan-out, communities, centrality (requires `[graphs]`) |

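To make the Timing and Size rows more concrete, here is a minimal sketch of how statistics of this kind can be computed from per-packet timestamps and lengths using only the standard library. The helper names (`inter_arrival_times`, `summarize`, `count_bursts`) and the 1-second idle threshold are assumptions chosen for illustration; they do not describe JoyfulJay's actual extractors or column names.

```python
import statistics


def inter_arrival_times(timestamps):
    """Return the gaps in seconds between consecutive packet timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def summarize(values):
    """Summarize a numeric sequence with count, min, max, mean, and std."""
    if not values:
        return {"count": 0, "min": 0.0, "max": 0.0, "mean": 0.0, "std": 0.0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
    }


def count_bursts(gaps, idle_threshold=1.0):
    """Count bursts in a flow with at least one packet.

    Each gap at or above idle_threshold is treated as the start of a new burst.
    """
    return 1 + sum(1 for gap in gaps if gap >= idle_threshold)


# Illustrative flow: five packets, with a two-second pause after the third.
timestamps = [0.0, 0.5, 1.0, 3.0, 3.5]
lengths = [60, 1500, 1500, 60, 1500]

gaps = inter_arrival_times(timestamps)
print(summarize(gaps))
print(summarize(lengths))
print(count_bursts(gaps))
```

All of these values depend only on packet arrival times and sizes, which is why they remain available when the payload is encrypted.
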
## Remote Capture

Stream packets from a remote device (e.g., Android phone, Raspberry Pi) to your analysis machine:

```bash
# On the capture device - start server with TLS
jj serve wlan0 --tls-cert server.crt --tls-key server.key --announce

# On your machine - discover and connect
jj discover                    # Find servers on LAN
jj connect "jj://192.168.1.50:8765?token=xxx&tls=1" -o features.csv   # Quote the URL so the shell does not interpret & and ?
```
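
The connection URL above packs the host, port, token, and TLS flag into a single string. The sketch below shows how such a URL breaks down, using only `urllib.parse` from the standard library. The function `parse_connection_url` is hypothetical, and reading `tls=1` as "TLS enabled" is an assumption based on the example; JoyfulJay's own parsing may work differently.

```python
from urllib.parse import parse_qs, urlsplit


def parse_connection_url(url):
    """Split a jj:// style URL into its host, port, token, and TLS flag."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # parse_qs returns a list per key; take the first value if present.
        "token": query.get("token", [None])[0],
        # Assumption: tls=1 means the connection should use TLS.
        "tls": query.get("tls", ["0"])[0] == "1",
    }


info = parse_connection_url("jj://192.168.1.50:8765?token=xxx&tls=1")
print(info)
```
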

## Kafka Streaming

Stream features directly to Kafka for real-time pipelines:

```python
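# extract_features_streaming is assumed to be importable from the package
# root, like extract_features_from_pcap; adjust the import if it lives elsewhere.
from joyfuljay import extract_features_streaming
from joyfuljay.output.kafka import KafkaWriter

# Write each extracted feature record to the "network-features" topic.
with KafkaWriter("localhost:9092", topic="network-features") as writer:
    for features in extract_features_streaming("capture.pcap"):
        writer.write(features)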
```

## Prometheus Metrics

Export processing metrics for monitoring:

```python
from joyfuljay.monitoring import PrometheusMetrics, start_prometheus_server

metrics = PrometheusMetrics()
start_prometheus_server(9090)  # Scrape at http://localhost:9090/metrics
```

## Requirements

- Python 3.10+
- scapy >= 2.5.0
- pandas >= 2.0.0
- numpy >= 1.24.0

## Cross-Platform Support

| Feature | Linux | macOS | Windows |
|---------|-------|-------|---------|
| PCAP file processing | ✅ | ✅ | ✅ |
| Live capture | ✅ | ✅ | ✅ (requires [Npcap](https://npcap.com/)) |

Check your system status with:

```bash
jj status
```

## Documentation

Full documentation: [docs.joyfuljay.com](https://docs.joyfuljay.com/en/stable/)

## Citation

If you use JoyfulJay in academic research, please cite:

```bibtex
@software{joyfuljay2025,
  author = {{JoyfulJay Contributors}},
  title = {{JoyfulJay}: Encrypted Traffic Feature Extraction Library},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/cenab/joyfuljay}
}
```

## License

MIT License - see [LICENSE](LICENSE) for details.
