Metadata-Version: 2.4
Name: seismoai-qc
Version: 0.2.0
Summary: QC module for seismic trace detection — part of seismoai
Project-URL: Homepage, https://github.com/sunilakiran/seismoai-qc
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas

# seismoai-qc

Quality control module that detects dead and noisy seismic traces. Part of the **seismoai** pipeline.

Built on real data from the **Forge 2D Survey (2017)**: 166 SGY files, 27,722 total traces.

## Install

```bash
pip install seismoai-qc
```

## Usage

```python
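from seismoai_io import load_sgy
from seismoai_qc import detect_dead_traces, detect_noisy_traces, qc_report
import numpy as np

# traces: 2D numpy array of shape (167, 4001) for one SGY file, the output of seismoai_io
traces = load_sgy("file.sgy")
report = qc_report(traces)
print(report['label'].value_counts())
# Counts accumulated over all 166 files (27,722 traces):
# good     21373
# dead      6183
# noisy      166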

# Pass to seismoai_model:
labels = report['label']
```

## Functions

| Function | What it does |
|---|---|
| `detect_dead_traces(traces)` | Flags traces with std_dev < 1e-4 (no signal) |
| `detect_noisy_traces(traces)` | Flags traces with max_amp > 50 (spikes reach 758.22 in the Forge data) |
| `qc_report(traces)` | Returns a DataFrame with a `label` column (`good`, `dead`, `noisy`) for seismoai_model |
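
The two rules in the table can be sketched in a few lines of NumPy. This is an illustration, not the package's actual implementation: `sketch_qc_labels` is a hypothetical name, `max_amp` is assumed to mean the peak absolute amplitude of a trace, and the dead label is assumed to take precedence when both rules match.

```python
import numpy as np

DEAD_STD_THRESHOLD = 1e-4    # from the table above
NOISY_AMP_THRESHOLD = 50.0   # from the table above


def sketch_qc_labels(traces: np.ndarray) -> np.ndarray:
    """Label each row of a (n_traces, n_samples) array as good, dead, or noisy.

    Hypothetical helper for illustration; not part of seismoai_qc.
    """
    std_dev = traces.std(axis=1)
    max_amp = np.abs(traces).max(axis=1)  # assumption: peak absolute amplitude

    labels = np.full(traces.shape[0], "good", dtype=object)
    labels[max_amp > NOISY_AMP_THRESHOLD] = "noisy"
    labels[std_dev < DEAD_STD_THRESHOLD] = "dead"  # assumption: dead wins if both apply
    return labels


# Synthetic demo: one ordinary trace, one flat trace, one trace with a spike.
rng = np.random.default_rng(0)
demo = rng.normal(0.0, 1.0, size=(3, 4001))
demo[1] = 0.0          # flat trace, zero standard deviation
demo[2, 100] = 758.22  # single large spike
demo_labels = sketch_qc_labels(demo)
print(demo_labels)  # ['good' 'dead' 'noisy']
```

The precedence only matters for a flat trace with a large constant offset, which would satisfy both rules.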

## Real Data Stats

Analyzed across all 166 SGY files (27,722 total traces):

| Category | Count | Percentage |
|---|---|---|
| **Good traces** | 21,373 | 77.1% |
| **Dead traces** | 6,183 | 22.3% |
| **Noisy traces** | 166 | 0.6% |
| **Total** | 27,722 | 100% |
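
The percentages follow directly from the counts. The short check below recomputes them and confirms that 166 files of 167 traces each (the array shape shown in Usage) give the stated total.

```python
counts = {"good": 21373, "dead": 6183, "noisy": 166}

total = sum(counts.values())
percentages = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)        # 27722
print(percentages)  # {'good': 77.1, 'dead': 22.3, 'noisy': 0.6}
print(166 * 167)    # 27722
```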

### Why are some traces dead?
Dead traces (std_dev < 0.0001) occur at far offsets where seismic
source energy does not reach the receiver. In the Forge 2D Survey, 22.3%
of traces are dead, mostly at far-offset receivers.

### Why do some traces reach 758?
Normal traces have peak amplitudes below 15 (99th percentile = 10.71).
Noisy traces spike up to 758.22 due to electrical interference or
instrument malfunction during acquisition. Only 0.6% of traces are noisy.

## Thresholds (derived from real data)

```python
# Dead threshold
std_dev < 0.0001
# Dead trace std range: 0.00000021 to 0.0001
# Live trace std range: > 0.001 (clear gap)

# Noisy threshold  
max_amp > 50.0
# Normal 99th percentile: 10.71
# Noisy minimum: 200+ (all 166 noisy traces exceed 200)
# Worst spike: 758.22
```
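
Before reusing these thresholds on another survey, it may help to check that the same gaps exist there. The sketch below computes the quantities quoted in the comments above. `sketch_threshold_summary` is a hypothetical helper, not part of the package; it assumes `max_amp` is the peak absolute amplitude and runs here on synthetic data only.

```python
import numpy as np


def sketch_threshold_summary(traces, dead_std=1e-4, noisy_amp=50.0):
    """Summarize per-trace statistics on either side of the two thresholds.

    Hypothetical helper for illustration; not part of seismoai_qc.
    """
    std_dev = traces.std(axis=1)
    max_amp = np.abs(traces).max(axis=1)  # assumption: peak absolute amplitude
    dead = std_dev < dead_std
    noisy = ~dead & (max_amp > noisy_amp)
    normal = ~dead & ~noisy
    return {
        "max_dead_std": float(std_dev[dead].max()) if dead.any() else None,
        "min_live_std": float(std_dev[~dead].min()) if (~dead).any() else None,
        "normal_p99_max_amp": float(np.percentile(max_amp[normal], 99)) if normal.any() else None,
        "min_noisy_max_amp": float(max_amp[noisy].min()) if noisy.any() else None,
    }


# Synthetic data: 100 traces, the first 20 nearly flat, the last one spiked.
rng = np.random.default_rng(1)
demo = rng.normal(0.0, 3.0, size=(100, 500))
demo[:20] *= 1e-6
demo[99, 10] = 300.0
summary = sketch_threshold_summary(demo)
print(summary)
```

A wide gap between `max_dead_std` and `min_live_std`, and between `normal_p99_max_amp` and `min_noisy_max_amp`, suggests the thresholds separate the classes cleanly. A narrow gap suggests they should be re-derived for that dataset.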

## How Pair 4 Uses This

```python
from seismoai_io import load_sgy
from seismoai_qc import qc_report

# Load traces
traces = load_sgy("file.sgy")

# Get QC labels
report = qc_report(traces)

# Hand off to seismoai_model
labels = report['label']   # 'good', 'dead', 'noisy'
features = traces[~report['is_dead'].to_numpy(dtype=bool)]  # drop dead traces
```
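
Removing dead traces still leaves noisy ones in `features`. If the model should see only good traces, the `label` column can be used as the mask instead. The sketch below uses a hand-written stand-in for the report, assuming only the `label` and `is_dead` columns used above and that report rows are in the same order as the trace rows.

```python
import numpy as np
import pandas as pd

# Stand-in data: 5 traces of 8 samples and a hand-written report.
traces = np.arange(40, dtype=float).reshape(5, 8)
report = pd.DataFrame({
    "label": ["good", "dead", "good", "noisy", "dead"],
    "is_dead": [False, True, False, False, True],
})

# Convert each pandas mask to a plain NumPy boolean array before indexing.
keep_live = ~report["is_dead"].to_numpy(dtype=bool)
keep_good = (report["label"] == "good").to_numpy()

live_traces = traces[keep_live]  # drops dead traces, keeps noisy ones
good_traces = traces[keep_good]  # keeps only traces labelled 'good'
print(live_traces.shape, good_traces.shape)  # (3, 8) (2, 8)
```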

## Run Tests

```bash
pytest tests/ -v
```

## Real Dataset Test

```bash
python test_real_data.py
```

## Reflection

We built the seismoai_qc module, which detects dead and noisy traces
in seismic data from the Forge 2D Survey. Dead traces have std_dev
below 0.0001: of 27,722 total traces, 6,183 (22.3%) were dead
because far-offset receivers did not receive enough source energy.
Noisy traces have amplitudes above 50, spiking as high as 758.22, caused
by electrical interference or instrument malfunction; we found 166 of
them (0.6%). We derived these thresholds by analyzing the real dataset
rather than guessing. Our qc_report() produces labeled output that Pair 4
will use directly to train their noise classifier on the 21,373 good traces.
