Metadata-Version: 2.4
Name: seismoai-m1m2
Version: 0.1.0
Summary: SeismoAI Module 1 and Module 2: SGY loading and seismic visualization
Author: Wasif Ali Pervez
License: MIT
Project-URL: Homepage, https://github.com/wasifalipervez1993-pixel/seismoai-m1m2
Project-URL: Repository, https://github.com/wasifalipervez1993-pixel/seismoai-m1m2
Keywords: seismic,SEG-Y,SGY,geophysics,signal processing,visualization
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: segyio
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: matplotlib

# SeismoAI: Modules 1 and 2

![Python](https://img.shields.io/badge/Python-3.10%2B-blue)
![Package](https://img.shields.io/badge/Package-src%20layout-success)
![Tests](https://img.shields.io/badge/Tests-7%20passed-brightgreen)
![Dataset](https://img.shields.io/badge/Dataset-166%20SGY%20files-orange)
![Status](https://img.shields.io/badge/Status-Submission%20Ready-success)

---

## Module Assignment

This project implements:

- **Module 1 – `seismoai_io`**: loading and preparing SEG-Y seismic data
- **Module 2 – `seismoai_viz`**: visualizing seismic gathers, traces, and frequency spectra

These modules form the **data ingestion and visualization layer** of the SeismoAI pipeline and provide the foundation for later quality control, modeling, and explainability tasks.

---

## Project Objective

The objective of this project is to design and implement a clean, reusable, and well-tested Python package for working with real seismic SEG-Y (`.sgy`) data.

As required by the assignment, the workflow was:

1. Understand the dataset
2. Inspect headers and trace structure
3. Analyze amplitude distribution and data characteristics
4. Implement robust I/O functions
5. Develop visualization tools
6. Validate the implementation on the complete dataset

The provided dataset contains:

- **166 SEG-Y files**
- **167 traces per file**
- **4001 samples per trace**
- **1 ms sampling interval**
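
The stated geometry fixes the trace time axis and the usable bandwidth. A quick sanity check of those numbers (values taken directly from the summary above):

```python
import numpy as np

# Per-file geometry as stated in the dataset summary above.
n_traces, n_samples = 167, 4001
dt_ms = 1.0                               # sampling interval in milliseconds

dt_s = dt_ms / 1000.0
record_length_s = (n_samples - 1) * dt_s  # time spanned by one trace
nyquist_hz = 1.0 / (2.0 * dt_s)           # highest resolvable frequency

time_axis_s = np.arange(n_samples) * dt_s
print(record_length_s, nyquist_hz)        # 4.0 s record, 500 Hz Nyquist
```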

---

## Architecture Overview

```mermaid
flowchart LR
    A[SEG-Y Files on Disk] --> B[Module 1: seismoai_io]
    B --> B1[load_sgy]
    B --> B2[load_folder]
    B --> B3[normalize_traces]
    B --> B4[validate_dataset]

    B --> C[Module 2: seismoai_viz]
    C --> C1[plot_gather]
    C --> C2[plot_trace]
    C --> C3[plot_spectrum]

    B --> D[Exploration Script]
    B --> E[Full Dataset Analysis]
    D --> F[Trace Statistics]
    D --> G[Header Inspection]
    E --> H[CSV Summaries]
    E --> I[Diagnostic Plots]

    C --> J[Readable Seismic Visuals]
    H --> K[analysis_outputs]
    I --> K
```

### Processing View

```text
SEG-Y files
   ↓
load_sgy / load_folder
   ↓
trace + header extraction
   ↓
dataset understanding and validation
   ↓
normalization (if needed)
   ↓
visualization:
  - gather image
  - waveform
  - frequency spectrum
```

---

## Project Structure

```text
seismoai_final/
│
├── README.md
├── pyproject.toml
├── requirements.txt
├── .gitignore
│
├── data/
│   └── (dataset kept locally, not uploaded to GitHub)
│
├── analysis_outputs/
│   ├── dataset_summary.csv
│   ├── header_comparison.csv
│   ├── max_abs_per_file.png
│   ├── std_per_file.png
│   └── weak_trace_count_per_file.png
│
├── scripts/
│   ├── seismoai_explore.py
│   └── analyze_all_sgy_files.py
│
├── src/
│   └── seismoai_m1m2/
│       ├── __init__.py
│       ├── seismoai_io.py
│       └── seismoai_viz.py
│
├── tests/
│   ├── test_seismoai_io.py
│   └── test_seismoai_viz.py
│
└── docs/
    └── reflection.txt
```

---

## Implemented Functions

## Module 1 – `seismoai_io`

### `load_sgy(file_path)`
Loads a single SEG-Y file and returns:

- seismic traces as a `numpy.ndarray`
- extracted trace headers as a `pandas.DataFrame`

### `load_folder(folder_path)`
Loads all `.sgy` files from a directory and returns a list of structured results containing:

- file name
- file path
- trace data
- header data

### `normalize_traces(traces, method="maxabs")`
Normalizes seismic amplitudes using:

- **max-absolute normalization**
- **z-score normalization**
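
The two modes can be sketched in NumPy. This is a minimal illustration of the idea, not the packaged implementation (which may differ, e.g. in per-trace handling):

```python
import numpy as np

def normalize_sketch(traces: np.ndarray, method: str = "maxabs") -> np.ndarray:
    """Illustrative whole-array normalization (hypothetical helper)."""
    if method == "maxabs":
        # Scale so the largest absolute amplitude becomes 1.
        peak = np.max(np.abs(traces))
        return traces / peak if peak > 0 else traces
    if method == "zscore":
        # Shift to zero mean, scale to unit standard deviation.
        centered = traces - traces.mean()
        std = traces.std()
        return centered / std if std > 0 else centered
    raise ValueError(f"unknown method: {method!r}")

demo = np.array([[0.0, 2.0, -4.0], [1.0, -1.0, 3.0]])
print(np.abs(normalize_sketch(demo, "maxabs")).max())  # peak magnitude is now 1.0
```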

### `validate_dataset(folder_path)`
Checks the complete dataset and reports, for each file:

- loading status
- number of traces
- number of samples
- shape
- header structure
- summary statistics

### Additional utilities
The module also includes helper functions for:

- trace summary statistics
- header summary extraction
- per-trace diagnostics
- near-dead trace detection
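
The near-dead idea can be illustrated with a simple low-variance threshold (a hypothetical sketch; the packaged detector may use different criteria and thresholds):

```python
import numpy as np

def near_dead_mask(traces: np.ndarray, std_threshold: float = 1e-3) -> np.ndarray:
    """Flag traces whose standard deviation falls below a simple threshold."""
    return traces.std(axis=1) < std_threshold

gather = np.vstack([
    np.zeros(32),                        # a dead trace: no energy at all
    np.sin(np.linspace(0.0, 6.28, 32)),  # a live trace
])
print(near_dead_mask(gather))  # first trace flagged, second not
```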

---

## Module 2 – `seismoai_viz`

### `plot_gather(traces)`
Displays the seismic gather as a 2D image using a seismic colormap. An optional `clip_mode` argument (e.g. `clip_mode="percentile"`, as shown in the Usage section) keeps amplitude outliers from dominating the display.

### `plot_trace(traces, trace_index)`
Plots the waveform of an individual trace.

### `plot_spectrum(trace, sample_interval_ms=1.0)`
Computes and displays the frequency spectrum of a selected trace using FFT.
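
The underlying computation can be sketched with NumPy's real FFT (illustrative only; the packaged function wraps something like this in a matplotlib figure):

```python
import numpy as np

def amplitude_spectrum(trace: np.ndarray, sample_interval_ms: float = 1.0):
    """Return (frequencies in Hz, amplitude spectrum) for one trace."""
    dt_s = sample_interval_ms / 1000.0
    freqs = np.fft.rfftfreq(trace.size, d=dt_s)  # one-sided frequency axis
    amps = np.abs(np.fft.rfft(trace))            # amplitude of each component
    return freqs, amps

# A 25 Hz sine sampled at 1 ms should peak very close to 25 Hz.
t = np.arange(4001) * 0.001
freqs, amps = amplitude_spectrum(np.sin(2.0 * np.pi * 25.0 * t), 1.0)
print(round(freqs[np.argmax(amps)]))  # 25
```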

---

## Dataset Understanding

A full dataset-wide analysis was performed on **all 166 SEG-Y files**.

### Validation Results

- All **166 files loaded successfully**
- All files have shape **(167, 4001)**
- All files contain **89 header columns**
- No **NaN** values were detected
- No **Inf** values were detected
- Statistical behavior is highly consistent across files

### Key Observations

- The amplitude distribution is **highly skewed**
- Strong **amplitude outliers (values around 758)** are present
- Many traces contain **very low energy**
- A large number of traces are classified as **near-dead** under simple low-variance thresholds
- Header structure is consistent across the full dataset

### Interpretation

These findings suggest that:

- the dataset is structurally reliable
- normalization is necessary for robust downstream use
- direct raw-range plotting is not ideal because large outliers dominate visualization
- **percentile-based clipping** significantly improves seismic gather readability
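
Percentile-based clipping is easy to illustrate (a minimal sketch; `percentile_clip` is a hypothetical helper, not part of the package API):

```python
import numpy as np

def percentile_clip(traces: np.ndarray, pct: float = 99.0) -> np.ndarray:
    """Clip amplitudes symmetrically at a percentile of |amplitude|,
    so a few extreme values cannot dominate a plot's color range."""
    limit = np.percentile(np.abs(traces), pct)
    return np.clip(traces, -limit, limit)

demo = np.concatenate([np.linspace(-1.0, 1.0, 100), [758.0]])  # one outlier
clipped = percentile_clip(demo, pct=99.0)
print(demo.max(), clipped.max())  # 758.0 1.0 — the outlier no longer stretches the range
```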

---

## Sample Output Visuals

These plots are generated by the full-dataset analysis script; if they are kept in the repository, GitHub renders them inline:

### Maximum Absolute Amplitude per File
![Maximum Absolute Amplitude](analysis_outputs/max_abs_per_file.png)

### Standard Deviation per File
![Standard Deviation per File](analysis_outputs/std_per_file.png)

### Weak Trace Count per File
![Weak Trace Count](analysis_outputs/weak_trace_count_per_file.png)

---

## Installation

Install dependencies and register the package locally:

```bash
pip install -r requirements.txt
pip install -e .
```

---

## Usage

### Load a single SEG-Y file

```python
from seismoai_m1m2.seismoai_io import load_sgy

traces, headers = load_sgy("path/to/file.sgy")
print(traces.shape)
print(headers.head())
```

### Load a folder of SEG-Y files

```python
from seismoai_m1m2.seismoai_io import load_folder

files = load_folder("path/to/data_folder")
print(len(files))
```

### Normalize traces

```python
from seismoai_m1m2.seismoai_io import normalize_traces

normalized_traces = normalize_traces(traces, method="maxabs")
```

### Plot the seismic gather

```python
from seismoai_m1m2.seismoai_viz import plot_gather

plot_gather(traces, clip_mode="percentile")
```

### Plot one trace and its spectrum

```python
from seismoai_m1m2.seismoai_viz import plot_trace, plot_spectrum

plot_trace(traces, trace_index=2)
plot_spectrum(traces[2], sample_interval_ms=1.0)
```

---

## Running the Scripts

### Explore one representative file

```bash
python scripts/seismoai_explore.py
```

This script:

- loads one sample file
- prints trace summary statistics
- prints header summary
- computes trace diagnostics
- reports threshold sensitivity for near-dead traces
- validates the full dataset
- visualizes the gather, waveform, and spectrum

### Analyze the complete dataset

```bash
python scripts/analyze_all_sgy_files.py
```

This script:

- checks all 166 files
- generates a CSV summary
- compares header structure
- creates dataset-level plots
- saves all outputs in `analysis_outputs/`

---

## Testing

Run all tests from the project root:

```bash
pytest -q
```

### Current result
- **7 tests passed successfully**

The tests cover:

- single-file loading
- folder loading
- normalization
- full dataset validation
- gather plotting
- trace plotting
- spectrum plotting

---

## Why This Submission Meets the Assignment Requirements
- working functions for **Module 1** and **Module 2**
- real SGY data support
- clear, well-structured docstrings for all implemented functions, as required
- tests for all required functions
- dataset understanding before model-related steps
- validation across the full set of **166 SEG-Y files**
- professional project structure suitable for GitHub and packaging

---

