Metadata-Version: 2.4
Name: climate-indepth-analysis
Version: 0.1.0
Summary: A growing toolkit for climate data quality checks, EDA, and analysis.
Author: Kayode Adebayo
License-Expression: MIT
Project-URL: Homepage, https://github.com/Kaysharp-cloud/climate-indepth-analysis
Project-URL: Repository, https://github.com/Kaysharp-cloud/climate-indepth-analysis
Project-URL: Issues, https://github.com/Kaysharp-cloud/climate-indepth-analysis/issues
Keywords: climate,climate-data,missing-data,eda,stations,precipitation,temperature,hydrology
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Topic :: Scientific/Engineering :: Hydrology
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: pandas>=1.5
Provides-Extra: parquet
Requires-Dist: pyarrow>=10; extra == "parquet"
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pyarrow>=10; extra == "dev"
Dynamic: license-file

# Climate Indepth Analysis

`climate-indepth-analysis` is a Python package for climate data quality checks and exploratory data analysis (EDA), with broader climate analysis workflows planned for future releases.

This first release focuses on EDA for daily station climate data. It measures how complete a station dataset is by building a continuous daily calendar for every station before counting missing values. This matters because many climate station files omit entire dates instead of storing them with `NaN` values, so counting `NaN` values in the raw file alone understates how much data is missing.
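
The idea can be sketched with plain pandas. This is a minimal illustration of the full-calendar technique on a toy table, not the package's internal code; the column names follow the defaults listed in this README, and the data is made up.

```python
import pandas as pd

# Toy raw file: station A has no row at all for 1980-01-03.
raw = pd.DataFrame({
    "STATION": ["A", "A", "A", "B", "B", "B", "B"],
    "DATE": pd.to_datetime([
        "1980-01-01", "1980-01-02", "1980-01-04",
        "1980-01-01", "1980-01-02", "1980-01-03", "1980-01-04",
    ]),
    "PRCP": [0.0, 1.2, None, 0.0, 0.0, 3.4, 0.1],
})

# Build every (station, day) pair for the analysis window.
days = pd.date_range("1980-01-01", "1980-01-04", freq="D")
full_index = pd.MultiIndex.from_product(
    [raw["STATION"].unique(), days], names=["STATION", "DATE"]
)

# Reindexing inserts NaN rows for dates that are absent from the raw file.
full = raw.set_index(["STATION", "DATE"]).reindex(full_index)

naive_missing = int(raw["PRCP"].isna().sum())   # only the stored NaN values
true_missing = int(full["PRCP"].isna().sum())   # stored NaN values plus skipped dates

print(naive_missing, true_missing)
```

On this toy table the raw file contains one stored `NaN`, while the full calendar reveals two missing station-days, because the skipped date is now counted.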

## Current features

- Count the number of climate stations in a file
- Support custom station ID columns such as `STATION`, `station`, `ID`, `station_id`, or `site_no`
- Support custom date columns such as `DATE`, `date`, `datetime`, or `time`
- Use default climate variables `PRCP`, `TMAX`, and `TMIN`
- Allow users to choose other variables
- Create a full station-date calendar before missing-data calculation
- Detect missing dates that are absent from the raw file
- Summarize missingness overall, by station, and by month
- Report longest missing streaks for each station and variable
- Provide descriptive statistics for selected variables
- Run in Jupyter or from the command line
- Keep all outputs in memory by default
- Save output files only when the user requests it
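
As an illustration of the missing-streak and monthly missingness features, here is a small pandas sketch for a single station and variable. `longest_missing_streak` is a hypothetical helper written for this example, not a function exported by the package, and it assumes the series already sits on a full daily calendar.

```python
import pandas as pd

def longest_missing_streak(values: pd.Series) -> int:
    """Return the length of the longest run of consecutive NaN values."""
    is_missing = values.isna()
    # A new run id starts every time the missing/present state flips.
    run_id = (is_missing != is_missing.shift()).cumsum()
    # Summing the boolean flags per run gives the length of each missing run
    # (runs of present values sum to zero).
    run_lengths = is_missing.groupby(run_id).sum()
    return int(run_lengths.max()) if len(run_lengths) else 0

# Toy daily series for one station, already on a continuous calendar.
days = pd.date_range("1980-01-01", periods=8, freq="D")
tmax = pd.Series([20.1, None, None, 21.0, None, None, None, 19.5], index=days)

streak = longest_missing_streak(tmax)

# Fraction of missing days per calendar month.
monthly_fraction = tmax.isna().groupby(tmax.index.month).mean()

print(streak)
print(monthly_fraction)
```

For many stations, the same helper could be applied per group, for example with `groupby("STATION")` on the full calendar.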

## Planned direction

This package is designed to grow beyond EDA. Future releases may include tools for climate data download, cleaning, spatial aggregation, trend analysis, drought analysis, precipitation indices, temperature extremes, and visualization.

## Installation

After the package is published to PyPI:

```bash
pip install climate-indepth-analysis
```

For local development from this folder:

```bash
pip install -e .
```

## Jupyter usage

```python
from climate_indepth_analysis import run_eda

results = run_eda(
    input_path="my_climate_data.csv",
    station_col="STATION",
    date_col="DATE",
    needed_cols=["PRCP", "TMAX", "TMIN"],
    start_date="1980-01-01",
    end_date="2025-12-31",
    save_outputs=False,
)

results["overall"]
results["station_summary"].head()
results["monthly"].head()
```

## Save outputs only when needed

By default, the package writes no files. To save the summary tables and a text report, set `save_outputs=True`.

```python
results = run_eda(
    input_path="my_climate_data.csv",
    station_col="ID",
    date_col="date",
    save_outputs=True,
    output_dir="my_climate_eda_results",
)
```

The full station-date calendar holds one row per station per day, so it can be very large. It is not saved unless you also set `save_full_calendar=True`.

```python
results = run_eda(
    input_path="my_climate_data.csv",
    save_outputs=True,
    output_dir="my_climate_eda_results",
    save_full_calendar=True,
)
```
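
To get a feel for how large the full calendar can become, here is a rough back-of-the-envelope estimate. The station count is a made-up assumption; the date range and variables are the defaults from this README. The memory figure counts only the variable values as 8-byte floats, so treat it as a lower bound.

```python
import pandas as pd

# Hypothetical station count; date range and variables are the README defaults.
n_stations = 500
days = pd.date_range("1980-01-01", "2025-12-31", freq="D")
needed_cols = ["PRCP", "TMAX", "TMIN"]

n_days = len(days)              # 16802 days, both endpoints included
n_rows = n_stations * n_days    # one row per station per day

# Rough lower bound: 8 bytes per float64 value, ignoring the station/date index.
approx_mb = n_rows * len(needed_cols) * 8 / 1e6

print(f"{n_rows:,} rows, about {approx_mb:.0f} MB of values")
```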

## Command-line usage

Run without saving files:

```bash
climate-indepth-analysis --input my_climate_data.csv
```

Run with custom station and date columns:

```bash
climate-indepth-analysis --input my_climate_data.csv --station-col ID --date-col date
```

Save output files to a custom folder:

```bash
climate-indepth-analysis --input my_climate_data.csv --save-outputs --output-dir my_climate_eda_results
```

You can also use the shorter command:

```bash
climate-eda --input my_climate_data.csv
```

## Default settings

```python
station_col = "STATION"
date_col = "DATE"
needed_cols = ["PRCP", "TMAX", "TMIN"]
start_date = "1980-01-01"
end_date = "2025-12-31"
save_outputs = False
output_dir = "climate_indepth_analysis_output"
```

## Output dictionary

`run_eda()` returns a dictionary with these tables:

```python
results["diagnostics"]
results["inventory"]
results["overall"]
results["station_summary"]
results["monthly"]
results["variable_stats"]
results["full_calendar"]
```

## Files saved when `save_outputs=True`

```text
00_cleanliness_report.txt
01_file_calendar_diagnostics.csv
02_station_inventory_coverage.csv
03_overall_missing_summary.csv
04_station_missing_summary.csv
05_monthly_missing_summary_long.csv
06_variable_descriptive_stats.csv
```

If `save_full_calendar=True`, the package also saves:

```text
07_full_station_date_calendar.parquet
```

If Parquet support is unavailable (for example, when `pyarrow` from the optional `parquet` extra is not installed), the package falls back to CSV.
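
One common way to implement this kind of fallback is sketched below. It is not the package's actual code: `save_table` is a hypothetical helper, and it relies on pandas raising `ImportError` from `DataFrame.to_parquet` when no Parquet engine is installed.

```python
from pathlib import Path

import pandas as pd

def save_table(table: pd.DataFrame, path_stem: Path) -> Path:
    """Write a table as Parquet if an engine is available, else as CSV."""
    try:
        out = path_stem.with_suffix(".parquet")
        table.to_parquet(out)
    except ImportError:
        # pandas found no Parquet engine (pyarrow or fastparquet): use CSV.
        out = path_stem.with_suffix(".csv")
        table.to_csv(out, index=False)
    return out

# Example call with a toy table and a hypothetical output folder:
# save_table(pd.DataFrame({"PRCP": [0.0, 1.2]}),
#            Path("my_climate_eda_results") / "07_full_station_date_calendar")
```

The returned path shows which format was actually written, so callers can report it.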

## Build and upload to PyPI

Install build tools:

```bash
python -m pip install --upgrade build twine
```

Build the package:

```bash
python -m build
```

Check the package:

```bash
python -m twine check dist/*
```

Upload to TestPyPI first:

```bash
python -m twine upload --repository testpypi dist/*
```

Upload to PyPI:

```bash
python -m twine upload dist/*
```

## License

Released under the MIT License. See the `LICENSE` file for details.
