Metadata-Version: 2.4
Name: kaggle-utils-dataset
Version: 0.1.0b1
Summary: Automated workflow for dataset discovery and downloading in ML/DL and data analysis projects.
Author: Nobrain711
License: MIT License
        
        Copyright (c) 2026 Nobrain711
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: kaggle>=1.5
Dynamic: license-file

# kaggle_utils (beta 0.1.0)

![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
![Python](https://img.shields.io/badge/Python-3.10%2B-blue)
![Status](https://img.shields.io/badge/status-beta-orange)

Utilities for checking, downloading, and extracting Kaggle datasets
within ML/DL and data analysis workflows.

---

## Features

* Check whether a dataset directory exists and list the files it contains
* Download and unzip Kaggle datasets using the Kaggle CLI
* Configure the dataset root directory via the `KAGGLE_INPUT_DIR` environment variable
* Lightweight, reusable design for notebook and local workflows

---

## Requirements

* Python 3.10+
* Kaggle CLI installed and authenticated

### Install Kaggle CLI

```bash
pip install kaggle
```

> Ensure the `kaggle` command is available in your `PATH`.

Configure your Kaggle API token:

1. Download `kaggle.json` from your Kaggle account
2. Place it in:

```
~/.kaggle/kaggle.json
```

Restrict the file's permissions so that only your user can read it:

```bash
chmod 600 ~/.kaggle/kaggle.json
```
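
The setup steps above can also be verified from Python. The following is a minimal sketch and not part of `kaggle_utils`: the helper name `preflight_problems` is hypothetical, and it assumes the default token location `~/.kaggle/kaggle.json` on a POSIX system.

```python
import shutil
import stat
from pathlib import Path

# Default token location described above.
DEFAULT_TOKEN = Path.home() / ".kaggle" / "kaggle.json"


def preflight_problems(token_path: Path = DEFAULT_TOKEN) -> list[str]:
    """Return human-readable setup problems (an empty list if none are found)."""
    problems = []

    # The CLI must be discoverable on PATH.
    if shutil.which("kaggle") is None:
        problems.append("kaggle command not found in PATH")

    if not token_path.is_file():
        problems.append(f"token file missing: {token_path}")
    else:
        # chmod 600 leaves no permission bits set for group or others.
        mode = stat.S_IMODE(token_path.stat().st_mode)
        if mode & 0o077:
            problems.append(f"token file permissions too open: {oct(mode)}")

    return problems
```

Calling `preflight_problems()` before the first download should surface a missing CLI or a token file with overly broad permissions early.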

---

## Installation

### Local (Editable Mode – Recommended for Development)

From the project root:

```bash
pip install -e .
```

---

## Configuration

The dataset root directory defaults to:

```
datasets
```

Override it with the `KAGGLE_INPUT_DIR` environment variable:

macOS / Linux:

```bash
export KAGGLE_INPUT_DIR="datasets"
```

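Windows (PowerShell). `setx` persists the value for new sessions only, so set `$env:` as well to apply it to the current one:

```PowerShell
$env:KAGGLE_INPUT_DIR = "datasets"   # current session
setx KAGGLE_INPUT_DIR "datasets"     # future sessions
```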
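
How the package reads this setting internally is not shown in this README. As a rough sketch of the documented behavior (the function name `resolve_dataset_root` and the handling of an empty value are assumptions, not the actual contents of `config.py`):

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_ROOT = "datasets"      # documented default
ENV_VAR = "KAGGLE_INPUT_DIR"   # documented variable name


def resolve_dataset_root(env: Mapping[str, str] = os.environ) -> Path:
    """Return the dataset root, preferring the environment variable when set."""
    # Assumption: an unset or empty variable falls back to the default.
    return Path(env.get(ENV_VAR) or DEFAULT_ROOT)
```

Passing the environment as a parameter keeps the lookup easy to exercise without mutating `os.environ`.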

---

## Usage

### 1️⃣ Check Local Dataset

```python
from kaggle_utils import check

result = check("titanic")

if not result:
    print("Dataset not found or empty.")
else:
    print(result)
```

Example return:

```python
{
    "datasets/titanic": ["train.csv", "test.csv"]
}
```
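
To illustrate the return shape, here is a hypothetical stand-in that produces the same kind of mapping. It is a sketch under stated assumptions, not the package's `check()` implementation: the name `list_dataset_files`, the `root` parameter, and the sorted file order are all illustrative choices.

```python
from pathlib import Path


def list_dataset_files(name: str, root: str = "datasets") -> dict[str, list[str]]:
    """Map the dataset directory to its file names, or return {} if absent or empty."""
    dataset_dir = Path(root) / name
    if not dataset_dir.is_dir():
        return {}  # directory is missing

    # Only regular files are listed; subdirectories are ignored in this sketch.
    files = sorted(p.name for p in dataset_dir.iterdir() if p.is_file())
    if not files:
        return {}  # directory exists but holds no files

    return {dataset_dir.as_posix(): files}
```

Because an empty dict is falsy, a caller can test the result with a plain `if not result:` as in the usage example above.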

---

### 2️⃣ Download Dataset

```python
from kaggle_utils import download

files = download(owner="heptapod", dataset="titanic")

print(files)
```

If the dataset does not exist or the Kaggle CLI fails, `download()` raises a `RuntimeError`.
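
For readers curious how such a wrapper around the Kaggle CLI can look, the sketch below builds a `kaggle datasets download` command and converts a failed run into a `RuntimeError`. The function names, the `root` and `runner` parameters, and the returned list of file names are assumptions made for illustration; the package's actual `download()` may differ.

```python
import subprocess
from pathlib import Path


def build_download_command(owner: str, dataset: str, dest: Path) -> list[str]:
    """Build the Kaggle CLI invocation that downloads and unzips a dataset."""
    return [
        "kaggle", "datasets", "download",
        "-d", f"{owner}/{dataset}",  # dataset slug in owner/name form
        "-p", str(dest),             # target directory
        "--unzip",                   # extract the archive after download
    ]


def download_sketch(owner, dataset, root="datasets", runner=subprocess.run):
    """Run the CLI and return the downloaded file names (return type is an assumption)."""
    dest = Path(root) / dataset
    dest.mkdir(parents=True, exist_ok=True)
    command = build_download_command(owner, dataset, dest)
    try:
        proc = runner(command, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # Raised by subprocess when the executable is not on PATH.
        raise RuntimeError("kaggle command not found in PATH") from exc
    if proc.returncode != 0:
        raise RuntimeError(f"kaggle CLI failed: {proc.stderr.strip()}")
    return sorted(p.name for p in dest.iterdir() if p.is_file())
```

The `runner` parameter exists only so the sketch can be exercised without network access or credentials.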

---

## Error Handling

* `check()` returns an empty dict (`{}`) if the dataset is missing
* `download()` raises a `RuntimeError` on failure
* Kaggle Notebook environments are not supported
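
One way to combine these two behaviors is to check first and download only on a miss. The helper below is a hypothetical sketch, not part of the package: `ensure_dataset` is an illustrative name, and it assumes the dataset name passed to `check()` matches the `dataset` argument of `download()`. The two functions are passed in as arguments so the pattern does not depend on how they are imported.

```python
from typing import Callable


def ensure_dataset(
    owner: str,
    dataset: str,
    check: Callable[[str], dict],
    download: Callable[..., object],
) -> dict:
    """Return the local listing, downloading the dataset first if it is missing."""
    found = check(dataset)
    if found:
        return found  # already present locally, nothing to download

    try:
        download(owner=owner, dataset=dataset)
    except RuntimeError as exc:
        # Add context while keeping the original error as the cause.
        raise RuntimeError(f"could not fetch {owner}/{dataset}") from exc

    return check(dataset)
```

With the package installed, this could be called as `ensure_dataset("heptapod", "titanic", check, download)` after `from kaggle_utils import check, download`.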

---

## Project Structure

```
kaggle_utils/
  __init__.py
  config.py
  check.py
  download.py
```

---

## Design Philosophy

* Separation of concerns
* Minimal external dependencies
* Clear failure handling strategy
* Designed for local ML experimentation

---

## Roadmap

### beta 0.2.0

* download_if_not_exists() orchestration helper
* Logging support
* Retry mechanism

### 1.0.0

* Stable API
* PyPI publication

---

## License

![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)

Distributed under the MIT License.
See [LICENSE](LICENSE) for more information.

---
