Metadata-Version: 2.4
Name: kaggleease
Version: 1.3.1
Summary: The fastest, notebook-first way to load Kaggle datasets into pandas with one line.
Author: KaggleEase Contributors
License: MIT License
        
        Copyright (c) 2025 KaggleEase Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/Dinesh-raya/kaagleease
Project-URL: Repository, https://github.com/Dinesh-raya/kaagleease
Project-URL: Documentation, https://github.com/Dinesh-raya/kaagleease#readme
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=7.0
Requires-Dist: ipython>=7.0.0
Requires-Dist: kagglehub>=0.1.0
Requires-Dist: pandas>=1.0.0
Requires-Dist: pyarrow>=1.0.0
Requires-Dist: requests>=2.25.0
Requires-Dist: psutil>=5.0.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Dynamic: license-file

# KaggleEase

The fastest, notebook-first way to load Kaggle datasets into pandas with one line.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/)

---

## Installation

```bash
pip install kaggleease
```

## Quick Start

```python
from kaggleease import load
# A bare slug is enough: the loader searches Kaggle and resolves it to 'kaggle/titanic'
df = load("titanic")
```

## ✨ New in v1.3.0: Intelligent Loader
KaggleEase now resolves imprecise dataset handles and, when a load fails, suggests how to fix the problem.

- **Fuzzy Handle Matching**: Made a typo? `load("titanik")` will suggest `titanic`.
- **Implicit Slugs**: No need for owners. `load("housing")` finds the most relevant dataset.
- **Actionable Errors**: In notebooks, errors appear as clean Markdown blocks with **💡 Fix Suggestions**.
- **Lightweight REST Core**: Replaced the heavy `kaggle` library (~10MB) with a 50KB REST core.
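
The fuzzy matching idea can be illustrated with the standard library. This is a minimal sketch, not KaggleEase's actual implementation: the `suggest_handle` function and the `known` candidate list are hypothetical, and a real loader would presumably get its candidates from a Kaggle search.

```python
import difflib


def suggest_handle(query, known_slugs, cutoff=0.6):
    """Return the known slug closest to `query`, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query.lower(), known_slugs, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical candidate list, used only for illustration
known = ["titanic", "housing", "credit-risk"]

print(suggest_handle("titanik", known))  # titanic
print(suggest_handle("zzzz", known))     # None
```

Raising `cutoff` makes suggestions stricter; lowering it tolerates larger typos at the cost of more false matches.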

## Usage in Google Colab

### 1. Install the package
In a Colab cell, run:
```python
!pip install kaggleease
```

### 2. Use the module
```python
from kaggleease import load

df = load("titanic")
print(df.head())
```

### 3. Magic commands in Colab
```python
# Load dataset into a variable named 'df'
%kaggle load titanic --as df

# Preview a dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"
```

## Usage in Local Jupyter Notebooks

### 1. Install the package
```bash
pip install kaggleease
```

### 2. Set up Kaggle credentials
- Go to https://www.kaggle.com/account
- Download your `kaggle.json` file
- Place it in `~/.kaggle/kaggle.json` (or `%USERPROFILE%\.kaggle\kaggle.json` on Windows)
- On Linux and macOS, set file permissions to 600 (read/write for owner only): `chmod 600 ~/.kaggle/kaggle.json`
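
If you want to verify the credentials file from Python, something like the sketch below may help. It assumes the standard Kaggle token layout (a JSON object with `username` and `key` fields). The helper names `check_kaggle_credentials` and `restrict_to_owner` are hypothetical and not part of KaggleEase.

```python
import json
import os
import stat


def check_kaggle_credentials(path):
    """Return a list of human-readable problems found with a kaggle.json file."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]

    problems = []
    # On POSIX systems, any group/other permission bit means the file is too open
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if os.name == "posix" and mode & 0o077:
        problems.append(f"permissions are {mode:o}; expected 600")

    with open(path) as fh:
        try:
            data = json.load(fh)
        except json.JSONDecodeError:
            return problems + ["file is not valid JSON"]

    for field in ("username", "key"):
        if not isinstance(data, dict) or not data.get(field):
            problems.append(f"missing field: {field}")
    return problems


def restrict_to_owner(path):
    """Equivalent of `chmod 600`: owner read/write only."""
    os.chmod(path, 0o600)
```

Calling `check_kaggle_credentials(os.path.expanduser("~/.kaggle/kaggle.json"))` returns an empty list when the file looks correct.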

### 3. Use the module
```python
from kaggleease import load

df = load("titanic")
print(df.head())
```

### 4. Magic commands in Jupyter
```python
# Load dataset into a specific variable
%kaggle load titanic --as df

# Preview dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"
```

## Advanced Features

### Progress Indication
Downloads of files larger than 100MB show a progress bar automatically:
```python
from kaggleease import load
df = load("large-dataset", timeout=600)  # Progress shown for large files
```

### Thread-Safe Authentication
Multiple concurrent operations are supported safely:
```python
import threading
from kaggleease import load

results = {}

def load_dataset(handle):
    # Thread targets cannot return values, so store each DataFrame by handle
    results[handle] = load(handle)

# Safe to run concurrently
handles = ["titanic", "housing"]
threads = [threading.Thread(target=load_dataset, args=(h,)) for h in handles]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
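
When many datasets are loaded at once, a bounded thread pool that records failures per handle can be more convenient than raw threads. The sketch below is one possible pattern, not part of KaggleEase. The `load_many` helper is hypothetical and takes the loader as an argument, so it works with `load` or with any other callable.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_many(handles, loader, max_workers=4):
    """Run loader(handle) for each handle in a bounded thread pool.

    Returns (results, errors): two dicts keyed by handle, so one failed
    download does not discard the datasets that loaded successfully.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(loader, handle): handle for handle in handles}
        for future in as_completed(futures):
            handle = futures[future]
            try:
                results[handle] = future.result()
            except Exception as exc:  # keep going; report the failure by handle
                errors[handle] = exc
    return results, errors
```

With KaggleEase installed, `results, errors = load_many(["titanic", "housing"], load)` would be the intended usage, relying on the thread safety described above.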

### Retry Logic with Exponential Backoff
Failed network requests are retried automatically, with the delay doubling after each attempt:
- First retry after 1s
- Second retry after 2s
- Third retry after 4s
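
The schedule above can be sketched as follows. This only illustrates the general technique. KaggleEase's internal retry code may differ, and `retry_with_backoff` and its parameters are hypothetical names chosen for this example.

```python
import time


def retry_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep,
                       exceptions=(ConnectionError, TimeoutError)):
    """Call func(); on a listed exception, wait base_delay * 2**attempt and retry.

    With the defaults, the waits are 1s, 2s, and 4s. After `retries` failed
    retries, the last exception is re-raised.
    """
    attempt = 0
    while True:
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

The `sleep` argument is injectable so that the delays can be observed or skipped in tests. Because the library already retries internally, wrapping `load()` this way should not normally be necessary.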

## Notebook Magic

KaggleEase provides a `%kaggle` notebook magic with `load`, `preview`, and `search` subcommands, each of which accepts a `--timeout` option:

```python
# Load the titanic dataset into a variable named 'df'
%kaggle load titanic --as df --timeout 600

# Preview the first few rows of a dataset
%kaggle preview titanic --timeout 30

# Search for datasets
%kaggle search "credit risk" --timeout 30

# Load specific file from dataset
%kaggle load kaggle/titanic --file train.csv --as train_df
```

## Command Line Interface

The `kaggleease` command exposes the same operations from the shell:

```bash
# Load a dataset with custom timeout
kaggleease load kaggle/titanic --timeout 600

# Preview a dataset
kaggleease preview kaggle/titanic

# Search for datasets with result limit
kaggleease search "credit risk" --top 10

# Load specific file from dataset
kaggleease load kaggle/titanic --file train.csv

# 🚀 NEW: Set up shell completion
kaggleease completion --shell zsh # or bash/fish
```

## API Reference

### `load(dataset_handle, file=None, timeout=300)`

Load a Kaggle dataset into a pandas DataFrame with automatic progress indication for large files.

**Parameters:**
- `dataset_handle` (str): The Kaggle dataset handle, either in the full format "owner/dataset-name" or as a bare slug such as "titanic", which is resolved by search
- `file` (str, optional): Specific file to load from the dataset
- `timeout` (int): Timeout in seconds for API calls and downloads (default: 300)

**Returns:**
- `pandas.DataFrame`: The loaded dataset

**Features:**
- Automatic progress indication for files > 100MB
- Thread-safe authentication
- Retry logic with exponential backoff
- Structured logging

### `search(query, top=5, timeout=30)`

Search for Kaggle datasets.

**Parameters:**
- `query` (str): Search query
- `top` (int): Maximum number of results to return (default: 5)
- `timeout` (int): Timeout in seconds for the search operation (default: 30)

**Returns:**
- `list`: List of dataset information dictionaries

### `ProgressBar` and `show_progress`

Progress indication utilities for large downloads:
```python
from kaggleease.progress import ProgressBar

progress = ProgressBar(1000, "Download")
progress.update(500)  # Update with downloaded bytes
progress.complete()
```

## Troubleshooting

### Authentication Error
If you encounter authentication errors, make sure you have your Kaggle API credentials set up:

1. Go to https://www.kaggle.com/account and download your `kaggle.json` file
2. Place it in `~/.kaggle/kaggle.json` (or `%USERPROFILE%\.kaggle\kaggle.json` on Windows)
3. Ensure the file has restricted permissions (600)

### Large Dataset Warning
When loading large datasets (>5GB), you'll see a warning. Consider whether you need the entire dataset or can work with a subset, for example by loading a single file with the `file` parameter.

### Timeout Errors
If you're experiencing timeout errors, try increasing the timeout value:
```python
# Increase timeout to 600 seconds
df = load("titanic", timeout=600)
```

### Progress Indication
For large downloads, progress is shown automatically. For manual control:
```python
from kaggleease.progress import show_progress
progress = show_progress(250, 1000, "Download")  # 250 bytes of 1000 total
```

### Thread Safety
The library is thread-safe for concurrent operations. Multiple threads can safely call `load()` simultaneously.

## Security Considerations

- Authentication credentials are stored securely with proper file permissions (600)
- Thread-safe authentication prevents credential corruption in concurrent environments
- All network operations include timeout protection

