Metadata-Version: 2.4
Name: kaggleease
Version: 1.3.8
Summary: The fastest, notebook-first way to load Kaggle datasets into pandas with one line.
Author: KaggleEase Contributors
License: MIT License
        
        Copyright (c) 2025 KaggleEase Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/Dinesh-raya/kaagleease
Project-URL: Repository, https://github.com/Dinesh-raya/kaagleease
Project-URL: Documentation, https://github.com/Dinesh-raya/kaagleease#readme
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=7.0
Requires-Dist: ipython>=7.0.0
Requires-Dist: kagglehub>=0.1.0
Requires-Dist: pandas>=1.0.0
Requires-Dist: pyarrow>=1.0.0
Requires-Dist: requests>=2.25.0
Requires-Dist: psutil>=5.0.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Dynamic: license-file

# KaggleEase

The fastest, notebook-first way to load Kaggle datasets into pandas with one line.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/)

---

## Installation

```bash
pip install kaggleease
```

## Quick Start

```python
!pip install kaggleease --upgrade
from kaggleease import load, search, __version__
print(__version__)  # 1.3.8 or newer

df = load("titanic")  # competition slugs such as "titanic" are resolved automatically
```

## Usage in Google Colab

### 1. Install the package
In a Colab cell, run:
```python
!pip install kaggleease
```

### 2. Use the module
```python
from kaggleease import load

df = load("titanic")
print(df.head())
```

### 3. Magic commands in Colab
```python
# Load dataset into a variable named 'df'
%kaggle load titanic --as df

# Preview a dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"
```

## Usage in Local Jupyter Notebooks

### 1. Install the package
```bash
pip install kaggleease
```

### 2. Set up Kaggle credentials
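- Go to https://www.kaggle.com/account
- In the **API** section, create a new token; this downloads your `kaggle.json` file
- Place it in `~/.kaggle/kaggle.json` (or `%USERPROFILE%\.kaggle\kaggle.json` on Windows)
- Restrict its permissions to 600 (read/write for the owner only), e.g. `chmod 600 ~/.kaggle/kaggle.json` on macOS/Linux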
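If you already have your username and API key (for example, copied from an existing `kaggle.json`), the steps above can also be done from Python. This is a minimal sketch using only the standard library; `write_kaggle_credentials` is a hypothetical helper, not part of KaggleEase, and it assumes the usual `{"username": ..., "key": ...}` layout of `kaggle.json`.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username, key, config_dir=None):
    """Hypothetical helper (not part of kaggleease): write kaggle.json with owner-only permissions."""
    # Default to the location the Kaggle tools read from: ~/.kaggle
    config_dir = Path(config_dir) if config_dir is not None else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    # 0o600 = read/write for the owner only (same as `chmod 600`);
    # on Windows, os.chmod can only toggle the read-only flag
    os.chmod(path, 0o600)
    return path
```

Calling `write_kaggle_credentials("your-username", "your-api-key")` writes `~/.kaggle/kaggle.json` and returns its path.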

### 3. Use the module
```python
from kaggleease import load

df = load("titanic")
print(df.head())
```

### 4. Magic commands in Jupyter
```python
# Load dataset into a specific variable
%kaggle load titanic --as df

# Preview dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"
```

## Advanced Features

### Progress Indication
Downloads of large files (over 100MB) show a progress bar automatically:
```python
from kaggleease import load
df = load("owner/large-dataset", timeout=600)  # placeholder handle; extra time for a big download
```

### Thread-Safe Authentication
Multiple concurrent operations are supported safely:
```python
import threading
from kaggleease import load

results = [None] * 5

def load_dataset(index, handle):
    # Store each DataFrame; a bare lambda target would discard the return value
    results[index] = load(handle)

# Safe to run concurrently
threads = [threading.Thread(target=load_dataset, args=(i, "titanic")) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
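The README does not say how KaggleEase makes authentication thread-safe. One common way to get this behavior is a lock that guarantees the credential lookup runs only once, however many threads ask for it at the same moment. The sketch below illustrates that pattern under that assumption; `OneTimeAuthenticator` is a hypothetical name, not the library's internal class.

```python
import threading


class OneTimeAuthenticator:
    """Hypothetical helper: run an authentication callable at most once across threads."""

    def __init__(self, authenticate):
        self._authenticate = authenticate
        self._lock = threading.Lock()
        self._credentials = None
        self._done = False

    def get(self):
        # Fast path: once authenticated, skip the lock entirely
        if self._done:
            return self._credentials
        with self._lock:
            # Re-check inside the lock: another thread may have finished first
            if not self._done:
                self._credentials = self._authenticate()
                self._done = True
        return self._credentials
```

Every thread calls `get()`; only the first one to take the lock performs the (possibly slow) authentication, and the rest reuse its result.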

### Retry Logic with Exponential Backoff
Network failures are retried automatically, with the wait doubling after each attempt:
- First retry after 1s
- Second retry after 2s
- Third retry after 4s
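The schedule above is plain exponential backoff: the delay before retry *n* is `base_delay * 2**n`. A minimal sketch of that policy, assuming only connection and timeout errors are worth retrying; `retry_with_backoff` is a hypothetical helper, not necessarily how KaggleEase implements it.

```python
import time


def retry_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep,
                       retry_on=(ConnectionError, TimeoutError)):
    """Call func(); on a retryable error wait base_delay * 2**attempt seconds
    (1s, 2s, 4s with the defaults) and try again, up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the sketch testable, since a test can record the delays instead of waiting. HTTP clients such as `requests` raise their own exception types, so a real caller would pass those in `retry_on`.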

## Notebook Magic

The `%kaggle` notebook magic supports `load`, `preview`, and `search`, with `--as`, `--file`, and `--timeout` options:

```python
# Load the titanic dataset into a variable named 'df'
%kaggle load titanic --as df --timeout 600

# Preview the first few rows of a dataset
%kaggle preview titanic --timeout 30

# Search for datasets
%kaggle search "credit risk" --timeout 30

# Load specific file from dataset
%kaggle load kaggle/titanic --file train.csv --as train_df
```

## Command Line Interface

KaggleEase also provides a `kaggleease` command-line tool with the same `load`, `preview`, and `search` commands:

```bash
# Load a dataset with custom timeout
kaggleease load kaggle/titanic --timeout 600

# Preview a dataset
kaggleease preview kaggle/titanic

# Search for datasets with result limit
kaggleease search "credit risk" --top 10

# Load specific file from dataset
kaggleease load kaggle/titanic --file train.csv
```

## API Reference

### `load(dataset_handle, file=None, timeout=300)`

Load a Kaggle dataset into a pandas DataFrame with automatic progress indication for large files.

**Parameters:**
- `dataset_handle` (str): The Kaggle handle, usually in the format "owner/dataset-name"; a bare slug such as "titanic" (including competition names) is also resolved automatically
- `file` (str, optional): Specific file to load from the dataset
- `timeout` (int): Timeout in seconds for API calls and downloads (default: 300)

**Returns:**
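- `pandas.DataFrame`: The loaded dataset; if no supported tabular file is found (e.g. an image dataset), the local directory path is returned instead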

**Features:**
- Automatic progress indication for files > 100MB
- Thread-safe authentication
- Retry logic with exponential backoff
- Structured logging

### `search(query, top=5, timeout=30)`

Search for Kaggle datasets.

**Parameters:**
- `query` (str): Search query
- `top` (int): Maximum number of results to return (default: 5)
- `timeout` (int): Timeout in seconds for the search operation (default: 30)

**Returns:**
- `list`: List of dataset information dictionaries

### `ProgressBar` and `show_progress`

Progress indication utilities for large downloads:
```python
from kaggleease.progress import ProgressBar

progress = ProgressBar(1000, "Download")
progress.update(500)  # Update with downloaded bytes
progress.complete()
```
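To make the `update`/`complete` interface concrete, here is a stand-alone text progress bar with the same shape. It is a hypothetical sketch, not the code behind `kaggleease.progress.ProgressBar`; in particular, it assumes `update()` receives the number of bytes downloaded since the previous call, which the example above does not specify.

```python
import sys


class TextProgressBar:
    """Hypothetical stand-in for a byte-count progress bar (not kaggleease's class)."""

    def __init__(self, total, description="", width=30, stream=None):
        self.total = total
        self.description = description
        self.width = width
        # Resolve sys.stdout at call time so notebook output redirection is respected
        self.stream = stream if stream is not None else sys.stdout
        self.current = 0

    def render(self):
        # Clamp to 100% in case more bytes arrive than `total` promised
        fraction = min(self.current / self.total, 1.0) if self.total else 1.0
        filled = int(fraction * self.width)
        bar = "#" * filled + "-" * (self.width - filled)
        return f"{self.description} [{bar}] {fraction:.0%}"

    def update(self, n_bytes):
        # Assumption: n_bytes is an increment, not a running total
        self.current += n_bytes
        self.stream.write("\r" + self.render())  # "\r" redraws the same line
        self.stream.flush()

    def complete(self):
        self.current = self.total
        self.stream.write("\r" + self.render() + "\n")
        self.stream.flush()
```

For example, `TextProgressBar(1000, "Download", width=10)` followed by `update(500)` draws `Download [#####-----] 50%`.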

## Troubleshooting

### Authentication Error
If you encounter authentication errors, make sure you have your Kaggle API credentials set up:

1. Go to https://www.kaggle.com/account and download your `kaggle.json` file
2. Place it in `~/.kaggle/kaggle.json` (or `%USERPROFILE%\.kaggle\kaggle.json` on Windows)
3. Ensure the file has restricted permissions (600), e.g. `chmod 600 ~/.kaggle/kaggle.json` on macOS/Linux
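To check these three points from Python, a diagnostic along these lines can help. `check_kaggle_credentials` is a hypothetical helper, not part of KaggleEase; it assumes the default `~/.kaggle/kaggle.json` location and only inspects permission bits on POSIX systems, since Windows does not use Unix-style modes.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_credentials(path=None):
    """Hypothetical diagnostic (not part of kaggleease): list problems with kaggle.json."""
    path = Path(path) if path is not None else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return [f"{path} not found"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group/other permission bit means the file is more open than 600
    if os.name == "posix" and mode & 0o077:
        problems.append(f"{path} has mode {oct(mode)}; run chmod 600 on it")
    try:
        data = json.loads(path.read_text())
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return problems + [f"{path} is not valid JSON"]
    if not isinstance(data, dict):
        return problems + [f"{path} should contain a JSON object"]
    for field in ("username", "key"):
        if field not in data:
            problems.append(f"{path} is missing '{field}'")
    return problems
```

An empty list means none of these checks found a problem; each message otherwise names the file and what to fix.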

### Large Dataset Warning
When loading datasets larger than 5GB, you'll see a warning. Consider whether you really need the entire dataset or can work with a subset, for example a single file via the `file` parameter.

### Timeout Errors
If you're experiencing timeout errors, try increasing the timeout value:
```python
# Increase timeout to 600 seconds
df = load("titanic", timeout=600)
```

### Progress Indication
For large downloads, progress is shown automatically. For manual control:
```python
from kaggleease.progress import show_progress
progress = show_progress(250, 1000, "Download")  # 250 bytes of 1000 total
```

### Thread Safety
The library is thread-safe for concurrent operations. Multiple threads can safely call `load()` simultaneously.

## Intelligence Features (v1.3.8+)

Since v1.3.8, KaggleEase handles a much wider range of Kaggle data:
- **Automatic Competition Detection**: `load("titanic")` loads the Titanic competition data without extra steps.
- **Universal Formats**: Native support for **CSV, Parquet, JSON, Excel** (`.xlsx`, `.xls`), and **SQLite** (`.sqlite`, `.db`).
- **No-Crash Fallback**: If a dataset contains images or other non-tabular files, `load()` returns the **local directory path** instead of crashing.
- **Deep Scanning**: Finds your data even if it's buried in subdirectories.
- **Implicit Resolution**: Still resolves `load("slug")` to the best owner/slug match.
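The format support, deep scanning, and no-crash fallback can be pictured as one search over the downloaded folder. The sketch below is an assumption about how such logic might look, not KaggleEase's actual code: it walks every subdirectory, picks the first file whose extension is in the supported list, and otherwise returns the directory itself. `resolve_download` and `TABULAR_FORMATS` are hypothetical names.

```python
from pathlib import Path
from typing import Tuple, Union

# Hypothetical extension -> format mapping mirroring the formats listed above
TABULAR_FORMATS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
    ".sqlite": "sqlite",
    ".db": "sqlite",
}


def resolve_download(root: Union[str, Path]) -> Union[Tuple[Path, str], Path]:
    """Return (file_path, format) for the first tabular file anywhere under root,
    or root itself when nothing tabular is found."""
    root = Path(root)
    # rglob("*") also descends into subdirectories ("deep scanning");
    # sorting makes the choice deterministic across platforms
    for path in sorted(root.rglob("*")):
        fmt = TABULAR_FORMATS.get(path.suffix.lower())
        if fmt is not None and path.is_file():
            return path, fmt
    # No-crash fallback: hand the directory back for the caller to inspect
    return root
```

A caller would then dispatch on the format, for example to `pandas.read_csv`, `pandas.read_parquet`, `pandas.read_json`, or `pandas.read_excel`, or to `pandas.read_sql` with a `sqlite3` connection for `.sqlite`/`.db` files.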

---
