Metadata-Version: 2.4
Name: aikosh
Version: 0.1.0
Summary: Python SDK for the AIKosh platform (datasets, models, and more).
Author: AIKosh SDK contributors
License: Apache-2.0
Project-URL: Homepage, https://aikosh.indiaai.gov.in/home
Keywords: aikosh,indiaai,datasets,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: httpx<1,>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"

# AIKosh SDK (Python)

Open-source Python SDK for working with the AIKosh platform: **discover datasets and models, inspect files, and download assets** (whole packages or individual files).

## What is AIKosh SDK?

AIKosh SDK is a **developer-friendly Python library** that wraps the AIKosh platform’s APIs into **simple, stable functions** you can call from notebooks, scripts, and applications.

It is designed to feel like an ML developer tool (similar in spirit to `transformers`), where common workflows (search, browse files, download) are one import away.

## Why use an SDK (instead of calling APIs directly)?

Using an SDK helps developers by providing:

- **Simpler usage**: no manual URL construction, headers, or response parsing in every script.
- **Consistent patterns**: same function shapes for datasets and models (`list_directory`, `list_files`, `get_metadata`, `download`).
- **Safer downloads**: automatically fetches fresh temporary URLs and streams downloads to disk.
- **One place to evolve**: when the backend evolves, updating the SDK updates every downstream user.

## What this SDK aspires to help you build

Over time, the goal is to make it easy to build:

- **Repeatable data/model pipelines**: programmatic discovery + download for training/evaluation.
- **Dataset/model exploration tools**: list and traverse file trees for validation and QA.
- **Automation**: integrate AIKosh assets into CI workflows and internal platforms.

## Install

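From TestPyPI (current distribution name: `aikosh`):

```bash
# --extra-index-url lets pip resolve dependencies such as httpx from PyPI
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ aikosh
```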

For contributors / local development:

```bash
pip install -e ".[dev]"
```

## Configuration

### API key

Option A — environment variable:

```python
import os
os.environ["AIKOSH_API_KEY"] = "YOUR_KEY"
```

Option B — in code:

```python
import aikosh
aikosh.set_api_key("YOUR_KEY")
```

The `AIKOSH_ACCESS_KEY` environment variable is also supported as an alternative to `AIKOSH_API_KEY`.
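To check which key a script would pick up before calling the SDK, a small helper along these lines can be useful. This is a minimal sketch: `resolve_api_key` and `KEY_VARIABLES` are hypothetical names, and the lookup order (`AIKOSH_API_KEY` first) is an assumption, since the SDK's real precedence is not documented here.

```python
import os
from typing import Mapping, Optional

# Assumed lookup order; the SDK's actual precedence may differ.
KEY_VARIABLES = ("AIKOSH_API_KEY", "AIKOSH_ACCESS_KEY")


def resolve_api_key(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key found in the environment, or None."""
    for name in KEY_VARIABLES:
        value = environ.get(name, "").strip()
        if value:
            return value
    return None


key = resolve_api_key()
if key is None:
    print("No AIKosh key found; set AIKOSH_API_KEY before calling the SDK.")
```

Failing early with a clear message is usually easier to debug than a 401 from the first API call.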

## Asset identifiers (`id`)

List and metadata responses expose each dataset or model under the **`id`** field. Use that value everywhere the SDK expects an **`identifier`** (or the first argument to domain helpers like `list_files`).

```python
import aikosh

out = aikosh.list_directory("dataset", filters={"page": 1, "size": 10})
items = out["data"]["items"]  # shape depends on API; each item has "id"
dataset_id = items[0]["id"]

out = aikosh.list_directory("model", filters={"page": 1, "size": 10})
model_id = out["data"]["items"][0]["id"]
```

Do not use human-readable slugs where the API expects the platform **`id`**.
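Because the list response shape depends on the API, it can help to pull identifiers out defensively. The sketch below assumes the `data` → `items` layout shown above; `extract_ids` is a hypothetical helper, not part of the SDK, and the sample response is a placeholder standing in for a real `list_directory` result.

```python
from typing import Any, Dict, List


def extract_ids(response: Dict[str, Any]) -> List[Any]:
    """Collect the "id" of every item in a list response, skipping items without one."""
    data = response.get("data") or {}
    items = data.get("items") or []
    return [item["id"] for item in items if isinstance(item, dict) and "id" in item]


# Stand-in for the return value of aikosh.list_directory(...); ids are placeholders.
sample = {
    "status": "success",
    "data": {"items": [{"id": "ID_1"}, {"id": "ID_2"}, {"name": "no id here"}]},
}
print(extract_ids(sample))
```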

## Download request parameters

| Parameter | Role |
|-----------|------|
| `identifier` | Dataset or model **`id`** from list/metadata |
| `type` | `"dataset"` or `"model"` |
| `destination_path` | **Local** folder or file path where the download is saved |
| `file_path` | **Remote** file path inside the asset (single-file download) |
| `directory_path` | **Remote** folder/path inside the asset (single-file download; can be combined with `filename`) |
| `filename` | Optional local output name; with `directory_path`, also joins the remote path |
| `version_id` | Optional version **`id`** when the API supports multiple versions |
| `max_workers` | Batch downloads only: parallel workers (default **4**, maximum **4**) |

**`directory_path` is not a local save path** — use **`destination_path`** for that.
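One way to keep the local and remote parameters apart is to build request dicts through a small function with keyword-only arguments. This is an illustrative sketch: `build_download_request` is a hypothetical helper, and the only validation it performs (checking `type`) is an assumption about what is worth checking client-side.

```python
from typing import Any, Dict, Optional

VALID_TYPES = ("dataset", "model")


def build_download_request(
    identifier: str,
    asset_type: str,
    destination_path: str,
    *,
    file_path: Optional[str] = None,       # remote file inside the asset
    directory_path: Optional[str] = None,  # remote folder inside the asset
    filename: Optional[str] = None,
    version_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a download request dict using the parameter names from the table above."""
    if asset_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {asset_type!r}")
    request: Dict[str, Any] = {
        "identifier": identifier,
        "type": asset_type,
        "destination_path": destination_path,  # local save location
    }
    optional = {
        "file_path": file_path,
        "directory_path": directory_path,
        "filename": filename,
        "version_id": version_id,
    }
    # Leave out anything the caller did not set.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


req = build_download_request(
    "PUT_DATASET_ID_HERE", "dataset", "./downloads", file_path="documents/report.pdf"
)
print(req)
```

The resulting dict can then be passed to `aikosh.download(...)`.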

## Quickstart

### 1) Check connectivity

```python
import aikosh
print(aikosh.ping())  # dataset filters endpoint
```

### 2) Discover available functions

```python
import aikosh
aikosh.list_functions()
# Returns: {"aikosh": {...}, "aikosh.datasets": {...}, "aikosh.models": {...}}
```

### 3) Filter master (codes for list filters)

```python
import aikosh

aikosh.get_datasets_filter_info()
# Returns:
# {
#     "status": "success",
#     "message": "filters endpoint reachable",
#     "data": {
#         "organisationList": [{"id": ..., "name": ...}, ...],
#         "sectorsList": [{"id": ..., "name": ...}, ...],
#         "licensesList": [{"id": ..., "name": ...}, ...],
#         "datasetTypesList": [{"id": ..., "name": ...}, ...],
#     },
# }

aikosh.get_models_filter_info()
# Returns the same shape, with "modelTypesList" in place of "datasetTypesList".
```
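The filter master returns `id`/`name` pairs, while list filters take the numeric codes. A lookup like the following can translate names into codes. It is a sketch under assumptions: `codes_for` is a hypothetical helper, the sample names and ids are made up, and case-insensitive name matching is a design choice rather than SDK behaviour.

```python
from typing import Any, Dict, Iterable, List


def codes_for(filter_info: Dict[str, Any], list_name: str, names: Iterable[str]) -> List[Any]:
    """Return the ids of entries in filter_info["data"][list_name] whose name matches."""
    wanted = {name.casefold() for name in names}
    entries = (filter_info.get("data") or {}).get(list_name) or []
    return [
        entry["id"]
        for entry in entries
        if str(entry.get("name", "")).casefold() in wanted
    ]


# Placeholder data in the shape returned by the filter master functions.
sample_info = {
    "status": "success",
    "data": {
        "licensesList": [
            {"id": 1, "name": "Example Licence A"},
            {"id": 2, "name": "Example Licence B"},
        ]
    },
}
print(codes_for(sample_info, "licensesList", ["example licence b"]))
```

The returned list can be used directly as the value of a filter such as `"license"` in `list_directory`.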

### 4) List datasets or models

```python
import aikosh

out = aikosh.list_directory(
    "dataset",
    filters={"page": 1, "size": 20, "keyword": "sanskrit"},
)
print(out["data"])

out = aikosh.list_directory(
    "model",
    filters={"page": 1, "size": 20, "keyword": "Bhashini", "modelType": [374, 375]},
)
print(out["data"])

# If filters match nothing, check the SDK message (status stays "success"):
if out.get("message"):
    print(out["message"])
```

#### Dataset list filters

```python
import aikosh

out = aikosh.list_directory(
    "dataset",
    filters={
        "page": 1,
        "size": 20,
        "license": [213, 214],      # codes from get_datasets_filter_info()
        "sector": [3228, 209],
        "fileFormat": ["csv", "json"],
        "versionScore": 3,
        "keyword": "Krishi",
    },
)
```
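Since listing is paginated with `page` and `size`, walking every result means requesting pages in a loop. The generator below is a minimal sketch: `iter_items` and `fake_fetch` are hypothetical names, the `data` → `items` layout is taken from the earlier examples, and stopping at the first empty page is an assumption about how the API signals the end of results.

```python
from typing import Any, Callable, Dict, Iterator


def iter_items(
    fetch_page: Callable[[int], Dict[str, Any]], max_pages: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield items page by page until a page comes back empty (or max_pages is hit)."""
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        items = (response.get("data") or {}).get("items") or []
        if not items:
            return
        yield from items


# Stand-in for the platform: two pages of results, then nothing.
_fake_pages = {1: [{"id": "A"}, {"id": "B"}], 2: [{"id": "C"}]}


def fake_fetch(page: int) -> Dict[str, Any]:
    return {"status": "success", "data": {"items": _fake_pages.get(page, [])}}


print([item["id"] for item in iter_items(fake_fetch)])
```

With the SDK, `fetch_page` could be something like `lambda page: aikosh.list_directory("dataset", filters={"page": page, "size": 20})`. The `max_pages` guard prevents an endless loop if the end-of-results assumption turns out to be wrong.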

### 5) Get metadata (datasets and models)

One function with `type` set to `"dataset"` or `"model"`:

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"  # from list response item["id"]
model_id = "PUT_MODEL_ID_HERE"

print(aikosh.get_metadata("dataset", dataset_id)["data"])
print(aikosh.get_metadata("model", model_id)["data"])
```

Aliases: `aikosh.get_dataset_metadata(dataset_id)` and `aikosh.get_model_metadata(model_id)`.

### 6) List files (datasets and models)

`directory_path` in **filters** is the **remote** folder inside the asset (`""` for root).

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
model_id = "PUT_MODEL_ID_HERE"

aikosh.list_files(
    "dataset",
    dataset_id,
    filters={"directory_path": "", "page": 1, "limit": 50},
)

aikosh.list_files(
    "model",
    model_id,
    filters={"directory_path": "", "page": 1, "limit": 50},
)
```

Domain shortcuts: `aikosh.datasets.list_files(dataset_id, ...)` and `aikosh.models.list_files(model_id, ...)`.

### 7) Download

#### Whole dataset or model

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
out = aikosh.download(
    {
        "identifier": dataset_id,
        "type": "dataset",
        "destination_path": "./downloads",
    }
)

model_id = "PUT_MODEL_ID_HERE"
out = aikosh.download(
    {
        "identifier": model_id,
        "type": "model",
        "destination_path": "./downloads/models",
        # "version_id": "OPTIONAL_VERSION_ID",
    }
)
```

#### Single file (remote path + local destination)

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
model_id = "PUT_MODEL_ID_HERE"

out = aikosh.download(
    {
        "identifier": dataset_id,
        "type": "dataset",
        "file_path": "documents/report.pdf",
        "destination_path": "./downloads/files",
        "filename": "report_copy.pdf",
    }
)

# Or remote folder + file name
out = aikosh.download(
    {
        "identifier": model_id,
        "type": "model",
        "directory_path": "weights/",
        "filename": "model.bin",
        "destination_path": "./downloads/models/files",
    }
)
```
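The two request styles above point at a remote file in different ways. The sketch below shows how `directory_path` and `filename` might combine into one remote path. `remote_file_path` is a hypothetical helper; both the POSIX-style join and the rule that `file_path` wins when both styles are given are assumptions, not confirmed SDK behaviour.

```python
import posixpath
from typing import Optional


def remote_file_path(
    file_path: Optional[str] = None,
    directory_path: Optional[str] = None,
    filename: Optional[str] = None,
) -> str:
    """Resolve the remote path of a single file inside an asset."""
    if file_path:
        return file_path  # assumed to take priority when present
    if directory_path is not None and filename:
        # Remote paths use forward slashes regardless of the local OS.
        return posixpath.join(directory_path, filename)
    raise ValueError("pass file_path, or directory_path together with filename")


print(remote_file_path(directory_path="weights/", filename="model.bin"))
print(remote_file_path(file_path="documents/report.pdf"))
```

Using `posixpath` rather than `os.path` keeps the separator stable on Windows, where `os.path.join` would insert backslashes.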

#### Batch download

Pass a **list** of download request dicts. Concurrency is controlled with `max_workers` (default **4**; values above **4** are capped at **4**).

```python
import aikosh

out = aikosh.download(
    [
        {"identifier": "DATASET_ID_1", "type": "dataset", "destination_path": "./downloads"},
        {"identifier": "DATASET_ID_2", "type": "dataset", "destination_path": "./downloads"},
    ],
    max_workers=4,  # optional; maximum allowed is 4
)
print(out["status"])  # success | partial_success | failed
print(out["items"])
```
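The worker cap and the three batch statuses can be modelled in a few lines, which is handy when writing tests around batch downloads. This is a sketch, not the SDK's implementation: `cap_workers` and `overall_status` are hypothetical names, the per-item status strings are assumed, and the lower bound of 1 worker is an added assumption (only the upper limit of 4 is stated in this README).

```python
from typing import Iterable

MAX_WORKERS = 4  # upper limit stated in this README


def cap_workers(requested: int) -> int:
    """Clamp a requested worker count into the range 1..MAX_WORKERS."""
    return max(1, min(requested, MAX_WORKERS))


def overall_status(item_statuses: Iterable[str]) -> str:
    """Derive a batch status from per-item statuses (assumed to be "success" or not)."""
    statuses = list(item_statuses)
    succeeded = sum(1 for status in statuses if status == "success")
    if statuses and succeeded == len(statuses):
        return "success"
    if succeeded == 0:
        return "failed"
    return "partial_success"


print(cap_workers(16))
print(overall_status(["success", "failed"]))
```

A caller would typically treat `partial_success` as a signal to inspect `out["items"]` and retry only the failed entries.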

## Modules (what to import)

- **`import aikosh`**: most users only need this (high-level journey functions).
- **`import aikosh.datasets`**: dataset journey + raw HTTP helpers.
- **`import aikosh.models`**: model journey + raw HTTP helpers.
- **`from aikosh.datasets import api as ds_api`**: advanced usage (parsed `data` from HTTP).
- **`from aikosh.models import api as models_api`**: same for models.

## Reference: top-level package (`import aikosh`)

| Function | Typical use |
|----------|-------------|
| `set_api_key` / `set_access_key` | Store API key in-process (also reads env vars). |
| `get_access_key` | Read the configured key (if any). |
| `get_metadata(type, identifier, ...)` | Metadata for datasets or models. |
| `get_dataset_metadata(identifier, ...)` | Same as `get_metadata("dataset", identifier, ...)`. |
| `get_model_metadata(identifier, ...)` | Same as `get_metadata("model", identifier, ...)`. |
| `list_directory(type, filters=..., ...)` | List datasets or models. |
| `list_files(type, identifier, filters=..., ...)` | List files inside a dataset or model. |
| `download(request, ..., max_workers=...)` | Download dataset or model (single dict or batch list; `max_workers` default 4, max 4). |
| `to_json(data, ...)` | Serialize nested structures to a JSON string. |
| `ping(...)` | Connectivity check (dataset filters endpoint). |
| `list_functions(...)` | List user-facing functions and one-line descriptions (auto-generated). |
| `get_datasets_filter_info()` | Dataset filter master (codes for list filters). |
| `get_models_filter_info()` | Model filter master (codes for list filters). |
| `__version__` | Installed package version string. |

## Reference: `aikosh.models`

### Journey (user-facing)

| Function | Purpose |
|----------|---------|
| `list_directory(filters=..., ...)` | List models (`page`, `size`, `license`, `sector`, `fileFormat`, `modelType`, `keyword`). |
| `get_metadata(model_id, ...)` | Model metadata by **`id`**. |
| `list_files(model_id, filters=..., ...)` | Remote file tree (`directory_path`, optional `version_id`, `page`, `limit`). |
| `download(request, ..., max_workers=...)` | Download whole model or one file (batch supported; `max_workers` default 4, max 4). |
| `ping(...)` | Model filters connectivity check. |
| `to_json(data, ...)` | JSON helper. |

### Low-level API

| Function | Purpose |
|----------|---------|
| `require_uuid_string(name, value)` | Validate identifier format before API calls. |
| `get_filters`, `list_models`, `get_model_metadata`, `list_file_details` | Raw HTTP wrappers. |
| `get_model_download_url`, `get_file_download_url`, `stream_download_url_to_path` | Presigned URLs and streaming to disk. |

## Reference: `aikosh.datasets`

### Journey (user-facing)

| Function | Purpose |
|----------|---------|
| `list_directory("dataset", filters=..., ...)` | List datasets. |
| `get_metadata("dataset", dataset_id, ...)` | Dataset metadata by **`id`**. |
| `get_dataset_metadata_journey(dataset_id, ...)` | Same as `get_metadata("dataset", ...)`. |
| `list_files(dataset_id, filters=..., ...)` | Remote file tree for a dataset. |
| `download(..., max_workers=...)` | Dataset downloads (single or batch; `max_workers` default 4, max 4). |
| `ping(...)` | Dataset filters connectivity check. |
| `to_json(...)` | JSON helper. |

### Low-level API

`get_filters`, `list_datasets`, `get_dataset_metadata`, `list_file_details`, `get_dataset_version_download_url`, `get_file_download_url`, `stream_download_url_to_path`, `require_uuid_string`.

## Notes and limitations

- **Use `id` from API responses** as `identifier` in download/metadata/list_files calls.
- **Batch downloads**: `max_workers` defaults to **4** and cannot exceed **4**.
- **Unified top-level APIs**: `list_directory`, `get_metadata`, `list_files`, and `download` all accept `type="dataset"` or `type="model"`.
- **Model-specific shortcuts**: `aikosh.models.list_files`, `aikosh.models.ping`, etc., when you prefer not to pass `type`.

## Troubleshooting

- **401 / Invalid API key**: re-check `AIKOSH_API_KEY` or `aikosh.set_api_key(...)`.
- **422 invalid id**: pass the **`id`** from `list_directory(...)` / metadata, not a slug or display name.
- **No results from list search**: if `list_directory` is called with filters (e.g. `keyword`) and nothing matches, the response includes a **`message`** field (e.g. `"No dataset found with the passed filters; try a different combination."`). Pagination-only calls (`page` / `size` alone) do not add this message.
- **Wrong download location**: use `destination_path` for local saves; `directory_path` is only for remote paths inside the asset.

## License

Apache-2.0
