Metadata-Version: 2.4
Name: aikosh
Version: 1.1.1
Summary: Python SDK for the AIKosh platform (datasets, models, and more).
Author: AIKosh SDK contributors
License: Apache-2.0
Project-URL: Homepage, https://aikosh.indiaai.gov.in/home
Keywords: aikosh,indiaai,datasets,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: httpx<1,>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"

# AIKosh SDK (Python)

Open-source Python SDK for working with the AIKosh platform: **discover datasets and models, inspect files, and download assets** (whole packages or individual files).

## What is AIKosh SDK?

AIKosh SDK is a **developer-friendly Python library** that wraps the AIKosh platform’s APIs into **simple, stable functions** you can call from notebooks, scripts, and applications.

It’s designed to feel like an “ML developer tool” library (similar in spirit to libraries like `transformers`), where common workflows—search, browse files, download—are one import away.

## Why use an SDK (instead of calling APIs directly)?

Using an SDK helps developers by providing:

- **Simpler usage**: no manual URL construction, headers, or response parsing in every script.
- **Consistent patterns**: same function shapes for datasets and models (`list_directory`, `list_files`, `get_metadata`, `download`).
- **Safer downloads**: automatically fetches fresh temporary URLs and streams downloads to disk.
- **One place to evolve**: when the backend changes, a single SDK update covers every downstream user.

## What this SDK aspires to help you build

Over time, the goal is to make it easy to build:

- **Repeatable data/model pipelines**: programmatic discovery + download for training/evaluation.
- **Dataset/model exploration tools**: list and traverse file trees for validation and QA.
- **Automation**: integrate AIKosh assets into CI workflows and internal platforms.

## Install

From PyPI (current distribution name):

```bash
pip install aikosh
```

For contributors / local development:

```bash
pip install -e ".[dev]"
```

## Configuration

### API key

Creating and managing your API key:

An AIKosh API key lets you securely access platform datasets from your applications and workflows. With a personal API key you can automate data retrieval, build custom analytical pipelines, and interact with the platform's resources programmatically.

Steps:
1. Log in to the AIKosh platform, click **My profile** in the top-right corner, and open **Account settings**.
2. Click **Create API key**. A modal displays the unique API key. Copy it and store it securely: the key is shown in full only once, and your application sends it in request headers for authenticated requests.
3. Once the API key has been generated, the platform shows it only in encrypted format; create a new key if you need a readable value again.

For reference, see the [AIKosh user manual, page 41](https://aikosh.indiaai.gov.in/public/files/selfcare/uploads-b7ba741d-ca01-426b-b539-6fea3e01d98c/User_Manual_AIKosh.pdf#page=41).

Option A — environment variable:

```python
import os
os.environ["AIKOSH_API_KEY"] = "YOUR_KEY"
```

Option B — in code:

```python
import aikosh
aikosh.set_api_key("YOUR_KEY")
```

(`AIKOSH_ACCESS_KEY` is also supported.)
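
To avoid hard-coding the key in source files, the lookup can be wrapped in a small helper. This is a minimal sketch: `resolve_api_key` is a hypothetical name, and checking `AIKOSH_API_KEY` before `AIKOSH_ACCESS_KEY` is an assumption for illustration, not documented SDK behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty key found in the environment.

    Hypothetical helper: the lookup order below is an assumption,
    not documented SDK behavior.
    """
    env = os.environ if env is None else env
    for name in ("AIKOSH_API_KEY", "AIKOSH_ACCESS_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set AIKOSH_API_KEY (or AIKOSH_ACCESS_KEY) before using the SDK.")


# Usage (requires the SDK to be installed):
#   import aikosh
#   aikosh.set_api_key(resolve_api_key())
```

Failing early with a clear error tends to be easier to debug than a 401 from the first API call.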

## Asset identifiers (`id`)

List and metadata responses expose each dataset or model under the **`id`** field. Use that value everywhere the SDK expects an **`identifier`** (or the first argument to domain helpers like `list_files`).

```python
import aikosh

out = aikosh.list_directory("dataset", filters={"page": 1, "size": 10, "accessScope": "all"})
items = out["data"]["items"]  # shape depends on API; each item has "id"
dataset_id = items[0]["id"]

out = aikosh.list_directory("model", filters={"page": 1, "size": 10, "accessScope": "all"})
model_id = out["data"]["items"][0]["id"]
```

Do not use human-readable slugs where the API expects the platform **`id`**.

Note: `accessScope` defaults to `"permitted"`, which returns only open datasets and models. To get the exhaustive list, pass `{"accessScope": "all"}` in `filters`.
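
Indexing `items[0]` raises `IndexError` when a listing comes back empty, so a defensive extraction step can be useful. The sketch below assumes the `out["data"]["items"]` shape used in the example above; `extract_ids` is a hypothetical helper, and the sample response is a stand-in rather than real platform output.

```python
from typing import Any, Dict, List


def extract_ids(response: Dict[str, Any]) -> List[Any]:
    """Collect the "id" of every item in a list_directory-style response.

    Assumes the response["data"]["items"] shape; missing keys yield an
    empty list instead of raising.
    """
    data = response.get("data") or {}
    items = data.get("items") or []
    return [item["id"] for item in items if isinstance(item, dict) and "id" in item]


# Stand-in response for illustration; real input comes from aikosh.list_directory(...).
sample = {
    "status": "success",
    "data": {"items": [{"id": "id-1"}, {"id": "id-2"}, {"name": "no id"}]},
}
print(extract_ids(sample))  # ['id-1', 'id-2']
```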

## Download request parameters

| Parameter | Role |
|-----------|------|
| `identifier` | Dataset or model **`id`** from list/metadata |
| `type` | `"dataset"` or `"model"` |
| `destination_path` | **Local** folder or file path where the download is saved |
| `file_path` | **Remote** file path inside the asset (single-file download) |
| `directory_path` | **Remote** folder/path inside the asset (single-file download; can be combined with `filename`) |
| `filename` | Optional local output name; when used with `directory_path`, it is also appended to the remote path to locate the file |
| `version_id` | Optional version **`id`** when the API supports multiple versions |
| `max_workers` | Batch downloads only: parallel workers (default **4**, maximum **4**) |

**`directory_path` is not a local save path** — use **`destination_path`** for that.
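
The parameter table can be encoded as a small request builder that keeps remote and local paths apart. This is a sketch under stated assumptions: `build_download_request` and `asset_type` are hypothetical names, and rejecting `file_path` together with `directory_path` is an assumption based on the two being presented as alternatives, not a documented SDK rule.

```python
from typing import Dict, Optional

VALID_TYPES = ("dataset", "model")


def build_download_request(
    identifier: str,
    asset_type: str,
    destination_path: str,
    *,
    file_path: Optional[str] = None,
    directory_path: Optional[str] = None,
    filename: Optional[str] = None,
    version_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble a request dict using only the keys from the table above."""
    if asset_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {asset_type!r}")
    if file_path and directory_path:
        # Assumption: the two remote selectors are alternatives, so reject both.
        raise ValueError("pass either file_path or directory_path, not both")
    request = {
        "identifier": identifier,
        "type": asset_type,
        "destination_path": destination_path,  # local save location
    }
    optional = {
        "file_path": file_path,            # remote path inside the asset
        "directory_path": directory_path,  # remote folder inside the asset
        "filename": filename,
        "version_id": version_id,
    }
    # Only include optional keys the caller actually set.
    request.update({key: value for key, value in optional.items() if value is not None})
    return request


# Usage (requires the SDK to be installed):
#   import aikosh
#   aikosh.download(build_download_request("PUT_DATASET_ID_HERE", "dataset", "./downloads"))
```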

## Quickstart

### 1) Check connectivity

```python
import aikosh
print(aikosh.ping())  # dataset filters endpoint
```

### 2) Discover available functions

```python
import aikosh
aikosh.list_functions()
# Returns: {"aikosh": {...}, "aikosh.datasets": {...}, "aikosh.models": {...}}
```

### 3) Filter master (codes for list filters)

```python
import aikosh

aikosh.get_datasets_filter_info()
# Returns:
# {
#     "status": "success",
#     "message": "filters endpoint reachable",
#     "data": {
#         "organisationList": [{"id": ..., "name": ...}, ...],
#         "sectorsList": [{"id": ..., "name": ...}, ...],
#         "licensesList": [{"id": ..., "name": ...}, ...],
#         "datasetTypesList": [{"id": ..., "name": ...}, ...],
#     },
# }

aikosh.get_models_filter_info()
# Returns the same structure, with "modelTypesList" in place of "datasetTypesList".
```
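
Because list filters take numeric codes, it can help to turn one list from the filter master into a name-to-id lookup. The sketch below assumes each entry carries `id` and `name` keys as shown above; `name_to_id` is a hypothetical helper, and the license names in the sample are made up for illustration.

```python
from typing import Any, Dict


def name_to_id(filter_info: Dict[str, Any], list_key: str) -> Dict[str, Any]:
    """Map lower-cased entry names to ids for one list in the filter master."""
    entries = (filter_info.get("data") or {}).get(list_key) or []
    return {
        entry["name"].strip().lower(): entry["id"]
        for entry in entries
        if "id" in entry and "name" in entry  # skip incomplete entries
    }


# Stand-in data; real input would be aikosh.get_datasets_filter_info().
sample = {
    "status": "success",
    "data": {
        "licensesList": [
            {"id": 213, "name": "License A"},
            {"id": 214, "name": "License B"},
        ]
    },
}
licenses = name_to_id(sample, "licensesList")
filters = {"page": 1, "size": 20, "license": [licenses["license a"]]}
print(filters)  # {'page': 1, 'size': 20, 'license': [213]}
```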

### 4) List datasets or models

```python
import aikosh

out = aikosh.list_directory(
    "dataset",
    filters={"page": 1, "size": 20, "keyword": "sanskrit", "accessScope": "all"},
)
print(out["data"])

out = aikosh.list_directory(
    "model",
    filters={"page": 1, "size": 20, "keyword": "Bhashini", "modelType": [374, 375], "accessScope": "all"},
)
print(out["data"])

# If filters match nothing, check the SDK message (status stays "success"):
if out.get("message"):
    print(out["message"])
```

Note: `accessScope` defaults to `"permitted"`, which returns only open datasets and models. To get the exhaustive list, pass `{"accessScope": "all"}` in `filters`.

#### Dataset list filters

```python
import aikosh

out = aikosh.list_directory(
    "dataset",
    filters={
        "page": 1,
        "size": 20,
        "license": [213, 214],
        "sector": [3228, 209],
        "fileFormat": ["csv", "json"],
        "versionScore": 3,
        "keyword": "Krishi",
        "accessScope": "all",
    },
)
```

Note: `accessScope` defaults to `"permitted"`, which returns only open datasets and models. To get the exhaustive list, pass `{"accessScope": "all"}` in `filters`.
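
Since listings are paged with `page` and `size`, collecting every match means looping over pages. The following is a sketch, not SDK functionality: `iter_items` is a hypothetical helper, and it assumes pages start at 1, that items live under `out["data"]["items"]`, and that a page shorter than `size` marks the end.

```python
from typing import Any, Callable, Dict, Iterator


def iter_items(
    list_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    filters: Dict[str, Any],
    size: int = 20,
    max_pages: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield items page by page until a short or empty page is returned."""
    for page in range(1, max_pages + 1):
        # Override page/size on a copy so the caller's filters stay untouched.
        out = list_page({**filters, "page": page, "size": size})
        items = (out.get("data") or {}).get("items") or []
        yield from items
        if len(items) < size:
            break  # assumption: a short page means there is nothing left


# Usage (requires the SDK to be installed):
#   import aikosh
#   fetch = lambda f: aikosh.list_directory("dataset", filters=f)
#   for item in iter_items(fetch, {"keyword": "Krishi", "accessScope": "all"}):
#       print(item["id"])
```

Passing the listing call in as a function keeps the loop testable without network access; `max_pages` is a safety cap in case the end condition never triggers.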

### 5) Get metadata (datasets and models)

One function with `type` set to `"dataset"` or `"model"`:

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"  # from list response item["id"]
model_id = "PUT_MODEL_ID_HERE"

print(aikosh.get_metadata("dataset", dataset_id)["data"])
print(aikosh.get_metadata("model", model_id)["data"])
```

Aliases: `aikosh.get_dataset_metadata(dataset_id)` and `aikosh.get_model_metadata(model_id)`.

### 6) List files (datasets and models)

`directory_path` in **filters** is the **remote** folder inside the asset (`""` for root).

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
model_id = "PUT_MODEL_ID_HERE"

aikosh.list_files(
    "dataset",
    dataset_id,
    filters={"directory_path": "", "page": 1, "limit": 50},
)

aikosh.list_files(
    "model",
    model_id,
    filters={"directory_path": "", "page": 1, "limit": 50},
)
```

Domain shortcuts: `aikosh.datasets.list_files(dataset_id, ...)` and `aikosh.models.list_files(model_id, ...)`.
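
Traversing a whole remote tree means calling `list_files` once per folder. The response shape of `list_files` is not described here, so the sketch below takes an adapter instead of parsing it: `list_children` is a hypothetical callable that you would write to wrap `aikosh.list_files(...)` and return `(name, is_directory)` pairs for one remote folder. `walk_remote` is likewise a hypothetical name.

```python
from typing import Callable, Iterator, List, Tuple

Entry = Tuple[str, bool]  # (name, is_directory)


def walk_remote(
    list_children: Callable[[str], List[Entry]],
    root: str = "",
) -> Iterator[str]:
    """Yield remote file paths depth-first, starting at `root` ("" is the asset root)."""
    stack = [root]
    while stack:
        directory = stack.pop()
        for name, is_directory in list_children(directory):
            # Join with "/" unless we are at the asset root.
            path = f"{directory.rstrip('/')}/{name}" if directory else name
            if is_directory:
                stack.append(path)  # visit this folder later
            else:
                yield path


# In-memory stand-in for a remote tree, for illustration only.
tree = {
    "": [("a.csv", False), ("docs", True)],
    "docs": [("report.pdf", False)],
}
print(sorted(walk_remote(lambda d: tree.get(d, []))))  # ['a.csv', 'docs/report.pdf']
```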

### 7) Download

#### Whole dataset or model

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
out = aikosh.download(
    {
        "identifier": dataset_id,
        "type": "dataset",
        "destination_path": "./downloads",
    }
)

model_id = "PUT_MODEL_ID_HERE"
out = aikosh.download(
    {
        "identifier": model_id,
        "type": "model",
        "destination_path": "./downloads/models",
        # "version_id": "OPTIONAL_VERSION_ID",
    }
)
```

#### Single file (remote path + local destination)

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
model_id = "PUT_MODEL_ID_HERE"

out = aikosh.download(
    {
        "identifier": dataset_id,
        "type": "dataset",
        "file_path": "documents/report.pdf",
        "destination_path": "./downloads/files",
        "filename": "report_copy.pdf",
    }
)

# Or remote folder + file name
out = aikosh.download(
    {
        "identifier": model_id,
        "type": "model",
        "directory_path": "weights/",
        "filename": "model.bin",
        "destination_path": "./downloads/models/files",
    }
)
```

#### Batch download

Pass a **list** of download request dicts. Concurrency is controlled with `max_workers` (default **4**; values above **4** are capped at **4**).

```python
import aikosh

out = aikosh.download(
    [
        {"identifier": "DATASET_ID_1", "type": "dataset", "destination_path": "./downloads"},
        {"identifier": "DATASET_ID_2", "type": "dataset", "destination_path": "./downloads"},
    ],
    max_workers=4,  # optional; maximum allowed is 4
)
print(out["status"])  # success | partial_success | failed
print(out["items"])
```
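
When the ids come from a listing, the batch list and the status check can be factored into helpers. This is a sketch: `build_batch`, `clamp_workers`, and `batch_ok` are hypothetical names, only the `status` values shown above are relied on, and the lower bound of 1 worker is an assumption.

```python
from typing import Any, Dict, Iterable, List

MAX_WORKERS = 4  # documented ceiling for batch downloads


def build_batch(ids: Iterable[str], asset_type: str, destination_path: str) -> List[Dict[str, str]]:
    """One whole-asset download request per id, all saved under destination_path."""
    return [
        {"identifier": asset_id, "type": asset_type, "destination_path": destination_path}
        for asset_id in ids
    ]


def clamp_workers(requested: int) -> int:
    """Keep the worker count within 1..MAX_WORKERS (the lower bound is an assumption)."""
    return max(1, min(requested, MAX_WORKERS))


def batch_ok(out: Dict[str, Any]) -> bool:
    """Return True only for "success"; raise when the whole batch failed."""
    status = out.get("status")
    if status == "failed":
        raise RuntimeError("batch download failed")
    return status == "success"  # False for "partial_success"


# Usage (requires the SDK to be installed):
#   import aikosh
#   batch = build_batch(["DATASET_ID_1", "DATASET_ID_2"], "dataset", "./downloads")
#   out = aikosh.download(batch, max_workers=clamp_workers(8))
#   if not batch_ok(out):
#       print(out["items"])  # inspect per-item results
```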

## Modules (what to import)

- **`import aikosh`**: most users only need this (high-level journey functions).
- **`import aikosh.datasets`**: dataset journey + raw HTTP helpers.
- **`import aikosh.models`**: model journey + raw HTTP helpers.
- **`from aikosh.datasets import api as ds_api`**: advanced usage (parsed `data` from HTTP).
- **`from aikosh.models import api as models_api`**: same for models.

## Reference: top-level package (`import aikosh`)

| Function | Typical use |
|----------|-------------|
| `set_api_key` / `set_access_key` | Store API key in-process (also reads env vars). |
| `get_access_key` | Read the configured key (if any). |
| `get_metadata(type, identifier, ...)` | Metadata for datasets or models. |
| `get_dataset_metadata(identifier, ...)` | Same as `get_metadata("dataset", identifier, ...)`. |
| `get_model_metadata(identifier, ...)` | Same as `get_metadata("model", identifier, ...)`. |
| `list_directory(type, filters=..., ...)` | List datasets or models. |
| `list_files(type, identifier, filters=..., ...)` | List files inside a dataset or model. |
| `download(request, ..., max_workers=...)` | Download dataset or model (single dict or batch list; `max_workers` default 4, max 4). |
| `to_json(data, ...)` | Serialize nested structures to a JSON string. |
| `ping(...)` | Connectivity check (dataset filters endpoint). |
| `list_functions(...)` | List user-facing functions and one-line descriptions (auto-generated). |
| `get_datasets_filter_info()` | Dataset filter master (codes for list filters). |
| `get_models_filter_info()` | Model filter master (codes for list filters). |
| `__version__` | Installed package version string. |

## Reference: `aikosh.models`

### Journey (user-facing)

| Function | Purpose |
|----------|---------|
| `list_directory(filters=..., ...)` | List models (`page`, `size`, `license`, `sector`, `fileFormat`, `modelType`, `keyword`). |
| `get_metadata(model_id, ...)` | Model metadata by **`id`**. |
| `list_files(model_id, filters=..., ...)` | Remote file tree (`directory_path`, optional `version_id`, `page`, `limit`). |
| `download(request, ..., max_workers=...)` | Download whole model or one file (batch supported; `max_workers` default 4, max 4). |
| `ping(...)` | Model filters connectivity check. |
| `to_json(data, ...)` | JSON helper. |

### Low-level API

| Function | Purpose |
|----------|---------|
| `require_uuid_string(name, value)` | Validate identifier format before API calls. |
| `get_filters`, `list_models`, `get_model_metadata`, `list_file_details` | Raw HTTP wrappers. |
| `get_model_download_url`, `get_file_download_url`, `stream_download_url_to_path` | Presigned URLs and streaming to disk. |

## Reference: `aikosh.datasets`

### Journey

| Function | Purpose |
|----------|---------|
| `list_directory("dataset", filters=..., ...)` | List datasets. |
| `get_metadata("dataset", dataset_id, ...)` | Dataset metadata by **`id`**. |
| `get_dataset_metadata_journey(dataset_id, ...)` | Same as `get_metadata("dataset", ...)`. |
| `list_files(dataset_id, filters=..., ...)` | Remote file tree for a dataset. |
| `download(..., max_workers=...)` | Dataset downloads (single or batch; `max_workers` default 4, max 4). |
| `ping(...)` | Dataset filters connectivity check. |
| `to_json(...)` | JSON helper. |

### Low-level API

`get_filters`, `list_datasets`, `get_dataset_metadata`, `list_file_details`, `get_dataset_version_download_url`, `get_file_download_url`, `stream_download_url_to_path`, `require_uuid_string`.

## Notes and limitations

- **Use `id` from API responses** as `identifier` in download/metadata/list_files calls.
- **Batch downloads**: `max_workers` defaults to **4** and cannot exceed **4**.
- **Unified top-level APIs**: `list_directory`, `get_metadata`, `list_files`, and `download` all accept `type="dataset"` or `type="model"`.
- **Model-specific shortcuts**: `aikosh.models.list_files`, `aikosh.models.ping`, etc., when you prefer not to pass `type`.

## Troubleshooting

- **401 / Invalid API key**: re-check `AIKOSH_API_KEY` or `aikosh.set_api_key(...)`.
- **422 invalid id**: pass the **`id`** from `list_directory(...)` / metadata, not a slug or display name.
- **No results from list search**: if `list_directory` is called with filters (e.g. `keyword`) and nothing matches, the response includes a **`message`** field (e.g. `"No dataset found with the passed filters; try a different combination."`). Pagination-only calls (`page` / `size` alone) do not add this message.
- **Wrong download location**: use `destination_path` for local saves; `directory_path` is only for remote paths inside the asset.

## License

Apache-2.0
