Metadata-Version: 2.4
Name: micantis
Version: 0.1.17
Summary: Package to simplify Micantis API usage
Author-email: Mykela DeLuca <mykela.deluca@micantis.io>
License-Expression: MIT
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: pandas
Requires-Dist: msal
Requires-Dist: azure-identity
Provides-Extra: parquet
Requires-Dist: pyarrow>=16.0.0; extra == "parquet"
Dynamic: license-file

# Micantis API Wrapper

A lightweight Python wrapper for interacting with the Micantis API plus some helpful utilities.  
Built for ease of use, fast prototyping, and clean integration into data workflows.

---

## 🚀 Features

- Authenticate and connect to the Micantis API service
- Download and parse CSV, binary, and Parquet data into pandas DataFrames
- Parquet support for efficient data storage with embedded metadata
- Filter, search, and retrieve metadata
- Utility functions to simplify common API tasks

---

## ⚠️ Important

This package is designed for authenticated Micantis customers only.  
If you are not a Micantis customer, the API wrapper and utilities in this package will not work for you.

For more information on accessing the Micantis API, please contact us at info@micantis.io.

---

## 📦 Installation

```bash
pip install micantis
```

### Optional: Parquet Support

For parquet file downloads and metadata extraction, install with parquet support:

```bash
pip install micantis[parquet]
```

Or install `pyarrow` separately:

```bash
pip install pyarrow
```
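If your code should degrade gracefully when the optional extra is missing, one way (a sketch, not part of the package API) is a guarded import:

```python
# Detect optional Parquet support at runtime; pyarrow is only present
# if installed directly or via `pip install micantis[parquet]`.
try:
    import pyarrow  # noqa: F401
    HAS_PARQUET = True
except ImportError:
    HAS_PARQUET = False

print(f"Parquet support available: {HAS_PARQUET}")
```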

---

## 💻 Examples

### Import functions

``` python
import pandas as pd
from micantis import MicantisAPI
```

### Initialize API

``` python
# Option 1 - log in with username and password
service_url = 'your service url'
username = 'your username'
password = 'your password'

api = MicantisAPI(service_url=service_url, username=username, password=password)
```

``` python
# Option 2 - log in with Microsoft Entra ID
SERVICE = 'your service url'
CLIENT_ID = 'your client id'
AUTHORITY = 'https://login.microsoftonline.com/organizations'
SCOPES = ['your scopes']

api = MicantisAPI(service_url=SERVICE, client_id=CLIENT_ID, authority=AUTHORITY, scopes=SCOPES)
```

``` python
# Option 3 - Use pre-existing token (for containerized apps)
service_url = 'your service url'
token = 'your bearer token'

api = MicantisAPI(service_url=service_url, token=token)
# No need to call authenticate() when using a token
```

``` python
# Option 4 - Use environment variables (recommended for containers/CI/CD)
# Set these environment variables:
# - WORKBOOK_API_URL: your service URL
# - WORKBOOK_TOKEN: your bearer token

api = MicantisAPI()  # Will automatically use environment variables
# No need to call authenticate() when using a token
```
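For quick local testing, the same variables can be set from Python before constructing the client (placeholder values shown; in production, set them in the container or CI environment instead):

```python
import os

# Placeholders only - substitute your real service URL and bearer token.
os.environ["WORKBOOK_API_URL"] = "https://your-service-url"
os.environ["WORKBOOK_TOKEN"] = "your-bearer-token"

# MicantisAPI() will now pick these up automatically.
```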

### Authenticate API
``` python
api.authenticate()
```

**Note:** When using a pre-existing token (Option 3 or 4), you don't need to call `authenticate()` as the token is already configured.

### Download Data Table Summary

#### Optional parameters
- `search`: Search string (same syntax as the Micantis WebApp)
- `barcode`: Search for a specific barcode
- `limit`: Number of results to return (default: 500)
- `min_date`: Only return results after this date
- `max_date`: Only return results before this date
- `show_ignored`: Include soft-deleted files (default: `True`)

```python
table = api.get_data_table(
    search=search,
    barcode=barcode,
    min_date=min_date,
    max_date=max_date,
    limit=10,
    show_ignored=show_ignored,
)
table
```

### Download Binary Files

``` python
# Download single file

file_id = 'File ID obtained from data table, id column'
df = api.download_binary_file(file_id)
```

``` python
# Download many files using the list of IDs from the table

file_id_list = table['id'].to_list()
data = []

for file_id in file_id_list:
    df = api.download_binary_file(file_id)
    data.append(df)

all_data = pd.concat(data)
```

### Download CSV Files

``` python
# Download single file

file_id = 'File ID obtained from data table, id column'
df = api.download_csv_file(file_id)
```

``` python
# Download multiple files

id_list = table['id'].to_list()
data = []

for file_id in id_list:
    df = api.download_csv_file(file_id)
    data.append(df)

all_data = pd.concat(data)
```

### Download Parquet Files

Download cycle tester data as Apache Parquet files for efficient analysis. Parquet files are smaller, faster, and include embedded metadata.

#### Optional parameters
- `cycle_ranges`: Filter by cycle index (see examples below)
- `test_time_start`: Filter by test time start (seconds from test start)
- `test_time_end`: Filter by test time end (seconds from test start)
- `line_number_start`: Filter by line number start
- `line_number_end`: Filter by line number end
- `include_auxiliary_data`: Include auxiliary channels like temperature (default: `True`)
- `output_path`: Custom file path (default: uses cell_data_id as filename)
- `return_type`: What to return - `'dataframe'` (default), `'dict'`, `'path'`, or `'bytes'`

#### Return Type Options
- **`'dataframe'`** (default): Saves file and returns pandas DataFrame - best for immediate analysis
- **`'dict'`**: Saves file and returns dict with data, metadata, and cycle_summaries - best when you need metadata (requires `pyarrow`)
- **`'path'`**: Saves file and returns path string - best for large files or batch processing
- **`'bytes'`**: Returns raw bytes without saving - best for direct cloud uploads (Databricks, Azure Blob, S3)

```python
# Download and get DataFrame (default)
file_id = 'File ID obtained from data table, id column'
df = api.download_parquet_file(file_id)
```

```python
# Get data + metadata in one call
result = api.download_parquet_file(file_id, return_type='dict')

df = result['data']                    # Cycle test data
metadata = result['metadata']          # Cell metadata (name, barcode, timestamps, etc.)
cycle_summaries = result['cycle_summaries']  # Per-cycle summary statistics
```

```python
# Save file and get path (memory efficient for large files)
path = api.download_parquet_file(file_id, return_type='path')

# Later, read when needed
df = pd.read_parquet(path)
```

```python
# Get raw bytes for direct cloud upload (no local file)
parquet_bytes = api.download_parquet_file(file_id, return_type='bytes')

# Upload to Azure Blob Storage
blob_client.upload_blob(name='test_data.parquet', data=parquet_bytes)

# Or read directly into DataFrame
import io
df = pd.read_parquet(io.BytesIO(parquet_bytes))
```

#### Cycle Range Filtering

Filter data by specific cycles or cycle ranges using the `cycle_ranges` parameter.

```python
# Download only cycles 1-10
df = api.download_parquet_file(
    file_id,
    cycle_ranges=[{"RangeStart": 1, "RangeEnd": 10}]
)
```

```python
# Download last 5 cycles
df = api.download_parquet_file(
    file_id,
    cycle_ranges=[{
        "RangeStart": 5,
        "IsStartFromBack": True,
        "RangeEnd": 1,
        "IsEndFromBack": True
    }]
)
```

```python
# Download specific cycles (1, 5, 10, 50)
df = api.download_parquet_file(
    file_id,
    cycle_ranges=[
        {"Single": 1},
        {"Single": 5},
        {"Single": 10},
        {"Single": 50}
    ]
)
```
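The `cycle_ranges` payloads above are plain lists of dicts, so small helpers (hypothetical, not part of the package) can keep call sites tidy:

```python
# Hypothetical helpers for building cycle_ranges payloads; the key names
# ("Single", "RangeStart", "RangeEnd") match the examples above.
def single_cycles(*cycles):
    """Select individual cycles, e.g. single_cycles(1, 5, 10)."""
    return [{"Single": c} for c in cycles]

def cycle_span(start, end):
    """Select an inclusive range of cycles."""
    return [{"RangeStart": start, "RangeEnd": end}]

print(single_cycles(1, 5))   # [{'Single': 1}, {'Single': 5}]
print(cycle_span(1, 10))     # [{'RangeStart': 1, 'RangeEnd': 10}]
```

These compose directly with the call above, e.g. `api.download_parquet_file(file_id, cycle_ranges=single_cycles(1, 5, 10))`.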

```python
# Download first hour of data
df = api.download_parquet_file(
    file_id,
    test_time_start=0,
    test_time_end=3600
)
```

#### Extract Metadata from Parquet Files

Parquet files contain embedded metadata including cell info, timestamps, cycle counts, and per-cycle summaries. Extract this metadata using `unpack_parquet()` (requires `pyarrow`).

```python
# From a saved file
result = api.unpack_parquet('file.parquet')

df = result['data']                    # Cycle test data
metadata = result['metadata']          # Cell metadata (name, barcode, timestamps, etc.)
cycle_summaries = result['cycle_summaries']  # Per-cycle summary statistics
```

```python
# From bytes (no file needed)
parquet_bytes = api.download_parquet_file(file_id, return_type='bytes')
result = api.unpack_parquet(parquet_bytes)

df = result['data']
metadata = result['metadata']
cycle_summaries = result['cycle_summaries']
```

```python
# Extract and save metadata as CSV files for easy viewing
result = api.unpack_parquet('file.parquet', save_metadata=True)

# Creates:
# - file_metadata.csv
# - file_cycle_summaries.csv
```

```python
# Batch processing: Download multiple files without loading into memory
file_ids = table['id'].head(10).to_list()
paths = []

for file_id in file_ids:
    path = api.download_parquet_file(file_id, return_type='path')
    paths.append(path)

# Later, process files one at a time (memory efficient)
for path in paths:
    result = api.unpack_parquet(path)
    df = result['data']
    # Process df...
```

## Cells Table
### Download Cell ID Information
Retrieve a list of cell names and GUIDs from the Micantis database with flexible filtering options.

#### Optional parameters
- `search`: Search string (same syntax as the Micantis WebApp)
- `barcode`: Search for a specific barcode
- `limit`: Number of results to return (default: 500)
- `min_date`: Only return results after this date
- `max_date`: Only return results before this date
- `show_ignored`: Include soft-deleted files (default: `True`)

``` python
search = "*NPD*"
cells_df = api.get_cells_list(search=search)
cells_df.head()
```
### Download Cell Metadata

Fetch per-cell metadata and return a clean, wide-format DataFrame.

#### Parameters:
- `cell_ids`: **List[str]**  
  List of cell test GUIDs (**required**)

- `metadata`: **List[str] (optional)**  
  List of metadata **names** (e.g., `"OCV (V)"`) or **IDs**.  
  If omitted, all non-image metadata will be returned by default.

- `return_images`: **bool (optional)**  
  If `True`, includes image metadata fields. Default is `False`.

---

#### 📘 Examples

```python
# Example 1: Get all non-image metadata for a list of cells
cell_ids = cells_df["id"].to_list()
cell_metadata_df = api.get_cell_metadata(cell_ids=cell_ids)
```
```python
# Example 2: Get specific metadata fields by name
cell_metadata_df = api.get_cell_metadata(
    cell_ids=cell_ids,
    metadata=["Cell width", "Cell height"],
    return_images=False
)
```
```python
# Merge cell metadata table with cell names to get a clean dataframe
# Map id to name, then add cell_name as the last column
id_to_name = dict(zip(cells_df['id'], cells_df['name']))
cell_metadata_df['cell_name'] = cell_metadata_df['id'].map(id_to_name)
cell_metadata_df.head()
```

## Specifications Table
### Download Specifications List
Retrieve specifications with their associated user properties.

```python
# Get all specifications with their user properties
specs_df = api.get_specifications_table()
specs_df.head()
```

## Test Management
### Download Test Requests List
Retrieve test request data with flexible date filtering.

#### Optional parameters
- `since`: Date string in various formats (defaults to January 1, 2020 if not provided)
  - Full month names: `"May 1, 2025"`, `"January 15, 2024"`
  - ISO format: `"2025-05-01"` or `"25-05-01"`

```python
# Get all test requests (defaults to since 2020-01-01)
test_requests = api.get_test_request_list()

# Get test requests since a specific date using month name
test_requests = api.get_test_request_list(since="May 1, 2024")

# Get test requests using ISO format
test_requests = api.get_test_request_list(since="2024-05-01")
```
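For illustration, the two full-date styles above name the same date; a quick local sanity check with pandas (the API does its own parsing server-side):

```python
import pandas as pd

# "May 1, 2024" and "2024-05-01" resolve to the same timestamp.
d1 = pd.to_datetime("May 1, 2024")
d2 = pd.to_datetime("2024-05-01")
print(d1 == d2)  # True
```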

### Download Failed Test Requests
Retrieve only failed test requests with the same date filtering options.

```python
# Get failed test requests since a specific date
failed_requests = api.get_failed_test_requests(since="January 1, 2024")
failed_requests.head()
```

### Get Individual Test Request Details
Retrieve full details for a specific test request by ID.

**New Feature:** Multiple output format options for better data analysis!

#### Format Options
- `return_format='dict'`: Raw dictionary (default, backwards compatible)
- `return_format='dataframes'`: Returns 3 DataFrames - summary, tests, and status_log ⭐ **Recommended**
- `return_format='flat'`: Single-row DataFrame with basic info

```python
# Option 1: Dictionary format (default, backwards compatible)
request_id = "your-test-request-guid"
test_details = api.get_test_request(request_id)

# Option 2: DataFrames format (recommended for analysis) ⭐
test_details = api.get_test_request(request_id, return_format='dataframes')
print(test_details['summary'])      # Basic request information
print(test_details['tests'])        # All requested tests
print(test_details['status_log'])   # Status change history

# Option 3: Flat DataFrame (best for combining multiple requests)
test_details = api.get_test_request(request_id, return_format='flat')
```

#### Batch Processing Multiple Requests
```python
# Get summaries for multiple test requests
request_ids = test_requests['id'].head(10).to_list()

all_summaries = []
for req_id in request_ids:
    summary = api.get_test_request(req_id, return_format='flat')
    all_summaries.append(summary)

# Combine into single DataFrame
combined_df = pd.concat(all_summaries, ignore_index=True)
print(f"Retrieved {len(combined_df)} test requests")
combined_df.head()
```

## Write Cell Metadata
Micantis lets you programmatically assign or update metadata for each cell using either:
- the human-readable field name (e.g., "Technician", "Weight (g)")
- or the internal propertyDefinitionId (UUID)

#### 📘 Examples

```python
# Example 1: Update the technician field for a cell
changes = [
    {
        "id": "your-cell-test-guid-here",  # cell test GUID
        "field": "Technician",
        "value": "Mykela"
    },
    {
        "id": "your-cell-test-guid-here",
        "field": "Weight (g)",
        "value": 98.7
    }
]

api.write_cell_metadata(changes=changes)

# Verify the changes
api.get_cell_metadata(cell_ids=["your-cell-test-guid-here"], metadata=['Weight (g)', 'Technician'])
```

```python
# Example 2: Update using propertyDefinitionId (advanced)
changes = [
    {
        "id": "your-cell-test-guid-here",
        "propertyDefinitionId": "your-property-definition-guid",
        "value": 98.7
    }
]

api.write_cell_metadata(changes=changes)

# Verify the changes
api.get_cell_metadata(cell_ids=["your-cell-test-guid-here"], metadata=['Weight (g)', 'Technician'])
```
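Since `changes` is a plain list of dicts, a small helper (hypothetical, not part of the package) can build the payload from a field-to-value mapping:

```python
# Hypothetical convenience for building write_cell_metadata payloads;
# uses the human-readable "field" form shown in Example 1.
def build_changes(cell_id, fields):
    return [{"id": cell_id, "field": f, "value": v} for f, v in fields.items()]

changes = build_changes(
    "your-cell-test-guid-here",
    {"Technician": "Mykela", "Weight (g)": 98.7},
)
print(changes)
```

The result can be passed straight to `api.write_cell_metadata(changes=changes)`.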

## Stitch Data
Combine multiple data sets into a single stitched data set. This is useful for creating continuous test data from multiple separate test runs.

#### Parameters
- `name`: **str (required)**
  Name for the stitched data set

- `cell_data_ids`: **List[str] (required)**
  List of cell data GUIDs to stitch together

- `increment_cycle_number`: **bool (optional)**
  Whether to increment cycle numbers when stitching. Default is `False`.

- `advanced_mode`: **bool (optional)**
  Advanced mode for manual ordering of data sets. Default is `False`.

- `archive_source_data`: **bool (optional)**
  Archive (soft delete) the source data sets after stitching. Default is `False`.

- `id`: **str (optional)**
  Optional ID for updating an existing stitched data set. Leave `None` to create new.

- `allow_async`: **bool (optional)**
  If `True`, runs asynchronously and returns job ID. If `False`, waits for completion. Default is `False`.

#### Returns
- If `allow_async=False`: Dictionary with `'stitched_data_id'`
- If `allow_async=True`: Dictionary with `'job_id'`

#### 📘 Examples

```python
# Example 1: Stitch multiple test runs together
cell_data_ids = ["guid1", "guid2", "guid3"]

result = api.stitch_data(
    name="Combined Test Data",
    cell_data_ids=cell_data_ids,
    increment_cycle_number=True
)

print(f"Stitched data ID: {result['stitched_data_id']}")
```

```python
# Example 2: Stitch and archive source data
result = api.stitch_data(
    name="Complete Test Sequence",
    cell_data_ids=cell_data_ids,
    increment_cycle_number=True,
    archive_source_data=True  # Source files will be soft-deleted
)

# Download the stitched result
stitched_df = api.download_parquet_file(result['stitched_data_id'])
```

```python
# Example 3: Async mode for large data sets
result = api.stitch_data(
    name="Large Combined Dataset",
    cell_data_ids=large_cell_data_list,
    allow_async=True
)

print(f"Job ID: {result['job_id']}")
# Check job status via the Micantis WebApp or use the job API
```

## Archive Data
Archive (soft delete) data sets. Archived data is hidden from the default list view but not permanently deleted and can be unarchived.

**Note:** The `archive_data()` method toggles the archive status - calling it on an archived file will unarchive it.

#### Parameters
- `cell_data_id`: **str (required)**
  Cell data GUID to archive/unarchive

#### Returns
- Dictionary with `'id'` and `'archived'` (bool indicating new state)

#### 📘 Examples

```python
# Example 1: Archive a single data set
result = api.archive_data(cell_data_id="your-guid-here")
print(f"File archived: {result['archived']}")  # True if now archived
```

```python
# Example 2: Archive multiple data sets
cell_data_ids = ["guid1", "guid2", "guid3"]

for cell_id in cell_data_ids:
    result = api.archive_data(cell_id)
    print(f"Archived {cell_id}: {result['archived']}")
```

```python
# Example 3: Unarchive by calling again
result = api.archive_data(cell_data_id="your-guid-here")
print(f"File archived: {result['archived']}")  # False if now unarchived
```

```python
# Example 4: Combined workflow - stitch and archive
# Stitch data with automatic archiving
result = api.stitch_data(
    name="Combined Data",
    cell_data_ids=cell_ids,
    archive_source_data=True  # Automatically archives source files
)

# Or manually archive after stitching
result = api.stitch_data(name="Combined Data", cell_data_ids=cell_ids)
for cell_id in cell_ids:
    api.archive_data(cell_id)
```

