Metadata-Version: 2.4
Name: datatailr
Version: 0.1.117
Summary: Ready-to-Use Platform That Drives Business Insights
Author-email: Datatailr <info@datatailr.com>
License-Expression: MIT
Project-URL: homepage, https://www.datatailr.com/
Project-URL: documentation, https://ai.docs.datatailr.com/
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: toml; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: sphinx-rtd-theme; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: sphinx-autodoc-typehints; extra == "dev"
Requires-Dist: sphinx-autosummary; extra == "dev"
Requires-Dist: sphinx-design; extra == "dev"
Requires-Dist: sphinx-copybutton; extra == "dev"
Requires-Dist: myst-parser; extra == "dev"
Dynamic: license-file

<!-- --8<-- [start:intro] -->

<div style="text-align: center;">
  <a href="https://www.datatailr.com/" target="_blank">
    <img src="https://s3.eu-west-1.amazonaws.com/datatailr.com/assets/datatailr-logo.svg" alt="Datatailr Logo" />
  </a>
</div>

---

**Datatailr empowers your team to streamline analytics and data workflows
from idea to production without infrastructure hurdles.**

# What is Datatailr?

Datatailr is a platform that simplifies the process of building and deploying data applications.

It makes it easier to run and maintain large-scale data processing and analytics workloads.
<!-- --8<-- [end:intro] -->
## What is this package?
This is the Python package for Datatailr, which allows you to interact with the Datatailr platform.

It provides the tools to build, deploy, and manage batch jobs, data pipelines, services and analytics applications.

Datatailr manages the underlying infrastructure so your applications can be deployed in an easy, secure and scalable way.

## Installation
### Installing the Python package
You can install the Datatailr Python package using pip:
```bash
pip install datatailr
```

### Testing the installation
```python
import datatailr

print(datatailr.__version__)
print(datatailr.__provider__)
```
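
To check the installation without importing the package (for example, from a setup script), you can query the installed distribution through the standard library's `importlib.metadata`. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of the Datatailr API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "datatailr") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("datatailr is not installed in this environment")
    else:
        print(f"datatailr {found}")
```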

## Remote CLI (optional)
If you install the package outside the Datatailr platform, you can enable the remote `dt` CLI:

```bash
datatailr setup-cli
```

`datatailr login` prompts interactively for the base URL, username and password.
To skip the prompts (for CI or scripted setups), set all three of the following
environment variables before running `datatailr login`:

```bash
export DATATAILR_BASE_URL=https://your-datatailr-instance
export DATATAILR_USER_NAME=your-username
export DATATAILR_USER_PASSWORD=your-password
datatailr login
```

When all three are set, `DATATAILR_BASE_URL` takes precedence over the `--url`
flag. The resulting session is saved to `~/.dt/remote_client/remote_client.cfg`,
so the env vars are only needed for the `login` step.
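
In a CI script it can help to fail early if one of the three variables is missing, rather than letting `datatailr login` fall back to interactive prompts. The sketch below shows one way to do that; the helper names are hypothetical, and treating an empty value as missing is an assumption, not documented behavior.

```python
import os
import subprocess
from typing import List, Mapping

# The three variables named in the section above.
REQUIRED_LOGIN_VARS = (
    "DATATAILR_BASE_URL",
    "DATATAILR_USER_NAME",
    "DATATAILR_USER_PASSWORD",
)


def missing_login_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty (assumption)."""
    return [name for name in REQUIRED_LOGIN_VARS if not env.get(name)]


def login_non_interactively() -> None:
    """Run `datatailr login` only when all three variables are present."""
    missing = missing_login_vars(os.environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    subprocess.run(["datatailr", "login"], check=True)
```

Call `login_non_interactively()` once at the start of the job; later commands can rely on the saved session file.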

After `datatailr login`, you can print the OIDC cookie line for scripts or HTTP clients:

```bash
datatailr export-auth
eval "$(datatailr export-auth --shell)"   # sets DATATAILR_OIDC_HEADER (sh/bash/zsh)
```

For fish:

```fish
eval (datatailr export-auth --fish)   # sets DATATAILR_OIDC_HEADER
```

From Python (after `datatailr login`), read the same session at runtime:

```python
import requests

from datatailr import (
    get_remote_http_headers,
    get_remote_oidc_cookie_line,
    get_remote_oidc_jwt,
    load_remote_client_config,
)

# Load the session saved by `datatailr login`
cfg = load_remote_client_config()
print(cfg.base_url)

token = get_remote_oidc_jwt()
line = get_remote_oidc_cookie_line()  # X-Datatailr-Oidc-Data=<jwt>

# Call the platform API with the session's auth headers
requests.get(f"{cfg.base_url}/api/user/ls", headers=get_remote_http_headers())
```
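
If you use an HTTP client other than `requests`, the cookie line can be attached by hand. The sketch below builds a request with the standard library's `urllib` without sending it. It assumes the platform reads the session from the `Cookie` header, which this README does not confirm, and `build_request` is a hypothetical helper. When in doubt, prefer `get_remote_http_headers()`.

```python
import urllib.request


def build_request(
    base_url: str, cookie_line: str, path: str = "/api/user/ls"
) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the cookie line."""
    url = base_url.rstrip("/") + path
    # Assumption: the session cookie is accepted via the Cookie header.
    return urllib.request.Request(url, headers={"Cookie": cookie_line}, method="GET")


# Placeholder values; in practice they would come from
# load_remote_client_config().base_url and get_remote_oidc_cookie_line().
req = build_request("https://your-datatailr-instance", "X-Datatailr-Oidc-Data=<jwt>")
print(req.full_url)
```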

Example usage:
```bash
dt job ls
dt user ls
dt job save path/to/local/file.json
```

Notes:
- Remote CLI configuration inside a virtual environment only applies inside that environment.
- The remote CLI cannot be installed inside Datatailr containers; the native CLI is used there.

## AI Agent Skills

The package includes agent skills that teach AI coding assistants (Cursor, Claude Code, Codex, Copilot, etc.) how to work with the Datatailr platform. Inside Datatailr workstations, skills are available automatically. On your local machine, run:

```bash
datatailr setup-skills
```

## Quickstart
The following example shows how to create a simple data pipeline using the Datatailr Python package.


```python
from datatailr import workflow, task

@task()
def func_no_args() -> str:
    return "no_args"


@task()
def func_with_args(a: int, b: float) -> str:
    return f"args: {a}, {b}"

@workflow(name="MY test DAG")
def my_workflow():
    for n in range(2):
        res1 = func_no_args().alias(f"func_{n}")
        res2 = func_with_args(1, res1).alias(f"func_with_args_{n}")


my_workflow(local_run=True)
```

Running this code will create a graph of jobs and execute it.
Each node on the graph represents a job, which in turn is a call to a function decorated with `@task()`.

Because this is a local run, the nodes execute sequentially in the same process.
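
To make "a graph of jobs executed sequentially" concrete, here is a conceptual sketch using only the standard library (`graphlib`, Python 3.9+). It is not how Datatailr implements local runs. The `Graph` type and `run_sequentially` are hypothetical, and the functions are plain, undecorated copies of the ones in the example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Each node maps to (function, names of upstream nodes whose results it consumes).
Graph = Dict[str, Tuple[Callable[..., str], List[str]]]


def run_sequentially(graph: Graph) -> Dict[str, str]:
    """Execute every node once, in dependency order, in the current process."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    results: Dict[str, str] = {}
    for name in sorter.static_order():
        func, deps = graph[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


def func_no_args() -> str:
    return "no_args"


def func_with_args(a: int, b: str) -> str:
    return f"args: {a}, {b}"


graph: Graph = {}
for n in range(2):
    graph[f"func_{n}"] = (func_no_args, [])
    # The constant argument is fixed here; the upstream result arrives at run time.
    graph[f"func_with_args_{n}"] = (
        lambda upstream: func_with_args(1, upstream),
        [f"func_{n}"],
    )

results = run_sequentially(graph)
print(results["func_with_args_0"])  # args: 1, no_args
```

Each downstream node runs only after the node it depends on has produced a result, which is the ordering guarantee a DAG gives you.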

To take advantage of the Datatailr platform and execute the graph at scale, run it with the job scheduler, as described in the Execution at Scale section below.

## Budgets (spend reporting and limiting)

Jobs can be assigned to a **budget** for spend reporting and optional cost limiting. The Python SDK exposes the same operations as the `dt cost` CLI command.

Jobs that do not specify a budget are assigned to the `default` budget, which is available to all users and has no limit by default. Admins can set a limit on the `default` budget, but it cannot be removed.

```python
from datatailr import ACL, Budget, Group, Permission, User

# List budgets visible to the current user
for b in Budget.ls():
    print(b.name, b.budget_usd, b.usage_usd, b.usage_percentage, b.prevent_overflow)

# Load one budget by name
b = Budget("my_budget")

# Create / update / remove (available to admins only)
Budget.add("my_budget", 50000.0, prevent_overflow=True)
Budget.update("my_budget", amount=75000.0, prevent_overflow=False)
Budget.remove("my_budget")

# Permissions (ACL):
#   read    = see limit and usage
#   operate = assign jobs to this budget
# Creating and deleting budgets, and updating limits and ACLs, are admin-only operations.
acl = ACL(
    {
        Permission.READ: [User("alice"), Group("analysts")],
        Permission.OPERATE: [Group("developers")],
    }
)
Budget.set_acl("my_budget", acl)
Budget.add_acl("my_budget", acl)
Budget.remove_acl("my_budget", acl)
Budget.set_acl("my_budget", None)  # replace ACL with {}
```
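
The attributes printed in the listing loop above can drive simple spend reporting, such as flagging budgets that are close to their limit. The sketch below uses a hypothetical stand-in class (`BudgetRow`) with made-up sample values so it runs without platform access. It also assumes `usage_percentage` is on a 0 to 100 scale, which this README does not state.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BudgetRow:
    """Hypothetical stand-in exposing the attributes printed by Budget.ls() entries."""

    name: str
    budget_usd: float
    usage_usd: float
    usage_percentage: float  # assumed to be on a 0-100 scale
    prevent_overflow: bool


def near_limit(budgets: Iterable[BudgetRow], threshold_pct: float = 80.0) -> List[str]:
    """Return the names of budgets whose usage is at or above the threshold."""
    return [b.name for b in budgets if b.usage_percentage >= threshold_pct]


# Made-up sample rows, for illustration only.
rows = [
    BudgetRow("default", 1000.0, 120.0, 12.0, False),
    BudgetRow("my_budget", 50000.0, 45000.0, 90.0, True),
]
print(near_limit(rows))  # ['my_budget']
```

If the assumptions hold, the same function should work on the objects returned by `Budget.ls()`, since it only reads `name` and `usage_percentage`.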

## Execution at Scale
To execute the graph at scale, you can use the Datatailr job scheduler. This allows you to run your jobs in parallel, taking advantage of the underlying infrastructure.

You will first need to separate your function definitions from the DAG definition. This means you should define your functions as a separate module, which can be imported into the DAG definition.


```python
# my_module.py

from datatailr import task

@task()
def func_no_args() -> str:
    return "no_args"


@task()
def func_with_args(a: int, b: float) -> str:
    return f"args: {a}, {b}"
```

To use these functions in a batch job, import them and call them inside a workflow (DAG) definition:

```python
from my_module import func_no_args, func_with_args
# Assumption: Schedule is importable from the top-level package, like workflow
from datatailr import Schedule, workflow

@workflow(name="MY test DAG")
def my_workflow():
    for n in range(2):
        res1 = func_no_args().alias(f"func_{n}")
        res2 = func_with_args(1, res1).alias(f"func_with_args_{n}")

schedule = Schedule(at_hours=0)
my_workflow(schedule=schedule)
```

This will submit the entire workflow for execution, and the scheduler will take care of running the jobs in parallel and managing the resources.
The workflow in the example above will be scheduled to run daily at 00:00.
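
For intuition about what "daily at 00:00" means for the next run, the sketch below computes the next occurrence of a given hour from a reference time. It is an illustration only, not the scheduler's own logic: `next_daily_run` is a hypothetical helper, and time zones are ignored.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, at_hour: int = 0) -> datetime:
    """Return the first time strictly after `now` that falls on at_hour:00."""
    candidate = now.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run(datetime(2025, 1, 15, 13, 30)))  # 2025-01-16 00:00:00
```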

___
Visit [our website](https://www.datatailr.com/) for more!
