Metadata-Version: 2.4
Name: orchestra-lang
Version: 0.0.1
Summary: Orchestra pipeline configuration language
License-Expression: Apache-2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: cffi>=1.17.1

# orchestra-lang

**Parse, validate, and transpile [Orchestra](https://www.getorchestra.io) pipeline definitions from Python.**

`orchestra-lang` is the Python binding for the Orchestra pipeline DSL — the same
engine that powers the Orchestra VS Code extension. It ships the
dialect's schema and validators as a native library, so you get full
dialect-aware parsing and validation without calling out to a separate process.

## About the dialect

Orchestra pipelines are authored in a [KSON](https://kson.org)-based DSL for
defining, validating, and shipping data pipelines. A pipeline describes tasks,
dependencies, conditions, triggers, and variables across 95+ integrations
(Snowflake, dbt, Fivetran, Databricks, and more).

A minimal pipeline looks like this:

```
version: v1
name: 'task-references'
pipeline:
  producer:
    integration: SNOWFLAKE
    integrationJob: SNOWFLAKE_RUN_QUERY
    parameters:
      set_outputs: true
      statement: 'SELECT 1'
      .
    .
  consumer:
    integration: SNOWFLAKE
    integrationJob: SNOWFLAKE_RUN_QUERY
    condition: "${{ tasks['producer'].status == 'SUCCEEDED' }}"
    parameters:
      statement: "SELECT ${{ tasks['producer'].outputs['count'] }}"
      .
    dependsOn:
      - producer
```

The dialect validates required fields and types, integration-specific
parameters, variable references and expressions, task dependencies, cron
syntax, and branching conditions. See the
[Orchestra documentation](https://docs.getorchestra.io) for the full language
reference.
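
The bundled validators already perform these checks. The sketch below only illustrates what a task-reference check involves, and it is not how the engine implements it. It assumes a plain `dict` shaped like the example above (for instance, the output of `to_json` loaded with `json.loads`). It only recognises single-quoted `tasks['name']` references. `unknown_task_refs`, `iter_strings`, and `TASK_REF` are hypothetical helpers, not part of `orchestra-lang`.

```python
import re
from typing import Any, Iterator

# Hypothetical pattern: matches tasks['name'] inside ${{ ... }} expressions.
# Only single-quoted names are handled, as in the example pipeline.
TASK_REF = re.compile(r"tasks\['([^']+)'\]")


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested anywhere in a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def unknown_task_refs(doc: dict) -> dict[str, set[str]]:
    """Map each task name to the task names it references but that are not defined."""
    tasks = doc.get("pipeline", {})
    defined = set(tasks)
    problems: dict[str, set[str]] = {}
    for name, task in tasks.items():
        # References from expressions anywhere in the task (condition, parameters, ...).
        refs = {ref for s in iter_strings(task) for ref in TASK_REF.findall(s)}
        # Explicit dependencies count as references too.
        refs |= set(task.get("dependsOn", []))
        missing = refs - defined
        if missing:
            problems[name] = missing
    return problems
```

For the pipeline above this returns `{}`. A typo such as `tasks['producr']` in `consumer` would instead yield `{'consumer': {'producr'}}`.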

## Installation

```bash
pip install orchestra-lang
```

## API

The module exposes five top-level functions, each returning the engine's raw result object (`format_source` returns a plain `str`).

### `analyze(kson: str) -> Analysis`

Statically analyze an Orchestra document. The bundled engine runs the
Orchestra dialect validators automatically, so `analyze` catches parse
errors, schema violations, expression syntax errors, bad task
references, circular dependencies, invalid cron expressions, and so on.

```python
from pathlib import Path

import orchestra_lang

src = Path("pipeline.orc").read_text(encoding="utf-8")
analysis = orchestra_lang.analyze(src)

# errors() returns both errors and warnings; positions are zero-based.
for msg in analysis.errors():
    start = msg.start()
    print(f"[{msg.severity()}] {start.line()}:{start.column()}  {msg.message()}")
```

Example output for a pipeline with a broken condition expression and missing
required fields:

```
[MessageSeverity.ERROR] 3:53  Expression syntax error: Expected RPAREN but got ''
[MessageSeverity.WARNING] 0:0  Missing required properties: version, name
```
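
Message positions are zero-based (see the `Analysis` reference), while most editors and CI log parsers expect 1-based `path:line:col` diagnostics. Below is a minimal sketch of a formatter and a pass/fail gate built only on the documented `Message` methods (`severity()`, `message()`, `start()`). The helper names are hypothetical. The code assumes `MessageSeverity` is a standard Python `Enum`, so that `.name` is `"ERROR"` or `"WARNING"`. That matches the `MessageSeverity.ERROR` text printed above but is not stated explicitly.

```python
def format_diagnostics(path: str, messages) -> list[str]:
    """Render messages as 'path:line:col: severity: text' using 1-based positions."""
    lines = []
    for msg in messages:
        start = msg.start()
        # Assumes MessageSeverity is a Python Enum, so .name is "ERROR" or "WARNING".
        severity = msg.severity().name.lower()
        # analyze() positions are zero-based; shift to the 1-based convention.
        lines.append(
            f"{path}:{start.line() + 1}:{start.column() + 1}: {severity}: {msg.message()}"
        )
    return lines


def has_errors(messages) -> bool:
    """True if any message has ERROR severity; warnings alone do not fail."""
    return any(msg.severity().name == "ERROR" for msg in messages)


# Hypothetical CI usage:
#   import sys, orchestra_lang
#   messages = orchestra_lang.analyze(src).errors()
#   print("\n".join(format_diagnostics("pipeline.orc", messages)))
#   sys.exit(1 if has_errors(messages) else 0)
```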

#### The `Analysis` object

`Analysis` exposes three views of the analyzed document:

- **`errors() -> list[Message]`** — every error and warning produced by the
  parser, schema validator, and dialect validators. Each `Message` carries a
  `severity()` (`MessageSeverity.ERROR` or `MessageSeverity.WARNING`), a
  `message()` string, and `start()` / `end()` `Position`s whose `line()` and
  `column()` methods return zero-based offsets. An empty list means the
  document is valid.

- **`tokens() -> list[Token]`** — the full lexed token stream, useful for
  syntax highlighting and editor tooling. Each `Token` has `token_type()`
  (a `TokenType` enum), `text()`, and `start()` / `end()` positions.

- **`kson_value() -> KsonValue | None`** — the parsed document as a typed
  value tree, or `None` if parsing failed fatally. Call `type()` to get a
  `KsonValueType` discriminator, or use `isinstance` checks against
  `KsonValue.KsonObject`, `KsonValue.KsonArray`, `KsonValue.KsonString`,
  `KsonValue.KsonNumber`, `KsonValue.KsonBoolean`, `KsonValue.KsonNull`,
  or `KsonValue.KsonEmbed` to walk the tree.

```python
from orchestra_lang import analyze, KsonValue

analysis = analyze(src)
root = analysis.kson_value()
if isinstance(root, KsonValue.KsonObject):
    # ... walk the object
    ...
```
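
For the `tokens()` view, a common editor task is finding the token under the cursor, for hover or highlighting. The following is a sketch that assumes `start()` is inclusive, `end()` is exclusive, and both use the same zero-based line/column positions as messages. `token_at` is a hypothetical helper.

```python
def token_at(tokens, line: int, column: int):
    """Return the token whose span contains the zero-based (line, column), or None.

    Assumes start() is inclusive and end() is exclusive. Comparing
    (line, column) tuples also handles tokens that span several lines.
    """
    target = (line, column)
    for tok in tokens:
        start, end = tok.start(), tok.end()
        if (start.line(), start.column()) <= target < (end.line(), end.column()):
            return tok
    return None


# Hypothetical usage: tok = token_at(analysis.tokens(), 3, 10)
#                     if tok is not None: print(tok.token_type(), tok.text())
```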

### `to_json(kson: str, options: TranspileOptions.Json) -> Result`

Transpile Orchestra source to JSON. Returns a `Result` — pattern-match on
`Result.Success` / `Result.Failure`:

```python
from orchestra_lang import to_json, Result, TranspileOptions

result = to_json(src, TranspileOptions.Json(retain_embed_tags=False))
if isinstance(result, Result.Success):
    print(result.output())
else:
    for err in result.errors():
        print(err.message())
```
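
Because the transpiled JSON mirrors the source structure, ordinary Python tooling can consume it. As a sketch, the standard library's `graphlib.TopologicalSorter` can order tasks by their `dependsOn` lists. This assumes the output has a top-level `pipeline` object keyed by task name, as in the example pipeline. `execution_order` is hypothetical and says nothing about how Orchestra itself schedules tasks. `analyze` already reports circular dependencies, but `static_order()` also raises `graphlib.CycleError` if it meets one.

```python
import json
from graphlib import TopologicalSorter


def execution_order(json_text: str) -> list[str]:
    """Order pipeline tasks so each task comes after everything in its dependsOn."""
    doc = json.loads(json_text)
    tasks = doc["pipeline"]
    # Map each task to its predecessors; tasks without dependsOn have none.
    graph = {name: task.get("dependsOn", []) for name, task in tasks.items()}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical usage with the snippet above:
#   if isinstance(result, Result.Success):
#       print(execution_order(result.output()))  # producer before consumer
```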

### `to_yaml(kson: str, options: TranspileOptions.Yaml) -> Result`

Same shape as `to_json`, but emits YAML and preserves comments.

```python
from orchestra_lang import to_yaml, Result, TranspileOptions

result = to_yaml(src, TranspileOptions.Yaml(retain_embed_tags=False))
if isinstance(result, Result.Success):
    print(result.output())
```

### `format_source(kson: str, format_options: FormatOptions) -> str`

Pretty-print Orchestra source with the given formatting options. Named
`format_source` rather than `format` so that `from orchestra_lang import *`
does not shadow the `format` builtin.

```python
from orchestra_lang import format_source, FormatOptions, FormattingStyle, IndentType

formatted = format_source(
    src,
    FormatOptions(
        indent_type=IndentType.Spaces(2),
        formatting_style=FormattingStyle.PLAIN,
        embed_block_rules=[],
    ),
)
```
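
A `--check`-style formatting gate can compare the source with the output of `format_source`. The sketch below uses `difflib.unified_diff` from the standard library. `format_diff` is a hypothetical helper, and the usage comment assumes the `formatted` variable from the snippet above.

```python
import difflib


def format_diff(original: str, formatted: str, path: str = "pipeline.orc") -> str:
    """Return a unified diff from original to formatted source; empty if already formatted."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            formatted.splitlines(keepends=True),
            fromfile=path,
            tofile=f"{path} (formatted)",
        )
    )


# Hypothetical usage:
#   diff = format_diff(src, formatted)
#   if diff:
#       print(diff)  # non-empty diff means the file needs reformatting
```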

### `parse_schema(schema_kson: str) -> SchemaResult`

Parse a JSON Schema document written in KSON. On success, the returned
`SchemaResult.Success` carries a reusable `SchemaValidator`, obtained via
`schema_validator()`.

```python
from pathlib import Path

from orchestra_lang import parse_schema, SchemaResult

result = parse_schema(Path("my.schema.kson").read_text(encoding="utf-8"))
if isinstance(result, SchemaResult.Success):
    validator = result.schema_validator()
    # The validator can be reused for any number of documents.
    messages = validator.validate(src, "document.orc")
```
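
Because the validator is reusable, one parsed schema can check a whole directory of pipelines. The following is a sketch that assumes `validate(source, name)` returns a list of messages that is empty for a valid document. The `messages` name above suggests this, but the README does not spell it out. `validate_tree` and the `*.orc` pattern are illustrative.

```python
from pathlib import Path


def validate_tree(validator, root: str, pattern: str = "*.orc") -> dict[str, list]:
    """Validate every matching file under root with one reusable validator.

    Returns {path: messages} only for files that produced at least one message.
    """
    failures = {}
    for path in sorted(Path(root).rglob(pattern)):
        messages = validator.validate(path.read_text(encoding="utf-8"), str(path))
        if messages:
            failures[str(path)] = messages
    return failures


# Hypothetical usage: failures = validate_tree(validator, "pipelines/")
```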

## Links

- [Orchestra documentation](https://docs.getorchestra.io)
- [KSON language](https://kson.org)
