Metadata-Version: 2.4
Name: python-copybook
Version: 0.1.2
Summary: A parser for fixed-format 80-column COBOL copybooks.
License: MIT
License-File: LICENSE
Keywords: cobol,copybook,parser,mainframe,fixed-format
Author: soho
Author-email: dev@hanso.ca
Requires-Python: >=3.12
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Provides-Extra: dev
Requires-Dist: pytest (>=7.0) ; extra == "dev"
Requires-Dist: pytest-cov (>=4.0) ; extra == "dev"
Project-URL: Homepage, https://github.com/soho/python-copybook
Project-URL: Issues, https://github.com/soho/python-copybook/issues
Description-Content-Type: text/markdown

# python-copybook
[![PyPI](https://img.shields.io/pypi/v/python-copybook)](https://pypi.org/project/python-copybook/)
[![Python](https://img.shields.io/pypi/pyversions/python-copybook)](https://pypi.org/project/python-copybook/)
[![License](https://img.shields.io/pypi/l/python-copybook)](LICENSE)


A Python library for parsing fixed-format COBOL copybook files into structured, traversable data models.




## Overview

`python-copybook` reads `.cpy` and `.cbl` copybook files and produces a hierarchy of parsed COBOL data fields. It handles multi-line statements, continuation lines, inline comments, REDEFINES, OCCURS, RENAMES, and the other data definition clauses listed under Supported COBOL Features.

The library is structured as a five-stage pipeline:

```
File → CobolLine → CobolStatement → CobolField → CobolNode tree → rendered output
```

Each stage is independent and produces typed objects, so you can stop at any layer depending on what you need.


## Current Scope

> **This library currently supports flat, self-contained copybooks only.**
>
> - Fixed-format 80-column COBOL records
> - No `COPY` statement resolution — copybooks with imports are not yet supported
> - Offset, length, and position calculation is in progress
> - Buffer and memory view generation (mapping a raw data record to named fields) is planned


## Installation

```bash
pip install python-copybook
```


## Requirements

Python 3.12+

## Quick Start

```python
from copybook.parser import read_copybook, read_statements, parse_fields
from copybook.tree import build_tree, build_flat_tree
from copybook.render import render_copybook, render_flat_copybook

# Stage 1 — read physical lines
lines = read_copybook("CLAIMREC.cpy")

# Stage 2 — assemble logical statements
statements = read_statements(lines)

# Stage 3 — parse fields from statements
fields = parse_fields(statements)

# Stage 4 — build hierarchy tree
tree = build_tree(fields)

# Stage 5 — render output
print(render_copybook(tree))
```

## Pipeline Stages

### Stage 1 — `read_copybook(path) → list[CobolLine]`

Reads the file and slices each physical line into its fixed-format column regions. Lines are stored raw — no interpretation occurs at this stage.

```
Columns 1–6    Sequence number
Column  7      Indicator (* or / = comment, - = continuation, space = normal)
Columns 8–72   Area (actual COBOL content)
Columns 73–80  Identification (free-form comment)
```
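The column layout above can be sketched as plain string slicing. This is a minimal illustration, not the library's `read_copybook` implementation: `SlicedLine` and `slice_fixed_line` are hypothetical names, and padding short lines to 80 characters is an assumption made so that every slice is defined.

```python
from typing import NamedTuple


class SlicedLine(NamedTuple):
    """Hypothetical stand-in for CobolLine; holds the four column regions."""
    sequence: str
    indicator: str
    area: str
    identification: str


def slice_fixed_line(raw: str) -> SlicedLine:
    """Split one physical line into its fixed-format regions."""
    # Assumption: short lines are padded and long lines truncated to 80 chars.
    padded = raw.rstrip("\r\n").ljust(80)[:80]
    return SlicedLine(
        sequence=padded[0:6],           # columns 1-6
        indicator=padded[6],            # column 7
        area=padded[7:72],              # columns 8-72
        identification=padded[72:80],   # columns 73-80
    )


parts = slice_fixed_line("000100     05 CLAIM-ID        PIC X(10).")
print(repr(parts.indicator), parts.area.rstrip())
```

Because Python slices are zero-based and end-exclusive, column `n` maps to index `n - 1`, which is why columns 8–72 become `padded[7:72]`.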

### Stage 2 — `read_statements(lines) → list[CobolStatement]`

Joins physical lines into complete logical declarations. In fixed-format COBOL, a declaration ends with a period and may span multiple lines. This stage handles:

- Full-line comments (`*` or `/` indicator)
- Continuation lines (`-` indicator)
- Inline comments (`*>` syntax, COBOL 2002+)
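The joining rules can be illustrated with a simplified sketch. It is not the library's `read_statements`: `join_statements` is a hypothetical helper that takes `(indicator, area)` pairs, and it makes several simplifying assumptions (a statement ends when the accumulated text ends with a period, `*>` never appears inside a literal, and continuation text is appended without a separator).

```python
def join_statements(lines: list[tuple[str, str]]) -> list[str]:
    """Join (indicator, area) pairs into period-terminated statements."""
    statements: list[str] = []
    buffer = ""
    for indicator, area in lines:
        if indicator in ("*", "/"):
            continue  # full-line comment
        # Drop an inline comment; assumes '*>' does not occur inside a literal.
        text = area.split("*>", 1)[0].strip()
        if not text:
            continue
        if indicator == "-":
            buffer += text  # continuation: append directly to the previous text
        else:
            buffer = f"{buffer} {text}".strip()
        if buffer.endswith("."):
            statements.append(buffer)
            buffer = ""
    # Any unterminated trailing text is silently dropped in this sketch.
    return statements


sample_lines = [
    ("*", " Claim record layout"),
    (" ", "01  CLAIM-RECORD.  *> top-level record"),
    (" ", "    05 CLAIM-AMOUNT"),
    (" ", "       PIC S9(7)V99 COMP-3."),
]
joined = join_statements(sample_lines)
print(joined)
```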

### Stage 3 — `parse_fields(statements) → list[CobolField]`

Extracts every COBOL clause from each statement using independent regex searches. Clause order in the source does not matter.

Clauses parsed: `PIC`/`PICTURE`, `REDEFINES`, `RENAMES`, `RENAMES THRU`, `USAGE`, `OCCURS`, `DEPENDING ON`, `INDEXED BY`, `VALUE`/`VALUES ARE`, `SIGN SEPARATE`, `SYNCHRONIZED`
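To show why independent searches make clause order irrelevant, here is a small sketch covering three clauses only. The patterns and the `extract_clauses` helper are illustrative assumptions, not the expressions shipped in `patterns.py`.

```python
import re

# Keyword must not be preceded by a name character, so 'MY-PIC' is not a match.
_START = r"(?<![A-Za-z0-9-])"
PIC_RE = re.compile(_START + r"PIC(?:TURE)?\s+(?:IS\s+)?(\S+)", re.IGNORECASE)
OCCURS_RE = re.compile(_START + r"OCCURS\s+(\d+)", re.IGNORECASE)
REDEFINES_RE = re.compile(_START + r"REDEFINES\s+([A-Za-z0-9-]+)", re.IGNORECASE)


def extract_clauses(statement: str) -> dict[str, object]:
    """Search for each clause separately; the result does not depend on order."""
    body = statement.strip().rstrip(".")
    level, name = body.split()[:2]  # assumes the statement starts with level and name
    pic = PIC_RE.search(body)
    occurs = OCCURS_RE.search(body)
    redefines = REDEFINES_RE.search(body)
    return {
        "level": int(level),
        "name": name.upper(),
        "picture": pic.group(1) if pic else None,
        "occurs": int(occurs.group(1)) if occurs else None,
        "redefines": redefines.group(1).upper() if redefines else None,
    }


first = extract_clauses("05 AMT PIC S9(7)V99 OCCURS 3 TIMES.")
second = extract_clauses("05 AMT OCCURS 3 TIMES PIC S9(7)V99.")
print(first == second)
```

Each pattern scans the whole statement on its own, so a missing clause simply yields `None` instead of derailing a single combined grammar.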

### Stage 4 — `build_tree(fields) → list[CobolNode]`

Assigns parent-child relationships using level numbers. Returns root-level nodes; all others are reachable via `.children`.

Two tree views are available:

| Function | Contents |
|---|---|
| `build_tree(fields)` | Full hierarchy — mirrors source exactly |
| `build_flat_tree(fields)` | Elementary fields only — no groups, no 88s, no REDEFINES |
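Level-number nesting is commonly implemented with a stack of open ancestors. The sketch below shows that general technique under stated assumptions: `Node` and `build_level_tree` are hypothetical names, the input is a list of `(level, name)` pairs, and the special levels 66 and 77 are not handled. It is not necessarily how `build_tree` works internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for CobolNode."""
    level: int
    name: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)


def build_level_tree(items: list[tuple[int, str]]) -> list[Node]:
    """Attach each item to the nearest preceding item with a lower level."""
    roots: list[Node] = []
    stack: list[Node] = []  # open ancestors, innermost last
    for level, name in items:
        node = Node(level, name)
        # An open item with the same or a higher level cannot be the parent.
        while stack and stack[-1].level >= level:
            stack.pop()
        if stack:
            node.parent = stack[-1]
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


roots = build_level_tree([
    (1, "CLAIM-RECORD"),
    (12, "INSURED-DETAILS"),
    (15, "INSURED-POLICY-NO"),
    (12, "POLICY-DETAILS"),
    (15, "POLICY-TYPE"),
    (88, "PRIVATE"),
])
print([child.name for child in roots[0].children])
```

Under this rule an 88-level condition attaches to the field immediately above it, because 88 is greater than any ordinary level number.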

### Stage 5 — Rendering

| Function | Output |
|---|---|
| `render_copybook(tree)` | Full copybook, original level numbers preserved |
| `render_flat_copybook(nodes, record_name)` | All fields remapped to level 05 under one 01 record |

`render_copybook` accepts an `indent_size` parameter (default 4) to control visual depth:

```python
render_copybook(tree, indent_size=0)   # flat — all fields at Area B
render_copybook(tree, indent_size=4)   # default — 4 spaces per depth level
```
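As a rough illustration of how `indent_size` could translate into column positions, the sketch below pads depth-0 lines to Area A (column 8) and deeper lines to Area B (column 12) plus `indent_size` spaces per extra depth level. This mapping is inferred from the sample output in the Examples section and is an assumption; `render_indented` and its `(depth, text)` rows are hypothetical.

```python
AREA_A_PAD = 7    # spaces before column 8
AREA_B_PAD = 11   # spaces before column 12


def render_indented(rows: list[tuple[int, str]], indent_size: int = 4) -> str:
    """Render (depth, text) rows; depth 0 starts in Area A, the rest in Area B."""
    out: list[str] = []
    for depth, text in rows:
        if depth == 0:
            pad = AREA_A_PAD
        else:
            pad = AREA_B_PAD + (depth - 1) * indent_size
        out.append(" " * pad + text)
    return "\n".join(out)


rows = [
    (0, "01  CLAIM-RECORD."),
    (1, "12 INSURED-DETAILS."),
    (2, "15 INSURED-POLICY-NO        PIC 9(07)."),
]
indented = render_indented(rows, indent_size=4)
flat = render_indented(rows, indent_size=0)
print(indented)
```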

## Data Models

### `CobolLine`

One physical line from the file. Never mutated after creation.

```python
line.line_number      # int — 1-based
line.indicator        # str — single char, space = normal
line.area             # str — raw columns 8–72
line.is_comment       # bool — indicator is * or /
line.is_continuation  # bool — indicator is -
line.raw_line         # str — reconstructed 80-char record
```

### `CobolStatement`

One complete logical declaration, potentially spanning multiple lines.

```python
stmt.area             # str — full joined text e.g. '15 Field PIC X(10).'
stmt.source_lines     # list[CobolLine] — contributing physical lines
stmt.reserved_words   # list[str] — COBOL keywords found in this statement
stmt.copybook         # str — source file path
```

### `CobolField`

One parsed data field with all clauses extracted.

```python
field.level           # int  — level number
field.name            # str  — uppercased field name
field.picture         # str | None  — e.g. '9(07)', 'X(15)', 'S9(7)V99'
field.redefines       # str | None  — target field name
field.usage           # str | None  — e.g. 'COMP-3', 'BINARY', 'DISPLAY'
field.occurs          # int | None  — fixed array size
field.occurs_max      # int | None  — upper bound for variable arrays
field.depending_on    # str | None  — field holding current array size
field.indexed_by      # str | None  — index name for OCCURS tables
field.value           # str | None  — VALUE clause content
field.sign_separate   # bool — SIGN LEADING/TRAILING SEPARATE present
field.synchronized    # bool — SYNC/SYNCHRONIZED present

# Derived properties
field.is_filler         # name == 'FILLER'
field.is_condition      # level == 88
field.is_group          # no PIC clause
field.is_redefine       # has REDEFINES clause
field.is_array          # has OCCURS clause
field.is_variable_array # has DEPENDING ON clause
field.is_numeric        # PIC contains 9
field.is_signed         # PIC starts with S
```
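The derived properties can be pictured as read-only properties computed from the stored clauses. The sketch below uses a hypothetical `FieldSketch` dataclass with a subset of the attributes; it is not the library's `CobolField`. One assumption worth noting: repeat counts such as `(9)` are stripped before the numeric check, so that `X(9)` is not mistaken for a numeric picture.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical stand-in for CobolField with a subset of its attributes."""
    level: int
    name: str
    picture: Optional[str] = None
    redefines: Optional[str] = None
    occurs: Optional[int] = None

    @property
    def is_filler(self) -> bool:
        return self.name == "FILLER"

    @property
    def is_condition(self) -> bool:
        return self.level == 88

    @property
    def is_group(self) -> bool:
        return self.picture is None

    @property
    def is_numeric(self) -> bool:
        if self.picture is None:
            return False
        # Remove repeat counts like (07) so only picture symbols are inspected.
        symbols = re.sub(r"\(\d+\)", "", self.picture)
        return "9" in symbols

    @property
    def is_signed(self) -> bool:
        return self.picture is not None and self.picture.upper().startswith("S")


amount = FieldSketch(5, "POLICY-AMOUNT", picture="S9(7)V99")
print(amount.is_numeric, amount.is_signed, amount.is_group)
```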

### `CobolNode`

A node in the field hierarchy tree. Wraps a `CobolField` with parent/child references.

```python
node.cobol_field      # CobolField
node.parent           # CobolNode | None
node.children         # list[CobolNode]
node.level            # shortcut to cobol_field.level
node.name             # shortcut to cobol_field.name
node.to_dict()        # JSON string — parent serialized as name string to avoid circular refs
```
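Serializing the parent as a name rather than as an object is what keeps the output finite, since each child would otherwise point back to a parent that contains it. A minimal sketch of that idea follows, using a hypothetical `SketchNode` class and `node_to_plain` helper rather than the library's own `to_dict`.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for CobolNode."""
    level: int
    name: str
    parent: Optional["SketchNode"] = field(default=None, repr=False, compare=False)
    children: list["SketchNode"] = field(default_factory=list)


def node_to_plain(node: SketchNode) -> dict:
    """Convert a node to plain data; the parent is reduced to its name."""
    return {
        "level": node.level,
        "name": node.name,
        "parent": node.parent.name if node.parent else None,
        "children": [node_to_plain(child) for child in node.children],
    }


root = SketchNode(1, "CLAIM-RECORD")
child = SketchNode(5, "CLAIM-ID", parent=root)
root.children.append(child)

as_json = json.dumps(node_to_plain(root), indent=2)
print(as_json)
```

Recursion only follows `children`, never `parent`, so the traversal terminates on any well-formed tree.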

## Examples

### Parse and print a copybook tree

```python
from copybook.parser import read_copybook, read_statements, parse_fields
from copybook.tree import build_tree
from copybook.render import render_copybook

lines      = read_copybook("CLAIMREC.cpy")
statements = read_statements(lines)
fields     = parse_fields(statements)
tree       = build_tree(fields)

print(render_copybook(tree, indent_size=4))
```

```cobol
       01  Claim-Record.
           12 Insured-Details.
               15 Insured-Policy-No        PIC 9(07).
               15 Insured-Last-Name        PIC X(15).
           12 Policy-Details.
               15 Policy-Type              PIC 9.
                   88 Private              VALUE 1.
                   88 Medicare             VALUE 2.
```

### Generate a flat copybook (no groups, no REDEFINES)

```python
from copybook.tree import build_flat_tree
from copybook.render import render_flat_copybook

# `fields` is the list returned by parse_fields() in the previous example
flat = build_flat_tree(fields)
print(render_flat_copybook(flat, "FLAT-CLAIMREC"))
```

```cobol
       01  FLAT-CLAIMREC.
           05 Insured-Policy-No        PIC 9(07).
           05 Insured-Last-Name        PIC X(15).
           05 Insured-First-Name       PIC X(10).
           05 Policy-Type              PIC 9.
           05 Policy-Benefit-Date-Num  PIC 9(08).
           05 Policy-Amount            PIC S9(7)V99.
```

### Inspect a field

```python
# `fields` is the list returned by parse_fields()
for field in fields:
    if field.is_numeric:
        print(f"{field.name}: PIC {field.picture}")
```

### Serialize a tree node to JSON

```python
# `fields` and build_tree come from the first example above
tree = build_tree(fields)
print(tree[0].to_dict())
```

## Project Structure

```
copybook/
    models.py          — CobolLine, CobolStatement, CobolField, CobolNode
    patterns.py        — regex patterns, layout constants, reserved words
    parser.py          — read_copybook, read_statements, parse_fields
    tree.py            — build_tree, build_flat_tree
    render.py          — render_copybook, render_flat_copybook
    reserved_words.txt — COBOL reserved word list
```

## Supported COBOL Features

| Feature | Supported |
|---|---|
| Fixed-format 80-column records | ✅ |
| Full-line comments (`*`, `/` indicator) | ✅ |
| Continuation lines (`-` indicator) | ✅ |
| Inline comments (`*>`) | ✅ |
| Multi-line statements | ✅ |
| PIC / PICTURE clause | ✅ |
| REDEFINES | ✅ |
| OCCURS n TIMES | ✅ |
| OCCURS n TO m TIMES DEPENDING ON | ✅ |
| INDEXED BY | ✅ |
| VALUE / VALUES ARE | ✅ |
| USAGE / COMP / COMP-3 / BINARY etc. | ✅ |
| SIGN LEADING/TRAILING SEPARATE | ✅ |
| SYNCHRONIZED / SYNC | ✅ |
| RENAMES / RENAMES THRU | ✅ |
| Level 66, 77, 88 | ✅ |
| `:tag:` style replacement tokens | ✅ |
| Free-format COBOL | ❌ |


## Roadmap

- [ ] PIC clause byte length calculator (`DISPLAY`, `COMP-3`, `BINARY`, etc.)
- [ ] Field offset and position computation
- [ ] `REDEFINES`-aware offset handling (overlapping storage)
- [ ] `OCCURS` multiplier in offset calculation
- [ ] Raw record buffer slicing by field name
- [ ] Multiple memory view strategies (full tree, flat, storage-only)
- [ ] `COPY` statement resolution across multiple copybook files

## License

MIT. See [LICENSE](LICENSE) for details.

