Metadata-Version: 2.4
Name: bclearer
Version: 0.3.0
Summary: A collection of interop, core, and orchestration services for the bclearer framework
Author-email: Mesbah Khan <khanm@ontoledgy.io>
Requires-Python: <4.0,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: setuptools<79.0.0,>=78.1.1
Requires-Dist: pandas<3.0.0,>=2.2.3
Requires-Dist: gitpython<4.0.0,>=3.1.43
Requires-Dist: xlsxwriter<4.0.0,>=3.2.0
Requires-Dist: xlrd<3.0.0,>=2.0.1
Requires-Dist: chardet<6.0.0,>=5.2.0
Requires-Dist: h5py<4.0.0,>=3.11.0
Requires-Dist: tables<4.0.0,>=3.10.1
Requires-Dist: radon<7.0.0,>=6.0.1
Requires-Dist: numpy<3.0.0,>=2.1.1
Requires-Dist: pyodbc<6.0.0,>=5.1.0
Requires-Dist: pylint<4.0.0,>=3.3.0
Requires-Dist: kafka-python-ng<3.0.0,>=2.2.2
Requires-Dist: psutil<7.0.0,>=6.0.0
Requires-Dist: networkx<4.0,>=3.3
Requires-Dist: msaccessdb<2.0.0,>=1.0.0
Requires-Dist: lxml<6.0.0,>=5.3.0
Requires-Dist: pyspark<4.0.0,>=3.5.2
Requires-Dist: delta-spark<4.0.0,>=3.2.0
Requires-Dist: deltalake<1.0.0,>=0.20.0
Requires-Dist: fastparquet<2025.0.0,>=2024.5.0
Requires-Dist: basexclient<9.0.0,>=8.4.4
Requires-Dist: pytest<9.0.0,>=8.3.3
Requires-Dist: pytest-cov<6.0.0,>=5.0.0
Requires-Dist: pytest-xdist<4.0.0,>=3.6.1
Requires-Dist: openpyxl<4.0.0,>=3.1.5
Requires-Dist: neo4j<6.0.0,>=5.24.0
Requires-Dist: pycozo==0.7.6
Requires-Dist: cozo-embedded==0.7.6
Requires-Dist: graphviz<0.21.0,>=0.20.3
Requires-Dist: defusedxml<0.8.0,>=0.7.1
Requires-Dist: psycopg2-binary<3.0.0,>=2.9.9
Requires-Dist: ruff<0.7.0,>=0.6.8
Requires-Dist: black<25.0.0,>=24.8.0
Requires-Dist: pre-commit<4.0.0,>=3.8.0
Requires-Dist: isort<6.0.0,>=5.13.2
Requires-Dist: jinja2<4.0.0,>=3.1.4
Requires-Dist: pymongo<5.0.0,>=4.10.0
Requires-Dist: docker<8.0.0,>=7.1.0
Requires-Dist: faker<31.0.0,>=30.1.0
Requires-Dist: filelock<4.0.0,>=3.20.3
Requires-Dist: pyyaml<7.0.0,>=6.0.2
Requires-Dist: pyshacl<0.31.0,>=0.30.1
Requires-Dist: tqdm<5.0.0,>=4.66.5
Requires-Dist: untangle<2.0.0,>=1.2.1
Requires-Dist: sqlalchemy<3.0.0,>=2.0.35
Requires-Dist: influxdb-client==1.49.0
Requires-Dist: fastapi<1.0.0,>=0.115.0
Requires-Dist: uvicorn[standard]<1.0.0,>=0.24.0
Requires-Dist: urllib3<3.0.0,>=2.6.3
Requires-Dist: bclearer-interop-services
Requires-Dist: bclearer-core
Requires-Dist: bclearer-orchestration-services
Requires-Dist: bnop
Requires-Dist: httpcore>=1.0.9
Requires-Dist: httpx>=0.28.1
Requires-Dist: regex>=2025.9.18
Requires-Dist: ipython>=9.6.0
Requires-Dist: aiofiles>=25.1.0
Requires-Dist: pytest-asyncio>=1.3.0
Requires-Dist: raphtory>=0.16.4
Requires-Dist: python-multipart>=0.0.22
Requires-Dist: blake3>=1.0.8
Dynamic: license-file

# bCLEARer Pipeline Development Kit (PDK)

The bCLEARer Pipeline Development Kit (PDK) bundles the libraries, scaffolding tools, and reference assets used to build semantic data pipelines on the bCLEARer platform. It delivers the core building blocks for configuration, data interoperability, orchestration, and ontology modelling so you can go from a pipeline blueprint to a running implementation quickly.

## Highlights

- Generate complete pipeline skeletons with the `bclearer-pipeline-builder` CLI (interactive authoring, JSON-driven creation, structural updates, and template extraction).
- Connect to the ecosystems your pipelines touch: CSV/Excel/JSON, Delta Lake, PySpark, HDF5, MongoDB, MS Access, PostgreSQL, SQL Server, Neo4j, CozoDB, Raphtory, Enterprise Architect, and more.
- Operate pipelines confidently with orchestration helpers covering app lifecycle management, UUID/identity services, logging, reporting, static analysis, version-control utilities, and unit-of-measure management.
- Model your universe with the BNOP ontology module—our BORO Native Objects implementation featuring factories, relationship management, and XML migrations.

## Workspace Packages

| Package | Path | Highlights |
| --- | --- | --- |
| `bclearer-core` | `libraries/core` | Configuration managers, canonical identifiers (CKIDs), pipeline stage definitions, and the pipeline builder engine + CLI. |
| `bclearer-interop-services` | `libraries/interop_services` | Data I/O adapters and transformations spanning DataFrames, parquet/delta, document stores, graph backends (Neo4j, Raphtory, CozoDB), RDBMS connectors, EA integrations, and session orchestration. |
| `bclearer-orchestration-services` | `libraries/orchestration_services` | Application runner wrappers, logging and reporting helpers, UUID generation, static code analysis, string/unicode tooling, unit-of-measure libraries, and version-control services. |
| `bnop` | `libraries/ontology` | BORO Native Objects (Python) ontology runtime with factories, facades, migrations, and serializers used across bCLEARer pipelines. |

## Repository Layout

- `pipelines/` – reference pipelines generated by the builder (template domain, BOSON, CFI, Uniclass).
- `documentation/` – architecture notes and feature blueprints (pipeline framework, Neo4j, Raphtory, universe designer, RDF/Jena, and more).
- `docker/` – container recipes for running services locally.
- `release_management/` – scripts supporting builds and releases.
- `ui/` – the React-based tooling used to drive pipeline authoring experiences.

## Getting Started

Requires Python 3.12+.

### WSL prerequisites

If running on WSL, install the unixODBC development package, which pulls in the `libodbc` shared library that `pyodbc` loads, before installing Python dependencies:

```bash
sudo apt-get update && sudo apt-get install -y unixodbc-dev
```

Without this, importing `pyodbc` fails with `ImportError: libodbc.so.2: cannot open shared object file: No such file or directory`.
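
To check for the library before installing anything, a minimal sketch using only the standard library might look like this. The helper name `odbc_library_status` is hypothetical, and `find_library` only reports what the system linker can see, so treat the result as a hint rather than a guarantee.

```python
from ctypes.util import find_library


def odbc_library_status() -> str:
    """Return a short message saying whether a libodbc library is visible."""
    # find_library returns a library name or path, or None if nothing matches.
    located = find_library("odbc")
    if located is None:
        return "libodbc not found: install unixodbc-dev before using pyodbc"
    return f"libodbc found: {located}"


print(odbc_library_status())
```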

### Install the workspace (recommended)

```bash
pip install uv
uv sync
source .venv/bin/activate
```

`uv sync` installs all workspace members (`bclearer-core`, `bclearer-interop-services`, `bclearer-orchestration-services`, `bnop`) in editable mode.

### Alternative: standard pip

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Install individual packages from PyPI if you only need a subset, for example `pip install bclearer-core`.
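
After either install route, it can help to confirm which workspace distributions are visible to the active interpreter. The sketch below uses the standard `importlib.metadata` module and the distribution names listed above. The helper name `installed_versions` is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as listed in the workspace packages table.
WORKSPACE_PACKAGES = (
    "bclearer-core",
    "bclearer-interop-services",
    "bclearer-orchestration-services",
    "bnop",
)


def installed_versions(names=WORKSPACE_PACKAGES) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions().items():
    print(f"{name}: {installed or 'not installed'}")
```

A `None` entry usually means the virtual environment is not activated or the package was not part of the install.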

## Pipeline builder CLI

The pipeline builder turns JSON (or interactive prompts) into a fully structured bCLEARer pipeline: domains, pipelines, thin slices, stages, sub-stages, orchestrators, and b-units.

```bash
# Generate a sample configuration file
bclearer-pipeline-builder sample --output pipeline_config.json

# Create a pipeline under ./pipelines
bclearer-pipeline-builder create --config pipeline_config.json --output ./pipelines

# Update an existing pipeline from configuration
bclearer-pipeline-builder update --config pipeline_config.json --pipeline ./pipelines/example_domain

# Extract templates from a curated pipeline
bclearer-pipeline-builder update-templates --template-path pipelines/template_pipeline
```

Run `bclearer-pipeline-builder help` or `python -m bclearer_core.pipeline_builder help` for the full command reference. The generated pipelines follow the [bCLEARer pipeline framework](documentation/bclearer_pipeline_framework.md).
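
If you want to drive the builder from a script rather than a shell, one option is a thin `subprocess` wrapper around the `create` command shown above. This is a sketch, not part of the PDK: the helper names `create_command` and `run_builder` are hypothetical, and only the flags already shown in this README are used. The wrapper defaults to a dry run, so it only renders the command line unless you pass `dry_run=False`.

```python
import shlex
import subprocess
from pathlib import Path

CLI = "bclearer-pipeline-builder"


def create_command(config: Path, output: Path) -> list[str]:
    """Build the argv list for the `create` subcommand."""
    return [CLI, "create", "--config", str(config), "--output", str(output)]


def run_builder(argv: list[str], dry_run: bool = True) -> str:
    """Render the command and, unless dry_run is set, execute it."""
    rendered = shlex.join(argv)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, check=True)
    return rendered


command = create_command(Path("pipeline_config.json"), Path("pipelines"))
print(run_builder(command))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.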

## Working with the libraries

### Data interchange

```python
from bclearer_interop_services.b_dictionary_service.table_as_dictionary_service import (
    TableAsDictionaryFromCsvFileReader,
    TableAsDictionaryToDataFrameConverter,
)

reader = TableAsDictionaryFromCsvFileReader()
table_dict = reader.read("data/example.csv")

converter = TableAsDictionaryToDataFrameConverter()
dataframe = converter.convert(table_dict)
```

Beyond CSV and DataFrames you will find adapters for Excel, JSON, XML, HDF5, Parquet/Delta Lake, PySpark sessions, MongoDB, MS Access, PostgreSQL, SQL Server, CozoDB, Neo4j, Raphtory, Enterprise Architect, filesystem snapshots, and more.
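
To make the idea of a "table as dictionary" concrete, here is a stdlib-only sketch. It assumes the structure is a mapping from row index to a column-name/value mapping. That shape and the function name `read_table_as_dictionary` are assumptions for illustration, not the library's documented format or API.

```python
import csv
import io


def read_table_as_dictionary(csv_text: str) -> dict[int, dict[str, str]]:
    """Parse CSV text into {row_index: {column_name: cell_value}}."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Values stay as strings; type conversion is left to the caller.
    return {index: dict(row) for index, row in enumerate(rows)}


sample = "id,name\n1,alpha\n2,beta\n"
for index, row in read_table_as_dictionary(sample).items():
    print(index, row)
```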

### Ontology modelling

```python
from bnop.bnop_facades import BnopFacades
from bclearer_orchestration_services.identification_services.uuid_service.uuid_helpers.uuid_factory import (
    create_new_uuid,
)

repository_uuid = create_new_uuid()

product_type = BnopFacades.create_new_bnop_type(repository_uuid)
product = BnopFacades.create_bnop_object(
    object_uuid=create_new_uuid(),
    owning_repository_uuid=repository_uuid,
    presentation_name="Example Product",
)

BnopFacades.write_bnop_object_to_xml("bnop_snapshot.xml")
```

Use CKIDs from `bclearer_core.ckids` to classify tuples and relationships when you need richer BORO semantics.

### Orchestration helpers

```python
from bclearer_orchestration_services.b_app_runner_service.b_application_runner import run_b_application

def bootstrap():
    print("hello bCLEARer")

run_b_application(bootstrap)
```

Complement this with utilities from `identification_services`, `log_environment_utility_service`, `static_code_analysis_service`, and `version_control_services` to manage runtime behaviour and governance.

## Testing & quality gates

```bash
pytest                     # run the complete suite
pytest -m "not heavy"      # skip connectors that rely on external services
ruff check                 # lint
black .                    # format
```

Most tests live under `libraries/*/tests`. Heavy tests target databases or graph backends and carry the `heavy` marker, so you can deselect them with `-m "not heavy"` when those services are unavailable.
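
When adding a test that needs an external service, a sketch of how it might be tagged is shown below. The test name and skip message are placeholders, and whether this repository already registers the marker elsewhere (for example in its pytest configuration) is not stated here, so the `pytest_configure` hook is included only as one possible way to do it.

```python
import pytest


def pytest_configure(config):
    """Register the marker; this hook normally lives in conftest.py."""
    config.addinivalue_line(
        "markers", "heavy: test needs an external database or graph backend"
    )


@pytest.mark.heavy
def test_roundtrip_against_live_backend():
    """Hypothetical heavy test; replace the body with real connector checks."""
    pytest.skip("placeholder: connect to a real backend here")
```

With the marker in place, `pytest -m "not heavy"` deselects the test and `pytest -m heavy` runs only the heavy ones.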

## Documentation & next steps

- Architecture overview: `documentation/bclearer_pipeline_framework.md`
- Feature workstreams: `documentation/features/`
- UI tooling walkthroughs: `documentation/ui/`

## Contributing

1. Fork the repository and create a feature branch.
2. Sync dependencies (`uv sync` or `pip install -e .`).
3. Add tests where sensible and run the quality gates.
4. Submit a pull request with context on the change.

We welcome issues and ideas: open a discussion on GitHub or drop us a line.

## License

MIT License. See [`LICENSE`](LICENSE).

## Contact

Mesbah Khan — khanm@ontoledgy.io
