Metadata-Version: 2.4
Name: oreum_core
Version: 0.12.18
Summary: Core tools for use on projects by Oreum Industries
Author-email: Oreum Industries <info@oreum.io>
Requires-Python: ==3.13.*
Description-Content-Type: text/markdown
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Visualization
License-File: LICENSE.md
License-File: LICENSES_3P.md
Requires-Dist: dask
Requires-Dist: ftfy
Requires-Dist: matplotlib
Requires-Dist: numpy>=2.0
Requires-Dist: pandas[excel, parquet, plot]>=3.0,<4
Requires-Dist: patsy
Requires-Dist: scikit-learn>=1.0
Requires-Dist: scipy>=1.9
Requires-Dist: seaborn>=0.13
Requires-Dist: bandit ; extra == "dev"
Requires-Dist: genbadge[coverage] ; extra == "dev"
Requires-Dist: hypothesis ; extra == "dev"
Requires-Dist: interrogate ; extra == "dev"
Requires-Dist: ipython ; extra == "dev"
Requires-Dist: meson ; extra == "dev"
Requires-Dist: ninja ; extra == "dev"
Requires-Dist: pipdeptree ; extra == "dev"
Requires-Dist: pip-licenses ; extra == "dev"
Requires-Dist: pooch ; extra == "dev"
Requires-Dist: pre-commit ; extra == "dev"
Requires-Dist: pytest ; extra == "dev"
Requires-Dist: pytest-cov ; extra == "dev"
Requires-Dist: ruff ; extra == "dev"
Requires-Dist: graphviz ; extra == "pymc"
Requires-Dist: nutpie ; extra == "pymc"
Requires-Dist: preliz ; extra == "pymc"
Requires-Dist: pymc ; extra == "pymc"
Requires-Dist: pytensor ; extra == "pymc"
Requires-Dist: pytensor-distributions ; extra == "pymc"
Requires-Dist: xgboost ; extra == "tree"
Project-URL: Homepage, https://github.com/oreum-industries/oreum_core
Provides-Extra: dev
Provides-Extra: pymc
Provides-Extra: tree

# Oreum Core Tools `oreum_core`

[![Python](https://img.shields.io/badge/python-3.13-blue)](https://www.python.org)
[![License](https://img.shields.io/badge/license-Apache2.0-blue.svg)](https://choosealicense.com/licenses/apache-2.0/)
[![GitHub Release](https://img.shields.io/github/v/release/oreum-industries/oreum_core?display_name=tag&sort=semver)](https://github.com/oreum-industries/oreum_core/releases)
[![PyPI](https://img.shields.io/pypi/v/oreum_core)](https://pypi.org/project/oreum_core)
[![lint](https://github.com/oreum-industries/oreum_core/workflows/lint/badge.svg)](https://github.com/oreum-industries/oreum_core/actions/workflows/lint.yml)
[![test](https://github.com/oreum-industries/oreum_core/workflows/test/badge.svg)](https://github.com/oreum-industries/oreum_core/actions/workflows/test.yml)
[![publish](https://github.com/oreum-industries/oreum_core/actions/workflows/publish.yml/badge.svg)](https://github.com/oreum-industries/oreum_core/actions/workflows/publish.yml)
[![code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![code security: bandit](https://img.shields.io/badge/code%20security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
[![code style: interrogate](https://raw.githubusercontent.com/oreum-industries/oreum_core/master/assets/img/interrogate_badge.svg)](https://pypi.org/project/interrogate/)
[![test coverage](https://raw.githubusercontent.com/oreum-industries/oreum_core/master/assets/img/coverage_badge.svg)](https://github.com/oreum-industries/oreum_core/actions/workflows/test.yml)

---

## 1. Description and Scope

`oreum_core` is an ever-evolving package of core tools for use on client
projects by Oreum Industries.

+ Provides an essential workflow for data curation, EDA, basic ML using the core
  scientific Python stack incl. `numpy`, `scipy`, `matplotlib`, `seaborn`,
  `pandas`, `scikit-learn`
+ Optionally provides an advanced Bayesian modeling workflow in R&D and
  Production using a leading probabilistic programming stack incl. `pymc`,
  `pytensor`, `arviz`
  (do `pip install "oreum_core[pymc]"`, quoted so that `zsh` does not glob the
  brackets)
+ Optionally enables a generalist black-box ML workflow in R&D using a leading
  Gradient Boosted Trees stack based on `xgboost`
  (do `pip install "oreum_core[tree]"`)
+ Also includes several utilities for text cleaning, SQL scripting, file handling


This package **is**:

+ A work in progress (v0.y.z) and liable to breaking changes and inconvenience
  to the user
+ Solely designed for ease of use and rapid development by employees of
  Oreum Industries, and selected clients with guidance

This package **is not**:

+ Intended for public usage and will not be supported for public usage
+ Intended for contributions by anyone not an employee of Oreum Industries,
  and unsolicited contributions will not be accepted.


### Notes

+ Project began on 2021-01-01
+ The `README.md` is MacOS and POSIX oriented
+ See `LICENSE.md` for licensing and copyright details
+ See `pyproject.toml` for various package details
+ See `CLAUDE.md` for Claude Code rules
+ This package uses a logger named `'oreum_core'`: feel free to incorporate or
  ignore it, and see `__init__.py` for details
+ Hosting:
  + Source code repo on [GitHub](https://github.com/oreum-industries/oreum_core)
  + Source code release on [GitHub](https://github.com/oreum-industries/oreum_core/releases)
  + Package release on [PyPI](https://pypi.org/project/oreum_core)
+ Implementation:
  + This project is enabled by a modern, open-source, advanced software stack
    for data curation, statistical analysis and predictive modelling
  + Specifically we use an open-source Python-based suite of software packages,
    the core of which is often known as the Scientific Python stack, supported
    by [NumFOCUS](https://numfocus.org)
  + Once installed (see section 2), see `LICENSES_3P.md` for full
    details of all package licences
+ Environments: this project was originally developed on a MacBook Air M2
  (Apple Silicon ARM64) running MacOS 15 (Sequoia) using `osx-arm64` Accelerate

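As a minimal sketch of how a calling project might incorporate or ignore the
`'oreum_core'` logger noted above: only the logger name comes from this
README, while the helper names, handler, format and levels below are
illustrative assumptions built on the standard `logging` module.

```python
import logging

LOGGER_NAME = "oreum_core"  # logger name as stated in this README


def configure_oreum_logging(level: int = logging.INFO) -> logging.Logger:
    """Route 'oreum_core' log records to stderr (hypothetical helper)."""
    logger = logging.getLogger(LOGGER_NAME)
    logger.setLevel(level)
    # Add a stream handler only once, so repeated calls do not duplicate output
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def silence_oreum_logging() -> None:
    """Ignore the package's log output by raising the level above CRITICAL."""
    logging.getLogger(LOGGER_NAME).setLevel(logging.CRITICAL + 1)
```

Calling `configure_oreum_logging(logging.DEBUG)` once at the start of a
notebook or script is usually enough; `silence_oreum_logging()` is the
"ignore" option.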

### Package Structure

Top-level:
```
oreum_core/
├── curate/      # Data ingestion & transformation
├── eda/         # Exploratory data analysis
├── model_pymc/  # Bayesian modeling (optional dep: pip install oreum_core[pymc])
├── model_tree/  # Gradient-boosted trees (optional dep: pip install oreum_core[tree])
└── utils/       # BaseFileIO base class for all I/O handlers, also string sanitization
```

---


## 2. Instructions to Create Dev Environment

For local development on MacOS

### 2.0 Pre-requisite installs via `homebrew`

1. Install Homebrew, see instructions at [https://brew.sh](https://brew.sh)
2. Install system-level tools incl. `direnv`, `gcc`, `git`, `graphviz`, `uv`:

```zsh
$> make brew
```

### 2.1 Git clone the repo

Assumes system-level tools installed as above:

```zsh
$> git clone https://github.com/oreum-industries/oreum_core
$> cd oreum_core
```
Then allow `direnv` on MacOS to autorun the `.envrc` file upon opening the
directory (typically by running `direnv allow` once in the project dir)


### 2.2 Create virtual environment and install dev packages

Notes:

+ We use a local `.venv/` virtual env via [`uv`](https://github.com/astral-sh/uv)
+ Packages are specified in `pyproject.toml` and might not be the latest
  versions; this aids stability for `pymc` (usually in a state of development
  flux)


#### 2.2.1 Create the dev environment

From the dir above `oreum_core/` project dir:

```zsh
$> make -C oreum_core/ dev
```

This will also create some files to help confirm / diagnose successful installation:

+ `dev/install_log/blas_info.txt` for the `BLAS MKL` installation for `numpy`
+ `LICENSES_3P.md` details the license for each third-party package used


#### 2.2.2 (Optional best practice) Test successful installation of dev env

From the dir above `oreum_core/` project dir:

```zsh
$> make -C oreum_core/ dev-test
```

This will also add files `dev/install_log/tests_[numpy|scipy].txt` which detail
successful installation (or not) for `numpy`, `scipy`


#### 2.2.3 (Useful during env install experimentation): To remove the dev env

From the dir above `oreum_core/` project dir:

```zsh
$> make -C oreum_core/ dev-uninstall
```

### 2.3 Code Linting & Repo Control

#### 2.3.1 Pre-commit

We use [pre-commit](https://pre-commit.com) to run a suite of automated tests
for code linting & quality control and repo control prior to commit on local
development machines.

+ `pre-commit` is already installed by the `make dev` command (which itself calls
  `pip install -e .[dev]`)
+ The pre-commit script will then run on your system upon `git commit`
+ See this project's `.pre-commit-config.yaml` for details


#### 2.3.2 GitHub Actions

We use [GitHub Actions](https://docs.github.com/en/actions/using-workflows) aka
GitHub Workflows to run:

1. A suite of automated tests for commits received at the origin (i.e. GitHub)
2. Publishing to PyPI upon creating a GitHub Release

+ See `Makefile` for the CLI commands that are issued
+ See `.github/workflows/*` for workflow details


#### 2.3.3 Git LFS

We use [Git LFS](https://git-lfs.github.com) to store any large files alongside
the repo. This can be useful to replicate exact environments during development
and/or for automated tests

+ This requires a local machine install
  (see [Getting Started](https://git-lfs.github.com))
+ See `.gitattributes` for details


### 2.4 Configs for Local Development

Some notes to help configure a local development environment

#### 2.4.1 Git config `~/.gitconfig`

```ini
[user]
    name = <YOUR NAME>
    email = <YOUR EMAIL ADDRESS>
```


### 2.5 Install VSCode IDE

We strongly recommend using [VSCode](https://code.visualstudio.com) for all
development on local machines, and this is a hard pre-requisite to use
the `.devcontainer` environment

This repo includes relevant lightweight project control and config in:

```zsh
oreum_core.code-workspace
.vscode/extensions.json
.vscode/settings.json
```

### 2.6 Publishing to PyPI

A note for maintainers (Oreum Industries only): before publishing to PyPI,
ensure that the following is present in a config file `~/.pypirc` on the local
dev machine

```ini
[distutils]
index-servers =
   pypi
   testpypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = __token__

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__

```

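Before publishing, a quick sanity check of `~/.pypirc` can catch a missing
section or username. The following is a minimal sketch using the standard
library `configparser`; the function name `check_pypirc` is hypothetical, and
the checks simply mirror the layout shown above rather than any official
validation.

```python
import configparser
from pathlib import Path


def check_pypirc(path: Path) -> list[str]:
    """Return a list of problems found in a .pypirc file (empty if none)."""
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        return [f"cannot read {path}"]
    problems = []
    # index-servers is a multi-line value, one server name per line
    servers = parser.get("distutils", "index-servers", fallback="").split()
    for name in ("pypi", "testpypi"):
        if name not in servers:
            problems.append(f"{name} not listed in [distutils] index-servers")
        if not parser.has_section(name):
            problems.append(f"missing section [{name}]")
        elif parser.get(name, "username", fallback=None) != "__token__":
            problems.append(f"[{name}] username is not __token__")
    return problems
```

For example, `check_pypirc(Path.home() / ".pypirc")` returns an empty list
when the file matches the layout above.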
---

## 3. Code Standards

Even when writing R&D code, we strive to meet and exceed (even define) best
practices for code quality, documentation and reproducibility for modern
data science projects.

### 3.1 Code Linting & Repo Control

We use a suite of automated tools to check and enforce code quality. We indicate
the relevant shields at the top of this README. See section 2.3 above for how
this is enforced at pre-commit on developer machines and upon PR at the origin
as part of our CI process, prior to master branch merge.

These include:

+ [`ruff`](https://docs.astral.sh/ruff/) - extremely fast standardised linting
  and formatting, which replaces `black`, `flake8`, `isort`
+ [`interrogate`](https://pypi.org/project/interrogate/) - ensure complete Python
  docstrings
+ [`bandit`](https://github.com/PyCQA/bandit) - test for common Python security
  issues

We also run a suite of general tests pre-packaged in
[`pre-commit`](https://pre-commit.com).


---

## 4. Usage

### 4.1 Plot theming

```python
from oreum_core.eda import set_plot_theme
set_plot_theme()  # or pass overrides: set_plot_theme(context="paper")
```

---

Copyright 2026 Oreum FZCO t/a Oreum Industries. All rights reserved.
Oreum FZCO, IFZA, Dubai Silicon Oasis, Dubai, UAE, reg. 25515
[oreum.io](https://oreum.io)

---
Oreum Industries &copy; 2026

