Metadata-Version: 2.4
Name: jragbeer_common
Version: 0.2.64
Summary: Package to create JRAGBEER Common Functions
Author-email: jragbeer <jragbeer@gmail.com>
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: <3.13,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: psutil>=5.9.0
Requires-Dist: dask==2025.11.0
Requires-Dist: distributed==2025.11.0
Requires-Dist: urllib3==2.2.1
Requires-Dist: psycopg2-binary>=2.9.10
Requires-Dist: polars>=1.36.0
Requires-Dist: pyarrow==18.1.0
Requires-Dist: beautifulsoup4==4.12.2
Requires-Dist: pandas==2.2.2
Requires-Dist: selenium==4.28.1
Requires-Dist: SQLAlchemy==2.0.23
Requires-Dist: tqdm==4.66.1
Requires-Dist: numpy==1.26.0
Requires-Dist: paramiko==3.5.0
Requires-Dist: requests==2.31.0
Requires-Dist: azure-storage-blob==12.16.0
Requires-Dist: azure-core==1.26.4
Requires-Dist: types-paramiko>=3.5.0.20240928
Requires-Dist: types-pyyaml>=6.0.12.20241230
Provides-Extra: dev
Requires-Dist: debugpy>=1.8.20; extra == "dev"
Requires-Dist: docker>=7.1.0; extra == "dev"
Requires-Dist: pre-commit>=4.2.0; extra == "dev"
Requires-Dist: pytest>=9.0.3; extra == "dev"
Requires-Dist: python-dotenv>=1.2.2; extra == "dev"
Requires-Dist: ruff>=0.15.11; extra == "dev"
Requires-Dist: ty>=0.0.32; extra == "dev"
Dynamic: license-file

# jragbeer_common

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120/) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)

## Overview

`jragbeer_common` is a Python utility library of common functions and tools shared across projects. It provides reusable components for:

- Data processing and engineering
- Azure blob storage operations
- Dask distributed computing
- Ubuntu system operations

## Features

- **Data Engineering Utilities**

  - DataFrame manipulation and cleaning
  - SQL database operations
  - Email notifications
  - Date/time processing
  - Logging configuration

- **Azure Integration**

  - Blob storage upload/download
  - Parquet file handling
  - Container management
  - Batch operations

- **Dask Distributed Computing**

  - Cluster deployment and management
  - Worker allocation
  - Task scheduling
  - Remote execution

- **Ubuntu System Operations**
  - Remote command execution
  - Process management
  - System monitoring
  - File operations

## Installation

1. Install `uv` (recommended):

```bash
pip install uv
```

2. Create and activate a virtual environment:

```bash
uv venv
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate  # Windows
```

3. Install the package:

```bash
uv pip install jragbeer-common
```

## Development Setup

1. Clone the repository:

```bash
git clone https://github.com/jragbeer/jragbeer_common.git
cd jragbeer_common
```

2. Create a virtual environment and install dependencies:

```bash
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```

3. Install development dependencies:

```bash
uv pip install -e ".[dev]"
```

4. Install pre-commit hooks:

```bash
pre-commit install
```

## Usage

```python
from jragbeer_common import (
    jragbeer_common_data_eng,
    jragbeer_common_azure,
    jragbeer_common_dask,
    jragbeer_common_ubuntu
)

# Data Engineering
jragbeer_common_data_eng.parse_date_features(df)

# Azure Operations
jragbeer_common_azure.adls_upload_file("path/to/file", "blob_name")

# Dask Operations
jragbeer_common_dask.deploy_dask_home_setup()

# Ubuntu Operations
jragbeer_common_ubuntu.execute_cmd_ubuntu_sudo("command")
```

## Environment Variables

The following environment variables are required:

```bash
# Azure Storage
adls_connection_string="your_connection_string"
adls_container_name="your_container"

# Database
local_db_username="username"
local_db_password="password"
local_db_address="address"
local_db_port="port"

# Cluster Configuration
cluster_server_1_address="address"
cluster_server_1_username="username"
cluster_server_1_password="password"
```
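
Because every variable above is required, it can help to check for them before calling into the library. The following is a minimal sketch, not part of `jragbeer_common`: the names `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical, and only the variable names themselves come from the list above.

```python
import os

# Variable names copied from the list above. The tuple and the helper
# below are illustrative and are not provided by jragbeer_common.
REQUIRED_ENV_VARS = (
    "adls_connection_string",
    "adls_container_name",
    "local_db_username",
    "local_db_password",
    "local_db_address",
    "local_db_port",
    "cluster_server_1_address",
    "cluster_server_1_username",
    "cluster_server_1_password",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument inspects the current process environment; a non-empty result lists the names that still need to be set.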

## Building and Distribution

1. Build the package:

```bash
uv build
```

2. Install locally for testing:

```bash
uv pip install dist/jragbeer_common-0.2.64-py3-none-any.whl
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting:

```bash
uv sync --extra dev
pre-commit run --all-files
uv run pytest
```

CI runs the same full suite on each push and pull request (see
[`.github/workflows/test.yml`](.github/workflows/test.yml)), including tests
marked `integration` (Docker Postgres, etc.). A green run therefore needs a
working Docker daemon, both locally and in GitHub Actions (where the runner
provides Docker). The markers (`unit`, `integration`, `contract`) in
[`pyproject.toml`](pyproject.toml) still classify tests, but no marker is
excluded by default.

### Integration tests (Docker Postgres)

Database tests that open PostgreSQL live in
[`test/db/test_jragbeer_common_db.py`](test/db/test_jragbeer_common_db.py) and are
marked `@pytest.mark.integration`. They use the session-scoped container from
[`test/fixtures/postgresql.py`](test/fixtures/postgresql.py) plus the `docker`
Python client and `psycopg2-binary`. They run as part of the default
`uv run pytest` (and in CI).

Optional environment variables (defaults in parentheses):

- `TEST_POSTGRESQL_PORT` — host port mapped to the container (`54329`)
- `TEST_POSTGRESQL_USER`, `TEST_POSTGRESQL_PASSWORD`, `TEST_POSTGRESQL_DATABASE` — bootstrap credentials (`devuser` / `devpassword` / `postgres`)
- `TEST_POSTGRESQL_VERSION` — image tag after `postgres:` (`17.4`)
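
As a rough illustration of how these overrides and defaults combine, the sketch below resolves each setting from the environment. It is an assumption about the shape of such a helper, not the code in `test/fixtures/postgresql.py`: the function name `postgres_test_settings` and the dictionary keys are hypothetical, and only the variable names and default values come from the list above.

```python
import os


def postgres_test_settings(environ=None):
    """Resolve test Postgres settings, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        # Host port mapped to the container.
        "port": int(env.get("TEST_POSTGRESQL_PORT", "54329")),
        # Bootstrap credentials.
        "user": env.get("TEST_POSTGRESQL_USER", "devuser"),
        "password": env.get("TEST_POSTGRESQL_PASSWORD", "devpassword"),
        "database": env.get("TEST_POSTGRESQL_DATABASE", "postgres"),
        # The version variable is the image tag after "postgres:".
        "image": "postgres:" + env.get("TEST_POSTGRESQL_VERSION", "17.4"),
    }
```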

Readiness is checked with **TCP connections via psycopg2** from the host (same as SQLAlchemy), not `docker exec psql`, so you do not need the `docker` CLI or host `postgresql-client` for the wait loop.
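
A wait loop of this kind can be sketched as follows. This is a hedged illustration rather than the fixture's actual implementation: `wait_until_ready` and `psycopg2_connect` are hypothetical names, the timeout and interval values are arbitrary, and the connection defaults are the ones listed above.

```python
import time


def wait_until_ready(connect, timeout=30.0, interval=0.5):
    """Call connect() until it succeeds or `timeout` seconds have elapsed.

    `connect` is any zero-argument callable that raises on failure and
    returns an object with a close() method on success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            conn = connect()
        except Exception as exc:
            # Give up once the deadline has passed; otherwise retry.
            if time.monotonic() >= deadline:
                raise TimeoutError("PostgreSQL did not become ready in time") from exc
            time.sleep(interval)
        else:
            conn.close()
            return


def psycopg2_connect(port=54329, user="devuser", password="devpassword", dbname="postgres"):
    """Open one psycopg2 connection to the container's mapped host port."""
    import psycopg2  # imported lazily so the loop itself has no hard dependency

    return psycopg2.connect(
        host="127.0.0.1",
        port=port,
        user=user,
        password=password,
        dbname=dbname,
        connect_timeout=2,
    )
```

With the container running, `wait_until_ready(psycopg2_connect)` returns once a connection from the host succeeds. Passing the connect step in as a callable keeps the retry logic testable without a database.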

**Future work:** tests that hit real services should live under the appropriate
`test/<subpackage>/` package, be marked `@pytest.mark.integration`, and
document required environment variables in the test module docstring.

## License

Copyright 2026 Julien Ragbeer

Licensed under the Apache License, Version 2.0
