Metadata-Version: 2.4
Name: core-etl
Version: 3.2.1
Summary: This project/library contains common elements related to ETL processes...
Author-email: Alejandro Cora González <alek.cora.glez@gmail.com>
Maintainer: Alejandro Cora González
License-Expression: MIT
Project-URL: Homepage, https://gitlab.com/bytecode-solutions/core/core-etl
Project-URL: Repository, https://gitlab.com/bytecode-solutions/core/core-etl
Project-URL: Documentation, https://core-etl.readthedocs.io/en/latest/
Project-URL: Issues, https://gitlab.com/bytecode-solutions/core/core-etl/-/issues
Project-URL: Changelog, https://gitlab.com/bytecode-solutions/core/core-etl/-/blob/master/CHANGELOG.md
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
Requires-Dist: core-mixins>=3.2.0
Provides-Extra: dev
Requires-Dist: core-dev-tools>=2.0.0; extra == "dev"
Requires-Dist: core-tests>=2.1.0; extra == "dev"

core-etl
===============================================================================

This library provides essential components for ETL processes, offering reusable interfaces
for data extraction, transformation, and loading.

===============================================================================

.. image:: https://img.shields.io/pypi/pyversions/core-etl.svg
    :target: https://pypi.org/project/core-etl/
    :alt: Python Versions

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :target: https://gitlab.com/bytecode-solutions/core/core-etl/-/blob/main/LICENSE
    :alt: License

.. image:: https://gitlab.com/bytecode-solutions/core/core-etl/badges/release/pipeline.svg
    :target: https://gitlab.com/bytecode-solutions/core/core-etl/-/pipelines
    :alt: Pipeline Status

.. image:: https://readthedocs.org/projects/core-etl/badge/?version=latest
    :target: https://readthedocs.org/projects/core-etl/
    :alt: Docs Status

.. image:: https://img.shields.io/badge/security-bandit-yellow.svg
    :target: https://github.com/PyCQA/bandit
    :alt: Security

|


Installation
===============================================================================

Install from PyPI using pip:

.. code-block:: bash

    pip install core-etl
    uv pip install core-etl  # Alternative: install with uv


Features
===============================================================================

**Base ETL Framework**

* Template method pattern for ETL workflow orchestration
* Comprehensive lifecycle hooks (pre-processing, execution, post-processing, cleanup)
* Built-in error handling with detailed exception logging
* Task status tracking (CREATED, EXECUTING, SUCCESS, ERROR)
* Timezone support for date/datetime processing (defaults to UTC)
* Temporary folder management for local file operations
* Extensible resource cleanup mechanisms
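
The workflow described above can be pictured with a small, self-contained sketch.
It does not import ``core-etl``; the class ``SketchEtl`` and the hook names
``pre_processing()``, ``_execute()``, ``post_processing()`` and ``clean_resources()``
are assumptions chosen for illustration, so check the documentation for the real
interface. Only the four status values come from the list above.

.. code-block:: python

    import enum
    import logging


    class TaskStatus(enum.Enum):
        CREATED = "CREATED"
        EXECUTING = "EXECUTING"
        SUCCESS = "SUCCESS"
        ERROR = "ERROR"


    class SketchEtl:
        """Hypothetical base class for illustration; not the core-etl API."""

        def __init__(self):
            self.status = TaskStatus.CREATED
            self.calls = []  # records hook order so the flow is easy to inspect
            self.logger = logging.getLogger(type(self).__name__)

        def execute(self):
            # Template method: the step order is fixed here, subclasses fill in hooks.
            self.status = TaskStatus.EXECUTING
            try:
                self.pre_processing()
                self._execute()
                self.post_processing()
                self.status = TaskStatus.SUCCESS
            except Exception:
                self.status = TaskStatus.ERROR
                self.logger.exception("ETL task failed")
            finally:
                self.clean_resources()  # runs on success and on failure
            return self.status

        def pre_processing(self):
            self.calls.append("pre_processing")

        def _execute(self):
            raise NotImplementedError

        def post_processing(self):
            self.calls.append("post_processing")

        def clean_resources(self):
            self.calls.append("clean_resources")


    class HelloEtl(SketchEtl):
        def _execute(self):
            self.calls.append("_execute")


    class FailingEtl(SketchEtl):
        def _execute(self):
            raise RuntimeError("source unavailable")

In this sketch ``HelloEtl().execute()`` ends in ``TaskStatus.SUCCESS``, while
``FailingEtl().execute()`` logs the exception, ends in ``TaskStatus.ERROR`` and
still reaches the cleanup hook.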

**File-Based ETL (IBaseEtlFromFile)**

* Process files from various sources (SFTP, local filesystem, cloud storage)
* Iterator-based file processing with error isolation per file
* Individual file success/error callbacks for custom handling
* Batch file operations with automatic error recovery
* Extensible hooks: ``get_paths()``, ``process_file()``, ``on_success()``, ``on_error()``
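
Per-file error isolation can be illustrated with the following minimal sketch. The
hook names match the list above, but their signatures, the ``SketchFileEtl`` class
and the file names are assumptions made for this example; the real
``IBaseEtlFromFile`` interface may differ.

.. code-block:: python

    class SketchFileEtl:
        """Hypothetical stand-in for a file-based ETL; not the core-etl API."""

        def get_paths(self):
            raise NotImplementedError

        def process_file(self, path):
            raise NotImplementedError

        def on_success(self, path):
            pass

        def on_error(self, path, error):
            pass

        def execute(self):
            # Each file is handled in its own try block, so one bad file
            # does not stop the remaining files from being processed.
            for path in self.get_paths():
                try:
                    self.process_file(path)
                except Exception as error:
                    self.on_error(path, error)
                else:
                    self.on_success(path)


    class DemoFileEtl(SketchFileEtl):
        def __init__(self):
            self.done = []
            self.failed = []

        def get_paths(self):
            # A generator, so paths are consumed one at a time.
            yield from ("a.csv", "broken.csv", "c.csv")

        def process_file(self, path):
            if path.startswith("broken"):
                raise ValueError(f"cannot parse {path}")

        def on_success(self, path):
            self.done.append(path)

        def on_error(self, path, error):
            self.failed.append((path, str(error)))

After ``DemoFileEtl().execute()``, the two valid files are reported through
``on_success()`` and the broken one through ``on_error()``.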

**Record-Based ETL (IBaseEtlFromRecord)**

* Process records from APIs, databases, files, message queues, and data streams
* Memory-efficient batch processing with configurable batch sizes
* Built-in transformation pipeline:

  * Field removal (``attrs_to_remove``)
  * Field renaming (``name_mapper``)
  * Data type casting (``type_mapper``)

* Pre and post transformation hooks for custom business logic
* Incremental processing support with ``last_processed`` markers
* Extensible methods: ``retrieve_records()``, ``process_records()``, ``pre_transformations()``, ``post_transformations()``
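
The transformation pipeline and batch processing can be sketched as below. The
parameter names ``attrs_to_remove``, ``name_mapper`` and ``type_mapper`` come from
the list above; the helper functions ``transform()`` and ``batched()``, the order
of the three steps, and the choice to key ``type_mapper`` by the renamed field are
assumptions for illustration only.

.. code-block:: python

    from itertools import islice


    def transform(record, attrs_to_remove=(), name_mapper=None, type_mapper=None):
        """Hypothetical helper: remove, rename, then cast the fields of one record."""
        name_mapper = name_mapper or {}
        type_mapper = type_mapper or {}
        result = {}
        for key, value in record.items():
            if key in attrs_to_remove:
                continue  # field removal
            key = name_mapper.get(key, key)  # field renaming
            cast = type_mapper.get(key)  # assumption: keyed by the new name
            result[key] = cast(value) if cast is not None else value
        return result


    def batched(records, batch_size):
        """Yield lists of at most batch_size records without loading everything."""
        iterator = iter(records)
        while batch := list(islice(iterator, batch_size)):
            yield batch


    raw = [
        {"id": "1", "nm": "a", "tmp": None},
        {"id": "2", "nm": "b", "tmp": None},
        {"id": "3", "nm": "c", "tmp": None},
    ]

    batches = [
        [
            transform(
                record,
                attrs_to_remove={"tmp"},
                name_mapper={"nm": "name"},
                type_mapper={"id": int},
            )
            for record in batch
        ]
        for batch in batched(raw, batch_size=2)
    ]

With a batch size of 2, the three input records are processed as one batch of two
and one batch of one; each record loses ``tmp``, has ``nm`` renamed to ``name``
and has ``id`` cast to ``int``.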

**Async ETL (IAsyncETL)**

* Concurrent record processing via asyncio producer/consumer pattern
* Configurable worker pool size (``max_workers``) and queue capacity (``max_queue_size``)
* Individual record failures are isolated: failed records are logged and skipped without aborting the pipeline
* Extensible methods: ``produce_records()``, ``_process_record()``
* Note: ``execute()`` calls ``asyncio.run()`` internally, which fails when an event loop is already running in the same thread; from async code, call ``await asyncio.to_thread(task.execute)`` instead
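
The producer/consumer pattern can be sketched with plain ``asyncio`` as follows.
Nothing here is imported from ``core-etl``: ``run_pipeline()``, ``process_record()``,
``execute()`` and ``caller()`` are hypothetical functions, and only the names
``produce_records``, ``max_workers`` and ``max_queue_size`` are borrowed from the
list above.

.. code-block:: python

    import asyncio


    async def produce_records():
        # Producer: an async generator standing in for an API or a stream.
        for value in range(6):
            yield value


    async def process_record(record):
        if record == 3:
            raise ValueError("bad record")
        await asyncio.sleep(0)  # placeholder for real I/O
        return record * 10


    async def run_pipeline(max_workers=2, max_queue_size=4):
        # A bounded queue makes the producer wait when workers fall behind.
        queue = asyncio.Queue(maxsize=max_queue_size)
        results, failures = [], []

        async def worker():
            while True:
                record = await queue.get()
                try:
                    results.append(await process_record(record))
                except Exception as error:
                    # A failed record is noted and skipped; the worker keeps going.
                    failures.append((record, str(error)))
                finally:
                    queue.task_done()

        workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
        async for record in produce_records():
            await queue.put(record)
        await queue.join()  # wait until every queued record was handled
        for task in workers:
            task.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
        return sorted(results), failures


    def execute():
        # Synchronous entry point that starts its own event loop.
        return asyncio.run(run_pipeline())


    async def caller():
        # From code that already runs in an event loop, use a worker thread.
        return await asyncio.to_thread(execute)

Calling ``execute()`` from synchronous code runs the whole pipeline, while
``caller()`` shows the ``asyncio.to_thread`` route for code that is already
running inside an event loop.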


Quick Start
===============================================================================

Installation
-------------------------------------------------------------------------------

Install the package:

.. code-block:: bash

    pip install core-etl
    uv pip install core-etl     # Alternative: install with uv
    pip install -e ".[dev]"     # For development...


Setting Up Environment
-------------------------------------------------------------------------------

1. Install required libraries:

.. code-block:: bash

    pip install --upgrade pip
    pip install virtualenv

2. Create a Python virtual environment:

.. code-block:: bash

    virtualenv --python=python3.12 .venv

3. Activate the virtual environment:

.. code-block:: bash

    source .venv/bin/activate

Install packages
-------------------------------------------------------------------------------

.. code-block:: bash

    pip install .
    pip install -e ".[dev]"

Check tests and coverage
-------------------------------------------------------------------------------

.. code-block:: bash

    python manager.py run-tests
    python manager.py run-coverage


Contributing
===============================================================================

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Write tests for new functionality
4. Ensure all tests pass: ``pytest -n auto``
5. Run linting: ``pylint core_etl``
6. Run security checks: ``bandit -r core_etl``
7. Submit a pull request


License
===============================================================================

This project is licensed under the MIT License. See the LICENSE file for details.


Links
===============================================================================

* **Documentation:** https://core-etl.readthedocs.io/en/latest/
* **Repository:** https://gitlab.com/bytecode-solutions/core/core-etl
* **Issues:** https://gitlab.com/bytecode-solutions/core/core-etl/-/issues
* **Changelog:** https://gitlab.com/bytecode-solutions/core/core-etl/-/blob/master/CHANGELOG.md
* **PyPI:** https://pypi.org/project/core-etl/


Support
===============================================================================

For questions or support, please open an issue on GitLab or contact the maintainers.


Authors
===============================================================================

* **Alejandro Cora González** - *Initial work* - alek.cora.glez@gmail.com
