Metadata-Version: 2.4
Name: safetar
Version: 0.1.2
Summary: Hardened TAR extraction for Python - secure by default.
Author-email: Artur Barseghyan <artur.barseghyan@gmail.com>
Maintainer-email: Artur Barseghyan <artur.barseghyan@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/barseghyanartur/safetar/
Project-URL: Repository, https://github.com/barseghyanartur/safetar/
Project-URL: Issues, https://github.com/barseghyanartur/safetar/issues
Keywords: tar,security,tarslip,tarbomb,hardened,safe
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Archiving
Requires-Python: >=3.10
Description-Content-Type: text/x-rst
License-File: LICENSE
Provides-Extra: all
Requires-Dist: safetar[build,dev,docs,test]; extra == "all"
Provides-Extra: dev
Requires-Dist: detect-secrets; extra == "dev"
Requires-Dist: doc8; extra == "dev"
Requires-Dist: ipython; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: uv; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pytest-codeblock; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: sphinx-autobuild; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == "docs"
Requires-Dist: sphinx-no-pragma; extra == "docs"
Requires-Dist: sphinx-markdown-builder; extra == "docs"
Requires-Dist: sphinx-llms-txt-link; extra == "docs"
Requires-Dist: sphinx-source-tree; extra == "docs"
Provides-Extra: build
Requires-Dist: build; extra == "build"
Requires-Dist: twine; extra == "build"
Requires-Dist: wheel; extra == "build"
Dynamic: license-file

=======
safetar
=======
.. image:: https://raw.githubusercontent.com/barseghyanartur/safetar/main/docs/_static/safetar_logo.webp
   :alt: SafeTar Logo
   :align: center

Hardened TAR extraction for Python - secure by default.

.. image:: https://img.shields.io/pypi/v/safetar.svg
   :target: https://pypi.python.org/pypi/safetar
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/safetar.svg
   :target: https://pypi.python.org/pypi/safetar/
   :alt: Supported Python versions

.. image:: https://github.com/barseghyanartur/safetar/actions/workflows/test.yml/badge.svg?branch=main
   :target: https://github.com/barseghyanartur/safetar/actions
   :alt: Build Status

.. image:: https://readthedocs.org/projects/safetar/badge/?version=latest
    :target: http://safetar.readthedocs.io
    :alt: Documentation Status

.. image:: https://img.shields.io/badge/docs-llms.txt-blue
    :target: https://safetar.readthedocs.io/en/latest/llms.txt
    :alt: llms.txt - documentation for LLMs

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://github.com/barseghyanartur/safetar/#License
   :alt: MIT

.. image:: https://coveralls.io/repos/github/barseghyanartur/safetar/badge.svg?branch=main&service=github
    :target: https://coveralls.io/github/barseghyanartur/safetar?branch=main
    :alt: Coverage

``safetar`` is a zero-dependency, production-grade wrapper around Python's
``tarfile`` module that defends against the most common TAR-based attacks:
TarSlip path traversal, decompression bombs, symlink/hardlink attacks,
device file injection, and crafted archives.

Features
========

- **TarSlip protection** - relative traversal, absolute paths, Unicode
  NFC normalisation attacks, PAX path overrides, GNU long-name reassembly,
  and null bytes in filenames are all blocked.
- **Decompression bomb protection** - archive-level compression ratio
  monitoring across GZ, BZ2, and XZ streams aborts extraction before
  runaway decompression can exhaust disk or memory.
- **File size limits** - per-member and total extraction size limits enforced
  at stream time (not based on untrusted header values).
- **Symlink policy** - configurable: ``REJECT`` (default), ``IGNORE``, or
  ``RESOLVE_INTERNAL`` (full chain verification with TOCTOU defence via
  deferred batch creation).
- **Hardlink policy** - configurable: ``REJECT`` (default) or ``INTERNAL``
  (target must exist on disk; forward references rejected).
- **Forbidden entry types** - character devices, block devices, FIFOs, and
  unknown type codes are always rejected.
- **setuid/setgid/sticky bit stripping** - dangerous permission bits are
  removed by default.
- **UID/GID ownership clamping** - archived ownership is clamped to the
  current user by default.
- **Timestamp sanitisation** - mtime values are clamped to ``[0, 2**32 - 1]``.
- **Sparse file policy** - ``REJECT`` (default) or ``MATERIALISE`` (extract
  as dense).
- **Atomic writes** - every member is written to a temporary file first;
  the destination is only created after all checks pass.  No partial files
  are left on disk after a security abort.
- **Secure by default** - all limits are active without any configuration.
- **Zero dependencies** - standard library only.
- **Python 3.12 data_filter** - applied as an additional defensive layer
  when available.

Prerequisites
=============

Python 3.10 or later.  No additional packages required.

Installation
============
With ``uv``:

.. code-block:: sh

    uv pip install safetar

Or with ``pip``:

.. code-block:: sh

    pip install safetar

Quick start
===========

Drop-in replacement for the common ``tarfile`` extraction pattern:

.. pytestfixture: file_tar_gz
.. code-block:: python
    :name: test_safe_extract

    from safetar import safe_extract

    safe_extract("path/to/upload.tar.gz", "/var/files/extracted/")

Or use the ``SafeTarFile`` context manager for more control:

.. pytestfixture: file_tar_gz
.. code-block:: python
    :name: test_safe_tarfile

    from safetar import SafeTarFile

    with SafeTarFile("path/to/upload.tar.gz") as stf:
        print(stf.getnames())
        stf.extractall("/var/files/extracted/")

Custom limits
=============
See the `Default limits`_ table for reference.

.. pytestfixture: file_tar_gz
.. code-block:: python
    :name: test_custom_limits

    from safetar import SafeTarFile, SymlinkPolicy, HardlinkPolicy

    with SafeTarFile(
        "path/to/upload.tar.gz",
        max_file_size=100 * 1024 * 1024,          # 100 MiB per member (default: 1 GiB)
        max_total_size=500 * 1024 * 1024,         # 500 MiB total (default: 5 GiB)
        max_files=1_000,                          # (default: 10 000)
        max_ratio=50.0,                           # (default: 200)
        symlink_policy=SymlinkPolicy.IGNORE,      # (default: SymlinkPolicy.REJECT)
        hardlink_policy=HardlinkPolicy.INTERNAL,  # (default: HardlinkPolicy.REJECT)
    ) as stf:
        stf.extractall("/var/files/extracted/")
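
Size limits such as ``max_file_size`` are described as being enforced at
stream time rather than from header values. As a rough illustration of that
idea (not safetar's code; ``copy_with_limit`` and ``SizeLimitExceeded`` are
hypothetical names), a copy loop can count the bytes it actually reads and
abort once the budget is exceeded:

.. code-block:: python

    import io


    class SizeLimitExceeded(Exception):
        """Raised when more bytes are read than the caller allowed."""


    def copy_with_limit(src, dst, limit: int, chunk_size: int = 64 * 1024) -> int:
        """Copy src to dst, counting real bytes instead of trusting a header."""
        written = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                return written
            written += len(chunk)
            if written > limit:
                # Abort before the oversized chunk reaches the destination.
                raise SizeLimitExceeded(f"read more than {limit} bytes")
            dst.write(chunk)


    out = io.BytesIO()
    print(copy_with_limit(io.BytesIO(b"x" * 1000), out, limit=4096))

Because the counter tracks bytes as they arrive, a member whose header
understates its size would still be caught by a loop of this shape.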

Recursive extraction
====================

When an archive contains nested ``.tar`` files, set ``recursive=True`` to
descend into them automatically. All safety limits apply at every level. Each
nested archive is extracted into a directory named after it (without the
extension). The nested ``.tar`` file is removed from disk after recursive
extraction (see ``_extract_nested_archive`` in ``_core.py``).

.. pytestfixture: nested_tar_archive
.. code-block:: python
    :name: test_recursive_extraction

    from safetar import SafeTarFile

    # archive.tar.gz
    #   readme.txt
    #   inner.tar          ← will be descended into, not extracted as a blob
    #     inner_file.txt

    with SafeTarFile(
        "path/to/archive.tar.gz",
        recursive=True,
        max_nesting_depth=3,
    ) as stf:
        stf.extractall("/var/files/extracted/")

    # Result on disk:
    #   /var/files/extracted/readme.txt
    #   /var/files/extracted/inner/inner_file.txt

By default, ``recursive=False`` and nested tar archives are extracted as
regular files. When ``recursive=True``, safetar detects nested tar archives
by content (``tarfile.is_tarfile()``) rather than by file extension, which
avoids extension-spoofing attacks.

All security protections are applied to nested archives:

- Nesting depth is enforced (``max_nesting_depth``)
- File size limits apply across all nested extractions (``max_file_size``,
  ``max_total_size``)
- Symlink, hardlink, and sparse policies are enforced
- Permission, ownership, and timestamp sanitisation is applied
- All other security checks (path traversal, decompression bombs, etc.)
  remain active
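
Content-based detection can be seen with the standard library alone. The
sketch below is independent of safetar, and its file names are made up. It
writes a tar archive under a misleading ``.txt`` name and a plain text file
under a ``.tar`` name, then asks ``tarfile.is_tarfile()`` about each:

.. code-block:: python

    import os
    import tarfile
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        payload = os.path.join(tmp, "hello.txt")
        with open(payload, "w") as fh:
            fh.write("hello\n")

        # A real tar archive hiding behind a .txt extension.
        disguised = os.path.join(tmp, "notes.txt")
        with tarfile.open(disguised, "w") as tf:
            tf.add(payload, arcname="hello.txt")

        # A plain text file pretending to be a tar archive.
        fake = os.path.join(tmp, "fake.tar")
        with open(fake, "w") as fh:
            fh.write("not an archive\n")

        disguised_is_tar = tarfile.is_tarfile(disguised)
        fake_is_tar = tarfile.is_tarfile(fake)

    print(disguised_is_tar, fake_is_tar)

The disguised archive is recognised and the fake one is not, which is why
a check based on content is harder to fool than one based on the extension.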

Security event monitoring
=========================

.. pytestfixture: file_tar_gz
.. code-block:: python
    :name: test_security_event_monitoring

    from safetar import SafeTarFile, SecurityEvent

    def my_monitor(event: SecurityEvent) -> None:
        print(f"[safetar] {event.event_type} archive={event.archive_hash}")

    with SafeTarFile(
        "path/to/upload.tar.gz", on_security_event=my_monitor
    ) as stf:
        stf.extractall("/var/files/extracted/")
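
A monitor does not have to print; it can also aggregate events. The sketch
below assumes only the ``event_type`` attribute used above. The
``EventCounter`` class is hypothetical, the event type strings are invented
for illustration, and ``SimpleNamespace`` objects stand in for real
``SecurityEvent`` instances so that the snippet runs without an archive:

.. code-block:: python

    from collections import Counter
    from types import SimpleNamespace


    class EventCounter:
        """Callable monitor that tallies events by their event_type."""

        def __init__(self) -> None:
            self.counts: Counter = Counter()

        def __call__(self, event) -> None:
            self.counts[event.event_type] += 1


    monitor = EventCounter()
    # Stand-in events; these event_type values are not safetar's real ones.
    for kind in ("path_traversal", "path_traversal", "size_limit"):
        monitor(SimpleNamespace(event_type=kind, archive_hash="abc123"))

    print(monitor.counts.most_common())

Assuming ``on_security_event`` accepts any callable, an instance like this
could be passed in place of ``my_monitor`` and inspected after extraction.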

Default limits
==============

+--------------------------+------------------+
| Parameter                | Default          |
+==========================+==================+
| ``max_file_size``        | 1 GiB            |
+--------------------------+------------------+
| ``max_total_size``       | 5 GiB            |
+--------------------------+------------------+
| ``max_files``            | 10 000           |
+--------------------------+------------------+
| ``max_ratio``            | 200              |
+--------------------------+------------------+
| ``max_nesting_depth``    | 3                |
+--------------------------+------------------+
| ``recursive``            | False            |
+--------------------------+------------------+
| ``symlink_policy``       | REJECT           |
+--------------------------+------------------+
| ``hardlink_policy``      | REJECT           |
+--------------------------+------------------+
| ``sparse_policy``        | REJECT           |
+--------------------------+------------------+
| ``strip_special_bits``   | True             |
+--------------------------+------------------+
| ``preserve_ownership``   | False            |
+--------------------------+------------------+
| ``clamp_timestamps``     | True             |
+--------------------------+------------------+

Environment variable configuration
===================================
See the `Default limits`_ table for reference.

Every default can be overridden at process start via environment variables,
without modifying call sites.  Explicit constructor arguments always take
precedence over environment variables.

+---------------------------------------+---------------------------+
| Environment variable                  | Parameter                 |
+=======================================+===========================+
| ``SAFETAR_MAX_FILE_SIZE``             | ``max_file_size``         |
+---------------------------------------+---------------------------+
| ``SAFETAR_MAX_TOTAL_SIZE``            | ``max_total_size``        |
+---------------------------------------+---------------------------+
| ``SAFETAR_MAX_FILES``                 | ``max_files``             |
+---------------------------------------+---------------------------+
| ``SAFETAR_MAX_RATIO``                 | ``max_ratio``             |
+---------------------------------------+---------------------------+
| ``SAFETAR_MAX_NESTING_DEPTH``         | ``max_nesting_depth``     |
+---------------------------------------+---------------------------+
| ``SAFETAR_RECURSIVE``                 | ``recursive``             |
+---------------------------------------+---------------------------+
| ``SAFETAR_SYMLINK_POLICY``            | ``symlink_policy``        |
+---------------------------------------+---------------------------+
| ``SAFETAR_HARDLINK_POLICY``           | ``hardlink_policy``       |
+---------------------------------------+---------------------------+
| ``SAFETAR_SPARSE_POLICY``             | ``sparse_policy``         |
+---------------------------------------+---------------------------+
| ``SAFETAR_STRIP_SPECIAL_BITS``        | ``strip_special_bits``    |
+---------------------------------------+---------------------------+
| ``SAFETAR_PRESERVE_OWNERSHIP``        | ``preserve_ownership``    |
+---------------------------------------+---------------------------+
| ``SAFETAR_CLAMP_TIMESTAMPS``          | ``clamp_timestamps``      |
+---------------------------------------+---------------------------+

Integer and float variables accept standard numeric strings.  Boolean
variables accept ``1`` / ``true`` / ``yes`` / ``on`` (truthy) or
``0`` / ``false`` / ``no`` / ``off`` (falsy), case-insensitively.
Policy variables accept the lower-case enum value names (e.g.
``SAFETAR_SYMLINK_POLICY=resolve_internal``).  Unrecognised or unparseable
values are silently ignored and the built-in default is used instead.
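
The boolean rules above can be sketched as follows. This illustrates the
documented behaviour only and is not safetar's actual parser;
``parse_bool_env`` is a hypothetical helper.

.. code-block:: python

    import os

    _TRUTHY = {"1", "true", "yes", "on"}
    _FALSY = {"0", "false", "no", "off"}


    def parse_bool_env(name: str, default: bool) -> bool:
        """Read a boolean env var; use default if unset or unrecognised."""
        raw = os.environ.get(name)
        if raw is None:
            return default
        value = raw.lower()  # matching is case-insensitive
        if value in _TRUTHY:
            return True
        if value in _FALSY:
            return False
        return default  # unrecognised values are ignored


    os.environ["SAFETAR_RECURSIVE"] = "Yes"
    print(parse_bool_env("SAFETAR_RECURSIVE", default=False))

Falling back to the default on bad input keeps a typo in the environment
from raising an error, although it also means the typo goes unnoticed.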

CLI
===

``safetar`` ships with a CLI for quick extraction:

.. code-block:: sh

    # Extract an archive
    safetar extract path/to/archive.tar.gz /var/files/extracted/

    # List archive contents
    safetar list path/to/archive.tar.gz

    # Extract with custom limits
    safetar extract archive.tar /output/ \
        --max-file-size 104857600 \
        --max-total-size 524288000 \
        --max-files 1000

    # Enable recursive extraction
    safetar extract archive.tar /output/ --recursive

    # Show help
    safetar --help

The CLI supports the same security options as the Python API.

Testing
=======

All tests run inside Docker to prevent accidental pollution of the host system:

.. code-block:: sh

    make test

To test a specific Python version:

.. code-block:: sh

    make test-env ENV=py312

Writing documentation
=====================

Keep the following hierarchy:

.. code-block:: text

    =====
    title
    =====

    header
    ======

    sub-header
    ----------

    sub-sub-header
    ~~~~~~~~~~~~~~

    sub-sub-sub-header
    ^^^^^^^^^^^^^^^^^^

    sub-sub-sub-sub-header
    ++++++++++++++++++++++

    sub-sub-sub-sub-sub-header
    **************************

License
=======

MIT

Support
=======
For security issues, contact me at the e-mail address given in the
`Author`_ section.

For all other issues, go
to `GitHub <https://github.com/barseghyanartur/safetar/issues>`_.

Author
======

Artur Barseghyan <artur.barseghyan@gmail.com>
