Metadata-Version: 2.4
Name: pytest-data-loader
Version: 0.8.0
Summary: Pytest plugin for loading test data for data-driven testing (DDT)
Author: Yugo Kato
Project-URL: Homepage, https://github.com/yugokato/pytest-data-loader
Keywords: pytest,data-driven testing,DDT
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pytest<10,>=7.0.0
Requires-Dist: pluggy>=1.2.0
Requires-Dist: typing_extensions>=4.1.0; python_version < "3.11"
Provides-Extra: lint
Requires-Dist: mypy<2,>=1.15.0; extra == "lint"
Requires-Dist: pre-commit<5,>=3.0.0; extra == "lint"
Requires-Dist: ruff<0.16.0,>=0.15.0; extra == "lint"
Provides-Extra: test
Requires-Dist: pytest-mock<4,>=3.0.0; extra == "test"
Requires-Dist: pytest-smoke; extra == "test"
Requires-Dist: pytest-xdist[psutil]<4,>=2.3.0; extra == "test"
Requires-Dist: pandas>=2.0.0; extra == "test"
Requires-Dist: pypdf>=6.1.0; extra == "test"
Requires-Dist: pyyaml>=6.0.0; extra == "test"
Requires-Dist: tomli>=2.3.0; python_version < "3.11" and extra == "test"
Provides-Extra: dev
Requires-Dist: tox<5,>=4.0.0; extra == "dev"
Requires-Dist: tox-uv<2,>=1.0.0; extra == "dev"
Requires-Dist: pytest-data-loader[lint,test]; extra == "dev"

pytest-data-loader
======================

[![PyPI](https://img.shields.io/pypi/v/pytest-data-loader)](https://pypi.org/project/pytest-data-loader/)
[![Supported Python versions](https://img.shields.io/pypi/pyversions/pytest-data-loader.svg)](https://pypi.org/project/pytest-data-loader/)
[![test](https://github.com/yugokato/pytest-data-loader/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/yugokato/pytest-data-loader/actions/workflows/test.yml?query=branch%3Amain)
[![Code style ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://docs.astral.sh/ruff/)

`pytest-data-loader` is a `pytest` plugin that simplifies data-driven testing. It lets you load, transform, and 
parametrize test data directly from files and directories using simple decorators.



## Installation

```bash
pip install pytest-data-loader
```



## Quick Start

Load test data from a file and inject it directly into your test function.

```python
from pytest_data_loader import load


@load("data", "example.json")
def test_example(data):
    """
    example.json: '{"foo": 1, "bar": 2}'
    """
    assert "foo" in data
```



## Usage

The plugin provides three data loaders — `@load`, `@parametrize`, and `@parametrize_dir` — available as decorators for 
loading test data.

- `@load`: Loads file content into a test
- `@parametrize`: Loads a file and parametrizes a test by splitting its content
- `@parametrize_dir`: Loads files from a directory and parametrizes a test for each file

Each data loader requires two positional arguments:
- `fixture_names`: Names of the fixtures injected into the test function
  - Single name: Injects the file data
  - Two names: Injects both the resolved file path and the file data
- `path`: An absolute path or a path relative to a data directory
  - When a relative path is given, the plugin searches upward from the test file toward the pytest root to find the 
nearest data directory named `data` containing the target file or directory
  - For `@parametrize` and `@parametrize_dir`, this can be a list of paths to aggregate data from multiple sources

> [!TIP]
> - The default data directory name can be customized using an INI option. See the [INI Options](#ini-options) section for details
> - Each data loader supports different optional keyword arguments to customize how the data is loaded. See the 
> [Data Loading Pipeline](#data-loading-pipeline) and [Loader Options](#loader-options) sections for details
> - Data loaders can be stacked on a test function. See the [Stacking Data Loaders](#stacking-data-loaders) section for details



## Examples

Given you have the following project structure:
```
.(pytest rootdir)
├── data/               # outer data directory
│   ├── data1.json
│   ├── data2.txt
│   └── images/
│       ├── image.gif
│       ├── image.jpg
│       └── image.png
├── tests1/
│   └── test_something.py
└── tests2/
    ├── data/           # inner data directory
    │   ├── data1.txt
    │   ├── data2.txt
    │   └── logos/
    │       ├── logo.jpg
    │       └── logo.png
    └── test_something_else.py
```

### 1. Load file data — `@load`
`@load` is a file loader that loads the file content and passes it to the test function.

```python
# test_something.py

from pytest_data_loader import load


@load("data", "data1.json")
def test_something1(data):
    """
    data1.json: '{"foo": 1, "bar": 2}'
    """
    assert data == {"foo": 1, "bar": 2}


@load(("file_path", "data"), "data2.txt")
def test_something2(file_path, data):
    """
    data2.txt: "line1\nline2\nline3"
    """
    assert file_path.name == "data2.txt"
    assert data == "line1\nline2\nline3"
```

```shell
$ pytest tests1/test_something.py -v
================================ test session starts =================================
<snip>
collected 2 items                                                                              

tests1/test_something.py::test_something1[data1.json] PASSED                    [ 50%]
tests1/test_something.py::test_something2[data2.txt] PASSED                     [100%]

================================= 2 passed in 0.01s ==================================
```

> [!NOTE]
> If both `./tests1/test_something.py` and `./tests2/test_something_else.py` use the same loader definitions shown 
> above, the first test function loads `./data/data1.json` in both test files, because neither test file has a closer 
> `data` directory containing `data1.json`. The second test function loads `data2.txt` from each test file's 
> **nearest** `data` directory: `./data/data2.txt` for `tests1` and `./tests2/data/data2.txt` for `tests2`.  
> This behavior applies to all loaders.
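
The lookup described in the note can be approximated with a few lines of `pathlib`. This is only a sketch of the documented behavior, not the plugin's implementation; the function name `find_in_nearest_data_dir` and its parameters are hypothetical.

```python
import tempfile
from pathlib import Path


def find_in_nearest_data_dir(test_file: Path, relative_path: str, root_dir: Path, dir_name: str = "data") -> Path:
    """Walk upward from the test file's directory and return the first matching path."""
    current = test_file.parent
    while True:
        candidate = current / dir_name / relative_path
        if candidate.exists():
            return candidate
        # Stop at the given root (or at the filesystem root as a safety net).
        if current == root_dir or current == current.parent:
            raise FileNotFoundError(f"{relative_path!r} not found in any {dir_name!r} directory")
        current = current.parent


# A trimmed copy of the example project layout, rebuilt in a temporary directory.
layout = [
    "data/data1.json",
    "data/data2.txt",
    "tests1/test_something.py",
    "tests2/data/data2.txt",
    "tests2/test_something_else.py",
]
resolved = {}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    for name in layout:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    for test_file in ("tests1/test_something.py", "tests2/test_something_else.py"):
        for target in ("data1.json", "data2.txt"):
            found = find_in_nearest_data_dir(root / test_file, target, root)
            resolved[(test_file, target)] = found.relative_to(root).as_posix()
```

In this sketch, only `data2.txt` requested from `tests2` resolves to the inner `tests2/data` directory; every other lookup falls through to the outer `data` directory.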


### 2. Parametrize file data — `@parametrize`
`@parametrize` is a file loader that dynamically parametrizes the decorated test function by splitting the loaded file
content into logical parts. Each part is passed to the test function as a separate parameter.

```python
# test_something.py

from pytest_data_loader import parametrize


@parametrize("data", "data1.json")
def test_something1(data):
    """
    data1.json: '{"foo": 1, "bar": 2}'
    """
    assert data in [("foo", 1), ("bar", 2)]


@parametrize(("file_path", "data"), "data2.txt")
def test_something2(file_path, data):
    """
    data2.txt: "line1\nline2\nline3"
    """
    assert file_path.name == "data2.txt"
    assert data in ["line1", "line2", "line3"]
```

```shell
$ pytest tests1/test_something.py -v
================================ test session starts =================================
<snip>
collected 5 items                                                                              

tests1/test_something.py::test_something1[data1.json:part1] PASSED              [ 20%]
tests1/test_something.py::test_something1[data1.json:part2] PASSED              [ 40%]
tests1/test_something.py::test_something2[data2.txt:part1] PASSED               [ 60%]
tests1/test_something.py::test_something2[data2.txt:part2] PASSED               [ 80%]
tests1/test_something.py::test_something2[data2.txt:part3] PASSED               [100%]

================================= 5 passed in 0.01s ==================================
```

> [!TIP]
> - You can apply your own logic by specifying the `parametrizer_func` loader option
> - By default, the plugin will apply the following logic for splitting file content:
>   - Text file: Each line in the file
>   - JSON file:
>     - object: Each key–value pair in the object
>     - array: Each item in the array
>     - other types (string, number, boolean, null): The whole content as single data
>   - JSONL file: Each line (parsed as JSON)
>   - Binary file: Unsupported by default. You must provide custom split logic via the `parametrizer_func` loader option

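
As an illustration of custom split logic, the functions below could serve as a `parametrizer_func`. Both names (`split_paragraphs`, `split_chunks`) are hypothetical helpers written for this sketch; they assume the one-argument form, where the function receives the loaded data.

```python
def split_paragraphs(data: str) -> list[str]:
    """Split text into blank-line-separated blocks instead of single lines."""
    return [block.strip() for block in data.split("\n\n") if block.strip()]


def split_chunks(data: bytes, size: int = 4) -> list[bytes]:
    """Split binary content into fixed-size chunks (the last one may be shorter)."""
    return [data[i : i + size] for i in range(0, len(data), size)]


paragraphs = split_paragraphs("line1\nline2\n\nline3\n\n\nline4")
chunks = split_chunks(b"0123456789")
```

A function like `split_paragraphs` would be passed as `parametrizer_func=split_paragraphs`; for a binary file, `mode="rb"` would presumably be needed alongside something like `split_chunks`.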

#### Parametrize from multiple files

You can pass a list of file paths to `@parametrize` to load and concatenate data from multiple files into a single
parameter list:

```python
# test_something_else.py

from pytest_data_loader import parametrize


@parametrize("data", ["data1.txt", "data2.txt"])
def test_something(data):
    """
    data1.txt: "line1\nline2"
    data2.txt: "line3\nline4"
    """
    assert data in ["line1", "line2", "line3", "line4"]
```

```shell
$ pytest tests2/test_something_else.py -v
================================ test session starts =================================
<snip>
collected 4 items

tests2/test_something_else.py::test_something[data1.txt:part1] PASSED           [ 25%]
tests2/test_something_else.py::test_something[data1.txt:part2] PASSED           [ 50%]
tests2/test_something_else.py::test_something[data2.txt:part1] PASSED           [ 75%]
tests2/test_something_else.py::test_something[data2.txt:part2] PASSED           [100%]

================================= 4 passed in 0.01s ==================================
```


### 3. Parametrize files in a directory — `@parametrize_dir`

`@parametrize_dir` is a directory loader that dynamically parametrizes the decorated test function with the contents
of files in the specified directory. Each file's content is passed to the test function as a separate parameter.

```python
# test_something.py

from pytest_data_loader import parametrize_dir


@parametrize_dir("data", "images")
def test_something(data):
    """
    images dir: contains 3 image files
    """
    assert isinstance(data, bytes)
```

```shell
$ pytest tests1/test_something.py -v
================================ test session starts =================================
<snip>
collected 3 items                                                                              

tests1/test_something.py::test_something[images/image.gif] PASSED               [ 33%]
tests1/test_something.py::test_something[images/image.jpg] PASSED               [ 66%]
tests1/test_something.py::test_something[images/image.png] PASSED               [100%]

================================= 3 passed in 0.01s ==================================
```

> [!NOTE]
> - File names starting with a dot (.) are considered hidden files regardless of your platform.
> These files are automatically excluded from the parametrization.
> - Specify `recursive=True` to include files in subdirectories
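
The collection rules in the note can be pictured with plain `pathlib`. This is a sketch under assumptions, not the plugin's code: `collect_files` is a hypothetical helper, it only checks the file name for a leading dot, and the sorted ordering is a choice made here.

```python
import tempfile
from pathlib import Path


def collect_files(directory: Path, recursive: bool = False) -> list[Path]:
    """Return visible files in a directory, optionally descending into subdirectories."""
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in directory.glob(pattern) if p.is_file() and not p.name.startswith("."))


with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    # "thumbs" is a made-up subdirectory used to show the effect of recursive=True.
    for name in ("image.png", ".DS_Store", "thumbs/thumb.png"):
        path = images / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    top_level = [p.relative_to(images).as_posix() for p in collect_files(images)]
    everything = [p.relative_to(images).as_posix() for p in collect_files(images, recursive=True)]
```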


#### Parametrize files from multiple directories

You can pass a list of directory paths to `@parametrize_dir` to collect and concatenate files from multiple
directories into a single parameter list:

```python
# test_something_else.py

from pytest_data_loader import parametrize_dir


@parametrize_dir("data", ["images", "logos"])
def test_something(data):
    """
    images dir: contains 3 image files
    logos dir: contains 2 logo files
    """
    assert isinstance(data, bytes)
```

```shell
$ pytest tests2/test_something_else.py -v
================================ test session starts =================================
<snip>
collected 5 items

tests2/test_something_else.py::test_something[images/image.gif] PASSED          [ 20%]
tests2/test_something_else.py::test_something[images/image.jpg] PASSED          [ 40%]
tests2/test_something_else.py::test_something[images/image.png] PASSED          [ 60%]
tests2/test_something_else.py::test_something[logos/logo.jpg] PASSED            [ 80%]
tests2/test_something_else.py::test_something[logos/logo.png] PASSED            [100%]

================================= 5 passed in 0.01s ==================================
```



## Stacking Data Loaders

All three data loaders — `@load`, `@parametrize`, and `@parametrize_dir` — can be stacked on a single test function. 
This allows you to declaratively compose complex, data-driven test scenarios while keeping test logic fully decoupled 
from data.

### Examples:

#### 1. Load multiple datasets
Stack multiple `@load` decorators to inject independent datasets into a single test.

```python
from pytest_data_loader import load


@load("input_data", "input.json")
@load("expected_output", "expected.json")
def test_transformation_matches_expected_output(input_data, expected_output):
    """Verify that transforming input data produces the expected output."""
    assert do_something(input_data) == expected_output
```

#### 2. Generate a Cartesian product of test cases
Stack multiple `@parametrize` to automatically test all combinations.

```python
from pytest_data_loader import parametrize


@parametrize("user", "users.txt")
@parametrize("feature", "features.txt")
def test_user_feature_access_matrix(user, feature):
    """Validate access control for every user-feature combination."""
    assert can_access(user, feature)
```

#### 3. Combine shared context with parametrized inputs
Stack `@load` with `@parametrize` to test variable inputs with shared context.

```python
from pytest_data_loader import load, parametrize


@load("prices", "prices.json")
@parametrize("order", "orders.json")
def test_order_total_matches_expected(prices, order):
    """Validate that each order total is calculated correctly using the shared price catalog."""
    total = calculate_total(order, prices)
    assert total == order["expected_total"]
```

#### 4. Combine shared context with directory-based test scenarios
Stack `@load` with `@parametrize_dir` to test structured test cases with shared context.

```python
from pytest_data_loader import load, parametrize_dir


@load("banned_words", "banned_words.txt")
@parametrize_dir("comment", "user_comments/flagged")    # Each comment data is stored as a .txt file
def test_flagged_comments_contain_banned_words(banned_words, comment):
    """Validate that flagged comments contain at least one banned word."""
    # banned_words.txt is loaded as a single string, so split it into one word per line
    assert any(word in comment.lower() for word in banned_words.splitlines())
```

> [!NOTE]
> - Fixture names must be unique across all stacked loaders on a test function
> - Stacking multiple `@parametrize` and/or `@parametrize_dir` decorators generates a Cartesian product of N × M test 
> cases (same behavior as `pytest.mark.parametrize`)
> - Files are loaded once per test function and cached across parametrized test cases
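
To make the N × M growth concrete, the standard library's `itertools.product` builds the same kind of combination set. The user and feature values below are made up for illustration, and the order in which pytest generates the cases may differ from the order shown here.

```python
from itertools import product

# Hypothetical contents of users.txt (3 lines) and features.txt (2 lines).
users = ["alice", "bob", "carol"]
features = ["read", "write"]

# Every user is paired with every feature: 3 x 2 = 6 test cases.
cases = list(product(users, features))
```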

> [!TIP]
> When stacking data loaders, test IDs generated with the default parameter IDs may become less readable. Consider 
> explicitly specifying parameter IDs using the `id` option (`@load`) or the `id_func` option (`@parametrize`/`@parametrize_dir`).




## Lazy Loading

Lazy loading is enabled by default for all data loaders to improve efficiency, especially with large datasets. During 
test collection, pytest receives a lazy object as a test parameter instead of the actual data. The data is resolved 
only when it is needed during test setup.  
If you need to disable this behavior for a specific test, pass `lazy_loading=False` to the data loader.

> [!NOTE]
> Lazy loading for the `@parametrize` loader works slightly differently from the other loaders. Since pytest needs to 
> know the total number of parameters in advance, the plugin still has to inspect the file data and split it once 
> during the test collection phase. Once that is done, the split data is not kept as parameter values; it is loaded 
> lazily later.
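
The idea of a lazy parameter can be sketched as a small placeholder object that remembers where the data lives and reads it only on request. `LazyData` and its `resolve()` method are hypothetical names for this illustration and do not reflect the plugin's internal classes.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


class LazyData:
    """Placeholder that defers reading a file until resolve() is called."""

    def __init__(self, path: Path, loader: Callable[[Path], Any]) -> None:
        self.path = path
        self._loader = loader
        self.load_count = 0

    def resolve(self) -> Any:
        self.load_count += 1
        return self._loader(self.path)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data2.txt"
    path.write_text("line1\nline2\nline3", encoding="utf-8")

    # "Collection": only the placeholder exists, nothing has been read yet.
    lazy = LazyData(path, lambda p: p.read_text(encoding="utf-8"))
    reads_at_collection = lazy.load_count

    # "Setup": the data is read when the test actually needs it.
    data = lazy.resolve()
    reads_after_setup = lazy.load_count
```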



## Data Loading Pipeline
Each data loader follows a simple pipeline where you can use loader options to hook into stages and filter or 
transform data before it reaches your test.

### @load
```text
file 
  → open                 # with read options
  → read and parse       # with file_reader()
  → transform            # with onload_func()
  → test(data)
```

### @parametrize
```text
file 
  → open                 # with read options 
  → read and parse       # with file_reader() 
  → transform            # with onload_func()
  → split                # with default or custom parametrizer_func()
    ↳ for each part:
      → filter           # with filter_func()
      → transform        # with process_func()
  → test(data₁, data₂, ...)
```

### @parametrize_dir
```text
directory 
  → collect files 
    ↳ for each file:
      → filter           # with filter_func()
      → open             # with read options
      → read and parse   # with file_reader_func()
      → transform        # with process_func()
  → test(file₁, file₂, ...)
```
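
The `@parametrize` stages can be read as ordinary function composition. The sketch below mirrors the order shown above using one-argument loader functions; `parametrize_pipeline` is a hypothetical stand-in written for this illustration, and the file name and JSON content are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable, Optional


def parametrize_pipeline(
    path: Path,
    *,
    file_reader: Callable[[Any], Any],
    parametrizer_func: Callable[[Any], Iterable[Any]],
    onload_func: Optional[Callable[[Any], Any]] = None,
    filter_func: Optional[Callable[[Any], bool]] = None,
    process_func: Optional[Callable[[Any], Any]] = None,
    **read_options: Any,
) -> list[Any]:
    with open(path, **read_options) as f:  # open with read options
        data = file_reader(f)  # read and parse
    if onload_func:
        data = onload_func(data)  # transform the whole payload
    params = []
    for part in parametrizer_func(data):  # split
        if filter_func and not filter_func(part):  # filter each part
            continue
        params.append(process_func(part) if process_func else part)  # transform each part
    return params


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "users.json"
    path.write_text(json.dumps({"users": [{"name": "a", "active": True}, {"name": "b", "active": False}]}))
    params = parametrize_pipeline(
        path,
        file_reader=json.load,
        onload_func=lambda d: d["users"],
        parametrizer_func=lambda users: users,
        filter_func=lambda u: u["active"],
        process_func=lambda u: u["name"],
        encoding="utf-8",
    )
```

Note that filtering runs before `process_func`, so the filter sees each part in its unprocessed shape.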



## File Reader

### Built-in defaults

By default, the plugin reads and parses file content on loading as follows:
- `.json` — Parsed with `json.load`
- `.jsonl` — Each line is parsed as JSON
- All other file types — Loaded as raw text or binary content

### Customizing defaults

The above default behavior can be customized by specifying any file reader that accepts a file-like object returned by 
`open()`. This includes built-in readers, third-party library readers, and your own custom readers. File read 
options (e.g., `mode`, `encoding`, etc.) can also be provided and will be passed to `open()`.

Below are some common examples of file readers you might use:

| File type | Examples                                          | Notes                                            |
|-----------|---------------------------------------------------|--------------------------------------------------|
| .csv      | `csv.reader`, `csv.DictReader`, `pandas.read_csv` | `pandas.read_csv` requires `pandas`              |
| .yml      | `yaml.safe_load`, `yaml.safe_load_all`            | Requires `PyYAML`                                |
| .xml      | `xml.etree.ElementTree.parse`                     |                                                  |
| .toml     | `tomllib.load`                                    | `tomli.load` for Python <3.11 (Requires `tomli`) |
| .ini      | `configparser.ConfigParser().read_file`           |                                                  |
| .pdf      | `pypdf.PdfReader`                                 | Requires `pypdf`                                 |

This can be done either as a `conftest.py`-level registration or as a test-level configuration. If both are present, 
the test-level configuration takes precedence over the `conftest.py`-level registration.
If multiple `conftest.py` files register a reader for the same file extension, the one closest to the current test
takes effect.

Here are some examples of loading a CSV file using the built-in CSV readers with file read options:

### 1. `conftest.py` level registration

Register a file reader using `pytest_data_loader.register_reader()`. It takes a file extension and a file reader as 
positional arguments, and file read options as keyword arguments.

```python
# conftest.py

import csv

import pytest_data_loader


pytest_data_loader.register_reader(".csv", csv.reader, newline="")
```

The registered file reader automatically applies to all tests located in the same directory and any of its subdirectories.

```python
# test_something.py

from pytest_data_loader import load


@load("data", "data.csv")
def test_something(data):
    """Load CSV file with registered file reader"""
    for row in data:
        assert isinstance(row, list)
```


### 2. Per-test configuration with loader options

Specify a file reader with the `file_reader` loader option. This applies only to the configured test, and overrides the 
one registered in `conftest.py`. 

```python
# test_something.py

import csv

from pytest_data_loader import load, parametrize


@load("data", "data.csv", file_reader=csv.reader, encoding="utf-8-sig", newline="")
def test_something1(data):
    """Load CSV file with csv.reader reader"""
    for row in data:
        assert isinstance(row, list)


@parametrize("data", "data.csv", file_reader=csv.DictReader, encoding="utf-8-sig", newline="")
def test_something2(data):
    """Parametrize CSV file data with csv.DictReader reader"""
    assert isinstance(data, dict)
```

> [!NOTE]
> If read options are specified without a `file_reader`, the plugin uses the `conftest.py`-registered reader (if any)
> with those options. If a `file_reader` is specified without read options, no read options are applied.

> [!TIP]
> - A file reader must take one argument (a file-like object returned by `open()`)
> - If you need to pass options to the file reader, wrap it in a `lambda` or a regular function,  
> e.g. `file_reader=lambda f: csv.reader(f, delimiter=";")`
> - You can adjust the final data the test function receives using loader functions. For example, 
> the following code will parametrize the test with the text data from each PDF page   
>  ```python
>  @parametrize(
>      "data", 
>      "test.pdf", 
>      file_reader=pypdf.PdfReader, 
>      parametrizer_func=lambda r: r.pages,
>      process_func=lambda p: p.extract_text().rstrip(),
>      mode="rb"
>  )
>  def test_something(data: str):
>      ...
>  ```
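
Regular functions work the same way as the `lambda` form and leave room for a few extra steps. The two readers below are hypothetical examples: each takes the file-like object and returns the parsed result. The INI wrapper exists because `ConfigParser.read_file()` returns `None` in the standard library, so a wrapper that returns the populated parser may be more convenient.

```python
import csv
import io
from configparser import ConfigParser
from typing import TextIO


def read_semicolon_csv(f: TextIO) -> list[list[str]]:
    """Read a semicolon-delimited CSV file and return all rows as lists."""
    return list(csv.reader(f, delimiter=";"))


def read_ini(f: TextIO) -> ConfigParser:
    """Parse an INI file and return the populated parser."""
    parser = ConfigParser()
    parser.read_file(f)
    return parser


# io.StringIO stands in for the file object that open() would return.
rows = read_semicolon_csv(io.StringIO("a;b\n1;2\n"))
config = read_ini(io.StringIO("[server]\nport = 8080\n"))
```

Either function could then be passed as `file_reader=read_semicolon_csv` or registered with `pytest_data_loader.register_reader(".ini", read_ini)`. Returning a list rather than the `csv.reader` object is a choice made in this sketch so that the rows do not depend on the file remaining open.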



## Loader Options

Each data loader supports different optional parameters you can use to change how your data is loaded.

### @load
- `lazy_loading`: Enable or disable lazy loading
- `file_reader`: A file reader the plugin should use to read the file data
- `onload_func`: A function to transform or preprocess loaded data before passing it to the test function
- `id`: The parameter ID for the loaded data. If not specified, the relative or absolute file path is used
- `**read_options`: File read options the plugin passes to `open()`. Supports only `mode`, `encoding`, `errors`, and 
`newline` options

> [!NOTE]
> `onload_func` must take either one (data) or two (file path, data) arguments. When `file_reader` is provided, the 
> data is the reader object itself.


### @parametrize
- `lazy_loading`: Enable or disable lazy loading
- `file_reader`: A file reader the plugin should use to read the file data
- `onload_func`: A function to adjust the shape of the loaded data before splitting into parts
- `parametrizer_func`: A function to customize how the loaded data should be split
- `filter_func`: A function to filter the split data parts. Only matching parts are included as test parameters
- `process_func`: A function to adjust the shape of each split data before passing it to the test function
- `marker_func`: A function to apply Pytest marks to matching part data
- `id_func`: A function to generate a parameter ID for each part data
- `**read_options`: File read options the plugin passes to `open()`. Supports only `mode`, `encoding`, `errors`, 
and `newline` options

> [!NOTE]
> Each loader function must take either one (data) or two (file path, data) arguments. When `file_reader` is provided, 
> the data is the reader object itself.
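
The one-argument and two-argument forms might look like the following. The function names and the order records are made up for this sketch, and the file path is assumed to arrive as a `pathlib.Path`, which is consistent with `file_path.name` in the earlier examples.

```python
from pathlib import Path


def has_expected_total(order: dict) -> bool:
    """One-argument form: receives only the part data."""
    return "expected_total" in order


def order_id(file_path: Path, order: dict) -> str:
    """Two-argument form: receives the file path and the part data."""
    return f"{file_path.stem}:{order['id']}"


# Simulate what the loader would do with two hypothetical parts of orders.json.
orders = [{"id": "A1", "expected_total": 30}, {"id": "B2"}]
kept = [order for order in orders if has_expected_total(order)]
ids = [order_id(Path("data/orders.json"), order) for order in kept]
```

These would be supplied as `filter_func=has_expected_total` and `id_func=order_id`.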


### @parametrize_dir
- `lazy_loading`: Enable or disable lazy loading
- `recursive`: Recursively load files from all subdirectories of the given directory. Defaults to `False`
- `file_reader_func`: A function that returns the file reader to use for matching file paths
- `filter_func`: A function to filter file paths. Only the contents of matching file paths are included as the test 
parameters
- `process_func`: A function to adjust the shape of each loaded file's data before passing it to the test function
- `marker_func`: A function to apply Pytest marks to matching file paths
- `id_func`: A function to generate a parameter ID from each file path
- `read_option_func`: A function that returns file read options (as a dict) for matching file paths. The returned dict 
may contain only `mode`, `encoding`, `errors`, and `newline` keys, which are passed to `open()`

> [!NOTE]
> - `process_func` must take either one (data) or two (file path, data) arguments
> - `file_reader_func`, `filter_func`, `marker_func`, `id_func`, and `read_option_func` must take only one argument (file path)
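
Path-based functions for `@parametrize_dir` could be written as below. All three helpers are hypothetical, and the file path is assumed to be a `pathlib.Path`. The dict returned by `read_options_for` uses only keys listed above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".gif", ".jpg", ".png"}


def only_images(file_path: Path) -> bool:
    """filter_func candidate: keep image files only."""
    return file_path.suffix.lower() in IMAGE_SUFFIXES


def short_id(file_path: Path) -> str:
    """id_func candidate: use the file name without its extension."""
    return file_path.stem


def read_options_for(file_path: Path) -> dict[str, str]:
    """read_option_func candidate: binary mode for images, UTF-8 text otherwise."""
    if file_path.suffix.lower() in IMAGE_SUFFIXES:
        return {"mode": "rb"}
    return {"mode": "r", "encoding": "utf-8"}


paths = [Path("images/image.png"), Path("images/notes.txt")]
selected = [p for p in paths if only_images(p)]
```

They would be passed as `filter_func=only_images`, `id_func=short_id`, and `read_option_func=read_options_for`.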



## INI Options

### `data_loader_dir_name`
The base directory name to load test data from. When a relative file or directory path is provided to a data loader, 
it is resolved relative to the nearest matching data directory in the directory tree.  
Plugin default: `data`

### `data_loader_root_dir`
Absolute or relative path to the project's root directory. By default, the search is limited to 
within pytest's rootdir, which may differ from the project's top-level directory. Setting this option allows data 
directories located outside pytest's rootdir to be found.  
Environment variables are supported using the `${VAR}` or `$VAR` (or `%VAR%` on Windows) syntax.  
Plugin default: Pytest rootdir (`config.rootpath`)
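
The `${VAR}` and `$VAR` forms behave like the standard library's `os.path.expandvars`, which can help when checking a value before putting it in the INI file. Whether the plugin uses this function internally is not stated, so treat this as an analogy only; `PROJECT_ROOT` and its value are made up.

```python
import os

# Hypothetical environment variable pointing at the project's top-level directory.
os.environ["PROJECT_ROOT"] = "/opt/project"

expanded_braces = os.path.expandvars("${PROJECT_ROOT}/shared")
expanded_plain = os.path.expandvars("$PROJECT_ROOT/shared")
```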

### `data_loader_strip_trailing_whitespace`
Automatically remove trailing whitespace characters when loading text data.  
Plugin default: `true`
