dfdiff - A pandas.DataFrame differencing tool

Purpose:

This module provides DataFrame differencing logic.

The caller creates an instance, passing the two DataFrames to be compared as arguments. When the diff() method is called, a list of columns containing value mismatches is compiled. This list is then iterated, and each value in each mismatched column is compared. All value mismatches are reported to the terminal.
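The two-pass approach described above (first find the mismatched columns, then compare each value in those columns) can be sketched in plain pandas. This is a minimal illustration, not the utils4 implementation. The helper names find_mismatched_columns and report_mismatches are hypothetical. The sketch assumes both DataFrames have identical columns and index, and it treats NaN values in the same position as equal, which is an assumption about the desired behaviour.

```python
import pandas as pd


def find_mismatched_columns(df_source: pd.DataFrame, df_test: pd.DataFrame) -> list:
    """Pass 1: return the names of columns containing any value mismatch.

    Hypothetical helper; assumes both frames share the same columns and index.
    """
    mismatched = []
    for col in df_source.columns:
        src, tst = df_source[col], df_test[col]
        # NaN != NaN in pandas, so treat NaNs in the same position as equal.
        both_nan = src.isna() & tst.isna()
        if not ((src == tst) | both_nan).all():
            mismatched.append(col)
    return mismatched


def report_mismatches(df_source: pd.DataFrame, df_test: pd.DataFrame) -> list:
    """Pass 2: compare every value in each mismatched column and report it.

    Hypothetical helper; prints each mismatch and also returns them as
    (column, index, source_value, test_value) tuples. Assumes scalar cells.
    """
    mismatches = []
    for col in find_mismatched_columns(df_source, df_test):
        for idx in df_source.index:
            src_val = df_source.at[idx, col]
            tst_val = df_test.at[idx, col]
            # Skip positions where both values are missing.
            if pd.isna(src_val) and pd.isna(tst_val):
                continue
            if src_val != tst_val:
                print(f"Column {col!r}, row {idx!r}: "
                      f"source={src_val!r}, test={tst_val!r}")
                mismatches.append((col, idx, src_val, tst_val))
    return mismatches
```

Compiling the column list first means that columns which match entirely are checked with a single vectorised comparison, and only the mismatched columns are walked value by value.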

Platform:

Linux/Windows | Python 3.7+

Developer:

J Berendt

Email:

support@s3dev.uk

Note:

The current functionality does not check data types, unlike the pandas.DataFrame.equals() method. This functionality may be added in a future release.
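The difference between a value-only comparison and pandas.DataFrame.equals() can be seen with two frames holding the same numbers as different dtypes. The value comparison below uses the == operator and is only an illustration of value-based checking; it is not necessarily how dfdiff compares values internally.

```python
import pandas as pd

# Same values, different dtypes: int64 versus float64.
df_int = pd.DataFrame({"a": [10, 20]})
df_float = pd.DataFrame({"a": [10.0, 20.0]})

# DataFrame.equals() requires corresponding columns to share a dtype.
strict = df_int.equals(df_float)  # False

# A value-only comparison ignores the dtype difference (10 == 10.0).
values_only = bool((df_int == df_float).all().all())  # True

print(strict, values_only)
```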

Example:

Short example for differencing two DataFrames:

>>> from utils4 import dfdiff

>>> d = dfdiff.DataFrameDiff(df_source, df_test)
>>> d.diff()

class dfdiff.DataFrameDiff(df_source: pandas.DataFrame, df_test: pandas.DataFrame)

Test and report differences in two pandas DataFrames.

Parameters:
  • df_source (pd.DataFrame) – DataFrame containing source data. This dataset holds the expected results.

  • df_test (pd.DataFrame) – DataFrame containing the test data. This dataset is compared against the ‘expected’ dataset.

__init__(df_source: pandas.DataFrame, df_test: pandas.DataFrame)

DataFrame difference class initialiser.

diff()

Compare DataFrames and report the differences.