{% extends base %} {% block title %}pidge web ui{% endblock %} {% block preamble %} {% endblock %} {% block postamble %} {% endblock %} {% block contents %}

pidge UI

{{ embed(roots.DASHBOARD) }}

What does the UI show?

This is the pidge UI with example data. It takes the table found under Input Data and maps it to the table found under Mapped Data. In this example the column recipient is mapped to the column expense_category. A few mappings are already part of the example data and new mappings can be added via the UI. For details regarding the mapping logic see here.

What is pidge?

pidge is an open-source Python package that helps with the creation of mappings for tabular string data. The main goal is to leverage a UI to make it easy to create mappings and to evaluate how much of the data is already mapped. This should speed up workflows that otherwise involve updating a mapping, re-applying the updated mapping and re-evaluating the result.
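To illustrate the manual loop that the UI is meant to shorten, here is a minimal sketch in plain Python. It does not use the pidge API. The rule layout (category to list of patterns), the case-insensitive regular expression matching, the example rows and all function names are assumptions made for illustration only, and the actual mapping logic of pidge may differ.

```python
import re

# Hypothetical rules: category -> list of patterns (assumed to be regular expressions)
rules = {
    "fast_food": ["burger", "pizza"],
    "supermarket": ["market"],
}

# Made-up rows standing in for a table with a "recipient" column
rows = [
    {"recipient": "Burger Place 12"},
    {"recipient": "City Market North"},
    {"recipient": "Unknown Ltd"},
]


def categorize(value, rules):
    """Return the first category with a matching pattern, or None if nothing matches."""
    for category, patterns in rules.items():
        if any(re.search(p, value, flags=re.IGNORECASE) for p in patterns):
            return category
    return None


def apply_rules(rows, rules, source="recipient", target="expense_category"):
    """Return new rows with the target column filled from the source column."""
    return [{**row, target: categorize(row[source], rules)} for row in rows]


def coverage(mapped, target="expense_category"):
    """Share of rows that received a category."""
    return sum(row[target] is not None for row in mapped) / len(mapped)


# Step 1: apply the current mapping and evaluate it
mapped = apply_rules(rows, rules)
print(f"coverage before update: {coverage(mapped):.2f}")

# Step 2: update the mapping, re-apply it and re-evaluate the result
rules.setdefault("other", []).append("unknown")
mapped = apply_rules(rows, rules)
print(f"coverage after update: {coverage(mapped):.2f}")
```

Every new pattern requires re-running both steps by hand, which is the repetition an interactive UI can remove.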

Use cases for pidge are primarily data cleaning and categorization. It consists of the following two parts:

  1. An interactive UI to help with the creation of mappings and assessing their completeness
  2. Library functionality to apply mappings inside of data pipelines, after they have been exported from the UI

This website embeds the UI component of pidge together with information about the project. The UI can of course also be run locally when using the project, but the hope is that this website illustrates pidge well enough to reach users for whom it might be useful. Typical usage would involve running the UI inside a Jupyter notebook while performing EDA, data cleaning or labelling.

Help with the MVP

The project itself is still very young and is currently in an MVP phase. In particular, this means that any feedback from potentially interested users is very much appreciated. The goal is to create a tool that is simple, yet useful to data scientists, data analysts and similar professions. If you have any feedback, please leave an issue on GitHub. Feedback on the following aspects will be particularly helpful:

  • Bugs
  • Feature requests
  • Feedback on the UI
  • Information about similar projects
  • Feedback on documentation

Example Data

When the UI is used locally it does not start up with any data. To give interested persons a quicker feeling for what the tool is capable of, this hosted version already loads with test data and a few test rules. The data is fictitious, but inspired by expenses that I had to clean as a real task. There are mainly two types of recipients: shops and fast food chains. Mapping goals could be to label all the fast food, or to label all the supermarkets. Another possible goal is to distinguish the supermarkets of a particular chain, which have different recipient fields depending on the branch.

UI Usage

The UI can be used as follows:

  • Add a category and a pattern into the input field and click on insert rule to add the rule
  • If the category already exists the pattern is appended, i.e. it is used in addition to the already existing patterns.
  • In the config tab the source and target column can be adjusted
  • In the config tab other data can be loaded in the form of a CSV file, as long as it can be parsed with standard pandas.read_csv settings
  • Reset rules deletes all currently stored rules
  • The download button can be used to download the mapping JSON file for all the rules that were already added.
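
Before loading your own data in the config tab, it can help to check locally that the file parses with default pandas.read_csv settings. The following is a small sketch of such a check. The CSV content and the column names are made up, and an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a local file
csv_text = (
    "recipient,amount\n"
    "Burger Place 12,8.50\n"
    "City Market North,23.10\n"
)

# No extra arguments: comma separator, first row used as the header
df = pd.read_csv(io.StringIO(csv_text))

# The column intended as the source column should be present after parsing
source_column = "recipient"
if source_column not in df.columns:
    raise ValueError(f"column {source_column!r} not found in {list(df.columns)}")

print(df.head())
```

If the file only parses with extra arguments such as a different separator or encoding, one option is to re-save it with df.to_csv(path, index=False) so that the defaults work.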
{% endblock %}