Metadata-Version: 2.3
Name: xleda
Version: 0.8.181
Summary: xleda is a Microsoft Excel powered EDA tool for Python data
Author: InfoDesigner
Author-email: InfoDesigner <101991676+InfoDesigner@users.noreply.github.com>
License: Apache-2.0
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Legal Industry
Classifier: Intended Audience :: Manufacturing
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Telecommunications Industry
Classifier: Natural Language :: English
Classifier: Operating System :: Microsoft :: Windows :: Windows 11
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python
Classifier: Programming Language :: Visual Basic
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Documentation
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Scientific/Engineering :: Visualization
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: ipywidgets>=7.7.1
Requires-Dist: matplotlib>=3.6.0
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.0
Requires-Dist: rich>=13.0.0
Requires-Dist: seaborn>=0.13.0
Requires-Dist: ipywidgets>=7.7.1
Requires-Dist: matplotlib>=3.6.0
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.0
Requires-Dist: rich>=13.0.0
Requires-Dist: seaborn>=0.13.0
Requires-Dist: xlwings>=0.35.2
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/InfoDesigner/xleda
Project-URL: Issues, https://github.com/InfoDesigner/xleda/issues
Description-Content-Type: text/markdown

[![License: MIT](https://img.shields.io/badge/license-Apache%20License%202.0-**blue**)](https://www.apache.org/licenses/LICENSE-2.0.txt)
[![PyPI - Version](https://img.shields.io/pypi/v/xleda)](https://pypi.org/project/xleda)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/xleda.svg)](https://pypi.org/project/xleda)
[![Downloads](https://static.pepy.tech/badge/xleda)](https://pepy.tech/project/xleda)
[![Buy Me A Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-ffdd00?&logo=buy-me-a-coffee&logoColor=black)](https://buymeacoffee.com/informationdesigner)


# **xleda is a Microsoft Excel powered EDA tool for Python data.**

* Produces a Microsoft Excel workbook from a pandas dataframe that is highly optimized to both perform and document [the activity of Exploratory Data Analysis](https://www.geeksforgeeks.org/data-analysis/what-is-exploratory-data-analysis/) .

* Visually explore your data, navigate with your keyboard, take field or record notes, create lists of fields/records for editing, round-trip your edits/analysis back into python, share your workbook with other contributors.

* There are some amazing EDA tools for Python.  You shouldn't have to start from scratch to include Microsoft Excel among them.

* xleda provides a good start to a robust EDA.  

* See [some example xleda workbooks](https://github.com/InfoDesigner/xleda/tree/main/examples).<br>

<br>
<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/top_view.webp?raw=true"  width="700" alt="Example Top View"> 
	<br>
	<em>An xleda workbook made with diamond data.</em>
</p>
<br><br>


## **Requirements/Compatibility**

* Requires Microsoft Excel to create the workbook. 

* It has been tested on Windows though it should also work on MacOS.  

* xleda workbooks should work in anything that reads Microsoft Excel workbooks.<br><br>


## **Quick Start**


```python
from xleda import FieldAnalysis
import seaborn as sns

# < your dataframe goes here >
df = sns.load_dataset("titanic")
 

# Configure xleda
xleda = FieldAnalysis(input_df=df,
                      name="Titanic")

# Create workbook
xleda.create_workbook()


```

<br><br>
## **Basic xleda Components**
#### **Field Metadata**

* Most of the field metadata comes from the built-in pandas features [describe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html), [info](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html), and **[quantile](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html)**<br><br>

<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/basic_metadata.webp?raw=true"  width="600" alt="Basic Metadata"> 
	<br>
	<em>Basic Metadata.</em>
</p>


#### **Overview**

* The `Overview` worksheet rotates the field metadata 90 degrees so that you can sort/filter fields by their name, metadata, or notes/definitions/etc. <br>

<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/overview.webp?raw=true"  width="600" alt="Composition Table"> 
	<br>
	<em>Sorting fields from MLB data by memory usage.</em>
</p>



#### **Per Field Charts**

Two charts are produced for each column in your dataframe.
1. A composition table showing the top 5 values per column and their percentages.
2. A histogram/KDE showing min/max, distribution, and mean

<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/default_charts.webp?raw=true"  width="500" alt="Default Charts"> 
	<br>
	<em>Default charts for MLB player height.</em>
</p>


#### **Source Data Table**

* A copy of your source data is included as an Excel table so you can visually inspect it, sort/filter it, etc. 

* Includes a way to make lists of individual records.  More on that below.

* Includes a way to round-trip your source data back into Python so that you can use Excel to replace values, delete records/columns/etc.  More on that below.


<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/source_data.webp?raw=true"  width="600" alt="Source Data"> 
	<br>
	<em>Source Data Table for MLB player data.</em>
</p>

#### **Blanks**

* A pivot table configured to drill down into where blank values are.
* The first 10 fields of the source data are added by default. 

<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/blanks.webp?raw=true"  width="600" alt="Blanks"> 
	<br>
	<em>Blanks with Titanic survivor data.</em>
</p>


#### **Pivot**

* A bare-bones pivot table ready to be configured.
* The first 10 fields of the source data are added by default.

<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/pivot.webp?raw=true"  width="600" alt="Pivot"> 
	<br>
	<em>Pivot using Titanic survivor data.</em>
</p>
<br>

## **xleda Configuration**

#### **input_df** | Mandatory

* A pandas dataframe of any size

#### **name** | Mandatory

* Name of the dataset.  Also used for the file name of the created workbook

#### **theme_color** | Optional

* Sets the primary color of the charts and the color of the headings in the workbook to a hex color of your choice.  `theme_color="random"` sets a random theme<br>
* Defaults to a neutral color <br>

```python
# Configure xleda
xleda = FieldAnalysis(input_df=df,
                      name=Theme Example Name,
                      theme_color="#031835")
```
<br>
<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/theme_colors.webp?raw=true" width="800" alt="Theme Colors">
	<br>
	<em>theme_color affects the workbooks and default charts.</em>
</p>
<br><br>

#### **add_plots** | Optional

* `add_plots={'plotname': Figure, ...}` will add additional worksheets with plots of your choosing. 

	* No styling/sizing of additional plots is performed.
	
	* The example below adds two additional plot worksheets, one from seaborn and another from missingno.  The workbook can be found [here](https://github.com/InfoDesigner/xleda/raw/refs/heads/main/examples/Penguins.xlsm).


```python
from xleda import FieldAnalysis

import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
  

# < your dataframe goes here >
df = penguins = sns.load_dataset("penguins")
  

# Style the additional plots | optional
plt.style.use("dark_background")


# Create additional plots
pair_plots = sns.pairplot(df, hue="species").figure
null_matrix = msno.matrix(df).get_figure()


# Resize the null matrix  | optional
null_matrix.set_size_inches(9.35, 4.5)


# Configures xleda
xleda = FieldAnalysis(input_df=df,
                      name="Penguins",
                      theme_color="#4C4C4C",
                      add_plots={'Pair Plots': pair_plots,
                                 'Null Matrix': null_matrix})

# Creates the workbook
xleda.create_workbook()

```


#### **wb_path** | Optional

* Sets the target folder for the xleda workbook. 
* Uses a pathlib Path object.
* Defaults to current working directory<br>
```python
from pathlib import Path

xleda = FieldAnalysis(input_df=df,
                      name="Penguins",
                      wb_path=Path(r"c:\my_target_folder"))

# Creates "Penguins.xlsm" in the "c:\my_target_folder" directory
xleda.create_workbook()

```
<br>

#### **overwrite** | Optional

* Whether to overwrite existing workbooks of the same name. 
* Defaults to False


#### **large_report** | Optional

* Raises the default dataframe size limits of 100,000 rows/100 columns to Excel's limits of 1,000,000 rows and 16,000 columns.  
* The closer your are to this limit, the more RAM and time you'll need to produce a workbook.
* See additional details in the performance section below.
* Defaults to False

#### **no_vba** | Optional

* Creates the workbook as an xlsx file without VBA. 
* Defaults to False

<br>

## **Usage Notes**

#### **Field/Record Lists**


* The `Field Lists` section helps you create lists of the fields in your data.  

	* Anything not marked as `False` will be included in each list.   

	* The `Record List` field added to your source data works the same way except it creates a list of records instead of a list of fields.  More on that below.

	* You can see your lists in the `Compiled Lists` section.  

	* You can rename `Record List` or any `Field List` to `Anything You Want` and the list will be renamed to `anything_you_want`.

	* The Compiled Lists section formats your lists as python lists for easy copy/pasting.

	* You can use `export_analysis()` to get your lists, and other things, back into Python.  See details below.

	* Adding/deleting rows or columns on the Field Analysis worksheet may offset the formulas which compile your lists.   Spot check them before using them if you have.<br>

<br>
<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/field_lists.webp?raw=true" width="606" alt="Field Lists">
	<br>
	<em>Easily create lists of fields in your data.</em>
</p>
<br><br>



#### **Record List Details**

* Two additional columns are added to your data to support being able to create a list of records for further processing.

	* `Record Hash`:  Uses a built-in pandas feature [hash_pandas_object](https://pandas.pydata.org/docs/reference/api/pandas.util.hash_pandas_object.html) to uniquely identify records.  If two records share all column values they also share a `Record Hash`. 

	* `Record List`:  Used to create a list of `Record Hash` values.<br><br>

#### **Exporting back into Python**

* At some point, you'll likely need to get some of your analysis back into Python.  `export_analysis()` exports your definitions, notes, lists, and more back into Python.<br>
<br>
<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/completed_field_analysis.webp?raw=true" width="606" alt="Completed Field Analysis">
	<br>
	<em>A completed xleda workbook of Titanic passenger showing definitions, notes, lists, etc.</em>
</p>
<br><br>

##### **Export Details**

* All exported data comes from the Field Analysis worksheet.

* The export dictionary includes:

	* `description`: Dataframe description if you've added one
	* `definitions`: Any field definitions you've added.
	* `notes`: Any field notes you've added
	* `lists`: Any lists showing in the compiled lists section
	* `source_data`: A copy of your unaltered source data that includes `Record Hash`/`Record List` columns.
	* `altered_source_data`: `Source Data Table` from the workbook that includes any manual edits you've made such as removing records, renaming fields, replacing values, etc. **

		 ** *Note that data types will likely change in the round-trip translation.* **<br>

##### **Example Export Code**
```python
from xleda import FieldAnalysis

# After editing your workbook


# If you created your workbook somewhere else, configure xleda before export
xleda = FieldAnalysis(input_df=df,
                      name="Titanic")

# Export notes, definitions, data, etc from an xleda workbook
xleda_dict = xleda.export_analysis()

```

<br>

##### **Example Export Dictionary**

<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/completed_analysis_export.webp?raw=true" width="500" alt="Export Dict">
	<br>
	<em>An example export from a completed field analysis on Titanic passenger data.</em>
</p>
<br><br>


##### **Exporting Cautions**


###### **Spot Check your Lists**

* Most of the few formulas in an xleda workbook are used to compile your lists. 

* If you've added/deleted rows/columns, double-check that your lists show correctly in the `Compiled Lists` section  before using them downstream.

###### **Round-Tripping Source Data Edits**

* `altered_source_data` will likely change the field data types vs what was in the original dataframe.

* The Source Data table shouldn't be renamed in Excel.  Any other edits such as deleting rows/columns, editing values, etc. are fine.


## **Performance**

xleda creates workbooks for most data sets less than 20 seconds.   
<br>
<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/create_example.webp?raw=true" width="400" alt="Create Example">
	<br>
	<em>The Penguin example from above that includes extra plots took only 7 seconds to create.</em>
</p>
<br><br>

#### **Limits with Large Data Sets**

* To ensure workbooks are created quickly, defaults limit data to the first 100 columns and a random sample of 100,000 records.  You'll see a warning banner if you hit a limit.

<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/warning.webp?raw=true" width="800" alt="Create Example">
</p><br>

* One of the more complex data sets tested was a 600 column/1,200 row dataframe.
  
	* It took ~5 minutes to create, in part because most values are unique for all 600 columns and xleda give you a top 5 members composition chart per column.

	* It is still snappy to use even though it has 1,200 charts on a single worksheet.  That example is [here](https://github.com/InfoDesigner/xleda/raw/refs/heads/main/examples/African%20Soil.xlsm).<br><br>



## **VBA Code**

* There is a small amount of VBA code in the template that makes the sections expand/collapse when you select them as pictured above.  

* If you can't or don't want to enable VBA, you can use row groupings to navigate as pictured below. 

* In addition, you may want to create the report with `no_vba=True` which creates an  xlsx file that contains no VBA.<br>

<br>
<p align="center">
	<img src="https://github.com/InfoDesigner/xleda/blob/main/assets/images/row_groupings.webp?raw=true" width="462" alt="Row Groupings">
	<br>
	<em>Use row groupings to navigate if you can't use VBA.</em>
</p>
<br><br>

## **Extensible**

* Because it's an ordinary workbook, you can use any tool that works with Microsoft Excel workbooks to do more.  [xlwings](https://www.xlwings.org/) is recommended if you need more. <br><br>



## **Troubleshooting**

* If your workbook appears with "Document Recovery", it's probably fine.  Neither a root cause or a consequence for ignoring this has been identified.
