Metadata-Version: 2.4
Name: nested-pandas
Version: 0.6.8
Summary: An extension of pandas for efficient representation of nested associated datasets.
Author-email: LINCC Frameworks <brantd@uw.edu>
License-Expression: MIT
Project-URL: Source Code, https://github.com/lincc-frameworks/nested-pandas
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=2
Requires-Dist: pandas<2.4,>=2.2.3
Requires-Dist: pyarrow>=16
Requires-Dist: Deprecated>=1.2.0
Requires-Dist: wrapt>=1.12.1
Requires-Dist: fsspec!=2025.12,!=2026.1,>=2025.7.0
Requires-Dist: universal_pathlib>=0.3.1
Provides-Extra: dev
Requires-Dist: asv[virtualenv]==0.6.5; extra == "dev"
Requires-Dist: jupyter; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: aiohttp; extra == "dev"
Requires-Dist: requests; extra == "dev"
Requires-Dist: s3fs; extra == "dev"
Requires-Dist: types-Deprecated; extra == "dev"
Dynamic: license-file

# nested-pandas

[![Template](https://img.shields.io/badge/Template-LINCC%20Frameworks%20Python%20Project%20Template-brightgreen)](https://lincc-ppt.readthedocs.io/en/latest/)

[![PyPI](https://img.shields.io/pypi/v/nested-pandas?color=blue&logo=pypi&logoColor=white)](https://pypi.org/project/nested-pandas/)
[![Conda](https://img.shields.io/conda/vn/conda-forge/nested-pandas.svg?color=blue&logo=condaforge&logoColor=white)](https://anaconda.org/conda-forge/nested-pandas)

[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/lincc-frameworks/nested-pandas/smoke-test.yml)](https://github.com/lincc-frameworks/nested-pandas/actions/workflows/smoke-test.yml)
[![codecov](https://codecov.io/gh/lincc-frameworks/nested-pandas/branch/main/graph/badge.svg)](https://codecov.io/gh/lincc-frameworks/nested-pandas)
[![Read the Docs](https://img.shields.io/readthedocs/nested-pandas)](https://nested-pandas.readthedocs.io/)
[![benchmarks](https://img.shields.io/github/actions/workflow/status/lincc-frameworks/nested-pandas/asv-main.yml?label=benchmarks)](https://lincc-frameworks.github.io/nested-pandas/)

An extension of pandas for efficient representation of nested
associated datasets.

Nested-Pandas extends the [pandas](https://pandas.pydata.org/) package with
tooling and support for nested dataframes packed into values of top-level
dataframe columns. [PyArrow](https://arrow.apache.org/docs/python/index.html)
is used internally to aid in scalability and performance.

Nested-Pandas allows data like this:

<p align="left">
    <img src="https://github.com/lincc-frameworks/nested-pandas/raw/refs/heads/main/docs/intro_images/pandas_dfs.png" alt="pandas dataframes" width="400"/>
</p>

To instead be represented like this:

<p align="left">
    <img src="https://github.com/lincc-frameworks/nested-pandas/raw/refs/heads/main/docs/intro_images/nestedframe_example.png" alt="nestedframe" width="300"/>
</p>

Here, the nested data is represented as nested dataframes:

```python
   # Each row of "object_nf" now has its own sub-dataframe of matched rows from "source_df"
   object_nf.loc[0]["nested_sources"]
```

<p align="left">
    <img src="https://github.com/lincc-frameworks/nested-pandas/raw/refs/heads/main/docs/intro_images/loc_into_nested.png" alt="sub-dataframe" width="225"/>
</p>

This allows powerful and straightforward operations, like:

```python
   # Compute the mean flux for each row of "object_nf"
   import numpy as np

   def mean_flux(row):
       """Calculates the mean flux for each object"""
       return np.mean(row["nested_sources.flux"])

   object_nf.map_rows(mean_flux, output_names="mean_flux")
```

<p align="left">
    <img src="https://github.com/lincc-frameworks/nested-pandas/raw/refs/heads/main/docs/intro_images/reduce.png" alt="using reduce" width="150"/>
</p>

Nested-Pandas is motivated by time-domain astronomy use cases, which typically
involve two levels of information: information about astronomical objects, and
an associated set of `N` measurements of each object. Nested-Pandas offers
a performant and memory-efficient package for working with these types of datasets.

Its core advantages are:
* hierarchical column access
* efficient packing of nested information into inputs to custom user functions
* avoiding costly groupby operations
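
For contrast, the flat workflow that the last point refers to might look like the sketch below. It uses plain pandas only, not the Nested-Pandas API, and the toy tables, the `object_id` key, and the column values are hypothetical stand-ins for the `object_df` and `source_df` tables pictured above.

```python
import pandas as pd

# Hypothetical toy data: two objects and five flux measurements.
object_df = pd.DataFrame(
    {"ra": [10.0, 20.0]},
    index=pd.Index([0, 1], name="object_id"),
)
source_df = pd.DataFrame(
    {
        "object_id": [0, 0, 0, 1, 1],
        "flux": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)

# Flat workflow: group the sources by object, reduce each group,
# then join the per-object result back onto the object table.
mean_flux = source_df.groupby("object_id")["flux"].mean().rename("mean_flux")
result = object_df.join(mean_flux)
print(result)
```

With the nested representation, the measurements are already packed per object, so a per-row function such as `mean_flux` above can be applied without the separate groupby and join steps.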



This is a LINCC Frameworks project - find more information about LINCC Frameworks [here](https://lsstdiscoveryalliance.org/programs/lincc-frameworks/).



## Acknowledgements

This project is supported by Schmidt Sciences.
