Metadata-Version: 2.4
Name: pandasmore
Version: 0.0.7
Summary: Extends pandas with common functions used in finance and economics research
Home-page: https://github.com/ionmihai/pandasmore
Author: ionmihai
Author-email: mihaiion@email.arizona.edu
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: pandas
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# pandasmore

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

The full documentation site is
[here](https://ionmihai.github.io/pandasmore/), and the GitHub page is
[here](https://github.com/ionmihai/pandasmore).

Here is a short description of some of the main functions (more details
below and in the
[documentation](https://ionmihai.github.io/pandasmore/core.html)):

- [`setup_tseries`](https://ionmihai.github.io/pandasmore/core.html#setup_tseries):
  cleans up dates and sets them as the index
- [`setup_panel`](https://ionmihai.github.io/pandasmore/core.html#setup_panel):
  cleans up dates and panel id’s and sets them as the index (panel id,
  period date)
- [`lag`](https://ionmihai.github.io/pandasmore/core.html#lag): robust
  lagging that accounts for panel structure, unsorted or duplicate
  dates, or gaps in the time-series

## Install

``` sh
pip install pandasmore
```

## How to use

First, we set up an example dataset to showcase the functions in this
module.

``` python
import pandas as pd
import numpy as np
import pandasmore as pdm
```

``` python
raw = pd.DataFrame(np.random.rand(15,2), 
                    columns=list('AB'), 
                    index=pd.MultiIndex.from_product(
                        [[1,2, np.nan],[np.nan,'2010-01','2010-02','2010-02','2010-04']],
                        names = ['firm_id','date'])
                      ).reset_index()
raw
```

<div>


|     | firm_id | date    | A        | B        |
|-----|---------|---------|----------|----------|
| 0   | 1.0     | NaN     | 0.249370 | 0.926335 |
| 1   | 1.0     | 2010-01 | 0.282501 | 0.513859 |
| 2   | 1.0     | 2010-02 | 0.804278 | 0.307171 |
| 3   | 1.0     | 2010-02 | 0.828895 | 0.746789 |
| 4   | 1.0     | 2010-04 | 0.569099 | 0.331814 |
| 5   | 2.0     | NaN     | 0.533977 | 0.823457 |
| 6   | 2.0     | 2010-01 | 0.207558 | 0.401378 |
| 7   | 2.0     | 2010-02 | 0.086001 | 0.959371 |
| 8   | 2.0     | 2010-02 | 0.054230 | 0.993980 |
| 9   | 2.0     | 2010-04 | 0.062525 | 0.200272 |
| 10  | NaN     | NaN     | 0.091012 | 0.635409 |
| 11  | NaN     | 2010-01 | 0.866369 | 0.972394 |
| 12  | NaN     | 2010-02 | 0.432087 | 0.837597 |
| 13  | NaN     | 2010-02 | 0.878219 | 0.148009 |
| 14  | NaN     | 2010-04 | 0.820386 | 0.834821 |

</div>

``` python
df = pdm.setup_tseries(raw.query('firm_id==1'),
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df
```

<div>


|         | date    | dtdate     | firm_id | A        | B        |
|---------|---------|------------|---------|----------|----------|
| Mdate   |         |            |         |          |          |
| 2010-01 | 2010-01 | 2010-01-01 | 1.0     | 0.282501 | 0.513859 |
| 2010-02 | 2010-02 | 2010-02-01 | 1.0     | 0.828895 | 0.746789 |
| 2010-04 | 2010-04 | 2010-04-01 | 1.0     | 0.569099 | 0.331814 |

</div>

``` python
df = pdm.setup_panel(raw,
                        panel_ids='firm_id',
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df
```

<div>


|         |         | date    | dtdate     | A        | B        |
|---------|---------|---------|------------|----------|----------|
| firm_id | Mdate   |         |            |          |          |
| 1       | 2010-01 | 2010-01 | 2010-01-01 | 0.282501 | 0.513859 |
|         | 2010-02 | 2010-02 | 2010-02-01 | 0.828895 | 0.746789 |
|         | 2010-04 | 2010-04 | 2010-04-01 | 0.569099 | 0.331814 |
| 2       | 2010-01 | 2010-01 | 2010-01-01 | 0.207558 | 0.401378 |
|         | 2010-02 | 2010-02 | 2010-02-01 | 0.054230 | 0.993980 |
|         | 2010-04 | 2010-04 | 2010-04-01 | 0.062525 | 0.200272 |

</div>

``` python
pdm.lag(df['A'])
```

    firm_id  Mdate  
    1        2010-01         NaN
             2010-02    0.282501
             2010-04         NaN
    2        2010-01         NaN
             2010-02    0.207558
             2010-04         NaN
    Name: A_lag1, dtype: float64
