Helpers

Various helper functions for working with Polars.

oi_tools.polars_helpers.clean_col_name(
name: str,
) → str

Normalize a column name to snake_case.

Parameters:

name (str) – Raw column name string.

Returns:

Cleaned column name with whitespace replaced by underscores, camelCase converted to snake_case, and non-alphanumeric characters replaced with underscores.

Return type:

str
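The normalization described above can be approximated with a few regular expressions. This is a minimal sketch, not the oi_tools implementation. The name `clean_col_name_sketch` is hypothetical, and the lowercasing, the collapsing of repeated underscores, and the trimming of leading and trailing underscores are assumptions about behavior the description does not spell out.

```python
import re


def clean_col_name_sketch(name: str) -> str:
    """Hypothetical approximation of clean_col_name; edge cases may differ from oi_tools."""
    # Replace each run of whitespace with a single underscore.
    s = re.sub(r"\s+", "_", name.strip())
    # camelCase -> snake_case: insert "_" where a lowercase letter or digit precedes an uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", s)
    # Replace any remaining non-alphanumeric character with "_".
    s = re.sub(r"[^0-9A-Za-z]", "_", s)
    # Assumption: collapse repeated underscores, trim them from both ends, then lowercase.
    s = re.sub(r"_+", "_", s).strip("_")
    return s.lower()


print(clean_col_name_sketch("Household Income"))  # household_income
print(clean_col_name_sketch("medianAge"))         # median_age
print(clean_col_name_sketch("Price ($)"))         # price
```

Because the sketch only splits at lowercase-to-uppercase boundaries, an acronym run such as "HTTPCode" becomes "httpcode"; the real function may handle acronyms differently.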

oi_tools.polars_helpers.to_expr(
x: str | Expr | int | float,
) → Expr

Convert the input to a Polars expression.

Parameters:

x (str | Expr | int | float) – A Polars expression, column name string, or numeric literal.

Returns:

A Polars expression.

Return type:

pl.Expr

oi_tools.polars_helpers.to_masked_expr(
*xs: str | Expr | int | float,
) → Sequence[Expr]

Convert the inputs to expressions that share a common null mask.

All output expressions evaluate to null wherever any input expression is null.

Parameters:

*xs (str | Expr | int | float) – Expressions to mask.

Returns:

Expressions that evaluate to null when any input is null.

Return type:

Sequence[pl.Expr]

oi_tools.polars_helpers.to_selector(
x: Collection[str] | Selector,
) → Selector

Convert the input to a Polars selector.

Parameters:

x (Collection[str] | Selector) – A Polars selector or a collection of column names.

Returns:

A Polars column selector.

Return type:

cs.Selector

Various helper/utility functions.

oi_tools.helpers.inflation_adjust(
col: str | Expr | int | float,
*,
from_year: str | Expr | int | float,
to_year: str | Expr | int | float,
series: str = 'CUUR0000SA0',
) → Expr

Adjust for inflation using the Consumer Price Index.

Parameters:
  • col (str | Expr | int | float) – The column (or columns) to adjust.

  • from_year (str | Expr | int | float) – The year in which the dollar value is currently denominated. Pass a column name, as in the example, to use a different year for each row.

  • to_year (str | Expr | int | float) – The target year; the result is expressed in that year's dollars.

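  • series (str) – The BLS CPI series ID used for the adjustment. Defaults to CUUR0000SA0, the Consumer Price Index for All Urban Consumers (CPI-U), U.S. city average, all items, not seasonally adjusted.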

Return type:

Expr

Examples

>>> import polars as pl
>>> from oi_tools.helpers import inflation_adjust
>>> df = pl.DataFrame({"income": [50000, 75000], "year": [2010, 2015]})
>>> df.with_columns(
...     income_2023=inflation_adjust("income", from_year="year", to_year=2023)
... )
shape: (2, 3)
┌────────┬──────┬──────────────┐
│ income ┆ year ┆ income_2023  │
│ ---    ┆ ---  ┆ ---          │
│ i64    ┆ i64  ┆ f64          │
╞════════╪══════╪══════════════╡
│ 50000  ┆ 2010 ┆ 69867.832117 │
│ 75000  ┆ 2015 ┆ 96417.767502 │
└────────┴──────┴──────────────┘
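The standard CPI adjustment scales a dollar amount by the ratio of the index in the target year to the index in the original year: value × CPI(to_year) / CPI(from_year). The example output is consistent with that formula applied to annual-average values of CUUR0000SA0 (218.056 for 2010, 237.017 for 2015, 304.702 for 2023). The scalar sketch below hard-codes those three values for illustration only; how oi_tools retrieves CPI data, and whether it always uses annual averages, is not documented here. The function name `inflation_adjust_scalar` is hypothetical.

```python
from __future__ import annotations

# BLS CPI-U annual averages for series CUUR0000SA0, hard-coded for illustration only.
CPI_U_ANNUAL_AVG = {2010: 218.056, 2015: 237.017, 2023: 304.702}


def inflation_adjust_scalar(
    value: float,
    from_year: int,
    to_year: int,
    cpi: dict[int, float] = CPI_U_ANNUAL_AVG,
) -> float:
    """Hypothetical scalar version of the CPI ratio adjustment."""
    # Express the value in to_year dollars by scaling with CPI(to_year) / CPI(from_year).
    return value * cpi[to_year] / cpi[from_year]


print(round(inflation_adjust_scalar(50_000, 2010, 2023), 2))  # 69867.83
print(round(inflation_adjust_scalar(75_000, 2015, 2023), 2))  # 96417.77
```

Both results agree with the income_2023 column above to within display rounding.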