Metadata-Version: 2.4
Name: babynamesil
Version: 0.2.1
Summary: Israeli baby names dataset (1949-2024) from CBS
Author-email: Aviezer Lifshitz <aviezer.lifshitz@weizmann.ac.il>
License-Expression: CC0-1.0
Project-URL: Homepage, https://github.com/aviezerl/babynamesIL
Project-URL: Documentation, https://aviezerl.github.io/babynamesIL/
Project-URL: Repository, https://github.com/aviezerl/babynamesIL
Project-URL: Issues, https://github.com/aviezerl/babynamesIL/issues
Keywords: baby names,Israel,CBS,demographics,data,pandas
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: pyarrow>=8.0.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# babynamesil

[![PyPI version](https://badge.fury.io/py/babynamesil.svg)](https://pypi.org/project/babynamesil/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: CC0-1.0](https://img.shields.io/badge/License-CC0_1.0-lightgrey.svg)](http://creativecommons.org/publicdomain/zero/1.0/)

Israeli baby names dataset (1949-2024) from the Central Bureau of Statistics (CBS).

This package provides Israeli baby name statistics as pandas DataFrames. Each record gives the number of babies of one sex, born in one year within one demographic sector, who received a particular name. Only names given to at least 5 babies in a year are included.

## Installation

```bash
pip install babynamesil
```

## Quick Start

```python
import babynamesil

# Load the main dataset
df = babynamesil.load_data()
print(df.head())
```

Output:
```
   sector  year sex  name     n      prop
0  Jewish  1949   F  רחל  1362  0.038065
1  Jewish  1949   F  אסתר  1344  0.037562
2  Jewish  1949   F  שרה  1190  0.033258
3  Jewish  1949   F  מרים   964  0.026942
4  Jewish  1949   F  חנה   895  0.025013
```
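
Before analysing the data, it can help to see how many years, distinct names and babies each sector/sex combination covers. The sketch below uses only the columns listed under Data Structure. `coverage_summary` is a hypothetical helper, not part of the `babynamesil` API:

```python
import pandas as pd


def coverage_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise year range, distinct names and babies per sector and sex.

    Hypothetical helper (not part of babynamesil). Expects the columns
    returned by babynamesil.load_data(): sector, year, sex, name, n, prop.
    """
    return (
        df.groupby(["sector", "sex"])
        .agg(
            first_year=("year", "min"),
            last_year=("year", "max"),
            distinct_names=("name", "nunique"),
            babies=("n", "sum"),  # counts only the listed (n >= 5) names
        )
        .reset_index()
    )


# Usage (requires the package):
#   import babynamesil
#   print(coverage_summary(babynamesil.load_data()))
```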

## Usage Examples

### Find the most popular names in 2024

```python
import babynamesil

df = babynamesil.load_data()

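# Top 10 names for each sex in the Jewish sector, 2024
# (sort once, then take the first 10 rows per group; avoids groupby().apply)
top_2024 = (
    df[(df['year'] == 2024) & (df['sector'] == 'Jewish')]
    .sort_values('n', ascending=False)
    .groupby('sex')
    .head(10)[['sex', 'name', 'n']]
)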
print(top_2024)
```
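
To follow a name's position rather than its raw count, a rank can be computed within each year/sector/sex group. A minimal sketch: `add_rank` is a hypothetical helper, and `method="min"` (tied names share the better rank) is only one of the tie-breaking options that `pandas.Series.rank` offers:

```python
import pandas as pd


def add_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'rank' column (1 = most common name).

    Ranks are computed separately within each year/sector/sex group.
    method="min" gives tied names the same rank, e.g. 1, 2, 2, 4.
    Hypothetical helper, not part of babynamesil.
    """
    out = df.copy()
    out["rank"] = (
        out.groupby(["year", "sector", "sex"])["n"]
        .rank(method="min", ascending=False)
        .astype(int)
    )
    return out


# Usage (requires the package):
#   import babynamesil
#   ranked = add_rank(babynamesil.load_data())
#   noam = ranked[(ranked['name'] == 'נועם') & (ranked['sector'] == 'Jewish')]
#   print(noam[['year', 'sex', 'rank']])
```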

### Track a name over time

```python
import babynamesil
import matplotlib.pyplot as plt  # not a babynamesil dependency: pip install matplotlib

df = babynamesil.load_data()

# Track the name "נועם" (Noam) over time
noam = df[(df['name'] == 'נועם') & (df['sector'] == 'Jewish')]
noam_pivot = noam.pivot(index='year', columns='sex', values='prop')

# Matplotlib does not lay out right-to-left text, so a Hebrew title would render reversed
noam_pivot.plot(title='Noam - Popularity Over Time')
plt.ylabel('Proportion')
plt.show()
```
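
Because the dataset only includes names given to at least 5 babies in a year, a name has no row at all in years where it fell below that threshold. The `pivot` above therefore skips those years, and the line plot joins straight across the gap. The sketch below reindexes over the dataset's full year range so that such years appear as `NaN` (unknown) rather than 0. `name_trend` is a hypothetical helper, not part of the package:

```python
import pandas as pd


def name_trend(df: pd.DataFrame, name: str, sector: str = "Jewish",
               value: str = "prop") -> pd.DataFrame:
    """Year-by-sex table for one name, with every year in df present.

    Years in which the name has no row (e.g. fewer than 5 babies) are
    left as NaN rather than 0, since the true count is unknown.
    Hypothetical helper, not part of babynamesil.
    """
    subset = df[(df["name"] == name) & (df["sector"] == sector)]
    table = subset.pivot(index="year", columns="sex", values=value)
    # Reindex on the dataset's full year range so gaps become NaN rows.
    all_years = pd.Index(
        range(int(df["year"].min()), int(df["year"].max()) + 1), name="year"
    )
    return table.reindex(all_years)
```

For example, `name_trend(df, 'נועם').plot()` draws a break in the line for each missing year instead of interpolating across it.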

### Get all-time totals

```python
import babynamesil

totals = babynamesil.load_totals()

# Top 5 names of all time for each sex (Jewish sector)
jewish_totals = totals[totals['sector'] == 'Jewish']
print(
    jewish_totals.sort_values('total', ascending=False)
    .groupby('sex')
    .head(5)
)
```
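
Totals over a narrower window, such as a single decade, can be built from `load_data()` with a grouped sum. This sketch assumes that `total` in `load_totals()` is the sum of `n` across years, which is not stated explicitly above, so sums over the full range may not match `load_totals()` exactly. `totals_between` is a hypothetical helper:

```python
import pandas as pd


def totals_between(df: pd.DataFrame, start: int, end: int) -> pd.DataFrame:
    """Sum n per sector/sex/name for years start..end (inclusive).

    Assumes df has the load_data() columns. Years in which a name fell
    below the 5-baby threshold contribute nothing, so these sums are
    lower bounds on the true counts.
    Hypothetical helper, not part of babynamesil.
    """
    window = df[df["year"].between(start, end)]
    return (
        window.groupby(["sector", "sex", "name"], as_index=False)["n"]
        .sum()
        .rename(columns={"n": "total"})
        .sort_values("total", ascending=False, ignore_index=True)
    )


# Usage (requires the package):
#   import babynamesil
#   print(totals_between(babynamesil.load_data(), 2010, 2019).head(10))
```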

## Available Datasets

| Function | Description | Years | Rows |
|----------|-------------|-------|------|
| `load_data()` | Main baby names by year/sector/sex | 1949-2024 | ~160K |
| `load_totals()` | Aggregated totals by name | 1949-2024 | ~8K |
| `load_1948()` | Legacy 1948 data | 1948 | ~500 |
| `load_other()` | Archived "Other" sector | 1985-2021 | ~5K |

## Data Structure

### Main Dataset (`load_data()`)

| Column | Type | Description |
|--------|------|-------------|
| `sector` | str | "Jewish", "Muslim", "Christian-Arab", or "Druze" |
| `year` | int | Birth year (1949-2024) |
| `sex` | str | "M" (male) or "F" (female) |
| `name` | str | Baby name in Hebrew |
| `n` | int | Count of babies with this name |
| `prop` | float | Proportion within year/sector/sex (0-1) |
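
The table does not say which denominator `prop` uses: all babies of that sex, year and sector, or only those whose names are listed. Summing `prop` within a group distinguishes the two. If every row in a group shares a denominator D, then sum(prop) = sum(n) / D, so a sum near 1 means D covers only the listed names, while a sum below 1 means D also counts babies with unlisted names. A sketch; `prop_coverage` is a hypothetical helper:

```python
import pandas as pd


def prop_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Per year/sector/sex: listed babies, summed prop, implied denominator.

    If prop = n / D with one D per group, then sum(prop) = sum(n) / D,
    so D can be recovered as sum(n) / sum(prop).
    Hypothetical helper, not part of babynamesil.
    """
    g = df.groupby(["year", "sector", "sex"], as_index=False).agg(
        listed_babies=("n", "sum"),
        prop_sum=("prop", "sum"),
    )
    g["implied_total"] = g["listed_babies"] / g["prop_sum"]
    return g
```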

### Totals Dataset (`load_totals()`)

| Column | Type | Description |
|--------|------|-------------|
| `sector` | str | Demographic sector |
| `sex` | str | "M" or "F" |
| `name` | str | Baby name in Hebrew |
| `total` | int | Total count across all years |
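
Because `load_totals()` keeps a separate row for each sex, pivoting on `sex` puts a name's female and male totals side by side, which makes names given to both sexes easy to spot. A minimal sketch: `sex_balance` and its `min_total` cutoff of 100 are arbitrary illustrative choices, not part of the package:

```python
import pandas as pd


def sex_balance(totals: pd.DataFrame, min_total: int = 100) -> pd.DataFrame:
    """Male share of each name's all-time total, per sector.

    Expects the load_totals() columns: sector, sex, name, total.
    A name missing for one sex gets 0 for that sex. Names with fewer
    than min_total babies overall are dropped.
    Hypothetical helper, not part of babynamesil.
    """
    wide = totals.pivot_table(
        index=["sector", "name"], columns="sex", values="total",
        aggfunc="sum", fill_value=0,
    )
    # Make sure both columns exist even if the input lists only one sex.
    wide = wide.reindex(columns=["F", "M"], fill_value=0)
    wide = wide.assign(combined=wide["F"] + wide["M"])
    wide = wide[wide["combined"] >= min_total]
    wide = wide.assign(male_share=wide["M"] / wide["combined"])
    return wide.reset_index().sort_values("male_share")


# Usage (requires the package):
#   import babynamesil
#   b = sex_balance(babynamesil.load_totals())
#   print(b[b['male_share'].between(0.4, 0.6)])  # roughly balanced names
```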

## Data Source

All data is sourced from CBS (Israel Central Bureau of Statistics) Release 391/2025:

- **Press release** (in Hebrew): [First names given to babies born in 2024](https://www.cbs.gov.il/he/mediarelease/Pages/2025/%D7%94%D7%A9%D7%9E%D7%95%D7%AA-%D7%94%D7%A4%D7%A8%D7%98%D7%99%D7%99%D7%9D-%D7%A9%D7%A0%D7%99%D7%AA%D7%A0%D7%95-%D7%9C%D7%99%D7%9C%D7%99%D7%93%D7%99-2024.aspx)
- **Data file**: [11_25_391t1.xlsx](https://www.cbs.gov.il/he/mediarelease/DocLib/2025/391/11_25_391t1.xlsx)

## Related Projects

- **R package**: [babynamesIL](https://github.com/aviezerl/babynamesIL) on CRAN
- **Web app**: [babynames.lifshitz.xyz](http://babynames.lifshitz.xyz)

## License

CC0 1.0 Universal - This work is dedicated to the public domain.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request on [GitHub](https://github.com/aviezerl/babynamesIL).
