Metadata-Version: 2.4
Name: jcp-data-manager
Version: 0.1.0
Summary: Utilities for merging session data with LinkedIn member exports and optional demographic enrichment.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: polars>=1.0.0
Requires-Dist: requests>=2.31.0
Provides-Extra: image
Requires-Dist: deepface>=0.0.93; extra == "image"
Provides-Extra: names
Requires-Dist: gender-guesser>=0.4.0; extra == "names"
Requires-Dist: ethnicolr>=0.18.4; extra == "names"
Requires-Dist: pandas>=2.0.0; extra == "names"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"

# jcp-data-manager

This folder contains a packaged Python implementation of the workflow from `merge_testing.ipynb`.
It works with arbitrary input files, not just the example JSON files in this repo, provided they follow the same two source formats described under "Expected input shapes".

## What it does

- Loads LinkedIn member JSON data
- Loads session JSON data
- Normalizes and merges both datasets on `user_id`
- By default, enriches rows with image-based analysis from DeepFace (requires the `image` extra)
- By default, enriches rows with name-based gender and ethnicity predictions (requires the `names` extra)

## Expected input shapes

The LinkedIn file should be a top-level JSON list of member records; the records must include either `wordpress_user_id` or `user_id`.

The sessions file should be a top-level JSON object with a `sessions` key whose value is a list. Each session record must include at least `user_id` and `session_id`.

## Install

```bash
pip install -e .
```

Optional extras:

```bash
pip install -e ".[image]"
pip install -e ".[names]"
pip install -e ".[image,names]"
```
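
Because the image and name enrichment steps depend on these optional extras, it can help to check which extras are importable in the current environment. The sketch below uses only `importlib.util.find_spec`, which locates a module without importing it. The mapping from extras to import names (`deepface`, `gender_guesser`, `ethnicolr`, `pandas`) follows the `Requires-Dist` entries above; the helper name `installed_extras` is hypothetical.

```python
from importlib.util import find_spec

# Import names of the distributions each extra pulls in (per the package metadata).
EXTRA_MODULES: dict[str, tuple[str, ...]] = {
    "image": ("deepface",),
    "names": ("gender_guesser", "ethnicolr", "pandas"),
}


def installed_extras() -> dict[str, bool]:
    """Report, for each extra, whether all of its modules can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

If an extra is reported missing, install it with the matching command above, or skip that enrichment step on the command line.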

## CLI usage

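```bat
jcp-data-manager ^
  --sessions .\jcpst-sessions-2026-04-07-22-48-30.json ^
  --linkedin .\linkedin-member-data-2026-04-07-224846.json ^
  --output .\merged.parquet
```

This example uses Windows `cmd.exe` syntax (`^` line continuations and `.\` paths). In bash or zsh, end each line with `\` instead and write paths as `./file.json`.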

You can also run it as a module:

```bash
python -m jcp_data_manager.cli --help
```

To skip an enrichment step, use `--skip-image-analysis` or `--skip-name-analysis`.
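
For readers wiring up a similar interface, the documented options map naturally onto `argparse`. This is a hypothetical reconstruction for illustration only: the real `cli.py` may declare its arguments differently, and choices such as `required=True` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(prog="jcp-data-manager")
    parser.add_argument("--sessions", required=True, help="sessions JSON file")
    parser.add_argument("--linkedin", required=True, help="LinkedIn member JSON file")
    parser.add_argument("--output", required=True, help="destination .parquet file")
    # Enrichment runs by default; each store_true flag turns one step off.
    parser.add_argument("--skip-image-analysis", action="store_true")
    parser.add_argument("--skip-name-analysis", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--sessions", "s.json", "--linkedin", "l.json",
     "--output", "out.parquet", "--skip-image-analysis"]
)
print(args.skip_image_analysis, args.skip_name_analysis)  # prints: True False
```

`argparse` turns the dashes in `--skip-image-analysis` into underscores, so the flag is read back as `args.skip_image_analysis`.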

## Project layout

```text
src/jcp_data_manager/
  __init__.py
  cli.py
  enrichment.py
  io.py
  merge.py
```

The original `merge_testing.ipynb` notebook is left unchanged so you can compare its outputs with the package's while you migrate.
