Metadata-Version: 2.4
Name: candidate-transformer
Version: 0.1.0
Summary: A production-grade Python framework for converting heterogeneous candidate information into canonical profiles.
Author-email: Principal Engineer <engineering@example.com>
License: MIT
Requires-Python: >=3.10
Requires-Dist: phonenumbers>=8.13.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: structlog>=23.1.0
Requires-Dist: typer>=0.9.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# Candidate Transformer

A production-grade Python framework for converting heterogeneous candidate information into unified canonical profiles.

## Installation

```bash
# Standard installation
pip install candidate-transformer

# Development installation
git clone https://github.com/example/candidate-transformer.git
cd candidate-transformer
pip install -e ".[dev]"
```

## Quick Start

### Python API

```python
import json

from candidate_transformer import CandidateTransformer

# Initialize the facade (loads default.json automatically if no config provided)
transformer = CandidateTransformer()

# Load heterogeneous data sources
with open('sample_data/recruiter.csv', 'r') as f:
    transformer.load('recruiter_csv', f)

with open('sample_data/ats.json', 'r') as f:
    transformer.load('ats_json', f)

# Execute pipeline and export JSON
output = transformer.export()
print(json.dumps(output, indent=2))
```

### CLI Execution

Transform a single source or multiple heterogeneous sources sequentially:
```bash
candidate-transformer transform \
    --source recruiter_csv=sample_data/recruiter.csv \
    --source ats_json=sample_data/ats.json \
    --source resume_text=sample_data/resume.txt \
    --config configs/default.json
```

The CLI can merge multiple sources in a single run. Each `--source` takes a `connector=file_path` pair. The system loads every source provided, deduplicates identities, and merges the records into canonical profiles.

If you prefer configuration-driven workflows over CLI arguments, you can define `sources` in your JSON config:
```json
{
  "sources": [
    { "connector": "recruiter_csv", "input": "sample_data/recruiter.csv" },
    { "connector": "ats_json", "input": "sample_data/ats.json" }
  ]
}
```
**Note:** If any `--source` CLI arguments are provided, they override the `sources` array in your JSON configuration.
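
The precedence rule above can be sketched in a few lines. This is only an illustration, not the package's CLI code: `parse_source_arg` and `resolve_sources` are hypothetical helper names, and the sketch assumes that "override" means the CLI list replaces the config list wholesale rather than being merged with it.

```python
def parse_source_arg(arg: str) -> dict:
    """Split a 'connector=file_path' string into a source entry."""
    # Split on the first '=' only, so file paths containing '=' survive intact.
    connector, sep, path = arg.partition("=")
    if not sep or not connector or not path:
        raise ValueError(f"expected connector=file_path, got {arg!r}")
    return {"connector": connector, "input": path}


def resolve_sources(cli_args: list, config: dict) -> list:
    """Return CLI sources when any are given; otherwise fall back to the config."""
    if cli_args:
        # Assumption: CLI sources replace the config's 'sources' array entirely.
        return [parse_source_arg(arg) for arg in cli_args]
    return list(config.get("sources", []))


config = {"sources": [{"connector": "ats_json", "input": "sample_data/ats.json"}]}
print(resolve_sources([], config))
print(resolve_sources(["recruiter_csv=sample_data/recruiter.csv"], config))
```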

Validate projected output:
```bash
candidate-transformer validate \
    --config configs/default.json \
    --input output.json
```

## Configuration

Configuration controls how the internal `CanonicalCandidate` is projected (reshaped) into the final JSON output and how conflicts between sources are prioritized.
See `configs/default.json` and `configs/minimal.json` for examples.

## Core Features

### Entity Resolution
The framework merges candidate profiles deterministically using a strict priority cascade rather than fuzzy matching or ML-based heuristics. Two records are merged if they share any of the following, checked in order:
1. Exact Phone Match
2. Exact Email Match
3. Exact Name Match (case-insensitive)
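
The cascade above can be sketched as follows. This is a simplified illustration, not the framework's resolver: the record fields (`phone`, `email`, `name`) and the function names are assumptions, phone numbers are assumed to be normalized already, and the single pass does not fuse two existing clusters that a later record happens to bridge.

```python
def match_keys(record: dict) -> list:
    """Return the record's match keys in priority order: phone, email, name."""
    keys = []
    if record.get("phone"):
        keys.append(("phone", record["phone"]))
    if record.get("email"):
        keys.append(("email", record["email"]))
    if record.get("name"):
        # Names are compared case-insensitively.
        keys.append(("name", record["name"].casefold()))
    return keys


def resolve(records: list) -> list:
    """Group records into clusters; the highest-priority shared key decides."""
    index = {}      # match key -> cluster position
    clusters = []   # each cluster is a list of records
    for record in records:
        keys = match_keys(record)
        # Take the first key, in priority order, that is already indexed.
        cluster_id = next((index[k] for k in keys if k in index), None)
        if cluster_id is None:
            cluster_id = len(clusters)
            clusters.append([])
        clusters[cluster_id].append(record)
        for key in keys:
            index.setdefault(key, cluster_id)
    return clusters


records = [
    {"name": "Ada Lovelace", "phone": "+15550100", "email": "ada@example.com"},
    {"name": "ADA LOVELACE", "email": "other@example.com"},
    {"name": "A. Lovelace", "phone": "+15550100"},
    {"name": "Grace Hopper", "email": "grace@example.com"},
]
for cluster in resolve(records):
    print([r["name"] for r in cluster])
```

Because every comparison is an exact key lookup, the same input order always yields the same clusters.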

### Confidence Scoring
Each resolved candidate receives a deterministic confidence score (0.0 to 1.0) based on:
- **Completeness**: Evaluates presence of Name, Contact, and Experience/Education.
- **Source Agreement**: Rewards candidates that appear consistently across multiple sources (e.g. found in ATS, Resume, and CSV).
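
One way to combine the two signals is a weighted sum, sketched below. The weights, field names, and function names are assumptions made for illustration; the framework's actual formula is not documented here and may differ.

```python
def completeness(profile: dict) -> float:
    """Fraction of three checks that pass: name, contact, experience/education."""
    checks = [
        bool(profile.get("name")),
        bool(profile.get("email") or profile.get("phone")),
        bool(profile.get("experience") or profile.get("education")),
    ]
    return sum(checks) / len(checks)


def source_agreement(seen_in: set, total_sources: int) -> float:
    """Fraction of the loaded sources in which the candidate appears."""
    if total_sources <= 0:
        return 0.0
    return min(len(seen_in) / total_sources, 1.0)


def confidence(profile: dict, seen_in: set, total_sources: int,
               completeness_weight: float = 0.6) -> float:
    """Weighted sum of both signals, clamped to [0.0, 1.0].

    The 0.6 / 0.4 split is an illustrative assumption.
    """
    score = (completeness_weight * completeness(profile)
             + (1.0 - completeness_weight) * source_agreement(seen_in, total_sources))
    return round(min(max(score, 0.0), 1.0), 3)


full = {"name": "Ada", "email": "ada@example.com", "experience": ["Analyst"]}
print(confidence(full, {"ats_json", "recruiter_csv", "resume_text"}, 3))
print(confidence({"name": "Ada"}, {"ats_json"}, 2))
```

The score depends only on the profile and the source counts, so repeated runs over the same data give the same value.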

### Output Validation
Projected JSON output is validated against the schema defined in your configuration, which enforces:
- Strongly typed fields (`string`, `number`, `string[]`)
- Strict requirement enforcement
- Deep nesting validation

Use `candidate-transformer validate` to confirm that downstream systems receive data conforming to the configured schema.
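
To make the three checks concrete, here is a minimal validator sketch. The schema layout (`type`, `required`, `fields`) and the `object` type used for nesting are hypothetical; consult `configs/default.json` for the format the framework actually reads.

```python
def check_type(value, type_name: str) -> bool:
    """Check a value against one of the type names listed above."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "string[]":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    raise ValueError(f"unknown type {type_name!r}")


def validate(data: dict, schema: dict, path: str = "") -> list:
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    for name, spec in schema.items():
        field_path = f"{path}.{name}" if path else name
        if data.get(name) is None:
            if spec.get("required", False):
                errors.append(f"{field_path}: missing required field")
            continue
        value = data[name]
        if spec["type"] == "object":  # hypothetical marker for nested fields
            if isinstance(value, dict):
                errors.extend(validate(value, spec.get("fields", {}), field_path))
            else:
                errors.append(f"{field_path}: expected object")
        elif not check_type(value, spec["type"]):
            errors.append(f"{field_path}: expected {spec['type']}")
    return errors


schema = {
    "name": {"type": "string", "required": True},
    "skills": {"type": "string[]"},
    "contact": {
        "type": "object",
        "required": True,
        "fields": {"email": {"type": "string", "required": True}},
    },
    "years": {"type": "number"},
}
print(validate({"skills": "math", "contact": {}, "years": True}, schema))
```

Collecting every error instead of stopping at the first one lets a single validation pass report all problems in a record.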

## Testing & Quality

To run the test suite and quality checks:
```bash
pytest
ruff check .
black --check .
mypy src
```

## Plugin Development
You can register new Connectors, Normalizers, or Strategies dynamically using the registries.
```python
from candidate_transformer.connectors import connector_registry
from candidate_transformer.interfaces.connector import BaseConnector

@connector_registry("my_custom_source")
class CustomConnector(BaseConnector):
    # Implement the methods that BaseConnector requires here.
    pass
```
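
For readers unfamiliar with the pattern, a decorator-based registry can be sketched as below. This shows the general technique only; `Registry`, `demo_registry`, and `DemoConnector` are hypothetical names, and the real `connector_registry` may behave differently (for example, in how it handles duplicate names).

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T", bound=type)


class Registry:
    """Maps string names to classes; calling it yields a class decorator."""

    def __init__(self) -> None:
        self._items: Dict[str, type] = {}

    def __call__(self, name: str) -> Callable[[T], T]:
        def decorator(cls: T) -> T:
            # Assumption: duplicate names are rejected rather than overwritten.
            if name in self._items:
                raise KeyError(f"{name!r} is already registered")
            self._items[name] = cls
            return cls  # the class itself is returned unchanged
        return decorator

    def get(self, name: str) -> type:
        return self._items[name]


demo_registry = Registry()


@demo_registry("my_custom_source")
class DemoConnector:
    pass


print(demo_registry.get("my_custom_source").__name__)
```

Registration happens when the decorated class is defined, so a plugin module only needs to be imported for its connectors to become available by name.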

## Publishing to PyPI
The package is built from `pyproject.toml`, with `hatchling` as the build backend.
```bash
# Requires the build and twine packages: pip install build twine
python -m build
twine upload dist/*
```
