Metadata-Version: 2.4
Name: pydata-constraints
Version: 1.0.1
Summary: The easiest way to validate your data streams in Python. Whether you have small JSON files or massive CSV dumps, this tool ensures your data isn't garbage.
Author: Francisco Pinto-Santos
License: MIT
Project-URL: Homepage, https://github.com/GandalFran/pydata-constraints
Project-URL: Issues, https://github.com/GandalFran/pydata-constraints/issues
Keywords: data-validation,json,csv,schema-validation,cli
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: PyYAML>=6.0.1
Requires-Dist: ijson>=3.2.3
Requires-Dist: typer>=0.12.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.3.0; extra == "dev"

# PyData Constraints

![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Python](https://img.shields.io/badge/python-%3E%3D3.8-blue.svg)
[![CI](https://github.com/GandalFran/pydata-constraints/actions/workflows/ci.yml/badge.svg)](https://github.com/GandalFran/pydata-constraints/actions/workflows/ci.yml)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/GandalFran/pydata-constraints/compare)

**PyData Constraints** is an easy way to validate your data streams in Python. Whether you have small JSON files or massive CSV dumps, it checks every record against the rules you define and reports each violation it finds.

---

## 🚀 Why PyData Constraints?

- **Rule-Based**: Define your rules in simple JSON/YAML files. No coding required.
- **Universal**: Works with JSON and CSV out of the box using efficient streaming.
- **Developer Friendly**: Written in pure Python with minimal dependencies.
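
As a rough illustration of what "streaming" means here, the sketch below validates a CSV source one row at a time using only the standard library. It is not the package's implementation: the `stream_invalid_rows` helper, the `phone` field, and the phone regex are all assumptions made up for this example.

```python
import csv
import io
import re


def stream_invalid_rows(lines, field, regex):
    """Hypothetical helper: yield (row_number, value) for rows failing the regex.

    Rows are consumed one at a time, so the whole file never has to be
    loaded into memory at once.
    """
    pattern = re.compile(regex)
    for row_number, row in enumerate(csv.DictReader(lines), start=1):
        value = row.get(field) or ""
        if not pattern.match(value):
            yield row_number, value


# io.StringIO stands in for an open file; any iterable of lines works.
sample = io.StringIO("id,phone\n1,+34600111222\n2,not-a-phone\n")
invalid = list(stream_invalid_rows(sample, "phone", r"^\+?\d{7,15}$"))
print(invalid)
```

Because the helper is a generator, a caller can stop early or report issues as they are found instead of waiting for the full file to be read.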

## ⚡ Basic Use Case (At a glance)

Imagine you have a `users.json` file and you want to ensure all emails are valid.

**1. Your Data (`users.json`)**:
```json
[
  { "id": 1, "email": "alice@example.com" },
  { "id": 2, "email": "bob-has-no-domain" }
]
```

**2. Your Rules (`config.json`)**:
```json
{
  "sources": [
    {
      "service": "users",
      "type": "file",
      "path": "users.json",
      "format": "json"
    }
  ],
  "constraints": [
    {
      "type": "format",
      "id": "valid-email",
      "service": "users",
      "field": "email",
      "regex": "^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$",
      "message": "Invalid email: {{email}}"
    }
  ]
}
```

**3. Run and get results**:
```bash
$ data-constraints validate --config config.json
[INFO] Validating data...
[valid-email] (format) Invalid email: bob-has-no-domain
Validation finished. Found 1 issues.
```
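
To make the result above concrete, here is a minimal standalone sketch of the check a `format` constraint performs, reusing the same data and regex. It does not use the package's API: the `check_format` helper and the simple `{{field}}` substitution are assumptions for illustration only.

```python
import json
import re

# Same data and rule as in the example above.
USERS_JSON = """
[
  { "id": 1, "email": "alice@example.com" },
  { "id": 2, "email": "bob-has-no-domain" }
]
"""

RULE = {
    "id": "valid-email",
    "field": "email",
    "regex": r"^[^\s@]+@[^\s@]+\.[^\s@]+$",
    "message": "Invalid email: {{email}}",
}


def check_format(records, rule):
    """Hypothetical helper: one message per record whose field fails the regex."""
    pattern = re.compile(rule["regex"])
    placeholder = "{{" + rule["field"] + "}}"
    issues = []
    for record in records:
        value = str(record.get(rule["field"], ""))
        if not pattern.match(value):
            # Assumed templating: swap the {{field}} placeholder for the bad value.
            issues.append(rule["message"].replace(placeholder, value))
    return issues


issues = check_format(json.loads(USERS_JSON), RULE)
for issue in issues:
    print(f"[{RULE['id']}] (format) {issue}")
```

Only the second record is reported, since `bob-has-no-domain` contains no `@` and therefore cannot match the pattern.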

## 📦 Installation

To install via pip:
```bash
pip install pydata-constraints
```

This installs both the Python package `pydata_constraints` and the CLI command `data-constraints`.

## 🧠 Basic Concepts

**PyData Constraints** works with three core pieces:
1. **Data Files**: Your actual data dumps in `.json` or `.csv` format.
2. **Config File**: A JSON/YAML file pointing the engine to your data and rules.
3. **Constraints (Rules)**: The definitions of what is valid.

### Rule Types at a Glance

*   **📝 Format**: Ensure strings match an expected pattern (e.g. emails).
    *   *Example*: `"regex": "^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$"`
*   **🆔 Unique**: Ensure no duplicate IDs exist across a file.
    *   *Example*: `"field": "employee_id"`
*   **🔗 Foreign Key**: Ensure referenced IDs actually exist in another file.
    *   *Example*: `order.userId` must exist in `users.id`
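
The unique and foreign key rules above boil down to two set-style checks. The sketch below shows the idea in plain Python; the helper names and the sample records are hypothetical and are not part of the package's API.

```python
from collections import Counter


def find_duplicates(records, field):
    """Return the values of `field` that appear more than once."""
    counts = Counter(record[field] for record in records)
    return [value for value, n in counts.items() if n > 1]


def find_missing_references(children, child_field, parents, parent_field):
    """Return child values that have no matching value in the parent records."""
    known = {record[parent_field] for record in parents}
    return [c[child_field] for c in children if c[child_field] not in known]


# Made-up sample data for illustration.
users = [{"id": 1}, {"id": 2}, {"id": 2}]
orders = [{"orderId": "A", "userId": 1}, {"orderId": "B", "userId": 7}]

duplicate_ids = find_duplicates(users, "id")  # unique rule on users.id
dangling = find_missing_references(orders, "userId", users, "id")  # order.userId -> users.id
print(duplicate_ids)
print(dangling)
```

Here `users.id` contains the value `2` twice, and order `B` points at a user `7` that does not exist, so each check flags exactly one value.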

## 📚 Documentation

For full documentation, guides and advanced use cases, please check the [**docs/ directory**](https://github.com/GandalFran/pydata-constraints/tree/main/docs/).

- **[Key Concepts](https://github.com/GandalFran/pydata-constraints/tree/main/docs/concepts.md)**: Easy-to-understand explanation of file types and constraints.
- **[User Guide](https://github.com/GandalFran/pydata-constraints/tree/main/docs/guide.md)**: The comprehensive guide to using the CLI and defining rules.
- **[Integration Guide](https://github.com/GandalFran/pydata-constraints/tree/main/docs/integration.md)**: How to integrate the engine programmatically in Python.
- **[Examples](https://github.com/GandalFran/pydata-constraints/tree/main/docs/examples)**: Runnable examples, ranging from simple to e-commerce.

## 🛠️ Features

| Feature | Description |
| :--- | :--- |
| **Format Validation** | Regex-based validation for strings (emails, phone numbers, codes). |
| **Unique Validation** | Ensure IDs and codes are unique across your dataset. |
| **Foreign Keys** | Validate relationships between different files (e.g. `order.userId` -> `user.id`). |
| **Multiple Reporters** | Output results to Console, JSON, or Markdown files. |

## 🤝 Contributing

Contributions, bug reports, and feature requests are welcome in **PyData Constraints**. If you have an idea, open an issue or send a PR!

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
