Metadata-Version: 2.4
Name: dto-strict
Version: 0.4.0
Summary: AST-based linter for Python DTO discipline and facade-ban enforcement — framework-agnostic.
Project-URL: Homepage, https://github.com/jekhator/dto-strict
Project-URL: Repository, https://github.com/jekhator/dto-strict.git
Project-URL: Issues, https://github.com/jekhator/dto-strict/issues
Author: James Ekhator
License: Apache-2.0
License-File: LICENSE
Keywords: ast,code-quality,dataclass,dto,linter,static-analysis
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: tomli>=1.1.0; python_version < '3.11'
Provides-Extra: dev
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Description-Content-Type: text/markdown

# dto-strict

AST-based linter for Python DTO discipline and facade-ban enforcement — pluggable, framework-agnostic.

## Why dto-strict?

Data Transfer Objects (DTOs) provide a critical boundary between services and prevent the fragmentation of business-logic definitions across codebases. However, when function signatures leak `Dict[str, Any]` or when services build dict literals inline instead of using structured DTOs, code becomes:

- **Loosely typed**: Shape mismatches only surface at runtime.
- **Duplicated**: The same business object gets redefined wherever it's used.
- **Hard to evolve**: Changing a field requires updating dicts in 10+ places.

Facade functions (module-level helpers that wrap framework machinery) similarly tend to proliferate and obscure intent when unmarked. Tagging them with a comment such as `# facade — celery schedule` makes the intent explicit.

**Why in healthcare?** Healthcare systems (HIPAA-regulated platforms that handle PHI) benefit from strong DTO boundaries because they force explicit thinking about what data is structured, typed, and auditable. When handling patient records, medical documents, and compliance reports, untyped dicts create liability: a field can be added silently or change shape unpredictably, and no type checker catches missing PII handling.

**dto-strict** enforces DTO and facade discipline via static AST analysis, with six focused rules:

1. **R001 (HIGH)**: Detect `Dict[str, Any]` in service-layer function signatures; with `strict_collections = true`, also flag bare `dict`/`list`/`tuple`.
2. **R002 (MEDIUM)**: Flag inline dict literals with 3+ string keys unless they carry an exception tag; tags can be required to include a justification.
3. **R003 (MEDIUM)**: Require `frozen=True, slots=True` on DTO dataclasses and flag `repr=False` (v0.2 canonical form: plain `@dataclass(frozen=True, slots=True)`; a legacy mode is available).
4. **R004 (HIGH)**: Require exception tags on module-level functions (e.g., `# facade — celery schedule`).
5. **R005 (LOW)**: Encourage validators to use the `DTO.from_dict()` pattern.
6. **R006 (HIGH)**: Detect `typing.Any` in function signatures (parameters and return types).

All rules are configurable: each can be disabled, have its severity overridden, or be scoped to specific paths.

## Install

```bash
pip install dto-strict
```

## Quick Start

### Basic CLI Usage

```bash
# Lint a single file
dto-strict apps/compliance/services.py

# Lint a directory
dto-strict apps/

# Output as GitHub Actions annotations
dto-strict apps/ --format github

# Output as JSON
dto-strict apps/ --format json
```

### Configuration (pyproject.toml)

```toml
[tool.dto-strict]
service_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]
dto_paths = [
    "**/dtos.py",
    "**/dtos/*.py",
]
exception_tags = [
    "facade — celery schedule",
    "FRAMEWORK",
]
disabled_rules = ["R005"]  # Disable low-priority rules if desired
severity_overrides = { "R002" = "LOW" }  # Downgrade specific rules
```
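
As a rough illustration of how such a table could be read, the sketch below parses a `[tool.dto-strict]` section with the standard-library `tomllib`, falling back to `tomli` on Python 3.10 (the dependency this package declares). The `load_config` helper and the `DEFAULTS` dict are hypothetical names for this example, not dto-strict's actual loader.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10
    import tomli as tomllib

# Assumed defaults, copied from the values this README documents.
DEFAULTS = {
    "service_paths": ["apps/*/services/*.py", "**/services/*.py"],
    "dto_paths": ["**/dtos.py", "**/dtos/*.py"],
    "disabled_rules": [],
    "severity_overrides": {},
}

def load_config(toml_text: str) -> dict:
    """Merge the [tool.dto-strict] table over the assumed defaults."""
    data = tomllib.loads(toml_text)
    section = data.get("tool", {}).get("dto-strict", {})
    return {**DEFAULTS, **section}

EXAMPLE = '''
[tool.dto-strict]
disabled_rules = ["R005"]
severity_overrides = { "R002" = "LOW" }
'''

config = load_config(EXAMPLE)
print(config["disabled_rules"])
print(config["severity_overrides"])
```

Keys missing from the file keep their default, so a project only needs to list what it changes.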

### Strict Mode (v0.2)

v0.2 introduces **canonical mode**, which aligns R003 with modern DTO practice, along with strict collection detection:

```toml
[tool.dto-strict]

# R001: Catch bare dict/list/tuple without type parameters
strict_collections = true  # Default: false. When true, bare collections trigger violations.

# R002: Require justification on exception tags + configurable dict key threshold
exception_tag_requires_justification = true  # Default: false.
# Tags must now use format: "tag: explanation" (e.g., "facade — celery schedule: transient event payload")

min_dict_keys = 3  # NEW in v0.2: Threshold for R002 dict literal flagging (default: 3)

# Limit reuse of exception tags in a single file
max_exception_tags_per_file = 3  # Default: null (no limit)

# R003: Canonical mode (v0.2 default) flags repr=False as anti-canonical + strict/relaxed modes
r003_mode = "canonical"  # Default: "canonical" (v0.2). Use "legacy" for v0.1 behavior.
# In canonical mode: @dataclass(frozen=True, slots=True) is correct; repr=False is flagged.
# In legacy mode: @dataclass must include frozen=True, slots=True, AND repr=False (v0.1 requirement).

r003_strict_repr = true  # NEW in v0.2: In canonical mode, flag repr=False (default: true)
# Set to false for relaxed mode: only checks frozen+slots, ignores repr=False

# R004: NEW auto-detect class-method-wrapping pattern
# Module-level functions that delegate to class methods are now auto-detected
# (no exception tag needed; reduces false positives)

# R006: Scope typing.Any detection to specific paths
r006_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]
```

### Baseline Ratchet Mode (v0.2)

Accept current violations as "baseline" debt and track only new violations:

```bash
# Generate baseline from current state
dto-strict apps/ --generate-baseline > .dto-strict-baseline.json

# Subsequent runs accept baseline violations; new ones trigger failure
dto-strict apps/ --baseline .dto-strict-baseline.json
```

The baseline tracks violations by file, line, and rule ID. When baselined violations are fixed and no longer appear in the codebase, the run still exits with code 0 and prints a notice, and the baseline can be regenerated.
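
The ratchet idea can be sketched as a set difference over (file, line, rule ID) keys. This is a minimal illustration only: the baseline file's real schema is not documented here, so the example assumes a JSON list shaped like the `--format json` output, and `violation_key` and `new_violations` are hypothetical helpers.

```python
import json

def violation_key(v: dict) -> tuple:
    """Identify a violation by file, line, and rule ID."""
    return (v["file"], v["line"], v["rule_id"])

def new_violations(current: list, baseline: list) -> list:
    """Return violations in `current` that are absent from `baseline`."""
    known = {violation_key(v) for v in baseline}
    return [v for v in current if violation_key(v) not in known]

# Assumed baseline content (schema is a guess for this sketch).
baseline = json.loads('[{"file": "app.py", "line": 10, "rule_id": "R001"}]')
current = [
    {"file": "app.py", "line": 10, "rule_id": "R001"},      # accepted debt
    {"file": "service.py", "line": 20, "rule_id": "R002"},  # introduced later
]

fresh = new_violations(current, baseline)
print(fresh)
```

Only the entries in `fresh` would count against the run under this model.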

**Why canonical mode?** Per the 2026-05-09 DTO-strict pivot, the canonical pattern is:
- `@dataclass(frozen=True, slots=True)`: immutability + memory efficiency
- **NO `repr=False`**: let the default repr work normally
- Store values, don't suppress output; if a field is PII-sensitive, add an explicit `__repr__` override on that DTO (see "PHI / Sensitive Data Handling") or use external redaction tools

### GitHub Actions

Create `.github/workflows/dto-strict.yml`:

```yaml
name: dto-strict
on:
  pull_request:
    paths: ['apps/**.py']

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install dto-strict
      - run: dto-strict apps/ --format github
```

### Pre-commit Hook

Add to `.pre-commit-config.yaml`:

```yaml
- repo: local
  hooks:
    - id: dto-strict
      name: dto-strict
      entry: dto-strict
      language: python
      types: [python]
      additional_dependencies: ['dto-strict']
      stages: [commit]
```

## Rules

### R001: Dict[str, Any] and Bare Collections in Service Signatures (HIGH)

Service-layer functions should not accept or return `Dict[str, Any]`. With `strict_collections=true`, bare `dict`, `list`, and `tuple` without type parameters are also flagged.

**Fail (always):**
```python
def process_user(config: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": "ok"}
```

**Fail (with strict_collections=true):**
```python
def fetch_users() -> list:  # Bare list
    return []

def merge_configs(base: dict, overrides: dict) -> dict:  # Bare dicts
    return {**base, **overrides}
```

**Pass:**
```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True, slots=True)
class UserConfigDTO:
    timeout: int
    retries: int

@dataclass(frozen=True, slots=True)
class UserDTO:
    user_id: int

def process_user(config: UserConfigDTO) -> Dict[str, str]:
    return {"status": "ok"}

def fetch_users() -> list[UserDTO]:  # Typed list
    return []

def merge_configs(base: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:  # Typed dicts
    return {**base, **overrides}
```

**Rationale:** Typed parameters enable IDE completion and catch shape mismatches early. Bare collections hide shape from static checkers and readers.
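
To make the AST approach concrete, here is a minimal sketch of how a `Dict[str, Any]` check on signatures could be written with the standard `ast` module. It is not dto-strict's implementation; `is_dict_str_any` and `find_dict_any` are hypothetical names, and treating lowercase `dict[str, Any]` the same way is this sketch's own choice.

```python
import ast

SOURCE = '''
def process_user(config: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": "ok"}

def fetch_users() -> list[UserDTO]:
    return []
'''

def is_dict_str_any(node) -> bool:
    """True when the annotation node spells Dict[str, Any] (or a close variant)."""
    if node is None:
        return False
    text = ast.unparse(node).replace(" ", "")
    return text in {"Dict[str,Any]", "dict[str,Any]", "typing.Dict[str,typing.Any]"}

def find_dict_any(source: str) -> list:
    """Return (line, function name) pairs whose signature uses Dict[str, Any]."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = args.posonlyargs + args.args + args.kwonlyargs
            annotations = [a.annotation for a in params] + [node.returns]
            if any(is_dict_str_any(a) for a in annotations):
                hits.append((node.lineno, node.name))
    return hits

print(find_dict_any(SOURCE))
```

Because the check works on the parsed tree, nothing in the linted file is imported or executed.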

---

### R002: Inline Dict Literals (MEDIUM)

Service files with inline dict literals containing 3+ string keys should define a DTO instead. Exception tags allow one-off inline dicts; with `exception_tag_requires_justification=true`, tags must include a colon-delimited explanation.

**Fail (no tag):**
```python
def build_response(user_id: int) -> dict:
    return {
        "user_id": user_id,
        "status": "active",
        "timestamp": "2025-01-01",
    }
```

**Fail (tag without justification, if required):**
```python
def build_response(user_id: int) -> dict:  # facade — celery schedule
    return {
        "user_id": user_id,
        "status": "active",
        "timestamp": "2025-01-01",
    }
```

**Pass (with justified tag):**
```python
def build_response(user_id: int) -> dict:  # facade — celery schedule: SNS event envelope (transient)
    return {
        "user_id": user_id,
        "status": "active",
        "timestamp": "2025-01-01",
    }
```

**Pass (define DTO instead):**
```python
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class ResponseDTO:
    user_id: int
    status: str
    timestamp: str

def build_response(user_id: int) -> ResponseDTO:
    return ResponseDTO(user_id, "active", "2025-01-01")
```

**Rationale:** Shared shapes should live in DTOs. Inline dicts make duplication invisible. Exception tags are for rare transient payloads; they should explain why.
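
A check of this kind can be sketched by walking `ast.Dict` nodes and counting string-constant keys. This is an illustrative sketch rather than the linter's code; `large_dict_literals` is a hypothetical helper, and its `min_dict_keys` parameter mirrors the config key of the same name.

```python
import ast

def large_dict_literals(source: str, min_dict_keys: int = 3) -> list:
    """Return (line, key count) for dict literals with enough string keys."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # `**spread` entries appear as None keys and are not counted.
            string_keys = [
                k for k in node.keys
                if isinstance(k, ast.Constant) and isinstance(k.value, str)
            ]
            if len(string_keys) >= min_dict_keys:
                hits.append((node.lineno, len(string_keys)))
    return hits

SOURCE = '''
small = {"status": "ok"}
big = {"user_id": 1, "status": "active", "timestamp": "2025-01-01"}
'''

print(large_dict_literals(SOURCE))
```

Raising `min_dict_keys` makes the rule more lenient; exception-tag handling is left out of this sketch.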

---

### R003: Dataclass Canonical Form (MEDIUM)

**Canonical mode (v0.2 default):** Dataclasses must use `frozen=True, slots=True` WITHOUT `repr=False`.

**Legacy mode (v0.1):** Requires `frozen=True, slots=True, repr=False`.

#### Canonical Mode (v0.2)

**Fail (canonical mode):**
```python
@dataclass(frozen=True, slots=True, repr=False)  # Anti-canonical: has repr=False
class UserDTO:
    user_id: int
```

**Pass (canonical mode):**
```python
@dataclass(frozen=True, slots=True)
class UserDTO:
    user_id: int

@dataclass(frozen=True, slots=True)  # Both params present
class ConfigDTO:
    timeout: int
```

**Rationale (canonical):**
- `frozen=True`: Immutability enforces single-source-of-truth.
- `slots=True`: Saves memory and prevents attribute typos.
- **NO `repr=False`**: The default repr is fine; if a field is sensitive, override `__repr__` explicitly or use external redaction (logging mixin, etc.).

#### Legacy Mode (v0.1)

Use `r003_mode = "legacy"` in `pyproject.toml` if your codebase still requires `repr=False`:

```python
@dataclass(frozen=True, slots=True, repr=False)
class UserDTO:
    user_id: int
```
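
The difference between the two modes can be sketched as a check on the decorator's keyword arguments. The code below is an assumption-laden illustration, not the linter's implementation: `dataclass_keywords` and `check_dataclasses` are hypothetical names, and only the bare `dataclass` name is recognized (not `dataclasses.dataclass`).

```python
import ast

def dataclass_keywords(deco):
    """Return the decorator's literal keyword arguments, or None if not @dataclass."""
    if isinstance(deco, ast.Name) and deco.id == "dataclass":
        return {}
    if (isinstance(deco, ast.Call) and isinstance(deco.func, ast.Name)
            and deco.func.id == "dataclass"):
        return {kw.arg: kw.value.value for kw in deco.keywords
                if isinstance(kw.value, ast.Constant)}
    return None

def check_dataclasses(source: str, mode: str = "canonical") -> list:
    """Return (class name, problem) pairs under the given mode."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for deco in node.decorator_list:
            kw = dataclass_keywords(deco)
            if kw is None:
                continue
            if not (kw.get("frozen") and kw.get("slots")):
                problems.append((node.name, "needs frozen=True, slots=True"))
            if mode == "canonical" and kw.get("repr") is False:
                problems.append((node.name, "repr=False is anti-canonical"))
            if mode == "legacy" and kw.get("repr") is not False:
                problems.append((node.name, "legacy mode requires repr=False"))
    return problems

SOURCE = '''
@dataclass(frozen=True, slots=True, repr=False)
class UserDTO:
    user_id: int

@dataclass(frozen=True, slots=True)
class ConfigDTO:
    timeout: int
'''

print(check_dataclasses(SOURCE, mode="canonical"))
print(check_dataclasses(SOURCE, mode="legacy"))
```

The same source passes or fails depending on `mode`, which is why switching `r003_mode` on an existing codebase can surface many findings at once.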

---

### R004: Module-Level Functions (HIGH)

Bare module-level functions (facades, framework hooks) must carry an exception tag in a comment or docstring.

**Fail:**
```python
def process_user(user_id: int):
    pass

def send_notification(message: str):
    pass
```

**Pass:**
```python
def process_user(user_id: int):  # facade — celery schedule
    pass

def send_notification(message: str):  # FRAMEWORK
    """Send via SNS."""
    pass

class UserService:
    def process(self, user_id: int):
        # Class methods don't need tags
        pass
```

**Exception Tags:** Configurable via `pyproject.toml` `exception_tags` list.

**Rationale:** Facades blur intent. Tags make intent explicit and signal "this is framework-specific, not business logic."
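
One way such a check could work is to look only at top-level `def` statements and search their header comments and docstrings for a configured tag. The sketch below is hypothetical (`untagged_functions` is not a dto-strict function), uses naive `#` splitting that would be confused by `#` inside string literals, and omits the class-method-wrapping auto-detection.

```python
import ast

# Default tags as listed in this README.
TAGS = ("facade — celery schedule", "FRAMEWORK")

def untagged_functions(source: str, tags=TAGS) -> list:
    """Return names of module-level functions lacking a tag in a comment or docstring."""
    lines = source.splitlines()
    missing = []
    for node in ast.parse(source).body:  # top level only, so methods are skipped
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Lines from `def` up to (not including) the first body statement.
        header = lines[node.lineno - 1 : max(node.lineno, node.body[0].lineno - 1)]
        comments = " ".join(l.split("#", 1)[1] for l in header if "#" in l)
        docstring = ast.get_docstring(node) or ""
        if not any(tag in comments or tag in docstring for tag in tags):
            missing.append(node.name)
    return missing

SOURCE = '''
def process_user(user_id: int):
    pass

def send_notification(message: str):  # FRAMEWORK
    pass

class UserService:
    def process(self, user_id: int):
        pass
'''

print(untagged_functions(SOURCE))
```

Comments are not part of the AST, which is why the sketch reads them from the raw source lines.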

---

### R005: Validator Pattern (LOW)

`validate_*()` functions should use `DTO.from_dict()` or raise `ValidationError` to enforce payload shape.

**Fail:**
```python
def validate_user_payload(payload: dict) -> bool:
    return "user_id" in payload and "email" in payload
```

**Pass:**
```python
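def validate_user_payload(payload: dict) -> UserDTO:
    try:
        # Building the DTO enforces the expected keys in one place
        return UserDTO(
            user_id=payload["user_id"],
            email=payload["email"],
        )
    except (KeyError, TypeError) as e:
        # Chain the original error so the traceback keeps the root cause
        raise ValidationError(f"Invalid shape: {e}") from e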
```

**Rationale:** Validators should enforce structure, not just presence.

---

### R006: typing.Any in Signatures (HIGH)

Function signatures in service files should not use `typing.Any`. Build a proper DTO or use narrow type protocols instead.

**Fail:**
```python
from typing import Any, Optional

def process(data: Any) -> Any:  # Bad: loses all type info
    pass

def fetch_config() -> Optional[Any]:  # Bad: Any defeats Optional
    return None
```

**Pass:**
```python
from typing import Optional, Protocol

class Readable(Protocol):
    def read(self) -> bytes:
        ...

def process(data: dict[str, str]) -> dict[str, int]:  # Properly typed
    pass

def fetch_config() -> Optional[ConfigDTO]:  # Specific type
    return None

def read_file(f: Readable) -> bytes:  # Protocol for file-like objects
    return f.read()
```

**Rationale:** `Any` defeats static type checking and IDE completion. It hides shape assumptions and makes refactoring dangerous. Use protocols for file-like or callback types; use DTOs for business shapes.
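
Catching nested uses such as `Optional[Any]` means searching inside each annotation, not only comparing its top-level name. A minimal sketch of that idea follows; `uses_any` and `functions_with_any` are hypothetical helpers, and the match is purely by name (an unrelated class called `Any` would also be flagged).

```python
import ast

def uses_any(annotation) -> bool:
    """True if `Any` or `<module>.Any` appears anywhere inside the annotation."""
    if annotation is None:
        return False
    for node in ast.walk(annotation):
        if isinstance(node, ast.Name) and node.id == "Any":
            return True
        if isinstance(node, ast.Attribute) and node.attr == "Any":
            return True
    return False

def functions_with_any(source: str) -> list:
    """Return names of functions whose parameters or return type mention Any."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = args.posonlyargs + args.args + args.kwonlyargs
            params += [a for a in (args.vararg, args.kwarg) if a is not None]
            annotations = [p.annotation for p in params] + [node.returns]
            if any(uses_any(a) for a in annotations):
                hits.append(node.name)
    return hits

SOURCE = '''
def process(data: Any) -> Any: ...
def fetch_config() -> Optional[Any]: ...
def read_file(f: Readable) -> bytes: ...
'''

print(functions_with_any(SOURCE))
```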

---

## PHI / Sensitive Data Handling (Pattern 1)

**Why R003 removed blanket `repr=False`:** The v0.2 canonical pivot intentionally moves away from blanket `repr=False` as a PHI masking mechanism. Instead, use **explicit `__repr__` overrides** on DTOs containing sensitive fields.

**Pattern 1: Explicit `__repr__` on Sensitive DTOs**

```python
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Patient:
    """Patient DTO with sensitive fields."""
    patient_id: str
    name: str
    ssn: str  # Sensitive
    date_of_birth: str  # Sensitive

    def __repr__(self) -> str:
        """Mask PHI fields in repr."""
        return f"Patient(patient_id={self.patient_id!r}, name=<redacted>, ssn=<redacted>, date_of_birth=<redacted>)"
```

When a Patient DTO is logged or printed, sensitive fields appear as `<redacted>` and only non-sensitive values are shown:
```python
>>> p = Patient(patient_id="P123", name="Alice", ssn="123-45-6789", date_of_birth="1990-01-01")
>>> print(p)
Patient(patient_id='P123', name=<redacted>, ssn=<redacted>, date_of_birth=<redacted>)
```

**Why explicit over blanket?**
- **Auditable:** Developers explicitly decide which fields are sensitive and how to mask them.
- **Flexible:** Different DTOs can have different masking strategies (redact, hash, truncate, etc.).
- **Future-proof:** External tools (e.g., AWS Comprehend Medical) can be layered on top for dynamic PHI detection.
- **Healthcare / HIPAA:** Combining explicit DTOs with selective `__repr__` overrides is a common privacy-by-design pattern in regulated systems.

---

## Suppressing Violations

Violations can be suppressed using `# noqa` comments. The linter recognizes:

- **`# noqa`** — Suppress all rules on this line
- **`# noqa: dto-strict`** — Suppress all dto-strict rules on this line
- **`# noqa: dto-strict-R001`** — Suppress rule R001 only
- **`# noqa: dto-strict-R001, dto-strict-R002`** — Suppress multiple rules

**Examples:**

```python
# Suppress a Dict[str, Any] violation on a specific function
def legacy_callback(config: Dict[str, Any]) -> None:  # noqa: dto-strict-R001
    """Old API we can't change."""
    pass

# Suppress all rules on a line
def process() -> dict:  # noqa
    return {}

# Suppress just R002 (inline dict literal) violation
error_response = {  # noqa: dto-strict-R002
    "status": "error",
    "code": 500,
    "message": "Internal server error",
}
```
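
The four forms above could be matched with a single regular expression. The following sketch shows one way to do it; `NOQA` and `is_suppressed` are hypothetical names, and ignoring codes that belong to other tools (for example `# noqa: E501`) is an assumption of this sketch rather than documented behavior.

```python
import re

# Matches "# noqa" with an optional ": code, code" list.
NOQA = re.compile(r"#\s*noqa(?::\s*(?P<codes>[\w\-, ]+))?", re.IGNORECASE)

def is_suppressed(line: str, rule_id: str) -> bool:
    """Decide whether `rule_id` (e.g. "R001") is suppressed on this source line."""
    match = NOQA.search(line)
    if match is None:
        return False
    codes = match.group("codes")
    if codes is None:
        return True  # bare "# noqa" suppresses everything
    names = {c.strip() for c in codes.split(",")}
    return "dto-strict" in names or f"dto-strict-{rule_id}" in names

print(is_suppressed("def process() -> dict:  # noqa", "R001"))
print(is_suppressed("x = {}  # noqa: dto-strict-R002", "R001"))
```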

---

## Output Formats

### Text (default)

```
app.py:10: R001 Dict[str, Any] in signature: process_user
service.py:20: R002 Inline dict literal with 4 keys
```

### GitHub Actions

```
::error file=app.py,line=10,col=5::R001 Dict[str, Any] in signature: process_user
::warning file=service.py,line=20,col=0::R002 Inline dict literal with 4 keys
```
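
For readers post-processing the JSON output themselves, the annotation lines above can be reproduced with a few lines of Python. This is a sketch under assumptions: the HIGH and MEDIUM mappings are inferred from the two sample lines, the LOW mapping to `notice` is a guess, and `to_github` is a hypothetical helper.

```python
# Severity to GitHub workflow-command level; the LOW entry is assumed.
LEVELS = {"HIGH": "error", "MEDIUM": "warning", "LOW": "notice"}

def to_github(v: dict) -> str:
    """Format one violation record as a GitHub Actions annotation line."""
    level = LEVELS[v["severity"]]
    return (
        f"::{level} file={v['file']},line={v['line']},col={v['col']}"
        f"::{v['rule_id']} {v['message']}"
    )

violation = {
    "rule_id": "R001",
    "severity": "HIGH",
    "file": "app.py",
    "line": 10,
    "col": 5,
    "message": "Dict[str, Any] in signature: process_user",
}

print(to_github(violation))
```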

### JSON

```json
[
  {
    "rule_id": "R001",
    "severity": "HIGH",
    "file": "app.py",
    "line": 10,
    "col": 5,
    "message": "Dict[str, Any] in signature: process_user"
  }
]
```

## Exit Codes

| Code | Meaning |
|------|---------|
| 0    | No violations |
| 1    | HIGH severity violations present |
| 2    | MEDIUM severity violations only |
| 3    | LOW severity violations only |
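
Read as "the most severe violation decides the code", the table can be expressed as the sketch below. That reading is an assumption (for example, MEDIUM plus LOW violations are taken to yield 2), and `exit_code` is a hypothetical helper, not the CLI's own function.

```python
EXIT_CODES = {"HIGH": 1, "MEDIUM": 2, "LOW": 3}

def exit_code(severities: list) -> int:
    """Pick the exit code for the most severe violation present (0 if none)."""
    for level in ("HIGH", "MEDIUM", "LOW"):
        if level in severities:
            return EXIT_CODES[level]
    return 0

print(exit_code([]))
print(exit_code(["LOW", "HIGH"]))
print(exit_code(["LOW"]))
```

A CI job that should fail only on HIGH findings could then compare the process exit status against 1 instead of treating every non-zero status as failure.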

## Configuration Reference

```toml
[tool.dto-strict]

# Paths to check for service-layer violations (R001, R002, R004, R006)
# Default: ["apps/*/services/*.py", "**/services/*.py"]
service_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]

# Paths to check for DTO definitions (R003)
# Default: ["**/dtos.py", "**/dtos/*.py"]
dto_paths = [
    "**/dtos.py",
    "**/dtos/*.py",
]

# Paths for R006 (typing.Any detection)
# Default: ["apps/*/services/*.py", "**/services/*.py"]
r006_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]

# Allowed exception tags for R004 (module-level facades)
# Default: ["facade — celery schedule", "FRAMEWORK"]
exception_tags = [
    "facade — celery schedule",
    "FRAMEWORK",
    "CUSTOM_TAG",
]

# (v0.2) Bare dict/list/tuple without type parameters flagged as violations
# Default: false
strict_collections = true

# (v0.2) Exception tags must include colon-delimited justification
# Default: false
exception_tag_requires_justification = true

# (v0.2) Maximum exception tags per file (null = unlimited)
# Default: null
max_exception_tags_per_file = 3

# (v0.2) R003 mode: "canonical" (v0.2 default) or "legacy" (v0.1)
# In canonical: repr=False is anti-canonical and flagged
# In legacy: frozen=True, slots=True, repr=False all required
# Default: "canonical"
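r003_mode = "canonical"

# (v0.2) In canonical mode, flag repr=False; set to false to check only frozen+slots
# Default: true
r003_strict_repr = true

# (v0.2) Minimum number of string keys before R002 flags a dict literal
# Default: 3
min_dict_keys = 3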

# Disable specific rules entirely
# Default: []
disabled_rules = ["R005"]

# Override severity for specific rules
# Valid values: "HIGH", "MEDIUM", "LOW"
# Default: {}
severity_overrides = { "R002" = "LOW" }
```

## Design Philosophy

**Pluggable, not opinionated.** Every rule is:

- **Configurable**: Path patterns, exception tags, severity levels.
- **Disable-able**: Set `disabled_rules = ["R001"]` to skip it entirely.
- **Framework-agnostic**: No Django/FastAPI/Flask assumptions; adapters for each framework are opt-in extras.

**Defaults bundled, not imposed.** The out-of-the-box defaults (path patterns, exception tags) target Django + DRF + Celery layouts, but each one can be customized for your stack.

## Development

```bash
git clone https://github.com/jekhator/dto-strict.git
cd dto-strict
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"  # quotes keep shells like zsh from globbing the brackets

# Run tests
pytest tests/ -v

# Run linter on itself
dto-strict src/ --format github
```

## License

Apache License 2.0. See LICENSE.

## Contributing

Issues and PRs welcome. Please include fixtures (good + bad examples) for new rules.

## See Also

- [pii-aware-mixin](https://github.com/jekhator/pii-aware-mixin) — Auto-hide PII in dataclass repr/logging.
- [logging-mixin](https://github.com/jekhator/logging-mixin) — Class-bound structured logging with correlation IDs.
