Metadata-Version: 2.2
Name: scim-sanity
Version: 0.5.5
Summary: Live SCIM server conformance testing and payload validation (RFC 7643/7644)
Author-email: Thomas Betz <thomas@seattlecodestudio.com>
License: MIT
Project-URL: Homepage, https://github.com/thomaselliottbetz/scim-sanity
Project-URL: Documentation, https://github.com/thomaselliottbetz/scim-sanity#readme
Project-URL: Repository, https://github.com/thomaselliottbetz/scim-sanity
Project-URL: Issues, https://github.com/thomaselliottbetz/scim-sanity/issues
Keywords: scim,scim2,validation,rfc7643,rfc7644,cli,identity,agent,agentic
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Systems Administration :: Authentication/Directory
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: jsonschema>=4.0.0; extra == "dev"

# scim-sanity

Find out exactly where your SCIM server deviates from RFC 7643/7644 — before client integrations fail in production. Also validates SCIM payloads statically before they reach a server. Supports User, Group, Agent, and AgenticApplication resources, including agentic identity types per `draft-abbey-scim-agent-extension-00`.

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/thomaselliottbetz/scim-sanity/main.svg)](https://results.pre-commit.ci/latest/github/thomaselliottbetz/scim-sanity/main)

## Features

**scim-sanity** is a **pragmatic, production-oriented SCIM conformance and interoperability harness**:
- **Server conformance probe** — Run a 7-phase CRUD lifecycle test against a live SCIM endpoint. Tests discovery, User/Group/Agent/AgenticApplication operations, search, pagination, and error handling.
- **Payload validation (linting)** — Static SCIM JSON analysis before sending data to a server. Catches missing required attributes, immutable field violations, null value misuse, and schema URN errors.
- **Agentic identity support** — Validates Agent and AgenticApplication resources per IETF `draft-abbey-scim-agent-extension-00`.
- **Strict and compat modes** — Strict mode (default) treats all spec deviations as failures. Compat mode downgrades known real-world deviations (e.g., `application/json` instead of `application/scim+json`) to warnings.
- **Behavioral, black-box testing** — Tests servers via real CRUD, search, and lifecycle flows against the failure modes that break real integrations.
- **Minimal dependencies** — Requires only Click. The `requests` library is auto-detected and used when available for richer HTTP handling, but is not required.

## Installation

```bash
pip install scim-sanity
```

Or from source:

```bash
git clone https://github.com/thomaselliottbetz/scim-sanity.git
cd scim-sanity
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

## Server Conformance Probe

Test a live SCIM server for RFC 7643/7644 conformance. The probe **creates, modifies, and deletes real resources** on the target server, then cleans up after itself.

⚠️ Warning: This tool performs destructive operations. Do not run against production tenants without explicit authorization.

```bash
# Basic probe with bearer token
scim-sanity probe https://example.com/scim/v2 --token <token> --i-accept-side-effects

# Basic auth
scim-sanity probe https://example.com/scim/v2 --username admin --password secret --i-accept-side-effects

# Compat mode (known deviations become warnings, not failures)
scim-sanity probe <url> --token <token> --compat --i-accept-side-effects

# JSON output for CI/CD
scim-sanity probe <url> --token <token> --json-output --i-accept-side-effects

# Test only a specific resource type
scim-sanity probe <url> --token <token> --resource Agent --i-accept-side-effects

# Self-signed certificates
scim-sanity probe <url> --token <token> --tls-no-verify --i-accept-side-effects

# Leave test resources on the server for inspection
scim-sanity probe <url> --token <token> --skip-cleanup --i-accept-side-effects

# Custom timeout and proxy
scim-sanity probe <url> --token <token> --timeout 60 --proxy http://proxy:8080 --i-accept-side-effects

# Custom CA bundle
scim-sanity probe <url> --token <token> --ca-bundle /path/to/ca-cert.pem --i-accept-side-effects
```

### Probe Options

| Option | Description |
|--------|-------------|
| `--token` | Bearer token for authentication |
| `--username` / `--password` | Basic auth credentials |
| `--i-accept-side-effects` | **Required.** Acknowledge that the probe creates/deletes resources |
| `--strict` / `--compat` | Strict (default) or compat validation mode |
| `--json-output` | Output results as JSON |
| `--resource` | Test a specific resource type (User, Group, Agent, AgenticApplication) |
| `--skip-cleanup` | Leave test resources on the server |
| `--tls-no-verify` | Skip TLS certificate verification |
| `--timeout` | Per-request timeout in seconds (default: 30) |
| `--proxy` | HTTP/HTTPS proxy URL |
| `--ca-bundle` | Path to custom CA certificate bundle |

### Safety Guardrails

The probe implements several safety measures to prevent accidental damage:

- **Explicit consent** — Refuses to run without `--i-accept-side-effects`.
- **Namespace isolation** — All test resources are prefixed with `scim-sanity-test-` to avoid collisions with real data.
- **Resource caps** — Hard limit of 10 agents in rapid lifecycle tests.
- **429 retry** — Automatically retries on 429 Too Many Requests, honoring `Retry-After` headers (max 3 retries).
- **500 transience detection** — When a POST returns 500, the probe retries once after a brief delay using the same request headers. If the retry succeeds, the result is recorded as a warning ("transient instability") and the CRUD lifecycle continues with the resource created by the retry. If both attempts fail, content-type rejection diagnosis runs before reporting the final failure.
- **Timeouts** — Per-request timeouts prevent hung runs.
- **Cleanup** — Deletes all created test resources in reverse order (groups before users). Skippable with `--skip-cleanup`.
- **Failure semantics** — If the process is interrupted, partial cleanup may occur; orphaned test resources are possible and should be removed manually.
- **Secret redaction** — Authorization headers are redacted in any JSON output or logs.
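
The 429 handling described above can be sketched as follows. This is a minimal illustration, not the probe's actual code: `send_with_429_retry`, the `send` callable, and `default_delay` are hypothetical names, and the sketch assumes `Retry-After` arrives as a number of seconds under exactly that header casing.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def send_with_429_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on HTTP 429, wait and retry up to max_retries times."""
    status, headers = send()
    for _ in range(max_retries):
        if status != 429:
            break
        # Retry-After may be delay-seconds or an HTTP date. This sketch only
        # handles the numeric form and falls back to default_delay otherwise.
        raw = headers.get("Retry-After", "")
        delay = float(raw) if raw.strip().isdigit() else default_delay
        sleep(delay)
        status, headers = send()
    return status, headers
```

Passing `sleep` as a parameter keeps the function testable without real delays. If the server still answers 429 after the last retry, the 429 response is returned to the caller unchanged.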

### Test Sequence

The probe runs 7 phases. Each phase tests specific RFC clauses against real HTTP traffic — no mocking.

1. **Discovery** (RFC 7644 §4)
   - GET `/ServiceProviderConfig`, `/Schemas`, `/ResourceTypes`
   - Asserts: HTTP 200, `Content-Type: application/scim+json`, parseable JSON body
   - A server that omits these endpoints forces clients to hardcode assumptions about server capabilities

2. **User CRUD Lifecycle** (RFC 7644 §3.3, §3.4.1, §3.5.1, §3.5.2, §3.6; RFC 7643 §4.1)
   - POST → asserts 201, `Content-Type: application/scim+json`, `Location` header, `id`, `meta.created`, `meta.lastModified`
   - GET by id → asserts 200, same Content-Type and meta fields
   - PUT → asserts 200, same Content-Type and meta fields
   - GET after PUT → asserts the updated field value persisted
   - PATCH `active=false` → asserts 200 or 204
   - GET after PATCH → asserts `active` is `false`
   - DELETE → asserts 204 No Content (RFC 7644 §3.6)
   - GET after DELETE → asserts 404

3. **Group CRUD Lifecycle** (RFC 7644 §3.3; RFC 7643 §4.2)
   - Same sequence as User
   - Additional PATCH: add a member, then remove all members — asserts 200 each

4. **Agent CRUD Lifecycle** (draft-abbey-scim-agent-extension-00)
   - Same sequence as User
   - Skipped if server does not advertise Agent support in `/ResourceTypes`
   - **Agent Rapid Lifecycle** — create and immediately delete multiple agents (default 10) to test ephemeral provisioning at machine speed

5. **AgenticApplication CRUD Lifecycle** (draft-abbey-scim-agent-extension-00)
   - Same sequence as User
   - Skipped if server does not advertise AgenticApplication support

6. **Search** (RFC 7644 §3.4.2, §8.1)
   - GET `/Users` → asserts ListResponse envelope (`schemas`, `totalResults`, `Resources`), `Content-Type: application/scim+json`
   - GET `/Users?filter=...` → asserts 200 (or 400 if partial filter support)
   - GET `/Users?startIndex=1&count=1` → asserts pagination parameters honored
   - GET `/Users?count=0` → asserts `totalResults` present with empty `Resources`

7. **Error Handling** (RFC 7644 §3.12)
   - GET nonexistent resource → asserts 404 with SCIM error schema (`schemas`, `status`)
   - POST invalid JSON body → asserts 400 with SCIM error schema
   - POST missing required field (`userName`) → asserts 400 with SCIM error schema
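
To make the assertions in the phases above concrete, here is a rough sketch of three of them written as plain functions that return a list of problems. The function names are hypothetical and this is not the probe's implementation. It assumes headers arrive as a mapping with canonical casing (`Content-Type`, `Location`) and bodies as already parsed JSON objects.

```python
from typing import Any, List, Mapping

SCIM_MEDIA_TYPE = "application/scim+json"
LIST_RESPONSE_URN = "urn:ietf:params:scim:api:messages:2.0:ListResponse"
ERROR_URN = "urn:ietf:params:scim:api:messages:2.0:Error"


def check_create(status: int, headers: Mapping[str, str],
                 body: Mapping[str, Any]) -> List[str]:
    """Phase 2 style assertions on a POST response."""
    problems = []
    if status != 201:
        problems.append(f"expected 201, got {status}")
    # Compare only the media type; parameters such as charset are ignored.
    media_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if media_type != SCIM_MEDIA_TYPE:
        problems.append(f"Content-Type should be {SCIM_MEDIA_TYPE}, got {media_type!r}")
    if "Location" not in headers:
        problems.append("missing Location header")
    if "id" not in body:
        problems.append("missing id")
    meta = body.get("meta") or {}
    for field in ("created", "lastModified"):
        if field not in meta:
            problems.append(f"missing meta.{field}")
    return problems


def check_list_response(body: Mapping[str, Any]) -> List[str]:
    """Phase 6 style assertions on the ListResponse envelope."""
    problems = []
    if LIST_RESPONSE_URN not in body.get("schemas", []):
        problems.append("schemas lacks the ListResponse URN")
    total = body.get("totalResults")
    if not isinstance(total, int):
        problems.append("totalResults missing or not an integer")
    # RFC 7644 only requires Resources when totalResults is non-zero.
    elif total > 0 and not isinstance(body.get("Resources"), list):
        problems.append("Resources missing or not an array")
    return problems


def check_error(status: int, expected: int, body: Mapping[str, Any]) -> List[str]:
    """Phase 7 style assertions on an error response."""
    problems = []
    if status != expected:
        problems.append(f"expected {expected}, got {status}")
    if ERROR_URN not in body.get("schemas", []):
        problems.append("schemas lacks the Error URN")
    if "status" not in body:
        problems.append("missing status")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single response report every deviation it contains.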

### Strict vs Compat Mode

**Strict mode** (`--strict`, default) treats all RFC deviations as failures.

**Compat mode** (`--compat`) applies a curated **Deviation Policy**: known, widespread ecosystem deviations are downgraded to warnings instead of failures. This list is intentional and versioned.
Current compat warnings include:
- `application/json` instead of `application/scim+json`
- DELETE 204 with response body
- Location header mismatch with `meta.location`
- Missing error schema in error responses
- ETag/meta.version mismatch

Warnings appear in output but don't cause a non-zero exit code.

**Always failures (not compat-eligible):** Some deviations are reported as `FAIL` in both strict and compat mode because they fundamentally break RFC-compliant clients:
- Server rejects `Content-Type: application/scim+json` requests (e.g., with 500) but accepts `application/json` — diagnosed automatically and cited against RFC 7644 §8.2.
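
The policy above amounts to a small lookup: a deviation is downgraded only when compat mode is on and the deviation is on the curated list. The sketch below illustrates that logic under stated assumptions. The `Deviation` enum, `classify`, and `exit_code` are hypothetical names chosen for this example, not the tool's internal API.

```python
from enum import Enum
from typing import Iterable


class Deviation(Enum):
    JSON_CONTENT_TYPE = "application/json instead of application/scim+json"
    DELETE_BODY = "DELETE 204 with response body"
    LOCATION_MISMATCH = "Location header mismatch with meta.location"
    MISSING_ERROR_SCHEMA = "Missing error schema in error responses"
    ETAG_MISMATCH = "ETag/meta.version mismatch"
    # Listed in the README as a failure in both modes.
    REJECTS_SCIM_JSON = "Server rejects application/scim+json requests"


# The curated, compat-eligible deviations from the list above.
COMPAT_ELIGIBLE = frozenset({
    Deviation.JSON_CONTENT_TYPE,
    Deviation.DELETE_BODY,
    Deviation.LOCATION_MISMATCH,
    Deviation.MISSING_ERROR_SCHEMA,
    Deviation.ETAG_MISMATCH,
})


def classify(deviation: Deviation, compat: bool = False) -> str:
    """Return 'warning' only for curated deviations in compat mode."""
    if compat and deviation in COMPAT_ELIGIBLE:
        return "warning"
    return "fail"


def exit_code(outcomes: Iterable[str]) -> int:
    """Warnings alone do not produce a non-zero exit code."""
    return 1 if "fail" in outcomes else 0
```

Keeping the eligible set explicit means any deviation not on the list fails in both modes by default.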

**Error response reporting:** When a server returns a 4xx or 5xx status for a resource endpoint, only the unexpected status code is reported. Predictable side effects (missing `id`, `meta`, `schemas` in the error body) are suppressed so that cascade noise does not obscure the root cause.

#### Real-World Server Behavior

Enterprise SCIM servers often exhibit:

- **Rate limiting** (429 + Retry-After)
- **Eventual consistency** (a GET immediately after PUT may briefly return stale data)
- **Partial filter support** or restricted query capabilities

scim-sanity accounts for this by retrying on 429 (honoring `Retry-After`), accepting 400 where filter support is partial, probing boundary cases such as `count=0`, and clearly reporting unsupported or nonconformant behavior.

### Fix Summary

When failures are present, the probe appends a prioritised **Fix Summary** after the results. Each entry has three lines:

```
  [P1] Trouble: Wrong Content-Type on SCIM responses (12 tests affected)
       Fix: Set Content-Type: application/scim+json on all responses served from /scim/v2/
       Rationale: Compliant clients inspect Content-Type before parsing — every response
                  is rejected regardless of whether the body is otherwise correct.
```

Issues are ordered by severity (P1 most critical). The fix summary is omitted when all tests pass. In JSON output mode, the same information is available as an `issues` array (see below).

### JSON Output (Stable Interface)

```bash
scim-sanity probe <url> --token <token> --json-output --i-accept-side-effects
```

```json
{
  "scim_sanity_version": "0.5.5",
  "mode": "strict",
  "timestamp": "2026-02-24 09:15:00",
  "summary": {
    "total": 32,
    "passed": 14,
    "failed": 15,
    "warnings": 0,
    "skipped": 3,
    "errors": 0
  },
  "issues": [
    {
      "priority": "P1",
      "title": "Wrong Content-Type on SCIM responses",
      "rationale": "Compliant clients inspect Content-Type before parsing — every response is rejected regardless of whether the body is otherwise correct.",
      "fix": "Set Content-Type: application/scim+json on all responses served from /scim/v2/",
      "affected_tests": 12
    }
  ],
  "results": [
    {"name": "GET /ServiceProviderConfig", "status": "fail", "message": "Content-Type should be application/scim+json, got 'text/html; charset=utf-8'", "phase": "Phase 1 — Discovery"}
  ]
}
```

The JSON schema is treated as a public interface and is stable within major versions.
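
As an illustration of consuming this report in a pipeline, the sketch below parses a report string and derives a gate decision plus one line per issue. It relies only on the fields shown in the sample above. The function name `gate` is hypothetical, and the sketch assumes priorities always have the form `P<number>`.

```python
import json
from typing import List, Tuple


def gate(raw_report: str) -> Tuple[int, List[str]]:
    """Return (exit_code, issue_lines) for a probe JSON report."""
    report = json.loads(raw_report)
    summary = report["summary"]
    # Assumes priorities look like "P1", "P2", ... as in the sample output.
    issues = sorted(report.get("issues", []),
                    key=lambda issue: int(issue["priority"][1:]))
    lines = [
        f"[{i['priority']}] {i['title']} ({i['affected_tests']} tests affected)"
        for i in issues
    ]
    # Treat both failed tests and errored tests as blocking.
    blocking = summary["failed"] + summary["errors"]
    return (1 if blocking else 0), lines
```

In practice `raw_report` would be the captured output of a `--json-output` run. Whether errored tests should block a merge is a policy choice. This sketch treats them the same as failures.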

## Payload Validation (Linting)

Statically validate (lint) SCIM resource payloads and PATCH operations before sending them to a server. Resource type is auto-detected from schema URNs. This is a spec-driven validator with linter-style ergonomics: fast, offline, and suitable for CI/CD gating.

```bash
# Validate a resource file
scim-sanity user.json

# Validate a PATCH operation
scim-sanity --patch patch.json

# Validate from stdin
echo '{"schemas":["urn:ietf:params:scim:schemas:core:2.0:User"],"userName":"user@example.com"}' | scim-sanity --stdin

# Use in CI/CD pipelines
scim-sanity payload.json || exit 1
```

### Validation Rules

**Required attributes:**
- User: `userName`
- Group: `displayName`
- Agent: `name`
- AgenticApplication: `name`

**What it checks:**
- Schema URN validity and presence
- Required attributes per resource type
- Immutable attributes (`id`, `meta`) not set by client
- Null values (use PATCH `remove` instead)
- PATCH operation structure (`op`, `path`, `value` correctness)
- Complex and multi-valued attribute structure
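
A few of these rules are simple enough to sketch in a handful of lines. The following is an illustrative re-implementation, not the linter's own code: `lint_resource` and the lookup tables are hypothetical, only the three schema URNs that appear in this README are covered, and null detection is limited to top-level attributes.

```python
from typing import Any, List, Mapping

# Required attribute per core schema URN (only URNs shown in this README).
REQUIRED = {
    "urn:ietf:params:scim:schemas:core:2.0:User": "userName",
    "urn:ietf:params:scim:schemas:core:2.0:Group": "displayName",
    "urn:ietf:params:scim:schemas:core:2.0:Agent": "name",
}
READ_ONLY = ("id", "meta")


def lint_resource(payload: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable errors for a SCIM resource payload."""
    schemas = payload.get("schemas")
    if not isinstance(schemas, list) or not schemas:
        return ["Missing or empty 'schemas'"]
    errors = []
    known = [urn for urn in schemas if urn in REQUIRED]
    if not known:
        errors.append("No recognized core schema URN in 'schemas'")
    for urn in known:
        attr = REQUIRED[urn]
        if attr not in payload:
            errors.append(f"Missing required attribute: {attr!r} (schema: {urn})")
    for attr in READ_ONLY:
        if attr in payload:
            errors.append(f"Attribute {attr!r} should not be set by client")
    # Only top-level nulls are detected in this sketch.
    for key, value in payload.items():
        if value is None:
            errors.append(f"Null value for {key!r}; use PATCH remove instead")
    return errors
```

The resource type is inferred from `schemas`, mirroring the auto-detection described above. A full validator would also walk nested and multi-valued attributes.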

### Exit Codes

- `0` — Validation passed (or all probe tests passed)
- `1` — Validation failed, probe failures detected, or error

## Payload Examples

### What the linter catches

Given a payload with a missing required field and a client-set immutable attribute:

```json
{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "id": "123",
  "name": {"givenName": "John"}
}
```

```
Found 3 error(s):

❌ Missing required attribute: 'userName' (schema: urn:ietf:params:scim:schemas:core:2.0:User) at userName
❌ User resource missing required attribute: 'userName'
❌ Immutable attribute 'id' should not be set by client (mutability: readOnly) at id
```

### Minimal valid examples

**User**
```json
{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "userName": "john.doe@example.com"
}
```

**Group**
```json
{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
  "displayName": "Engineering Team"
}
```

**Agent**
```json
{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Agent"],
  "name": "automation-agent"
}
```

**PATCH operation**
```json
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [{"op": "replace", "path": "displayName", "value": "New Name"}]
}
```
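
For PATCH payloads like the one above, the structural rules from RFC 7644 §3.5.2 can be sketched as below. This is a simplified illustration, not the linter's actual logic: `lint_patch` is a hypothetical name, `op` values are compared exactly against the lowercase names, and path syntax is not parsed.

```python
from typing import Any, List, Mapping

PATCH_URN = "urn:ietf:params:scim:api:messages:2.0:PatchOp"
VALID_OPS = {"add", "remove", "replace"}


def lint_patch(payload: Mapping[str, Any]) -> List[str]:
    """Return structural errors for a SCIM PatchOp payload."""
    errors = []
    if PATCH_URN not in payload.get("schemas", []):
        errors.append("'schemas' must contain the PatchOp URN")
    operations = payload.get("Operations")
    if not isinstance(operations, list) or not operations:
        errors.append("'Operations' must be a non-empty array")
        return errors
    for index, op in enumerate(operations):
        if not isinstance(op, dict):
            errors.append(f"Operations[{index}] must be an object")
            continue
        name = op.get("op")
        # Exact comparison; accepting other casings is a policy choice.
        if name not in VALID_OPS:
            errors.append(f"Operations[{index}]: invalid op {name!r}")
            continue
        if name == "remove" and not op.get("path"):
            errors.append(f"Operations[{index}]: 'remove' requires 'path'")
        if name in ("add", "replace") and "value" not in op:
            errors.append(f"Operations[{index}]: {name!r} requires 'value'")
    return errors
```

Applied to the example payload above, this sketch returns an empty list. A `remove` operation without a `path` yields a single error.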

## Pre-commit Integration

```yaml
repos:
  - repo: local
    hooks:
      - id: scim-sanity
        name: Validate SCIM resources
        entry: python -m scim_sanity
        language: system
        types: [json]
        exclude: |
          (?x)^(
            .*/node_modules/.*|
            .*/\.venv/.*|
            .*/venv/.*|
            .*package\.json$|
            .*package-lock\.json$|
            .*tsconfig.*\.json$|
            .*jsconfig\.json$
          )$
        pass_filenames: true
        stages: [pre-commit]  # named `commit` before pre-commit 3.2.0
```

## Ansible Integration

An action plugin validates SCIM payloads from within Ansible playbooks. See [ansible/README.md](ansible/README.md) for details.

```yaml
- name: Validate SCIM payload
  scim_validate:
    payload: "{{ user_payload }}"
    operation: full
  register: validation_result
```

## Identity Provider Guides

- [Microsoft Entra ID Integration](docs/integrations/entra-id.md)
- [Google Workspace Integration](docs/integrations/google-workspace.md)

## Security and Compliance

- [Security and Compliance Guide](docs/security/compliance.md)

## Development

```bash
git clone https://github.com/thomaselliottbetz/scim-sanity.git
cd scim-sanity
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
pytest -v
```

## Planned Improvements

**PATCH filter expression testing** (RFC 7644 §3.5.2) — The probe currently tests simple PATCH paths (`active`, `members`). Complex filter-based paths such as `emails[type eq "work"].value` are a known interop pain point and are not yet covered.

**Phase 1 schema content validation** — Discovery endpoint tests currently verify HTTP 200 and correct Content-Type but do not validate that the returned schema bodies are well-formed or consistent with the resources the server actually implements.

**Phase 6 resource body validation** — The search phase validates the ListResponse envelope structure but does not inspect individual resources within the `Resources` array. A server returning well-formed envelopes with non-conformant resource bodies would currently pass.

**GitHub Action** — A ready-to-use GitHub Action for running the probe or linter in CI/CD pipelines without requiring a local Python environment.

**Docker image** — A zero-setup container image for running the probe against any reachable SCIM endpoint without installing Python or pip.

## Contributing

Contributions are welcome via pull request.

## License

MIT License. See the [LICENSE](LICENSE) file.

## References

- [RFC 7643 - SCIM: Core Schema](https://tools.ietf.org/html/rfc7643)
- [RFC 7644 - SCIM: Protocol](https://tools.ietf.org/html/rfc7644)
- [draft-abbey-scim-agent-extension-00](https://datatracker.ietf.org/doc/draft-abbey-scim-agent-extension/)
