Metadata-Version: 2.4
Name: license-compliance-checker
Version: 1.0.0
Summary: License Compliance Checker (LCC) - Enterprise-grade open source compliance tool.
Author: Ajay Pundhir
License: Apache-2.0
Project-URL: Homepage, https://github.com/apundhir/license-compliance-checker
Project-URL: Repository, https://github.com/apundhir/license-compliance-checker
Project-URL: Documentation, https://github.com/apundhir/license-compliance-checker#readme
Project-URL: Bug Tracker, https://github.com/apundhir/license-compliance-checker/issues
Project-URL: Changelog, https://github.com/apundhir/license-compliance-checker/releases
Keywords: license,compliance,open-source,sbom,spdx,cyclonedx,software-composition-analysis
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Build Tools
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: rich>=13.7.0
Requires-Dist: requests>=2.31.0
Requires-Dist: cachetools>=5.3.0
Requires-Dist: platformdirs>=4.2.0
Requires-Dist: packaging>=23.2
Requires-Dist: redis>=5.0.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: uvicorn>=0.30.0
Requires-Dist: python-jose[cryptography]>=3.3.0
Requires-Dist: passlib[argon2]>=1.7.4
Requires-Dist: argon2-cffi>=21.0.0
Requires-Dist: python-multipart>=0.0.6
Requires-Dist: slowapi>=0.1.9
Requires-Dist: pydantic[email]>=2.0.0
Requires-Dist: cyclonedx-python-lib>=5.0.0
Requires-Dist: spdx-tools>=0.8.0
Requires-Dist: packageurl-python>=0.11.0
Requires-Dist: huggingface-hub>=0.19.0
Requires-Dist: python-gnupg>=0.5.0
Requires-Dist: sqlalchemy[asyncio]>=2.0.0
Requires-Dist: alembic>=1.13.0
Requires-Dist: asyncpg>=0.29.0
Requires-Dist: arq>=0.25.0
Requires-Dist: gitpython>=3.1.0
Requires-Dist: openai>=1.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: aiosqlite>=0.19.0
Provides-Extra: test
Requires-Dist: pytest>=7.4; extra == "test"
Requires-Dist: pytest-cov>=4.1; extra == "test"
Requires-Dist: responses>=0.25; extra == "test"
Requires-Dist: freezegun>=1.4.0; extra == "test"
Requires-Dist: fakeredis>=2.21.3; extra == "test"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "test"
Requires-Dist: aiosqlite>=0.17.0; extra == "test"
Dynamic: license-file

# License Compliance Checker (LCC)

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.11+-blue.svg)](pyproject.toml)
[![CI](https://github.com/apundhir/license-compliance-checker/actions/workflows/ci.yml/badge.svg)](https://github.com/apundhir/license-compliance-checker/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/license-compliance-checker.svg)](https://pypi.org/project/license-compliance-checker/)
[![VS Code](https://img.shields.io/visual-studio-marketplace/v/lcc.license-compliance-checker.svg)](https://marketplace.visualstudio.com/items?itemName=lcc.license-compliance-checker)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)

**Know what you ship. Know what you owe.**

AI-native license compliance for the regulatory era. LCC is the only open-source scanner that combines dependency license detection, AI model license analysis, and EU AI Act Article 53 compliance -- in a single tool.

---

## Why LCC?

- **The only open-source scanner with AI model license detection + EU AI Act compliance.** Detects HuggingFace model licenses, RAIL restrictions, training data provenance, and maps findings to Article 53 obligations automatically.
- **Free alternative to FOSSA ($50K+/yr) and Black Duck ($30K+/yr).** Full transitive dependency resolution across 8 ecosystems, SBOM generation, and policy-as-code -- with no per-seat fees.
- **Detects GPL contamination, AGPL in SaaS, and license conflicts automatically.** Plain-English explanations tell you exactly what is wrong and how to fix it, before legal does.

---

## Quick Start

### Install from PyPI

```bash
pip install license-compliance-checker

# Scan a project
lcc scan .

# Scan with EU AI Act compliance policy
lcc scan . --policy eu-ai-act-compliance

# Generate an SBOM
lcc sbom generate --input scan-report.json --format cyclonedx --output sbom.json

# Check license compatibility
lcc scan . --project-license Apache-2.0 --context saas
```

### Install with Docker

```bash
git clone https://github.com/apundhir/license-compliance-checker.git
cd license-compliance-checker

export LCC_SECRET_KEY=$(python -c "import secrets; print(secrets.token_hex(32))")
export POSTGRES_PASSWORD=$(python -c "import secrets; print(secrets.token_hex(16))")

docker-compose -f docker-compose.prod.yml up -d --build
# Dashboard: http://localhost:3000  |  API: http://localhost:8000
```

---

## How to Use

### 1. Scan a Local Project

```bash
# Basic scan — detects all dependencies and their licenses
lcc scan /path/to/your/project

# Scan with JSON output for CI/CD
lcc scan . --format json --output report.json

# Scan specific manifest files only
lcc scan . --format json --manifest requirements.txt --manifest package.json
```
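
For scripted use, the JSON scan above can be driven from Python. The sketch below is a thin wrapper and not part of LCC: `build_scan_command` and `run_scan` are hypothetical helper names, it assumes `lcc` is on your `PATH`, and it uses only the `--format`, `--output`, and `--policy` flags shown in this README.

```python
import subprocess


def build_scan_command(project, output, policy=None):
    """Build the argv for `lcc scan` with JSON output (flags as shown in this README)."""
    cmd = ["lcc", "scan", str(project), "--format", "json", "--output", str(output)]
    if policy:
        cmd += ["--policy", policy]
    return cmd


def run_scan(project, output="report.json", policy=None):
    """Run the scan and return the process exit code.

    Assumption: a non-zero exit code signals a failed or non-compliant scan;
    check the CLI help for the exact meaning of each code.
    """
    result = subprocess.run(build_scan_command(project, output, policy))
    return result.returncode
```

Building the argument list separately from running it keeps the command easy to log and to unit-test without the CLI installed.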

### 2. Enforce Compliance Policies

```bash
# Block builds on any license violation
lcc scan . --policy permissive

# EU AI Act compliance for AI projects
lcc scan . --policy eu-ai-act-compliance

# NIST AI Risk Management Framework
lcc scan . --policy nist-ai-rmf --context high_risk
```

### 3. Detect License Conflicts

```bash
# Check if dependencies are compatible with your project license
lcc scan . --check-compatibility --project-license Apache-2.0

# SaaS context — detects AGPL/SSPL issues
lcc scan . --check-compatibility --project-license MIT --context saas
```

### 4. Generate SBOMs and Reports

```bash
# CycloneDX SBOM (with regulatory metadata for AI components)
lcc sbom generate --input report.json --format cyclonedx --output sbom.json

# SPDX SBOM
lcc sbom generate --input report.json --format spdx --output sbom.spdx.json

# HTML report for stakeholders
lcc report --input report.json --format html --output compliance-report.html

# Sign SBOM for supply chain attestation
lcc sbom sign --input sbom.json --key private.gpg --output sbom.json.sig
```

### 5. Run the Full Stack (Dashboard + API)

```bash
export LCC_SECRET_KEY=$(python -c "import secrets; print(secrets.token_hex(32))")
export POSTGRES_PASSWORD=$(python -c "import secrets; print(secrets.token_hex(16))")

docker-compose -f docker-compose.prod.yml up -d --build

# Dashboard: http://localhost:3000
# API docs:  http://localhost:8000/docs
```

### 6. Integrate into CI/CD

```yaml
# .github/workflows/compliance.yml
- uses: apundhir/license-compliance-checker/.github/actions/license-compliance@v1
  with:
    policy: 'permissive'
    fail-on: 'violations'
```

### 7. Use the VS Code Extension

Install the extension, then open any project with manifest files. Violations appear as red underlines on save. The status bar shows overall compliance status.

---

## Key Features

### AI Model & Dataset Scanning

LCC provides first-class support for AI/ML components, which traditional software composition analysis (SCA) tools typically do not cover.

- **HuggingFace model card parsing** -- extracts training data sources, known limitations, evaluation metrics, environmental impact, and use restrictions from model cards and dataset cards.
- **RAIL restriction display in plain English** -- translates OpenRAIL, Llama, and other AI-specific license restrictions into clear, actionable language (e.g., "No harm", "User threshold applies", "Attribution required").
- **EU AI Act compliance assessment** -- automatically evaluates AI models against Article 53 GPAI obligations and assigns risk classifications (Minimal, Limited, High, GPAI, GPAI Systemic, Prohibited).

### License Compatibility Engine

Goes beyond detection to find conflicts across your entire dependency tree.

- **GPL contamination detection** -- flags strong copyleft licenses (GPL-2.0, GPL-3.0) in permissively licensed projects with plain-English explanations.
- **AGPL-in-SaaS detection** -- identifies AGPL-3.0 dependencies that require full source disclosure in SaaS deployments.
- **Pairwise conflict analysis** -- detects known incompatible license combinations (e.g., GPL-2.0 + Apache-2.0, GPL + BSD-4-Clause).
- **Copyleft version conflicts** -- flags GPL-2.0-only vs GPL-3.0 combinations that prevent legal distribution.
- **Weak copyleft boundary warnings** -- advises on LGPL dynamic linking, MPL file-level copyleft, and EPL patent obligations.
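
To make the pairwise analysis concrete, here is a minimal sketch of the idea. It is not LCC's implementation. `KNOWN_CONFLICTS` and `find_conflicts` are hypothetical names, and the table holds only the example pairs named above, with "GPL + BSD-4-Clause" expanded to GPL-2.0 and GPL-3.0 as an assumption.

```python
from itertools import combinations

# Illustrative table only: the example pairs from the bullet list above.
# A real engine needs a much larger, legally reviewed matrix.
KNOWN_CONFLICTS = {
    frozenset({"GPL-2.0", "Apache-2.0"}),
    frozenset({"GPL-2.0", "BSD-4-Clause"}),
    frozenset({"GPL-3.0", "BSD-4-Clause"}),
}


def find_conflicts(component_licenses):
    """Return (component_a, component_b, license_a, license_b) for each conflicting pair."""
    conflicts = []
    # Sort by component name so the result order is deterministic.
    pairs = combinations(sorted(component_licenses.items()), 2)
    for (name_a, lic_a), (name_b, lic_b) in pairs:
        if frozenset({lic_a, lic_b}) in KNOWN_CONFLICTS:
            conflicts.append((name_a, name_b, lic_a, lic_b))
    return conflicts


# Hypothetical component names, for illustration only.
deps = {"libfoo": "GPL-2.0", "libbar": "Apache-2.0", "libbaz": "MIT"}
print(find_conflicts(deps))
```

Storing each pair as a `frozenset` makes the lookup order-independent, so GPL-2.0 + Apache-2.0 and Apache-2.0 + GPL-2.0 match the same entry.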

### Transitive Dependency Resolution

Resolves the full dependency tree with depth tracking, not just direct dependencies.

- **8 ecosystems** -- Python (pip, Poetry, Conda), JavaScript (npm, Yarn, pnpm), Go, Java (Maven), Kotlin/Groovy (Gradle), Rust (Cargo), Ruby (Bundler), and .NET (NuGet).
- **Depth metadata** -- every component is tagged with its depth in the dependency tree (depth 0 = direct, depth 1+ = transitive).
- **Policy awareness** -- policies can distinguish between direct and transitive dependencies, allowing different rules for each.
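
Depth tagging can be pictured as a breadth-first walk of the dependency graph. The sketch below illustrates the concept under stated assumptions and is not LCC's resolver: `tag_depths` is a hypothetical name, and the graph is a small hand-written example rather than resolved package metadata.

```python
from collections import deque


def tag_depths(graph, direct):
    """Breadth-first walk that records the shallowest depth of each package.

    graph:  package name -> list of packages it depends on
    direct: packages declared in the manifest (depth 0)
    """
    depths = {name: 0 for name in direct}
    queue = deque(direct)
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in depths:  # first visit is the shallowest in a BFS
                depths[dep] = depths[pkg] + 1
                queue.append(dep)
    return depths


# Toy graph for illustration only.
graph = {"requests": ["urllib3", "idna"], "urllib3": [], "rich": ["pygments"]}
depths = tag_depths(graph, ["requests", "rich"])

# A policy can then treat direct and transitive dependencies differently.
transitive = sorted(name for name, depth in depths.items() if depth >= 1)
```

Because a package is tagged only on its first visit, cycles in the graph terminate and every package keeps its shallowest depth.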

### Policy-as-Code

Enforce compliance rules automatically using OPA or built-in policy templates.

- **Built-in regulatory templates** -- `eu-ai-act-compliance`, `nist-ai-rmf`, `permissive`, `strict`, and more.
- **OPA integration** -- write custom policies in Rego for fine-grained control.
- **Policy testing framework** -- validate policies before deploying them to CI.

### SBOM Generation

Produce audit-ready Software Bill of Materials with regulatory extensions.

- **CycloneDX and SPDX** -- industry-standard SBOM formats with full component metadata.
- **GPG signing and validation** -- sign SBOMs for tamper-evident compliance records.
- **Regulatory extensions** -- `lcc:regulatory:*` properties embed EU AI Act risk classification, copyright compliance status, and training data provenance directly in the SBOM.
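
CycloneDX components carry custom metadata as a list of `name`/`value` properties, which is where the `lcc:regulatory:*` entries live. The sketch below shows the general shape only. The keys after the `lcc:regulatory:` prefix, the component name, and the `regulatory_properties` helper are assumptions for illustration, so inspect a generated SBOM for the names LCC actually emits.

```python
import json


def regulatory_properties(fields):
    """Turn a dict of regulatory fields into CycloneDX name/value properties."""
    return [
        {"name": f"lcc:regulatory:{key}", "value": str(value)}
        for key, value in sorted(fields.items())
    ]


component = {
    "type": "machine-learning-model",  # CycloneDX component type for ML models
    "name": "example-model",           # hypothetical component
    "properties": regulatory_properties(
        {
            # Hypothetical keys; only the lcc:regulatory: prefix comes from this README.
            "risk-classification": "GPAI",
            "copyright-compliance": "unverified",
        }
    ),
}

print(json.dumps(component, indent=2))
```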

### Web Dashboard

Explore scan results, manage policies, and track compliance visually.

- **AI Model tab** -- dedicated view for AI models with RAIL restriction panels, license details, and training data summaries.
- **EU AI Act compliance page** -- per-component obligation status, risk badges, and compliance pack export.
- **Dependency depth view** -- depth badges, transitive dependency filtering, and parent component tooltips.

### VS Code Extension

Shift-left compliance -- catch violations before you commit.

- **Scan on save** -- automatically scans manifest files with a 300 ms debounce.
- **Inline diagnostics** -- violations appear as red underlines, warnings as yellow, directly in your editor.
- **Status bar indicator** -- shield icon shows violation count (red) or "Compliant" (green).
- **Commands** -- `LCC: Scan Workspace` and `LCC: Scan Current File` from the Command Palette.
- **Multi-root workspace support** -- scans every root when triggered.

### CI/CD Integration

Block non-compliant code from merging with the LCC GitHub Action.

- **fail-on** -- fail the build on `violations`, `warnings`, or `none`.
- **ecosystems** -- scope scans to specific languages (e.g., `python,node,go`).
- **sbom-format** -- generate CycloneDX or SPDX SBOMs as build artifacts.
- **Policy enforcement** -- apply any built-in or custom policy in CI.
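
The `fail-on` setting amounts to a small decision rule. The sketch below is one plausible reading rather than the action's actual code: `should_fail` is a hypothetical name, and it assumes that `warnings` also fails the build when violations are present.

```python
def should_fail(fail_on, violations, warnings):
    """Decide whether a build should fail for a given fail-on mode.

    Assumption: "warnings" is the stricter mode and also fails on violations.
    """
    if fail_on == "none":
        return False
    if fail_on == "violations":
        return violations > 0
    if fail_on == "warnings":
        return violations > 0 or warnings > 0
    raise ValueError(f"unknown fail-on value: {fail_on!r}")
```

For example, `should_fail("violations", violations=0, warnings=3)` returns `False`, while the same counts under `"warnings"` return `True`.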

---

## EU AI Act Compliance

LCC is the first open-source tool to automate EU AI Act Article 53 compliance checks for general-purpose AI (GPAI) models. The Act is a major regulatory change for AI deployments in the EU, and many organisations are not yet prepared for it.

### What Article 53 Requires

Article 53(1) places four obligations on providers of GPAI models. Models with systemic risk carry additional obligations under Article 55:

| Obligation | Article | What LCC Does |
|---|---|---|
| Technical documentation | Art.53(1)(a) | Generates SBOM with model type, version, license, and metadata |
| Information for downstream providers | Art.53(1)(b) | Extracts model card descriptions, capabilities, and limitations |
| Copyright policy | Art.53(1)(c) | Identifies training data licenses, flags unverified copyright |
| Training data summary | Art.53(1)(d) | Extracts datasets, data sources, and descriptions from model cards |
| Systemic risk obligations | Art.55 | Detects large models (65B+ params), checks eval metrics and safety docs |
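
The size-based screen in the last row of the table can be sketched as a parameter-count check. This illustrates the "65B+ params" heuristic only and is not LCC's detector. `parse_param_count` and `is_large_model` are hypothetical names, and the input format (strings such as `"70B"`) is an assumption.

```python
import re

SYSTEMIC_PARAM_THRESHOLD = 65_000_000_000  # "65B+ params" from the table above

_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_param_count(text):
    """Parse strings such as '7B', '1.3b', or '70 B' into a parameter count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMBT])\s*", text.upper())
    if not match:
        raise ValueError(f"unrecognised parameter count: {text!r}")
    number, suffix = match.groups()
    return int(float(number) * _SUFFIXES[suffix])


def is_large_model(text):
    """True when the model meets or exceeds the size threshold."""
    return parse_param_count(text) >= SYSTEMIC_PARAM_THRESHOLD
```

A size screen like this is only a first filter. As the table notes, evaluation metrics and safety documentation are checked as well.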

### Compliance Pack Export

Generate a complete compliance bundle for auditors with a single command:

```bash
lcc scan /path/to/project --policy eu-ai-act-compliance
```

The compliance pack includes:
- `eu_ai_act_report.json` -- structured regulatory assessment with per-obligation status
- `eu_ai_act_report.html` -- branded HTML report with executive summary and component cards
- `training_data_summary.json` -- extracted training data information for all AI models
- `copyright_policy_template.md` -- pre-populated copyright policy template for Art.53(1)(c)

### Regulatory Policy Templates

Apply built-in regulatory policies that map directly to framework requirements:

```bash
# EU AI Act compliance
lcc scan . --policy eu-ai-act-compliance

# NIST AI Risk Management Framework
lcc scan . --policy nist-ai-rmf
```

---

## Competitive Comparison

| Feature | LCC | FOSSA | Black Duck | Snyk | ORT |
|---|:---:|:---:|:---:|:---:|:---:|
| AI model license detection | Yes | No | No | No | No |
| EU AI Act compliance | Yes | No | No | No | No |
| RAIL restriction parsing | Yes | No | No | No | No |
| License compatibility engine | Yes | Yes | Yes | Partial | Yes |
| Transitive dependency resolution | Yes (8 ecosystems) | Yes | Yes | Yes | Yes |
| SBOM generation (CycloneDX + SPDX) | Yes | Yes | Yes | Partial | Yes |
| Policy-as-code (OPA) | Yes | Partial | No | No | Yes |
| VS Code extension | Yes | No | No | Yes | No |
| GitHub Action | Yes | Yes | Yes | Yes | Yes |
| Open source | Yes (Apache-2.0) | No | No | No | Yes |
| Price | Free | $50K+/yr | $30K+/yr | $25K+/yr | Free |

---

## Supported Ecosystems

| Ecosystem | Manifest Files | Lock Files | Transitive Resolution |
|---|---|---|:---:|
| **Python** | `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile`, `environment.yml` | `poetry.lock`, `Pipfile.lock` | Yes |
| **JavaScript** | `package.json` | `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml` | Yes |
| **Go** | `go.mod` | `go.sum`, vendor trees | Yes |
| **Java** | `pom.xml` (Maven) | -- | Yes |
| **Kotlin/Groovy** | `build.gradle`, `build.gradle.kts` | -- | Yes |
| **Rust** | `Cargo.toml` | `Cargo.lock` | Yes |
| **Ruby** | `Gemfile` | `Gemfile.lock` | Yes |
| **.NET** | `*.csproj`, `packages.config`, `*.nuspec` | -- | Yes |
| **HuggingFace** | Model cards, dataset cards | -- | -- |

---

## CLI Reference

```bash
# Basic scan
lcc scan /path/to/project

# Scan with policy enforcement
lcc scan . --policy strict
lcc scan . --policy eu-ai-act-compliance

# Scan with license compatibility checking
lcc scan . --project-license Apache-2.0 --context saas

# Generate reports
lcc report --input scan-report.json --format html --output report.html
lcc report --input scan-report.json --format markdown --output report.md

# SBOM generation and signing
lcc sbom generate --input scan-report.json --format cyclonedx --output sbom.json
lcc sbom sign --input sbom.json --key private.gpg --output sbom.json.sig

# Policy management
lcc policy list
lcc policy show eu-ai-act-compliance
lcc policy test my-policy.rego

# Background job queue
lcc queue submit /path/to/project --policy strict
lcc queue status <job-id>
lcc queue worker
```

---

## GitHub Action

Add license compliance checks to any GitHub workflow:

```yaml
name: License Compliance
on: [push, pull_request]

jobs:
  compliance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - uses: apundhir/license-compliance-checker/.github/actions/license-compliance@v1
        with:
          path: '.'
          policy: 'strict'
          fail-on: 'violations'
          ecosystems: 'python,node,go'
          sbom-format: 'cyclonedx'
          format: 'json'
          output: 'lcc-report.json'

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: compliance-report
          path: lcc-report.json
```

### Action Inputs

| Input | Default | Description |
|---|---|---|
| `path` | `.` | Project path to scan |
| `policy` | -- | Policy name to apply |
| `fail-on` | `violations` | When to fail: `violations`, `warnings`, `none` |
| `ecosystems` | `all` | Comma-separated ecosystems to scan |
| `sbom-format` | `none` | SBOM output format: `cyclonedx`, `spdx`, `none` |
| `format` | `json` | Report format: `json`, `markdown`, `html`, `csv` |
| `exclude` | -- | Glob patterns to exclude, comma-separated |

---

## VS Code Extension

Install the **License Compliance Checker** extension from the VS Code Marketplace, or search for `lcc` in the Extensions panel. Requires the `lcc` CLI on your `PATH`.

Supported settings:

| Setting | Default | Description |
|---|---|---|
| `lcc.enabled` | `true` | Enable or disable the extension |
| `lcc.scanOnSave` | `true` | Scan manifest files on save |
| `lcc.lccPath` | `"lcc"` | Path to the CLI executable |
| `lcc.policy` | `""` | Policy to apply (e.g., `eu-ai-act-compliance`) |
| `lcc.threshold` | `0.5` | Confidence threshold for violations |

---

## Architecture

```mermaid
graph TD
    User[User / CI Pipeline] -->|CLI / GitHub Action| Core[LCC Core Engine]
    User -->|Web Interface| UI[Dashboard Next.js]
    User -->|VS Code| VSCode[VS Code Extension]

    subgraph Core_Engine [Core Engine]
        Core --> Detectors[8 Ecosystem Detectors]
        Core --> AIDetectors[AI Model / Dataset Detectors]
        Detectors --> TransDeps[Transitive Resolver + Depth Tracking]
        AIDetectors --> ModelCard[Model Card Parser]
        TransDeps --> Resolvers[License Resolvers]
        Resolvers --> Compat[Compatibility Engine]
        Compat --> PolicyEngine[Policy Engine + OPA]
    end

    subgraph Regulatory [Regulatory Compliance]
        PolicyEngine --> EUAIAct[EU AI Act Assessor]
        PolicyEngine --> NIST[NIST AI RMF]
        EUAIAct --> Reporter[Regulatory Reporter]
    end

    subgraph Output [Output]
        Reporter --> SBOM[SBOM CycloneDX / SPDX]
        Reporter --> Reports[JSON / HTML / Markdown / CSV]
        Reporter --> CompPack[Compliance Pack]
    end

    subgraph Services [Backend Services]
        Core -->|Task Queue| Redis[Redis Queue]
        Core -->|Persistence| DB[(PostgreSQL)]
        UI -->|REST API| API[LCC API Service]
    end
```

---

## Documentation

- [Documentation Site](https://apundhir.github.io/license-compliance-checker/) -- full guides, tutorials, and API reference
- [User Guide](docs/guides/user.md)
- [Policy Guide](docs/guides/policies.md)
- [Deployment Guide](docs/deployment/index.md)
- [API Reference](docs/reference/api.md)
- [AI Ethics & Privacy](docs/AI_ETHICS.md)
- [FAQ](docs/reference/faq.md)

---

## Benchmarks

LCC ships with a benchmark framework measuring detection accuracy, scan speed, and AI model license detection across a 50-project corpus spanning all 8 ecosystems. Performance targets: scan time under 10s for 50 dependencies, AI model detection at 95%+ accuracy. See [benchmarks/README.md](benchmarks/README.md) for methodology and results.

```bash
python -m benchmarks.run_benchmarks --all -v
```

---

## Development

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install with test dependencies
pip install -e ".[test]"

# Run tests
pytest

# Run the API server
LCC_SECRET_KEY=dev-key lcc server

# Run the dashboard
cd dashboard && npm install && npm run dev

# Build VS Code extension
cd vscode-extension && npm install && npm run compile
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for development guidelines and [SECURITY.md](SECURITY.md) for reporting vulnerabilities.

---

## License

Apache License 2.0 -- see [LICENSE](LICENSE) for details.
