Metadata-Version: 2.4
Name: licenseid
Version: 0.1.0
Summary: Get the SPDX License ID from license text
Author-email: Arthit Suriyawongkul <suriyawa@tcd.ie>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: license,license-matcher,licenseid,matcher,opensource,spdx
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.10.0
Requires-Dist: click>=8.0.0
Requires-Dist: rapidfuzz>=3.0.0
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: flake8; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pylint; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: java
Requires-Dist: jpype1>=1.7.0; extra == 'java'
Description-Content-Type: text/markdown

# LicenseID

A portable SPDX License ID matcher.

`licenseid` takes license text as input and identifies the closest matching SPDX License ID using a hybrid search strategy (SQLite FTS5 trigram search, RapidFuzz ranking, and optional Java validation).

## Features

- **Hybrid strategy**:
  - **Tier 1**: Broad recall using SQLite FTS5 with trigram tokenization.
  - **Tier 2**: Precision ranking using RapidFuzz (token set ratio) plus popularity weighting.
  - **Tier 3**: Optional final validation via `tools-java` if available.
- **Unix philosophy**: Parseable CLI output.
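
The first two tiers can be pictured as "recall broadly, then rank precisely". The sketch below is not `licenseid`'s implementation: it uses only the Python standard library, with an in-memory trigram overlap standing in for the SQLite FTS5 trigram index and `difflib.SequenceMatcher` standing in for RapidFuzz's token set ratio. The function names (`trigrams`, `recall`, `rank`) and the toy corpus are hypothetical.

```python
from difflib import SequenceMatcher


def trigrams(text: str) -> set[str]:
    """Return the character trigrams of lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + 3] for i in range(len(norm) - 2)}


def recall(query: str, corpus: dict[str, str], limit: int = 3) -> list[str]:
    """Tier 1 stand-in: keep the licenses sharing the most trigrams with the query."""
    q = trigrams(query)
    overlap = {lid: len(q & trigrams(text)) for lid, text in corpus.items()}
    ranked = sorted(overlap, key=lambda lid: overlap[lid], reverse=True)
    return [lid for lid in ranked[:limit] if overlap[lid] > 0]


def rank(query: str, candidates: list[str], corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Tier 2 stand-in: score each candidate with a 0-1 similarity ratio, best first."""
    q = " ".join(query.lower().split())
    scored = [
        (lid, SequenceMatcher(None, q, " ".join(corpus[lid].lower().split())).ratio())
        for lid in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Abbreviated snippets for illustration only, not full license texts.
CORPUS = {
    "MIT": "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "Apache-2.0": "Licensed under the Apache License, Version 2.0 (the License)",
    "ISC": "Permission to use, copy, modify, and/or distribute this software for any purpose",
}

query = "Licensed under the Apache License, Version 2.0"
best_id, best_score = rank(query, recall(query, CORPUS), CORPUS)[0]
print(f"LICENSE_ID={best_id} SCORE={best_score:.4f}")
```

Splitting the work this way keeps the expensive pairwise comparison limited to a handful of candidates, which is presumably why the real tool puts an indexed search in front of the fuzzy ranking.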

## Installation

Install with `pipx`:

```bash
pipx install licenseid
```

Or using `uv`:

```bash
uv tool install licenseid
```

## Usage

### 1. Update the license database

Before matching, build the local license index:

```bash
licenseid update
```

### 2. Match a license

Match text from a file:

```bash
licenseid match LICENSE.txt
```

Match with Java validation enabled:

```bash
licenseid match LICENSE.txt --java
```

Match with the popularity tie-breaker enabled:

```bash
licenseid match LICENSE.txt --pop
```

The tie-breaker is triggered only when candidate similarity scores
differ by less than 0.02%.
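
The tie-breaker rule above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: it assumes scores are on a 0-1 scale, so "0.02%" is read as an absolute gap of `0.0002`, and it assumes each candidate carries an integer popularity value. `pick_best`, `TIE_THRESHOLD`, and the sample numbers are hypothetical.

```python
# Assumption: scores are on a 0-1 scale, so 0.02% is read as an absolute gap of 0.0002.
TIE_THRESHOLD = 0.0002


def pick_best(candidates: list[tuple[str, float, int]], use_popularity: bool = True) -> str:
    """Pick a license ID from (license_id, score, popularity) tuples.

    Popularity is consulted only among candidates whose score lies within
    TIE_THRESHOLD of the top score; otherwise the top score wins outright.
    """
    if not candidates:
        raise ValueError("no candidates")
    if not use_popularity:
        return max(candidates, key=lambda c: c[1])[0]
    top_score = max(score for _, score, _ in candidates)
    tied = [c for c in candidates if top_score - c[1] < TIE_THRESHOLD]
    # Among near-ties prefer higher popularity, then higher score.
    return max(tied, key=lambda c: (c[2], c[1]))[0]


# Made-up scores and popularity values, for illustration only.
candidates = [
    ("BSD-3-Clause", 0.9851, 10),
    ("BSD-2-Clause", 0.9850, 50),
    ("MIT", 0.9000, 100),
]
print(pick_best(candidates))                        # near-tie resolved by popularity
print(pick_best(candidates, use_popularity=False))  # highest score wins
```

Because the threshold is so narrow, popularity never overrides a clearly better textual match; in the sample data `MIT` is the most popular entry but is too far behind on score to be considered.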

### 3. Output formats

Default (Unix-friendly):

```text
LICENSE_ID=Apache-2.0 SCORE=0.9850
```
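
Since the default output is a single line of `KEY=VALUE` pairs, it is straightforward to consume from a script. A minimal parsing sketch, assuming the line has exactly the shape shown above and that SPDX License IDs contain no whitespace (`parse_match_line` is a hypothetical helper, not part of the package):

```python
def parse_match_line(line: str) -> tuple[str, float]:
    """Parse 'LICENSE_ID=<id> SCORE=<float>' into (license_id, score)."""
    # Split on whitespace first, then on the first '=' of each token.
    fields = dict(token.split("=", 1) for token in line.split())
    return fields["LICENSE_ID"], float(fields["SCORE"])


license_id, score = parse_match_line("LICENSE_ID=Apache-2.0 SCORE=0.9850")
print(license_id, score)
```

A line missing either key raises `KeyError`, which makes unexpected output easy to detect in a pipeline.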

JSON:

```bash
licenseid match LICENSE.txt --json
```

## Configuration

- `SPDX_TOOLS_JAR`: Path to the `tools-java` jar for Tier 3 validation.
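
A wrapper script may want to check this variable before requesting Java validation. The sketch below only verifies that `SPDX_TOOLS_JAR` is set and names an existing file; it does not check for a Java runtime or reproduce how `licenseid` itself resolves the jar. `find_tools_jar` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_tools_jar(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the path from SPDX_TOOLS_JAR if it names an existing file, else None."""
    env = os.environ if env is None else env
    value = env.get("SPDX_TOOLS_JAR", "").strip()
    if not value:
        return None
    jar = Path(value).expanduser()
    return jar if jar.is_file() else None


jar = find_tools_jar()
if jar is None:
    print("SPDX_TOOLS_JAR is not set or does not point to a file")
else:
    print(f"SPDX_TOOLS_JAR points to {jar}")
```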

## License

Apache-2.0
