Metadata-Version: 2.4
Name: rlo-detector
Version: 1.0.0
Summary: Detect RLO/bidi-control Unicode character abuse in filenames
Author-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/rlo-detector
Project-URL: Documentation, https://github.com/yourusername/rlo-detector#readme
Project-URL: Repository, https://github.com/yourusername/rlo-detector
Project-URL: Issues, https://github.com/yourusername/rlo-detector/issues
Keywords: security,unicode,bidi,rlo,spoofing,filename
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: System :: Filesystems
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Dynamic: license-file

# rlo-detector

Detect RLO/bidi-control Unicode character abuse in filenames.

## Overview

`rlo-detector` is a security tool that detects filenames containing Unicode bidirectional control characters. These characters can be used to create deceptive filenames that appear legitimate but contain malicious extensions or content.

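For example, a file named `invoice\u202Efdp.exe` is displayed as `invoiceexe.pdf` in user interfaces that honor the override, because the characters after U+202E (`fdp.exe`) are rendered right to left. The name appears to end in `.pdf`, so users may take it for a safe document, but its real extension is `.exe`.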
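The gap between the stored name and the displayed name can be illustrated with the standard library alone. This is a minimal sketch and does not use `rlo-detector` itself; the `displayed` value is a crude approximation of bidi rendering (everything after the first RLO is simply reversed), not an implementation of the Unicode Bidirectional Algorithm.

```python
from pathlib import PurePath

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
name = "invoice" + RLO + "fdp.exe"

# The stored (logical) name ends in ".exe", whatever a bidi-aware UI draws.
real_suffix = PurePath(name).suffix

# Crude rendering model (an assumption for illustration only):
# characters after the RLO are drawn in reverse order.
prefix, _, overridden = name.partition(RLO)
displayed = prefix + overridden[::-1]

print(f"real suffix: {real_suffix}")
print(f"displayed:   {displayed}")
```

The operating system acts on the stored name, so the file is handled according to its real suffix even though the displayed name suggests a PDF.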

## Installation

```bash
pip install rlo-detector
```

Or install from source:

```bash
pip install .
```

## Usage

### Command Line

```bash
# Scan current directory
rlo-detect

# Scan specific directory
rlo-detect /path/to/scan

# Recursive scan
rlo-detect -r /path/to/scan

# Follow symlinks
rlo-detect -r --follow-symlinks /path/to/scan

# Exclude patterns
rlo-detect -r --exclude '*/.git/*' --exclude '*/node_modules/*' /path/to/scan

# JSON output
rlo-detect --json /path/to/scan

# Exit with code 1 if suspicious files found
rlo-detect --fail-on-detect /path/to/scan
```

### Python API

```python
from pathlib import Path

from rlo_detector import Finding, analyze_path, iter_paths

# Analyze a single path; a falsy result means nothing suspicious was found
finding = analyze_path(Path("invoice\u202Efdp.exe"))
if finding:
    print(f"Warning: {finding.reason}")
    print(f"Severity: {finding.severity}")
    print(f"Real extension: {finding.real_extension}")
    print(f"Apparent extension: {finding.apparent_extension}")

# Walk a directory tree and report every suspicious path
for path in iter_paths(["/path/to/scan"], recursive=True, follow_symlinks=False, exclude=[]):
    finding = analyze_path(path)
    if finding:
        print(f"Found: {path}")
```

## Exit Codes

- `0`: No suspicious paths found
- `1`: Suspicious paths found (only when `--fail-on-detect` is set)
- `2`: Runtime error
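As a rough illustration of acting on these exit codes from a script (for example in a CI job), the sketch below wraps the command line tool with `subprocess`. The helper names `describe_exit` and `run_scan` are hypothetical and not part of `rlo-detector`; the flags are the ones listed under Usage, and the `rlo-detect` command is assumed to be on `PATH`.

```python
import subprocess

# Meanings taken from the exit code list above.
EXIT_MEANINGS = {
    0: "no suspicious paths found",
    1: "suspicious paths found",
    2: "runtime error",
}


def describe_exit(code: int) -> str:
    """Translate an rlo-detect exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scan(target: str) -> int:
    """Run a recursive scan of target and return the exit code.

    Requires the rlo-detect command to be installed and on PATH.
    """
    result = subprocess.run(
        ["rlo-detect", "-r", "--fail-on-detect", target],
        check=False,  # inspect the return code instead of raising
    )
    return result.returncode
```

A calling script could pass `run_scan("/path/to/scan")` to `describe_exit` for logging and then exit with the same code so the surrounding pipeline fails when suspicious paths are found.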

## Detected Characters

The tool detects the following Unicode bidirectional control characters:

| Code point | Abbreviation | Name |
|------------|--------------|------|
| `\u202A` | LRE | Left-to-Right Embedding |
| `\u202B` | RLE | Right-to-Left Embedding |
| `\u202C` | PDF | Pop Directional Formatting |
| `\u202D` | LRO | Left-to-Right Override |
| `\u202E` | RLO | Right-to-Left Override |
| `\u2066` | LRI | Left-to-Right Isolate |
| `\u2067` | RLI | Right-to-Left Isolate |
| `\u2068` | FSI | First Strong Isolate |
| `\u2069` | PDI | Pop Directional Isolate |
| `\u200E` | LRM | Left-to-Right Mark |
| `\u200F` | RLM | Right-to-Left Mark |
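A check for the characters in this table can be approximated in a few lines of standard-library Python. This is a minimal sketch, not the package's actual implementation; `BIDI_CONTROLS` and `find_bidi_controls` are hypothetical names introduced here for illustration.

```python
import unicodedata

# The eleven code points from the table above.
BIDI_CONTROLS = frozenset(
    "\u202A\u202B\u202C\u202D\u202E"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
    "\u200E\u200F"                    # LRM, RLM
)


def find_bidi_controls(name: str):
    """Return (index, code point, Unicode name) for each bidi control in name."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(name)
        if ch in BIDI_CONTROLS
    ]


for hit in find_bidi_controls("invoice\u202Efdp.exe"):
    print(hit)
```

Reporting the index and code point of each hit makes it easier to see where in a name the invisible character sits.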

## Severity Levels

- **HIGH**: The file's apparent extension differs from its real extension (strong indication of spoofing)
- **MEDIUM**: The filename contains bidi control characters (potential spoofing attempt)
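The two levels can be sketched as a small classifier. This is an assumption-laden illustration rather than the tool's real logic: `apparent_name` and `classify` are hypothetical helpers, MEDIUM is assumed to apply whenever controls are present without an extension mismatch, and the rendering model only handles a single RLO by reversing the text after it.

```python
from pathlib import PurePath
from typing import Optional

RLO = "\u202E"
BIDI_CONTROLS = frozenset(
    "\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069\u200E\u200F"
)


def apparent_name(name: str) -> str:
    """Crude display model: reverse the text after the first RLO,
    then drop all (invisible) bidi control characters."""
    prefix, sep, rest = name.partition(RLO)
    shown = prefix + rest[::-1] if sep else name
    return "".join(ch for ch in shown if ch not in BIDI_CONTROLS)


def classify(name: str) -> Optional[str]:
    """Return "HIGH", "MEDIUM", or None for a filename."""
    if not any(ch in BIDI_CONTROLS for ch in name):
        return None  # no bidi controls, nothing to report
    real = PurePath(name).suffix
    apparent = PurePath(apparent_name(name)).suffix
    return "HIGH" if real != apparent else "MEDIUM"


print(classify("invoice\u202Efdp.exe"))  # real .exe, apparent .pdf
print(classify("notes\u200E.txt"))       # control present, extension unchanged
print(classify("report.pdf"))            # clean name
```

A production detector would need a fuller model of bidi rendering (embeddings, isolates, nested overrides) before its apparent extension could be trusted.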

## License

MIT License. See the LICENSE file for details.
