Metadata-Version: 2.4
Name: solorider
Version: 1.0.2
Summary: An extensible framework for the rapid detection and profiling of potential supply-chain attacks of libraries hosted on code repositories.
Author: matonis
License: MIT
Project-URL: Homepage, https://github.com/matonis/solorider
Project-URL: Repository, https://github.com/matonis/solorider
Project-URL: Issues, https://github.com/matonis/solorider/issues
Keywords: supply-chain,security,malware,npm,pypi,aur,static-analysis,package-auditing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.20
Requires-Dist: yara-x>=1.0
Requires-Dist: packaging>=20.0
Dynamic: license-file

# solorider

(pronounced "Solo Rider")

An extensible framework for the rapid detection and profiling of supply-chain attacks in code repositories.

solorider allows security practitioners to efficiently iterate on the detection and profiling of potential supply-chain attacks against libraries hosted on code repositories. It provides facilities to assess packages across npm, PyPI, AUR, and GitHub, to make critical decisions about a package's integrity, and to adapt to shifts in actor tradecraft.

The project was authored to address three primary security/intelligence questions:

1. Is this package potentially compromised?
2. Are any of this package's dependencies compromised?
3. Am I compromised?

solorider is a proof-of-concept and comes with no warranties or guarantees.


## Installation

```bash
git clone https://github.com/matonis/solorider.git
cd solorider
pip install .

# Development mode (editable)
pip install -e .

# Verify
solorider --list-plugins
```


## Quick Start

```python
import solorider

d = solorider.SupplyChainDetector()
d.assess_package('cap-js-postgres-2.2.2')
d.print_reports()
```

Output:

    [solorider] 'cap-js-postgres-2.2.2' identified in 1 ecosystem(s): npm
    [solorider] Proceeding to download and analyze...
    [solorider] Analysis complete.

    ============================================================
    [npm] cap-js-postgres-2.2.2-npm-1781719850
    ============================================================

    Report 1 (standard_report): solorider default summary

    [!] - This package had detections that may need review
    [!] - There were multiple detections on this package which may need to be reviewed
    [+] - package/execution.js
           [!] (high) - (yara_detector|general_LargeFile): File is anomalously large
           [!] (high) - (yara_detector|general_ObfuscatorPattern): Determines if uses obfuscation
           [!] (high) - (yara_detector|session_CookieTheft_AWS): Matches AWS credential/cookie theft patterns
           [!] (high) - (yara_detector|session_CookieTheft_ExfilIndicator): Matches data-exfiltration indicators
           [!] (medium) - (anomalous_size|anomalous_file_size): File size is anomalous for this package
    [+] - package/setup.mjs
           [!] (high) - (yara_detector|all_bunArtifacts): Matches Bun runtime artifacts
           [!] (medium) - (package_installer_detector|install_script): Declares an install lifecycle script
    [+] - package/package.json
           [!] (medium) - (package_installer_detector|scripts.preinstall): Declares a preinstall lifecycle script


## Usage

solorider has three primary use cases, plus a command-line interface.

### 1. Assess a Package

```python
import solorider

d = solorider.SupplyChainDetector()

# Bare name -- probes all ecosystems (PyPI, npm, AUR, GitHub)
d.assess_package('lodash')

# Pinned PyPI version
d.assess_package('flask==3.0.3')

# Pinned npm version
d.assess_package('express@5.2.1')

# Scoped npm package
d.assess_package('@babel/parser@7.18.4')

# GitHub URL
d.assess_package('https://github.com/pallets/flask.git')

# AUR URL
d.assess_package('https://aur.archlinux.org/yay.git')

# Local compressed file
d.assess_package('/tmp/suspicious_package.tar.gz')
```
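The input forms above suggest how a raw string can be routed before any network probing. The sketch below is a hypothetical illustration, not solorider's parser: `classify_input`, the `"local"` label, and the archive suffix list are invented here, and solorider itself delegates this decision to each ecosystem plugin's `accepts()` (see Designing & Deploying Plugins).

```python
import re
from urllib.parse import urlparse

# Suffixes treated as local archives (an assumed, non-exhaustive list)
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip", ".whl")

# npm pin: optional @scope, then a name, then @version (version must not contain '@')
NPM_PIN = re.compile(r"(@?[^@]+)@([^@]+)")


def classify_input(raw: str) -> dict:
    """Hypothetical sketch: guess which ecosystem(s) an assess_package() input targets."""
    raw = raw.strip()
    if raw.startswith(("http://", "https://")):
        host = urlparse(raw).netloc.lower()
        if host == "github.com":
            return {"ecosystems": ["github"], "name": raw, "version": None}
        if host == "aur.archlinux.org":
            return {"ecosystems": ["aur"], "name": raw, "version": None}
    if raw.endswith(ARCHIVE_SUFFIXES):
        return {"ecosystems": ["local"], "name": raw, "version": None}
    if "==" in raw:  # PyPI exact pin, e.g. flask==3.0.3
        name, version = raw.split("==", 1)
        return {"ecosystems": ["pypi"], "name": name, "version": version}
    match = NPM_PIN.fullmatch(raw)
    if match:  # e.g. express@5.2.1 or @babel/parser@7.18.4
        return {"ecosystems": ["npm"], "name": match.group(1), "version": match.group(2)}
    # Bare name: probe every registry
    return {"ecosystems": ["pypi", "npm", "aur", "github"], "name": raw, "version": None}
```

Checking the scope before the version is what keeps `@babel/parser@7.18.4` from being split at its leading `@`.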

### 2. Assess Dependencies

```python
# Python dependencies from requirements.txt
d.assess_dependencies('requirements.txt')

# npm dependencies from package.json
d.assess_dependencies('package.json')
```
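Dependency files mix exact pins with version ranges, and only an exact pin names a single release. As a hedged sketch of a first pass over `requirements.txt` (not solorider's implementation), the hypothetical helper below separates `name==version` pins from everything else; its regex is a deliberate simplification of PEP 508 and ignores URL requirements.

```python
import re

# "name==version" with optional extras, e.g. "requests[socks]==2.32.3" (simplified PEP 508)
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def pinned_requirements(text: str) -> tuple[list[str], list[str]]:
    """Split requirements.txt content into exact pins and all other requirement lines."""
    pinned, unpinned = [], []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r / -e
            continue
        match = PIN_RE.match(line)
        if match:
            pinned.append(f"{match.group(1)}=={match.group(2)}")
        else:
            unpinned.append(line)
    return pinned, unpinned
```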

### 3. Audit Local Environment

solorider has multiple plugins to assess local code environments. Installed packages are enumerated, and their code is scanned by the existing analyzer plugins.

```python
# Audit installed Python packages
d.audit_local('python')
d.print_reports()
```
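On the Python side, the standard library already exposes what such an audit needs. This sketch shows one plausible way to produce `name==version` specifiers and install locations with `importlib.metadata`; whether solorider's `python` audit plugin works this way is an assumption, and `installed_python_packages` is a hypothetical helper.

```python
from importlib.metadata import distributions


def installed_python_packages() -> list[tuple[str, str]]:
    """Hypothetical sketch: installed distributions as (name==version, install location)."""
    results = []
    for dist in distributions():
        metadata = dist.metadata
        name = metadata.get("Name") if metadata is not None else None
        if not name:  # skip distributions with missing or broken metadata
            continue
        # locate_file("") resolves to the directory the distribution's files are relative to
        results.append((f"{name}=={dist.version}", str(dist.locate_file(""))))
    return sorted(results)
```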

The npm audit plugin requires a target directory (a project root or a `node_modules` directory). To determine the project root to pass as the target, run:

```bash
$ npm prefix
/home/user/projects/my-app
```

```python
# Audit installed npm packages under the project root reported by npm prefix
d.audit_local('npm', target='/home/user/projects/my-app')
d.print_reports()
```
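The `npm` audit plugin resolves each installed package to `name@version` and an on-disk location. A minimal sketch of that resolution, assuming every installed package carries a `package.json` with `name` and `version` fields; `installed_npm_packages` is hypothetical and, unlike a full audit, it does not descend into nested `node_modules` directories.

```python
import json
from pathlib import Path


def installed_npm_packages(target: str) -> list[tuple[str, str]]:
    """Hypothetical sketch: map top-level node_modules packages to (name@version, path)."""
    root = Path(target)
    # Accept either a project root or the node_modules directory itself
    modules = root if root.name == "node_modules" else root / "node_modules"
    if not modules.is_dir():
        return []
    results = []
    for entry in sorted(modules.iterdir()):
        # Scoped packages (@scope/name) live one directory deeper
        if entry.name.startswith("@") and entry.is_dir():
            candidates = sorted(entry.iterdir())
        else:
            candidates = [entry]
        for pkg_dir in candidates:
            manifest = pkg_dir / "package.json"
            if not manifest.is_file():  # skips .bin, lockfiles, and other non-package entries
                continue
            data = json.loads(manifest.read_text(encoding="utf-8"))
            name, version = data.get("name"), data.get("version")
            if name and version:
                results.append((f"{name}@{version}", str(pkg_dir)))
    return results
```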

### 4. CLI

```bash
solorider lodash
solorider flask==3.0.3
solorider https://github.com/pallets/flask.git
solorider /tmp/suspicious_package.tar.gz
solorider --deps requirements.txt
solorider --deps package.json
solorider --list-plugins
```


## Plugin Options

List the available analyzer plugins, grouped by category in execution
order:

```python
import solorider

d = solorider.SupplyChainDetector()
d.list_plugins()
```

Output:

```
Version Extractors
==================
  [+] npm_version_extractor
      Extract package name and version from npm package.json.

  [+] pypi_version_extractor
      Extract package name and version from PyPI metadata files (wheel METADATA,
      PKG-INFO, pyproject.toml).

  [+] stated_pinned_version_extractor
      Records the caller-supplied 'stated_pinned_version' argument onto the
      report, if it was passed.

Classifiers
===========
  [+] anomalous_size
      Identify files whose size is 20% or more above the average file size across
      the entire package.

  [+] npm_bin_detector
      NPM-derivative packages only. Determine if a 'bin' config is present in
      package.json and mark the script files it exposes as executable commands.

  [+] npm_gyp_detection
      NPM-derivative packages only. Parses binding.gyp to determine whether node-
      gyp will automatically execute a script on install, and flags the package
      files it references.

  [+] package_installer_detector
      NPM-derivative packages only. Determine if lifecycle scripts are present and
      marks executed scripts.

  [+] similar_filenames
      Identify files whose names are suspiciously similar to other files in the
      same package using Levenshtein edit distance.

Static
======
  [+] entropy_analysis
      Computes per-file Shannon entropy and flags high-entropy files that indicate
      obfuscated, packed, or encrypted content that signature rules miss.

  [+] indicator_match
      Matches each file in a package against a provided Python-list of indicators
      via the 'indicators=' parameter

  [+] yara_detector
      Scans each file in a package with a set of YARA rules (additional rules
      contained in a folder can be extended in the package via
      'yara_rule_directory=' parameter)

Deep
====
  [+] check_dependency_for_advisory_npm
      Resolve a package's declared package.json dependencies and flag any that
      appear as malware in the GitHub Advisory Database.

  [+] check_dependency_for_advisory_pypi
      Resolve a package's declared requirements.txt dependencies and flag any that
      appear as malware in the OSV advisory database.

  [+] claude_deobfuscator
      Deep analysis plugin that sends files flagged with obfuscation patterns to
      Claude for behavioral assessment.

Judgements
==========
  [+] advisory_lookup_npm
      Query the GitHub Advisory Database for known malware advisories affecting
      the assessed npm package.

  [+] advisory_lookup_pypi
      Query the OSV advisory database for known malware advisories affecting the
      assessed PyPI package.

  [+] check_npm_version_mismatch
      Compare the extracted package version against the highest version inferred
      from the cached npm registry directory and flag a mismatch as potential
      dist-tag abuse.

  [+] check_pypi_version_mismatch
      Compare the extracted package version against the highest version inferred
      from the cached PyPI registry directory and flag a mismatch as potential
      version redirection.

  [+] pinned_version_blacklist
      Checks the package's extracted and stated pinned-version specifiers against
      a provided Python-list of blacklisted versions via the
      'blacklisted_pinned_versions=' parameter

  [+] standard_judgement
      Judgement plugin which sets baseline thresholds to determine likelihood of a
      package being malicious

Reporting
=========
  [+] file_report
      Adds d.print_file_reports(): lists every file with at least one detection
      for each assessed ecosystem, annotating each path with its high/medium/low
      detection counts. Files with no detections are omitted.

  [+] report_by_file
      Adds d.print_report_by_file(files): given a filename/path or list of them,
      prints the filename, path, hashes, and the detections on each matching file
      grouped by plugin category and ordered by severity.

  [+] report_by_plugin
      Adds d.print_report_by_plugin(plugins): given a plugin name or list of them,
      prints -- per assessed ecosystem -- the unique list of file paths each named
      plugin produced a detection on.

  [+] report_by_severity
      Adds d.print_report_by_severity(severities): given a severity level
      (high/medium/low/other) or list of them, prints -- per assessed ecosystem --
      the unique list of file paths detected at each level.

  [+] simple_report
      Adds d.print_simple_report(): prints a concise per-package summary
      (detection counts by severity, malicious judgement count, and which plugins
      triggered) for every assessed ecosystem.

  [+] standard_report
      Creates a verbose overview of detections identified in a project. Report is
      accessible via print_reports().
```
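Two of the classifier heuristics listed above are simple enough to sketch. This is a hedged illustration rather than solorider's source: it reads "20% or more above the average" as `size >= 1.2 * mean`, uses a standard dynamic-programming Levenshtein distance, and applies a hypothetical cutoff of 2 edits for `similar_filenames`.

```python
import os
from itertools import combinations


def anomalous_sizes(sizes: dict[str, int], ratio: float = 1.2) -> list[str]:
    """Paths whose size is at least `ratio` times the package's mean file size."""
    if not sizes:
        return []
    mean = sum(sizes.values()) / len(sizes)
    return [path for path, size in sizes.items() if size >= ratio * mean]


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_filenames(paths: list[str], max_distance: int = 2) -> list[tuple[str, str]]:
    """Pairs of distinct basenames within `max_distance` edits (hypothetical cutoff)."""
    names = sorted({os.path.basename(p) for p in paths})
    return [(a, b) for a, b in combinations(names, 2) if levenshtein(a, b) <= max_distance]
```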

List the audit plugins used by `audit_local()` to enumerate locally
installed packages:

```python
d.list_audit_plugins()
```

Output:

```
Audit Plugins
=============
  [+] npm
      Audits npm packages installed under a node_modules tree, resolving each to a
      pinned version (name@version) and the on-disk location of its source code.
      Requires a target argument: the path to a node_modules directory or a
      project root containing one (passed to run() or via dispatch('npm',
      target)).

  [+] python
      Audits Python packages installed in the local environment, resolving each to
      a pinned version (name==version) and the on-disk location of its source
      code.
```

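Indicator matching (`indicator_match`): pass a list of indicators via `indicators=`, and each file in the package is matched against them.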

```python
d.assess_package('lodash', indicators=['edb172e0c2c9...'])
```
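As a minimal sketch of hash-based indicator matching, a file's MD5, SHA-1, and SHA-256 digests can be compared against a normalized set of indicators. `match_indicators` is hypothetical and handles only full hexadecimal digests; other indicator types that `indicator_match` may support are not covered here.

```python
import hashlib


def match_indicators(data: bytes, indicators: list[str]) -> list[str]:
    """Return 'algo:digest' for every digest of `data` that appears in `indicators`."""
    wanted = {indicator.strip().lower() for indicator in indicators}
    hits = []
    for algo in ("md5", "sha1", "sha256"):
        digest = hashlib.new(algo, data).hexdigest()
        if digest in wanted:
            hits.append(f"{algo}:{digest}")
    return hits
```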

Custom YARA rules directory: rules in the given folder extend the rule set used by `yara_detector`.

```python
d.assess_package('lodash', yara_rule_directory='/home/user/my_yara_rules')
```

Entropy threshold: override the bits/byte value at or above which `entropy_analysis` flags a file as likely obfuscated, packed, or encrypted (default 7.2).

```python
d.assess_package('lodash', entropy_threshold=6.5)
```
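The bits/byte figure is Shannon entropy over a file's byte values, H = -Σ p(b) log2 p(b), which ranges from 0 (one repeated byte) to 8 (all 256 values equally likely). A hedged sketch of the calculation and the threshold test; the helper names are hypothetical and this is not solorider's code.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def is_high_entropy(data: bytes, threshold: float = 7.2) -> bool:
    """Flag data at or above the threshold (7.2 mirrors the documented default)."""
    return shannon_entropy(data) >= threshold
```

Lowering the threshold to 6.5, as in the example above, flags more files at the cost of more false positives on compressed but benign assets.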

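Claude deep analysis (`claude_deobfuscator`): supply an Anthropic API key via `CLAUDE_API_KEY=` so that files flagged with obfuscation patterns can be sent to Claude for behavioral assessment.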

```python
d.assess_package('lodash', CLAUDE_API_KEY='sk-ant-api03-...')
```

GitHub Advisory Database lookups: the npm advisory plugins (`advisory_lookup_npm` and `check_dependency_for_advisory_npm`) query the GitHub Advisory Database for known malware affecting a package and its declared dependencies. Supplying `GITHUB_TOKEN` authenticates those lookups, raising the GitHub API rate limit from 60 to 5,000 requests per hour for higher-volume scans.

```python
d.assess_package('express@5.2.1', GITHUB_TOKEN='ghp_...')
```

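Pinned-version blacklist matching (`pinned_version_blacklist`): each entry is a pinned specifier (`name==version` for PyPI, `name@version` for npm) checked against the package's extracted and stated pinned versions.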

```python
d.assess_package('guardrails-ai==0.10.1',
    blacklisted_pinned_versions=['guardrails-ai==0.10.1', 'express@5.2.1'],
)
```
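The report carries both an `extracted_pinned_version_str` and a `stated_pinned_version` (see Report Schema), so a blacklist check can test either. A hedged sketch follows; the helpers are hypothetical, and normalizing PyPI names per PEP 503 and lowercasing npm specifiers are assumptions about how lenient such a match should be.

```python
import re


def normalize_pin(spec: str) -> str:
    """Lowercase a pinned specifier; PyPI-style names also get PEP 503 normalization."""
    spec = spec.strip()
    if "==" in spec:
        name, version = spec.split("==", 1)
        name = re.sub(r"[-_.]+", "-", name.strip()).lower()
        return f"{name}=={version.strip()}"
    return spec.lower()


def blacklisted(candidates: list[str], blacklist: list[str]) -> list[str]:
    """Candidate pins (e.g. extracted and stated versions) that appear on the blacklist."""
    banned = {normalize_pin(entry) for entry in blacklist}
    return [c for c in candidates if c and normalize_pin(c) in banned]
```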

Combined:

```python
d.assess_package('lodash',
    indicators=['edb172e0...'],
    yara_rule_directory='/home/user/my_yara_rules',
    entropy_threshold=6.5,
    blacklisted_pinned_versions=['guardrails-ai==0.10.1'],
    CLAUDE_API_KEY='sk-ant-api03-...',
)
```


## Report Modules

After running an assessment, call any of these report methods on the detector to print results to stdout.

Full per-plugin report (every detection on every file, grouped by ecosystem):

```python
d.assess_package('lodash')
d.print_reports()
```

Concise per-package summary (detection counts by severity plus which plugins triggered):

```python
d.print_simple_report()
```

Per-package list of files that have detections, each annotated with its high/medium/low counts and ordered by most detections first:

```python
d.print_file_reports()
```

Detailed drill-down for one or more specific files (filename, path, hashes, and detections grouped by plugin category and ordered by severity):

```python
# Single file (string)
d.print_report_by_file('package/lodash.js')

# Several files (list of strings)
d.print_report_by_file(['index.js', 'package/at.js'])
```

Per-plugin list of the unique files each named plugin detected (a plugin name string or a list of them):

```python
# Single plugin (string)
d.print_report_by_plugin('yara_detector')

# Several plugins (list of strings)
d.print_report_by_plugin(['yara_detector', 'npm_bin_detector'])
```

Per-severity list of the unique files detected at each level (a level string or a list of them; `high`/`medium`/`low`/`other`):

```python
# Single level (string)
d.print_report_by_severity('high')

# Several levels (list of strings)
d.print_report_by_severity(['high', 'medium'])
```


## Restricting & Selecting Plugins

The analyzer pipeline can be narrowed per assessment. `only_*` arguments act as
an allowlist (applied first); `ignore_*` arguments act as a blocklist
(subtracted after). All four accept a list and can be combined.

Categories: `plugins_version_extractors`, `plugins_classifiers`,
`plugins_static`, `plugins_deep`, `plugins_judgements`, `plugins_reporting`.
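The selection rule above can be sketched as set arithmetic over a name-to-category registry. This is a hypothetical illustration, not solorider's loader; in particular, combining `only_plugins` and `only_plugin_category` by intersection when both are given is an assumption.

```python
def select_plugins(plugins: dict[str, str],
                   only_plugins=None, only_plugin_category=None,
                   ignore_plugins=None, ignore_plugin_category=None) -> list[str]:
    """Apply allowlists first, then subtract blocklists; `plugins` maps name -> category."""
    selected = set(plugins)
    if only_plugins:
        selected &= set(only_plugins)
    if only_plugin_category:
        selected = {name for name in selected if plugins[name] in only_plugin_category}
    if ignore_plugins:
        selected -= set(ignore_plugins)
    if ignore_plugin_category:
        selected = {name for name in selected if plugins[name] not in ignore_plugin_category}
    return sorted(selected)
```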

Restrict individual plugins (skip by name):

```python
d.assess_package('lodash', ignore_plugins=['claude_deobfuscator'])
```

Restrict plugin categories (skip by category):

```python
d.assess_package('lodash', ignore_plugin_category=['plugins_deep'])
```

Run only individual plugins (allowlist by name):

```python
d.assess_package('lodash', only_plugins=['yara_detector', 'indicator_match'])
```

Run only individual plugin categories (allowlist by category):

```python
d.assess_package('lodash', only_plugin_category=['plugins_static'])
```


## Output Directory

Default output schema:

    cache/
    └── lodash-npm-1781637646/
        ├── downloads/                 #:: Downloaded package artifacts
        │   └── lodash-4.17.21.tgz
        ├── extracted/                 #:: Decompressed package contents
        │   └── package/
        │       ├── index.js
        │       ├── lodash.js
        │       └── ...
        ├── cached_artifacts/          #:: Cached plugin analysis artifacts
        │   ├── cached_artifact_lodash-npm-1781637646_1781720055_claude_deobfuscator.txt
        │   └── extracted_iocs.json
        ├── cached_directory/          #:: Cached registry metadata JSON
        │   └── CACHED_DIRECTORY_lodash-npm-1781637646.dat
        ├── session_report/            #:: Session report JSON output
        │   └── lodash-npm-1781637646.json
        └── other/                     #:: General-purpose scratch space

Custom output location:

```python
# At construction
d = solorider.SupplyChainDetector(output_dir='/home/user/scans')
d.assess_package('lodash')

# At invocation (overrides construction)
d = solorider.SupplyChainDetector()
d.assess_package('lodash', output_dir='/home/user/scans')
d.assess_dependencies('requirements.txt', output_dir='/home/user/deps_scan')
```


## Architecture

solorider uses a plugin model dispatched across several layers. When a package name is received for assessment, it is broadcast to the ecosystem plugins to determine where the package is present. Each matching package is then downloaded, extracted to disk, and passed through the analyzer plugins in category order, after which the reporting plugins render the results.

    ┌──────────────────┐   ┌──────────────────┐   ┌────────────────────┐   ┌──────────────────┐
    │      Input       │   │     Scoping      │   │     Analyzers      │   │    Reporting     │
    │                  │   │                  │   │                    │   │                  │
    │                  │   │  ╭────────────╮  │   │  ╭──────────────╮  │   │                  │
    │                  │   │  │    npm     │  │   │  │  Classifiers │  │   │                  │
    │                  │   │  ╰─────┬──────╯  │   │  ╰──────┬───────╯  │   │                  │
    │                  │   │        │         │   │         │          │   │                  │
    │                  │   │        ▼         │   │         ▼          │   │                  │
    │                  │   │  ╭────────────╮  │   │  ╭──────────────╮  │   │  ╭────────────╮  │
    │ ╭──────────────╮ │   │  │    pypi    │  │   │  │   Static     │  │   │  │    JSON    │  │
    │ │assess_package│ │   │  ╰─────┬──────╯  │   │  │  detection   │  │   │  ╰─────┬──────╯  │
    │ │ assess_deps  │─┼──►│        │         │──►│  ╰──────┬───────╯  │──►│        │         │
    │ ╰──────────────╯ │   │        ▼         │   │         ▼          │   │        ▼         │
    │                  │   │  ╭────────────╮  │   │  ╭──────────────╮  │   │  ╭────────────╮  │
    │                  │   │  │    aur     │  │   │  │    Deep      │  │   │  │   Python   │  │
    │                  │   │  ╰─────┬──────╯  │   │  │  detection   │  │   │  ╰────────────╯  │
    │                  │   │        │         │   │  ╰──────┬───────╯  │   │                  │
    │                  │   │        ▼         │   │         ▼          │   │                  │
    │                  │   │  ╭────────────╮  │   │  ╭──────────────╮  │   │                  │
    │                  │   │  │   github   │  │   │  │  Judgements  │  │   │                  │
    │                  │   │  ╰────────────╯  │   │  ╰──────────────╯  │   │                  │
    │                  │   │                  │   │                    │   │                  │
    └──────────────────┘   └──────────────────┘   └────────────────────┘   └──────────────────┘
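The Scoping stage produces the per-ecosystem presence map exposed as `d.DOWNLOADER.PACKAGE_PRESENCE`. A framework-free sketch of that broadcast, under stated assumptions: `StubEcosystem` and `scope` are hypothetical, and unlike the `DownloaderPluginBase` interface (whose `check()` takes no arguments), the stubs receive the input directly for brevity.

```python
from dataclasses import dataclass


@dataclass
class StubEcosystem:
    """Hypothetical stand-in for an ecosystem plugin that 'hosts' a fixed set of names."""
    TYPE: str
    known_packages: frozenset

    def accepts(self, raw_input: str) -> bool:
        return True  # a real plugin would reject inputs it cannot interpret

    def check(self, raw_input: str) -> bool:
        return raw_input in self.known_packages


def scope(raw_input: str, plugins: list) -> dict[str, bool]:
    """Broadcast the input to every plugin and record presence per ecosystem."""
    return {p.TYPE: p.accepts(raw_input) and p.check(raw_input) for p in plugins}
```

Each ecosystem marked `True` would then become its own child assessment with its own download, extraction, and analysis.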


## Accessing Results Programmatically

```python
import json

import solorider

d = solorider.SupplyChainDetector()
d.assess_package('yay')

# Ecosystem results
d.DOWNLOADER.FOUND                    # True
d.DOWNLOADER.PACKAGE_PRESENCE         # {'aur': False, 'npm': True, 'pypi': True, ...}
d.DOWNLOADER.PRESENT_PACKAGES         # ['npm', 'pypi']

# Iterate child assessments (one per ecosystem the package was found in)
for child in d.ASSESSMENT_OBJECTS:
    child.TMP_MODE                    # 'npm'
    child.SESSION_NAME                # 'yay-npm-1781719850'
    child.HAS_DETECTIONS              # True
    child.REPORT['type']              # 'npm'
    child.REPORT['package_url']       # 'https://registry.npmjs.org/...'
    child.REPORT['metadata']          # [{'key': 'name', 'value': 'yay'}, ...]

    # File metadata and hashes
    for f in child.REPORT['files']:
        f['path']                     # 'package/index.js'
        f['size']                     # 283
        f['md5']                      # 'a1b2c3d4...'
        f['sha1']                     # 'da39a3ee...'
        f['sha256']                   # 'edb172e0...'

    # Detections per file (.get() guards against keys absent when nothing was recorded)
    for f in child.REPORT['files']:
        f.get('detections_plugins_classifiers', [])
        f.get('detections_plugins_static', [])
        f.get('detections_plugins_deep', [])
        f.get('plugins_extracted_artifacts', [])

    # Collect all detections across categories
    all_detections = []
    for f in child.REPORT['files']:
        for key in ('detections_plugins_classifiers',
                    'detections_plugins_static',
                    'detections_plugins_deep'):
            all_detections.extend(f.get(key, []))

    # Filter (`det` rather than `d`, to avoid confusion with the detector)
    high_severity = [det for det in all_detections if det['severity'] == 'high']
    yara_hits = [det for det in all_detections if det['plugin_name'] == 'yara_detector']

    # Judgements: True if any judgement plugin marked the package malicious
    is_malicious = any(j['malicious'] for j in child.REPORT.get('judgements', []))

# Print reports
d.print_reports()

# Export each assessment's report as JSON
for child in d.ASSESSMENT_OBJECTS:
    with open(f"{child.SESSION_NAME}.json", "w", encoding="utf-8") as fh:
        json.dump(child.REPORT, fh, indent=2, default=str)
```


## Designing & Deploying Plugins

solorider's plugin architecture covers both polling code repositories and assessing the files within retrieved packages. Plugins are developed and deployed by dropping files into the appropriate directory.

Ecosystem (downloader) plugins expand retrieval capabilities across code repositories. When a package is polled, solorider broadcasts the package name to each ecosystem plugin to determine whether the package is present in that ecosystem.

Analysis plugins expand detection capabilities. During analysis, the report object containing all file metadata (see Report Schema) is passed to each plugin as `context['report']`, one category at a time in the order below.

Order of operations:

    1. Version Extractors -- extract package name and version from metadata
    2. Classifiers        -- extend metadata or derive light characteristics
    3. Static             -- determinations based on static features (bytes)
    4. Deep               -- deeper analysis (deobfuscation, LLM, etc.)
    5. Judgements         -- render decisions based on security policy
    6. Reporting          -- output to STDOUT, REST API, secondary systems
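Because every category reads and writes the same report, ordering matters: a judgement can only weigh detections that earlier categories have already recorded. A hedged sketch of category-ordered dispatch using the category names from Restricting & Selecting Plugins; `run_pipeline` is hypothetical and omits plugin discovery and option passing.

```python
CATEGORY_ORDER = (
    "plugins_version_extractors",
    "plugins_classifiers",
    "plugins_static",
    "plugins_deep",
    "plugins_judgements",
    "plugins_reporting",
)


def run_pipeline(report: dict, plugins_by_category: dict[str, list]) -> list[str]:
    """Run each plugin category in order over one shared report; return the call order."""
    context = {"report": report}
    executed = []
    for category in CATEGORY_ORDER:
        for plugin in plugins_by_category.get(category, []):
            plugin.run(context)  # later categories see what earlier ones recorded
            executed.append(f"{category}:{plugin.NAME}")
    return executed
```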

Helper methods available to plugins (called as `self.<method>()`, as in the examples below):

    add_detection()    -- add a detection to a file record
    add_artifact()     -- attach an extracted artifact to a file record
    cache_artifact()   -- export data to the session output directory
    add_judgement()    -- add a malicious/benign judgement
    add_report()       -- add a verbose report

Usage:

```python
self.add_detection(record=record, detection="empty_file", message="File is 0 bytes", severity="low")
self.add_artifact(record=record, artifact={"indicator": "185.231.68.0", "type": "c2_ip"})
with open(target_path, "rb") as fh:
    self.cache_artifact(data=fh.read(), file_name="extracted_payload.bin")
self.add_judgement(malicious=True, message="3 files matched known malware indicators")
self.add_report(report_name="threat summary", report="[!] Package contains credential stealer")
```

Example analysis plugin:

```python
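from solorider.plugin_base import PluginBase

class MyDetector(PluginBase):
    name = "my_detector"

    def __init__(self):
        # Identity and human-readable description for this plugin
        self.NAME = "my_detector"
        self.DESCRIPTION = "Detects something suspicious."

    def run(self, context):
        # context["report"] is the session report (see Report Schema); visit every file record
        for record in context["report"]["files"]:
            # Record a low-severity detection on any zero-byte file
            if record.get("size", 0) == 0:
                self.add_detection(
                    record=record,
                    detection="empty_file",
                    message="File is 0 bytes",
                    severity="low",
                )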
```
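To try plugin logic like this without installing the framework, a hypothetical stand-in for `PluginBase` can record detections directly on the file record. It mimics the `detections_plugins_<category>` lists from the Report Schema but is not solorider's base class; the `CATEGORY` attribute and the storage details are assumptions.

```python
class StandInPluginBase:
    """Hypothetical stand-in for solorider.plugin_base.PluginBase (not the real class)."""
    CATEGORY = "plugins_static"  # assumed: the category this plugin would run in
    NAME = "stand_in"

    def add_detection(self, record, detection, message, severity):
        # Append a detection in the shape the Report Schema shows
        record.setdefault(f"detections_{self.CATEGORY}", []).append({
            "plugin_name": self.NAME,
            "detection": detection,
            "message": message,
            "severity": severity,
        })


class MyDetector(StandInPluginBase):
    NAME = "my_detector"

    def run(self, context):
        # Same logic as the example above, rebased on the stand-in
        for record in context["report"]["files"]:
            if record.get("size", 0) == 0:
                self.add_detection(record=record, detection="empty_file",
                                   message="File is 0 bytes", severity="low")
```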

Example ecosystem plugin:

```python
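from solorider.lib.downloader_plugin_base import DownloaderPluginBase

class RubyGemsChecker(DownloaderPluginBase):
    PLUGIN_NAME = "rubygems_checker"
    TYPE = "rubygems"
    BASE_REPO_URL = "https://rubygems.org"

    def accepts(self, raw_input):
        # Return True if raw_input is something this ecosystem can resolve
        ...

    def check(self):
        # Query the registry to determine whether the package is present
        ...

    def download(self, download_directory):
        # Fetch the package artifact into download_directory
        ...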
```

Drop either file into the appropriate plugin directory; it goes live on the next run.
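As a hedged sketch of how the RubyGems stubs might be filled in, the standalone class below uses the public RubyGems gems API (`/api/v1/gems/<name>.json`) and the `/downloads/<name>-<version>.gem` URL pattern. It does not subclass `DownloaderPluginBase`, whose attributes and helpers are not documented here; `RubyGemsSketch`, `metadata_url`, and `download_url` are hypothetical, and the actual file download is left out.

```python
import json
import re
from urllib.error import HTTPError
from urllib.request import urlopen

# Plain gem names only; URLs, pins, and paths are left to other ecosystem plugins
GEM_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")


class RubyGemsSketch:
    """Hypothetical, framework-free sketch of the RubyGems plugin's methods."""
    TYPE = "rubygems"
    BASE_REPO_URL = "https://rubygems.org"

    def __init__(self):
        self.name = None
        self.metadata = None

    def accepts(self, raw_input: str) -> bool:
        self.name = raw_input.strip()
        return bool(GEM_NAME.fullmatch(self.name))

    def metadata_url(self) -> str:
        # Public API endpoint describing the latest release of a gem
        return f"{self.BASE_REPO_URL}/api/v1/gems/{self.name}.json"

    def check(self) -> bool:
        # Network call: the API answers 404 for gems that do not exist
        try:
            with urlopen(self.metadata_url(), timeout=10) as response:
                self.metadata = json.load(response)
        except HTTPError as err:
            if err.code == 404:
                return False
            raise
        return True

    def download_url(self) -> str:
        # Valid after a successful check(); "version" is the latest release
        return f"{self.BASE_REPO_URL}/downloads/{self.name}-{self.metadata['version']}.gem"
```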


## Innovation Guidance

Current LLMs do a fantastic job of interpreting solorider's architecture, which makes them a great way to streamline expansion of the framework and to complement existing AI-based tooling (agents, MCP servers, etc.).

Users are encouraged to upload solorider to their favorite LLM (such as Claude) and iterate on new signatures and plugins for the framework.

While solorider is effectively a proof-of-concept, its orchestration framework is robust enough to serve as a starting point for a tailored solution that scales detection of backdoored repositories.


## Report Schema

The following JSON object is held in the `.REPORT` attribute of each assessment object (for example, `child.REPORT` when iterating `d.ASSESSMENT_OBJECTS`) and is exposed to plugins via `context['report']`.

```json
{
  "session_name": "lodash-npm-1781637646",
  "package_url": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
  "date": "2026-06-17 02:40:32.451289+00:00",
  "date_epoch": 1781637632,
  "metadata": [
    {"key": "name", "value": "lodash"},
    {"key": "version", "value": "4.17.21"},
    {"key": "description", "value": "Lodash modular utilities."}
  ],
  "type": "npm",
  "extracted_package_name": "lodash",
  "extracted_package_version": "4.17.21",
  "extracted_pinned_version_str": "lodash@4.17.21",
  "stated_pinned_version": "",
  "files": [
    {
      "path": "package/dist/lodash.core.js",
      "size": 2148903,
      "md5": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
      "sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
      "sha256": "edb172e0c2c9bc0f89eb4e291526593cf68d6020fc8cf6c8a5392e771f448635",
      "detections_plugins_classifiers": [
        {
          "plugin_name": "anomalous_size",
          "detection": "anomalous_file_size",
          "message": "File is 2,148,903 bytes, 1247% above average",
          "severity": "medium"
        }
      ],
      "detections_plugins_static": [
        {
          "plugin_name": "yara_detector",
          "detection": "general_ObfuscatorPattern",
          "message": "Determines if uses obfuscation",
          "severity": "high"
        },
        {
          "plugin_name": "indicator_match",
          "detection": "matched_sha256",
          "message": "SHA256 matches known indicator: edb172e0...",
          "severity": "high"
        }
      ],
      "detections_plugins_deep": [
        {
          "plugin_name": "claude_deobfuscator",
          "detection": "claude_behavioral_analysis",
          "message": "Harvests GitHub authentication tokens",
          "severity": "high"
        }
      ]
    }
  ],
  "judgements": [
    {
      "judgement_name": "standard_judgement",
      "malicious": true,
      "message": "There were multiple detections on this package which may need to be reviewed"
    }
  ],
  "reports": [
    {
      "plugin_name": "standard_report",
      "report_name": "solorider default summary",
      "report": "[!] - This package had detections that may need review ..."
    }
  ]
}
```


## License

solorider is released under the [MIT License](LICENSE) and is free and open
source. The software is provided "as is", without warranties or guarantees of
any kind, and its authors accept no liability for its use. Use at your own risk.
