Metadata-Version: 2.4
Name: mule-discovery
Version: 1.3.0
Summary: Scan Mule applications for migration complexity assessment
Project-URL: Homepage, https://github.com/KongHQ-CX/mule-discovery
Author: Stephen Brown
License-Expression: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: pyyaml>=6.0
Provides-Extra: anypoint
Requires-Dist: anypoint-sdk>=0.2.0; extra == 'anypoint'
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# mule-discovery

Scan Mule applications for migration complexity assessment.

Parses Mule 4 (and 3) XML source files, POM dependencies, DataWeave scripts, and API specifications to produce a structured migration readiness report with complexity scoring.

## Estate Analysis

The output produced by `mule-discover` (JSON or YAML) can be fed into the [estate-analyzer](https://github.com/KongHQ-CX/kong-ps-agent-skills/tree/main/mule-analysis/estate-analyzer) agent skill to generate pre-sales migration reports. The estate-analyzer processes discovery output across your entire Mule application estate to produce complexity summaries, connector frequency analysis, PoC candidate recommendations, and migration sizing reports.

## Quick Start (uv)

No install required — just run from the project directory:

```bash
cd mule-discovery

# Discover all Mule apps under a directory
uv run mule-discover /path/to/apps --output-dir ./inventory

# JSON output instead of YAML
uv run mule-discover /path/to/apps --json --output-dir ./inventory
```

`uv run` reads `pyproject.toml`, resolves dependencies into an ephemeral environment, and runs the command. Nothing is installed globally.

## Installation

### From PyPI with `uv` (recommended)

Install as a global CLI tool — `uv` puts the entry points on your `PATH`:

```bash
uv tool install mule-discovery

# Then run directly from anywhere
mule-discover /path/to/apps --output-dir ./inventory
```

Upgrade or remove later with:

```bash
uv tool upgrade mule-discovery
uv tool uninstall mule-discovery
```

For Anypoint Platform integration (policy scanning), install with the extra:

```bash
uv tool install "mule-discovery[anypoint]"
```

### From PyPI with `pip`

```bash
pip install mule-discovery

# Then run directly
mule-discover /path/to/apps --output-dir ./inventory
```

For Anypoint Platform integration (policy scanning):

```bash
pip install "mule-discovery[anypoint]"
```

### From source (development)

```bash
uv sync --extra dev
```

Requires Python 3.10+.

## CLI Tools

### `mule-discover`

Recursively find all Mule applications under a directory and produce migration complexity reports for each.

```bash
# Discover all apps, write YAML inventories (default) to ./inventory
uv run mule-discover /path/to/apps --output-dir ./inventory

# JSON output
uv run mule-discover /path/to/apps --json --output-dir ./inventory

# Suppress progress output
uv run mule-discover /path/to/apps -o ./inventory -q

# Custom complexity thresholds
uv run mule-discover /path/to/apps --flow-low 8 --flow-medium 18 --flow-high 30

# Resolve parent POMs and external flow-refs against a customer's shared frameworks
uv run mule-discover /path/to/apps --common-libs /path/to/common_libs_home -o ./inventory
```

#### `--common-libs`

Customers often ship their Mule apps alongside a separate "common libs" / shared frameworks directory containing parent POMs (which the apps inherit dependencies from) and shared Mule projects defining flows that the apps reference via `flow-ref`. Without this directory, the discovery tool only sees what's local to each app: connectors inherited from a parent POM are missing from the POM-derived inventory (the tool falls back to XML namespaces, as described under "Connector discovery without a parent POM"), and flow-refs into shared projects are reported as unresolved.

When you pass `--common-libs PATH`, the tool will:

- **Resolve parent POM chains.** Each app's `<parent>` coordinates are looked up in the directory; if found, the parent's `<dependencies>` are merged into the app's connector inventory (with `source_file` indicating which parent contributed each connector). Multi-level chains are walked.
- **Resolve external flow-refs.** Every `flow-ref` whose target is not defined inside the app is checked against an index of `<flow>` and `<sub-flow>` definitions found anywhere under the common-libs directory. Resolved refs include the library name, source file, and line number; unresolved ones are reported separately so they can be flagged for follow-up.
- **Pull in only what's referenced.** The common-libs directory may be large; the tool does not blanket-include everything from it. Only parent POMs the apps actually inherit from and flows the apps actually reference are surfaced.
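
The flow-ref part of this behaviour can be sketched as follows. This is a minimal illustration, not the tool's actual code: `index_flows`, `resolve_external_refs`, and the sample file name and flow names are all hypothetical, and the real output also carries library names and line numbers that this sketch omits.

```python
import xml.etree.ElementTree as ET

MULE_CORE_NS = "http://www.mulesoft.org/schema/mule/core"
FLOW_TAGS = {"flow", "sub-flow"}


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def index_flows(library_xml: dict[str, str]) -> dict[str, str]:
    """Map each <flow>/<sub-flow> name to the file that defines it."""
    index: dict[str, str] = {}
    for source_file, xml_text in library_xml.items():
        for element in ET.fromstring(xml_text).iter():
            if _local_name(element.tag) in FLOW_TAGS and "name" in element.attrib:
                index.setdefault(element.attrib["name"], source_file)
    return index


def resolve_external_refs(app_xml: str, flow_index: dict[str, str]) -> dict[str, list]:
    """Split flow-refs that are not defined in the app into resolved/unresolved."""
    root = ET.fromstring(app_xml)
    local_flows = {
        e.attrib["name"]
        for e in root.iter()
        if _local_name(e.tag) in FLOW_TAGS and "name" in e.attrib
    }
    resolved, unresolved = [], []
    for element in root.iter():
        if _local_name(element.tag) != "flow-ref":
            continue
        target = element.attrib.get("name")
        if target is None or target in local_flows:
            continue  # defined inside the app, so not an external ref
        if target in flow_index:
            resolved.append({"name": target, "source_file": flow_index[target]})
        else:
            unresolved.append(target)
    return {"resolved": resolved, "unresolved": unresolved}


# Hypothetical sample data: one shared library file and one app file.
LIBRARY_XML = {
    "common/error-handling.xml": (
        f'<mule xmlns="{MULE_CORE_NS}"><sub-flow name="common-error-handler"/></mule>'
    ),
}
APP_XML = (
    f'<mule xmlns="{MULE_CORE_NS}">'
    '<flow name="main">'
    '<flow-ref name="local-helper"/>'
    '<flow-ref name="common-error-handler"/>'
    '<flow-ref name="missing-flow"/>'
    "</flow>"
    '<sub-flow name="local-helper"/>'
    "</mule>"
)

print(resolve_external_refs(APP_XML, index_flows(LIBRARY_XML)))
```

Building the index once and looking up each ref keeps the cost proportional to the number of refs, which fits the "only what's referenced" rule above.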

Expected directory layout (one Mule project per top-level subdirectory):

```
common_libs_home/
├── hvcp-mule4-common-framework-handler/
│   ├── pom.xml
│   └── src/main/mule/*.xml
├── hvcp-mule4-common-messaging-framework/
│   └── ...
└── shared-parent-poms/
    └── pom.xml
```

The discovery output gains two new fields under the per-app inventory:

- `resolved_parent_poms`: list of parent POMs in the inheritance chain, each with `groupId`, `artifactId`, `version`, `resolved` (boolean), `source_file`, and the `dependencies` declared there.
- `external_flow_refs`: `{"resolved": [...], "unresolved": [...]}`. Each resolved ref shows the library and source file/line of the matching definition.
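
A downstream script might use these two fields to flag apps that need follow-up. The sketch below assumes a per-app inventory already loaded into a `dict` (for example from the JSON output). `summarize_resolution` and the sample values are hypothetical, and because the shape of each flow-ref entry is not specified here, the code only counts them.

```python
def summarize_resolution(inventory: dict) -> dict:
    """Summarize parent-POM and flow-ref resolution for one app inventory."""
    parents = inventory.get("resolved_parent_poms", [])
    refs = inventory.get("external_flow_refs", {})
    return {
        # Parent POMs named by the app but not found under --common-libs.
        "unresolved_parent_poms": [
            f'{p["groupId"]}:{p["artifactId"]}:{p["version"]}'
            for p in parents
            if not p.get("resolved")
        ],
        "resolved_flow_refs": len(refs.get("resolved", [])),
        "unresolved_flow_refs": len(refs.get("unresolved", [])),
    }


# Hypothetical inventory fragment using the field names documented above.
SAMPLE_INVENTORY = {
    "resolved_parent_poms": [
        {"groupId": "com.example", "artifactId": "parent-a", "version": "1.0.0",
         "resolved": True, "source_file": "shared-parent-poms/pom.xml", "dependencies": []},
        {"groupId": "com.example", "artifactId": "parent-b", "version": "2.1.0",
         "resolved": False, "source_file": None, "dependencies": []},
    ],
    "external_flow_refs": {"resolved": [{}, {}], "unresolved": [{}]},
}

print(summarize_resolution(SAMPLE_INVENTORY))
```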

#### Connector discovery without a parent POM

When an app's `pom.xml` declares no dependencies (typical when everything is inherited from a parent POM that isn't on disk), the tool falls back to deriving connectors from the `xmlns:` declarations in the Mule XML files themselves. These entries are tagged with `notes: "derived from XML namespaces - parent POM not resolved"` and `source_file: "(xml-namespaces)"` so the source of each connector remains traceable. Pass `--common-libs` to replace this fallback with proper parent-POM resolution where possible.
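
One way such a namespace fallback could work is sketched below. It is an illustration under assumptions rather than the tool's implementation: `connectors_from_namespaces` is a hypothetical name, the connector name is taken to be the path after `/schema/mule/`, and the set of namespaces treated as non-connectors is a guess. Only the `notes` and `source_file` values come from the description above.

```python
import io
import xml.etree.ElementTree as ET

MULE_SCHEMA_PREFIX = "http://www.mulesoft.org/schema/mule/"
# Assumption: these namespaces belong to the runtime, not to connectors.
NON_CONNECTOR_NAMESPACES = {"core", "ee/core", "documentation"}


def connectors_from_namespaces(xml_text: str) -> list[dict]:
    """Derive connector entries from the xmlns declarations of a Mule XML file."""
    connectors: list[dict] = []
    seen: set[str] = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events yield (prefix, uri) for every namespace declaration.
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if not uri.startswith(MULE_SCHEMA_PREFIX):
            continue
        name = uri[len(MULE_SCHEMA_PREFIX):]
        if name in NON_CONNECTOR_NAMESPACES or name in seen:
            continue
        seen.add(name)
        connectors.append({
            "name": name,
            "source_file": "(xml-namespaces)",
            "notes": "derived from XML namespaces - parent POM not resolved",
        })
    return connectors


SAMPLE_XML = """\
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:http="http://www.mulesoft.org/schema/mule/http"
      xmlns:sqs="http://www.mulesoft.org/schema/mule/sqs"
      xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <flow name="main"/>
</mule>
"""

for connector in connectors_from_namespaces(SAMPLE_XML):
    print(connector["name"], connector["source_file"])
```

A namespace only shows that a connector is used, not which version, which is one reason parent-POM resolution is preferable when the POMs are available.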

#### Report contents

Each per-app report includes:
- Flow inventory with complexity levels (LOW / MEDIUM / HIGH / VERY_HIGH)
- DataWeave transformation analysis and classification
- HTTP listener and scheduled job detection
- Connector inventory with migration weights
- API specification detection (OpenAPI, WSDL)
- External dependency and out-of-scope item tracking
- AWS service usage (SQS, S3, DynamoDB)
- SOAP/WSDL service detection
- HTTP request-config inventory and connector authentication metadata (`request_configs`, `connector_auth`)
- Overall migration score (0–100) with recommendation (SIMPLE / MODERATE / COMPLEX / VERY_COMPLEX)

### `mule-scan-policies`

Scan Anypoint Platform for API policies on deployed applications. Requires the `anypoint` extra.

```bash
uv sync --extra anypoint

export ANYPOINT_CLIENT_ID=...
export ANYPOINT_CLIENT_SECRET=...
export ANYPOINT_ORG_ID=...
export ANYPOINT_ENV_ID=...

uv run mule-scan-policies
uv run mule-scan-policies --format json
```

### `mule-download-policies`

Download custom policies from Anypoint Exchange. Requires the `anypoint` extra.

```bash
export ANYPOINT_CLIENT_ID=...
export ANYPOINT_CLIENT_SECRET=...
export ANYPOINT_ORG_ID=...

uv run mule-download-policies --output-dir ./custom_policies
```

## Complexity Scoring

Each application receives a migration score from 0 to 100, where a higher score indicates a simpler migration:

| Score Range | Recommendation | Meaning |
|---|---|---|
| 75–100 | SMALL | Straightforward migration |
| 50–74 | MEDIUM | Some complexity, manageable |
| 25–49 | LARGE | Significant effort required |
| 0–24 | XLARGE | Major rework needed |

Deductions are applied across eight dimensions:

| Dimension | Max Deduction |
|---|---|
| Flow complexity | 30 pts |
| Transform complexity | 15 pts |
| Risk / out-of-scope items | 20 pts |
| Connector migration weight | 20 pts |
| WSDL / SOAP services | 10 pts |
| Scale (flow + component count) | 20 pts |
| Pattern complexity (scatter-gather, choices, batch, parallel-foreach, retries) | 15 pts |
| DataWeave volume | 15 pts |
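
The two tables combine roughly as in the sketch below. The caps and score bands are taken from the tables. How each per-dimension deduction is computed is not shown, and the aggregation itself is an assumption: each deduction is capped, the total is subtracted from 100, and the result is clamped at 0 (the caps add up to 145, so some floor is needed). `migration_score` and the dictionary keys are hypothetical names.

```python
# Per-dimension caps, copied from the table above.
MAX_DEDUCTIONS = {
    "flow_complexity": 30,
    "transform_complexity": 15,
    "risk_out_of_scope": 20,
    "connector_weight": 20,
    "wsdl_soap": 10,
    "scale": 20,
    "pattern_complexity": 15,
    "dataweave_volume": 15,
}

# (lowest score in band, recommendation), highest band first.
BANDS = [(75, "SMALL"), (50, "MEDIUM"), (25, "LARGE"), (0, "XLARGE")]


def migration_score(deductions: dict[str, int]) -> tuple[int, str]:
    """Cap each deduction, subtract the total from 100, and pick the band."""
    total = sum(
        min(max(deductions.get(name, 0), 0), cap)
        for name, cap in MAX_DEDUCTIONS.items()
    )
    score = max(0, 100 - total)  # assumption: the score never goes below 0
    label = next(label for floor, label in BANDS if score >= floor)
    return score, label


# A connector deduction of 40 is capped at 20, so the total deduction is 32.
print(migration_score({"flow_complexity": 12, "connector_weight": 40}))
```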

### Flow Complexity Thresholds

Flows are classified by component count. The boundaries are configurable via the `--flow-low`, `--flow-medium`, and `--flow-high` CLI flags:

| Components | Complexity |
|---|---|
| ≤ 6 | LOW |
| 7–14 | MEDIUM |
| 15–25 | HIGH |
| > 25 | VERY_HIGH |
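
Expressed as code, the classification is a simple threshold ladder. In this sketch `FlowThresholds` and `classify_flow` are hypothetical names, the defaults mirror the table, and each value is assumed to be an inclusive upper bound for its level (an assumption about how the `--flow-*` flags are interpreted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowThresholds:
    """Inclusive upper bounds for LOW, MEDIUM, and HIGH (defaults from the table)."""
    low: int = 6
    medium: int = 14
    high: int = 25


def classify_flow(component_count: int, thresholds: FlowThresholds = FlowThresholds()) -> str:
    """Return the complexity level for a flow with the given component count."""
    if component_count <= thresholds.low:
        return "LOW"
    if component_count <= thresholds.medium:
        return "MEDIUM"
    if component_count <= thresholds.high:
        return "HIGH"
    return "VERY_HIGH"


# Same flow, default thresholds versus the custom values from the CLI example.
print(classify_flow(16))
print(classify_flow(16, FlowThresholds(low=8, medium=18, high=30)))
```

Raising the thresholds moves borderline flows into a lower level, which can be useful for estates where large flows are mostly boilerplate.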

### DataWeave Classification

DataWeave transformations are classified by line count and function usage:

| Classification | Criteria |
|---|---|
| simple_mapping | ≤ 5 lines, no complex functions |
| field_level_logic | 6–20 lines, or uses routine functions (map, filter, pluck, etc.) |
| business_logic | > 20 lines, or uses complex functions (reduce, groupBy, flatMap, etc.) |
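
A rough sketch of these rules follows. Several details are assumptions: the function sets contain only the names listed in the table (the "etc." entries are unknown), blank lines are not counted, the most complex matching rule wins, and functions are detected by a plain identifier match, which would misfire on a field that happens to be called `map`. `classify_dataweave` is a hypothetical name.

```python
import re

# Only the functions named in the table; the full lists are not shown there.
ROUTINE_FUNCTIONS = {"map", "filter", "pluck"}
COMPLEX_FUNCTIONS = {"reduce", "groupBy", "flatMap"}


def classify_dataweave(script: str) -> str:
    """Classify a DataWeave script by line count and function usage."""
    line_count = sum(1 for line in script.splitlines() if line.strip())
    identifiers = set(re.findall(r"[A-Za-z_]\w*", script))
    if line_count > 20 or identifiers & COMPLEX_FUNCTIONS:
        return "business_logic"
    if line_count > 5 or identifiers & ROUTINE_FUNCTIONS:
        return "field_level_logic"
    return "simple_mapping"


HEADER = "%dw 2.0\noutput application/json\n---\n"

print(classify_dataweave(HEADER + "{ id: payload.id }"))
print(classify_dataweave(HEADER + "payload.items map (item) -> item.name"))
print(classify_dataweave(HEADER + "payload groupBy $.region"))
```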

## Package Structure

```
src/mule_discovery/
├── __init__.py                # Main discover_mule_app() orchestrator
├── constants.py               # XML namespaces, element classifications, connector weights
├── xml_helpers.py             # XML utility functions
├── models/                    # Data models (dataclasses)
│   ├── result.py              # DiscoveryResult (top-level container)
│   ├── flows.py               # FlowInfo, BatchInfo, ChoiceInfo, ScatterGatherInfo, ...
│   ├── connectors.py          # ConnectorInfo, SpringDependency
│   ├── dataweave.py           # DataWeaveInfo
│   ├── listeners.py           # HttpListenerInfo, ScheduledJobInfo
│   ├── dependencies.py        # ExternalDependencyInfo, SourceFiles, OutOfScopeItem
│   ├── schemas.py             # ApiSpecInfo (OpenAPI, WSDL)
│   └── scoring.py             # ComplexityThresholds, ScoreResult
├── parsers/                   # File IO → models
│   ├── file_discovery.py      # find_mule_apps(), find_mule_xml_files()
│   ├── mule_xml.py            # Mule XML parsing (flows, listeners, jobs)
│   ├── pom.py                 # POM parsing (app name, version, connectors, parent-POM chain resolution)
│   ├── common_libs.py         # Indexing of customer-supplied common-libs (flows + parent POMs)
│   ├── http_auth.py           # HTTP auth config extraction
│   ├── dataweave.py           # DataWeave script parsing
│   ├── soap.py                # SOAP/WSDL service detection
│   ├── aws.py                 # AWS service detection (SQS, S3, DynamoDB)
│   ├── openapi.py             # OpenAPI spec detection
│   └── wsdl.py                # WSDL parsing utilities
├── analysis/                  # Models → models (pure functions)
│   ├── classification.py      # Flow type and source category constants
│   ├── complexity.py          # Flow and DataWeave complexity assignment
│   ├── patterns.py            # Pattern detection (async, scatter-gather, choice, ...)
│   ├── scoring.py             # Migration score calculation (0–100)
│   └── dependencies.py        # External dependency and out-of-scope extraction
├── output/                    # Models → formatted strings
│   ├── yaml_output.py         # YAML
│   ├── json_output.py         # JSON
│   └── text_output.py         # Human-readable text summary
├── anypoint/                  # Anypoint Platform integration (optional)
│   ├── policies.py            # Policy scanning
│   └── exchange.py            # Custom policy download
└── cli/                       # CLI entry points (thin wrappers)
    ├── discover.py            # mule-discover
    ├── scan_policies.py       # mule-scan-policies
    └── download_policies.py   # mule-download-policies
```

### Design Principles

- **No function does both IO and computation.** Parsers read files → return models. Analysis takes models → returns models. Output takes models → returns strings.
- **All data models are plain dataclasses** with typed fields — no methods with side effects.
- **All analysis functions are standalone** — no class methods, no inheritance.
- **Each output format is a separate module.**
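
The layering can be illustrated with a toy pipeline. Everything in this sketch is hypothetical (`FlowSummary`, `parse_flows`, `classify`, `to_json`, and the `MEDIUM_OR_ABOVE` label) and is far simpler than the package's real models. It only shows the shape: a parser that does IO, a pure analysis function, and a pure formatter.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class FlowSummary:
    """Plain data model: typed fields, no behaviour."""
    name: str
    component_count: int
    complexity: str = "UNCLASSIFIED"


def parse_flows(path: Path) -> list[FlowSummary]:
    """Parser layer: reads a file and returns models, with no analysis."""
    root = ET.parse(path).getroot()
    return [
        FlowSummary(name=flow.attrib["name"], component_count=len(list(flow)))
        for flow in root.iter()
        if flow.tag.rsplit("}", 1)[-1] == "flow"
    ]


def classify(flows: list[FlowSummary], low: int = 6) -> list[FlowSummary]:
    """Analysis layer: pure function from models to new models."""
    return [
        replace(f, complexity="LOW" if f.component_count <= low else "MEDIUM_OR_ABOVE")
        for f in flows
    ]


def to_json(flows: list[FlowSummary]) -> str:
    """Output layer: models to a formatted string, with no file writes."""
    return json.dumps([asdict(f) for f in flows], indent=2)
```

Because only `parse_flows` touches the filesystem, `classify` and `to_json` can be tested with hand-built `FlowSummary` objects and no fixtures on disk.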

## Testing

```bash
make test
```

Or directly:

```bash
uv run --extra dev python -m pytest
```

Coverage is enforced at 70% (branch coverage) via `pyproject.toml`.
