Metadata-Version: 2.4
Name: ptrepo
Version: 0.0.3
Summary: Exposed repository metadata testing tool
Home-page: https://www.penterep.com/
Author: Penterep
Author-email: info@penterep.com
License: GPLv3
Project-URL: homepage, https://www.penterep.com/
Project-URL: repository, https://github.com/penterep/ptrepo
Project-URL: tracker, https://github.com/penterep/ptrepo/issues
Project-URL: changelog, https://github.com/penterep/ptrepo/blob/main/CHANGELOG.md
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Environment :: Console
Classifier: Topic :: Security
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ptlibs<2,>=1.0.33
Requires-Dist: requests<3,>=2.31
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

[![penterepTools](https://www.penterep.com/external/penterepToolsLogo.png)](https://www.penterep.com/)

## PTREPO - Exposed repository testing tool

ptrepo is a Penterep tool for testing exposed source-code repositories on web servers.
The planned scope covers repository discovery, best-effort repository metadata/content download, commit/revision listing where practical, and native secret scanning of recovered content.

## Current MVP status

The current version supports discovery; best-effort content download for Git and SVN; metadata download for Mercurial and Bazaar; listing of reachable or dangling Git commits; reporting of observed SVN revisions; reporting of observed Mercurial/Bazaar counts from metadata where possible; and native secret scanning of recovered Git/SVN content, Git history, and recovered Mercurial/Bazaar metadata.
Terminal output shows concise history/count summaries; JSON output keeps the detailed recovery source in `historyCoverage`.

Extended SCM types such as Darcs, Fossil, CVSweb, and RCS are supported for discovery only in this MVP. If `--download`, `--commits`, or `--secrets` is requested for a confirmed but unsupported SCM type, ptrepo reports an explicit unsupported warning instead of printing a false commit count or an ambiguous empty result.

Implemented:

- URL normalization
- `.git`, `.svn`, `_svn`, `.bzr`, `.hg`, `cgi-bin/cvsweb.cgi`, `_darcs`, `_fossil_`, Fossil checkout markers, and RCS candidate generation
- HTTP probing
- discovery classification
- Git recovery for metadata, refs, reflogs, loose objects, pack files, files recoverable from `.git/index`, and files exportable from the reconstructed Git object database
- SVN recovery for `entries`, `text-base`, `wc.db`, `pristine`, and recovered file contents
- Mercurial and Bazaar/Breezy metadata recovery for known metadata paths
- Git validation and history reporting through a defensive low-level `git` backend
- SVN observed revision reporting from recovered `entries` and `wc.db` metadata
- Mercurial changeset count reporting from `.hg/store/00changelog.i` when the recovered revlog index is usable
- Bazaar/Breezy revision count reporting from `.bzr/branch/last-revision` when available
- native Git commit patch/message, recovered Git/SVN file, and recovered Mercurial/Bazaar metadata secret scanning with built-in rules, redaction, fingerprints, and coverage reporting
- human and JSON output
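
As an illustration of the URL normalization and candidate generation steps listed above, the following minimal sketch shows one way marker URLs could be built from a target URL. The `MARKERS` table and both function names are assumptions made for this example; they are not ptrepo's internal API, and the real candidate list is larger.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical marker paths per repository type (assumption for this sketch).
MARKERS = {
    "git": [".git/HEAD"],
    "svn": [".svn/entries", "_svn/entries"],
    "hg": [".hg/requires"],
    "bzr": [".bzr/branch-format"],
}


def normalize_base(url: str) -> str:
    """Drop query and fragment and make sure the path ends with a slash."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def candidates(url: str, types=("git", "svn", "hg", "bzr")) -> list[str]:
    """Return one candidate URL per marker path of the requested types."""
    base = normalize_base(url)
    return [base + marker for repo_type in types for marker in MARKERS[repo_type]]
```

Treating the target as a directory (trailing slash) keeps candidates such as `.git/HEAD` under the tested path instead of replacing its last segment.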

Git download currently saves:

- `.git/HEAD`
- `.git/config`
- `.git/index`
- `.git/packed-refs`
- `.git/info/refs`
- `.git/objects/info/packs`
- `.git/logs/HEAD` and discovered/common ref logs
- branch/tag ref files where discovered
- loose objects discovered from refs, reflogs, commits, trees, and `.git/index`
- pack files listed in `.git/objects/info/packs`
- locally reconstructed pack indexes where a `.pack` file is recovered but the matching `.idx` file is unavailable
- recovered blob contents under `git/files/`
- files exported from reachable or dangling commit trees when the local Git object database is usable
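
Loose objects are what make recovery from refs, reflogs, and `.git/index` possible: an object ID maps directly to a path under `.git/objects/`, and each file is a zlib stream holding a `<type> <size>` header, a NUL byte, and the body. The sketch below illustrates that standard Git layout for SHA-1 repositories; the function names are hypothetical and this is not ptrepo's own backend.

```python
import hashlib
import zlib


def loose_object_path(object_id: str) -> str:
    """Map a hex object ID to its loose-object path inside the repository."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def parse_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, separator, body = data.partition(b"\x00")
    kind, _, size = header.partition(b" ")
    if not separator or int(size) != len(body):
        raise ValueError("malformed or truncated loose object")
    return kind.decode("ascii"), body


def object_id(kind: str, body: bytes) -> str:
    """Recompute the SHA-1 object ID so a download can be verified."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()
```

Recomputing the ID after download is a cheap way to reject error pages or truncated responses that a server returned with a success status.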

SVN download currently saves:

- `.svn/entries` or `_svn/entries`
- `.svn/wc.db` or `_svn/wc.db`
- old working-copy `text-base` files where discoverable
- recursive old working-copy `entries`/`text-base` files where subdirectory metadata is exposed
- new working-copy `pristine` files where discoverable from `wc.db`
- recovered file contents under `svn/files/`
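
For newer working copies, `wc.db` is an SQLite database whose `NODES` table records a `$sha1$`-prefixed checksum for each file, and the matching pristine copy is stored as `pristine/<first two hex digits>/<sha1>.svn-base`. The following is a minimal sketch under that assumption; `pristine_paths` is a hypothetical helper, not ptrepo's implementation.

```python
import sqlite3

SHA1_PREFIX = "$sha1$"


def pristine_paths(conn: sqlite3.Connection, admin_dir: str = ".svn") -> dict[str, str]:
    """Map each versioned file path to the pristine path that holds its content."""
    rows = conn.execute(
        "SELECT local_relpath, checksum FROM NODES "
        "WHERE kind = 'file' AND checksum IS NOT NULL"
    )
    result = {}
    for relpath, checksum in rows:
        if not checksum.startswith(SHA1_PREFIX):
            continue  # skip checksum kinds this sketch does not handle
        digest = checksum[len(SHA1_PREFIX):]
        result[relpath] = f"{admin_dir}/pristine/{digest[:2]}/{digest}.svn-base"
    return result
```

A recovered database could be opened with `sqlite3.connect("wc.db")` and passed to the helper; pass `admin_dir="_svn"` when the administrative directory uses that name.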

Mercurial/Bazaar download currently saves selected metadata only. It does not reconstruct full working trees or history:

- `.hg/requires`
- `.hg/hgrc`
- `.hg/dirstate`
- `.hg/store/00changelog.i`
- `.hg/store/00manifest.i`
- `.bzr/branch-format`
- `.bzr/branch/format`
- `.bzr/branch/last-revision`
- `.bzr/branch/branch.conf`
- `.bzr/repository/format`
- `.bzr/repository/pack-names`
- `.bzr/checkout/format`
- `.bzr/checkout/dirstate`

When `--download` is used, ptrepo also reports available Git commit counts, observed SVN revision counts, and observed Mercurial/Bazaar counts without printing detailed history entries. Use `--commits` for bounded detailed history output.

These planned options are accepted by the CLI contract but intentionally fail in the current MVP slice:
- `-r/--redirects`
- `-C/--cache`

## Installation

```
pip install ptrepo
```

## Adding to PATH

If your terminal cannot find the `ptrepo` command after installation, the directory that holds the installed script is probably not on your PATH. Depending on your shell, run the following commands to add it:

For Bash Users

```bash
echo "export PATH=\"`python3 -m site --user-base`/bin:\$PATH\"" >> ~/.bashrc
source ~/.bashrc
```

For ZSH Users

```bash
echo "export PATH=\"`python3 -m site --user-base`/bin:\$PATH\"" >> ~/.zshrc
source ~/.zshrc
```

## Usage examples

```
ptrepo -u https://www.example.com/
ptrepo -u https://www.example.com/plugins/mpdf
ptrepo -u https://www.example.com/ -t git svn bzr hg cvs darcs fossil rcs
ptrepo -f urls.txt -w repository_paths.txt
ptrepo -u https://www.example.com/ --download
ptrepo -u https://www.example.com/ --download ~/Download/repo
ptrepo -u https://www.example.com/ --commits
ptrepo -u https://www.example.com/ --commits --commit-limit 20
ptrepo -u https://www.example.com/ --download --commits
ptrepo -u https://www.example.com/ --secrets
ptrepo -u https://www.example.com/ --max-response-bytes 32768 -j
```

## Options

```
   -u   --url           <url>           Test specified URL
   -f   --file          <file>          Load URLs from file
   -w   --wordlist      <file>          Load additional supported repository path candidates from file
   -t   --repo-type     <type>          Repository type(s) to test: git, svn, bzr, hg, cvs, darcs, fossil, rcs
        --download      [directory]     Download recoverable repository content/metadata; defaults to current directory
        --commits                       Temporarily recover metadata and list Git commits or observed SVN/Hg/Bzr counts
        --commit-limit  <count>         Maximum commit/revision entries to print and Git commits to scan; 0 disables both
        --secrets                       Temporarily recover supported repository content/metadata and scan for secrets
        --secrets-rules <file>          Load additional JSON secret rules
        --secrets-baseline <file>       Ignore previously reported secret finding fingerprints
        --secrets-mode  <mode>          Secret scan mode: auto, files, or history
        --entropy                       Enable entropy checks for generic secret rules
        --no-entropy                    Disable entropy checks for generic secret rules
        --allowlist     <file>          Load JSON secret allowlist
        --max-secret-file-size <bytes>  Maximum recovered file size to scan for secrets
   -H   --headers       <header:value>  Set custom header(s)
   -T   --timeout       <timeout>       Set timeout
        --max-response-bytes <bytes>    Maximum bytes to read from each discovery response
        --max-download-bytes <bytes>    Maximum bytes to write for each downloaded file
   -a   --user-agent    <user-agent>    Set User-Agent header
   -c   --cookie        <cookie=value>  Set cookie(s)
   -p   --proxy         <proxy>         Set proxy (e.g. http://127.0.0.1:8080)
   -v   --version                       Show script version and exit
   -h   --help                          Show this help message and exit
   -j   --json                          Output JSON only, suppresses banner and human output
```
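
To illustrate what a limit such as `--max-response-bytes` implies, the sketch below reads a streamed body chunk by chunk and stops once a byte cap is reached. `read_capped` is a hypothetical helper written for this example; it does not describe how ptrepo itself enforces the limit.

```python
from collections.abc import Iterable


def read_capped(chunks: Iterable[bytes], limit: int) -> tuple[bytes, bool]:
    """Return at most `limit` bytes and a flag telling whether data was cut off."""
    buffer = bytearray()
    for chunk in chunks:
        remaining = limit - len(buffer)
        if len(chunk) > remaining:
            buffer += chunk[:remaining]
            return bytes(buffer), True  # stop pulling further chunks
        buffer += chunk
    return bytes(buffer), False
```

With `requests`, the chunks could come from `requests.get(url, stream=True, timeout=10).iter_content(chunk_size=4096)`, so a large response is not read past the cap.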

## Planned options

These options are accepted by the CLI contract but intentionally fail in the current MVP slice.

```
   -r   --redirects                     Planned, not implemented in current MVP slice
   -C   --cache                         Planned, not implemented in current MVP slice
```

## Secret rule files

`--secrets-rules` loads additional JSON rules. The file may contain either a list of rules or an object with a `rules` list.
Custom rules must include at least one keyword so the scanner can skip regex evaluation on unrelated lines.
Custom rule regexes are length-limited, `secret_group` must reference an existing capture group, and `entropy_threshold` must be between `0.0` and `8.0`.

```json
{
  "rules": [
    {
      "id": "custom-demo-token",
      "name": "Custom demo token",
      "description": "Project-specific token",
      "regex": "(DEMO_[A-Z0-9]{12})",
      "secret_group": 1,
      "keywords": ["DEMO_"],
      "severity": "high",
      "confidence": "medium",
      "allowlist": {
        "patterns": ["DEMO_PUBLIC_FIXTURE"],
        "regexes": ["^DEMO_TEST_[A-Z0-9]+$"]
      }
    }
  ]
}
```
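
The constraints described above can be checked before a rule file is passed to ptrepo. The following is a minimal sketch, not ptrepo's validator: `validate_rule` is a hypothetical helper, and `MAX_REGEX_LENGTH` is an assumed value because the actual length limit is not stated here.

```python
import re

MAX_REGEX_LENGTH = 512  # assumption: the real limit is not documented here


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems found in one custom rule (empty if none)."""
    errors = []
    keywords = rule.get("keywords")
    if not isinstance(keywords, list) or not any(
        isinstance(keyword, str) and keyword for keyword in keywords
    ):
        errors.append("at least one keyword is required")

    pattern = rule.get("regex")
    if not isinstance(pattern, str) or not pattern:
        return errors + ["regex is required"]
    if len(pattern) > MAX_REGEX_LENGTH:
        return errors + ["regex exceeds the length limit"]
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return errors + [f"regex does not compile: {exc}"]

    group = rule.get("secret_group", 0)
    if not isinstance(group, int) or not 0 <= group <= compiled.groups:
        errors.append("secret_group must reference an existing capture group")

    threshold = rule.get("entropy_threshold")
    if threshold is not None and not (
        isinstance(threshold, (int, float)) and 0.0 <= threshold <= 8.0
    ):
        errors.append("entropy_threshold must be between 0.0 and 8.0")
    return errors
```

Running each entry of the `rules` list through such a check reports every problem in a rule at once rather than stopping at the first one.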

`--allowlist` loads JSON allowlists:

```json
{
  "patterns": ["known-fixture-value"],
  "regexes": ["^example_[A-Za-z0-9]+$"]
}
```

Allowlist regexes are length-limited before compilation.

`--secrets-baseline` loads previously reported fingerprints and suppresses
matching findings. It accepts a JSON list of fingerprint strings, an object
with a `fingerprints` list, or a previous PTREPO-style JSON report that
contains nested `fingerprint` fields. Suppressed findings are counted as
ignored baseline findings in human and JSON output.
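
The accepted baseline shapes can be handled by a single recursive walk. The sketch below is one possible approach, not ptrepo's loader: `collect_fingerprints` is a hypothetical name, and the nested report layout used in the usage note is invented for illustration.

```python
import json


def collect_fingerprints(document) -> set[str]:
    """Gather fingerprint strings from any of the accepted baseline shapes."""
    found: set[str] = set()

    def walk(node, in_fingerprint_list=False):
        if isinstance(node, str):
            if in_fingerprint_list:
                found.add(node)
        elif isinstance(node, list):
            for item in node:
                walk(item, in_fingerprint_list)
        elif isinstance(node, dict):
            for key, value in node.items():
                if key == "fingerprint" and isinstance(value, str):
                    found.add(value)
                else:
                    walk(value, key == "fingerprints")

    # A top-level list is treated as a plain list of fingerprint strings.
    walk(document, in_fingerprint_list=isinstance(document, list))
    return found
```

For example, `collect_fingerprints(json.loads('{"results": [{"fingerprint": "abc"}]}'))` yields a set containing `abc`; findings whose fingerprint is in that set would then be counted as ignored.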

When Git history contains more commits than `--commit-limit`, history-aware
secret scanning reports partial coverage instead of implying that the whole
history was scanned. Setting `--commit-limit 0` disables detailed history
listing and Git commit secret scanning; file-mode secret scanning can still run.
Mercurial and Bazaar/Breezy secret scanning is metadata-only in this MVP; it
does not reconstruct or scan each historical changeset/revision.

Built-in secret rules cover common provider and generic credential patterns,
including private key markers, AWS `AKIA`/`ASIA` access key IDs, GitHub tokens,
GitLab access/build/deploy/runner/OAuth token prefixes, Slack tokens and
incoming webhooks, Stripe secret/restricted/webhook keys, Google API keys and
OAuth client secrets, Google service-account JSON, database URLs with
credentials, URLs with embedded credentials, JWT-like tokens, generic
password/token/API key assignments, and conservative base64/hex decoded
credential assignments. Git history scanning checks added and deleted patch
lines plus commit message text. Recovered-file scanning skips oversized files
and files that look binary based on NUL bytes or a high ratio of binary control
bytes.
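
The binary check described above can be sketched as follows. The sample size and the control-byte ratio are assumptions chosen for this example, and `looks_binary` is a hypothetical name; ptrepo's actual thresholds are not stated here.

```python
TEXT_CONTROL_BYTES = {9, 10, 13}  # tab, line feed, carriage return


def looks_binary(data: bytes, sample_size: int = 8192, max_control_ratio: float = 0.30) -> bool:
    """Guess whether content is binary from a leading sample of its bytes."""
    sample = data[:sample_size]
    if not sample:
        return False
    if b"\x00" in sample:
        return True  # NUL bytes almost never occur in text files
    control = sum(
        1 for byte in sample if byte < 32 and byte not in TEXT_CONTROL_BYTES
    )
    return control / len(sample) > max_control_ratio
```

Inspecting only a leading sample keeps the check cheap for large recovered files, at the cost of missing binary data that appears later in the file.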

## Dependencies

```
ptlibs>=1.0.33,<2
requests>=2.31,<3
```

## License

Copyright (c) 2026 Penterep Security s.r.o.

ptrepo is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

ptrepo is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with ptrepo. If not, see https://www.gnu.org/licenses/.

## Warning

You may run this tool only against websites that you have been given
permission to test. We do not accept any responsibility for damage or
harm that this application causes to your computer or your network.
Penterep is not responsible for any illegal or malicious use of this
code. Be Ethical!
