Metadata-Version: 2.4
Name: vojtamaur
Version: 0.1.4
Summary: CLI access to ALL_POSTS.txt, ARCHIVE.txt and documentation from vojtamaur.cz
Author: Vojta Maur
License: MIT
Project-URL: Homepage, https://vojtamaur.cz/
Project-URL: Documentation, https://vojtamaur.cz/documentation/
Project-URL: Archive, https://vojtamaur.cz/ARCHIVE.txt
Project-URL: Source, https://github.com/VojtaMaur/vojtamaur-python
Keywords: archive,cli,plaintext,static-site,vojtamaur
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# vojtamaur

A small command-line tool, installable with `pip`, for working with selected public artifacts of [vojtamaur.cz](https://vojtamaur.cz/):

- `ALL_POSTS.txt`
- `ARCHIVE.txt`
- the documentation page at `/documentation/`
- `kurt-godel-rat.jpg`

The tool is not a website parser, crawler, CMS, sync daemon, image processor, or background service. It works mainly with published plaintext/HTML/image endpoints, keeps a local cache of the last successfully loaded artifacts, and includes embedded package snapshots of `ALL_POSTS.txt`, `ARCHIVE.txt`, and `kurt-godel-rat.jpg` as final fallbacks.

## Installation

From PyPI:

```bash
python -m pip install vojtamaur
```

From a local repository checkout:

```bash
python -m pip install .
```

For development:

```bash
python -m pip install -e .
```

From GitHub:

```bash
python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git
```

From a specific Git tag:

```bash
python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git@v0.1.4
```

## Quick usage

```bash
vojtamaur --help
vojtamaur posts
vojtamaur posts --save
vojtamaur archive
vojtamaur archive --save
vojtamaur docs
vojtamaur docs --save
vojtamaur rat
vojtamaur rat --save
vojtamaur rat --offline
vojtamaur grep metaprogram
vojtamaur search-url archive.today
vojtamaur stats
vojtamaur head 40
vojtamaur random
vojtamaur random --print-only
vojtamaur status
vojtamaur status --limit 10
vojtamaur verify
vojtamaur open site
vojtamaur open docs
vojtamaur open rat
vojtamaur open rat --offline
vojtamaur open archive-link 1
```

## Artifacts

The CLI currently supports these artifact kinds:

| Command / kind | Online path | Embedded in package | Notes |
| --- | --- | --- | --- |
| `posts` | `/ALL_POSTS.txt` | yes | Plain-text export of all posts. |
| `archive` | `/ARCHIVE.txt` | yes | Archive map and external preservation links. |
| `docs` | `/documentation/` | no | Documentation HTML; cached after a successful fetch. |
| `rat` | `/images/kurt-godel-rat.jpg` | yes | Binary image artifact. Saved as a file, not printed to the terminal. |

## Source strategy

For each artifact, the tool tries to use the most current available source first.

For `posts`, `archive`, and `rat`, the fallback order is:

1. primary URL from `SITE_SOURCES`
2. additional live fallback URLs from `SITE_SOURCES`, in order
3. local cache
4. embedded package snapshot

For `docs`, the fallback order is:

1. primary URL from `SITE_SOURCES`
2. additional live fallback URLs from `SITE_SOURCES`, in order
3. local cache

`docs` is not embedded in the package. The embedded snapshots are intended for compact, directly reusable artifacts, not for every page of the website.

If all available sources fail, the command exits with an error.
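The fallback chain above can be sketched as a loop over source tiers. This is a minimal illustration under stated assumptions, not the package's actual code: `fetch_with_fallback`, `AllSourcesFailed`, and the callables `fetch_url`, `read_cache`, and `read_embedded` are hypothetical names, passed in as parameters so the sketch stays network-free.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


class AllSourcesFailed(Exception):
    """Raised when no source in the chain produced the artifact (hypothetical)."""


def fetch_with_fallback(
    urls: Iterable[str],
    fetch_url: Callable[[str], bytes],
    read_cache: Callable[[], bytes | None],
    read_embedded: Callable[[], bytes | None] | None = None,
) -> tuple[str, bytes]:
    """Return (source, data) from the first source that works.

    Live URLs are tried in SITE_SOURCES priority order, then the local cache,
    then the embedded snapshot. Pass read_embedded=None for artifacts such as
    docs that have no embedded copy.
    """
    errors: list[str] = []
    for url in urls:
        try:
            return url, fetch_url(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSError subclasses
            errors.append(f"{url}: {exc}")
    cached = read_cache()
    if cached is not None:
        return "cache", cached
    if read_embedded is not None:
        embedded = read_embedded()
        if embedded is not None:
            return "embedded", embedded
    raise AllSourcesFailed("; ".join(errors) or "no live sources configured")
```

A real implementation would presumably also write a successful live result back to the cache, which is how the cache ends up holding the last successfully loaded copy.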

### Adding another live fallback

Live deployment fallbacks are configured in `src/vojtamaur/constants.py` through `SITE_SOURCES`:

```python
SITE_SOURCES: list[tuple[str, str]] = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
    ("github_pages", "https://vojtamaur.github.io/vojtamaur-web"),
    (
        "ardrive",
        "https://db6beycsnxhli2vxsahgn3ajpsi6qv5alttkr4d3sfwrj7uurqfq.ardrive.net/GHwSYFJtzrRqt5AOZuwJfJHoV6Bc5qjwe5FtFP6UjAs",
    ),
]
```

To add another deployment, append another `(label, base_url)` tuple. Use a base URL without a trailing slash. The `posts`, `archive`, `docs`, and `rat` endpoint URLs are generated from this list. The order is the priority order used by the fetch logic.
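A minimal sketch of how per-artifact URLs could be derived from `SITE_SOURCES`. The helper `artifact_urls` and the `ARTIFACT_PATHS` mapping are hypothetical; the relative paths are copied from the Artifacts table, and the source list is shortened to two entries for brevity.

```python
# Relative artifact paths, as listed in the Artifacts table.
ARTIFACT_PATHS: dict[str, str] = {
    "posts": "/ALL_POSTS.txt",
    "archive": "/ARCHIVE.txt",
    "docs": "/documentation/",
    "rat": "/images/kurt-godel-rat.jpg",
}

# Shortened copy of SITE_SOURCES; order is the fetch priority.
SITE_SOURCES: list[tuple[str, str]] = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
]


def artifact_urls(kind: str, sources: list[tuple[str, str]] = SITE_SOURCES) -> list[str]:
    """Return candidate URLs for one artifact kind, in priority order (hypothetical helper)."""
    path = ARTIFACT_PATHS[kind]
    # rstrip guards against a base URL that was configured with a trailing slash anyway.
    return [base.rstrip("/") + path for _label, base in sources]
```

Because every deployment contributes the same relative path, a new `(label, base_url)` tuple only works if that deployment really serves all four paths.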

Only add live deployments that expose the same relative artifact paths. Repository browsers, catalog records, web archives, and one-off snapshots belong in `ARCHIVE.txt`, not in `SITE_SOURCES`.

## Cache

The cache location is platform-specific:

- Windows: `%LOCALAPPDATA%/vojtamaur/`
- macOS: `~/Library/Caches/vojtamaur/`
- Linux/Unix: `$XDG_CACHE_HOME/vojtamaur/` or `~/.cache/vojtamaur/`

Override on Windows CMD:

```bat
set VOJTAMAUR_CACHE_DIR=C:\temp\vojtamaur-cache
```

Override on PowerShell:

```powershell
$env:VOJTAMAUR_CACHE_DIR = "C:\temp\vojtamaur-cache"
```

Override on Unix-like systems:

```bash
export VOJTAMAUR_CACHE_DIR=/tmp/vojtamaur-cache
```
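The lookup order can be sketched as a small resolver. This mirrors the locations listed above but is an illustration, not the package's code: the function name `cache_dir`, the handling of empty variables, and the default used when `LOCALAPPDATA` is unset are assumptions.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping
from pathlib import Path


def cache_dir(
    env: Mapping[str, str] | None = None,
    platform: str | None = None,
    home: Path | None = None,
) -> Path:
    """Resolve the cache directory (hypothetical helper; parameters allow testing)."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else home

    override = env.get("VOJTAMAUR_CACHE_DIR")
    if override:  # explicit override wins (assumption: an empty value is ignored)
        return Path(override)
    if platform == "win32":
        local = env.get("LOCALAPPDATA")
        # Assumption: fall back to the conventional location when LOCALAPPDATA is unset.
        root = Path(local) if local else home / "AppData" / "Local"
    elif platform == "darwin":
        root = home / "Library" / "Caches"
    else:
        xdg = env.get("XDG_CACHE_HOME")
        root = Path(xdg) if xdg else home / ".cache"
    return root / "vojtamaur"
```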

## Offline mode

```bash
vojtamaur posts --offline
vojtamaur archive --offline
vojtamaur docs --offline
vojtamaur rat --offline
vojtamaur stats --offline
vojtamaur open rat --offline
```

Or globally through the environment:

```bash
export VOJTAMAUR_OFFLINE=1
```

In offline mode, `posts`, `archive`, and `rat` use the local cache first and the embedded package snapshot if no cache exists. `docs` uses only the local cache because documentation HTML is not embedded.
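A sketch of how the offline switch could change the source order. The helper names `offline_enabled` and `source_order` are hypothetical, and the set of accepted "on" values for `VOJTAMAUR_OFFLINE` is an assumption; the README only shows `1`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumption: which strings count as "on" is a guess; only "1" is documented.
_TRUTHY = {"1", "true", "yes", "on"}

# Artifact kinds that ship an embedded package snapshot (docs does not).
_EMBEDDED_KINDS = {"posts", "archive", "rat"}


def offline_enabled(flag: bool = False, env: Mapping[str, str] | None = None) -> bool:
    """True if --offline was passed or VOJTAMAUR_OFFLINE is set to an "on" value."""
    env = os.environ if env is None else env
    return flag or env.get("VOJTAMAUR_OFFLINE", "").strip().lower() in _TRUTHY


def source_order(kind: str, offline: bool) -> list[str]:
    """Names of the source tiers tried for an artifact, in order."""
    tiers = [] if offline else ["live"]
    tiers.append("cache")
    if kind in _EMBEDDED_KINDS:
        tiers.append("embedded")
    return tiers
```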

## Timeout

The default network timeout is 3 seconds.

```bash
vojtamaur status --timeout 5
```

Or:

```bash
export VOJTAMAUR_TIMEOUT=5
```
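One plausible way to combine the flag, the environment variable, and the 3-second default. The precedence (flag over environment over default) and the handling of invalid values are assumptions, and `resolve_timeout` is a hypothetical name.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

DEFAULT_TIMEOUT = 3.0  # seconds, as documented


def resolve_timeout(cli_value: float | None = None, env: Mapping[str, str] | None = None) -> float:
    """Pick the timeout: --timeout, then VOJTAMAUR_TIMEOUT, then the default."""
    if cli_value is not None:
        return cli_value
    env = os.environ if env is None else env
    raw = env.get("VOJTAMAUR_TIMEOUT")
    if raw:
        try:
            value = float(raw)
        except ValueError:
            return DEFAULT_TIMEOUT  # assumption: unparsable values are ignored
        if value > 0:
            return value
    return DEFAULT_TIMEOUT
```

The resolved value would then be passed to each request, for example `urllib.request.urlopen(url, timeout=resolve_timeout())`.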

## Commands

### `posts`

Prints or saves `ALL_POSTS.txt`.

```bash
vojtamaur posts
vojtamaur posts --save
vojtamaur posts --save my_copy.txt
```

### `archive`

Prints or saves `ARCHIVE.txt`.

```bash
vojtamaur archive
vojtamaur archive --save
vojtamaur archive --save my_archive.txt
```

### `docs`

Downloads the documentation page from `/documentation/`. By default, it prints a simple plaintext extraction from the HTML. With `--raw`, it prints the original HTML. With `--save`, it saves the raw HTML.

```bash
vojtamaur docs
vojtamaur docs --raw
vojtamaur docs --save
```
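The README does not say how the plaintext extraction works. Since the package has no runtime dependencies outside the standard library, one plausible approach is `html.parser.HTMLParser`. The sketch below is illustrative only; `html_to_text` and the tag sets are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    SKIP = {"script", "style", "head"}  # contents never shown as text
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "pre"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block elements start on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Very simple HTML-to-text: drop tags, collapse whitespace, drop empty lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```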

### `rat`

Downloads `kurt-godel-rat.jpg`.

Because this is a binary artifact, the command saves the image file instead of printing raw JPEG bytes to the terminal.

```bash
vojtamaur rat
vojtamaur rat --save
vojtamaur rat --save my_rat.jpg
vojtamaur rat --offline
vojtamaur kurt-godel-rat --offline
```

### `grep`

Searches `ALL_POSTS.txt` as plain text.

```bash
vojtamaur grep DullGPT
vojtamaur grep "Boltzmannovy mozky" --context 2
vojtamaur grep Metaweb --case-sensitive
```
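A minimal sketch of plain-text search with context lines. It assumes matching is a case-insensitive substring test unless `--case-sensitive` is given (the existence of that flag suggests this default, but the README does not state it). The `grep_lines` name and the grep-style `N:` / `N-` output format are hypothetical.

```python
def grep_lines(text: str, pattern: str, context: int = 0, case_sensitive: bool = False) -> list[str]:
    """Return matching lines plus context, prefixed with 1-based line numbers."""
    lines = text.splitlines()
    needle = pattern if case_sensitive else pattern.casefold()
    hits = [
        i for i, line in enumerate(lines)
        if needle in (line if case_sensitive else line.casefold())
    ]
    keep: set[int] = set()
    for i in hits:
        keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    hit_set = set(hits)
    # grep-style markers: ':' for matching lines, '-' for context lines
    return [f"{i + 1}{':' if i in hit_set else '-'}{lines[i]}" for i in sorted(keep)]
```

`casefold()` rather than `lower()` keeps case-insensitive matching correct for Czech text with diacritics.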

### `search-url`

Searches URLs found in `ARCHIVE.txt`.

```bash
vojtamaur search-url arquivo
vojtamaur search-url archive.today
```
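A sketch of extracting URLs from `ARCHIVE.txt` and filtering them by a substring. The regular expression is a rough heuristic chosen for illustration; `extract_urls` and `search_urls` are hypothetical names, and the package's real parser may differ.

```python
import re

# Rough URL matcher: http(s) scheme, then characters up to whitespace or common delimiters.
_URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list[str]:
    """Unique URLs in first-seen order, with trailing sentence punctuation trimmed."""
    seen: dict[str, None] = {}
    for match in _URL_RE.finditer(text):
        seen.setdefault(match.group(0).rstrip(".,;:"), None)
    return list(seen)


def search_urls(text: str, query: str) -> list[str]:
    """Case-insensitive substring search over the extracted URLs."""
    q = query.casefold()
    return [url for url in extract_urls(text) if q in url.casefold()]
```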

### `stats`

Prints basic statistics for `ALL_POSTS.txt` and `ARCHIVE.txt`: byte size, character count, word count, line count, entry count, unique slug count, languages, sections, and the number of unique archive links.

```bash
vojtamaur stats
vojtamaur stats --offline
```
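The generic counts can be sketched as follows, assuming the files are UTF-8. Byte size and character count differ whenever the text contains multi-byte characters such as Czech diacritics, which is why both are useful. Entry, slug, language, and section counts depend on the `ALL_POSTS.txt` header format, which this README does not describe, so they are left out. `basic_stats` is a hypothetical name.

```python
def basic_stats(raw: bytes) -> dict[str, int]:
    """Byte, character, word, and line counts for a UTF-8 text artifact."""
    text = raw.decode("utf-8")  # assumption: artifacts are UTF-8 encoded
    return {
        "bytes": len(raw),
        "characters": len(text),
        "words": len(text.split()),  # whitespace-separated tokens
        "lines": len(text.splitlines()),
    }
```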

### `head`

Prints the first N lines of `ALL_POSTS.txt`.

```bash
vojtamaur head
vojtamaur head 80
```

### `random`

Selects a random URL from `URL:` headers in `ALL_POSTS.txt`.

```bash
vojtamaur random
vojtamaur random --print-only
```

By default, the selected URL is printed and also opened in the browser. With `--print-only`, it is only printed.
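A sketch of the selection step. It assumes each post header is a line beginning with `URL:` followed by the address; `post_urls` and `pick_random` are hypothetical names. `webbrowser.open` is the standard-library call for opening a URL in the default browser.

```python
from __future__ import annotations

import random
import webbrowser


def post_urls(text: str) -> list[str]:
    """Values of lines that start with 'URL:' (header format assumed)."""
    urls = []
    for line in text.splitlines():
        line = line.lstrip()
        if line.startswith("URL:"):
            value = line[len("URL:"):].strip()
            if value:
                urls.append(value)
    return urls


def pick_random(text: str, print_only: bool = False, rng: random.Random | None = None) -> str:
    urls = post_urls(text)
    if not urls:
        raise ValueError("no URL: headers found")
    url = (rng if rng is not None else random.Random()).choice(urls)
    print(url)
    if not print_only:
        webbrowser.open(url)  # default behaviour: also open the browser
    return url
```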

### `status`

Checks URLs found in `ARCHIVE.txt`.

```bash
vojtamaur status
vojtamaur status --limit 10
```

The command uses `HEAD` first and falls back to `GET` for selected failures. Plain HTTP URLs are marked as `INSECURE_HTTP`.
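A sketch of the HEAD-then-GET check using only `urllib`. Which failures trigger the GET retry is not documented; the status set below is an assumption, as is reporting `INSECURE_HTTP` alongside a real check rather than skipping the request. `check_url`, `urlopen_status`, and `RETRY_WITH_GET` are hypothetical names; the opener is injectable so the logic can be tested offline.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from collections.abc import Callable

# Assumption: codes often returned by servers that reject HEAD but serve GET.
RETRY_WITH_GET = {403, 405, 501}

# An opener takes a prepared request and a timeout and returns the HTTP status code.
Opener = Callable[[urllib.request.Request, float], int]


def urlopen_status(req: urllib.request.Request, timeout: float) -> int:
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def check_url(url: str, timeout: float = 3.0, opener: Opener = urlopen_status) -> str:
    """Return a short status string such as '200' or '404 INSECURE_HTTP'."""

    def attempt(method: str) -> tuple[str, int | None]:
        try:
            return str(opener(urllib.request.Request(url, method=method), timeout)), None
        except urllib.error.HTTPError as exc:  # server answered with an error status
            return str(exc.code), exc.code
        except OSError as exc:  # DNS failure, refused connection, timeout
            return f"ERROR {type(exc).__name__}", None

    result, code = attempt("HEAD")
    if code in RETRY_WITH_GET:
        result, _ = attempt("GET")
    if url.lower().startswith("http://"):
        result += " INSECURE_HTTP"  # flagged but still checked (assumption)
    return result
```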

### `verify`

Runs a basic health check:

- primary and fallback sources for `posts`, `archive`, and `docs`
- embedded package snapshots for text artifacts
- local cache writability
- cache decoding for text/HTML cache files, if they exist
- URL parsing from `ALL_POSTS.txt` and `ARCHIVE.txt`

```bash
vojtamaur verify
```

This is a practical availability and parser check. It is not a cryptographic provenance system. If you need strict integrity verification, use the website's generated checksum artifacts or external SHA-256 tooling.
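For the external SHA-256 check, a streaming digest with `hashlib` is enough. The expected hash is passed in as a parameter because the name and format of the website's checksum artifacts are not described here; `sha256_file` and `matches` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare against a published hex digest, ignoring case and surrounding whitespace."""
    return sha256_file(path) == expected_hex.strip().lower()
```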

### `open`

Opens a known target or explicit URL. For `posts`, `archive`, `docs`, and `rat`, normal mode opens the canonical online URL. With `--offline`, it opens the corresponding local cache file.

```bash
vojtamaur open site
vojtamaur open fallback
vojtamaur open posts
vojtamaur open archive
vojtamaur open docs
vojtamaur open rat
vojtamaur open posts --offline
vojtamaur open archive --offline
vojtamaur open docs --offline
vojtamaur open rat --offline
vojtamaur open random
vojtamaur open archive-link 1
vojtamaur open https://vojtamaur.cz/metawebovy-clanek/
```
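In offline mode the target is a local file, which a browser can open through a `file://` URI. A minimal sketch, assuming cache files are named after the published artifacts; the `CACHE_FILES` mapping, the docs file name, and `offline_target` are all hypothetical.

```python
from pathlib import Path

# Assumption: cache file names; the real package may use different names.
CACHE_FILES = {
    "posts": "ALL_POSTS.txt",
    "archive": "ARCHIVE.txt",
    "docs": "documentation.html",
    "rat": "kurt-godel-rat.jpg",
}


def offline_target(kind: str, cache_root: Path) -> str:
    """file:// URI of the cached artifact, or FileNotFoundError if nothing is cached."""
    path = (cache_root / CACHE_FILES[kind]).resolve()  # as_uri() requires an absolute path
    if not path.is_file():
        raise FileNotFoundError(f"no cached copy of {kind!r} at {path}")
    return path.as_uri()
```

The result could then be handed to `webbrowser.open(...)`, the same call used for online targets.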

## Embedded snapshots

The package includes bundled fallback copies of:

- `src/vojtamaur/data/ALL_POSTS.txt`
- `src/vojtamaur/data/ARCHIVE.txt`
- `src/vojtamaur/data/kurt-godel-rat.jpg`

These files keep the installed package partially useful even when the website, the fallback deployments, and the cache are all unavailable. As a side effect, every wheel, source archive, PyPI mirror, pip cache, and installed environment also carries a copy of these public artifacts.
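Bundled files like these are usually read with `importlib.resources`, which works whether the package is installed as plain files or imported from a zip. A minimal sketch, assuming the snapshots ship as package data under `vojtamaur/data/`; `read_embedded` is a hypothetical helper, and the package's own loader may differ.

```python
from importlib import resources


def read_embedded(name: str, package: str = "vojtamaur", subdir: str = "data") -> bytes:
    """Read a bundled file, e.g. read_embedded("ALL_POSTS.txt")."""
    resource = resources.files(package)
    if subdir:
        resource = resource / subdir
    return (resource / name).read_bytes()
```

Text snapshots would then be decoded with `.decode("utf-8")`, while the JPEG stays as bytes.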

Before publishing a new release, refresh the embedded snapshots:

```bash
python scripts/refresh_embedded_data.py
```

Then verify:

```bash
python -m unittest
vojtamaur verify
vojtamaur stats --offline
vojtamaur rat --offline
```

## What this tool does not do

- it does not parse the rendered website as the source of articles
- it does not download Markdown/MDX source files
- it does not synchronize the repository
- it does not compute diffs
- it does not store a database
- it does not run in the background
- it does not process or rewrite image pixels
- it does not replace external checksum verification
- it does not depend on anything outside the Python standard library at runtime

## Limitations

`ALL_POSTS.txt` is a text export, not a complete replica of the website. Media, iframes, PDFs, and selected long blocks are represented by placeholders or omitted. This is intentional. The tool works with the text sediment, not the full rendered website.

`kurt-godel-rat.jpg` is treated as a binary public artifact. The CLI downloads, caches, saves, and opens it; it does not inspect or modify its image metadata.

The embedded snapshots are release snapshots. The live endpoints remain the preferred source when available.

## Tests

```bash
python -m unittest
```

## Build

```bash
python -m pip install build
python -m build
```

## Publishing

Publish through PyPI Trusted Publishing in GitHub Actions or upload manually with Twine. A PyPI account and project access are required.
