Metadata-Version: 2.4
Name: pusheen-archiver
Version: 0.1.2
Summary: Social media archival with authenticity guarantees
License: MIT License
        
        Copyright (c) 2024 Pusheen Archiver Contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/pusheenism/pusheen-archiver
Project-URL: Repository, https://github.com/pusheenism/pusheen-archiver
Project-URL: Bug Tracker, https://github.com/pusheenism/pusheen-archiver/issues
Keywords: social-media,archiver,downloader,yt-dlp,gallery-dl
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet
Classifier: Topic :: Multimedia
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.1.7
Requires-Dist: rich>=13.7.1
Requires-Dist: pydantic>=2.7.1
Requires-Dist: pydantic-settings>=2.3.0
Requires-Dist: platformdirs>=4.0.0
Requires-Dist: sqlalchemy[asyncio]>=2.0.30
Requires-Dist: alembic>=1.13.1
Requires-Dist: aiosqlite>=0.20.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: aiofiles>=23.2.1
Requires-Dist: cryptography>=42.0.8
Requires-Dist: yt-dlp>=2024.5.27
Requires-Dist: gallery-dl>=1.26.0
Requires-Dist: playwright>=1.44.0
Requires-Dist: Pillow>=10.3.0
Requires-Dist: mutagen>=1.47.0
Requires-Dist: structlog>=24.2.0
Requires-Dist: anyio>=4.4.0
Requires-Dist: tenacity>=8.3.0
Requires-Dist: python-dateutil>=2.9.0
Requires-Dist: tqdm>=4.66.4
Requires-Dist: jsonpatch>=1.33
Provides-Extra: server
Requires-Dist: fastapi>=0.111.0; extra == "server"
Requires-Dist: uvicorn[standard]>=0.29.0; extra == "server"
Requires-Dist: python-multipart>=0.0.9; extra == "server"
Requires-Dist: asyncpg>=0.29.0; extra == "server"
Requires-Dist: psycopg2-binary>=2.9.9; extra == "server"
Requires-Dist: celery[redis]>=5.4.0; extra == "server"
Requires-Dist: redis>=5.0.4; extra == "server"
Requires-Dist: flower>=2.0.1; extra == "server"
Provides-Extra: dev
Requires-Dist: pytest>=8.2.2; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.7; extra == "dev"
Requires-Dist: pytest-cov>=5.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.14.0; extra == "dev"
Requires-Dist: factory-boy>=3.3.0; extra == "dev"
Requires-Dist: faker>=25.8.0; extra == "dev"
Requires-Dist: black>=24.4.2; extra == "dev"
Requires-Dist: ruff>=0.4.7; extra == "dev"
Requires-Dist: mypy>=1.10.0; extra == "dev"
Requires-Dist: pre-commit>=3.7.1; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="docs/logo.gif" alt="Pusheen Archiver" width="420">
</p>

# Pusheen Archiver

Save social media posts before they disappear. Pusheen Archiver captures posts, profiles, and media from X, YouTube, TikTok, Instagram, SoundCloud, and Pinterest — all stored locally, with cryptographic hashes so you can prove the content is untampered.

No cloud account. No subscription. Just `pip install pusheen-archiver` and you're done.

---

## Install

```bash
pip install pusheen-archiver
pusheen
```

That's it. The first time you run `pusheen` with no arguments, it asks where you want to store your archives and sets everything up.

**Windows installer:** Grab `PusheenInstaller.exe` from the releases page for a GUI installer that handles Python, PATH, and Chromium automatically.

---

## Quick start

```bash
# archive a whole profile
pusheen save https://x.com/someuser
pusheen save https://www.tiktok.com/@someuser
pusheen save https://soundcloud.com/some_artist

# single post
pusheen save https://www.youtube.com/watch?v=dQw4w9WgXcQ

# keep it up to date
pusheen sync x someuser
```

Paste any supported URL and pusheen figures out what it is — profile, post, playlist, whatever.
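
As a rough illustration of that first step, here is a minimal sketch of mapping a URL's hostname to a platform. It is not pusheen's actual detection code: the `HOSTS` table, the platform names, and `detect_platform` are assumptions made for this example, and it only covers the platform, not whether the URL is a profile, post, or playlist.

```python
from typing import Optional
from urllib.parse import urlparse

# Hostname -> platform name. Both the table and the names are illustrative assumptions.
HOSTS = {
    "x.com": "x",
    "twitter.com": "x",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "tiktok.com": "tiktok",
    "instagram.com": "instagram",
    "soundcloud.com": "soundcloud",
    "pinterest.com": "pinterest",
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform for a URL, or None if the host is not recognised."""
    host = (urlparse(url).hostname or "").lower()
    for known, platform in HOSTS.items():
        # Match the bare domain and any subdomain (www., m., and so on).
        if host == known or host.endswith("." + known):
            return platform
    return None
```

Matching on the parsed hostname rather than a substring of the whole URL avoids false positives such as a query string that happens to contain `x.com`.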

---

## What gets saved

For each post:
- The media (video, images, audio) at the best available quality
- A `metadata.json` with the caption, stats, hashtags, and everything else
- A full-page screenshot and rendered HTML snapshot (via Playwright)
- A `versions/` folder that records every edit the post goes through

For each profile run:
- Avatar and banner images
- A `manifest.json` listing every file with its SHA-256 hash, signed with Ed25519 if you've run `pusheen keygen`
- A `receipt.txt` you can attach to a legal filing

Nothing ever gets deleted from disk. If a post disappears online, it gets flagged in the database but stays in your archive.
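
To make the `versions/` idea concrete, here is a minimal sketch of append-only version tracking, assuming each version is a full JSON copy of the post metadata named `v1.json`, `v2.json`, and so on. The function name `record_version` and the full-copy approach are assumptions for illustration; the real implementation may store diffs instead.

```python
import json
from pathlib import Path
from typing import Optional


def record_version(post_dir: Path, metadata: dict) -> Optional[Path]:
    """Write versions/vN.json if metadata differs from the latest stored version."""
    versions = post_dir / "versions"
    versions.mkdir(parents=True, exist_ok=True)
    # Sort numerically so v10 comes after v9.
    existing = sorted(versions.glob("v*.json"), key=lambda p: int(p.stem[1:]))
    if existing:
        latest = json.loads(existing[-1].read_text(encoding="utf-8"))
        if latest == metadata:
            return None  # nothing changed, so the history stays as it is
        next_number = int(existing[-1].stem[1:]) + 1
    else:
        next_number = 1
    target = versions / f"v{next_number}.json"
    target.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return target
```

Earlier versions are never rewritten or removed, which matches the "nothing gets deleted" rule above.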

---

## Platforms

| Platform | Auth needed? | Notes |
|----------|-------------|-------|
| X (Twitter) | No API key — browser cookies work | See cookie setup below |
| YouTube | Optional API key | Works fine without one |
| TikTok | Optional | Public profiles work without credentials |
| Instagram | Optional | Public profiles work without credentials |
| SoundCloud | None | `client_id` is auto-discovered |
| Pinterest | Optional | Public boards work without credentials |

---

## Configuration

All settings live in a single TOML file, so desktop use needs no scattered environment variables. Its location depends on your OS:

| OS | Location |
|----|---------|
| Windows | `%APPDATA%\pusheen-archiver\config.toml` |
| macOS | `~/Library/Application Support/pusheen-archiver/config.toml` |
| Linux | `~/.config/pusheen-archiver/config.toml` |

```bash
pusheen config edit   # opens it in your default editor
```

The file is fully commented so you know what everything does. The important bits:

```toml
[paths]
archive_root = "C:/Users/you/archive"

[archive]
capture_screenshots  = true
capture_html         = true
skip_media           = false   # true = metadata only, no downloads
save_info_json       = true    # yt-dlp .info.json sidecar files
save_thumbnail       = true    # thumbnail images alongside media
max_posts            = 0       # 0 = no limit

[media]
media_format  = "default"      # default | mp4 | webm | mp3 | m4a | flac | opus
media_quality = "best"         # best | high | medium | low | worst
```

You can also pass `--no-info-json` or `--no-thumbnail` on the command line to skip those for a single run without touching the config.

---

## Cookie auth for X

X doesn't require an API key. Browser cookies are enough.

**Option A — cookies file (more reliable)**
1. Install the [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) extension
2. Log into x.com, click the extension, export as `cookies.txt`
3. In `config.toml` under `[x]`: `cookies_file = "C:/path/to/cookies.txt"`
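
Before pointing the config at an exported file, you may want to confirm it parses and actually contains cookies for x.com. This is a minimal sketch using the standard library's `MozillaCookieJar`, which reads the Netscape `cookies.txt` format. The helper `cookie_names` is hypothetical and not part of pusheen.

```python
from http.cookiejar import MozillaCookieJar


def cookie_names(path: str, domain: str = "x.com") -> list[str]:
    """List cookie names stored for `domain` (and its subdomains) in a cookies.txt file."""
    jar = MozillaCookieJar(path)
    # Keep session and expired cookies so the listing mirrors the file contents.
    # load() raises http.cookiejar.LoadError if the file is not in Netscape format.
    jar.load(ignore_discard=True, ignore_expires=True)
    names = set()
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        if host == domain or host.endswith("." + domain):
            names.add(cookie.name)
    return sorted(names)
```

An empty result usually means the export was taken while logged out or from the wrong site.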

**Option B — live browser (easier)**
```toml
[x]
cookies_browser = "brave"   # chrome | firefox | edge | brave | chromium
```
The browser has to be closed when you run pusheen — Chrome and Brave lock their cookie database while they're open.

---

## Archive structure

```
archive/
  x/
    someuser/
      profile/
        profile.json
        avatar.jpg
        banner.jpg
      posts/
        2026-06-10_1234567890/
          metadata.json
          screenshot.png
          page.html
          media/
            video.mp4
          versions/
            v1.json
            v2.json        ← created automatically when a post is edited
      manifests/
        manifest.json      ← every file + its SHA256
        manifest.sig       ← Ed25519 signature (if you've run `pusheen keygen`)
        receipt.txt
```
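
Because the layout is plain directories, it is easy to inspect from a script. The sketch below lists the posts under one account, assuming (as the example folder name suggests) that each post folder is named `<YYYY-MM-DD>_<post id>`. `list_posts` is a hypothetical helper, not a pusheen API.

```python
from datetime import date
from pathlib import Path


def list_posts(account_dir: Path) -> list[tuple[date, str]]:
    """Return (date, post_id) pairs parsed from posts/<YYYY-MM-DD>_<id>/ folder names."""
    found = []
    for entry in (account_dir / "posts").iterdir():
        if not entry.is_dir():
            continue
        day, _, post_id = entry.name.partition("_")
        try:
            found.append((date.fromisoformat(day), post_id))
        except ValueError:
            continue  # folder name does not follow the assumed layout; skip it
    return sorted(found)
```

For the tree above, `list_posts(Path("archive/x/someuser"))` would walk `archive/x/someuser/posts/`.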

---

## Verifying an archive

```bash
pusheen verify archive/x/someuser/manifests
```

`pusheen verify` checks every file's hash against the manifest. If you generated signing keys with `pusheen keygen`, it also validates the Ed25519 signature.
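
The hash check can also be reproduced independently with nothing but the standard library. The sketch below assumes a manifest shaped like `{"files": {"<relative path>": "<hex SHA-256>"}}` with paths relative to a root directory you pass in. That schema is an assumption for illustration, so adjust the key names to the real `manifest.json`. It does not cover the Ed25519 signature.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large videos never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, root: Path) -> list[str]:
    """Return the relative paths that are missing or whose hash differs."""
    # Assumed layout: {"files": {"<relative path>": "<hex sha256>"}}
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for rel, expected in manifest["files"].items():
        target = root / rel
        if not target.is_file() or sha256_file(target) != expected:
            problems.append(rel)
    return problems
```

An empty list means every listed file is present and unchanged. Files on disk that the manifest does not mention are not reported by this sketch.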

---

## All commands

```
pusheen save <url>               archive anything — post, profile, playlist
  --no-media                     skip downloads, save metadata only
  --no-screenshots               skip Playwright screenshots
  --no-info-json                 skip yt-dlp .info.json sidecars
  --no-thumbnail                 skip thumbnail images
  --out <dir>                    save to a specific directory
  --watch                        re-archive a profile on a schedule

pusheen sync <platform> <user>   incremental sync (new posts only)
pusheen sync-all                 sync every account you've archived
pusheen daemon                   run sync-all on repeat until Ctrl-C

pusheen search <query>           search across all archived captions
pusheen history <platform> <user> show profile change timeline
pusheen export <platform> <user> pack to .zip or .tar.gz
pusheen status                   list archived accounts and stats
pusheen verify <manifest_dir>    check file hashes and signature
pusheen keygen                   generate Ed25519 signing keys

pusheen config edit              open config.toml in your editor
pusheen config show              print current settings
pusheen config update            add missing keys to an existing config
pusheen db init                  create database tables (first run)
pusheen db migrate               run Alembic migrations
pusheen install-browser          install Playwright browser
pusheen shell                    interactive REPL
```

Platform aliases: `x`/`tw`, `yt`, `ig`/`insta`, `tt`, `sc`, `pin`

---

## Search

```bash
pusheen search "concert announcement"
pusheen search "cute" --platform tiktok
pusheen search "dropped" --username someuser --limit 50
```

---

## Profile history

```bash
pusheen history x someuser
```
```
  2026-01-15  first seen     bio: "just a person"   followers: 1,204
  2026-03-02  bio changed    "just a person on the internet"   +185 followers
  2026-06-10  avatar changed   +113 followers
```
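
A timeline like this boils down to comparing consecutive profile snapshots. Here is a minimal sketch of that comparison, assuming each snapshot is a dict with `date`, `bio`, `avatar_sha256`, and `followers` keys. Those field names and `profile_timeline` are invented for this example and the output format is simplified, so treat it as an illustration rather than pusheen's schema.

```python
def profile_timeline(snapshots: list[dict]) -> list[str]:
    """Describe what changed between consecutive profile snapshots (oldest first)."""
    events = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        # Field names are assumptions for this sketch.
        changed = [k for k in ("bio", "avatar_sha256") if prev.get(k) != cur.get(k)]
        delta = cur["followers"] - prev["followers"]
        if changed or delta:
            label = ", ".join(changed) + " changed" if changed else "no field changes"
            events.append(f"{cur['date']}  {label}  {delta:+,} followers")
    return events
```

Snapshots with no field changes and no follower movement produce no event, which keeps the timeline short for quiet accounts.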

---

## Server mode

The default setup uses SQLite and runs entirely locally. If you want to run pusheen as a shared service with a REST API and async job queue:

```bash
pip install "pusheen-archiver[server]"
# set database_url in config.toml to your PostgreSQL connection string
docker-compose up -d db redis
pusheen db migrate
uvicorn pusheen_archiver.api.main:app --host 0.0.0.0 --port 8000
```

API docs at `http://localhost:8000/docs`. Requires PostgreSQL + Redis. This is for self-hosted or developer deployments — not needed for personal use.

---

## Adding a platform

1. Create `src/pusheen_archiver/adapters/myplatform.py`, subclass `BasePlatformAdapter`
2. Implement `discover_account`, `discover_posts`, `fetch_metadata`, `download_media`
3. Register it in `src/pusheen_archiver/adapters/__init__.py`

Everything else — signing, manifests, version history, deduplication, CLI — works automatically.
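
The steps above might look roughly like the skeleton below. The `BasePlatformAdapter` defined here is a stand-in so the example runs on its own. The real base class lives in the package, and its method signatures, return types, and sync/async style are not shown in this README, so everything beyond the four method names is an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BasePlatformAdapter(ABC):
    """Stand-in for the project's base class; the real signatures may differ."""

    @abstractmethod
    def discover_account(self, username: str) -> dict: ...

    @abstractmethod
    def discover_posts(self, username: str) -> list[str]: ...

    @abstractmethod
    def fetch_metadata(self, post_id: str) -> dict: ...

    @abstractmethod
    def download_media(self, post_id: str, dest: Path) -> list[Path]: ...


class MyPlatformAdapter(BasePlatformAdapter):
    """Toy adapter backed by an in-memory dict instead of a real site."""

    def __init__(self, posts: dict[str, dict]):
        self._posts = posts

    def discover_account(self, username: str) -> dict:
        return {"username": username}

    def discover_posts(self, username: str) -> list[str]:
        return sorted(self._posts)

    def fetch_metadata(self, post_id: str) -> dict:
        return self._posts[post_id]

    def download_media(self, post_id: str, dest: Path) -> list[Path]:
        # Write the caption to disk as a placeholder for real media files.
        dest.mkdir(parents=True, exist_ok=True)
        target = dest / f"{post_id}.txt"
        target.write_text(self._posts[post_id]["caption"], encoding="utf-8")
        return [target]
```

Using abstract methods means a subclass that forgets one of the four hooks fails at instantiation rather than midway through an archive run.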

---

## Development

```bash
git clone https://github.com/pusheenism/pusheen-archiver
cd pusheen-archiver
pip install -e ".[dev]"
pytest
```

For a deep dive into the internals — database schema, adapter interface, API endpoints, signing system — see [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).

---

## License

MIT
