Metadata-Version: 2.4
Name: tap-substack
Version: 1.0.0
Summary: Singer tap for extracting data from Substack newsletters, built with the Meltano Singer SDK
Author: tripleaceme
License: MIT
Project-URL: Homepage, https://github.com/tripleaceme/tap-substack
Project-URL: Repository, https://github.com/tripleaceme/tap-substack
Project-URL: Issues, https://github.com/tripleaceme/tap-substack/issues
Keywords: singer,meltano,tap,substack,etl,elt
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: singer-sdk>=0.36.0
Requires-Dist: requests>=2.31.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: singer-sdk[testing]; extra == "dev"
Dynamic: license-file

# tap-substack

A [Singer](https://www.singer.io/) tap for extracting data from [Substack](https://substack.com) newsletters, built with the [Meltano Singer SDK](https://sdk.meltano.com).

## What's Working

### Streams

| Stream | Endpoint | Auth | Status | Description |
|---|---|---|---|---|
| `posts` | `/api/v1/archive` | No | Working | All published posts (paginated, sorted by date) |
| `post_details` | `/api/v1/posts/{slug}` | No | Working | Full HTML content per post (child of `posts`) |
| `comments` | `/api/v1/post/{id}/comments` | No | Working | Comments per post (child of `posts`) |
| `dashboard_summary` | `/api/v1/publish-dashboard/summary` | Yes | Working | High-level publication metrics (subscribers, views, open rate) |
| `email_stats` | `/api/v1/publication/stats/email_stats` | Yes | Working | Per-post email delivery and engagement metrics |
| `recommendations_inbound` | `/api/v1/recommendations/stats/to` | Yes | Working | Publications recommending you + signup stats |
| `recommendations_outbound` | `/api/v1/recommendations/stats/from` | Yes | Working | Publications you recommend + signup stats |
| `subscribers` | `/api/v1/publication/subscribers` | Yes | Working | Subscriber list with email, type, source |

### Features

- **Public + authenticated modes**: Runs without auth for public data; add `session_token` to unlock dashboard/analytics streams
- **Pagination**: Offset-based pagination for list endpoints (`posts`, `email_stats`, `subscribers`)
- **Rate limiting**: Built-in 0.5 s delay between requests to reduce the chance of HTTP 429 (Too Many Requests) responses
- **Graceful auth errors**: Authenticated streams log warnings and skip on 403/404/429 instead of crashing
- **Parent-child streams**: `post_details` and `comments` automatically iterate over posts from the `posts` stream
- **Incremental replication**: The `posts` stream uses `post_date` as its replication key, so later runs can resume from the most recent post already synced
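
The pagination, throttling, and skip-on-error behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the tap's actual code: `paginate`, `fetch_page`, `THROTTLE_SECONDS`, and `SKIP_STATUSES` are hypothetical names, and the page fetcher is stubbed so the example runs without network access.

```python
import logging
import time
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical constants; the tap's real names and values may differ.
THROTTLE_SECONDS = 0.5
SKIP_STATUSES = {403, 404, 429}

Record = Dict[str, Any]
# A fetcher takes (offset, limit) and returns (HTTP status, records).
Fetcher = Callable[[int, int], Tuple[int, List[Record]]]


def paginate(
    fetch_page: Fetcher,
    limit: int = 50,
    throttle: float = THROTTLE_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Record]:
    """Yield records page by page, advancing the offset by the page size."""
    offset = 0
    while True:
        status, page = fetch_page(offset, limit)
        if status in SKIP_STATUSES:
            # Log and stop instead of raising, so other streams can continue.
            logging.warning("HTTP %s received, skipping stream", status)
            return
        if not page:
            return  # An empty page means there is nothing left to read.
        yield from page
        offset += len(page)
        sleep(throttle)  # Pause between requests.


def fake_fetch(offset: int, limit: int) -> Tuple[int, List[Record]]:
    """Stand-in for an HTTP call: serves five records in slices."""
    data = [{"id": i} for i in range(5)]
    return 200, data[offset : offset + limit]


# The no-op sleep keeps the demo fast; real use would keep the default.
records = list(paginate(fake_fetch, limit=2, sleep=lambda _: None))
print(records)
```

Passing `sleep` as a parameter is only a convenience for testing; a real stream class built on the Singer SDK would more likely hook into the SDK's own paginator and request handling.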

## Installation

Install from a local clone of the repository in editable mode:

```bash
pip install -e .
```

## Configuration

| Setting | Required | Description |
|---|---|---|
| `subdomain` | Yes | Your Substack subdomain (e.g. `newsletter` for `newsletter.substack.com`) |
| `session_token` | No | Your `substack.sid` cookie (see below). Only needed for the authenticated streams; public streams run without it |

### Getting your session token

1. Log in to your Substack dashboard
2. Open your browser's DevTools and go to Application > Cookies (Storage > Cookies in Firefox)
3. Copy the value of the `substack.sid` cookie
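
As an illustration, the two settings can be assembled into the JSON document the tap reads via `--config`. This is a sketch under assumptions: `build_config` and the `SUBSTACK_SID` environment variable are hypothetical names chosen here, not part of the tap. The token is left out when it is not available, which matches the public-only mode described under Features.

```python
import json
import os
from typing import Dict, Optional


def build_config(subdomain: str, session_token: Optional[str] = None) -> Dict[str, str]:
    """Return a config dict, including the token only when one is given."""
    config = {"subdomain": subdomain}
    if session_token:
        config["session_token"] = session_token
    return config


# SUBSTACK_SID is a hypothetical variable name; reading the cookie from the
# environment keeps it out of files that might be committed by accident.
config = build_config("newsletter", os.environ.get("SUBSTACK_SID"))
print(json.dumps(config, indent=2))
```

Saving the printed JSON as `config.json` gives a file suitable for `tap-substack --config config.json`. The cookie grants access to your Substack account, so treat it like a password.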

## Usage

```bash
# Discovery mode
tap-substack --config sample_config.json --discover

# Sync public data only
tap-substack --config sample_config.json

# Pipe to a target
tap-substack --config config.json | target-jsonl
```

## Development

```bash
# Set up
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run against a test newsletter (public only)
tap-substack --config test_config.json
```
