Metadata-Version: 2.4
Name: RedditMiner
Version: 1.0.2
Summary: A Python tool for scraping images, galleries, and comments from Reddit using browser cookies.
Home-page: https://github.com/MisbahKhan0009/RedditMiner
Author: Misbah Khan
Author-email: Misbah Khan <misbahkhan007@gmail.com>
Keywords: reddit,scraper,images,gallery,comments,download
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: End Users/Desktop
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Utilities
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.0.0
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# RedditMiner: Subreddit Image Scraper

## Quick Start

1. **Install RedditMiner from PyPI:**
    ```bash
    pip install redditminer
    ```

2. **Export your Reddit cookies** (see below) and save as `cookies.txt` in your working directory.

3. **Run example commands:**

    - Scrape 200 top posts and save as JSON:
       ```bash
       redditminer --subreddit funny --limit 200 --sort top
       ```

    - Scrape only image URLs (TXT file):
       ```bash
       redditminer --subreddit funny --output-mode image_url
       ```

    - Scrape posts with top-level comments included (JSON):
       ```bash
       redditminer --subreddit funny --output-mode post --with-comment
       ```

    - Scrape and immediately download all images:
       ```bash
       redditminer --subreddit funny --output-mode image_url --download-images
       ```

    - Customize the download directory and parallelism:
       ```bash
       redditminer --subreddit funny --output-mode image_url --download-images --output-dir my_images --max-workers 16
       ```

**Note:**
Make sure you have `cookies.txt` in your current directory for authentication. If you encounter rate limiting, RedditMiner will automatically slow down and retry.


<p align="center">
   <img src="assets/logo.svg" alt="RedditMiner Arctic Logo" width="120" height="120">
</p>


RedditMiner is a lightweight, open-source Python tool for scraping image and gallery URLs from any public or private subreddit using your browser session cookies. No Reddit API credentials are required; it works even for NSFW and restricted subreddits.

## Features

- **Cookie Authentication**: Uses your browser session cookies for seamless access.
- **Image & Gallery Support**: Extracts direct image links and all images from Reddit galleries.
- **Deep Pagination**: Efficiently fetches large numbers of posts using Reddit's pagination.
- **Command-Line Interface**: Specify the subreddit and options directly via command line.
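
Reddit's listing endpoints return at most 100 posts per request together with an `after` cursor; deep pagination means feeding that cursor back in until enough posts are collected. A minimal sketch of the loop (the `fetch_page` callable is hypothetical, standing in for one HTTP request; this is not necessarily RedditMiner's internal code):

```python
def paginate(fetch_page, limit=300):
    """Collect up to `limit` posts by following Reddit's `after` cursor.

    `fetch_page(after)` is a hypothetical callable that performs one
    listing request and returns (posts, next_after).
    """
    posts, after = [], None
    while len(posts) < limit:
        batch, after = fetch_page(after)
        posts.extend(batch)
        if not batch or after is None:
            break  # last page reached
    return posts[:limit]
```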

## Installation

1. **Install RedditMiner from PyPI**
   ```bash
   pip install redditminer
   ```

   Or, for development, clone the repository and install it in editable mode:
   ```bash
   git clone https://github.com/MisbahKhan0009/RedditMiner.git
   cd RedditMiner
   pip install -e .
   ```

2. **Export your Reddit cookies**
   - Log into Reddit in your browser.
   - Use a browser extension like "EditThisCookie" or "Get cookies.txt" to export your cookies for `reddit.com`.
   - Save the exported file as `cookies.txt` in your working directory.
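
Under the hood, a Netscape-format `cookies.txt` (the format those extensions export) can be parsed with Python's standard library. A minimal stdlib-only sketch, not necessarily how RedditMiner itself loads it:

```python
import http.cookiejar
import urllib.request

def opener_from_cookies(path="cookies.txt"):
    # MozillaCookieJar parses the Netscape cookies.txt format that
    # "Get cookies.txt"-style extensions export.
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Reddit tends to reject requests that use a default library User-Agent.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar
```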


## Usage

After installation, run the scraper against your desired subreddit:

```bash
redditminer --subreddit EarthPorn
```

Or, if running from source:

```bash
python main.py --subreddit EarthPorn
```





### Command-line options

- `--subreddit` : Subreddit name to scrape (required)
- `--limit` : Number of posts to scrape (default: 100)
- `--sort`  : Sort order (`new`, `hot`, `top`, etc.; default: `new`)
- `--output-mode` : Output format. Options:
   - `post` (default): Full post data (JSON)
   - `image_url`: Only image URLs (from both `image_url` and `gallery_images` fields, TXT file)
   - `post_with_comments`: Full post data with comments (JSON, same as `post` if `--with-comment` is not set)
- `--with-comment` : Include top-level comments for each post (JSON output modes only). Comments from "AutoModerator" are automatically skipped.
- `--download-images` : Download all found images
- `--output-dir` : Directory to save images (default: `images`)
- `--max-workers` : Number of parallel downloads (default: 8)

**Rate Limiting:**
If Reddit returns a 429 (Too Many Requests) error, RedditMiner will automatically slow down and retry after 60 seconds. For best results, avoid running multiple scrapes in parallel and consider using a fresh set of cookies if you encounter repeated rate limiting.
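
The retry behaviour described above can be sketched as follows. This is a simplified stand-in using the stdlib, not RedditMiner's actual internals; the injectable `opener` parameter is an assumption made for testability:

```python
import time
import urllib.request
import urllib.error

def fetch_with_retry(url, opener=None, max_retries=3, backoff=60):
    # On HTTP 429 (Too Many Requests), sleep `backoff` seconds and retry;
    # any other error, or exhausting the retries, propagates to the caller.
    opener = opener or urllib.request.build_opener()
    for attempt in range(max_retries):
        try:
            with opener.open(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code == 429 and attempt < max_retries - 1:
                time.sleep(backoff)
            else:
                raise
```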


**Examples:**

Scrape 200 top posts and save as JSON:
```bash
redditminer --subreddit funny --limit 200 --sort top
```

Scrape only image URLs (TXT file):
```bash
redditminer --subreddit funny --output-mode image_url
```

Scrape posts with top-level comments included (JSON):
```bash
redditminer --subreddit funny --output-mode post --with-comment
```

Each post in the output JSON will have a `comments` field containing a list of top-level comments (author, body, score, created_utc). Comments from "AutoModerator" are excluded.
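
For illustration, filtering a Reddit comment listing down to that shape might look like this. The JSON layout is Reddit's public comments format; the function name is ours, not RedditMiner's:

```python
def top_level_comments(listing):
    # `listing` is one Reddit comments listing:
    # {"data": {"children": [{"kind": "t1", "data": {...}}, ...]}}
    out = []
    for child in listing["data"]["children"]:
        if child.get("kind") != "t1":
            continue  # skip "more" placeholders and non-comment nodes
        data = child["data"]
        if data.get("author") == "AutoModerator":
            continue  # bot comments are excluded from the output
        out.append({key: data.get(key)
                    for key in ("author", "body", "score", "created_utc")})
    return out
```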

Scrape and immediately download all images:
```bash
redditminer --subreddit funny --output-mode image_url --download-images
```

You can customize the download directory and parallelism:
```bash
redditminer --subreddit funny --output-mode image_url --download-images --output-dir my_images --max-workers 16
```
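
`--max-workers` maps naturally onto a thread pool, since downloads are I/O-bound. A minimal sketch (the `save_one` callable is hypothetical, standing in for one image download):

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, save_one, max_workers=8):
    # Fan the URLs out across a thread pool; threads give near-linear
    # speedups until Reddit's rate limiting kicks in.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(save_one, urls))
```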

Downloaded images are automatically organized by subreddit:
- For example, images from r/EarthPorn will be saved in `images/EarthPorn/` by default.
- If you specify a custom output directory, images will be saved in `<output-dir>/<subreddit>/`.

Results are saved as:
- JSON: `output/images_[subreddit]_[timestamp].json`
- TXT (image URLs): `output/images_[subreddit]_[timestamp].txt`
- Downloaded images: in `images/<subreddit>/` (or `<output-dir>/<subreddit>/` if specified)
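
The naming scheme above can be reproduced with a small helper. The exact timestamp format is an assumption for illustration; RedditMiner may format it differently:

```python
import os
import time

def output_paths(subreddit, output_dir="images", ts=None):
    # Timestamped result files plus a per-subreddit image directory.
    ts = ts or time.strftime("%Y%m%d_%H%M%S")
    json_path = os.path.join("output", f"images_{subreddit}_{ts}.json")
    txt_path = os.path.join("output", f"images_{subreddit}_{ts}.txt")
    image_dir = os.path.join(output_dir, subreddit)
    return json_path, txt_path, image_dir
```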

## Project Structure

```
RedditMiner/
│
├── redditminer/
│   ├── __init__.py
│   └── scraper.py         # Core scraping logic and RedditImageScraper class
│
├── main.py                # Command-line entry point
├── cookies.txt            # Your exported Reddit cookies
├── README.md
└── ...

```

## Contributing

Contributions are welcome! Please open issues or submit pull requests for new features, bug fixes, or improvements.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Disclaimer

This tool is intended for personal and educational use. Please respect Reddit's [Terms of Service](https://www.redditinc.com/policies/user-agreement) and do not use this tool for spamming or violating site rules.
