Metadata-Version: 2.4
Name: RedditMiner
Version: 1.0.0
Summary: A Python tool for scraping images, galleries, and comments from Reddit using browser cookies.
Home-page: https://github.com/MisbahKhan0009/RedditMiner
Author: Misbah Khan
Author-email: Misbah Khan <misbahkhan007@gmail.com>
Keywords: reddit,scraper,images,gallery,comments,download
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: End Users/Desktop
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Utilities
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.0.0
Dynamic: author
Dynamic: home-page
Dynamic: requires-python



<p align="center">
   <img src="assets/logo.svg" alt="RedditMiner Arctic Logo" width="120" height="120">
</p>

# RedditMiner: Subreddit Image Scraper

RedditMiner is a lightweight, open-source Python tool for scraping image and gallery URLs from any public or private subreddit that your account can access, using your browser session cookies. No Reddit API credentials are required, and it also works for NSFW and restricted subreddits.

## Features

- **Cookie Authentication**: Uses your browser session cookies, so requests are made as your logged-in account.
- **Image & Gallery Support**: Extracts direct image links and all images from Reddit galleries.
- **Deep Pagination**: Efficiently fetches large numbers of posts using Reddit's pagination.
- **Command-Line Interface**: Specify the subreddit and options directly via command line.
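
To illustrate the pagination feature, here is a minimal sketch of cursor-based paging. Reddit listing responses carry an `after` token under `data`, which is passed to the next request to get the following page. The function name `iter_posts` and the `fetch_page` callable are hypothetical and are not taken from RedditMiner's source.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_posts(fetch_page: Callable[[Optional[str]], Dict], limit: int) -> Iterator[Dict]:
    """Yield up to `limit` posts by following the listing's `after` cursor.

    `fetch_page` is a hypothetical callable: given a cursor (or None for the
    first page), it returns one listing page as a parsed JSON dict.
    """
    after: Optional[str] = None
    yielded = 0
    while yielded < limit:
        listing = fetch_page(after)
        children = listing["data"]["children"]
        if not children:
            return  # empty page: nothing more to read
        for child in children:
            yield child["data"]
            yielded += 1
            if yielded >= limit:
                return
        after = listing["data"].get("after")
        if after is None:
            return  # the listing reports no further pages
```

Passing the page fetcher in as a callable keeps the paging logic separate from HTTP details such as cookies and headers.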

## Installation

1. **Clone the repository**
   ```bash
   git clone https://github.com/MisbahKhan0009/RedditMiner.git
   cd RedditMiner
   ```

2. **Install dependencies**
   ```bash
   pip install requests
   ```

3. **Export your Reddit cookies**
   - Log into Reddit in your browser.
   - Use a browser extension like "EditThisCookie" or "Get cookies.txt" to export your cookies for `reddit.com`.
   - Save the exported file as `cookies.txt` in the project root directory.
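
If the exported `cookies.txt` is in the Netscape cookie file format (an assumption; some extensions export JSON instead), it can be read with Python's standard library. The sketch below is illustrative only, and the helper name `load_cookies` is hypothetical rather than part of RedditMiner.

```python
from http.cookiejar import MozillaCookieJar


def load_cookies(path: str = "cookies.txt") -> MozillaCookieJar:
    """Load a Netscape-format cookies.txt file into a cookie jar."""
    jar = MozillaCookieJar(path)
    # Keep session cookies and entries past their expiry so that nothing
    # present in the file is silently dropped while loading.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar
```

A jar loaded this way can be passed as the `cookies=` argument of `requests.get`, which accepts `CookieJar` objects.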


## Usage

Run the scraper with your desired subreddit:

```bash
python main.py --subreddit EarthPorn
```




### Optional arguments

- `--limit` : Number of posts to scrape (default: 100)
- `--sort`  : Sort order (`new`, `hot`, `top`, etc.; default: `new`)
- `--output-mode` : Output format. Options:
   - `post` (default): Full post data (JSON)
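   - `image_url`: Image URLs only, collected from the `image_url` and `gallery_images` fields of each post (TXT file)
   - `post_with_comments`: Full post data with comments (JSON); identical to `post` unless `--with-comment` is set
- `--with-comment` : Include top-level comments for each post (JSON output modes only). Comments from "AutoModerator" are automatically skipped.
- `--download-images` : Download the scraped images immediately after scraping
- `--output-dir` : Base directory for downloaded images (default: `images`)
- `--max-workers` : Number of parallel download workers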
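
As a rough illustration of what the `image_url` output mode gathers, the sketch below flattens the `image_url` and `gallery_images` fields of each post into one list. The function name `collect_image_urls` is hypothetical, and the de-duplication step is an assumption rather than documented behaviour.

```python
from typing import Dict, Iterable, List


def collect_image_urls(posts: Iterable[Dict]) -> List[str]:
    """Collect image URLs from the `image_url` and `gallery_images` fields."""
    urls: List[str] = []
    seen = set()
    for post in posts:
        candidates = []
        if post.get("image_url"):
            candidates.append(post["image_url"])
        # A missing or null gallery field is treated as an empty gallery.
        candidates.extend(post.get("gallery_images") or [])
        for url in candidates:
            if url not in seen:  # assumption: each URL is written only once
                seen.add(url)
                urls.append(url)
    return urls
```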

**Rate Limiting:**
If Reddit returns a 429 (Too Many Requests) error, the scraper automatically pauses for 60 seconds and then retries, which helps avoid being blocked by Reddit's rate limits. For best results, avoid running multiple scrapes in parallel, and consider using a fresh set of cookies if you encounter repeated rate limiting.
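
The wait-and-retry behaviour can be sketched as follows. This is not RedditMiner's actual code: the name `get_with_retry`, the `send` callable, and the `max_retries` cap are assumptions made for illustration, and only the 60-second wait comes from the description above.

```python
import time
from typing import Any, Callable


def get_with_retry(
    send: Callable[[], Any],
    max_retries: int = 3,          # assumption: the real retry limit is not documented
    wait_seconds: float = 60.0,    # wait time taken from the description above
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `send()` and retry after a pause while it reports HTTP 429.

    `send` is a hypothetical zero-argument callable that returns a response
    object with a `status_code` attribute, such as a `requests.Response`.
    """
    response = send()
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        sleep(wait_seconds)        # back off before trying again
        response = send()
    return response
```

With `requests`, `send` could be something like `lambda: session.get(url)`, where `session` and `url` are whatever the caller already has.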

**Examples:**

Scrape 200 top posts and save as JSON:
```bash
python main.py --subreddit funny --limit 200 --sort top
```


Scrape only image URLs (TXT file):
```bash
python main.py --subreddit funny --output-mode image_url
```



Scrape posts with top-level comments included (JSON):
```bash
python main.py --subreddit funny --output-mode post --with-comment
```

Each post in the output JSON will have a `comments` field containing a list of top-level comments (`author`, `body`, `score`, `created_utc`). Comments from "AutoModerator" are excluded.
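
For readers curious how such a `comments` list could be built, the sketch below filters one Reddit comment listing (a dict with `data.children`, where comments have kind `t1`). The helper name `top_level_comments` is hypothetical, and this is an approximation, not the project's implementation.

```python
from typing import Dict, List


def top_level_comments(comment_listing: Dict) -> List[Dict]:
    """Extract top-level comments from a listing, skipping AutoModerator."""
    comments = []
    for child in comment_listing["data"]["children"]:
        if child.get("kind") != "t1":
            continue  # skip "more" placeholders and other non-comment entries
        data = child["data"]
        if data.get("author") == "AutoModerator":
            continue
        comments.append({
            "author": data.get("author"),
            "body": data.get("body"),
            "score": data.get("score"),
            "created_utc": data.get("created_utc"),
        })
    return comments
```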

Scrape and immediately download all images:
```bash
python main.py --subreddit funny --output-mode image_url --download-images
```


You can customize the download directory and parallelism:
```bash
python main.py --subreddit funny --output-mode image_url --download-images --output-dir my_images --max-workers 16
```
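
The `--max-workers` option suggests a thread pool. A minimal sketch of that pattern, assuming a hypothetical per-URL function `download_one` (neither it nor `download_all` is taken from RedditMiner's source), could look like this:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def download_all(
    urls: Iterable[str],
    download_one: Callable[[str], Any],
    max_workers: int = 16,
) -> List[Any]:
    """Run `download_one` for every URL using up to `max_workers` threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(download_one, urls))
```

Threads suit this workload because downloads spend most of their time waiting on the network rather than using the CPU.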

Downloaded images are automatically organized by subreddit:
- For example, images from r/EarthPorn will be saved in `images/EarthPorn/` by default.
- If you specify a custom output directory, images will be saved in `<output-dir>/<subreddit>/`.
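
The directory layout above can be expressed as a small path helper. Taking the file name from the URL path is an assumption, since the naming scheme is not documented here, and `image_destination` is a hypothetical name.

```python
import os
from urllib.parse import urlparse


def image_destination(url: str, subreddit: str, output_dir: str = "images") -> str:
    """Build `<output-dir>/<subreddit>/<filename>` for an image URL."""
    # Assumption: the file name is the last path segment, without the query string.
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(output_dir, subreddit, filename)
```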

Results are saved as:
- JSON: `output/images_<subreddit>_<timestamp>.json`
- TXT (image URLs): `output/images_<subreddit>_<timestamp>.txt`
- Downloaded images: in `images/<subreddit>/` (or `<output-dir>/<subreddit>/` if specified)

## Project Structure

```
RedditMiner/
│
├── redditminer/
│   ├── __init__.py
│   └── scraper.py         # Core scraping logic and RedditImageScraper class
│
├── main.py                # Command-line entry point
├── cookies.txt            # Your exported Reddit cookies
├── README.md
└── ...

```

## Contributing

Contributions are welcome! Please open issues or submit pull requests for new features, bug fixes, or improvements.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Disclaimer

This tool is intended for personal and educational use. Please respect Reddit's [Terms of Service](https://www.redditinc.com/policies/user-agreement) and do not use this tool for spamming or violating site rules.
