Metadata-Version: 2.4
Name: onedrive_pdf_downloader
Version: 1.3.0
Summary: A tool to download private PDF files from OneDrive using Selenium.
Project-URL: Repository, https://github.com/Flying-Bunny-Devs/Onedrive-Private-PDF-Downloader
Project-URL: Homepage, https://github.com/Flying-Bunny-Devs/Onedrive-Private-PDF-Downloader
Project-URL: Documentation, https://github.com/Flying-Bunny-Devs/Onedrive-Private-PDF-Downloader/blob/master/README.md
Project-URL: Issues, https://github.com/Flying-Bunny-Devs/Onedrive-Private-PDF-Downloader/issues
Project-URL: Changelog, https://github.com/Flying-Bunny-Devs/Onedrive-Private-PDF-Downloader/blob/master/CHANGELOG.md
Author-email: Francesco146 <49027005+Francesco146@users.noreply.github.com>, willnaoosmith <55159911+willnaoosmith@users.noreply.github.com>
Maintainer-email: Francesco146 <49027005+Francesco146@users.noreply.github.com>, willnaoosmith <55159911+willnaoosmith@users.noreply.github.com>
License: MIT License
        
        Copyright (c) 2024 Francesco Marastoni
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: downloader,onedrive,pdf,selenium
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: argparse==1.4.0
Requires-Dist: attrs==25.3.0
Requires-Dist: certifi==2025.4.26
Requires-Dist: deprecated==1.2.18
Requires-Dist: h11==0.16.0
Requires-Dist: idna==3.10
Requires-Dist: img2pdf==0.6.1
Requires-Dist: lxml==5.4.0
Requires-Dist: outcome==1.3.0.post0
Requires-Dist: packaging==25.0
Requires-Dist: pikepdf==9.7.0
Requires-Dist: pillow==11.2.1
Requires-Dist: pysocks==1.7.1
Requires-Dist: selenium==4.31.0
Requires-Dist: sniffio==1.3.1
Requires-Dist: sortedcontainers==2.4.0
Requires-Dist: trio-websocket==0.12.2
Requires-Dist: trio==0.30.0
Requires-Dist: typing-extensions==4.13.2
Requires-Dist: urllib3==2.4.0
Requires-Dist: websocket-client==1.8.0
Requires-Dist: wrapt==1.17.2
Requires-Dist: wsproto==1.2.0
Description-Content-Type: text/markdown

# 📄 PDF Exporter from Authenticated OneDrive Sessions

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

This project allows you to export PDFs, even those that are protected, from authenticated OneDrive sessions using Selenium. The tool automates the browser process to capture screenshots of each page and combine them into a PDF file. Works also on OneDrive for Business.

> [!WARNING]
> This tool may need to be calibrated in order to work correctly. It is expected to be used by someone who can inspect a page and read HTML.

- [📄 PDF Exporter from Authenticated OneDrive Sessions](#-pdf-exporter-from-authenticated-onedrive-sessions)
  - [✨ Features](#-features)
  - [👀 Preview](#-preview)
  - [📋 Requirements](#-requirements)
    - [🌐 Browsers:](#-browsers)
    - [🔧 Browser Drivers:](#-browser-drivers)
  - [⚙️ Installation and Setup](#️-installation-and-setup)
  - [🚀 Usage](#-usage)
    - [⚡ Command-line Options](#-command-line-options)
    - [🗑️ Uninstall](#️-uninstall)
    - [🛠️ Profile Setup:](#️-profile-setup)
    - [📂 Getting the Cache Path:](#-getting-the-cache-path)
  - [🛠️ Calibrating the Tool](#️-calibrating-the-tool)
    - [📝 Steps to Calibrate:](#-steps-to-calibrate)
  - [🤝 Contributing](#-contributing)

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

## ✨ Features
- Supports both Firefox and Chrome browsers.
- Ability to export protected PDFs from authenticated web sessions.
- Can optionally keep or delete temporary images used for PDF creation.
- Compatible with browser profiles to retain session data (useful for skipping the login).
- Support for Raw lossless PDFs

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

## 👀 Preview

```bash
$ onedrive-pdf-downloader --profile-dir /path/to/profile https://blabla.sharepoint.com/...

INFO - Initializing browser: firefox
Make sure to authenticate and reach the PDF preview. 
INFO - Total number of pages detected: 8
INFO - Detected file name: '2024-10-21.pdf'
INFO - Starting the export of the file "2024-10-21.pdf". This might take a while depending on the number of pages.
INFO - Toolbar hidden for clean screenshots.
INFO - Page 1 of 8 exported.
INFO - Page 2 of 8 exported.
INFO - Page 3 of 8 exported.
INFO - Page 4 of 8 exported.
INFO - Page 5 of 8 exported.
INFO - Page 6 of 8 exported.
INFO - Page 7 of 8 exported.
INFO - Page 8 of 8 exported.
INFO - Saving the file as '2024-10-21.pdf'.
INFO - Temporary images removed.
INFO - Browser session ended.
```

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

## 📋 Requirements

Before running the project, you need the following dependencies:

### 🌐 Browsers:
Make sure that you have one of the following browsers installed:
- Firefox
- Chrome

### 🔧 Browser Drivers:
To interact with your browser via Selenium, you need the appropriate driver for your browser:
- **Geckodriver** for Firefox: [Download Geckodriver](https://github.com/mozilla/geckodriver/releases/latest)
- **Chromedriver** for Chrome: [Download Chromedriver](https://googlechromelabs.github.io/chrome-for-testing/#stable)

Ensure the drivers are in your system’s `PATH` or specify their location explicitly when launching the browser.

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

## ⚙️ Installation and Setup

1. Install via `pip`:
    ```bash
    pip install onedrive_pdf_downloader
    ```

2. Download and install the appropriate browser drivers for your browser.

3. Optionally, set up a browser profile to retain session information:
    - **Firefox:**
        - Create a Firefox profile through `about:profiles` in the browser's address bar.
        - Use the `-p` option to specify the path to the Firefox profile directory.
    - **Chrome:**
        - Specify the Chrome user data directory and profile name using `-p` and `-n` options. You can find this in `chrome://version/`.

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

## 🚀 Usage

To run the script, use the following command structure:

```bash
onedrive-pdf-downloader [options]
```

### ⚡ Command-line Options

| Argument             | Description                                                                              | Example                             |
| -------------------- | ---------------------------------------------------------------------------------------- | ----------------------------------- |
| `--browser, -b`      | Specify the browser to use (`firefox` or `chrome`).                                      | `--browser firefox`                 |
| `--profile-dir, -p`  | Path to the browser profile directory. If using Chrome, specify the user data directory. | `--profile-dir /path/to/profile`    |
| `--profile-name, -n` | Profile name to use (Chrome only).                                                       | `--profile-name "Profile 1"`        |
| `--keep-imgs, -k`    | Keep the temporary images used for PDF creation.                                         | `--keep-imgs`                       |
| `--output-file, -o`  | Specify the output file name.                                                            | `--output-file file.pdf`            |
| `--cache-dir, -r`    | Search in browser caches for RAW Lossless PDFs (Firefox only).                           | `--cache-dir /path/to/cache`        |
| `url`                | The URL of the PDF file. This is a required argument.                                    | `https://blabla.sharepoint.com/...` |

### 🗑️ Uninstall

To uninstall the package, you can use pip:
```bash
pip uninstall onedrive_pdf_downloader
```

### 🛠️ Profile Setup:
To use an authenticated session, you may need to use a browser profile where you're already logged in. Here’s how to do that:

- **Firefox Profile:**
    1. Open Firefox and type `about:profiles` in the address bar.
    2. Create a new profile or use an existing one. The profile path will be displayed under "Root Directory."
    3. Use the `--profile-dir` option to point to this directory in the script.

- **Chrome Profile:**
    1. Open Chrome and type `chrome://version` in the address bar.
    2. Find the `Profile Path`
    3. Use the `--profile-dir` option for the user data directory (e.g., `/path/to/profiles`) and the `--profile-name` option for the profile name (e.g., `Default`).

### 📂 Getting the Cache Path:
To use the `--cache-dir` option, you need to find the cache directory of your browser. Here’s how to do that:

- **Firefox:**
    1. Open Firefox and type `about:profiles` in the address bar.
    2. Locate the profile you are using and find the "Local Directory" path.
    3. The cache directory is typically located within this path under `cache2/entries`.
    
    Basically, `{profile_path}/cache2/entries`.

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

## 🛠️ Calibrating the Tool

If the tool is not working correctly, you may need to update the class names and ARIA labels used to identify elements on the OneDrive page. These values are defined in the [browser/constants.py](src/onedrive_pdf_downloader/browser/constants.py#L6) file.

### 📝 Steps to Calibrate:

1. **Open the OneDrive page in your browser:**
   - Use the browser's inspector tool (F12, Ctrl+Shift+I in most browsers, or right-click and select "Inspect") to find the class names or the ARIA labels for the elements used by the script.

2. **Update the class names and ARIA labels in the script:**
   - Open the [browser/constants.py](src/onedrive_pdf_downloader/browser/constants.py#L6) file.
   - Update the following lists with the new values:
     ```python
     CLASS_NAMES_TOTAL_PAGES = ["status_5a88b9b2"]  # Add the new class names for the total pages element
     CLASS_NAMES_FILE_NAME = ["OneUpNonInteractiveCommandNewDesign_156f96ef"]  # Add the new class names for the file name element
     CLASS_NAMES_TOOLBAR = ["root_5a88b9b2"]  # Add the new class names for the toolbar element
     ARIA_LABELS_NEXT_PAGE = ["Vai alla pagina successiva."]  # Add the new ARIA labels for the next page button
     ```

3. **Save the changes and run the script again:**
   - Save the updated [browser/constants.py](src/onedrive_pdf_downloader/browser/constants.py#L6) file.
   - Run the script with the updated values to ensure it works correctly.

By following these steps, you can calibrate the tool to work with any changes in the OneDrive page structure.

![separator](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/colored.png)

## 🤝 Contributing

We welcome contributions to improve this tool. If you have found new class names or ARIA labels, please consider submitting a pull request to update the configuration.

For more details, see the [CONTRIBUTING.md](/CONTRIBUTING.md) file.
