Metadata-Version: 2.4
Name: pyremotedata
Version: 0.1.13
Summary: A package for low- and high-level high-bandwidth asynchronous data transfer
Project-URL: Homepage, https://github.com/asgersvenning/pyremotedata
Project-URL: Repository, https://github.com/asgersvenning/pyremotedata
Project-URL: Issues, https://github.com/asgersvenning/pyremotedata/issues
Author-email: Asger Svenning <asgersvenning@gmail.com>
License-File: LICENSE
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: tqdm>=4.67.1
Provides-Extra: full
Requires-Dist: colorlog>=6.9.0; extra == 'full'
Requires-Dist: wrapt-timeout-decorator>=1.5.1; extra == 'full'
Description-Content-Type: text/markdown

# `pyRemoteData`

[![Python version](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/asgersvenning/pyremotedata/actions/workflows/python-tests.yml/badge.svg)](https://github.com/asgersvenning/pyremotedata/actions)
[![codecov](https://codecov.io/github/asgersvenning/pyremotedata/graph/badge.svg)](https://codecov.io/github/asgersvenning/pyremotedata)

---

`pyRemoteData` is a module developed for scientific computation using the remote storage platform [ERDA](https://erda.au.dk/) (Electronic Research Data Archive) provided by Aarhus University IT, as part of my PhD at the Department of Ecoscience at Aarhus University.

It can be used with **any** storage facility that supports SFTP and LFTP, but is only tested on a minimal SFTP server found at [atmoz/sftp](https://hub.docker.com/r/atmoz/sftp) and on the live AU ERDA service which runs on MiG (Minimum intrusion Grid - [SourceForge](https://sourceforge.net/projects/migrid/)/[GitHub](https://github.com/ucphhpc/migrid-sync)) developed by [SCIENCE HPC Centre at Copenhagen University](https://science.ku.dk/english/research/research-e-infrastructure/science-hpc-centre/).

## Capabilities

To facilitate high-throughput computation in a cross-platform setting, `pyRemoteData` handles data transfer with multithreading and asynchronous data streaming using thread-safe buffers.
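
The general idea of streaming data through a thread-safe buffer can be sketched with the standard library alone. This is only an illustration of the pattern, not `pyRemoteData`'s actual implementation; `stream_chunks` and `buffer_size` are hypothetical names introduced for this example.

```python
import queue
import threading


def stream_chunks(chunks, buffer_size=4):
    """Yield items from `chunks` in order while a background thread fills a bounded buffer.

    Hypothetical helper for illustration only; not part of pyRemoteData.
    """
    buffer = queue.Queue(maxsize=buffer_size)  # thread-safe and bounded
    sentinel = object()  # marks the end of the stream

    def producer():
        try:
            for chunk in chunks:
                buffer.put(chunk)  # blocks while the buffer is full
        finally:
            buffer.put(sentinel)  # always signal completion to the consumer

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer delivers something
        if item is sentinel:
            break
        yield item


received = list(stream_chunks([b"a", b"b", b"c"]))
print(received)
```

The bounded queue lets the producer run ahead of the consumer by at most `buffer_size` items, so transfer and processing can overlap without unbounded memory use.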

## Use-cases

If your storage facility supports SFTP and LFTP, and you need high-bandwidth data streaming for analysis, data migration or other purposes such as model training, then this module may be of use to you.
Experience with SFTP or LFTP is not necessary, but you must be able to set up the required SSH configuration.

See **Automated** for details on how to avoid having to set up SSH configuration.

## Setup

A more user-friendly setup process, supporting both automated and interactive setup, is currently in development. (**TODO**: Finish and describe the setup process)

### Installation

The package is available on PyPI. The recommended way to install and manage dependencies is using the lightning-fast [`uv`](https://docs.astral.sh/uv/) package manager:

```bash
# Add to your current project
uv add pyremotedata
```

Alternatively, you can use the `uv pip` interface or standard `pip`:

```bash
uv pip install pyremotedata
# or just
pip install pyremotedata
```

### Interactive

Simply follow the popup instructions that appear once you load the package for the first time.

### Automated

The automatic configuration setup relies on setting the correct environment variables **BEFORE LOADING THE PACKAGE**:

* `PYREMOTEDATA_REMOTE_USERNAME` : Should be set to your username on your remote service.
* `PYREMOTEDATA_REMOTE_URI` : Should be set to the URI of the endpoint for your remote service (e.g. for ERDA it is "io.erda.au.dk").
* `PYREMOTEDATA_REMOTE_DIRECTORY` : Should be set to the default working directory on your remote storage (e.g. "/MY_PROJECT/DATASETS"). If you do not need a default other than the root, simply set this to "/".
* `PYREMOTEDATA_AUTO` : Should be **set to "yes"** (not case-sensitive) to disable interactive mode. If this is unset, or set to anything other than "yes", while any of the variables above is unset, an error will be thrown.
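
As a rough sketch, the variables listed above can be checked before the package is imported. The helper `missing_automated_config` below is hypothetical and not part of `pyRemoteData`. It also assumes that an empty string counts as unset, which the package itself may treat differently.

```python
import os

# The three connection variables described in the list above.
REQUIRED_VARS = (
    "PYREMOTEDATA_REMOTE_USERNAME",
    "PYREMOTEDATA_REMOTE_URI",
    "PYREMOTEDATA_REMOTE_DIRECTORY",
)


def missing_automated_config(env=None):
    """Return (missing_variable_names, auto_enabled) for a mapping of environment variables.

    Hypothetical helper for illustration only; not part of pyRemoteData.
    Assumption: an empty string is treated the same as an unset variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # "yes" is matched case-insensitively, as described for PYREMOTEDATA_AUTO.
    auto = env.get("PYREMOTEDATA_AUTO", "").lower() == "yes"
    return missing, auto


example_env = {
    "PYREMOTEDATA_REMOTE_USERNAME": "username",
    "PYREMOTEDATA_REMOTE_URI": "storage.example.com",
    "PYREMOTEDATA_REMOTE_DIRECTORY": "/",
    "PYREMOTEDATA_AUTO": "Yes",
}
missing, auto = missing_automated_config(example_env)
print(missing, auto)
```

Calling `missing_automated_config()` without arguments inspects the real process environment, which can be useful as a pre-flight check in batch jobs.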

The recommended way to avoid any SSH or environment variable setup is to use:

```python
from pyremotedata.implicit_mount import IOHandler
# Replace <keyfile>, <USER> and <REMOTE> with your own values
with IOHandler(
    lftp_settings={"sftp:connect-program": "ssh -a -x -i <keyfile>"},
    user="<USER>",
    remote="<REMOTE>",
) as io:
    ...
```

Here `<keyfile>` is the path to your private SSH key, which is probably something like `~/.ssh/id_rsa`.

### Example

If you want to test against a mock server, simply follow the instructions in tests/README.

If you have a remote storage facility that supports SFTP and LFTP, then you can use the following example to test the functionality of the module:

```python
# Set the environment variables (only necessary in a non-interactive setting)
# If you are simply running this as a Python script, 
# you can omit these lines and you will be prompted to set them interactively
import os
os.environ["PYREMOTEDATA_REMOTE_USERNAME"] = "username"
os.environ["PYREMOTEDATA_REMOTE_URI"] = "storage.example.com"
os.environ["PYREMOTEDATA_REMOTE_DIRECTORY"] = "/MY_PROJECT/DATASETS"
os.environ["PYREMOTEDATA_AUTO"] = "yes"

from pyremotedata.implicit_mount import IOHandler

handler = IOHandler()

with handler as io:
    print(io.ls())
    local_file = io.download("/remote/file/or/directory")

# The configuration is persistent, but can be removed using the following:
from pyremotedata.config import remove_config
remove_config()
```

## Issues

This module is certainly not maximally efficient, and you may run into network- or OS-specific issues. Any and all feedback and contributions are highly appreciated.