Metadata-Version: 2.4
Name: ldt-toolkit
Version: 0.0.3
Summary: Python library for longitudinal depression trajectories workflows.
Author-email: Simon Provost <simon.gilbert.provost@gmail.com>, Bianca Branco <s2714881@sms.ed.ac.uk>, Alex Kwong <akwong@ed.ac.uk>
License: MIT License
        
        Copyright (c) 2026 Simon Gilbert Provost
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beartype>=0.18.5
Requires-Dist: rich>=13.7.1
Requires-Dist: rich-pyfiglet>=1.0.0
Requires-Dist: numpy>=1.26.4
Requires-Dist: pandas>=2.2.2
Requires-Dist: PyYAML>=6.0.2
Requires-Dist: plotly>=5.24.1
Requires-Dist: scikit-learn<1.6,>=1.4.2
Requires-Dist: scikit-longitudinal>=0.1.7
Requires-Dist: tslearn>=0.7.0
Requires-Dist: terminaltexteffects>=0.14.2
Requires-Dist: skrub==0.6.2
Requires-Dist: miceforest>=6.0.5
Requires-Dist: notebook>=7.5.3
Requires-Dist: shap>=0.49.1
Requires-Dist: pipelineprofiler>=0.1.18
Requires-Dist: setuptools<81
Dynamic: license-file

<div align="center">
  <img src="public/logo/logo.gif" alt="Longitudinal Depression Trajectories Toolkit" width="820">
</div>

<p align="center"><big><big><strong>L</strong>ongitudinal <strong>D</strong>epression <strong>T</strong>rajectories <em>Toolkit</em></big></big></p>

<div align="center">
  <img src="https://img.shields.io/static/v1?label=Python&message=3.10%E2%80%933.12&color=3776AB&style=for-the-badge&logo=python&logoColor=white" alt="Python 3.10-3.12">
  <img src="https://img.shields.io/static/v1?label=UV&message=managed&color=2196F3&style=for-the-badge&logo=uv&logoColor=white" alt="UV managed">
  <img src="https://img.shields.io/static/v1?label=Ruff&message=compliant&color=0F172A&style=for-the-badge&logo=ruff&logoColor=white" alt="Ruff compliant">
  <img src="https://img.shields.io/static/v1?label=Beartype&message=verified&color=8E24AA&style=for-the-badge" alt="Beartype verified">
  <img src="https://img.shields.io/static/v1?label=Go%20CLI&message=supported&color=00ADD8&style=for-the-badge&logo=go&logoColor=white" alt="Go CLI supported">
  <img src="https://img.shields.io/static/v1?label=R&message=required%20for%20LGMMs&color=276DC3&style=for-the-badge&logo=r&logoColor=white" alt="R required for LGMMs">
</div>

<div align="center">
  <a href="https://github.com/OWNER/ldt-toolkit">Documentation (placeholder)</a> -
  <a href="https://life-epi-psych.github.io">LEAP Group</a>
</div>

## <img src="public/icons/lucide/github.svg" width="32" alt="" /> About The Project

The `Longitudinal Depression Trajectories Toolkit (LDT-Toolkit)` initiative is designed for *social, medical, and clinical researchers* who work with repeated-measure data and require a *stepping-stone path* from raw cohort files to *downstream modelling*.

`LDT-Toolkit` is intended as a general toolkit for *longitudinal depression trajectory exploration*. It currently uses the [Millennium Cohort Study (MCS) from CLS/UCL](https://cls.ucl.ac.uk/cls-studies/millennium-cohort-study/) as a *proof-of-concept* case study.

The initiative delivers *two interconnected components*. First, `ldt-toolkit` (this repository) is the *Python engine* of tools and reproducible pipelines to accelerate exploration of longitudinal studies toward downstream modelling, while remaining fully usable in Python scripts or notebooks. Second, `ldt` is a *fully interactive* Go CLI with a *no-code* terminal interface for running and orchestrating the toolkit from start to finish.

The toolset supports *two broad lines of exploration*. `Playground` methods allow researchers to quickly iterate on their own datasets by running operations across *data preparation*, *data preprocessing*, and *machine learning* phases. With `Presets`, researchers can run *stage-level reproducible pipelines* for a given longitudinal study, and *community contributions* are encouraged so this can grow into a reusable collection of presets for preparing, preprocessing, and modelling longitudinal datasets.

Finally, within the *data preprocessing* stage, we offer a *new* `Trajectories Builder Playground`: apply existing trajectory-building algorithms to your longitudinal datasets, or design and submit your own builders to help shape a community consensus around reproducible depression-trajectory construction.

## <img src="public/icons/lucide/terminal.svg" width="32" alt="" /> Setup And Launch

> [!IMPORTANT]
> Use both components: `ldt-toolkit` (Python toolkit) and `ldt` (Go CLI). The CLI is the primary no-code interface and is highly recommended.


<details>
<summary><strong>Machine Requirements</strong> (<a href="https://www.r-project.org/">R</a>, <a href="https://www.python.org/downloads/">Python</a>, <a href="https://docs.astral.sh/uv/getting-started/installation/">uv</a>, <a href="https://go.dev/dl/">Go</a>)</summary>

Install the following first:
- Python `3.10` to `3.12`
- [`uv`](https://docs.astral.sh/uv/)
- Go (for `ldt` CLI)
- R + `Rscript` (required only when running `LCGA`/`GMM` tools through `lcmm`)

Quick checks:

```bash
python --version || python3 --version
uv --version
go version
Rscript --version
```
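
The Python check above only prints the version. A small script can also confirm that the interpreter falls inside the supported `3.10` to `3.12` range. This is a minimal sketch: `is_supported` is a hypothetical helper written for illustration and is not part of `ldt-toolkit`.

```python
from __future__ import annotations

import sys

# Supported range taken from the package metadata: >=3.10,<3.13.
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 13)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True when (major, minor) falls inside the supported range."""
    return MIN_VERSION <= version < MAX_EXCLUSIVE


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    status = "supported" if is_supported(current) else "not supported"
    print(f"Python {current[0]}.{current[1]} is {status} by ldt-toolkit")
```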

Example installs by OS:

```bash
# macOS (Homebrew)
brew update
brew install python@3.12 uv go r

# Linux (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y python3 python3-pip golang r-base
curl -LsSf https://astral.sh/uv/install.sh | sh
```

</details>

### <img src="public/icons/lucide/tally-1.svg" width="28" alt="" /> Install `ldt-toolkit` ([PyPI](https://pypi.org/project/ldt-toolkit/))

```bash
uv add ldt-toolkit
# 👆 We recommend `uv` as the Python package manager.
# 👆 Make sure `uv` is using a supported Python version; `uv python pin 3.10`, for example, does the trick.
```

_If you plan to run `LCGA`/`GMM`, install R-side dependencies:_
```bash
Rscript --vanilla -e "repos <- 'https://cloud.r-project.org'; required_packages <- c('lcmm'); missing <- setdiff(required_packages, rownames(installed.packages())); if (length(missing)) install.packages(missing, repos = repos) else message('All required R packages are already installed.')"
# 👇 If you have cloned the repo, you can also run:
Rscript --vanilla setup_R/install_requirements.R
```

### <img src="public/icons/lucide/tally-2.svg" width="28" alt="" /> Install the `ldt` Go CLI (No-code Terminal Interface)

Homebrew (macOS/Linux):

```bash
brew tap Longitudinal-Depression-Toolkit/homebrew-tap
brew install ldt
```

[//]: # (Scoop &#40;Windows&#41;:)

[//]: # ()
[//]: # (```powershell)

[//]: # (scoop bucket add longitudinal-depression-toolkit https://github.com/Longitudinal-Depression-Toolkit/scoop-bucket)

[//]: # (scoop install ldt)

[//]: # (```)

<img src="public/icons/lucide/party-popper.svg" width="14" alt="" /> Voilà! You can now run:

```bash
ldt
```
> [!WARNING]
> The full `ldt-toolkit` workflow is not supported by default on Windows yet.
> As a workaround, run the toolkit inside a Docker Linux environment, or use a macOS-based machine.
> Please open [an issue](https://github.com/Longitudinal-Depression-Toolkit/ldt-toolkit/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen) if you want to contribute Windows support.


### <img src="public/icons/lucide/square-code.svg" width="32" alt="" /> Getting Started With the Toolset

<details>
<summary><strong>(1) Generate Synthetic Data (Multi-Technique)</strong></summary>

```python
from pathlib import Path

from ldt.data_preparation import EventShockRecovery, MissingDataScenarios, TrendPatterns

out = Path("/path/to/your/project/data")
out.mkdir(parents=True, exist_ok=True)

TrendPatterns(n_samples=400, n_waves=5, random_state=7).prepare().to_csv(
    out / "synthetic_trend_patterns.csv", index=False
)

EventShockRecovery().prepare(
    n_samples=400,
    n_waves=5,
    random_state=7,
    feature_cols=["depressive_score"],
    shock_wave=3,
    shock_mean=3.5,
    recovery_rate=0.9,
    noise_sd=0.8,
).to_csv(out / "synthetic_event_shock.csv", index=False)

MissingDataScenarios().prepare(
    n_samples=400,
    n_waves=5,
    random_state=7,
    feature_cols=["depressive_score", "sleep_score"],
    mechanism="mixed",
    missing_rate=0.20,
    dropout_rate=0.10,
    mar_strength=1.0,
).to_csv(out / "synthetic_with_missing.csv", index=False)
```
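
Once a file has been written, it can help to look at missingness per wave and to view the data in wide form. The sketch below uses plain pandas on a small in-memory stand-in rather than a generated file. It assumes a long layout with `subject_id`, `wave`, and `depressive_score` columns (names borrowed from the trajectory examples); the actual generated columns may differ.

```python
import pandas as pd

# Stand-in for a generated long-format file: one row per subject per wave.
# Column names are assumptions taken from the trajectory examples.
long_df = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2, 2],
        "wave": [1, 2, 3, 1, 2, 3],
        "depressive_score": [4.0, 5.5, 7.0, 3.0, None, 2.5],
    }
)

# Share of missing scores in each wave.
missing_by_wave = long_df.groupby("wave")["depressive_score"].apply(
    lambda scores: scores.isna().mean()
)

# One row per subject, one column per wave.
wide = long_df.pivot(index="subject_id", columns="wave", values="depressive_score")

print(missing_by_wave)
print(wide)
```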

</details>

<details>
<summary><strong>(2) Build Trajectories (Two Techniques) + ShowTable</strong></summary>

```python
from pathlib import Path

from ldt.data_preprocessing import BuildTrajectories, ShowTable

input_long = Path("/path/to/your/project/data/synthetic_trend_patterns.csv")
out = Path("/path/to/your/project/outputs")
out.mkdir(parents=True, exist_ok=True)

BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=input_long,
    output_path=out / "trajectories_dtw_kmeans.csv",
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="dtw_kmeans",
    n_trajectories=4,
)
BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=input_long,
    output_path=out / "trajectories_clusterMLD.csv",
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="clusterMLD",
    n_trajectories=4,
)

ShowTable().fit_preprocess(
    input_path=out / "trajectories_clusterMLD.csv",
    output_html=out / "trajectories_clusterMLD_report.html",
    open_browser=False,
)
```
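
Before comparing builders, a quick tally of subjects per trajectory is often useful. The sketch below is plain pandas on an in-memory stand-in, not a toolkit call. It assumes the output contains `trajectory_id` and `trajectory_name` columns, as suggested by the end-to-end example; the values shown are made up for illustration.

```python
import pandas as pd

# Stand-in for a trajectories output file; the column names are assumptions
# and the labels are illustrative only.
assignments = pd.DataFrame(
    {
        "subject_id": [1, 2, 3, 4, 5],
        "trajectory_id": [0, 1, 0, 2, 0],
        "trajectory_name": ["low", "rising", "low", "high", "low"],
    }
)

# Count subjects in each trajectory, then express the counts as shares.
sizes = (
    assignments.groupby(["trajectory_id", "trajectory_name"])
    .size()
    .rename("n_subjects")
    .reset_index()
)
sizes["share"] = sizes["n_subjects"] / len(assignments)

print(sizes)
```

Very small trajectories in such a tally can be a hint that `n_trajectories` is set too high for the data.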

</details>

<details>
<summary><strong>(3) End-to-End: Synthetic Data to Standard ML</strong></summary>

```python
from pathlib import Path

from ldt.data_preparation import MissingDataScenarios
from ldt.data_preprocessing import (
    AggregateLongToCrossSectional,
    BuildTrajectories,
    CleanDataset,
    CombineDatasetWithTrajectories,
    MissingImputation,
)
from ldt.machine_learning import StandardMachineLearning

root = Path("/path/to/your/project")
raw_long = root / "data/synthetic_long_with_missing.csv"
clean_long = root / "outputs/long_clean.csv"
imputed_long = root / "outputs/long_imputed.csv"
trajectories = root / "outputs/trajectories.csv"
cross_sectional = root / "outputs/cross_sectional.csv"
model_ready = root / "outputs/model_ready.csv"

long_df = MissingDataScenarios().prepare(
    n_samples=1200,
    n_waves=6,
    random_state=42,
    feature_cols=["depressive_score", "sleep_score", "anxiety_score"],
    mechanism="mixed",
    missing_rate=0.20,
    dropout_rate=0.15,
    mar_strength=1.10,
)
raw_long.parent.mkdir(parents=True, exist_ok=True)
long_df.to_csv(raw_long, index=False)

CleanDataset().fit_preprocess(input_path=raw_long, output_path=clean_long)
MissingImputation().fit_preprocess(
    technique="mice_imputation",
    input_path=clean_long,
    output_path=imputed_long,
    random_state=42,
)
BuildTrajectories().fit_preprocess(
    mode="from_scratch",
    input_path=imputed_long,
    output_path=trajectories,
    id_col="subject_id",
    time_col="wave",
    value_cols=["depressive_score"],
    builder="clusterMLD",
    n_trajectories=4,
)
AggregateLongToCrossSectional().fit_preprocess(
    input_path=imputed_long,
    output_path=cross_sectional,
    subject_id_col="subject_id",
    numeric_columns=["depressive_score", "sleep_score", "anxiety_score", "age_baseline"],
    numeric_agg="mean",
)
CombineDatasetWithTrajectories().fit_preprocess(
    input_original_data_path=cross_sectional,
    input_trajectories_data_path=trajectories,
    output_path=model_ready,
    original_id_col="subject_id",
    trajectory_id_col="subject_id",
    merge_type="left",
    trajectory_columns=["trajectory_id", "trajectory_name"],
)

ml_result = StandardMachineLearning().fit_predict(
    technique="run_experiment",
    input_path=model_ready,
    target_column="trajectory_id",
    feature_columns="depressive_score,sleep_score,anxiety_score,age_baseline",
    estimator_key="random_forest",
    metric_keys="accuracy,f1_macro",
    cv_folds=5,
    validation_split="none",
    multiclass_mode="multiclass",
    random_seed=42,
    output_dir=str(root / "outputs/standard_ml"),
)
print(ml_result["mean_score"], ml_result["report_path"])
```
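
Conceptually, the aggregation and combination steps above resemble a per-subject mean followed by a left join. The sketch below shows that idea in plain pandas on toy data. It is an illustration under that assumption, not the implementation behind `AggregateLongToCrossSectional` or `CombineDatasetWithTrajectories`.

```python
import pandas as pd

# Toy long-format data: two waves for each of three subjects.
long_df = pd.DataFrame(
    {
        "subject_id": [1, 1, 2, 2, 3, 3],
        "wave": [1, 2, 1, 2, 1, 2],
        "depressive_score": [2.0, 4.0, 6.0, 8.0, 1.0, 3.0],
        "sleep_score": [7.0, 5.0, 4.0, 2.0, 8.0, 8.0],
    }
)
# Toy trajectory labels; subject 3 deliberately has none.
trajectories = pd.DataFrame({"subject_id": [1, 2], "trajectory_id": [0, 1]})

# Collapse waves into one row per subject by averaging each numeric column.
cross_sectional = long_df.groupby("subject_id", as_index=False)[
    ["depressive_score", "sleep_score"]
].mean()

# A left join keeps every subject, even those without a trajectory label.
model_ready = cross_sectional.merge(trajectories, on="subject_id", how="left")

print(model_ready)
```

With a left join, subjects missing from the trajectories table end up with an empty `trajectory_id`, so it is worth checking for missing targets before modelling.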

</details>

*But there is more: explore the full [documentation](https://github.com/OWNER/ldt-toolkit) for the complete tool and presets catalogue.*

## <img src="public/icons/lucide/graduation-cap.svg" width="32" alt="" /> Citation

```bibtex
@software{Provost_Longitudinal_Depression_Trajectories_Toolkit,
    author = {Provost, Simon and Branco, Bianca and Kwong, Alex},
    title = {{Longitudinal Depression Trajectories Toolkit: Machine Learning for Longitudinal Studies}},
    version = {0.0.3}
}
```

Use GitHub's "Cite this repository" for citation metadata updates.

## <img src="public/icons/lucide/fingerprint-pattern.svg" width="32" alt="" /> License, Data, Security

- Access to [Millennium Cohort Study data is governed by CLS/UCL data access rules](https://cls.ucl.ac.uk/cls-studies/millennium-cohort-study/).
- Review [SECURITY.md](./SECURITY.md) for handling expectations.
- Licensed under the [MIT License](./LICENSE).


_Special thanks to [@charm.land](https://charm.land) for their amazing TUI framework!_
