Metadata-Version: 2.4
Name: fmpose3d
Version: 0.0.9
Author: Xiaohang Yu
Author-email: Ti Wang <ti.wang@epfl.ch>, Mackenzie Weygandt Mathis <mackenzie.mathis@epfl.ch>
License: Apache 2.0
Keywords: pose estimation,3D pose,flow matching,computer vision
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: torch>=2.4.1
Requires-Dist: torchvision>=0.19.1
Requires-Dist: timm>=1.0.0
Requires-Dist: einops>=0.4.0
Requires-Dist: numpy<2.0,>=1.18.5
Requires-Dist: tqdm>=4.60.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: yacs>=0.1.8
Requires-Dist: opencv-python>=4.5.0
Requires-Dist: numba>=0.56.0
Requires-Dist: scikit-image>=0.19.0
Requires-Dist: filterpy>=1.4.5
Requires-Dist: pandas>=1.0.1
Requires-Dist: huggingface_hub>=0.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Provides-Extra: wandb
Requires-Dist: wandb>=0.12.0; extra == "wandb"
Provides-Extra: viz
Requires-Dist: matplotlib>=3.5.0; extra == "viz"
Requires-Dist: opencv-python>=4.5.0; extra == "viz"
Provides-Extra: animals
Requires-Dist: deeplabcut>=3.0.0rc13; extra == "animals"
Dynamic: license-file

# FMPose3D: monocular 3D pose estimation via flow matching

![Version](https://img.shields.io/badge/python_version-3.10-purple)
[![PyPI version](https://badge.fury.io/py/fmpose3d.svg?icon=si%3Apython)](https://badge.fury.io/py/fmpose3d)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/fmpose3d?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/fmpose3d)

Code:[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
Models:[![License: NC](https://img.shields.io/badge/License-NC-red.svg)](https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en)

This is the official implementation of the approach described in:

[**FMPose3D: monocular 3D pose estimation via flow matching**](https://arxiv.org/abs/2602.05755) **CVPR 2026**           
Ti Wang, Xiaohang Yu, Mackenzie Weygandt Mathis

<!-- <p align="center"><img src="./images/Frame 4.jpg" width="50%" alt="" /></p> -->

<p align="center"><img src="./images/predictions.jpg" width="95%" alt="" /></p>

## 🚀 TL;DR

FMPose3D creates a 3D pose from a single 2D image. It leverages fast Flow Matching, generating multiple plausible 3D poses via an ODE in just a few steps, then aggregates them using a reprojection-based Bayesian module (RPEA) for accurate predictions, achieving state-of-the-art results on human and animal 3D pose benchmarks.



## News!
- [X] Feb 2026: The FMPose3D paper was accepted to CVPR 2026! 🔥
- [X] Feb 2026: the FMPose3D code and our arXiv paper is released - check out the demos here or on our [project page](https://xiu-cs.github.io/FMPose3D/)
- [X] March 2026: This method is integrated into [DeepLabCut](https://www.mackenziemathislab.org/deeplabcut)

## Installation

### Set up an environment

Make sure you have Python 3.10+. You can set this up with:
```bash
conda create -n fmpose_3d python=3.10
conda activate fmpose_3d

pip install fmpose3d
```

For the animal pipeline, install the optional DeepLabCut dependency:

```bash
pip install "fmpose3d[animals]"
```

## Demos

### Testing on in-the-wild images (humans)

This visualization script is designed for single-frame based model, allowing you to easily run 3D human pose estimation on any single image.

Pre-trained weights are downloaded automatically from [Hugging Face](https://huggingface.co/MLAdaptiveIntelligence/FMPose3D) the first time you run inference, so no manual setup is needed.

Alternatively, you can use your own trained weights or download ours from [Google Drive](https://drive.google.com/drive/folders/1aRZ6t_6IxSfM1nCTFOUXcYVaOk-5koGA?usp=sharing), place them in the `./pre_trained_models` directory, and set `model_weights_path` in the shell script (e.g. `demo/vis_in_the_wild.sh`).

Next, put your test images into folder `demo/images`. Then run the visualization script:
```bash
sh vis_in_the_wild.sh
```
The predictions will be saved to folder `demo/predictions`.

<p align="center"><img src="./images/demo.gif" width="95%" alt="" /></p>

## Training and Inference

### Dataset Setup

#### Setup from original source 
You can obtain the Human3.6M dataset from the [Human3.6M](http://vision.imar.ro/human3.6m/) website, and then set it up using the instructions provided in [VideoPose3D](https://github.com/facebookresearch/VideoPose3D). 

#### Setup from preprocessed dataset (Recommended)
 You also can access the processed data by downloading it from [here](https://drive.google.com/drive/folders/112GPdRC9IEcwcJRyrLJeYw9_YV4wLdKC?usp=sharing).

Place the downloaded files in the `dataset/` folder of this project:

```
<project_root>/
├── dataset/
│   ├── data_3d_h36m.npz
│   ├── data_2d_h36m_gt.npz
│   └── data_2d_h36m_cpn_ft_h36m_dbb.npz
```

### Training 

The training logs, checkpoints, and related files of each training time will be saved in the './checkpoint' folder.

For training on Human3.6M:
```bash
sh ./scripts/FMPose3D_train.sh
```

### Inference

Pre-trained weights are fetched automatically from Hugging Face on the first run. You can also use local weights by setting `model_weights_path` in the shell script (see [Demos](#testing-on-in-the-wild-images-humans) above for details).

To run inference on Human3.6M:

```bash
sh ./scripts/FMPose3D_test.sh
```

### Inference API

FMPose3D also ships a high-level Python API for end-to-end 3D pose estimation from images. See the [Inference API documentation](fmpose3d/inference_api/README.md) for the full reference.

## Experiments on non-human animals

For animal training/testing and demo scripts, see [animals/README.md](animals/README.md).

## Citation 

```
@misc{wang2026fmpose3dmonocular3dpose,
      title={FMPose3D: monocular 3D pose estimation via flow matching}, 
      author={Ti Wang and Xiaohang Yu and Mackenzie Weygandt Mathis},
      year={2026},
      journal={CVPR},
      url={https://arxiv.org/abs/2602.05755}, 
}
```

## Acknowledgements

We thank the Swiss National Science Foundation (SNSF Project # 320030-227871) and the Kavli Foundation for providing financial support for this project.

Our code is extended from the following repositories. We thank the authors for releasing the code. 

- [MHFormer](https://github.com/Vegetebird/MHFormer)
- [StridedTransformer-Pose3D](https://github.com/Vegetebird/StridedTransformer-Pose3D)
- [ST-GCN](https://github.com/vanoracai/Exploiting-Spatial-temporal-Relationships-for-3D-Pose-Estimation-via-Graph-Convolutional-Networks)
- [VideoPose3D](https://github.com/facebookresearch/VideoPose3D)
