modelbest_robo_dataset

Multimodal dataset toolkit for embodied AI · unified format · multi-source adapters · SSTable + Parquet + MP4

11 Core Files · 1,790 Lines · 4 Sources · 37,129 Episodes · 14 Datasets

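Data Pipeline

1. Raw data
   LeRobot Parquet · RH20T npy+mp4 · fuse TFRecord · RoboMind HDF5
2. Source adaptation
   RawEpisode: unified intermediate representation (state / action / video)
3. Writer
   → SSTable skeleton
   → Parquet time series
   → per-ep MP4
4. Reader
   Episode: skeleton + time-series sampling + video decoding
5. Training
   PyTorch Dataset, __getitem__, frame-level samples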
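
The last pipeline stage (a dataset whose `__getitem__` returns frame-level samples) can be illustrated with a minimal sketch. It is not the code in embodied_dataset.py: the class name `FrameLevelDataset`, the dict layout of an episode, and indexing by frame number rather than by timestamp are all assumptions made for illustration. The class has the `__len__`/`__getitem__` shape of a map-style PyTorch dataset but deliberately avoids importing torch.

```python
from bisect import bisect_right
from itertools import accumulate


class FrameLevelDataset:
    """Map-style dataset with one item per frame across all episodes.

    Hypothetical stand-in, not the library's class. Each episode is a
    dict holding equal-length "state" and "action" lists.
    """

    def __init__(self, episodes):
        self.episodes = episodes
        # ends[i] = total number of frames in episodes[0..i]
        self.ends = list(accumulate(len(ep["state"]) for ep in episodes))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find the episode whose frame range contains the flat index.
        ep_i = bisect_right(self.ends, idx)
        start = self.ends[ep_i - 1] if ep_i else 0
        t = idx - start
        ep = self.episodes[ep_i]
        return {
            "episode_index": ep_i,
            "frame": t,
            "state": ep["state"][t],
            "action": ep["action"][t],
        }
```

A flat index keeps shuffling and batching uniform across episodes of different lengths; video decoding would hang off the same `(episode, frame)` lookup.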

Three-Layer Architecture

Source Adapter Layer

LeRobotSource · lerobot.py · 169 lines
  Parquet + info.json
  pusht · xarm · aloha

RH20TSource · rh20t.py · 277 lines
  npy + MP4 + WAV
  tcp · gripper · force
  cfg1~7 · 11,952 eps

FuseSource · fuse.py · 170 lines
  TFRecord / RLDS
  state · action · imu · mic
  fuse · 24,081 eps

RoboMindSource · robomind.py · 266 lines
  HDF5 · 3 variants, auto-detected
  puppet · franka · tiangong
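
All four adapters emit the same intermediate type, which base.py describes as `RawEpisode` plus an `EpisodeSource` Protocol. The sketch below shows what such a contract could look like. Only the names `RawEpisode`, `EpisodeSource`, and `dim_names` come from this page; every field, the `iter_episodes` method, and the `InMemorySource` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol, runtime_checkable


@dataclass
class RawEpisode:
    """Hypothetical unified intermediate representation."""
    episode_index: int
    task: str
    state: list[list[float]]    # [T, D] observations
    action: list[list[float]]   # [T, D] controls
    video_paths: dict[str, str] = field(default_factory=dict)  # camera -> file
    dim_names: list[str] = field(default_factory=list)


@runtime_checkable
class EpisodeSource(Protocol):
    """Structural interface: any class with these methods qualifies."""

    def __len__(self) -> int: ...

    def iter_episodes(self) -> Iterator[RawEpisode]: ...


class InMemorySource:
    """Toy adapter used only to show that the Protocol is satisfied."""

    def __init__(self, episodes: list[RawEpisode]):
        self._episodes = episodes

    def __len__(self) -> int:
        return len(self._episodes)

    def iter_episodes(self) -> Iterator[RawEpisode]:
        yield from self._episodes
```

A Protocol lets each adapter stay a plain class with no shared base, so format-specific dependencies (TFRecord, HDF5) remain isolated in their own modules.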
Core Library

data_types.py · 200 lines · Pydantic
  Episode · MetaContent
  TimeseriesContent · VideoContent
  TimeseriesName (12 values)

writer.py · 442 lines · EmbodiedWriter
  RawEpisode → unified format
  SSTable (MbTableBuilder)
  Parquet + MP4 + stats

reader.py · 244 lines · EmbodiedReader
  SSTable partition → Episode
  load_sample(ep, ts)
  Parquet caching + video decoding

validator.py · 79 lines · ActionValidator
  magnitude ratio + correlation check
  absolute vs delta

base.py · 90 lines
  RawEpisode intermediate representation
  EpisodeSource Protocol
  dim_names labels the meaning of each dimension
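
The validator card mentions a magnitude ratio and a correlation check for telling absolute actions from delta actions. One way such a check could work is sketched below for a single dimension. This is an assumed heuristic, not the logic in validator.py: the function names, the choice of Pearson correlation, and the decision rule are all illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


def classify_action(state, action):
    """Guess whether a 1-D action series is 'absolute' or 'delta'.

    Compares action[t] against the next state (absolute hypothesis)
    and against the state change (delta hypothesis).
    """
    next_state = state[1:]
    deltas = [b - a for a, b in zip(state, state[1:])]
    act = action[:-1]  # the last action has no following state
    corr_abs = pearson(act, next_state)
    corr_delta = pearson(act, deltas)
    # Reported for inspection only; the decision uses the correlations.
    ratio = mean_abs(act) / max(mean_abs(next_state), 1e-9)
    kind = "absolute" if corr_abs >= corr_delta else "delta"
    return {
        "kind": kind,
        "magnitude_ratio": ratio,
        "corr_absolute": corr_abs,
        "corr_delta": corr_delta,
    }
```

In this sketch a magnitude ratio well below 1 supports the delta reading, since per-step changes are usually much smaller than the positions themselves.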
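Application Layer

embodied_dataset.py · 138 lines · PyTorch Dataset
  preloads skeletons + Parquet
  __getitem__ indexed by timestamp
  on-demand video decoding (PyAV)

convert.py · 124 lines · CLI entry point
  --source lerobot | rh20t | fuse | robomind
  --all (batch) · --max-episodes
  progress logging + ETA

generate_viz.py · 606 lines · visualization generator
  synchronized multi-camera playback
  linked time-series curves + audio
  → 14 HTML pages

expand_skeleton.py · 260 lines · skeleton expansion
  episode-level → frame-level skeleton
  action chunk slicing
  single_frame for training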
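
The skeleton expansion step (episode-level to frame-level, with action chunk slicing) can be pictured as follows. This is a sketch under assumptions, not expand_skeleton.py itself: the function name, the record layout, and the choice to pad short chunks by repeating the last action are all hypothetical.

```python
def expand_to_frames(actions, chunk_size):
    """Turn one episode's [T, D] actions into T frame-level records.

    Each record carries the next `chunk_size` actions starting at its
    frame. Chunks that run past the episode end are padded by repeating
    the final action (an assumed policy); `n_valid` counts real entries.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    records = []
    for t in range(len(actions)):
        chunk = actions[t:t + chunk_size]
        n_valid = len(chunk)
        chunk = chunk + [actions[-1]] * (chunk_size - n_valid)
        records.append({"frame": t, "action_chunk": chunk, "n_valid": n_valid})
    return records
```

Keeping `n_valid` alongside the padded chunk lets a training loop mask the padded steps out of the loss.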

Episode Skeleton Structure

Episode
├─ episode_index: int
├─ duration: float
└─ messages: list[MessageType]

EnvMessage (role="env")
├─ MetaContent
│    task, task_id, robot_type,
│    quality_rating, dim_names, ...
├─ VideoContent → per-ep MP4
├─ TimeseriesContent → Parquet
│    name: env.obs.*
└─ AudioContent → Parquet bytes
AiMessage (role="assistant")
└─ TimeseriesContent → Parquet
     name: ai.action.*

TimeseriesName (12 Literal values)
├─ env.obs.joint_position
├─ env.obs.cartesian_position
├─ env.obs.force_torque / imu
├─ ai.action.joint_position
├─ ai.action.cartesian_position
└─ ai.action.delta_* (3 variants)

dim_names labels the meaning of each dimension,
e.g. ["x","y","z","qx","qy","qz","qw","gripper"]
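
The skeleton tree and the `dim_names` convention can be mirrored in a small sketch. The library defines these types with Pydantic; plain dataclasses are used here only to keep the example dependency-free. The `parquet_path` field, the single `Message` class, and the `label_dims` helper are hypothetical, while the role names and the `env.obs.` / `ai.action.` prefixes follow the tree above.

```python
from dataclasses import dataclass


@dataclass
class TimeseriesContent:
    name: str           # e.g. "env.obs.cartesian_position"
    parquet_path: str   # hypothetical pointer to the [T, D] table


@dataclass
class Message:
    role: str                           # "env" or "assistant"
    contents: list[TimeseriesContent]

    def __post_init__(self):
        # Each role may only carry series from its own namespace.
        prefix = {"env": "env.obs.", "assistant": "ai.action."}[self.role]
        for content in self.contents:
            if not content.name.startswith(prefix):
                raise ValueError(f"{content.name!r} not allowed for role {self.role!r}")


@dataclass
class Episode:
    episode_index: int
    duration: float
    messages: list[Message]


def label_dims(values, dim_names):
    """Pair one timestep's vector with its per-dimension labels."""
    if len(values) != len(dim_names):
        raise ValueError("values and dim_names differ in length")
    return dict(zip(dim_names, values))
```

With labels attached, downstream code can select `gripper` or the quaternion components by name instead of by positional index.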

Storage Format

sstable/
├── skeleton_episode/{name}/                  Episode skeleton (SSTable partition)
│   ├── part-00000                            SSTable shard (modelbest_sdk MbTableBuilder)
│   ├── part-00001                            10,000 records per shard
│   └── ...
├── data/{name}/                              Parquet time-series tables
│   ├── state/chunk-000/file-000.parquet      observations [T, D]
│   ├── action/chunk-000/file-000.parquet     controls [T, D]
│   ├── force/                                RH20T @100Hz
│   ├── imu/                                  fuse IMU
│   └── audio_env/                            WAV bytes
├── videos/{name}/{cam}/                      per-episode H.264 MP4
│   └── episode_000000.mp4
└── meta/{name}/
    ├── info.json                             dataset info
    └── stats.json                            mean / std / min / max
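
The naming pattern in this layout can be expressed as two small path helpers. They are illustrative only: the function names are hypothetical, and mapping an episode to a shard by integer division assumes that shards are filled sequentially with 10,000 records each, which the layout suggests but does not state.

```python
from pathlib import PurePosixPath

SHARD_SIZE = 10_000  # records per SSTable shard, per the layout above


def skeleton_shard(root, name, episode_index):
    """Path of the shard assumed to hold this episode's skeleton."""
    shard = episode_index // SHARD_SIZE  # assumes sequential fill
    return PurePosixPath(root) / "skeleton_episode" / name / f"part-{shard:05d}"


def video_path(root, name, cam, episode_index):
    """Path of the per-episode MP4 for one camera."""
    filename = f"episode_{episode_index:06d}.mp4"
    return PurePosixPath(root) / "videos" / name / cam / filename
```

For example, `video_path("sstable", "pusht", "top", 0)` gives `sstable/videos/pusht/top/episode_000000.mp4`, where the camera name `top` is made up for the example.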

File Inventory

modelbest_robo_dataset/                 core library + application layer
├── __init__.py                    45
├── data_types.py                 200   Pydantic type definitions
├── writer.py                     442   RawEpisode → unified format
├── reader.py                     218   SSTable → Episode
├── validator.py                   79   state/action validation
├── embodied_dataset.py           138   training-side PyTorch Dataset wrapper
├── sources/
│   ├── base.py                    90   RawEpisode + EpisodeSource
│   ├── lerobot.py                169   LeRobot v3.0
│   ├── rh20t.py                  277   RH20T (npy+mp4+wav)
│   ├── fuse.py                   170   fuse/DIGIT (TFRecord)
│   └── robomind.py               266   RoboMIND (HDF5, 3 variants)
└── scripts/
    ├── convert.py                124   CLI conversion entry point
    ├── expand_skeleton.py        260   skeleton expansion
    ├── generate_viz.py           606   HTML visualization generation
    ├── viz_template.html         405   dataset detail page template
    └── viz_assets/                     static overview page assets
        ├── index.html            243   overview dashboard + datasets.json card entries
        └── architecture.html     297   static code-architecture page

Integrated Datasets

LR  pusht                  206 eps · cartesian 2D
LR  xarm_push_medium       800 eps · delta_joint 3D
LR  aloha_sim_insertion    50 eps · joint 14D (dual-arm)
RH  rh20t_cfg1~7           11,952 eps · EE 8D + force + audio
FS  fuse                   24,081 eps · delta_EE + IMU + audio
RM  robomind_failure       30→1678 eps · tiangong 14D
RM  robomind_puppet        5 eps · puppet dual-arm 7D
RM  robomind_franka        5 eps · franka 8D
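
The header totals can be cross-checked against this list with a few lines of arithmetic. Two reading assumptions are needed: robomind_failure is counted at 30 episodes (the first number in "30→1678"), and rh20t_cfg1~7 stands for seven datasets.

```python
# Episode counts copied from the dataset list; robomind_failure uses 30.
EPISODES = {
    "pusht": 206,
    "xarm_push_medium": 800,
    "aloha_sim_insertion": 50,
    "rh20t_cfg1~7": 11_952,
    "fuse": 24_081,
    "robomind_failure": 30,
    "robomind_puppet": 5,
    "robomind_franka": 5,
}

# rh20t_cfg1~7 is one list entry but is assumed to cover seven configs.
DATASETS_PER_ENTRY = {"rh20t_cfg1~7": 7}

total_episodes = sum(EPISODES.values())
total_datasets = sum(DATASETS_PER_ENTRY.get(name, 1) for name in EPISODES)

print(total_episodes)  # 37129
print(total_datasets)  # 14
```

Under those two assumptions the sums match the 37,129 episodes and 14 datasets shown in the header.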