Universal Molecular Encoder
Protein sequences with standard amino acid alphabet
Chemical structure representations
DNA/RNA sequences with nucleotide alphabet
Spatial molecular structure data
Modality-aware tokenization with shared vocabulary space
Shared parameters across all biological modalities
Cross-modal compatible representations
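
One possible reading of "modality-aware tokenization with shared vocabulary space" is sketched below: each input is prefixed with a modality tag, and every symbol from every alphabet receives an id in a single shared table. This is only a minimal illustration, not the encoder's actual tokenizer. The names ALPHABETS, SPECIAL, build_shared_vocab, and tokenize are hypothetical, as are the tag format and the choice to keep per-modality symbols distinct. Chemical structures and spatial data are left out because the outline does not say how they are represented.

```python
from typing import Dict, List

# Hypothetical alphabets for the sequence modalities named in the outline.
ALPHABETS: Dict[str, str] = {
    "protein": "ACDEFGHIKLMNPQRSTVWY",  # 20 standard amino acids
    "dna": "ACGT",
    "rna": "ACGU",
}
SPECIAL: List[str] = ["<pad>", "<unk>"]  # assumed special tokens


def build_shared_vocab(alphabets: Dict[str, str]) -> Dict[str, int]:
    """Assign every modality tag and symbol an id in one shared id space."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL)}
    # One tag token per modality, e.g. "<dna>".
    for modality in alphabets:
        vocab[f"<{modality}>"] = len(vocab)
    # Symbols are namespaced so protein "A" and DNA "A" do not collide.
    for modality, symbols in alphabets.items():
        for symbol in symbols:
            vocab[f"{modality}:{symbol}"] = len(vocab)
    return vocab


def tokenize(seq: str, modality: str, vocab: Dict[str, int]) -> List[int]:
    """Return ids for seq, led by its modality tag; unknown symbols map to <unk>."""
    if modality not in ALPHABETS:
        raise ValueError(f"unsupported modality: {modality}")
    unk = vocab["<unk>"]
    ids = [vocab[f"<{modality}>"]]
    ids.extend(vocab.get(f"{modality}:{ch}", unk) for ch in seq.upper())
    return ids


vocab = build_shared_vocab(ALPHABETS)
protein_ids = tokenize("MKT", "protein", vocab)
dna_ids = tokenize("ACGT", "dna", vocab)
```

Because all ids live in one table, a single embedding matrix indexed by them would be one way to realize the shared parameters listed above. Whether the encoder keeps overlapping letters distinct per modality, as this sketch does, or merges them is not stated in the outline.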