
Protein Alphabet

MultiMolecule provides a set of predefined alphabets for tokenization.

Standard Alphabet

The standard alphabet is an extended version of the IUPAC alphabet. It adds six symbols to the IUPAC alphabet: J, U, O, ., -, and *.

  • J: Xle; Leucine (L) or Isoleucine (I)
  • U: Sec; Selenocysteine
  • O: Pyl; Pyrrolysine
  • .: Not used in MultiMolecule; reserved for future use.
  • -: Not used in MultiMolecule; reserved for future use.
  • *: Not used in MultiMolecule; reserved for future use.

Amino Acid Code   Three-letter Code   Amino Acid
A                 Ala                 Alanine
C                 Cys                 Cysteine
D                 Asp                 Aspartic Acid
E                 Glu                 Glutamic Acid
F                 Phe                 Phenylalanine
G                 Gly                 Glycine
H                 His                 Histidine
I                 Ile                 Isoleucine
K                 Lys                 Lysine
L                 Leu                 Leucine
M                 Met                 Methionine
N                 Asn                 Asparagine
P                 Pro                 Proline
Q                 Gln                 Glutamine
R                 Arg                 Arginine
S                 Ser                 Serine
T                 Thr                 Threonine
V                 Val                 Valine
W                 Trp                 Tryptophan
Y                 Tyr                 Tyrosine
X                 Xaa                 Any amino acid
B                 Asx                 Aspartic acid (D) or Asparagine (N)
Z                 Glx                 Glutamine (Q) or Glutamic acid (E)
J                 Xle                 Leucine (L) or Isoleucine (I)
U                 Sec                 Selenocysteine
O                 Pyl                 Pyrrolysine
.                                     Not Used
*                 ***                 Not Used
-                                     Not Used
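
How the alphabets on this page nest inside one another can be sketched with plain Python sets. The names STREAMLINE, IUPAC, and STANDARD are illustrative assumptions, not MultiMolecule identifiers; the symbols are copied from the tables on this page.

```python
# Illustrative constants; these names are assumptions, not MultiMolecule's API.
STREAMLINE = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 unambiguous amino acid codes
IUPAC = STREAMLINE | set("BZX")           # adds the ambiguity codes B, Z, X
STANDARD = IUPAC | set("JUO.-*")          # adds J, U, O and three reserved symbols

# Each alphabet strictly contains the previous one.
assert STREAMLINE < IUPAC < STANDARD

print(len(STREAMLINE), len(IUPAC), len(STANDARD))  # 20 23 29
```

A set comparison like this is only a consistency check on the symbol lists; it says nothing about how the library orders or indexes tokens.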

IUPAC Alphabet

The IUPAC amino acid code is a standard amino acid code proposed by the International Union of Pure and Applied Chemistry (IUPAC) to represent protein sequences.

The IUPAC amino acid code extends the streamline alphabet with three additional symbols: B, Z, and X.

Amino Acid Code   Three-letter Code   Amino Acid
A                 Ala                 Alanine
B                 Asx                 Aspartic acid (D) or Asparagine (N)
C                 Cys                 Cysteine
D                 Asp                 Aspartic Acid
E                 Glu                 Glutamic Acid
F                 Phe                 Phenylalanine
G                 Gly                 Glycine
H                 His                 Histidine
I                 Ile                 Isoleucine
K                 Lys                 Lysine
L                 Leu                 Leucine
M                 Met                 Methionine
N                 Asn                 Asparagine
P                 Pro                 Proline
Q                 Gln                 Glutamine
R                 Arg                 Arginine
S                 Ser                 Serine
T                 Thr                 Threonine
V                 Val                 Valine
W                 Trp                 Tryptophan
X                 Xaa                 Any amino acid
Y                 Tyr                 Tyrosine
Z                 Glx                 Glutamine (Q) or Glutamic acid (E)
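
To make the ambiguity codes concrete, here is a minimal sketch that expands each symbol of a sequence into the set of residues it may denote. The `AMBIGUOUS` mapping and the `candidates` function are hypothetical helpers written for this page, not part of MultiMolecule. Treating X as "any of the 20 unambiguous codes" is also an assumption.

```python
# Hypothetical helper names; the mappings follow the table above.
STREAMLINE = "ACDEFGHIKLMNPQRSTVWY"
AMBIGUOUS = {
    "B": "DN",        # Asx: Aspartic acid or Asparagine
    "Z": "QE",        # Glx: Glutamine or Glutamic acid
    "X": STREAMLINE,  # Xaa: any amino acid (assumed: the 20 unambiguous codes)
}

def candidates(sequence: str) -> list[set[str]]:
    """Return, for each position, the set of residues the symbol may denote."""
    result = []
    for symbol in sequence.upper():
        if symbol in AMBIGUOUS:
            result.append(set(AMBIGUOUS[symbol]))
        elif symbol in STREAMLINE:
            result.append({symbol})
        else:
            raise ValueError(f"{symbol!r} is not in the IUPAC alphabet")
    return result

# M is unambiguous; B and Z each expand to two residues.
print(candidates("MBZ"))
```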

Streamline Alphabet

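The streamline alphabet is a simplified version of the standard alphabet. It contains only the 20 unambiguous amino acid codes listed below, with no ambiguity codes or reserved symbols.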

Amino Acid Code   Three-letter Code   Amino Acid
A                 Ala                 Alanine
C                 Cys                 Cysteine
D                 Asp                 Aspartic Acid
E                 Glu                 Glutamic Acid
F                 Phe                 Phenylalanine
G                 Gly                 Glycine
H                 His                 Histidine
I                 Ile                 Isoleucine
K                 Lys                 Lysine
L                 Leu                 Leucine
M                 Met                 Methionine
N                 Asn                 Asparagine
P                 Pro                 Proline
Q                 Gln                 Glutamine
R                 Arg                 Arginine
S                 Ser                 Serine
T                 Thr                 Threonine
V                 Val                 Valine
W                 Trp                 Tryptophan
Y                 Tyr                 Tyrosine
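
Before tokenizing with the streamline alphabet, it can help to check which symbols in a sequence fall outside it. The sketch below is one way to do that in plain Python. `non_streamline_positions` is a hypothetical helper, and how MultiMolecule itself treats such symbols is not specified on this page.

```python
# Hypothetical helper; the symbol list is copied from the table above.
STREAMLINE = frozenset("ACDEFGHIKLMNPQRSTVWY")

def non_streamline_positions(sequence: str) -> list[tuple[int, str]]:
    """Return (index, symbol) pairs for symbols outside the streamline alphabet."""
    return [(i, s) for i, s in enumerate(sequence) if s not in STREAMLINE]

# B (ambiguity code) and U (selenocysteine) are not in the streamline alphabet.
print(non_streamline_positions("MKTBUA"))  # [(3, 'B'), (4, 'U')]
```

The check is case-sensitive, so lowercase input would be reported in full; normalize with `str.upper()` first if that is not wanted.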