.bed (PLINK binary biallelic genotype table)
Primary representation of genotype calls at biallelic variants. Must be accompanied by .bim and .fam files. Loaded with --bfile; generated in many situations, most notably when the --make-bed command is used. Do not confuse this with the UCSC Genome Browser's BED format, which is totally different.

The first three bytes should be 0x6c, 0x1b, and 0x01 in that order. (There are old versions of the .bed format which start with a different "magic number"; PLINK 1.9 recognizes them, but will convert sample-major files to the current variant-major format on sight. See the bottom of the original .bed definition page for details; that page also contains a more verbose version of the discussion below.)

The rest of the file is a sequence of V blocks of N/4 (rounded up) bytes each, where V is the number of variants and N is the number of samples. The first block corresponds to the first marker in the .bim file, etc.
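The block layout implies simple offset arithmetic. The following is a minimal Python sketch, assuming a variant-major file that starts with the three-byte header described above. The function names are illustrative and are not part of PLINK.

```python
def bed_block_size(n_samples: int) -> int:
    """Bytes per variant block: N/4 rounded up (four genotypes per byte)."""
    return (n_samples + 3) // 4


def bed_block_offset(variant_index: int, n_samples: int) -> int:
    """Byte offset of the block for a 0-based variant index.

    The leading 3 accounts for the three-byte header.
    """
    return 3 + variant_index * bed_block_size(n_samples)


def expected_bed_size(n_variants: int, n_samples: int) -> int:
    """Total file size in bytes: header plus V blocks."""
    return 3 + n_variants * bed_block_size(n_samples)
```

Comparing expected_bed_size(V, N) against the actual file size is a cheap sanity check that a .bed file matches the line counts of its .bim and .fam files.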

The low-order two bits of a block's first byte store the first sample's genotype code. ("First sample" here means the first sample listed in the accompanying .fam file.) The next two bits store the second sample's genotype code, and so on for the 3rd and 4th samples. The second byte stores genotype codes for the 5th-8th samples, the third byte stores codes for the 9th-12th, etc.

The two-bit genotype codes have the following meanings:

00	Homozygous for first allele in .bim file
01	Missing genotype
10	Heterozygous
11	Homozygous for second allele in .bim file
If N is not divisible by four, the extra high-order bits in the last byte of each block are always zero.
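The bit layout and code table above can be sketched in Python as follows. This is an illustration rather than PLINK's own implementation; decode_block and the label strings are hypothetical names chosen for this example.

```python
from typing import List

# Two-bit genotype codes, as listed in the table above.
GENOTYPE_MEANING = {
    0b00: "homozygous allele 1",
    0b01: "missing",
    0b10: "heterozygous",
    0b11: "homozygous allele 2",
}


def decode_block(block: bytes, n_samples: int) -> List[int]:
    """Return one two-bit code per sample for a single variant block."""
    codes = []
    for sample in range(n_samples):
        byte = block[sample // 4]      # four samples per byte
        shift = 2 * (sample % 4)       # first sample sits in the low-order bits
        codes.append((byte >> shift) & 0b11)
    return codes
```

Because only n_samples codes are extracted, the zero padding bits in the last byte of a block are never interpreted as genotypes.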

For example, consider the following text fileset:

test.ped:
  1 1 0 0 1  0  G G  2 2  C C
  1 2 0 0 2  0  A A  0 0  A C
  1 3 1 2 1  2  0 0  1 2  A C
  2 1 0 0 1  0  A A  2 2  0 0
  2 2 0 0 2  2  A A  2 2  0 0
  2 3 1 2 1  2  A A  2 2  A A

test.map:
  1 snp1 0 1
  1 snp2 0 2
  1 snp3 0 3

If you load it in PLINK 1.9, a .bed file containing the following sequence of bytes will be autogenerated (you can view it with e.g. Unix xxd):

  0x6c 0x1b 0x01 0xdc 0x0f 0xe7 0x0f 0x6b 0x01

and the following .bim file will accompany it:

  1  snp1  0  1  G  A
  1  snp2  0  2  1  2
  1  snp3  0  3  A  C

(For brevity, we don't reproduce the .fam here.) We can decompose the .bed file as follows:

The first three bytes are the magic number.
Since there are six samples, each marker block has size 2 bytes (six divided by four, rounded up). Thus genotype data for the first marker ('snp1') is stored in the 4th and 5th bytes.
The 4th byte value of 0xdc is 11011100 in binary. Since the low-order two bits are '00', the first sample is homozygous for the first allele for this marker listed in the .bim file, which is 'G'. The second sample has genotype code '11', which means she's homozygous for the second allele ('A'). The third sample's code of '01' designates a missing genotype call, and the fourth code of '11' indicates another AA.
The 5th byte value of 0x0f is 00001111 in binary. This indicates that the fifth and sixth samples also have the AA genotype at snp1. There is no sample #7 or #8, so the high-order 4 bits of this byte are zero.
The 6th and 7th bytes store genotype data for the second marker ('snp2'). The 6th byte value of 0xe7 is 11100111 in binary. The '11' code for the first sample means that he's homozygous for the second snp2 allele ('2'), the '01' code for the second sample indicates a missing call, the '10' code for the third indicates a heterozygous genotype, and '11' for the fourth indicates another homozygous '2'. The 7th byte value of 0x0f indicates that the fifth and sixth samples also have homozygous '2' genotypes.
Finally, the 8th and 9th bytes store genotype data for the third marker ('snp3'). You can test your understanding of the file format by interpreting this by hand and then comparing to the .ped file above.
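To check a hand decoding, the walkthrough above can be automated. The sketch below hard-codes the nine example bytes and the allele pairs from the example .bim file; the function names are illustrative. Note that a .bed file does not record allele order within a heterozygous call, so this sketch always prints allele 1 first.

```python
# The nine bytes of the example .bed file.
BED = bytes([0x6c, 0x1b, 0x01, 0xdc, 0x0f, 0xe7, 0x0f, 0x6b, 0x01])
# (allele 1, allele 2) for snp1, snp2, snp3, taken from the example .bim file.
BIM_ALLELES = [("G", "A"), ("1", "2"), ("A", "C")]
N_SAMPLES = 6


def render_call(code, a1, a2):
    """Turn a two-bit code into a .ped-style genotype string."""
    if code == 0b00:
        return f"{a1} {a1}"
    if code == 0b01:
        return "0 0"          # missing call
    if code == 0b10:
        return f"{a1} {a2}"   # heterozygous; allele order is a display choice
    return f"{a2} {a2}"


def decode_example():
    """Return one list of genotype strings per variant, in .fam sample order."""
    if BED[:3] != b"\x6c\x1b\x01":
        raise ValueError("unexpected magic number")
    block_size = (N_SAMPLES + 3) // 4
    rows = []
    for v, (a1, a2) in enumerate(BIM_ALLELES):
        block = BED[3 + v * block_size: 3 + (v + 1) * block_size]
        rows.append([
            render_call((block[s // 4] >> (2 * (s % 4))) & 0b11, a1, a2)
            for s in range(N_SAMPLES)
        ])
    return rows


if __name__ == "__main__":
    for name, row in zip(["snp1", "snp2", "snp3"], decode_example()):
        print(name, row)
```

Each printed row can be compared against the corresponding genotype column of test.ped.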
.bim (PLINK extended MAP file)
Extended variant information file accompanying a .bed binary genotype table. (--make-just-bim can be used to update just this file.)

A text file with no header line, and one line per variant with the following six fields:

Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT'; '0' indicates unknown) or name
Variant identifier
Position in morgans or centimorgans (safe to use dummy value of '0')
Base-pair coordinate (1-based; limited to 2^31-2)
Allele 1 (corresponding to clear bits in .bed; usually minor)
Allele 2 (corresponding to set bits in .bed; usually major)
Allele codes can contain more than one character. Variants with negative bp coordinates are ignored by PLINK.

See the --keep-allele-order documentation for more discussion of why allele 1 is usually minor and 2 is usually major.
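A line of a .bim file can be parsed along these lines. This is a sketch, not PLINK's parser: it assumes fields are separated by arbitrary whitespace, and BimRecord and parse_bim_line are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    chrom: str        # kept as text, since codes such as 'X' or 'MT' are allowed
    variant_id: str
    cm: float         # position in morgans or centimorgans; often a dummy 0
    bp: int           # base-pair coordinate
    allele1: str      # corresponds to clear bits in the .bed file
    allele2: str      # corresponds to set bits in the .bed file


def parse_bim_line(line: str) -> BimRecord:
    """Parse one .bim line (assumption: whitespace-delimited, six fields)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    chrom, variant_id, cm, bp, a1, a2 = fields
    return BimRecord(chrom, variant_id, float(cm), int(bp), a1, a2)


def is_ignored(record: BimRecord) -> bool:
    """Variants with negative bp coordinates are ignored by PLINK."""
    return record.bp < 0
```

Allele fields are kept as strings because allele codes can contain more than one character.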

.fam (PLINK sample information file)
Sample information file accompanying a .bed binary genotype table. (--make-just-fam can be used to update just this file.) Also generated by "--recode lgen" and "--recode rlist".

A text file with no header line, and one line per sample with the following six fields:

Family ID ('FID')
Within-family ID ('IID'; cannot be '0')
Within-family ID of father ('0' if father isn't in dataset)
Within-family ID of mother ('0' if mother isn't in dataset)
Sex code ('1' = male, '2' = female, '0' = unknown)
Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)
With the use of additional loading flag(s), PLINK can also correctly interpret some .fam files missing one or more of these fields.

If there are any numeric phenotype values other than {-9, 0, 1, 2}, the phenotype is interpreted as a quantitative trait instead of case/control status. In this case, -9 normally still designates a missing phenotype; use --missing-phenotype if this is problematic.
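The phenotype rule above can be sketched as follows. This illustrates the stated rule only and is not PLINK's code: the function names are hypothetical, and treating non-numeric values as missing for quantitative traits as well is an assumption of this sketch.

```python
import math

CASE_CONTROL_CODES = {-9.0, 0.0, 1.0, 2.0}


def parse_number(text):
    """Return the value as a float, or None if it is non-numeric (or NaN)."""
    try:
        value = float(text)
    except ValueError:
        return None
    return None if math.isnan(value) else value


def interpret_phenotypes(column):
    """Classify a phenotype column; missing values are returned as None."""
    numbers = [parse_number(text) for text in column]
    quantitative = any(
        n is not None and n not in CASE_CONTROL_CODES for n in numbers
    )
    if quantitative:
        # -9 still designates missing; 0 is kept as an ordinary value.
        # Assumption: non-numeric entries are treated as missing here too.
        return "quantitative", [
            None if n is None or n == -9 else n for n in numbers
        ]
    labels = {1.0: "control", 2.0: "case"}
    # -9, 0 and non-numeric entries all map to None (missing).
    return "case/control", [labels.get(n) for n in numbers]
```

For the phenotype column of the example test.ped (0, 0, 2, 0, 2, 2), this yields a case/control trait with three missing values and three cases.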

Several PLINK commands (e.g. --cluster) merge the FID and IID with an underscore in their reports; for example, a sample with FID = 'Chang' and IID = 'Christopher' would be referenced as 'Chang_Christopher'. We preserve this behavior for backwards compatibility, so you should avoid using underscores in FIDs and IIDs (consider '~' instead).
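A short sketch of why underscores in IDs are risky: once FID and IID are joined with an underscore, two different samples can map to the same string. report_id is a hypothetical helper that mimics the merged form described above.

```python
def report_id(fid: str, iid: str) -> str:
    """Join FID and IID with an underscore, as some reports do."""
    return f"{fid}_{iid}"


# Two distinct (FID, IID) pairs that collide after merging.
first = report_id("A_B", "C")
second = report_id("A", "B_C")
collision = first == second   # both are "A_B_C"
```

With '~' in place of '_' inside the IDs (for example FID 'A~B'), the merged string can be split back unambiguously at its single underscore.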

If your case/control phenotype is encoded as '0' = control and '1' = case, you'll need to specify --1 to load it properly.

	 
BED file format
This page describes the format of binary PED (BED) files. Consider the following example PED file, test.ped:
     1 1 0 0 1  0    G G    2 2    C C
     1 2 0 0 1  0    A A    0 0    A C
     1 3 1 2 1  2    0 0    1 2    A C
     2 1 0 0 1  0    A A    2 2    0 0
     2 2 0 0 1  2    A A    2 2    0 0
     2 3 1 2 1  2    A A    2 2    A A
and corresponding MAP file test.map
     1 snp1 0 1
     1 snp2 0 2
     1 snp3 0 3
We create a binary fileset with the following command:
plink --file test --make-bed --out test
which produces output:

     @----------------------------------------------------------@
     |         PLINK!       |    v0.99l     |   27/Jul/2006     |
     |----------------------------------------------------------|
     |  (C) 2006 Shaun Purcell, GNU General Public License, v2  |
     |----------------------------------------------------------|
     |       http://pngu.mgh.harvard.edu/purcell/plink/         |
     @----------------------------------------------------------@

     Web-based version check ( --noweb to skip )
     Connecting to web...  OK, v0.99l is current

     *** Pre-Release Testing Version ***

     Writing this text to log file [ test.log ]
     Analysis started: Sat Jul 29 17:22:59 2006

     Options in effect:
             --file test
             --make-bed
             --out test

     3 (of 3) markers to be included from [ test.map ]
     6 individuals read from [ test.ped ]
     3 individuals with nonmissing phenotypes
     Assuming a binary trait (1=unaff, 2=aff, 0=miss)
     Missing phenotype value is also -9
     Before frequency and genotyping pruning, there are 3 SNPs
     Applying filters (SNP-major mode)
     4 founders and 2 non-founders found
     0 SNPs failed missingness test ( GENO > 1 )
     0 SNPs failed frequency test ( MAF < 0 )
     After frequency and genotyping pruning, there are 3 SNPs
     Writing pedigree information to [ test.fam ]
     Writing map (extended format) information to [ test.bim ]
     Writing genotype bitfile to [ test.bed ]
     Using (default) SNP-major mode
     Analysis finished: Sat Jul 29 17:37:57 2006
and generates files
     test.bed
     test.bim
     test.fam
The file test.bim is the extended map file, which also includes the names of the alleles: (chromosome, SNP, cM, base-position, allele 1, allele 2):
     1       snp1    0       1       G       A
     1       snp2    0       2       1       2
     1       snp3    0       3       A       C
The file test.fam is simply the first six columns of test.ped
     1 1 0 0 1 0
     1 2 0 0 1 0
     1 3 1 2 1 2
     2 1 0 0 1 0
     2 2 0 0 1 2
     2 3 1 2 1 2
We can inspect the binary BED file with the Unix xxd command:
xxd -b test.bed
which generates:
     0000000: 01101100 00011011 00000001 11011100 00001111 11100111  l.....
     0000006: 00001111 01101011 00000001                             .k.
The actual binary data are the nine 8-bit blocks (bytes) in the center. The first three bytes have a special meaning. The first two are fixed: a 'magic number' that enables PLINK to confirm that a BED file is really a BED file. That is, BED files should always start with 01101100 00011011. The third byte indicates whether the BED file is in SNP-major or individual-major mode: a value of 00000001 indicates SNP-major (i.e. list all individuals for the first SNP, all individuals for the second SNP, etc.), whereas a value of 00000000 indicates individual-major (i.e. list all SNPs for the first individual, all SNPs for the second individual, etc.). By default, all BED files are in SNP-major mode (as is the example below).
Here we have extracted and annotated the relevant part of the xxd output:
     |-magic number--| |-mode-| |--genotype data---------| 

     01101100 00011011 00000001 11011100 00001111 11100111

     |--genotype data-cont'd--| 

     00001111 01101011 00000001 
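The header check described above can be sketched in Python. The function name read_bed_mode is illustrative; the sketch only covers files that carry the two-byte magic number followed by the mode byte.

```python
MAGIC = bytes([0b01101100, 0b00011011])


def read_bed_mode(header: bytes) -> str:
    """Validate the first three bytes and report the storage mode."""
    if len(header) < 3:
        raise ValueError("file too short to contain a BED header")
    if header[:2] != MAGIC:
        raise ValueError("magic number not found; not a v1.00 BED file")
    if header[2] == 0b00000001:
        return "SNP-major"
    if header[2] == 0b00000000:
        return "individual-major"
    raise ValueError(f"unrecognized mode byte: {header[2]:#04x}")
```

In practice the three bytes would come from reading the start of the file in binary mode, for example open("test.bed", "rb").read(3).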

For the genotype data, each byte encodes up to four genotypes (2 bits per genotype). The coding is
     00  Homozygote "1"/"1"
     01  Heterozygote
     11  Homozygote "2"/"2"
     10  Missing genotype
The only slightly confusing wrinkle is that each byte is effectively read backwards. That is, if we label each of the 8 positions as A to H, we would label them backwards:
     01101100
     HGFEDCBA
and so the first four genotypes are read as follows:
     01101100
     HGFEDCBA

           AB   00  -- homozygote (first)
         CD     11  -- other homozygote (second)
       EF       01  -- heterozygote (third)
     GH         10  -- missing genotype (fourth)
Finally, when we reach the end of a SNP (or if in individual-mode, the end of an individual) we skip to the start of a new byte (i.e. skip any remaining bits in that byte).
It is important to remember that the files test.bim and test.fam will already have been read in, so PLINK knows how many SNPs and individuals to expect.
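Reading a byte "backwards" is the same as taking its bits starting from the low-order end. The sketch below reproduces the A-to-H labelling in Python and also shows the numeric value of each pair as it sits in the byte, which is the form used in the two-bit code table near the top of this page (01 = missing, 10 = heterozygous). All names are illustrative.

```python
from typing import List

# Coding in reading order (first-read bit written first), as listed above.
READING_ORDER_MEANING = {
    "00": 'homozygote "1"/"1"',
    "01": "heterozygote",
    "11": 'homozygote "2"/"2"',
    "10": "missing genotype",
}


def pairs_in_reading_order(byte: int) -> List[str]:
    """Return the pairs AB, CD, EF, GH for one byte."""
    bits_a_to_h = format(byte, "08b")[::-1]   # reverse so position A comes first
    return [bits_a_to_h[i:i + 2] for i in range(0, 8, 2)]


def low_order_code(pair: str) -> int:
    """Numeric value of the pair as stored in the byte.

    The second bit read is the more significant one, so the pair is
    reversed before conversion.
    """
    return int(pair[::-1], 2)
```

For the byte 01101100 this gives the pairs 00, 11, 01 and 10, matching the annotated example above; the heterozygote pair 01 corresponds to the stored value 0b10 and the missing pair 10 to 0b01.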
So, considering the full test.bed file: here we consider the six bytes that contain all the genotype data. We consider them one at a time, showing how the 4 genotypes are extracted from each byte to make up the entire dataset. Some positions are called null meaning that all the genotypes for that SNP have been read in, so we advance to the start of a new byte for the next SNP (when in SNP-major mode):

                Genotype    Person    SNP
     11011100 

           00   G/G         1 1       snp1
         11     A/A         1 2       snp1
       10       0/0         1 3       snp1
     11         A/A         2 1       snp1


     00001111 

           11   A/A         2 2       snp1
         11     A/A         2 3       snp1
       00       (null)
     00         (null)


     11100111
           
           11   2/2         1 1       snp2
         10     0/0         1 2       snp2
       01       1/2         1 3       snp2
     11         2/2         2 1       snp2


     00001111 
  
           11   2/2         2 2       snp2
         11     2/2         2 3       snp2
       00       (null) 
     00         (null)


     01101011

           11   C/C         1 1       snp3
         01     A/C         1 2       snp3
       01       A/C         1 3       snp3
     10         0/0         2 1       snp3


     00000001

           10   0/0         2 2       snp3
         00     A/A         2 3       snp3
       00       (null)
     00         (null)

In summary, the following define the BED file format
First two bytes 01101100 00011011 for PLINK v1.00 BED file
Third byte is 00000001 (SNP-major) or 00000000 (individual-major)
Genotype data, either in SNP-major or individual-major order
New "row" always starts a new byte
Each byte encodes up to 4 genotypes
10 indicates missing genotype, otherwise 0 and 1 point to allele 1 or allele 2 in the BIM file, respectively
Bits in each byte read in reverse order
Any changes to this format will be accompanied by a different, unique magic number and will be backwards compatible in PLINK
Earlier versions: v0.99 BED files do not contain the 2-byte magic number; BED files prior to 0.99 are always in individual-major mode and contain neither the magic number nor the SNP-major/individual-major identifier. PLINK will indicate if these earlier, legacy files are found.
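The summary rules can also be applied in the writing direction. The following Python sketch packs genotype codes into SNP-major BED bytes; it is an illustration under the rules listed above, not PLINK's writer, and the function names are hypothetical. Codes are given as the numeric value of each two-bit pair as stored in the byte (0 = homozygous allele 1, 1 = missing, 2 = heterozygous, 3 = homozygous allele 2).

```python
def encode_variant(codes):
    """Pack one SNP's genotype codes, four per byte, low-order bits first."""
    block = bytearray((len(codes) + 3) // 4)   # a new row always starts a new byte
    for i, code in enumerate(codes):
        block[i // 4] |= (code & 0b11) << (2 * (i % 4))
    return bytes(block)


def build_bed_bytes(variants):
    """Prefix the magic number and SNP-major mode byte, then append each row."""
    out = bytearray([0b01101100, 0b00011011, 0b00000001])
    for codes in variants:
        out += encode_variant(codes)
    return bytes(out)


# Genotype codes for snp1, snp2, snp3 of the example dataset.
EXAMPLE_CODES = [
    [0, 3, 1, 3, 3, 3],
    [3, 1, 2, 3, 3, 3],
    [3, 2, 2, 1, 1, 0],
]
```

Calling build_bed_bytes(EXAMPLE_CODES) reproduces the nine bytes shown in the xxd output for test.bed.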
