Tutorial
Sampling ARGs
As a simple example, we will first simulate sample data with msprime. We will then run arginfer on the simulated dataset.
The following code simulates a tree sequence, together with the sequence data, for a sample of 10 sequences, each 1e5 bases long.
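import msprime
import os

# Simulate 10 samples and record the full ARG.
ts_full = msprime.simulate(
    sample_size=10,
    Ne=5000,
    length=1e5,
    mutation_rate=1e-8,
    recombination_rate=1e-8,
    record_full_arg=True,
    random_seed=2,
)
# exist_ok=True avoids an error if the directory already exists.
out_dir = os.path.join(os.getcwd(), "out")
os.makedirs(out_dir, exist_ok=True)
ts_full.dump(os.path.join(out_dir, "ts_full.args"))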
This code writes the simulated tree sequence to the "out/" directory under the name ts_full.args.
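For a rough sense of how much variation such a simulation produces, the parameters above can be converted into population-scaled rates. The sketch below is a back-of-the-envelope calculation, not part of arginfer: it assumes the standard neutral coalescent with a diploid effective population size and infinite-sites mutations, and the variable names are chosen here for illustration.

```python
# Population-scaled rates implied by the simulation parameters above.
# Assumption: standard neutral coalescent, diploid Ne, infinite-sites mutations.
sample_size = 10
Ne = 5000
seq_length = 1e5
mutation_rate = 1e-8
recombination_rate = 1e-8

# Scaled mutation and recombination rates for the whole sequence.
theta = 4 * Ne * mutation_rate * seq_length
rho = 4 * Ne * recombination_rate * seq_length

# Watterson's expectation: E[S] = theta * sum_{i=1}^{n-1} 1/i.
harmonic = sum(1.0 / i for i in range(1, sample_size))
expected_segregating_sites = theta * harmonic

print(f"theta = {theta:.1f}, rho = {rho:.1f}")
print(f"expected segregating sites = {expected_segregating_sites:.1f}")
```

Under these assumptions both scaled rates equal 20 and the expected number of segregating sites is about 57. A single simulated replicate will deviate from this expectation.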
Next, the following command runs 200 MCMC iterations with a burn-in of 5, retaining every 10th sample (thinning interval = 10). Here sample_size = n = 10 is the number of sequences, each of length seq_length = L = 1e5, evolving in a population of effective size Ne = 5000, with mutation rate 1e-8 mutations/generation/site and recombination rate 1e-8 recombinations/generation/site.
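To make the interaction of the iteration count, burn-in, and thinning interval concrete, here is a minimal sketch of one common bookkeeping scheme. It assumes that the first `burn` iterations are discarded and that every `thin`-th iteration after that is kept. This is an assumption for illustration only; arginfer's internal bookkeeping may differ, and `retained_iterations` is a hypothetical helper, not an arginfer function.

```python
def retained_iterations(iteration, burn, thin):
    """Return the 1-based iteration indices kept under a simple scheme:
    discard the first `burn` iterations, then keep every `thin`-th one.
    (Assumed scheme for illustration; not taken from arginfer.)"""
    return [i for i in range(burn + 1, iteration + 1)
            if (i - burn) % thin == 0]


kept = retained_iterations(iteration=200, burn=5, thin=10)
print(len(kept), kept[:3], kept[-1])
```

Under this assumed scheme, the settings used here keep 19 samples, the first at iteration 15 and the last at iteration 195.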
import os
import arginfer

arginfer.infer_sim(
    ts_full="out/ts_full.args",         # path to simulated ts
    sample_size=10,                     # sample size
    iteration=200,                      # number of mcmc iterations
    thin=10,                            # thinning interval, retaining every kth sample
    burn=5,                             # burn-in period to discard
    Ne=5000,                            # effective population size
    seq_length=1e5,                     # sequence length in bases
    mutation_rate=1e-8,                 # mutation rate per site per generation
    recombination_rate=1e-8,            # recombination rate per site per generation
    outpath=os.getcwd() + "/output",    # output path
    plot=True)                          # plot traces
Or, equivalently, in the terminal:
arginfer infer --tsfull "out/ts_full.args" \
-I 200 --thin 10 -b 5 \
-n 10 -L 1e5 --Ne 5000 \
-r 1e-8 -mu 1e-8 \
-O output \
--plot
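When several runs with different settings are scripted, it can help to assemble this command line programmatically. The sketch below only builds the argument list and prints it; it uses exactly the flags shown in the command above, while the helper `build_arginfer_command` and its parameter names are hypothetical and not part of arginfer. Numeric values are passed as strings so that they appear on the command line exactly as typed.

```python
import shlex


def build_arginfer_command(ts_path, iterations, thin, burn, sample_size,
                           seq_length, ne, recombination_rate, mutation_rate,
                           outpath, plot=True):
    """Assemble the `arginfer infer` argument list (hypothetical helper).
    Only flags that appear in the tutorial command are used."""
    cmd = [
        "arginfer", "infer",
        "--tsfull", ts_path,
        "-I", iterations, "--thin", thin, "-b", burn,
        "-n", sample_size, "-L", seq_length, "--Ne", ne,
        "-r", recombination_rate, "-mu", mutation_rate,
        "-O", outpath,
    ]
    if plot:
        cmd.append("--plot")
    return cmd


cmd = build_arginfer_command(
    ts_path="out/ts_full.args", iterations="200", thin="10", burn="5",
    sample_size="10", seq_length="1e5", ne="5000",
    recombination_rate="1e-8", mutation_rate="1e-8", outpath="output")
command_line = shlex.join(cmd)  # requires Python 3.8+
print(command_line)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to launch the run, provided the `arginfer` executable is on the PATH.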
Running arginfer as above produces the following output files:
summary.h5: A summary of some ARG properties, recorded in a pandas DataFrame with columns:
pd.DataFrame(columns=('likelihood', 'prior', 'posterior',
                      'ancestral recomb', 'non ancestral recomb',
                      'branch length'))
.arg file: The sampled ARGs, which are pickled ATS objects. See here for more information on how to manipulate these files (TODO).
arginfer*.pdf: If plot=True, this PDF file is generated. It contains trace plots for the log(posterior), ARG total branch length, number of ancestral recombinations, and number of non-ancestral recombinations.