Detection of de novo variants


Description

Identification of de novo variants in a trio (a child and both parents). Posterior de novo probabilities are computed using a Bayesian approach.

Prerequisites

To apply the de novo algorithm in FILTUS, the following requirements should be met:

NOTE: If your variant files don't meet the above requirements, there may still be hope! In many cases you can identify potential de novo variants using the family based filter with a dominant model (see the documentation of the family based filter for details). For instance, if all you have are individual variant files for a child and both parents, this is the approach to use. The downside is that you don't get posterior probabilities, or any other measure of classification strength, and you will probably get considerably more false positives.
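For a trio in which only the child is affected, the dominant-model workaround roughly amounts to keeping the variants called in the child but in neither parent. The sketch below is a simplified illustration of that idea, not FILTUS's family based filter. It ignores zygosity and coverage, and both the function name `candidate_denovo` and the (chrom, pos, ref, alt) variant keys are hypothetical.

```python
def candidate_denovo(child, father, mother):
    """Variants called in the child but in neither parent (hypothetical helper).

    Each argument is an iterable of (chrom, pos, ref, alt) tuples.
    """
    parental = set(father) | set(mother)
    return sorted(set(child) - parental)


child = [("1", 100, "A", "G"), ("2", 200, "C", "T")]
father = [("1", 100, "A", "G")]
mother = []
print(candidate_denovo(child, father, mother))
```

A parental site with too little coverage to produce a call looks "absent" here, which is one reason this approach yields more false positives.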

Dialog

De novo dialog

To open the de novo dialog, choose De novo variant detection in the Analysis menu. Any filters (e.g. PASS) should be applied before opening the dialog. The dialog fields are as follows:

Trio samples
Indicate the sample numbers (corresponding to the sample order in the Loaded samples window in the main Filtus area). Alternatively, you can use the syntax ID <string>, where <string> is a unique identifier for the sample name (as given in the variant file). For example, if the sample name of the child is "Trio1_child", you can write ID child. If the string is not unique (i.e. if several loaded sample names contain "child"), you will be warned.
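The sample lookup can be sketched as follows. This is an assumed reading of the rules above, not FILTUS code: `resolve_sample` is a hypothetical name, sample numbers are assumed to be 1-based, the `ID` match is assumed to be a case-sensitive substring match, and where FILTUS shows a warning this sketch simply raises `ValueError`.

```python
def resolve_sample(spec, sample_names):
    """Map a trio field entry to a 1-based sample number (hypothetical helper).

    `spec` is either a sample number such as "3", or "ID <string>" where
    <string> must occur in exactly one loaded sample name.
    """
    spec = spec.strip()
    if spec.startswith("ID "):
        key = spec[3:].strip()
        hits = [i for i, name in enumerate(sample_names, start=1) if key in name]
        if len(hits) != 1:
            # FILTUS warns in this situation; the sketch raises instead.
            raise ValueError(f"'{key}' matches {len(hits)} samples; it must be unique")
        return hits[0]
    number = int(spec)
    if not 1 <= number <= len(sample_names):
        raise ValueError(f"sample number {number} is out of range")
    return number


names = ["Trio1_father", "Trio1_mother", "Trio1_child"]
print(resolve_sample("ID child", names), resolve_sample("2", names))
```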

Mutation rate
This is used in the algorithm as the prior probability of a mutation at a given position in a single meiosis. Default value: 1e-8. The algorithm is usually not very sensitive to this parameter, but the posterior probabilities will be affected if you change it radically.
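The exact model behind the FILTUS posterior is not given on this page. As a generic illustration of how a per-meiosis mutation rate can act as a prior, the sketch below computes the posterior probability that at least one of the two parent-to-child transmissions carried a new mutation. The assumptions are ours, not FILTUS's: a biallelic site, genotype likelihoods supplied as VCF `PL` values in the order 0/0, 0/1, 1/1, Hardy-Weinberg priors for the parental genotypes based on the ALT frequency (see Allele frequencies below), and a mutation that simply swaps REF and ALT. All function names are hypothetical.

```python
from itertools import product


def pl_to_likelihoods(pl):
    """Phred-scaled genotype likelihoods (order 0/0, 0/1, 1/1) -> relative likelihoods."""
    return [10 ** (-x / 10) for x in pl]


def hwe_prior(p):
    """Hardy-Weinberg prior for a parent carrying 0, 1 or 2 ALT alleles."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def transmit_prob(g, allele):
    """Probability that a parent with g ALT alleles passes on `allele` (1 = ALT)."""
    return g / 2 if allele else 1 - g / 2


def denovo_posterior(lik_father, lik_mother, lik_child, alt_freq, mu=1e-8):
    """Posterior probability that at least one transmission carried a new mutation."""
    prior = hwe_prior(alt_freq)
    total = denovo = 0.0
    # Enumerate parental genotypes, transmitted alleles and mutation flags.
    flags = (0, 1)
    for gf, gm, af, am, mf, mm in product(range(3), range(3), flags, flags, flags, flags):
        # A mutation (probability mu per meiosis) swaps REF <-> ALT.
        p_mut = (mu if mf else 1 - mu) * (mu if mm else 1 - mu)
        child_g = (af ^ mf) + (am ^ mm)  # ALT alleles the child ends up with
        w = (prior[gf] * lik_father[gf] * transmit_prob(gf, af)
             * prior[gm] * lik_mother[gm] * transmit_prob(gm, am)
             * p_mut * lik_child[child_g])
        total += w
        if mf or mm:
            denovo += w
    return denovo / total


# Usage: parents confidently 0/0, child confidently 0/1.
parents = pl_to_likelihoods([0, 100, 200])
child = pl_to_likelihoods([200, 0, 200])
print(denovo_posterior(parents, parents, child, alt_freq=0.001))
```

In this toy model the posterior odds of a de novo event scale linearly with `mu`. When the parental data strongly favour 0/0, the posterior can therefore stay close to 1 under moderate changes of `mu`, while a change of several orders of magnitude moves it noticeably.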

Allele frequencies
Indicate a column containing frequencies for the ALT alleles. The Missing entry value is substituted whenever the column does not contain a number. If your variant file does not have frequency data, the Missing entry value will be used for all variants. Without correct frequencies, the program will still identify the same potential de novo variants, but the posterior probabilities may be less accurate.

NB: For multiallelic variants, the frequency column is ignored. As a result, any multiallelic variants detected by the de novo algorithm get a missing posterior probability ('-'). Since the output is sorted by posterior probability, these end up at the bottom of the list.
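A minimal sketch of the rules in this entry: substituting the Missing entry value for non-numeric frequencies, recognising multiallelic sites, and sorting so that missing posteriors ('-') end up last. The function names and the dict-based variant rows are hypothetical; only the comma-separated ALT field follows the VCF convention.

```python
import math

MISSING_POSTERIOR = "-"


def parse_frequency(value, missing):
    """ALT frequency as a float; `missing` whenever the entry is not a number."""
    try:
        freq = float(value)
    except (TypeError, ValueError):
        return missing
    # float() also accepts "nan" and "inf"; treat those as missing too.
    return freq if math.isfinite(freq) else missing


def is_multiallelic(alt):
    """In VCF, multiple ALT alleles are comma-separated (e.g. 'G,T')."""
    return "," in alt


def sort_by_posterior(variants):
    """Highest posterior first; variants with a missing posterior ('-') last."""
    def key(v):
        p = v["posterior"]
        return (1, 0.0) if p == MISSING_POSTERIOR else (0, -p)
    return sorted(variants, key=key)


rows = [
    {"id": "a", "posterior": MISSING_POSTERIOR},  # e.g. a multiallelic site
    {"id": "b", "posterior": 0.4},
    {"id": "c", "posterior": 0.99},
]
print([r["id"] for r in sort_by_posterior(rows)])
```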

Output filters
The purpose of these filters is to reduce the number of false positives in the output. A true heterozygous de novo variant is expected to show an ALT/REF ratio (the fraction of reads supporting the ALT allele) close to 50% in the child and 0% in the parents. In practice some slack is recommended, e.g. child > 30% and parents < 5%.

One may experiment with these filters for other purposes too. For example, to look for de novo mosaic variants in the child, one could try a very loose cutoff for the child, e.g. child > 10%, while requiring parents = 0%. Conversely, for variants inherited from a mosaic parent, something like parents < 25% and child > 40% might be sensible without including too much garbage. Of course, these are merely suggestions whose validity depends heavily on the actual contents of the variant file (e.g. the quality of the variants and the parameters of the variant calling).
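The output filters can be sketched as threshold tests on ALT read fractions. This assumes per-sample ALT and total read counts are available (for example from a VCF `AD` field). `alt_fraction`, `passes` and `PRESETS` are hypothetical names, and the presets are just the example cutoffs from the text above.

```python
def alt_fraction(alt_reads, total_reads):
    """Fraction of reads supporting ALT; 0.0 when there is no coverage.

    Note: with this choice an uncovered parent passes the parent test;
    a stricter pipeline might reject such sites instead.
    """
    return alt_reads / total_reads if total_reads else 0.0


# Example cutoffs from the text: (child must exceed, parents must stay below).
# A parent cutoff of 0.0 is read as "parents = 0%".
PRESETS = {
    "heterozygous de novo": (0.30, 0.05),
    "mosaic in child": (0.10, 0.0),
    "mosaic parent": (0.40, 0.25),
}


def passes(child_frac, parent_fracs, child_min, parent_max):
    """True if the child exceeds child_min and every parent meets the parent cutoff."""
    if parent_max == 0.0:
        parents_ok = all(f == 0.0 for f in parent_fracs)
    else:
        parents_ok = all(f < parent_max for f in parent_fracs)
    return child_frac > child_min and parents_ok


child = alt_fraction(18, 40)   # 0.45
father = alt_fraction(0, 35)   # 0.0
mother = alt_fraction(1, 30)   # about 0.033
for name, cutoffs in PRESETS.items():
    print(name, passes(child, [father, mother], *cutoffs))
```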

Summary
A summary of the findings is printed here. The identified variants are shown in the main Filtus window; to inspect them you must close the de novo dialog.

Practical tips

Tip 1: To save the results, first close the de novo dialog, and then select Save main window content in the File menu.

Tip 2: When browsing variants in the main Filtus window, you can right-click on any variant to see details about that variant for all the samples.

Algorithm

TODO