Gene sharing: Unrelated samples
Description
Prerequisites
Dialog
Practical tips
Algorithm
Description
Identifies and statistically evaluates genes in which all affected samples have variants compatible with the disease model. Unaffected samples are used as an additional filter.
Prerequisites
The analysis can be applied to any variant files loadable in Filtus, as long as the genotype format is correctly given in the input settings.
In particular, samples from different variant files can be analyzed together, even if the files have different formats.
For recessive analyses, all involved samples must have a well-defined gene column (specified in the input settings).
Dialog

Model:
The genetic disease model: Dominant, Recessive c/h (which covers both compound heterozygous and homozygous models), or Recessive homoz (homozygous only).
Affected:
The affected samples, separated by commas. Samples can be specified in several ways:
- Sample numbers, corresponding to the order in the Loaded samples window. Example: 2,3,4,10,11,15.
- Number ranges. The previous example could be written 2-4, 10-11, 15.
- Text identifiers. Write ID <s1>, ... where <s1>, ... are uniquely identifying parts of the wanted sample names. For example, to specify the samples bigprojectA001 and bigprojectA002, one could use ID A001, A002, or perhaps just ID 1, 2. If a string matches more than one sample name, you will get a warning.
- Negative selection. If the entry starts with NOT, all loaded samples are chosen except those specified. This works with both numbers and ID identifiers. For example, NOT 1-10 selects all samples except the first 10, while NOT ID A001, A002 selects all samples except the two matching A001 and A002, respectively.
Healthy:
The unaffected/control samples. Notational options are as for the Affected field.
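The selection notation used by the Affected and Healthy fields can be illustrated with a small Python sketch. This is not Filtus's own code: the function name parse_selection and its arguments are hypothetical, and details such as how whitespace is handled are assumptions made for the example.

```python
import warnings


def parse_selection(entry, sample_names):
    """Return the sorted 1-based sample numbers selected by `entry`.

    Hypothetical helper illustrating the notation described above;
    not taken from the Filtus source.
    """
    text = entry.strip()
    negate = text.startswith("NOT ")
    if negate:
        text = text[4:].strip()

    all_numbers = range(1, len(sample_names) + 1)
    chosen = set()

    if text.startswith("ID "):
        # Text identifiers: each key should match part of one sample name.
        for key in (k.strip() for k in text[3:].split(",")):
            if not key:
                continue
            hits = [i for i in all_numbers if key in sample_names[i - 1]]
            if len(hits) > 1:
                warnings.warn(f"'{key}' matches more than one sample name")
            chosen.update(hits)
    else:
        # Sample numbers and number ranges such as "2-4".
        for part in (p.strip() for p in text.split(",")):
            if not part:
                continue
            if "-" in part:
                lo, hi = (int(x) for x in part.split("-"))
                chosen.update(range(lo, hi + 1))
            else:
                chosen.add(int(part))

    if negate:
        # Negative selection: everything except the specified samples.
        chosen = set(all_numbers) - chosen
    return sorted(chosen)


# Fifteen made-up sample names: bigprojectA001 ... bigprojectA015
names = [f"bigprojectA{i:03d}" for i in range(1, 16)]
print(parse_selection("2-4, 10-11, 15", names))  # [2, 3, 4, 10, 11, 15]
print(parse_selection("NOT 1-10", names))        # [11, 12, 13, 14, 15]
print(parse_selection("ID A001, A002", names))   # [1, 2]
```

With these fifteen names, the shorter form ID 1, 2 would trigger the warning, because 1 also occurs in bigprojectA010 and several other names.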
Practical tips
Tip 1:
The identified genes are displayed in the main window, sorted by p-value. You can sort by any column by right-clicking the column header.
Tip 2:
Right-click a gene to view the relevant variants in that gene. You can also view the variants in all identified genes simultaneously, or only in the genes at the top of the list.
Algorithm
The p-values are computed according to the statistical model described in "Statistical guidance for experimental design and data analysis of mutation detection in rare monogenic Mendelian diseases by exome sequencing" by Zhi and Chen (2012).
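The gene identification step that precedes the p-value computation can be sketched in Python as follows. This is a simplified illustration, not the Filtus implementation, and it does not compute the Zhi and Chen p-values. The function names and the (gene, genotype) representation are hypothetical. Two assumptions are made: under the Dominant model any variant in a gene counts, and a gene is dropped if a healthy sample is itself compatible with the model in that gene. The exact rule Filtus applies to healthy samples is not stated here.

```python
from collections import defaultdict


def compatible_genes(variants, model):
    """Genes in which one sample's variants fit the disease model.

    `variants` is a list of (gene, genotype) pairs, where genotype is
    "het" or "hom" (a hypothetical, simplified representation).
    """
    het = defaultdict(int)
    hom = defaultdict(int)
    for gene, genotype in variants:
        if genotype == "hom":
            hom[gene] += 1
        else:
            het[gene] += 1
    genes = set(het) | set(hom)

    if model == "Dominant":
        # Assumption: any variant in the gene is enough.
        return genes
    if model == "Recessive homoz":
        return {g for g in genes if hom[g] >= 1}
    if model == "Recessive c/h":
        # One homozygous variant, or two heterozygous variants
        # (a possible compound heterozygote; phase is not checked).
        return {g for g in genes if hom[g] >= 1 or het[g] >= 2}
    raise ValueError(f"unknown model: {model}")


def shared_genes(affected, healthy, model):
    """Genes compatible with the model in every affected sample.

    Assumption: a gene is removed if any healthy sample is also
    compatible with the model in that gene.
    """
    per_sample = [compatible_genes(v, model) for v in affected]
    shared = set.intersection(*per_sample) if per_sample else set()
    for v in healthy:
        shared -= compatible_genes(v, model)
    return shared


# Made-up example data: two affected samples and one healthy sample.
affected = [
    [("GENE1", "het"), ("GENE1", "het"), ("GENE2", "hom")],
    [("GENE1", "hom"), ("GENE2", "het")],
]
healthy = [[("GENE2", "het")]]
print(shared_genes(affected, healthy, "Recessive c/h"))  # {'GENE1'}
```

In this made-up example, GENE1 is shared under the Recessive c/h model because the first sample has two heterozygous variants in it and the second has a homozygous one. GENE2 fails because the second sample has only a single heterozygous variant there.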