ProtDCal is a user-friendly software package for generating a variety of numeric descriptors for protein structures and sequences. This manual provides an overview of the main interfaces and functionalities of the program. The current distribution of ProtDCal also includes a similar tutorial and a theory section describing the formalism and parameters of the indices implemented in the program. ProtDCal's feature generation strategy comprises four hierarchical levels:
1. An initial layer intended to select the type of indices to encode for each residue. These indices are grouped in three main classes:
2. A second layer of modification operators, which modify the value of a selected index for a given residue according to the residues within a vicinity defined by the type of modification operator and its parameter value (e.g. for the autocorrelation operator with parameter k = 2, the neighbourhood of residue i comprises the residues at positions i ± 2). ProtDCal implements five modification operators, which can be selected in the menu "Options/Weighting operators".
3. A third layer named "Groups" is intended to select one or more groups of residues according to their ID or type. When a group of residues is selected, an array of index values is obtained, one for each residue in the group. In addition to the built-in grouping approaches, users can define their own groups of residues (see the Groups option in the Options menu).
4. A fourth layer comprises several aggregation operators that are used to combine an array of values (from a group of residues) into a single value (descriptor) reflecting the distribution of the index within that group. Examples of these aggregation operators are the sum, average, variance, kurtosis, geometric mean and information content.
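The four levels above can be sketched in a few lines of Python. This is only an illustration of the layering, not ProtDCal's implementation: the index values in TOY_INDEX are made up, the form of the modification step (each value multiplied by the sum of its lag-k neighbours) is an assumption, the use of the population variance is an assumption, and all function names are hypothetical.

```python
from statistics import mean, pvariance

# Level 1: toy per-residue index values (made up, not ProtDCal's tables).
TOY_INDEX = {"A": 1.0, "G": 0.0, "K": -2.0, "L": 3.0, "D": -3.0}


def encode(sequence):
    """Map each residue of the sequence to its index value."""
    return [TOY_INDEX[res] for res in sequence]


def lag_neighbours(i, k, n):
    """Positions i - k and i + k that fall inside a chain of length n."""
    return [j for j in (i - k, i + k) if 0 <= j < n]


def modify(values, k):
    """Level 2 (assumed form): scale each value by the sum of its lag-k neighbours."""
    n = len(values)
    return [
        values[i] * sum(values[j] for j in lag_neighbours(i, k, n))
        for i in range(n)
    ]


def select_group(sequence, values, residue_types):
    """Level 3: keep the values of residues whose type belongs to the group."""
    return [v for res, v in zip(sequence, values) if res in residue_types]


# Level 4: aggregation operators that reduce a group's values to one number.
AGGREGATORS = {"sum": sum, "mean": mean, "variance": pvariance}


def describe(sequence, k, residue_types):
    """Run the four levels in order and return one descriptor per aggregator."""
    values = modify(encode(sequence), k)
    group = select_group(sequence, values, residue_types)
    return {name: fn(group) for name, fn in AGGREGATORS.items()}


descriptors = describe("ALKGDLA", k=2, residue_types={"A", "L"})
```

Each descriptor is thus identified by one choice at every level (index, modification operator and parameter, group, aggregation operator), which is why the number of descriptors grows with the product of the selections made at each level.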
The output of the calculation contains the full combination of indices, groups and aggregation operators selected in each panel. ProtDCal accepts input files in PDB or FASTA format. For PDB files, all indices can be computed, whereas for FASTA files only the indices of the second (Thermodynamics indices for sequences) and fourth (Properties-based indices) panels can be evaluated. Multiple proteins may be input simultaneously. A ProtDCal calculation writes two tab-delimited text files: [name]_AA.txt, which stores all the descriptors for each residue of each protein, and [name]_Prot.txt, which stores the descriptors for the combinations of indices, groups and aggregation operators for each protein.
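Because the output files are tab-delimited text, they can be loaded with standard tools. The following is a minimal Python sketch using only the standard library. It assumes that the first row of the file is a header line, which the text above does not confirm; the function name read_descriptor_table, the column names and the sample values are all hypothetical.

```python
import csv
import io


def read_descriptor_table(handle):
    """Parse a tab-delimited table into a list of {column: value} rows.

    Assumption: the first row is a header line. Check an actual
    ProtDCal output file before relying on this layout.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical contents standing in for a [name]_Prot.txt file.
sample = io.StringIO(
    "Protein\tdescriptor_1\tdescriptor_2\n"
    "protA\t0.5\t1.25\n"
    "protB\t0.75\t2.0\n"
)
rows = read_descriptor_table(sample)

# Values are read as strings, so convert them before numerical use.
values = {row["Protein"]: float(row["descriptor_1"]) for row in rows}
```

For a file on disk, the same function can be given an open file handle, for example one obtained with open(path, newline="").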