Overview.

RedLevel stands for redundancy level. Each GenBank and RefSeq assembly has a unique identifier, but the same assembly can be found in GenBank and/or RefSeq. An assembly can also have multiple versions (with minor changes to its annotation, for example). Several levels are therefore defined here to handle the redundancy introduced by assembly versioning and by the GenBank/RefSeq database pair.


There are 4 levels of redundancy:


  • Organism: we consider only one assembly at the strain level (e.g. if there are 13 ccyA+ assemblies for Microcystis aeruginosa, only one is considered).
  • UID: we consider only one assembly, which may be present in GenBank and/or RefSeq in different versions.
  • UIDV: we consider only one assembly version, which may be present in GenBank and/or RefSeq.
  • Accession: we consider all versions of all assemblies present in GenBank and RefSeq.
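The four levels can be illustrated with a minimal Python sketch. It assumes that assembly accessions follow the usual NCBI pattern (GCA_ for GenBank, GCF_ for RefSeq, followed by a numeric identifier and a version), that the UID corresponds to the numeric identifier shared by both databases, and that the UIDV is that identifier plus the version. The function names, the accessions and the strain names are hypothetical, and this is not necessarily how pcalf computes the levels.

```python
import re
from collections import defaultdict

# Assumed accession layout: GCA_/GCF_ prefix, numeric identifier, dot, version.
ACCESSION_RE = re.compile(r"^GC[AF]_(\d+)\.(\d+)$")


def redundancy_keys(accession, organism):
    """Return the key that identifies an entry at each redundancy level."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    uid, version = match.groups()
    return {
        "Organism": organism,          # one entry per strain
        "UID": uid,                    # database prefix and version ignored
        "UIDV": f"{uid}.{version}",    # database prefix ignored
        "Accession": accession,        # everything kept
    }


def count_by_level(entries):
    """Count distinct entries at each level for (accession, organism) pairs."""
    seen = defaultdict(set)
    for accession, organism in entries:
        for level, key in redundancy_keys(accession, organism).items():
            seen[level].add(key)
    return {level: len(keys) for level, keys in seen.items()}


# Hypothetical entries: one assembly in both databases with two versions,
# plus a second assembly from another strain.
entries = [
    ("GCA_000000001.1", "strain A"),
    ("GCF_000000001.1", "strain A"),
    ("GCA_000000001.2", "strain A"),
    ("GCF_000000002.1", "strain B"),
]

print(count_by_level(entries))
```

With these four hypothetical accessions, the count goes from 4 entries at the Accession level down to 2 at the UID and Organism levels.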

This section shows the percentage of genome entries with or without ccyA at each level of redundancy.

{{ metrics_fig }}

Graphical overview.

The chart below shows how the number of genomes at each level of redundancy has increased over time.

{{ genome_over_time }}

The line plot below shows the total number of calcyanin sequences over time at the highest level of redundancy. It may therefore include duplicated sequences arising from the GenBank/RefSeq overlap and from assembly versioning.

{{ sequence_over_time }}
Sunburst
Treemap

The sunburst and treemap charts display the same data: the number of sequences per category, arranged hierarchically from the N-ter type down to the date of analysis. Click on an area to see the number of sequences in each of its sub-categories.

{{ sunburst }}

Calcyanin classification.

The decision tree below is used to classify sequences with a significant match against the GlyX3 HMM profile. Red and green edges indicate negative and positive answers, respectively. In short, for each such sequence, we check the presence and order of each Glycine Zipper along the sequence, and we use a set of known N-ter to infer the nature of its N-terminal extremity. Finally, a label is assigned to each sequence according to its modular organization.

{{ decision_tree }}

Modular Organization.

Note that clicking on a sequence redirects you to the Browse data section, which shows detailed information about the selected sequence.

This section is dedicated to the modular organization of calcyanins. Sequences are grouped by N-ter type, whatever their flag (see the Calcyanin classification section).
This makes it possible to visualize the length of the sequences and the position of their domains.

CoBaHMA-type
X-type
Z-type
Y-type
Unknown-type
{{ cobahma_oms }}

Browse data

The input field above the list can be used to filter entries on their major attributes.
Clicking on an entry gives you access to the protein sequence(s) attached to it (if any). Related information about the assembly and/or the sequence is shown at the end of the section.
Additionally, you can use the green icon to the right of a ccyA+ entry to add it to the cart.
The MULTIPLE flag indicates that multiple sequences from this genome had a hit against the GlyX3 HMM profile.

The Browse data section contains all the data about the genomes processed by pcalf-annotate-workflow, from genomes to calcyanin features. You can filter the data by organism name, accession, sequence accession, flag or N-ter type using the search bars below. Keyword order does not matter. Clicking on an entry produces a detailed view that includes the genome metadata and, if any, the sequence information.
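As an illustration of order-independent keyword filtering, here is a minimal Python sketch. It is not the code behind the search bars: the field names, the `matches` and `filter_entries` helpers and the example entries are all hypothetical, and the sketch simply assumes that an entry is kept when every keyword occurs, case-insensitively, in at least one of its searchable attributes.

```python
# Hypothetical names for the searchable attributes listed above.
FIELDS = ("organism", "accession", "sequence_accession", "flag", "nter")


def matches(entry, query):
    """Return True when every keyword of the query occurs in some field."""
    haystack = " ".join(str(entry.get(field, "")) for field in FIELDS).lower()
    # all() over the keywords makes the result independent of their order.
    return all(keyword in haystack for keyword in query.lower().split())


def filter_entries(entries, query):
    """Keep only the entries that match the whole query."""
    return [entry for entry in entries if matches(entry, query)]


# Hypothetical entries, for illustration only.
entries = [
    {
        "organism": "Microcystis aeruginosa",
        "accession": "GCA_000000001.1",
        "sequence_accession": "SEQ_0001",
        "flag": "Calcyanin with known N-ter",
        "nter": "Y-type",
    },
    {
        "organism": "Synechococcus sp.",
        "accession": "GCF_000000002.1",
        "sequence_accession": "",
        "flag": "",
        "nter": "",
    },
]

first = filter_entries(entries, "y-type microcystis")
second = filter_entries(entries, "microcystis y-type")
print(first == second)
```

Both queries select the same single entry, because each keyword is tested on its own against the concatenated attributes.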

About.

The picture below describes the workflow used by pcalf to retrieve calcyanins from a set of protein sequences or directly from genomes.

{{ workflow }}
Download sequences or features in FASTA format, either in batch or from a selection. You can select items in the Browse data section, then download a specific feature, gene or protein for those sequences. Alternatively, if you want all ccyA genes for a specific type of calcyanin (e.g. Calcyanin with known N-ter of type Y-type), use the batch download part.
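For readers who post-process the downloaded data themselves, the batch selection can be approximated with a short Python sketch. It assumes records held as dictionaries with hypothetical `id`, `nter` and `seq` keys. The `to_fasta` helper is illustrative only and is not part of pcalf.

```python
def to_fasta(records, nter_type=None, width=60):
    """Format records as FASTA text, optionally keeping one N-ter type."""
    lines = []
    for record in records:
        if nter_type is not None and record["nter"] != nter_type:
            continue  # skip records of another N-ter type
        lines.append(f">{record['id']} {record['nter']}")
        sequence = record["seq"]
        # Wrap the sequence at a fixed width, as is common in FASTA files.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


# Hypothetical records, for illustration only.
records = [
    {"id": "seq1", "nter": "Y-type", "seq": "ATGAAA"},
    {"id": "seq2", "nter": "X-type", "seq": "ATGCCC"},
]

print(to_fasta(records, nter_type="Y-type"), end="")
```

Calling `to_fasta(records)` without an N-ter type keeps every record, which corresponds to downloading the whole selection.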

Cart

Download fasta

Batch download

Download fasta