Cell Maps PPI Downloader

Description: {DESCRIPTION}

Version: {VERSION}

Usage

cellmaps_ppidownloadercmd.py [-h] [--cm4ai_table CM4AI_TABLE] [--edgelist EDGELIST] [--edgelist_geneid_one_col EDGELIST_GENEID_ONE_COL]
                                [--edgelist_symbol_one_col EDGELIST_SYMBOL_ONE_COL] [--edgelist_geneid_two_col EDGELIST_GENEID_TWO_COL] [--edgelist_symbol_two_col EDGELIST_SYMBOL_TWO_COL]
                                [--baitlist BAITLIST] [--baitlist_symbol_col BAITLIST_SYMBOL_COL] [--baitlist_geneid_col BAITLIST_GENEID_COL]
                                [--baitlist_numinteractors_col BAITLIST_NUMINTERACTORS_COL] [--provenance PROVENANCE] [--logconf LOGCONF] [--skip_logging] [--verbose] [--version]
                                outdir

For definitions of all arguments, run: cellmaps_ppidownloadercmd.py -h

Outputs

The tool creates several files and folders in the specified output directory (outdir).
Each output is listed and described below.

- baitlist.tsv
    This file contains information about the bait proteins used in the affinity purification-mass spectrometry (AP-MS) process.
    (Taken from the input; not always generated.)

- edgelist.tsv
    An edge list representation of the protein-protein interactions. Each row in this file represents an interaction between two proteins.
    (Taken from the input; not always generated.)

- ppi_edgelist.tsv
    A processed, tab-separated edge list of protein-protein interactions in which each protein is identified by its gene symbol.

    geneA	geneB
    DNMT3A	SAP18
    DNMT3A	DDX3X
    DNMT3A	SEC16A
    DNMT3A	U2SURP
    DNMT3A	SYNJ2

- ppi_gene_node_attributes.tsv
    Attributes for each gene node in the protein-protein interaction network: the gene name, its Ensembl ID, and the ambiguous and bait columns shown below.

    name	represents	ambiguous	bait
    DNMT3A	ensembl:ENSG00000119772		TRUE
    HDAC2	ensembl:ENSG00000196591		TRUE
    KDM6A	ensembl:ENSG00000147050		TRUE
    SMARCA4	ensembl:ENSG00000127616		TRUE
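
As a rough illustration of how the two tables above fit together, the sketch below
parses tab-separated text in the layout shown for ppi_edgelist.tsv and
ppi_gene_node_attributes.tsv and counts how many interactions each bait takes part in.
The sample rows are copied from the excerpts above. The function names are hypothetical
and not part of the tool, and the sketch assumes that baits are marked with the literal
value TRUE, as in the excerpt.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the excerpts above (tab-separated).
EDGELIST_TSV = (
    "geneA\tgeneB\n"
    "DNMT3A\tSAP18\n"
    "DNMT3A\tDDX3X\n"
    "DNMT3A\tSEC16A\n"
    "DNMT3A\tU2SURP\n"
    "DNMT3A\tSYNJ2\n"
)
NODE_ATTRS_TSV = (
    "name\trepresents\tambiguous\tbait\n"
    "DNMT3A\tensembl:ENSG00000119772\t\tTRUE\n"
    "HDAC2\tensembl:ENSG00000196591\t\tTRUE\n"
    "KDM6A\tensembl:ENSG00000147050\t\tTRUE\n"
    "SMARCA4\tensembl:ENSG00000127616\t\tTRUE\n"
)


def read_tsv(handle):
    """Return the rows of a tab-separated table as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def interactions_per_bait(edges, nodes):
    """Count the edges that touch each node whose 'bait' column is TRUE."""
    # Assumption: baits are flagged with the literal string "TRUE".
    baits = {row["name"] for row in nodes if row["bait"] == "TRUE"}
    counts = Counter()
    for edge in edges:
        for gene in (edge["geneA"], edge["geneB"]):
            if gene in baits:
                counts[gene] += 1
    return counts


edges = read_tsv(io.StringIO(EDGELIST_TSV))
nodes = read_tsv(io.StringIO(NODE_ATTRS_TSV))
counts = interactions_per_bait(edges, nodes)
for bait, n in counts.items():
    print(bait, n)
```

To run this against real output, replace the io.StringIO samples with the two files
opened from the output directory, for example open(path, newline="").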

Logs and Metadata

- ppi_gene_node_attributes.errors
    Errors encountered while processing the gene node attributes are logged in this file.

- output.log
    Log file recording the script's operations. Useful for following the flow of execution and debugging issues.

- error.log
    Log file specifically capturing any errors encountered during the script's execution.

- ro-crate-metadata.json
    Metadata in RO-Crate format, a community effort to establish a lightweight approach to packaging research data with their metadata.

    The main object contains an identifier (@id), a type (@type), a name, a description, keywords, and isPartOf, which describes the hierarchical relationship (organization and project).

    Graph: The @graph key contains an array of objects that describe other entities related to the main dataset:
    a. Metadata, datasets, and software
    b. Output files: details of the output files generated by the tool.
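
The layout described above can be explored with a few lines of Python. This is a
minimal sketch only: the sample document is a made-up, heavily trimmed stand-in for
ro-crate-metadata.json (every @id, name, and value in it is hypothetical, as is the
helper name), and it assumes the main object's fields sit alongside the @graph key,
as the description above suggests.

```python
import json
from collections import defaultdict

# Hypothetical, trimmed stand-in for ro-crate-metadata.json.
SAMPLE = """
{
  "@id": "example-crate-id",
  "@type": "Dataset",
  "name": "Example PPI download",
  "description": "Hypothetical crate used for illustration",
  "keywords": ["ppi"],
  "isPartOf": [{"@id": "example-organization"}, {"@id": "example-project"}],
  "@graph": [
    {"@id": "example-1", "@type": "Dataset", "name": "ppi_edgelist.tsv"},
    {"@id": "example-2", "@type": "Dataset", "name": "ppi_gene_node_attributes.tsv"},
    {"@id": "example-3", "@type": "Software", "name": "example software"}
  ]
}
"""


def names_by_type(crate):
    """Group the names of the @graph entities by their @type."""
    grouped = defaultdict(list)
    for entity in crate.get("@graph", []):
        types = entity.get("@type", "Unknown")
        # In JSON-LD, @type may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        for type_name in types:
            grouped[type_name].append(entity.get("name", entity.get("@id")))
    return dict(grouped)


crate = json.loads(SAMPLE)
grouped = names_by_type(crate)
print(crate["name"])
for type_name, names in grouped.items():
    print(type_name, names)
```

For a real run, load the file from the output directory with json.load instead of
parsing the sample string.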

