homelette.pdb_io

The homelette.pdb_io submodule contains an object for parsing and manipulating PDB files, along with constructor functions that read PDB files from disk (read_pdb) or download them from the RCSB (download_pdb).

Functions and classes

Functions and classes present in homelette.pdb_io are listed below:


homelette.pdb_io.read_pdb(file_name: str) → homelette.pdb_io.PdbObject

Read PDB from file.

Parameters

file_name (str) – PDB file name

Returns

Return type

PdbObject

Notes

If the PDB file contains multiple MODELs, only the first model is kept.
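The first-model behaviour described in the Notes can be sketched as a line filter. This is a minimal sketch under stated assumptions, not homelette's implementation: it assumes models are delimited by ENDMDL records and that only ATOM and HETATM records are retained (as the lines variable of PdbObject describes). The helper names first_model_records and read_pdb_records are hypothetical, and they return plain lists of strings rather than a PdbObject.

```python
from typing import Iterable, List


def first_model_records(lines: Iterable[str]) -> List[str]:
    """Return ATOM/HETATM records of the first model only (hypothetical helper)."""
    records = []
    for line in lines:
        if line.startswith("ENDMDL"):
            # The first model ends here; all later models are discarded
            break
        if line.startswith(("ATOM", "HETATM")):
            records.append(line.rstrip("\n"))
    return records


def read_pdb_records(file_name: str) -> List[str]:
    """Hypothetical stand-in for read_pdb that returns plain lines, not a PdbObject."""
    with open(file_name) as handle:
        return first_model_records(handle)
```

Files without MODEL/ENDMDL records never hit the break, so every coordinate record is kept.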

homelette.pdb_io.download_pdb(pdbid: str) → homelette.pdb_io.PdbObject

Download PDB from the RCSB.

Parameters

pdbid (str) – PDB identifier

Returns

Return type

PdbObject

Notes

If the downloaded entry contains multiple MODELs, only the first model is kept.
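Downloading can be sketched with the standard library. This is an assumption-laden sketch, not homelette's code: it assumes the RCSB file download service URL pattern https://files.rcsb.org/download/<ID>.pdb (homelette may use a different endpoint), and the helper names rcsb_pdb_url and download_pdb_records are hypothetical. Only URL construction runs offline; the download itself needs network access.

```python
import urllib.request
from typing import List

# Assumption: RCSB file download service; homelette may use another endpoint
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb"


def rcsb_pdb_url(pdbid: str) -> str:
    """Build the download URL for a PDB identifier (hypothetical helper)."""
    return RCSB_DOWNLOAD.format(pdbid.strip().upper())


def download_pdb_records(pdbid: str) -> List[str]:
    """Hypothetical stand-in for download_pdb; returns plain lines."""
    with urllib.request.urlopen(rcsb_pdb_url(pdbid)) as response:
        text = response.read().decode("utf-8")
    records = []
    for line in text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model, as described in the Notes
        if line.startswith(("ATOM", "HETATM")):
            records.append(line)
    return records
```

Entries too large for the legacy PDB format are distributed only as PDBx/mmCIF, so a request for a .pdb file can fail for them.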

class homelette.pdb_io.PdbObject(lines: Iterable)

Object encapsulating functionality for processing PDB files.

Parameters

lines (Iterable) – The lines of the PDB

Variables

lines – The lines of the PDB, filtered for ATOM and HETATM records

Returns

Return type

None

Notes

Please construct instances of PdbObject using the constructor functions read_pdb and download_pdb.

If the PDB file contains multiple MODELs, only the first model is kept.

write_pdb(file_name) → None

Write PDB to file.

Parameters

file_name (str) – The name of the file to write the PDB to.

Returns

Return type

None

parse_to_pd() → pandas.DataFrame

Parse PDB into a pandas DataFrame.

Returns

Return type

pd.DataFrame

Notes

Information is extracted according to the PDB file specification (version 3.30) and columns are named accordingly. See https://www.wwpdb.org/documentation/file-format for more information.
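ATOM and HETATM records are fixed-width, so parsing amounts to slicing each line at the column ranges given in the specification. The sketch below is a hypothetical illustration, not homelette's implementation: the column names follow the specification's field names, but the exact column names homelette uses in its DataFrame are an assumption. It returns a list of dicts; passing that list to pandas.DataFrame yields a table.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, None]

# (column name, start, end): 0-based, end-exclusive slices of the 1-based
# column ranges given for ATOM/HETATM records in PDB format version 3.30
ATOM_COLUMNS = [
    ("record", 0, 6), ("serial", 6, 11), ("name", 12, 16), ("altLoc", 16, 17),
    ("resName", 17, 20), ("chainID", 21, 22), ("resSeq", 22, 26), ("iCode", 26, 27),
    ("x", 30, 38), ("y", 38, 46), ("z", 46, 54), ("occupancy", 54, 60),
    ("tempFactor", 60, 66), ("element", 76, 78), ("charge", 78, 80),
]
INT_COLUMNS = {"serial", "resSeq"}
FLOAT_COLUMNS = {"x", "y", "z", "occupancy", "tempFactor"}


def convert(column: str, raw: str) -> Value:
    """Strip padding and convert numeric fields; blank fields become None."""
    value = raw.strip()
    if not value:
        return None
    if column in INT_COLUMNS:
        return int(value)
    if column in FLOAT_COLUMNS:
        return float(value)
    return value


def parse_atom_records(lines: List[str]) -> List[Dict[str, Value]]:
    """One dict per ATOM/HETATM line (hypothetical helper)."""
    return [
        {column: convert(column, line[start:end]) for column, start, end in ATOM_COLUMNS}
        for line in lines
    ]
```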

get_sequence(ignore_missing: bool = True) → str

Retrieve the 1-letter amino acid sequence of the PDB, grouped by chain.

Parameters

ignore_missing (bool) – Changes behaviour with regard to unmodelled residues. If True (default), they are ignored when generating the sequence. If False, they are represented in the sequence by the character X.

Returns

Amino acid sequence

Return type

str
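The effect of ignore_missing can be sketched by collecting residue names per chain and, when gaps should be shown, walking the full residue-number range. This hypothetical sketch makes several assumptions that may differ from homelette: only ATOM records count, gaps are detected from missing resSeq values, insertion codes are ignored, non-standard residues map to X, and the result is a dict of per-chain sequences rather than a single string.

```python
from typing import Dict, List

# Standard amino acids only; anything else maps to "X" (assumption)
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_by_chain(lines: List[str], ignore_missing: bool = True) -> Dict[str, str]:
    """Hypothetical helper: one-letter sequence per chain ID."""
    residues: Dict[str, Dict[int, str]] = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        # chainID is column 22, resSeq columns 23-26, resName columns 18-20
        residues.setdefault(line[21], {})[int(line[22:26])] = line[17:20].strip()
    sequences = {}
    for chain, by_number in residues.items():
        if ignore_missing:
            numbers = sorted(by_number)
        else:
            # Walk every number in the range so unmodelled residues become "X"
            numbers = range(min(by_number), max(by_number) + 1)
        sequences[chain] = "".join(
            THREE_TO_ONE.get(by_number.get(n, ""), "X") for n in numbers
        )
    return sequences
```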

get_chains() → list

Extract all chains present in the PDB.

Returns

Return type

list

transform_extract_chain(chain) → homelette.pdb_io.PdbObject

Extract chain from PDB.

Parameters

chain (str) – The chain ID to be extracted.

Returns

Return type

PdbObject
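Since the chain ID occupies column 22 of every coordinate record, both get_chains and transform_extract_chain can be sketched as simple passes over the lines. The helpers chain_ids and extract_chain are hypothetical; that the chain list preserves order of first appearance is an assumption, not documented behaviour.

```python
from typing import List


def chain_ids(lines: List[str]) -> List[str]:
    """Unique chain IDs (column 22) in order of first appearance."""
    seen: List[str] = []
    for line in lines:
        if line[21] not in seen:
            seen.append(line[21])
    return seen


def extract_chain(lines: List[str], chain: str) -> List[str]:
    """Keep only the records belonging to the given chain."""
    return [line for line in lines if line[21] == chain]
```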

transform_renumber_residues(starting_res: int = 1) → homelette.pdb_io.PdbObject

Renumber residues in PDB.

Parameters

starting_res (int) – Residue number to start renumbering at (default 1)

Returns

Return type

PdbObject

Notes

Missing (unmodelled) residues in the PDB are not taken into account during renumbering. If multiple chains are present in the PDB, numbering continues from one chain to the next.
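The renumbering rule in the Notes can be sketched as a running counter that increments whenever the residue identity changes, without resetting between chains. This is a hypothetical sketch, not homelette's implementation: it assumes atoms of one residue are contiguous, identifies a residue by chain ID, resSeq and iCode, and clears insertion codes after renumbering (an assumption). Numbers above 9999 would overflow the 4-character resSeq field.

```python
from typing import List, Optional, Tuple


def renumber_residues(lines: List[str], starting_res: int = 1) -> List[str]:
    """Renumber residues consecutively, continuing the count across chains."""
    renumbered = []
    previous: Optional[Tuple[str, str]] = None
    current = starting_res - 1
    for line in lines:
        # A residue is identified by chainID (col 22) and resSeq + iCode (cols 23-27)
        key = (line[21], line[22:27])
        if key != previous:
            current += 1  # gaps in the old numbering are closed
            previous = key
        # Write the new resSeq right-aligned in cols 23-26 and blank the iCode
        renumbered.append(f"{line[:22]}{current:>4} {line[27:]}")
    return renumbered
```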

transform_change_chain_id(new_chain_id) → homelette.pdb_io.PdbObject

Replace chain ID for every entry in PDB.

Parameters

new_chain_id (str) – New chain ID.

Returns

Return type

PdbObject
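Replacing the chain ID is a single-column overwrite on each record. In this hypothetical sketch (change_chain_id is not a homelette function), rejecting IDs longer than one character is an added assumption, motivated by the PDB format reserving only column 22 for the chain ID.

```python
from typing import List


def change_chain_id(lines: List[str], new_chain_id: str) -> List[str]:
    """Overwrite the chain ID (column 22) of every coordinate record."""
    if len(new_chain_id) != 1:
        # The PDB format reserves exactly one column for the chain ID
        raise ValueError("chain ID must be a single character")
    return [line[:21] + new_chain_id + line[22:] for line in lines]
```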

transform_remove_hetatm() → homelette.pdb_io.PdbObject

Remove all HETATM entries from PDB.

Returns

Return type

PdbObject

transform_filter_res_name(selection: Iterable, mode: str = 'out') → homelette.pdb_io.PdbObject

Filter PDB by residue name.

Parameters
  • selection (Iterable) – Residue names to filter by.

  • mode (str) – Filtering mode. If mode = “out”, the selection will be filtered out (default). If mode = “in”, everything except the selection will be filtered out.

Returns

Return type

PdbObject
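Both transform_remove_hetatm and the two modes of transform_filter_res_name reduce to keeping or dropping lines based on the record name (columns 1-6) or the residue name (columns 18-20). The helpers below are hypothetical sketches, not homelette's code; raising an error for an unknown mode is an added assumption.

```python
from typing import Iterable, List


def remove_hetatm(lines: List[str]) -> List[str]:
    """Drop all HETATM records."""
    return [line for line in lines if not line.startswith("HETATM")]


def filter_res_name(lines: List[str], selection: Iterable[str], mode: str = "out") -> List[str]:
    """mode="out" removes the selected residue names; mode="in" keeps only them."""
    names = set(selection)
    if mode == "out":
        return [line for line in lines if line[17:20].strip() not in names]
    if mode == "in":
        return [line for line in lines if line[17:20].strip() in names]
    raise ValueError("mode must be 'out' or 'in'")
```

For example, filter_res_name(lines, ["HOH"]) would drop water molecules, while mode="in" would keep only them.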

transform_filter_res_seq(lower: int, upper: int) → homelette.pdb_io.PdbObject

Filter PDB by residue number.

Parameters
  • lower (int) – Lower bound of range to filter with.

  • upper (int) – Upper bound of range to filter with, inclusive.

Returns

Return type

PdbObject
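A residue-number filter can be sketched as a range check on the resSeq field (columns 23-26). The documentation states only that the upper bound is inclusive; this hypothetical sketch additionally assumes that residues inside the range are kept and that the lower bound is inclusive as well.

```python
from typing import List


def filter_res_seq(lines: List[str], lower: int, upper: int) -> List[str]:
    """Keep records whose resSeq lies in [lower, upper] (both bounds assumed inclusive)."""
    return [line for line in lines if lower <= int(line[22:26]) <= upper]
```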

transform_concat(*others: homelette.pdb_io.PdbObject) → homelette.pdb_io.PdbObject

Concatenate PDB with other PDBs.

Parameters

*others (PdbObject) – Any number of PDBs.

Returns

Return type

PdbObject
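Concatenation can be sketched as appending the coordinate lines of each further PDB after those of the first. In this hypothetical sketch (concat_pdbs is not a homelette function), chain IDs and residue numbers are copied unchanged; whether homelette adjusts them is not documented, so transform_change_chain_id may be needed beforehand to keep chains distinguishable.

```python
from itertools import chain
from typing import List


def concat_pdbs(first: List[str], *others: List[str]) -> List[str]:
    """Lines of `first`, then the lines of each of `others` in argument order."""
    return list(chain(first, *others))
```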