APIs

DR-SIP package (drsip)

Package contains filters and protocols for predicting homo-oligomeric transmembrane proteins (HoTPs).

Functions

drsip.DR_SIP(reader_obj=None, dist_rest_file='', transmem_helix_sel_strs=[], soluble=False)[source]

Initialize the correct docking protocol.

The default protocol is the membrane protocol. To select the soluble protocol, set soluble to True.

Currently, only supports docking results from ZDOCK 3.0.2.

Parameters:
  • mobile_pdb (static_pdb,) – Full path to the PDB files containing the static and mobile monomers.
  • zdock_output_file (str) – Full path to the ZDOCK 3.0.2 generated output file.
  • dist_rest_file (str, optional) –

    Full path to the distance restraints file, defaults to an empty string. When empty, the distance restraints filter is not used. Required when soluble is set to True.

    File is formatted such that each line corresponds to a distance restraint (tab delimited):

    chain_id_1 res_id_1 chain_id_2 res_id_2 distance
    
  • transmem_helix_sel_strs (lists of strings, optional) – Define which residues are the transmembrane helices. Each string should select for 1 transmembrane helix using the MDAnalysis’ selection language. Required if soluble is False.
  • soluble (bool, optional) – Defaults to False, which uses DR-SIP’s membrane protocol. When set to True, uses the membrane protocol.
drsip.load_data(filename, reader_obj=None)[source]

Load saved DR_SIP data from file and return DR_SIP instance.

Parameters:filename (str) – Full path to save file.
Returns:Instance of DR_SIP with data from save file.
Return type:drsip_inst

Classes

class drsip.DR_SIP_Base(dist_rest_file='')[source]

DR_SIP base class.

For subclassing only.

Parameters:
  • mobile_pdb (static_pdb,) – Full path to the PDB files containing the static and mobile monomers.
  • zdock_output_file (str) – Full path to the ZDOCK 3.0.2 generated output file.
  • dist_rest_file (str, optional) –

    Full path to the distance restraints file, defaults to an empty string.

    File is formatted such that each line corresponds to a distance restraint (tab delimited):

    chain_id_1 res_id_1 chain_id_2 res_id_2 distance
    
    chain_id_1 res_id_1 chain_id_2 res_id_2 distance
    
build_final_results_table(clusters, filtered_poses_data)[source]

Returns the final results table.

Table contains the final ranking of the docking poses.

Needs to be implemented by the child class.

Parameters:
  • clusters (list of int) – List containing the cluster number that each pose is in.
  • filtered_poses_data (pandas.Dataframe) – Table containing the filter data and cluster number of each pose.
Returns:

results_table – Final results table.

Return type:

pandas.Dataframe

calc_CAPRI_class(ref_pdb_file, static_sel_str, mobile_sel_str)[source]

Classify each docking pose to one of four CAPRI classes.

Each docking pose is compared to the reference pose in ref_pdb_file and assigned to one of the following classes: ‘Incorrect’, ‘Acceptable’, ‘Medium’ or ‘High’.

Uses the CAPRI criteria to classify the poses. See docking_eval.CAPRI_Criteria.

Needs to be implemented by the child class.

Parameters:
  • ref_pdb_file (str) – Full file path to PDB file containing the reference pose.
  • mobile_sel_str (static_sel_str,) – MDAnalysis style selection string to select the atoms in ref_pdb_file that corresponds to the static and mobile monomers, respectively.
Returns:

CAPRI_class – Table of the docking poses and their respective classifications.

Return type:

pandas.Dataframe

get_CAPRI_class_table()[source]

Returns the CAPRI classification table.

Returns a table containing the CAPRI classification of each docking pose.

Run the calc_CAPRI_class (membrane or soluble) method before calling this method.

Returns:Table of the docking poses with their respective classifications, % Native Contacts and iRMSD.
Return type:pandas.Dataframe
get_final_results_table()[source]

Returns the final results table.

Returns:Table of the final results.
Return type:pandas.Dataframe
run()[source]

Executes the docking protocol.

Needs to be implemented by the child class.

write_results_file(filename)[source]

Write the top results to a CSV file.

Parameters:filename (str) – Full path to save the comma-seperated (CSV) file.
class drsip.DR_SIP_Membrane(reader_obj=None, transmem_helix_sel_strs=[], dist_rest_file='')[source]

DR-SIP’s membrane protocol.

Protocol applies the Cn symmetry, tilt angle and distance restraints (optional) filters on docking poses.

Currently, only supports docking results from ZDOCK 3.0.2.

Parameters:
  • mobile_pdb (static_pdb,) – Full path to the PDB files containing the static and mobile monomers.
  • zdock_output_file (str) – Full path to the ZDOCK 3.0.2 generated output file.
  • transmem_helix_sel_strs (lists of strings) –

    Define which residues are the transmembrane helices. Each string should select for 1 transmembrane helix using the MDAnalysis’ selection language.

  • dist_rest_file (str, optional) –

    Full path to the distance restraints file, defaults to an empty string.

    File is formatted such that each line corresponds to a distance restraint (tab delimited):

    chain_1 resid_1 chain_2 resid_2 distance
    
build_final_results_table(clusters, filtered_poses_data)[source]

Returns the final results table.

Table contains the representative poses of each cluster ranked based on the size of the clusters (in descending order). For each representative pose the Cluster ID, Order of Symmetry, Cn symmetry RMSD, Tilt Angle, Pearson and Spearman correlations are included in the table.

The representative pose of each cluster are chosen based on 2 criteria:

  1. The consensus order of symmetry within a cluster. Only poses that have the consensus order of symmetry are chosen.
  2. The pose with the lowest Cn symmetry RMSD.

Use get_final_results_table to return the table.

Parameters:
  • clusters (list of int) – List containing the cluster number that each pose is in.
  • filtered_poses_data (pandas.Dataframe) – Table containing the filter data and cluster number of each pose.
calc_CAPRI_class(ref_pdb_file, static_sel_str, mobile_sel_str, num_mers)[source]

Classify each docking pose to one of four CAPRI classes.

Each docking pose is compared to the reference pose in ref_pdb_file and assigned to one of the following classes: ‘Incorrect’, ‘Acceptable’, ‘Medium’ or ‘High’.

Uses the modified CAPRI criteria for HoTPs to classify the poses. See docking_eval.assign_Cn_symm_CAPRI_class.

The modified CAPRI criteria includes an additional criterion where the predicted size (“n”) of the pose must be equal the size of the reference complex (num_mers).

Parameters:
  • ref_pdb_file (str) – Full file path to PDB file containing the reference pose.
  • mobile_sel_str (static_sel_str,) – MDAnalysis style selection string to select the atoms in ref_pdb_file that corresponds to the static and mobile monomer, respectively.
  • num_mers (int) – The size of the original Cn symmetric HoTP complex.
Returns:

CAPRI_class – Table of the docking poses with their respective classifications, % Native Contacts and iRMSD.

Return type:

pandas.Dataframe

run(Cn_rmsd_cutoff=2.0, dist_rest_cutoff=0.3, tilt_angle_cutoff=0.541052, cluster_RMSD_cutoff=12.0, num_mers=None)[source]

Executes the membrane protocol.

Parameters:
  • Cn_rmsd_cutoff (float, optional) – The Cn symmetry RMSD filter cutoff value (in Angstrom). Default is 2 Angstrom.
  • dist_rest_cutoff (float, optional) – The distance restraints filter cutoff value (correlation). Default correlation 0.3.
  • tilt_angle_cutoff (float, optional) – The tilt angle filter cutoff value (radians). Default angle is 0.541052 rad (31 degrees).
  • cluster_RMSD_cutoff (float, optional) – Cutoff used to cluster the docking poses. Default cutoff is 12 Angstrom.
save_data(filename)[source]

Save current session/instance to file.

Parameters:filename (str) – Full file path to save to
write_pose(pose_num, filename)[source]

Write a Cn symmetric complex to a PDB file.

Given the original rank of a ZDOCK pose, write out the closest ideal Cn symmetric complex to a PDB file.

Parameters:
  • pose_num (int) – The pose’s original ZDOCK rank.
  • filename (str) – Filename to store the complex in PDB format.
write_results_file(filename)[source]

Write the top results to a CSV file.

Parameters:filename (str) – Full path to save the comma-seperated (CSV) file.
write_topN_poses(folder, num_poses=20)[source]

Write the top-N Cn symmetric complexes to PDB files.

The top-N complexes are from the DR-SIP final results table.

Parameters:
  • folder (str) – Folder to write the PDB files of the top-N complexes Files are written out as Pose_1.pdb, Pose_2.pdb, …, Pose_N.pdb.
  • num_poses (int, optional) – The number of top-N poses to write out. Default: 20
class drsip.DR_SIP_Soluble(reader_obj, dist_rest_file)[source]

DR-SIP’s soluble protocol.

Protocol applies the distance restraints filter on docking poses.

Currently, only supports docking results from ZDOCK 3.0.2.

Parameters:
  • mobile_pdb (static_pdb,) – Full path to the PDB files containing the static and mobile monomers.
  • zdock_output_file (str) – Full path to the ZDOCK 3.0.2 generated output file.
  • dist_rest_file (str) –

    Full path to the distance restraints file.

    File is formatted such that each line corresponds to a distance restraint (tab delimited):

    chain_1 resid_1 chain_2 resid_2 distance
    
build_final_results_table(clusters, filtered_poses_data)[source]

Build the final results table.

Table contains the representative poses of each cluster ranked based on the size of the clusters (in descending order). For each representative pose the Cluster ID, the Pearson and Spearman correlations are included in the table.

The representative pose of each cluster has the highest Spearman correlation. If there are more than one pose with the same highest correlation, the one with the highest Pearson correlation is chosen.

Use get_final_results_table to return the table.

Parameters:
  • clusters (list of int) – List containing the cluster number that each pose is in.
  • filtered_poses_data (pandas.Dataframe) – Table containing the filter data and cluster number of each pose.
calc_CAPRI_class(ref_pdb_file, static_sel_str, mobile_sel_str)[source]

Classify each docking pose to one of four CAPRI classes.

Each docking pose is compared to the reference pose in ref_pdb_file and assigned to one of the following classes: ‘Incorrect’, ‘Acceptable’, ‘Medium’ or ‘High’.

Uses the CAPRI criteria to classify the poses. See docking_eval.assign_soluble_CAPRI_class.

Parameters:
  • ref_pdb_file (str) – Full file path to PDB file containing the reference pose.
  • mobile_sel_str (static_sel_str,) – MDAnalysis style selection string to select the atoms in ref_pdb_file that corresponds to the static and mobile monomer, respectively.
Returns:

CAPRI_class – Table of the docking poses with their respective classifications, % Native Contacts and iRMSD.

Return type:

pandas.Dataframe

run(dist_rest_cutoff=0.3, cluster_RMSD_cutoff=12.0)[source]

Executes the soluble protocol.

Parameters:
  • dist_rest_cutoff (float, optional) – The distance restraints filter cutoff value (correlation). Default correlation 0.3.
  • cluster_RMSD_cutoff (float, optional) – Cutoff used to cluster the docking poses. Default cutoff is 12 Angstrom.
save_data(filename)[source]

Save current session to file.

Parameters:filename (str) – Full file path to save to
write_pose(pose_num, filename)[source]

Write a pose to a PDB file.

Parameters:
  • pose_num (int) – The pose’s original ZDOCK rank.
  • filename (str) – Filename of PDB file to store the docking pose.
write_results_file(filename)[source]

Write the top results to a CSV file.

Parameters:filename (str) – Full path to save the comma-seperated (CSV) file.
write_topN_poses(folder, num_poses=20)[source]

Write the top-N docking poses to PDB.

Parameters:
  • folder (str) – Folder to write the PDB files of the top-N docking poses ranked by DR-SIP. Files are written out as Pose_1.pdb, Pose_2.pdb, …, Pose_N.pdb.
  • num_poses (int, optional) – The number of top poses to write out. Default: 20

Distance Restraints Filter (drsip.dist_restraints)

Module contains the function implementing the distance restraints (DR) filter.

Functions

drsip.dist_restraints_filter.dist_restraints_filter(static_coord, mobile_coord, dist_mat_idx, dist_rest_data, cutoff=0.3)[source]

Distance restraints (DR) filter.

Pearson and Spearman correlations between the given distance restraints and the distances between C-alpha atoms of the corresponding residue pairs in docking poses.

Poses passes the filter when both correlations are >0.3.

Parameters:
  • mobile_ori_coord (static_ori_coord,) – Nx3 coordinate array of the static and mobile monomers, respectively, from the docking pose. Where N is the number of atoms.
  • dist_mat_idx (list) – Contains 2 lists. The lists represents the row (first list) and column (second) indices (0-indexed) of the distance matrix that corresponds to the residue pairs involved in the distance restraints.
  • dist_rest_data (np.array) – Array containing the distance restraints. The length of the dist_rest_data should be the same as the 2 lists in dist_mat_idx.
  • cutoff (float, optional) – If the docking pose’s Spearman and Pearson correlation are >cutoff, the docking pose passes the filter.
Returns:

Contains the distances between the C-alpha atoms (np.array), Spearman correlation (float), Pearson correlation (float) and pass status (bool). The pass status is True when the docking pose passes the filter and vice versa.

Return type:

list

Cn Symmetry Criterion (drsip.Cn_symm_criterion)

Module contains functions implementing the Cn symmetry criterion.

Functions

drsip.Cn_symm_criterion.Cn_symm_criterion(static_ori_coord, mobile_ori_coord, diff_vect, cutoff=2.0)[source]

Cn symmetry criterion.

Checks if the current docking pose is close to its closest ideal Cn symmetric pose.

Parameters:
  • mobile_ori_coord (static_ori_coord,) – Nx3 coordinate array of the static and mobile monomers, respectively, from the docking pose. Where N is the number of atoms.
  • diff_vect (np.array) – Vector from the center of mass (COM) of the static monomer to the COM of the mobile monomer.
  • cutoff (float, optional) – The Cn symmetric RMSD cutoff. If the docking pose’s Cn symmetric RMSD > cuttoff, it fails the criterion. Default cutoff is 2.0 Angstrom.
Returns:

Returns the axis of rotation (np.array), static monomer’s translation vector (np.array), number of oligomers (int), Cn symmetric RMSD (float) and RMSD_pass_status (bool). The RMSD_pass_status is True when the docking pose passes the criterion, otherwise False.

Return type:

list

drsip.Cn_symm_criterion.get_angle_of_rotation(rot_mat, axis_of_rot)[source]

Extract the angle of rotation from the rotation matrix.

Parameters:
  • rot_mat (np.array) – 3x3 rotation matrix.
  • axis_of_rot (np.array) – Axis of rotation extracted from the rotation matrix. See get_axis_of_rot.
Returns:

Returns the angle of rotation in radians.

Return type:

float

drsip.Cn_symm_criterion.get_axis_of_rot(rot_mat)[source]

Extract the axis of rotation from the rotation matrix.

Parameters:rot_mat (np.array) – 3x3 rotation matrix.
Returns:Returns the axis of rotation.
Return type:np.array
drsip.Cn_symm_criterion.get_rot_mat_arbitrary_axis(rot_angle, axis_of_rot)[source]

Returns the rotation matrix for the rotation by the given rotation angle about the axis of rotation.

Parameters:
  • rot_angle (float) – Rotation angle in radians.
  • axis_of_rot (np.array) – Axis of rotation.
Returns:

rot_mat – 3x3 rotation matrix.

Return type:

np.array

drsip.Cn_symm_criterion.get_trans_vect(axis_of_rot, rot_angle, diff_vect)[source]

Returns the translation vector.

The translation vector is the vector from the complex’s center of mass (COM) to the COM of the static monomer. The COM of the complex is set to the origin.

Parameters:
  • axis_of_rot (np.array) – Axis of rotation. See get_axis_of_rot.
  • rot_angle (float) – Rotation angle in radians. See get_angle_of_rotation.
  • diff_vect (np.array) – Vector from the center of mass (COM) of the static monomer to the COM of the mobile monomer.
Returns:

trans_vect – Translation vector.

Return type:

np.array

drsip.Cn_symm_criterion.get_num_mers(rot_angle)[source]

Predict the size of the complex from the rotation angle.

The size (“n”) is the number of monomers in the Cn symmetric complex.

Parameters:rot_angle (float) – Rotation angle in radians. See get_angle_of_rotation.
Returns:Number of monomers (“n”).
Return type:int

Tilt Angle Criterion (drsip.tilt_angle_criterion)

Module contains functions implementing the tilt angle criterion.

Functions

drsip.tilt_angle_criterion.tilt_angle_criterion(cylindrical_axis, membrane_norm, cutoff=0.541052)[source]

Tilt angle criterion

The tilt angle of a monomer is the acute angle between its cylindrical axis and the membrane normal. If the tilt angle is less than the cutoff, it passes the criterion otherwise it fails the criterion.

Parameters:
  • membrane_norm (cylindrical_axis,) – 3-dim vector of the monomer’s cylindrical axis and the membrane normal. The membrane normal is the axis orthogonal to the membrane bilayer.
  • cutoff (float, optional) – Tilt angle cutoff in radians. Default cutoff is 0.541052 rad (31 degrees).
Returns:

Returns the tilt angle (float) and pass status (bool). The pass status is True if the monomer passes the filter, otherwise False.

Return type:

list

drsip.tilt_angle_criterion.get_cylindrical_axis(CA_sel, transmembrane_helix_sel_strs)[source]

Get the cylindrical axis of a monomer

The cylindrical axis of a monomer is the average of the longitudinal axis (unit vector) of the transmembrane helices.

Parameters:
  • CA_sel (MDAnalysis.core.groups.AtomGroup) – MDAnalysis’ atomgroup containing the monomer’s C-alpha atoms.
  • transmembrane_helix_sel_strs (list) – List of MDAnalysis style atom selection strings, selecting each transmembrane helix.
Returns:

3-dim vector (unit vector) containing the cylindrical axis of the monomer.

Return type:

np.array

Structure Clustering (drsip.struct_clustering)

Module contains functions implementing the structure clustering part of the protocol.

Functions

drsip.struct_clustering.min_dist(dist_mat_1, dist_mat_2)[source]

Compare the elements of 2 matrices and keep the minimum values.

Values are compared element-wise and the minimum of the 2 values is returned.

Parameters:dist_mat_2 (dist_mat_1,) – NxN distance matrices between the 2 monomers in the docking pose. Where N are the number of atoms.
Returns:Returns a new distance matrix containing the minimum values for each element.
Return type:np.array
drsip.struct_clustering.calc_MSD()[source]

Compute the mean squared deviation (MSD).

MSDs are computed between the reference and the mobile monomers from different poses.

Parameters:
  • ref_coord (np.array) – Nx3 coordinate matrix of the reference monomer with N atoms.
  • mobile_coords (np.array) – MxNx3 coordinate matrices of mobile monomers with N atoms from M poses.
Returns:

M-dim vector containing the MSDs between the static and mobile monomers.

Return type:

np.array

drsip.struct_clustering.calc_MSD_mob_static()[source]

Compute the MSD between mobile and static monomers.

Parameters:
  • rotated_static_coords (np.array) – MxNx3 coordinate matrix of the static monomers with N atoms from the M poses after superimposing the mobile monomer to the static monomer.
  • mobile_coords (np.array) – MxNx3 coordinate matrices of the mobile monomers with N atoms from M poses.
Returns:

MxM matrix of the MSDs between all pairs of poses.

Return type:

np.array

drsip.struct_clustering.cluster_poses(dist_mat, cutoff=12.0, criterion='distance')[source]

Cluster docking poses

Docking poses are clustered with hierarchical clustering and average linkage criteria. The distance matrix contains the RMSD between the poses. See cal_rmsd_between_poses_membrane for HoTPs or cal_rmsd_between_poses_soluble for soluble proteins.

Clustering is implemented by SciPy.

Parameters:
  • dist_mat (np.array) – MxM distance matrix containing the RMSD between all pairs of poses. Where M is the number of poses.
  • cutoff (float, optional) – The cutoff used to determine when to stop combining clusters. This happens when the shortest distance between all pairs of clusters is >cutoff. Default cutoff is 12.0 Angstroms.
  • criterion (str, optional) – Default criterion is ‘distance’ which uses the cophenetic distance to compute distances between members from different clusters. The cophenetic distance is the distance between the 2 clusters. See SciPy documentation for details.
Returns:

Returns an array of length M where each element in the array corresponds to cluster number that the m’th pose has been assigned to.

Return type:

np.array

drsip.struct_clustering.calc_rot_static_coord(static_coord, mobile_com, static_com, rot_mat)[source]

Compute coordinate of the static monomer after superimposition.

After the mobile monomer is superimposed onto one of the monomers from the other pose, the static monomer moves with the mobile monomer. This function computes the coordinates of this static monomer.

Parameters:
  • static_coord (np.array) – Nx3 coordinate matrix of the static monomer with N atoms.
  • mobile_com (np.array) – Vector pointing from the origin to the mobile monomer’s center of mass (COM). COM of the mobile monomer in the pose before superimposition.
  • static_com (np.array) – Vector pointing from the origin to the static monomer’s center of mass (COM). COM of the static monomer in the pose before superimposition.
  • rot_mat (np.array) – 3x3 rotation matrix used to superimpose the mobile monomer to the monomer on the other pose.
Returns:

Nx3 coordinate matrix of the static monomer after superimposition of the mobile monomer.

Return type:

np.array

drsip.struct_clustering.cal_rmsd_between_poses_membrane(static_coord, mobile_coords)[source]

Compute the RMSD/distance between the docking poses of HoTPs.

Each pose contains 2 monomers, the static monomer whose coordinates are fixed and the mobile monomer that is translated and rotated during docking.

There are 4 ways to compute the RMSD between any 2 monomers, each chosen from one of the 2 poses:

  1. Static monomers from both poses are superimposed and RMSD is computed for the mobile monomers. No superimposition is required for ZDOCK generated poses since the static monomers are fixed.
  2. Mobile monomers from both poses are superimposed and RMSD is computed for the static monomers.
  3. A mobile monomer from one pose is superimposed onto the static monomer in the second pose. The RMSD of monomers that were not superimposed are computed.
  4. The monomers used to compute the RMSD in (3) is superimposed and while the other two monomers are used to compute the RMSD.

This function computes all 4 RMSDs and picks the smallest RMSD to represent the distance between these 2 poses.

Parameters:
  • static_coord (np.array) – Nx3 coordinate matrix of the static monomer with N atoms.
  • mobile_coords (np.array) – MxNx3 coordinate matrices of the mobile monomers with N atoms from M poses.
Returns:

MxM matrix containing the RMSDs between the poses.

Return type:

np.array

drsip.struct_clustering.cal_rmsd_between_poses_soluble(static_coord, poses_mobile_coords)[source]

Compute the RMSD between poses of soluble proteins.

Parameters:
  • static_coord (np.array) – Nx3 coordinate matrix of the static monomer with N atoms.
  • mobile_coords (np.array) – MxNx3 coordinate matrices of the mobile monomers with N atoms from M poses.
Returns:

MxM matrix containing the RMSDs between the poses.

Return type:

np.array

Saving and Loading (drsip.save_load)

Module contains helper functions to save and load DR-SIP data files.

Functions

drsip.save_load.save_pd_table(storage, table, var_name)[source]

Split and store the values and indices

Parameters:dist_mat_2 (dist_mat_1,) – NxN distance matrices between the 2 monomers in the docking pose. Where N are the number of atoms.
Returns:Returns a new distance matrix containing the minimum values for each element.
Return type:np.array
drsip.save_load.load_pd_table(storage, var_name, pop=False)[source]

Reconstruct Pandas table from dictionary key-value pairs and save to storage

Parameters:
  • storage (dict) – Dictionary to load the data from
  • var_name (str) – Original variable name
  • pop (bool, optional) – If True, remove key-value pairs used to reconstruct the table from storage
drsip.save_load.load_StrIO(storage, var_name)[source]
Convert strings in storage to StringIO and replace original string in
storage
Parameters:
  • storage (dict) – Dictionary to load and save the string/StringIO data
  • var_name (str) – Variable name
drsip.save_load.convert_StrIO_or_file_to_str(input_obj)[source]