PPI3D Data Downloads Help
Retrieve data subsets
To retrieve customized subsets of clustered PPI3D data, go to the data download page and fill in the data request form. The form lets you specify interaction types, PDB data-related criteria, complex types (for example, homodimers or heterodimers) and sizes, interface properties, and the desired clustering level. Most of the criteria are self-explanatory.
After the form is submitted, the server queries the database. This usually takes several minutes, depending on the number of retrieved interfaces. Queries are cached, so if the same dataset is requested again, the previously saved data are returned.
The structure of the retrieved data table
After the database query finishes, the results page is shown with 4 links to downloadable files:
- The requested data table in a CSV file;
- The submitted criteria for generating the dataset in a JSON file;
- A log file with the summary of the query processing;
- A markdown document with the description of the data table.
In the data table, each row corresponds to one interaction interface.
Columns can be grouped into several categories. Most of the column names are self-explanatory; the others are described in more detail below:
- PDB data:
- pdb_id
- biounit_no
- release_date
- resolution
- pdb_annotation
- Interaction types:
- protein_peptide_interaction
- protein_nucleic_interaction
- protein_protein_interaction
- domain_domain_interaction
- homo
- Information about subunit 1:
- subunit_1
- s1_protein_type
- subunit_1_title
- scop_family_1
- s1_taxonomy_id: the ID of NCBI Taxonomy database
- s1_number_of_residues: total number of residues in the sequence
- s1_number_of_visible_residues: number of residues that have coordinates in the structure
- s1_sequence
- Information about subunit 2:
- subunit_2
- s2_protein_type
- subunit_2_title
- scop_family_2
- s2_taxonomy_id: the ID of NCBI Taxonomy database
- s2_number_of_residues: total number of residues in the sequence
- s2_number_of_visible_residues: number of residues that have coordinates in the structure
- s2_sequence
- Interface properties:
- area
- number_of_contacts
- number_of_interface_ligands
- Clustering information for subunits:
- s1_sequence_cluster_95: cluster according to protein sequences at 95% identity
- s1_binding_site_cluster_data_95: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- s1_sequence_cluster_70: cluster according to protein sequences at 70% identity
- s1_binding_site_cluster_data_70: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- s1_sequence_cluster_40: cluster according to protein sequences at 40% identity
- s1_binding_site_cluster_data_40: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- s1_binding_site_cluster_data_40_area: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- s2_sequence_cluster_95: cluster according to protein sequences at 95% identity
- s2_binding_site_cluster_data_95: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- s2_sequence_cluster_70: cluster according to protein sequences at 70% identity
- s2_binding_site_cluster_data_70: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- s2_sequence_cluster_40: cluster according to protein sequences at 40% identity
- s2_binding_site_cluster_data_40: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- s2_binding_site_cluster_data_40_area: cluster of the binding site, in the form of {sequence_cluster}_{structure_cluster}
- Clustering information for interaction interface:
- cluster_data_95: cluster of the interface, in the form of {interface_type}_{sequence_cluster}_{structure_cluster}
- cluster_data_70: cluster of the interface, in the form of {interface_type}_{sequence_cluster}_{structure_cluster}
- cluster_data_40: cluster of the interface, in the form of {interface_type}_{sequence_cluster}_{structure_cluster}
- cluster_data_40_area: cluster of the interface, in the form of {interface_type}_{sequence_cluster}_{structure_cluster}
The “interface type” field is 0 for protein-protein interactions and 1 for protein-peptide or protein-nucleic acid interactions. If the interaction interface needs to be clustered by sequences only, use the first two parts of the cluster identifier.
- Download URLs:
- download_url: URL to download PDB formatted file of binary complex
The fields of the table are also described in the accompanying file that can be downloaded.
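The cluster identifier layout described above can be split programmatically. The following is a minimal sketch, assuming the identifiers are separated by "_" into exactly three parts and that the CSV file has a header row with the listed column names. The function names and the sample rows are hypothetical and serve only as an illustration.

```python
import csv
import io
from collections import defaultdict


def sequence_level_key(cluster_id: str) -> str:
    """Return the first two parts of an interface cluster identifier.

    Assumes the {interface_type}_{sequence_cluster}_{structure_cluster}
    layout, with "_" as the separator and exactly three parts.
    """
    parts = cluster_id.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected cluster identifier: {cluster_id!r}")
    interface_type, sequence_cluster, _structure_cluster = parts
    return f"{interface_type}_{sequence_cluster}"


def group_by_sequence_cluster(csv_file, column="cluster_data_40"):
    """Group PDB IDs (one table row per interface) by sequence-only cluster."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[sequence_level_key(row[column])].append(row["pdb_id"])
    return dict(groups)


# Made-up rows for illustration only; a real table has many more columns.
sample = io.StringIO(
    "pdb_id,cluster_data_40\n"
    "1abc,0_12_1\n"
    "2xyz,0_12_2\n"
    "3def,1_7_1\n"
)
grouped = group_by_sequence_cluster(sample)
```

With a real download, the `io.StringIO` object would be replaced by the opened CSV file.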
Downloading the structures
To download the structures, please use the link provided in the CSV file. The structure of the binary interaction, containing 2 subunits, is downloaded in PDB format:
curl https://bioinformatics.lt/ppi3d/download/interface_coordinates/protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.pdb
The name of the downloaded file has the following parts, separated by "-":
- Description: the type of the interface (protein_protein, protein_peptide or protein_nucleic);
- PDB ID;
- The number of the Biological Assembly where this interface was found (it usually represents the first release of the PDB entry and does not necessarily correspond to the current numbering of PDB Biological assemblies);
- Subunit 1 name, PDB ID and chains information;
- Subunit 1 symmetry: the symmetry code for the subunit, taken from the Biological Assembly file;
- Subunit 2 name, PDB ID and chains information;
- Subunit 2 symmetry: the symmetry code for the subunit, taken from the Biological Assembly file.
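The file name parts listed above can be recovered by splitting on "-". The following is a minimal sketch, assuming that none of the seven parts itself contains a "-" and that the file name ends with a single extension. The class and function names are hypothetical.

```python
from typing import NamedTuple


class InterfaceFileName(NamedTuple):
    interface_type: str       # protein_protein, protein_peptide or protein_nucleic
    pdb_id: str
    biounit_no: str           # Biological Assembly number
    subunit_1: str            # name, PDB ID and chains information
    subunit_1_symmetry: str
    subunit_2: str
    subunit_2_symmetry: str


def parse_interface_file_name(file_name: str) -> InterfaceFileName:
    """Split a downloaded file name into its seven "-"-separated parts."""
    stem = file_name.rsplit(".", 1)[0]  # drop the extension (".pdb" or ".json")
    parts = stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 parts, got {len(parts)}: {file_name!r}")
    return InterfaceFileName(*parts)


parsed = parse_interface_file_name(
    "protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.pdb"
)
```

Here `parsed.pdb_id` is "1ktz" and `parsed.subunit_2` is "1ktz_B".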
Downloading other interface data
In addition to the structure, other interface data can be downloaded. To retrieve all the interface-related download links, replace "interface_coordinates" with "interface_links" in the URL and change the extension to ".json":
curl https://bioinformatics.lt/ppi3d/download/interface_links/protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.json
The server will respond with a JSON-formatted list of possible download URLs:
{
"coordinates": "https://bioinformatics.lt/ppi3d/download/interface_coordinates/protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.pdb",
"interface_residues_and_contacts": "https://bioinformatics.lt/ppi3d/download/interface_residues_and_contacts/protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.json"
}
The URL named "interface_residues_and_contacts" may be used to download the interface residues, contacts and ligands in JSON format.
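The URL rewriting and the JSON response described in this section can be handled with a few lines of code. The following is a minimal sketch that makes no network requests: it derives the "interface_links" URL from a coordinates URL taken from the CSV file and reads the response shown above from a string. The function name is hypothetical.

```python
import json


def interface_links_url(coordinates_url: str) -> str:
    """Turn an interface_coordinates URL into the matching interface_links URL."""
    if "/interface_coordinates/" not in coordinates_url:
        raise ValueError("not an interface_coordinates URL")
    if not coordinates_url.endswith(".pdb"):
        raise ValueError("expected a URL ending in .pdb")
    without_ext = coordinates_url[: -len(".pdb")]
    return without_ext.replace(
        "/interface_coordinates/", "/interface_links/", 1
    ) + ".json"


coordinates_url = (
    "https://bioinformatics.lt/ppi3d/download/interface_coordinates/"
    "protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.pdb"
)
links_url = interface_links_url(coordinates_url)

# The server response shown above, pasted here as a string.
response_text = """
{
"coordinates": "https://bioinformatics.lt/ppi3d/download/interface_coordinates/protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.pdb",
"interface_residues_and_contacts": "https://bioinformatics.lt/ppi3d/download/interface_residues_and_contacts/protein_protein-1ktz-1-1ktz_A-1-1ktz_B-1.json"
}
"""
links = json.loads(response_text)
contacts_url = links["interface_residues_and_contacts"]
```

In practice, `response_text` would come from an HTTP request to `links_url`, for example with curl as shown above.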
Additional download options
There are a few options for downloading intermediate data that are produced during PPI3D clustering and might also be of use to external applications.
To download a multiple sequence alignment for a sequence cluster, use the following URL:
curl https://bioinformatics.lt/ppi3d/download/multiple_sequence_alignment/domain_cluster_{sequence_identity_threshold}_{cluster_number}.fasta
Here "sequence_identity_threshold" is one of the possible sequence identity thresholds (95, 70, 40), and "cluster_number" is the number of the sequence cluster, which can be found in the clustering information columns of the PPI3D data table. The sequence alignment is downloaded in FASTA format.
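The alignment URL can be assembled from the two values described above. The following is a minimal sketch with a hypothetical function name; the cluster number 123 is a placeholder, not a real PPI3D cluster.

```python
MSA_BASE_URL = "https://bioinformatics.lt/ppi3d/download/multiple_sequence_alignment"
ALLOWED_THRESHOLDS = (95, 70, 40)


def msa_url(sequence_identity_threshold: int, cluster_number) -> str:
    """Build the download URL of the alignment for one sequence cluster."""
    if sequence_identity_threshold not in ALLOWED_THRESHOLDS:
        raise ValueError(
            f"threshold must be one of {ALLOWED_THRESHOLDS}, "
            f"got {sequence_identity_threshold!r}"
        )
    return (
        f"{MSA_BASE_URL}/domain_cluster_"
        f"{sequence_identity_threshold}_{cluster_number}.fasta"
    )


# 123 is a placeholder; take the real value from the s1_sequence_cluster_40
# or s2_sequence_cluster_40 column of the data table.
example_url = msa_url(40, 123)
```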
The residue renumbering according to the PDB sequences, used in PPI3D clustering, can be downloaded as follows:
curl https://bioinformatics.lt/ppi3d/download/residue_correspondence/{pdb_id}/{biounit_no}/{protein_type}/{subunit_name}-{subunit_symmetry}
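The URL template above can be filled in the same way. The following is a minimal sketch with a hypothetical function name. The argument values in the example are placeholders: this page does not state which table columns or value formats the server expects for "protein_type" and "subunit_name", so check them against the data table before use.

```python
from urllib.parse import quote

RESIDUE_BASE_URL = "https://bioinformatics.lt/ppi3d/download/residue_correspondence"


def residue_correspondence_url(
    pdb_id, biounit_no, protein_type, subunit_name, subunit_symmetry
) -> str:
    """Fill in the residue_correspondence URL template shown above."""
    # Percent-encode each value so that it is safe as a URL path segment.
    path_parts = [quote(str(v), safe="") for v in (pdb_id, biounit_no, protein_type)]
    last = f"{quote(str(subunit_name), safe='')}-{quote(str(subunit_symmetry), safe='')}"
    return "/".join([RESIDUE_BASE_URL, *path_parts, last])


# Placeholder values for illustration only (assumed, not confirmed formats).
example_url = residue_correspondence_url("1ktz", 1, "protein", "1ktz_A", 1)
```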
Other download options can be implemented upon user request. Please contact the authors by email at ppi3d (at) bti (dot) vu (dot) lt.
