More about PDBnet
Chains and domains
We first decomposed the PDB structures into chains (e.g., peptide and nucleotide chains), which are treated as the units for constructing clusters and networks. We also provide domain information from Pfam and CATH for each protein chain, together with a mapping of the amino-acid residues that interact with other chains. We are working on domain-based clustering and networks, and will implement them in PDBnet in the near future.
The PDB contains redundancy, such as multiple structures of the same protein and structures of its mutants. We therefore first removed this redundancy by clustering chains based on sequence similarity. For this purpose, we used the BLAST-based cluster lists obtained from the PDB (sequence identity threshold of 90% by default, with 70% as an option).
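As a rough illustration of how such cluster lists can be turned into a chain-to-cluster lookup, here is a minimal Python sketch. The input format (one cluster per line, chain identifiers such as `1ABC_A` separated by whitespace) and the function name `load_cluster_lists` are assumptions made for this example, not a description of the files PDBnet actually uses.

```python
def load_cluster_lists(lines):
    """Map each chain identifier to the 1-based index of the line (cluster) it appears on.

    Assumed format: one cluster per line, chain identifiers separated by whitespace.
    """
    chain_to_cluster = {}
    for index, line in enumerate(lines, start=1):
        for chain in line.split():
            chain_to_cluster[chain] = index
    return chain_to_cluster


# Hypothetical cluster list at one identity threshold (e.g., 90%).
example_lines = ["1ABC_A 2XYZ_B", "3DEF_A"]
chain_to_cluster = load_cluster_lists(example_lines)
```

The same function could be applied to a second list built at the 70% threshold to obtain the optional, coarser clustering.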
We examined all pairs of chains to check whether they share the same PDB code. If two chains share the same PDB code, we created a link between the clusters containing these chains. Such a link indicates that the clusters are associated through a complex structure and therefore have some functional relation. We also examined physical contacts among all pairs of chains within each complex structure, and created a “hard link” between two clusters if any two of their chains in a complex structure are within a threshold distance (less than 5 Å between any pair of atoms). Links without direct physical contact are designated “soft links”.
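The link rules described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the data layout (a dictionary of complexes holding atom coordinates per chain), the function names, and the choice to skip links between two chains of the same cluster are all hypothetical, and the brute-force distance check is not necessarily how PDBnet computes contacts.

```python
from itertools import combinations
from math import dist


def chains_in_contact(atoms_a, atoms_b, threshold=5.0):
    """Return True if any atom pair is closer than `threshold` angstroms."""
    return any(dist(a, b) < threshold for a in atoms_a for b in atoms_b)


def classify_links(complexes, chain_to_cluster, threshold=5.0):
    """Label each pair of clusters sharing a PDB code as a 'hard' or 'soft' link.

    complexes: {pdb_code: {chain_id: [(x, y, z), ...]}}   (assumed layout)
    chain_to_cluster: {(pdb_code, chain_id): cluster_id}  (assumed layout)
    """
    links = {}
    for pdb_code, chains in complexes.items():
        for (id_a, atoms_a), (id_b, atoms_b) in combinations(sorted(chains.items()), 2):
            cluster_a = chain_to_cluster[(pdb_code, id_a)]
            cluster_b = chain_to_cluster[(pdb_code, id_b)]
            if cluster_a == cluster_b:
                continue  # assumption: links within one cluster are ignored here
            pair = frozenset((cluster_a, cluster_b))
            in_contact = chains_in_contact(atoms_a, atoms_b, threshold)
            # A single physical contact in any complex makes the link hard.
            if in_contact:
                links[pair] = "hard"
            else:
                links.setdefault(pair, "soft")
    return links


# Toy complex: chains A and B are 3 angstroms apart, chain C is far away.
complexes = {"1abc": {"A": [(0.0, 0.0, 0.0)], "B": [(3.0, 0.0, 0.0)], "C": [(20.0, 0.0, 0.0)]}}
chain_to_cluster = {("1abc", "A"): 1, ("1abc", "B"): 2, ("1abc", "C"): 3}
links = classify_links(complexes, chain_to_cluster)
```

In this toy case, clusters 1 and 2 receive a hard link, while the pairs involving cluster 3 receive soft links because they only share the PDB code.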
By joining the linked clusters together, we obtained separate networks. Some of these networks contain nucleic acid chains. These networks represent segments of the molecular networks in the structurome, and should correspond to some biological functions. We are currently integrating our structure-based networks with other protein-protein interaction networks, and will implement them in PDBnet in the near future.
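One simple way to read "putting the linked clusters together" is as finding the connected components of the cluster graph. The Python sketch below makes that assumption explicit; the function name `build_networks` and the input format (a list of cluster ID pairs) are hypothetical, and the breadth-first search is just one of several equivalent ways to do this.

```python
from collections import defaultdict, deque


def build_networks(links):
    """Group clusters into networks, i.e. connected components of the link graph.

    links: iterable of (cluster_a, cluster_b) pairs, hard or soft alike (assumed input).
    """
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    networks = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every cluster reachable from `start`.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        networks.append(component)
    return networks


networks = build_networks([(1, 2), (2, 3), (4, 5)])
```

Clusters that have no links at all do not appear in this sketch; whether they count as single-node networks is a design choice left open here.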
Various kinds of information on molecules, clusters and networks are stored in a relational database using MySQL. These data are also integrated with sequence, structure, property and functional information through a backbone database, 3DinSight, developed in our laboratory.
Search and visualization interfaces
All the data in PDBnet can be searched through a search interface. Users can examine detailed information about chains, domains, clusters and networks. The interactions between clusters and the networks are visualized using Graphviz, with clusters shown as oval nodes connected by hard and soft links drawn as red and black lines, respectively. Each node is hyperlinked to the corresponding cluster page. We also provide an integrated viewer page that gives users an overview of the relationships among the molecular networks, the 3D structure of the molecular complex and the sequence alignment of the core cluster (the query cluster at the center). For more information, please see the Tutorial.
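To give a feel for the kind of Graphviz input that produces such a picture, the Python sketch below writes a DOT description with oval nodes, red edges for hard links, black edges for soft links, and a `URL` attribute on each node. The function name `links_to_dot`, the input format and the URL template are assumptions made for illustration; the actual PDBnet pages and their addresses are not specified here.

```python
def links_to_dot(links, url_template="cluster/{cluster_id}"):
    """Build a Graphviz DOT string from {(cluster_a, cluster_b): 'hard' | 'soft'}.

    `url_template` is a hypothetical placeholder for the cluster page address.
    """
    colours = {"hard": "red", "soft": "black"}
    nodes = sorted({cluster for pair in links for cluster in pair})

    lines = ["graph pdbnet {", "  node [shape=oval];"]
    for cluster_id in nodes:
        url = url_template.format(cluster_id=cluster_id)
        lines.append(f'  "{cluster_id}" [URL="{url}"];')
    for (a, b), kind in sorted(links.items()):
        lines.append(f'  "{a}" -- "{b}" [color={colours[kind]}];')
    lines.append("}")
    return "\n".join(lines)


dot_text = links_to_dot({(1, 2): "hard", (2, 3): "soft"})
```

The resulting text can be saved to a file and rendered with the Graphviz command-line tools, for example to SVG, where the `URL` attribute becomes a clickable link.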
We constructed clusters based on sequence similarity (90% identity by default, with 70% as an option) in order to remove the sequence redundancy of PDB chains. However, the networks constructed from the clusters may still contain some redundancy resulting from a mixture of species. Species information is provided for each cluster. In the future, we will incorporate species-specific subsets of clusters and networks.
Domains and Chimeras
Domains are usually the units of compact structure, function and evolution. It is therefore more convenient, for the analysis of molecular interactions and networks, to construct domain-based clusters and networks. We are currently working on implementing domain-based clustering and network construction in PDBnet. In the current version of PDBnet, we provide the domain information (Pfam and CATH) of proteins on the chain and cluster information pages, together with information about the residues involved in interactions with other chains. Users can thus inspect the domain organization within each cluster and see which domains are responsible for the links with other clusters within each network. Some PDB structures contain artificial chimeras, which may produce spurious cluster members and links. We therefore provide the chimera information taken from the PDB entries for the relevant member chains.
Currently, the cluster IDs are numbered in order of cluster size. A cluster ID is therefore not necessarily a permanent identifier of a cluster. We provide the cluster information, including the full membership, for all past updates on the Statistics page, so that users can trace the cluster IDs and their membership. The same situation applies to the network IDs. We are also testing a non-hierarchical clustering method for creating clusters, in order to keep track of the identity of clusters more easily.
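The small Python sketch below illustrates why size-ordered IDs can shift between updates and how a cluster might be traced by its membership. It is a toy example: the function names, the tie-breaking rule and the "largest overlap" tracing rule are assumptions for illustration, not the procedure used by PDBnet.

```python
def assign_ids_by_size(clusters):
    """Number clusters 1, 2, ... from largest to smallest.

    Ties are broken by sorted member names (an assumption made for determinism).
    """
    ordered = sorted(clusters, key=lambda members: (-len(members), sorted(members)))
    return {cluster_id: set(members) for cluster_id, members in enumerate(ordered, start=1)}


def trace_cluster(old_ids, new_ids, old_id):
    """Return the new ID sharing the most members with the old cluster, or None."""
    old_members = old_ids[old_id]
    best = max(new_ids, key=lambda cid: len(new_ids[cid] & old_members), default=None)
    if best is None or not (new_ids[best] & old_members):
        return None
    return best


# Two hypothetical updates: the second cluster grows and overtakes the first.
old_ids = assign_ids_by_size([{"a", "b", "c"}, {"d", "e"}])
new_ids = assign_ids_by_size([{"a", "b", "c"}, {"d", "e", "f", "g"}])
traced = trace_cluster(old_ids, new_ids, 1)
```

Here the cluster that held ID 1 in the older update is found under ID 2 in the newer one, even though its membership did not change.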