Real Value Prediction of Solvent Accessibility using Neural Network
Ref: Shandar Ahmad, M. Michael Gromiha and Akinori Sarai
We have developed a neural network
model to predict the real value of
solvent accessibility from sequence information.
Detailed results of predictions for training, test and validation data
have been provided through links on this page.
For online predictions using this method, please visit
Three lists of proteins have been created from each set of proteins provided
in the references given below. Rotating the training, test and validation
roles among these lists leaves six sets of data (3! = 6 assignments) for each
of these referenced groups. They have been labelled as set1, set2, set3, ...
set6. Each of these data sets has
two directories called "data" and "preds". The "data" directory contains
all the input information used for training and prediction. In each "data"
directory, there are files called "train.dat", "test.dat" and "val.dat".
"train.dat" has the list of proteins used for training the network.
"test.dat" has been used for determining the stopping point for training,
and the "val.dat" proteins have been kept out of the training process for
cross validation after training. It may be noted that
the data files for the Manesh-215 set have two digits after the decimal point,
whereas the other data sets have only integer values for their ASA values
in the data directories. This is because DSSP has been used for calculating
the ASA values of all the other data sets, and DSSP returns an integer value
of ASA in A^2. For the Manesh-215 data sets, ASA values were calculated using
another standard program called ASC (see ref. below), and ASC returns
ASA values up to the second decimal place (in A^2). The actual effect of these
decimal places is, however, insignificant compared to the variation in
prediction accuracy values. The "pred" directory contains results
of prediction for the corresponding set for all training, test and
validation proteins. All these prediction files have four columns. The first
column is the residue name, the second column is the desired value, the third
column is the predicted value and the last column has the absolute error in
prediction.
Values have been normalised to unity (a scale of 0 to 1). To get the percent
relative accessibility, one has to multiply these values by 100. To obtain the
total solvent accessibility in A^2, one needs to multiply the normalised value
by the ASA of the extended state
for residue type X, as described in the following reference:
The data containing the above information can be downloaded as a single tar file now.
Please click here to do so.
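
As an illustration of the four-column prediction files described above, the
following minimal Python sketch reads such a file and converts the normalised
values. It assumes whitespace-separated columns and one-letter residue names,
neither of which is confirmed on this page. The names Prediction,
parse_predictions, to_percent and to_absolute_asa are hypothetical, and the
extended-state ASA values are placeholders for illustration only, not the
values from the reference.

```python
from typing import Dict, List, NamedTuple


class Prediction(NamedTuple):
    residue: str      # column 1: residue name
    desired: float    # column 2: desired (observed) value, normalised to unity
    predicted: float  # column 3: predicted value, normalised to unity
    abs_error: float  # column 4: absolute error in prediction


def parse_predictions(text: str) -> List[Prediction]:
    """Parse four-column prediction lines (assumed whitespace-separated)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        residue, desired, predicted, abs_error = fields
        rows.append(
            Prediction(residue, float(desired), float(predicted), float(abs_error))
        )
    return rows


def to_percent(value: float) -> float:
    """Normalised value -> percent relative accessibility."""
    return value * 100.0


def to_absolute_asa(value: float, residue: str, extended_asa: Dict[str, float]) -> float:
    """Normalised value -> total solvent accessibility in A^2."""
    return value * extended_asa[residue]


def mean_absolute_error(rows: List[Prediction]) -> float:
    """Mean of |predicted - desired| over all residues."""
    return sum(abs(r.predicted - r.desired) for r in rows) / len(rows)


# Placeholder extended-state ASA values in A^2 (NOT taken from the reference).
EXAMPLE_EXTENDED_ASA = {"A": 110.0, "G": 80.0}

# Made-up sample lines in the assumed layout, for illustration only.
SAMPLE = "A 0.50 0.40 0.10\nG 0.20 0.30 0.10\n"

rows = parse_predictions(SAMPLE)
for r in rows:
    print(r.residue,
          to_percent(r.predicted),
          to_absolute_asa(r.predicted, r.residue, EXAMPLE_EXTENDED_ASA))
print("mean absolute error:", mean_absolute_error(rows))
```

To use this on a downloaded file, its contents would be passed to
parse_predictions as a string, and EXAMPLE_EXTENDED_ASA would be replaced
with the extended-state values given in the reference.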
For comments and suggestions, please contact: