Direct Readout Energy
Protein residues and DNA bases show a variety of interactions. Some, such as Asn-A and Lys-G, are frequently observed in complex structures. The spatial distribution of side chains around base pairs suggests that it can be converted into an energy potential, in a manner similar to the contact potentials between amino acids in protein structures, and that this potential can be used for target prediction. To derive the statistical potential of interactions between bases and amino acids, we defined a coordinate system on each base, with the origin at the N9 atom for A and G and at the N1 atom for T and C. We considered the amino acids within a given box around the base and divided the box into grids. We then transformed the distributions of the Cα atoms of the amino acids into statistical potentials defined by the following equations,
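$$\Delta E_{ab}(s) = -RT \ln \frac{f_{ab}(s)}{f(s)}$$
$$f_{ab}(s) = \frac{1}{1 + m_{ab} w}\, f(s) + \frac{m_{ab} w}{1 + m_{ab} w}\, g_{ab}(s)$$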
where m_ab is the number of pairs of amino acid a and base b observed, w is the weight given to each observation, f(s) is the relative frequency of occurrence of any amino acid at grid point s, and g_ab(s) is the equivalent relative frequency of occurrence of amino acid a against base b. R and T are the gas constant and the absolute temperature, respectively. Here, we used a box of |x| = |y| = 13.5 Å and |z| = 6 Å with a grid interval of 3 Å, which was chosen after examining various intervals. To quantify the specificity, we evaluated a Z-score by calculating the energy against 50,000 random DNA sequences. The Z-score is defined as (X - μ)/σ over the resulting energy histogram, where X is the sum of the contact potentials in the complex, μ is the mean energy over the 50,000 random sequences, and σ is the standard deviation of the energy. For example, a Z-score of -3.0 means that, if the energies are approximately normally distributed, only about one or two of 1,000 random DNA sequences are expected to fit the framework better than the actual sequence.
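As an illustration only, the procedure described above can be sketched in Python. The sketch assumes a Sippl-type potential, that is, -RT ln(f_ab(s)/f(s)) with f_ab(s) a weighted mixture of f(s) and g_ab(s). The value of RT, the weight w passed by the caller, the contact and table data structures, the single-strand treatment of the DNA sequence, and all function names are hypothetical choices made for this example and are not taken from the original work.

```python
import math
import random
import statistics

# Box half-widths and grid interval in Å, as stated in the text.
HALF_XY, HALF_Z, STEP = 13.5, 6.0, 3.0
# Assumption: RT in kcal/mol near room temperature (not given in the text).
RT = 0.593


def grid_index(x, y, z):
    """Map a Cα position in the base-centred frame to a grid cell.

    Returns None for points on or outside the box boundary.
    """
    if abs(x) >= HALF_XY or abs(y) >= HALF_XY or abs(z) >= HALF_Z:
        return None
    return (int((x + HALF_XY) // STEP),
            int((y + HALF_XY) // STEP),
            int((z + HALF_Z) // STEP))


def potential(m_ab, g_ab, f, w):
    """Assumed Sippl-type potential: -RT ln(f_ab / f).

    f_ab mixes the background frequency f and the pair-specific
    frequency g_ab, weighted by the number of observations m_ab.
    """
    if f <= 0.0:
        return 0.0  # assumption: no background data means no contribution
    mix = m_ab * w / (1.0 + m_ab * w)
    f_ab = (1.0 - mix) * f + mix * g_ab
    return -RT * math.log(f_ab / f)


def total_energy(sequence, contacts, table):
    """Sum potentials over (amino_acid, base_position, grid_cell) contacts."""
    return sum(table.get((aa, sequence[pos], cell), 0.0)
               for aa, pos, cell in contacts)


def z_score(native_energy, random_energies):
    """(X - mean) / standard deviation over the random-sequence energies."""
    mean = statistics.mean(random_energies)
    sd = statistics.pstdev(random_energies)
    return (native_energy - mean) / sd


def specificity_z(native_seq, contacts, table, n_random=50_000, seed=0):
    """Thread random DNA sequences onto fixed contacts and score the native one."""
    rng = random.Random(seed)
    energies = [
        total_energy("".join(rng.choice("ACGT") for _ in native_seq),
                     contacts, table)
        for _ in range(n_random)
    ]
    return z_score(total_energy(native_seq, contacts, table), energies)


def expected_better(z, n=1000):
    """Expected number of n random sequences scoring below z, if energies are normal."""
    return n * statistics.NormalDist().cdf(z)
```

With these definitions, `expected_better(-3.0)` evaluates the normal lower tail at Z = -3.0 and returns roughly 1.35 sequences per 1,000, which is the sense in which a Z-score of -3.0 is read above. The normality of the energy distribution is itself an assumption of that reading.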