Frequently Asked Questions about Direct and Indirect Readout Energies:
(This document may be updated at any time, without notice.)

Q1: What is meant by direct and indirect readout?
Ans: DNA recognition by proteins is central to the control of gene expression. Nature has designed proteins that bind to DNA and initiate, terminate or regulate gene expression. Not every protein does this for every gene (DNA), and that is where the question of "recognition" arises. Whether recognition is viable is determined by the interatomic interaction energies between the protein molecule (made of amino acid residues) and the DNA molecule (made of nucleic acid bases). Two types of interaction are possible.

In the first, amino acid residues of the protein and nucleic acid bases of the DNA recognise each other through a specific, local or direct mechanism. This so-called "Direct Readout" or "Specific Binding" behavior can be understood in terms of the electrostatic interactions between the side-chain atoms of the amino acid residues and the nucleic acid bases (together forming a residue-base pair).

The second type of energy comes from the structural deformation of the DNA (relative to an ideal helical structure) that is needed to give the DNA the right conformation for the protein to make contact with it, and perhaps to make the specific interaction possible. This energy does not depend on which amino acid residue is present at which location. It depends only on how readily that particular DNA sequence, compared with any other sequence of bases, can adopt the required conformation. It is called "Indirect Readout" or "Nonspecific" energy because it does not contribute directly to the residue-base interaction but acts as an overall, albeit necessary, facilitator.

Q2: What is a statistical potential?
Ans: The exact interactions between residues and bases are complex because they depend on the environment (e.g. pH, solvent, temperature) at the time of interaction. A simple application of Coulomb's law or the Lennard-Jones potential therefore cannot conveniently model the exact interaction environment. Energy calculations become far more realistic when they are based on observed, or "phenomenological", parameters. Energies derived from the statistics of observed interactions have been particularly useful, and energy expressions of this kind are called statistical potentials.
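
As a rough illustration of how observed statistics can be turned into an energy, the sketch below uses the inverse-Boltzmann form E = -ln(f_observed / f_expected), which is a common way to build knowledge-based potentials. This FAQ does not state which formula is actually used, so the functional form, the function name statistical_potential, the pseudocount and all of the counts are assumptions made up for the example.

```python
import math

def statistical_potential(observed, expected, pseudocount=1.0):
    """Return {pair: -ln(f_obs / f_exp)} for every pair listed in `expected`.

    observed / expected map a (residue, base) pair to a count.
    The pseudocount (an assumption of this sketch) avoids log(0)
    for pairs that were never observed.
    """
    pairs = list(expected)
    total_obs = sum(observed.get(p, 0) + pseudocount for p in pairs)
    total_exp = sum(expected[p] for p in pairs)
    potential = {}
    for p in pairs:
        f_obs = (observed.get(p, 0) + pseudocount) / total_obs
        f_exp = expected[p] / total_exp
        # Pairs seen more often than expected get a negative (favourable) energy.
        potential[p] = -math.log(f_obs / f_exp)
    return potential

# Invented toy counts, for illustration only (not real structural data).
observed = {("ARG", "G"): 60, ("ARG", "A"): 20, ("ARG", "C"): 10, ("ARG", "T"): 10}
expected = {("ARG", "G"): 25, ("ARG", "A"): 25, ("ARG", "C"): 25, ("ARG", "T"): 25}

potential = statistical_potential(observed, expected)
for pair, energy in sorted(potential.items(), key=lambda kv: kv[1]):
    print(pair, round(energy, 3))
```

The resulting numbers have no physical unit, which is why a further normalisation is needed before energies from different calculations can be compared.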

Q3: What is energy Z-score?
Ans: Energy is typically measured in joules or kcal (per mole). However, energies calculated from statistical potentials cannot be translated into physical units, as they are based on structural and other parameters. Thus, to compare the stability of two interactions calculated separately, we need a dimensionless scale that is independent of the data set used and of the variation of energy within it. To achieve this, the energy values obtained from statistical potentials are transformed into energy Z-scores. The principle is that a large number of random DNA sequences are generated to represent the entire sequence space, and the energy is calculated for each of them. Using the mean and standard deviation of this set of random sequences, we transform the distribution into one with zero mean (subtract the mean from each value) and unit standard deviation (divide each result by the standard deviation). Energy on this scale is called the energy Z-score. Each energy value thus carries information about its "specificity", that is, its distance from the mean value. A large positive Z-score means higher energy and hence an unlikely structure. A large negative Z-score is favourable, because that sequence is strongly preferred over other sequences.
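
The transformation described above can be sketched in a few lines of Python. This is only a minimal illustration: toy_energy is a hypothetical stand-in for a real statistical-potential energy, and its weights, the number of random sequences and the function names are all assumptions of the example.

```python
import random
import statistics

BASES = "ACGT"

def toy_energy(seq):
    """Hypothetical stand-in for a statistical-potential energy (arbitrary units)."""
    weights = {"A": -1.0, "C": 0.5, "G": 0.25, "T": -0.5}  # invented values
    return sum(weights[b] for b in seq)

def energy_z_score(target, energy_fn, n_random=10000, seed=0):
    """Z-score of the target's energy against random sequences of equal length."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    background = [
        energy_fn("".join(rng.choice(BASES) for _ in range(len(target))))
        for _ in range(n_random)
    ]
    mean = statistics.fmean(background)
    std = statistics.pstdev(background)
    # Subtract the mean, then divide by the standard deviation.
    return (energy_fn(target) - mean) / std

z_low = energy_z_score("AAAAAAAA", toy_energy)   # lowest-energy toy sequence
z_high = energy_z_score("CCCCCCCC", toy_energy)  # highest-energy toy sequence
print(round(z_low, 2), round(z_high, 2))
```

With these invented weights the all-A sequence gets a negative Z-score (favourable) and the all-C sequence a positive one (unfavourable), matching the interpretation given above.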


This document can be freely copied or distributed, at your own risk and responsibility. I do not guarantee universal agreement with these oversimplified explanations.