Conformational parameters and energy Z-scores for free and protein-bound DNA

Introduction

Conformational properties of DNA play a central role in its stability and dynamics, in the regulation and control of transcription, and in its interactions with proteins. Gene regulatory proteins and transcription factors recognize their DNA targets either by direct (specific) readout, determined by the protein structure and sequence, or by indirect (non-specific) readout, caused by the elastic properties and conformational deformations of the DNA.

One approach to determining the conformational stability and energy of DNA is to use statistical force fields based on its conformational properties. The DNA molecule is treated as an elastic object with several conformational degrees of freedom. The local conformation of the DNA at each base pair (formed by bases from the two complementary strands) is described in terms of standard deformations, such as translational shifts between bases and the roll and tilt of successive base pairs.

Each of these degrees of freedom has a different flexibility. For an observed DNA structure, its stability or energy can be estimated from the size of its deformations relative to a typical (average) structure. Converting conformational parameters into energies therefore requires three steps: (1) determining the average value of each conformational parameter; (2) computing the deviation of the target structure's parameter from that average; and (3) applying a potential that converts these deviations into energies, based on the elastic properties of the corresponding base pair for that particular parameter.
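
The three steps above can be sketched for a single conformational parameter at one base-pair step. This is a minimal illustration, not the server's implementation: the function name `deformation_energy`, the harmonic form k·Δθ² and the stiffness value `k` are assumptions chosen for the example, and the tilt values are made up.

```python
from statistics import mean


def deformation_energy(observed, database_values, k):
    """Energy of one conformational parameter at one base-pair step."""
    theta_mean = mean(database_values)  # step 1: average value in the database
    delta = observed - theta_mean       # step 2: deviation of the target value
    return k * delta ** 2               # step 3: assumed harmonic potential


# Hypothetical tilt angles (degrees) observed for one step type in a database.
tilts = [-2.0, 0.0, 1.0, 3.0]
print(deformation_energy(4.5, tilts, k=0.1))
```

A target value equal to the database mean gives zero energy; the energy grows quadratically with the size of the deviation.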

To allow a fair comparison between energy values obtained from different data sets and under different conditions, the energies calculated in this way are normalized using the mean and standard deviation of the energies of a large set of randomly generated sequences superimposed on the DNA conformation being studied. We have been investigating the different conformational properties that affect DNA energy and have developed a statistical force field to determine the conformational energy and its normalized value, the Z-score.

Using these force fields, we have been able to explain the roles of direct and indirect readout in protein-DNA interactions. Here we present a web server that computes energy Z-scores based on these force fields. The server takes the coordinates of a DNA molecule, alone or in complex with a protein, and calculates its conformational parameters and its energy using our published force field. A large set of randomly generated DNA sequences of the same length is then superimposed onto the submitted DNA structure, and the energy of each random sequence in that conformation is calculated. The mean and standard deviation of these random-sequence energies are used to compute a normalized energy (Z-score) for the submitted (target) sequence. The server output consists of the calculated conformational parameters, an energy table of all possible combinations at individual base-pair positions, and the energy Z-scores. We expect that biologists studying DNA stability and flexibility, and those interested in non-specific interactions of DNA with proteins, will find it a useful source of information.
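
The normalization procedure can be illustrated with a toy model. In this sketch the six-parameter force field is replaced by a single hypothetical parameter (roll) with invented per-step mean values and stiffnesses (`step_params`, which assumes pyrimidine-purine steps are softer), random sequences are drawn uniformly from A, C, G and T, and `n_random` is an arbitrary choice. Only the overall procedure (thread random sequences of the same length onto the target conformation, then normalize by their mean and standard deviation) follows the description above.

```python
import random
from statistics import mean, stdev


def step_params(step):
    """Hypothetical (mean roll in degrees, stiffness) for a dinucleotide step.

    Pyrimidine-purine (YR) steps are assumed softer and to prefer positive roll.
    """
    if step[0] in "CT" and step[1] in "AG":
        return 5.0, 0.02
    return 0.0, 0.08


def energy(seq, rolls):
    """Sum of harmonic roll-deformation energies over all base-pair steps."""
    total = 0.0
    for i, roll in enumerate(rolls):
        mu, k = step_params(seq[i:i + 2])
        total += k * (roll - mu) ** 2
    return total


def z_score(seq, rolls, n_random=2000, seed=0):
    """Z-score of seq's energy relative to random sequences on the same conformation."""
    rng = random.Random(seed)
    random_energies = [
        energy("".join(rng.choice("ACGT") for _ in seq), rolls)
        for _ in range(n_random)
    ]
    return (energy(seq, rolls) - mean(random_energies)) / stdev(random_energies)


# Target: an alternating TA sequence whose steps all have a roll of 6 degrees.
target = "TATATATA"
rolls = [6.0] * (len(target) - 1)
print(round(z_score(target, rolls), 2))
```

A negative value indicates that, in this toy model, the target sequence fits its own conformation better than a typical random sequence does.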

Method

Conformational parameters of DNA

There are many ways to characterize DNA conformation. Our force fields are based on six base-pair step parameters, viz. shift, slide, rise, tilt, roll and twist. The values of these parameters are extracted from the output of the 3DNA program provided by the Olson group.

Conformational coordinates
Translational   Rotational
Shift           Tilt
Slide           Roll
Rise            Twist
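
One way to hold the six step parameters in code is a small record type that keeps the translations (shift, slide, rise, conventionally in Å) apart from the rotations (tilt, roll, twist, in degrees). The line format read by the hypothetical `parse_step_line` is an assumed, simplified layout, not the exact layout of a 3DNA output file, which should be checked against the 3DNA documentation.

```python
from typing import NamedTuple


class StepParameters(NamedTuple):
    """Six base-pair step parameters: translations in Å, rotations in degrees."""
    shift: float
    slide: float
    rise: float
    tilt: float
    roll: float
    twist: float

    def translations(self):
        return (self.shift, self.slide, self.rise)

    def rotations(self):
        return (self.tilt, self.roll, self.twist)


def parse_step_line(line):
    """Parse a hypothetical line: '<index> <step> shift slide rise tilt roll twist'."""
    fields = line.split()
    return fields[1], StepParameters(*(float(x) for x in fields[2:8]))


step, params = parse_step_line("1 CG/CG 0.10 -0.45 3.30 1.20 4.50 34.80")
print(step, params.rotations())
```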

Development of force field

As described in the Introduction, the conformational energy of a particular base pair depends on its deformation at that position for each type of conformational parameter. As an example, Figure 1 schematically shows the deformation of a base pair relative to the mean (expected) tilt angle. The elasticity of a given base-pair conformation with respect to tilt depends on the distribution of tilt angles in the whole database. The force field used in this web server is taken from our previously published work.
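
One common way to turn a distribution of observed values into an elastic constant is Boltzmann inversion: if a parameter is Gaussian-distributed with variance σ², a harmonic energy E = k(θ − θ̄)² with k = 1/(2σ²), in units of kT, reproduces that distribution through exp(−E/kT). The sketch below assumes this approach for a single parameter; it is not necessarily how the published force field was derived, and the tilt values and the name `harmonic_fit` are invented for illustration.

```python
from statistics import mean, pvariance


def harmonic_fit(samples):
    """Fit E = k * (theta - theta_mean)**2, in units of kT, to observed values.

    With k = 1 / (2 * variance), exp(-E) is proportional to a Gaussian with the
    sample mean and (population) variance of the data.
    """
    theta_mean = mean(samples)
    k = 1.0 / (2.0 * pvariance(samples, theta_mean))
    return theta_mean, k


# Hypothetical tilt angles (degrees) for one base-pair step type in a database.
tilts = [-3.0, -1.0, 1.0, 3.0]
theta_mean, k = harmonic_fit(tilts)
print(theta_mean, k)
```

A narrower distribution gives a larger k, i.e. a stiffer step: `harmonic_fit([-1.0, 1.0])` returns a stiffness of 0.5 instead of 0.1.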

Energy Z-scores

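The energy Z-score of a target sequence and structure measures the specificity of that sequence for the observed conformation. A more negative Z-score implies a more specific conformation and hence a stronger sequence dependence. The conformational energy E of a base-pair step is calculated from the expression:

$$E = \Delta\theta^{T} F \, \Delta\theta$$

where Δθ is the six-dimensional deviation of the step parameters (shift, slide, rise, tilt, roll, twist) from their mean values and F is the 6×6 force-field matrix. Summing E over all steps gives the energy of the target sequence, E_target, which is normalized by the mean ⟨E_random⟩ and standard deviation σ_random of the energies of random sequences superimposed on the same conformation:

$$Z = \frac{E_{\text{target}} - \langle E_{\text{random}} \rangle}{\sigma_{\text{random}}}$$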
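
The energy expression can be evaluated directly as a quadratic form. In this sketch the parameter order (shift, slide, rise, tilt, roll, twist), the force-field entries and the summation over steps are illustrative assumptions; the real matrices F come from the published force field and differ between base-pair step types.

```python
def step_energy(theta, theta_mean, F):
    """E = dtheta^T F dtheta for one base-pair step with six parameters."""
    d = [t - m for t, m in zip(theta, theta_mean)]
    n = len(d)
    return sum(d[i] * F[i][j] * d[j] for i in range(n) for j in range(n))


def total_energy(steps):
    """Sum step energies over (theta, theta_mean, F) triples, one per step."""
    return sum(step_energy(theta, mu, F) for theta, mu, F in steps)


# Hypothetical symmetric force-field matrix: diagonal 0.5, roll-twist coupling 0.25.
# Parameter order: shift, slide, rise, tilt, roll, twist.
F = [[0.5 if i == j else 0.0 for j in range(6)] for i in range(6)]
F[4][5] = F[5][4] = 0.25
theta_mean = [0.0, 0.0, 3.4, 0.0, 0.0, 34.0]
theta = [0.0, 0.0, 3.4, 0.0, 2.0, 36.0]  # roll and twist each deviate by 2
print(total_energy([(theta, theta_mean, F)]))  # 6.0
```

The off-diagonal entries let coupled deformations (here roll and twist deviating together) cost more or less than the sum of their separate contributions.
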
Figure 1. (a) An example of a conformational parameter (tilt). Two successive base pairs in the DNA helix are tilted relative to each other by an angle. θ represents the mean deformation in the database used to develop the force field, and Δθ is the deformation of a base pair in an example target. The energy contribution of this deformation depends on the overall distribution of θ in the database. (b) A typical distribution of elastic deformation values in DNA.
The server reports the following results:
  1. DNA Z-score, mean and standard deviation: the conformational energy of the target normalized as a Z-score, together with the mean and standard deviation of this energy over a large set of randomly generated DNA sequences superimposed on the target conformation.
  2. Energy tables: To calculate the conformational energy of random DNA sequences, we superimpose them onto the target DNA conformation, so that each base pair of the target sequence can be replaced by any other base pair. Because different base-pair steps have different mean conformational parameters and standard deviations, changing the sequence changes the energy contribution at each position. At each base-pair step there are 4×4 = 16 possible combinations, of which only 10 are unique, because a step and its reverse complement on the opposite strand are the same step. An energy table is created for these 10 steps, using the same conformational parameters as in the target structure but with each possible base-pair step at that position. This table shows how the conformational energy varies between base steps and can be used to estimate which positions are most specific for particular base pairs.
  3. Conformational parameters: Finally, we provide the base-pair step parameters used to obtain the reported Z-scores and other results. These values are calculated using Olson's X3DNA program.
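
The count of 10 unique steps follows from strand symmetry: a dinucleotide step read 5'→3' on one strand is the reverse complement of the step read on the other strand, so, for example, AC and GT are the same step. The sketch below enumerates the 16 combinations, reduces them to 10, and builds a toy energy table for one observed roll value. The per-step means and stiffnesses in `hypothetical_params` are invented for illustration and are not the server's force field.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(step):
    return "".join(COMPLEMENT[b] for b in reversed(step))


def canonical(step):
    """Represent a dinucleotide step and its reverse complement by one label."""
    return min(step, reverse_complement(step))


all_steps = ["".join(p) for p in product("ACGT", repeat=2)]
unique_steps = sorted({canonical(s) for s in all_steps})
print(len(all_steps), len(unique_steps))  # 16 10


def hypothetical_params(step):
    # Invented (mean roll in degrees, stiffness): YR steps softer, positive roll.
    return (5.0, 0.02) if step[0] in "CT" and step[1] in "AG" else (0.0, 0.08)


def energy_table(roll):
    """Energy of one observed roll value for each of the 10 unique step types."""
    table = {}
    for s in unique_steps:
        mu, k = hypothetical_params(s)
        table[s] = k * (roll - mu) ** 2
    return table


table = energy_table(6.0)
print(min(table, key=table.get))
```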

Points to remember when using this web server:

Non-standard nucleic acid bases

The force field used in this web server was derived from a non-redundant data set of protein-DNA complexes in the PDB, using only the four standard bases A, C, G and T. For this reason, the server does not calculate the conformational energy or Z-scores of DNA whose sequence contains any residue other than these four standard bases; a warning message is displayed in such cases.
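
A check of this kind can be written over the residue names of a DNA chain. This sketch assumes the standard PDB residue names DA, DC, DG and DT for the four deoxyribonucleotides; the function name `sequence_from_residues` and the message text are hypothetical, not the server's.

```python
# Standard PDB residue names for the four deoxyribonucleotides.
STANDARD_BASES = {"DA": "A", "DC": "C", "DG": "G", "DT": "T"}


def sequence_from_residues(residue_names):
    """Return the one-letter sequence, or raise ValueError naming non-standard residues."""
    unknown = sorted({name for name in residue_names if name not in STANDARD_BASES})
    if unknown:
        raise ValueError(
            "Warning: non-standard base(s) " + ", ".join(unknown)
            + "; conformational energy and Z-score were not calculated."
        )
    return "".join(STANDARD_BASES[name] for name in residue_names)


print(sequence_from_residues(["DC", "DG", "DA", "DT"]))
```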

NMR structure models

The Protein Data Bank contains many protein and DNA structures determined by NMR, and the coordinate files for these structures usually contain several models. Our web server uses only the first model in the PDB file to calculate conformational properties; all subsequent models are ignored.
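
In the PDB file format, each model of a multi-model entry is enclosed between MODEL and ENDMDL records, so the first model can be selected by reading coordinate records until the first ENDMDL. The sketch below assumes that layout; a file with a single model and no MODEL/ENDMDL records is returned with only its ATOM/HETATM records. The function name `first_model` is hypothetical.

```python
def first_model(pdb_lines):
    """Return ATOM/HETATM records of the first model; later models are ignored."""
    records = []
    for line in pdb_lines:
        record_name = line[:6].strip()
        if record_name == "ENDMDL":
            break  # end of the first model
        if record_name in ("ATOM", "HETATM"):
            records.append(line)
    return records


# Abbreviated, hypothetical two-model file (coordinate columns omitted).
lines = [
    "MODEL        1",
    "ATOM      1  P    DA A   1",
    "ATOM      2  OP1  DA A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  P    DA A   1",
    "ENDMDL",
    "END",
]
print(len(first_model(lines)))
```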

Sequence size limit

Calculating DNA Z-scores with the above method requires the energy of the target sequence and also the energies of a large set of random sequences superimposed on its conformation. The number of random sequences needed for the Z-score to converge increases rapidly with sequence length. We have therefore limited the DNA sequence length to 50 bases in the current version of the server.
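
A client-side check against this limit is straightforward. In this sketch the constant `MAX_LENGTH` mirrors the 50-base limit stated above; the function name `check_length` and the error message are hypothetical.

```python
MAX_LENGTH = 50  # sequence length limit of the current server version


def check_length(sequence, max_length=MAX_LENGTH):
    """Return the sequence length, or raise ValueError if it exceeds the limit."""
    n = len(sequence)
    if n > max_length:
        raise ValueError(f"Sequence has {n} bases; the server limit is {max_length}.")
    return n


print(check_length("ACGT" * 12))
```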