CORAL manual

coral

Written by M.V. Petoukhov and D.I. Svergun.
Post all your questions about CORAL to the ATSAS Forum.

© ATSAS Team, 2009-2013

Manual

The following describes the method implemented in CORAL, details of the dialog prompt as well as the required input and the produced output files.

CORAL combines the algorithms of SASREF and BUNCH and is described by Petoukhov, M.V., Franke, D., Shkumatov, A.V., Tria, G., Kikhney, A.G., Gajda, M., Gorba, C., Mertens, H.D.T., Konarev, P.V. and Svergun, D.I. (2012)
"New developments in the ATSAS program package for small-angle scattering data analysis". J. Appl. Cryst. 45, 342-350 © International Union of Crystallography

Introduction

CORAL (COmplexes with RAndom Loops) performs SAXS-based rigid body modeling of complexes in which one or several components lack some fragments (e.g. terminal portions or interdomain linkers are missing). Like SASREF, CORAL translates and rotates the atomic models of individual domains belonging to multiple components of the complex. These rearrangements are, however, not fully random: the distances between the N- and C-terminal ends of consecutive domains belonging to one chain are constrained. For this purpose a pre-generated library of self-avoiding random loops composed of dummy residues (DRs) is used. It covers linker lengths from 5 to 100 amino acids and, for each length, samples 20 random structures for every possible end-to-end distance with a binning step of 2 Å.

When a domain is moved in CORAL, its new position is examined by querying the library: if no linker of the appropriate length can be found to connect this domain with the preceding or following one, the movement is rejected. If the query is successful, the corresponding random loop is inserted as a placeholder for the missing linker, and its contribution is added to the computed scattering intensity of the system and to the target function (e.g. overlap and contact restraints). C- and N-terminal portions can also be randomly selected from the library, but they do not constrain the motion of the associated domain.

CORAL can simultaneously fit multiple scattering curves from subsets of the entire system, assuming the same arrangement of domains/subunits in these constructs. It can also take symmetry (as a constraint) and anisometry (as a restraint) into account.
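The library query that accepts or rejects a domain move can be pictured with the following minimal Python sketch. It is not CORAL's implementation: the dictionary layout, the names `distance_bin` and `find_loop`, and the toy library contents are assumptions made for illustration. Only the 2 Å bin width and the 5 to 100 residue range are taken from the description above.

```python
import math
import random

BIN_WIDTH = 2.0            # binning step in angstrom (from the text)
MIN_LEN, MAX_LEN = 5, 100  # linker lengths covered by the library (from the text)


def distance_bin(p, q):
    """Index of the 2 angstrom bin containing the distance between p and q."""
    return int(math.dist(p, q) // BIN_WIDTH)


def find_loop(library, n_residues, end_a, end_b, rng=random):
    """Return a stored loop that bridges end_a and end_b, or None.

    `library` is a hypothetical dict mapping (linker length, distance bin)
    to a list of pre-generated loop conformations.
    """
    if not MIN_LEN <= n_residues <= MAX_LEN:
        return None
    candidates = library.get((n_residues, distance_bin(end_a, end_b)), [])
    return rng.choice(candidates) if candidates else None


# Toy library: 25-residue linkers whose ends are 30 to 32 angstrom apart.
toy_library = {(25, 15): ["loop_A", "loop_B"]}

# Ends 31 angstrom apart: a loop exists, so the move would be accepted.
accepted = find_loop(toy_library, 25, (0.0, 0.0, 0.0), (31.0, 0.0, 0.0))
# Ends 90 angstrom apart: nothing stored, so the move would be rejected.
rejected = find_loop(toy_library, 25, (0.0, 0.0, 0.0), (90.0, 0.0, 0.0))
```

Keying the lookup on both linker length and distance bin mirrors the description: a move is only acceptable if at least one stored loop of the right length spans the new end-to-end distance.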

A simulated annealing protocol is employed to find the optimal positions and orientations of the available high-resolution models of domains and the approximate conformations of the missing portions of the polypeptide chain(s). Please refer to the SASREF and BUNCH manuals and the paper cited above for details of the simulated annealing protocol and of the computation of the scattering intensity from a mixed model combining atomic resolution structures with DR chains.

Running CORAL

Interactive Configuration

CORAL can only be run in dialog mode; no command-line arguments are accepted. A configuration file describing the starting arrangement has to be created before running CORAL. There are two modes, EXPERT and USER. In EXPERT mode, the user can adjust any program parameter. In USER mode, fewer questions are asked: default values are used for most program parameters, and the user only needs to provide basic input. The default settings are the same in both modes.

CORAL interactive prompt:

Screen Text | Default | Asked in USER-mode? | Description
Computation mode (User or Expert) | USER | Y | Mode selection.
Log file name | N/A | Y | Project identifier; it is used as a prefix for all output file names.
Enter project description | N/A | Y | Any text; it is stored in the log file.
File name with objects info | N/A | Y | Name of the file with the initial configuration.
Fix the subunit at original position? [Y/N] | N | Y | Fixing may be used to keep the desired positions of certain domains. This question is asked for each domain in all symmetry-independent chains. Make sure that the fixed domains are not very far from (0,0,0); otherwise the overall center may be significantly displaced from the origin and the intensity calculation will be affected.
Pair of domains to group | 0,0 | Y | One may force concerted movements of specific domains by pairing them, e.g. to keep a known binding interface. If more than two domains have to be grouped, all combinations of pairs have to be specified. E.g. to group the 1st, the 3rd and the 5th domains, one needs to enter 1,3; 1,5; 3,5 in succession. This question is asked until 0,0 is answered.
DR formfactor multiplier | 1.0 | N | The weight of the DR formfactors may be adjusted. For instance, an increased value (~1.2) accounts for extra hydration if the loops are known to be exposed to the solvent.
Symmetry: P1...19 or Pn2 (n=1,..,12) | P1 | Y | Supported symmetries are: P1, P2-P19 (nineteen-fold), P222, P32-P(12)2. The n-fold axis is typically Z; if there is an additional two-fold axis, it coincides with Y.
Cross penalty weight | 100.0 | N | How strongly the cross (overlap) penalty influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If clashes between the loops or domains are observed, try increasing this penalty weight.
Shift penalty weight | 1.0 | N | How strongly a shift of the entire protein from the origin influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. This penalty keeps the model close to the origin so that the higher order harmonics are not lost and the scattering is computed accurately. Increase the weight if the resulting model is significantly shifted from the origin.
File name, contact conditions, CR for none <.cnd > | empty | Y | If information on contacting residues is available, it may be used as a modeling restraint. The information is provided in a file with a special format. By default no information is given.
Contacts penalty weight | 10.0 | N | How strongly improper contacts influence the acceptance or rejection of a mutation. If unsure, use the default value. If the desired interfaces are not obtained, try increasing this penalty weight. This question is only asked if a contact conditions file is provided.
Input total number of scattering curves | 1 | Y | If scattering curves of partial constructs are available in addition to that of the entire complex, they can be fitted simultaneously, assuming the same arrangement of domains in all the constructs.
Account for constant background | Y | N | Whether or not to adjust a constant background in the fit.
Input first & last residues in 1-st construct | 1,var | Y | The residue range present in the given construct (scattering curve). Residues belonging to different chains are numbered sequentially in the order in which they appear in the configuration file. This question is asked for each construct, i.e. as many times as the total number of scattering curves. The default answer is from 1 to the last residue.
Enter file name, 1-st experimental data <.dat > | N/A | Y | The name of the data file containing the experimental SAXS profile of the given construct. The question is asked for each construct.
Angular units in the input file: 4*pi*sin(theta)/lambda [1/angstrom] (1), 4*pi*sin(theta)/lambda [1/nm] (2), 2*sin(theta)/lambda [1/angstrom] (3), 2*sin(theta)/lambda [1/nm] (4) | 1 | Y | Formula for the scattering vector in the data file and its units. The question is asked for each construct.
Fitting range in fractions of Smax | 1.0 | Y | Fraction of the scattering curve to fit, starting at the first point. The default is the entire curve. The question is asked for each construct.
Spatial step in angstroems | 5.0 | Y | Maximal random shift of a domain in a single modification of the system during simulated annealing.
Angular step in degrees | 20.0 | Y | Maximal random rotation angle of a domain in a single modification of the system during simulated annealing.
Initial annealing temperature | 10.0 | N | Starting temperature of the simulated annealing protocol.
Annealing schedule factor | 0.9 | N | Factor by which the temperature is multiplied at each step; 0.9 is a good average value. For slower cooling, increase the value (e.g. to 0.95).
Max # of iterations at each T | var | N | End the temperature step and cool after at most this many iterations. The default value depends on the total numbers of domains and residues.
Max # of successes at each T | var | N | End the temperature step and cool after at most this many successful mutations. The default value depends on the total numbers of domains and residues.
Min # of successes to continue | 100 | N | Stop simulated annealing if fewer than this many successful mutations occur within a single temperature step.
Max # of annealing steps | 100 | N | Stop if simulated annealing has not finished after this many steps. The slower the system is cooled, the more temperature steps are required.
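The interplay of the annealing parameters can be illustrated with a short Python sketch. It is only a reading of the parameter descriptions above, not CORAL's code: the function names are hypothetical, and the default values are those listed in the table.

```python
def temperature_schedule(t_initial=10.0, factor=0.9, max_steps=100):
    """Yield (step number, temperature) for geometric cooling."""
    temperature = t_initial
    for step in range(1, max_steps + 1):
        yield step, temperature
        temperature *= factor  # annealing schedule factor


def step_is_finished(n_iterations, n_successes, max_iterations, max_successes):
    """A temperature step ends as soon as either limit is reached."""
    return n_iterations >= max_iterations or n_successes >= max_successes


def annealing_should_stop(n_successes, min_successes=100):
    """Annealing stops when a step yields fewer successes than the minimum."""
    return n_successes < min_successes


# First three temperatures with the default settings.
for step, temperature in temperature_schedule(max_steps=3):
    print(f"j: {step:3d} T: {temperature:.3E}")
```

With the defaults the first three temperatures are 10, 9 and 8.1. Raising the factor to 0.95 lowers the temperature more gradually, which is why slower cooling needs more temperature steps.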

Runtime Output

At runtime, two lines of output are generated for each temperature step:

 j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 CPU:  0.503E+02 F:30.8120 Pen: 28.0621
 The best chi values: 1.65827

The fields can be interpreted as follows, top-left to bottom-right:

Field | Description
j | Step number. Starts at 1, increases monotonically.
T | Temperature measure. Starts at an arbitrary high value and is multiplied at each step by the annealing schedule factor.
Suc | Number of successful mutations in this temperature step. Limited by the minimum and maximum number of successes. The number of successes should decrease slowly, and the first couple of steps should be terminated by the maximum number of successes criterion. If instead the maximum number of iterations is reached, or the number of successes suddenly drops by a large amount, the system should probably be cooled more slowly.
Eva | Accumulated number of function evaluations.
CPU | Elapsed wall-clock time since the annealing procedure was started.
F | The best target function value obtained so far.
Pen | Accumulated penalty value of the best target function.
The best chi values | For each of the fitted curves, the χ value corresponding to the best target function.
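For monitoring a run, the two output lines can be turned into numbers with a few lines of Python. The sketch below is an illustration based only on the sample line shown above; the regular expression and the function names `parse_step` and `parse_chi` are assumptions, and other program versions may format the line differently.

```python
import re

# Pattern for one "j: ... Pen: ..." line, modelled on the sample output.
STEP_LINE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<Suc>\d+)\s+"
    r"Eva:\s*(?P<Eva>\d+)\s+CPU:\s*(?P<CPU>\S+)\s+"
    r"F:\s*(?P<F>\S+)\s+Pen:\s*(?P<Pen>\S+)"
)
INTEGER_FIELDS = {"j", "Suc", "Eva"}


def parse_step(line):
    """Return the fields of a temperature-step line as a dict, or None."""
    match = STEP_LINE.search(line)
    if match is None:
        return None
    return {
        name: int(value) if name in INTEGER_FIELDS else float(value)
        for name, value in match.groupdict().items()
    }


def parse_chi(line):
    """Return the list of chi values from a 'The best chi values' line."""
    prefix = "The best chi values:"
    if prefix not in line:
        return None
    return [float(x) for x in line.split(prefix, 1)[1].split()]


step = parse_step(
    " j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 CPU:  0.503E+02 "
    "F:30.8120 Pen: 28.0621"
)
chi = parse_chi(" The best chi values: 1.65827")
```

Tracking `Suc` over successive steps is a simple way to check the cooling-rate advice given for that field.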

CORAL Input Files

CORAL uses the SAXS experimental data files (*.dat) in ASCII format containing 3 columns: (1) experimental scattering vector, (2) experimental intensity and (3) experimental errors.
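A three-column data file can be inspected before a run with a small reader such as the Python sketch below. The function name `read_dat`, the rule for skipping header lines, and the conversion table are assumptions for illustration. The factors follow from the four unit options of the dialog prompt (1 nm^-1 = 0.1 Å^-1, and 4*pi*sin(theta)/lambda is 2*pi times 2*sin(theta)/lambda).

```python
import math

# Factor converting each unit option to 4*pi*sin(theta)/lambda in 1/angstrom.
TO_INV_ANGSTROM = {
    1: 1.0,                  # 4*pi*sin(theta)/lambda [1/angstrom]
    2: 0.1,                  # 4*pi*sin(theta)/lambda [1/nm]
    3: 2.0 * math.pi,        # 2*sin(theta)/lambda [1/angstrom]
    4: 2.0 * math.pi * 0.1,  # 2*sin(theta)/lambda [1/nm]
}


def read_dat(lines, units=1):
    """Return (s, intensity, error) tuples with s in 1/angstrom.

    Lines that do not start with three numbers (headers, comments,
    blank lines) are skipped; this rule is an assumption.
    """
    scale = TO_INV_ANGSTROM[units]
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            s, intensity, error = (float(x) for x in fields[:3])
        except ValueError:
            continue  # non-numeric line
        points.append((s * scale, intensity, error))
    return points


sample = ["Sample description", "0.1 100.0 1.0", "0.2 80.0 0.9"]
points = read_dat(sample, units=2)  # input given in 1/nm
```

The same function accepts an open file object, since it only iterates over lines.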

Starting configuration

The initial configuration is specified in a configuration file, whose format is illustrated by the example of a complex consisting of two proteins A and B (i.e. two chains). A contains three domains a1.pdb, a2.pdb and a3.pdb; 20 aa are missing at its N-terminus, and the two linkers are 25 and 30 aa long. B contains two domains b1.pdb and b2.pdb; 10 aa are missing at its C-terminus, and the linker is 15 aa long:

   NTER 20
   a1.pdb
   LINK 25
   a2.pdb
   LINK 30
   a3.pdb
   b1.pdb
   LINK 15
   b2.pdb
   CTER 10

Note: any two consecutive pdb files not separated by a LINK line are assumed to belong to different chains. If symmetry is applied, the configuration should describe the asymmetric part only.
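The chain-splitting rule can be checked on a configuration file with a short Python sketch. This is an illustrative reader, not part of CORAL: the function name `parse_config` and the dictionary layout are assumptions, and it also assumes that NTER opens a chain and CTER closes one, as in the example above.

```python
def parse_config(lines):
    """Group the entries of a configuration file into chains."""
    chains, current, after_link = [], None, False
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword == "NTER":
            # Missing N-terminal residues: assumed to open a new chain.
            current = {"nter": int(fields[1]), "domains": [], "links": [], "cter": 0}
            chains.append(current)
        elif keyword == "LINK":
            if current is None or not current["domains"]:
                raise ValueError("LINK must follow a domain")
            current["links"].append(int(fields[1]))
            after_link = True
        elif keyword == "CTER":
            if current is None or not current["domains"]:
                raise ValueError("CTER must follow a domain")
            current["cter"] = int(fields[1])
            current = None  # assumed to close the chain
        else:
            # A pdb file not preceded by LINK starts a new chain.
            if current is None or (current["domains"] and not after_link):
                current = {"nter": 0, "domains": [], "links": [], "cter": 0}
                chains.append(current)
            current["domains"].append(fields[0])
            after_link = False
    return chains


example = """
   NTER 20
   a1.pdb
   LINK 25
   a2.pdb
   LINK 30
   a3.pdb
   b1.pdb
   LINK 15
   b2.pdb
   CTER 10
""".splitlines()
chains = parse_config(example)
```

For the example above this yields two chains: one with three domains, an N-terminal portion of 20 aa and linkers of 25 and 30 aa, and one with two domains, a 15 aa linker and a C-terminal portion of 10 aa.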

Contacts

An optional contact conditions file has a format similar to that of SASREF, with the only difference that it refers to chains instead of subunits (domains). The following conditions require a distance of 7 Å between residues 25 and 115 of the first chain and a distance of 5 Å between residue 40 of the first chain and residue 50 of the second.

      dist 7.0
      1 25 25 1 115 115
      dist 5.0
      1 40 40 2 50 50

If two (or more) alternatives are given after the line with the keyword "dist", the program compares the better (smaller) distance among them with the specified one.
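The handling of alternatives can be sketched in Python as follows. The sketch is an assumption-laden illustration rather than the program's logic: the names `parse_contacts` and `is_satisfied` are hypothetical, the six numbers on a line are read as chain, first residue and last residue for each of the two partners (inferred from the example above), and a restraint is treated as satisfied when the smallest distance does not exceed the required one.

```python
def parse_contacts(lines):
    """Return a list of (required distance, [alternatives]) restraints."""
    restraints = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].lower() == "dist":
            restraints.append((float(fields[1]), []))
            continue
        if not restraints:
            raise ValueError("residue ranges must follow a 'dist' line")
        # Assumed layout: chain, first residue, last residue (twice).
        chain1, first1, last1, chain2, first2, last2 = (int(x) for x in fields[:6])
        restraints[-1][1].append(((chain1, first1, last1), (chain2, first2, last2)))
    return restraints


def is_satisfied(required, observed_distances):
    """Compare the best (smallest) alternative with the required distance."""
    return min(observed_distances) <= required


example = """
      dist 7.0
      1 25 25 1 115 115
      dist 5.0
      1 40 40 2 50 50
""".splitlines()
restraints = parse_contacts(example)
```

With two alternatives measured at, say, 9.5 Å and 6.0 Å in some hypothetical model, a 7 Å condition would count as met under this reading because the smaller value is the one compared.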

CORAL Output Files

After each simulated annealing step, CORAL creates a set of output files. Each file name starts with a customizable prefix to which an extension is appended. If a prefix has been used before, existing files are overwritten without warning.

Extension | Description
.log | Contains the same information as the screen output and is updated during execution of the program.
.pdb | Current model of the entire complex. The REMARK section of the file contains information about the parameters of the model, e.g. penalties and χ.
-i.fit | Fit of the scattering curve computed from a construct versus the corresponding experimental data; i stands for the construct number. Columns in the output file are: 's', 'Iexp' and 'Icomp'.

Example

A glutamate decarboxylase (Gad) hexamer with three calmodulin (CaM) molecules bound to three pairs of Gad C-terminal peptides is used as a sample run. The structures of the homohexameric Gad core and of the 1:2 complex of CaM with the C-terminal peptide are known. The peptide is connected to the Gad core domain by a 22 aa linker.

The configuration file (config.con) is as follows:

      m1.pdb
      LINK 22
      pept1a.pdb
      m2.pdb
      LINK 22
      pept2a.pdb
      cama.pdb
      m3.pdb
      LINK 22
      pept1b.pdb
      m4.pdb
      LINK 22
      pept2b.pdb
      camb.pdb
      m5.pdb
      LINK 22
      pept1c.pdb
      m6.pdb
      LINK 22
      pept2c.pdb
      camc.pdb

Here, m?.pdb and pept??.pdb are the atomic models of the Gad core domains and of their C-terminal peptides, respectively, and cam?.pdb are three copies of the CaM molecule.

A listing of questions / answers of CORAL in USER mode:

$ coral
 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: gadcam
 Project identificator .................................. : gadcam
 Enter project description .............. : Gad hexamer + 3 CaM
 Random sequence initialized from ....................... : 191723
 File name with objects info ............ <         .con >: config
 Coordinates of the 1-st subunit evaluated from ......... : m1.pdb
  3512 atoms read, center at   -30.70   -8.02   10.84
 Fix the subunit at original position? [ Y / N ] <           No >: Y
 Coordinates of the 2-nd subunit evaluated from ......... : pept1a.pdb
   234 atoms read, center at   -58.01   33.40    4.22
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 3-rd subunit evaluated from ......... : m2.pdb
  3512 atoms read, center at    -8.40   30.59  -10.84
 Fix the subunit at original position? [ Y / N ] <           No >: Y
 Coordinates of the 4-th subunit evaluated from ......... : pept2a.pdb
   234 atoms read, center at   -66.51   30.35    8.17
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 5-th subunit evaluated from ......... : cama.pdb
  1170 atoms read, center at   -67.64   33.48   -2.84
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 6-th subunit evaluated from ......... : m3.pdb
  3512 atoms read, center at     8.40   30.59   10.84
 Fix the subunit at original position? [ Y / N ] <           No >: Y
 Coordinates of the 7-th subunit evaluated from ......... : pept1b.pdb
   234 atoms read, center at    57.93   33.53    4.22
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 8-th subunit evaluated from ......... : m4.pdb
  3512 atoms read, center at    30.70   -8.02  -10.84
 Fix the subunit at original position? [ Y / N ] <           No >: Y
 Coordinates of the 9-th subunit evaluated from ......... : pept2b.pdb
   234 atoms read, center at    59.54   42.42    8.17
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 10-th subunit evaluated from ........ : camb.pdb
  1170 atoms read, center at    62.81   41.84   -2.84
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 11-th subunit evaluated from ........ : m5.pdb
  3512 atoms read, center at    22.29  -22.57   10.84
 Fix the subunit at original position? [ Y / N ] <           No >: Y
 Coordinates of the 12-th subunit evaluated from ........ : pept1c.pdb
   234 atoms read, center at     0.07  -66.94    4.22
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 13-th subunit evaluated from ........ : m6.pdb
  3512 atoms read, center at   -22.29  -22.57  -10.84
 Fix the subunit at original position? [ Y / N ] <           No >: Y
 Coordinates of the 14-th subunit evaluated from ........ : pept2c.pdb
   234 atoms read, center at     6.97  -72.77    8.17
 Fix the subunit at original position? [ Y / N ] <           No >:
 Coordinates of the 15-th subunit evaluated from ........ : camc.pdb
  1170 atoms read, center at     4.83  -75.32   -2.84
 Fix the subunit at original position? [ Y / N ] <           No >:
 Pair of domains to group .. <            0,           0 >: 2,4
 Pair of domains to group .. <            0,           0 >: 2,5
 Pair of domains to group .. <            0,           0 >: 4,5
 Pair of domains to group .. <            0,           0 >: 7,9
 Pair of domains to group .. <            0,           0 >: 7,10
 Pair of domains to group .. <            0,           0 >: 9,10
 Pair of domains to group .. <            0,           0 >: 12,14
 Pair of domains to group .. <            0,           0 >: 12,15
 Pair of domains to group .. <            0,           0 >: 14,15
 Pair of domains to group .. <            0,           0 >:
 Number of backbone atoms generated ..................... : 3372
 Averaged formfactors of DRs used
 DR formfactor multiplier ............................... : 1.000
 Symmetry: P1...19 or Pn2 (n=1,..,12) ... <           P1 >:
 Cross penalty .......................................... : 6.383e-2
 Cross penalty weight ................................... : 100.0
 Shift penalty .......................................... : 3.182e-3
 Shift penalty weight ................................... : 1.000
 File name, contacts conditions, CR for none <         .cnd >:
 Input total number of scattering curves  <            1 >:
 Input first & last residues in 1-st construct <            1,        3372 >:
 Enter file name, 1-st experimental data  <         .dat >: gad_cam-mer.dat
 Number of experimental points found .................... : 984
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)
 2*   sin(theta)/lambda [1/angstrom] (3)
 2*   sin(theta)/lambda [1/nm      ] (4)  <            1 >:
 Fitting range in fractions of Smax ..... <        1.000 >:
 Experimental radius of gyration ........................ : 57.80
 Number of points in the Guinier Plot ................... : 10
 Maximum s-vector in master grid ........................ : 0.4478
 Number of points in partial amplitudes ................. : 101
 Maximum order of harmonics ............................. : 14
 Computing X-ray Alms for m1.pdb
 ALMGRZ --- :  181800 summation coefficients used
 Computing X-ray Alms for pept1a.pdb
 Computing X-ray Alms for m2.pdb
 Computing X-ray Alms for pept2a.pdb
 Computing X-ray Alms for cama.pdb
 Computing X-ray Alms for m3.pdb
 Computing X-ray Alms for pept1b.pdb
 Computing X-ray Alms for m4.pdb
 Computing X-ray Alms for pept2b.pdb
 Computing X-ray Alms for camb.pdb
 Computing X-ray Alms for m5.pdb
 Computing X-ray Alms for pept1c.pdb
 Computing X-ray Alms for m6.pdb
 Computing X-ray Alms for pept2c.pdb
 Computing X-ray Alms for camc.pdb
 Total penalty .......................................... : 6.386
 1-st curve:
 NEXP reduced to ........................................ : 983
 Theoretical  points from            5  to          101  used
 The best chi values:11.03272
 Initial fVal ........................................... : 128.1
 Spatial step in angstroems ............. <        5.000 >:
 Spatial step in angstroems ............. <        5.000 >:
 Angular step in degrees ................ <        20.00 >:
 Initial annealing temperature .......................... : 10.00
 Annealing schedule factor .............................. : 0.9000
 Max # of iterations at each T .......................... : 51600
 Max # of successes at each T ........................... : 5160
 Min # of successes to continue ......................... : 100
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E+02 Suc:  1000 Eva:     1165 CPU:  0.107E+03 F:15.8416 Pen:  1.2895
 The best chi values: 3.81472
 j:   2 T: 0.900E+01 Suc:  1000 Eva:     2337 CPU:  0.214E+03 F:15.8416 Pen:  1.2895
 The best chi values: 3.81472
 j:   3 T: 0.810E+01 Suc:  1000 Eva:     3477 CPU:  0.317E+03 F:14.4729 Pen:  1.9555
 The best chi values: 3.53799
 ...

Note that the six core domains of Gad are fixed so that the hexamer remains intact, and each of the three CaM molecules is coupled with two C-terminal peptides to keep their known arrangement.
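Because every pair within a group has to be entered separately, the answers to the "Pair of domains to group" prompt are easy to generate. The Python sketch below does this for the three CaM/peptide groups of the example; the function name `pair_answers` is hypothetical, and the domain numbers are those of the listing above.

```python
from itertools import combinations


def pair_answers(groups):
    """List the answers to type for the 'Pair of domains to group' prompt."""
    answers = []
    for group in groups:
        # All combinations of two domains within one group.
        answers.extend(f"{a},{b}" for a, b in combinations(sorted(group), 2))
    answers.append("0,0")  # terminates the question loop
    return answers


# Each CaM (domains 5, 10, 15) moves together with its two peptides.
gad_groups = [(2, 4, 5), (7, 9, 10), (12, 14, 15)]
for answer in pair_answers(gad_groups):
    print(answer)
```

A group of n domains needs n(n-1)/2 pairs, so the three groups of three domains produce the nine pairs entered in the listing, followed by the terminating 0,0.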


  Last modified: April 11, 2013

© BioSAXS group 2013