CORAL (COmplexes with RAndom Loops) performs SAXS-based rigid body modeling
of complexes in which one or several components lack some fragments (e.g. terminal
portions or interdomain linkers). Like SASREF, CORAL translates and rotates
the atomic models of individual domains belonging to multiple components of the complex.
These rearrangements are not fully random, however: the distances between
the N- and C-terminal portions of subsequent domains belonging to one
chain are constrained. For this purpose a pre-generated library of self-avoiding
random loops composed of dummy residues (DRs) is used. It covers linker lengths from 5 to
100 amino acids and holds 20 random structures for every possible end-to-end
distance at a given length, with a binning step of 2 Å. When a domain is moved
in CORAL, its new position is examined by querying the library: if a linker of the
appropriate length to connect this domain with the preceding or following one cannot
be found, the movement is rejected. If the query is successful, the
corresponding random loop is inserted as a placeholder for the missing linker,
and its contribution is added to the computed scattering intensity
of the system and to the target function (e.g. overlaps, contact restraints etc.).
C- and N-terminal portions can also be randomly selected from the library,
but they do not constrain the motion of the associated domain.
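The library query described above can be pictured with a minimal Python sketch. The data layout (a dictionary keyed by linker length and distance bin), the function names and the toy entries are assumptions made for illustration only; they do not describe how CORAL stores or searches its library.

```python
import math
import random

BIN_STEP = 2.0  # binning step of the end-to-end distance in angstroms (from the text)


def distance_bin(distance):
    """Map an end-to-end distance to its bin index."""
    return int(distance // BIN_STEP)


def query_library(library, linker_length, start, end, rng=random):
    """Return a stored loop bridging start and end, or None if none exists.

    `library` is a hypothetical dict: (linker_length, bin index) -> list of loops.
    """
    distance = math.dist(start, end)
    candidates = library.get((linker_length, distance_bin(distance)))
    if not candidates:
        return None  # no linker of this length spans the gap: reject the move
    return rng.choice(candidates)


# Toy library: 25-residue loops exist only for distances in the 30-32 A bin.
toy_library = {(25, distance_bin(30.0)): ["loop_a", "loop_b"]}

accepted = query_library(toy_library, 25, (0.0, 0.0, 0.0), (31.0, 0.0, 0.0))
rejected = query_library(toy_library, 25, (0.0, 0.0, 0.0), (90.0, 0.0, 0.0))
print(accepted, rejected)
```

In this sketch a returned value of None corresponds to a rejected domain movement, while any returned loop would serve as the placeholder for the missing linker.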
CORAL allows one to fit simultaneously multiple scattering curves from subsets
of the entire system, assuming the same arrangement of domains/subunits in these
constructs. It can also take symmetry (as a constraint) and anisometry (as a restraint)
into account.
A simulated annealing protocol is employed to find the optimal positions
and orientations of available high resolution models of domains and the approximate
conformations of the missing portions of the polypeptide chain(s). Please refer to
the SASREF and BUNCH manuals
and the paper cited above for details of the simulated annealing protocol
and of the computation of the scattering intensity from a mixed model combining
atomic resolution structures with DR chains.
CORAL can only be run in dialog mode; no command line arguments are accepted.
A special configuration file has to be created before running CORAL.
There are two modes, EXPERT and USER. In EXPERT mode, the user
can adjust any program parameter. In USER mode, fewer questions
are asked because default values are used for most program parameters;
the user only needs to provide basic input. The default settings are the same
in both modes.
The fixation option may be used to keep the desired positions of certain domains.
This question is asked for each domain in all symmetry-independent chains.
Make sure that the fixed domains are not very far from (0,0,0), otherwise
the overall center may be significantly displaced from the origin so that
the intensity calculation will be affected.
One may force concerted movements of specific domains by pairing them,
e.g. to keep a known binding interface. If more than two domains have to be paired,
all combinations of pairs have to be specified. E.g. to pair the 1st, the 3rd
and the 5th domains, one needs to enter successively 1,3; 1,5; 3,5.
This question is asked until 0,0 is answered.
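Because every combination of pairs has to be entered, the list of answers grows quickly with the size of a group. The following short Python sketch enumerates the answers for one group of domains; the helper name `pairing_answers` is hypothetical and not part of CORAL.

```python
from itertools import combinations


def pairing_answers(domains):
    """Return every pair of domain numbers in the group as 'a,b' strings."""
    return [f"{a},{b}" for a, b in combinations(sorted(domains), 2)]


# Domains 1, 3 and 5 move together; 0,0 ends the question loop.
answers = pairing_answers([1, 3, 5]) + ["0,0"]
print("\n".join(answers))
```

A group of n domains requires n(n-1)/2 answers, e.g. three answers for three domains and six for four.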
DR formfactor multiplier
1.0
N
The weight of the DR formfactors may be adjusted.
For instance, an increased value (~1.2) accounts for extra
hydration if the loops are known to be exposed to the solvent.
Supported symmetries are: P1, P2-P19 (nineteen-fold), P222, P32-P(12)2.
The n-fold axis is typically Z; if there is an additional two-fold axis, it coincides
with Y.
Cross penalty weight
100.0
N
How strongly the cross penalty influences the acceptance or rejection
of a mutation. A value of 0.0 disables the penalty. If unsure, use the
default value. If clashes between the loops or domains are observed, try
increasing this penalty weight.
Shift penalty weight
1.0
N
How strongly a shift of the entire protein from the origin influences
the acceptance or rejection of a mutation. A value of 0.0 disables the
penalty. If unsure, use the default value. This penalty is necessary to keep the
model close to the origin so that the higher order harmonics are not lost
and the scattering is computed accurately. Increase the weight
if the resulting model is significantly shifted from the origin.
If information on contacting residues is available, it may be used as
a modeling restraint. The information is provided in a file with a special
format. By default no information is given.
Contacts penalty weight
10.0
N
How strongly improper contacts influence the acceptance or rejection
of a mutation. If unsure, use the default value. If the desired interfaces are not
obtained, try increasing this penalty weight. This question is only asked if
a contact conditions file is provided.
If in addition to the entire complex, the scattering curves
of its partial constructs are available, they can be fitted
simultaneously assuming the same arrangement of domains in all the constructs.
The range of residues present in the given construct (scattering curve).
The residues belonging to different chains are numbered sequentially according
to their appearance in the configuration file.
This question is asked for each construct, i.e. as many times as
the total number of scattering curves
(the answer to the previous question). The default answer is from 1 to the
last residue.
Enter file name, 1-st experimental data <.dat >
N/A
Y
The name of the data file containing the experimental SAXS profile
of a certain construct. The question is asked for each construct.
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2)
2* sin(theta)/lambda [1/angstrom] (3)
2* sin(theta)/lambda [1/nm ] (4)
1
Y
Formula for the scattering vector in the data file and its units.
The question is asked for each construct.
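The four options differ only by constant factors: 1/nm values are ten times larger than 1/angstrom values, and 4*pi*sin(theta)/lambda is 2*pi times 2*sin(theta)/lambda. The Python sketch below converts an axis given in any of the four conventions to option (1). The names are illustrative, and whether CORAL performs the conversion internally in exactly this way is an assumption.

```python
import math

# Multipliers that convert each option to s = 4*pi*sin(theta)/lambda in 1/angstrom.
TO_INVERSE_ANGSTROM = {
    1: 1.0,                  # already 4*pi*sin(theta)/lambda in 1/angstrom
    2: 0.1,                  # 1/nm -> 1/angstrom
    3: 2.0 * math.pi,        # 2*sin(theta)/lambda -> 4*pi*sin(theta)/lambda
    4: 2.0 * math.pi * 0.1,  # both conversions combined
}


def to_option_1(values, option):
    """Convert scattering vector values from the given option to option (1)."""
    factor = TO_INVERSE_ANGSTROM[option]
    return [v * factor for v in values]


print(to_option_1([0.1, 0.2], 2))
```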
Fitting range in fractions of Smax
1.0
Y
Fraction of the scattering curve to fit, starting at the first point.
Default is the entire curve. The question is asked for each construct.
Spatial step in angstroems
5.0
Y
Maximal random shift of a domain at a single modification
of the system in the course of simulated annealing.
Angular step in degrees
20.0
Y
Maximal random rotation angle of a domain at a single modification
of the system in the course of simulated annealing.
Finalize temperature step and cool after at most this many
successful mutations. The default value depends on the total numbers
of domains and residues.
At runtime, two lines of output are generated for each
temperature step:
j: 1 T: 0.100E+01 Suc: 1000 Eva: 2711 CPU: 0.503E+02 F:30.8120 Pen: 28.0621
The best chi values: 1.65827
The fields can be interpreted as follows, top-left to bottom-right:
Field
Description
j
Step number. Starts at 1 and increases monotonically.
T
Temperature measure. Starts at an arbitrary high value and decreases
each step by the annealing schedule factor.
Suc
Number of successful mutations in this temperature step, limited by the
minimum and maximum number of successes. The number of successes should
decrease slowly, and the first couple of steps should be terminated by the
maximum number of successes criterion. If instead the maximum number of
iterations is reached, or the number of successes drops suddenly by a large
amount, the system should probably be cooled more slowly.
Eva
Accumulated number of function evaluations.
CPU
Elapsed wall-clock time since the annealing procedure was started.
F
The best target function value obtained so far.
Pen
Accumulated penalty value of the best target function.
The best chi values
For each fitted curve, the χ value of the best target function is given.
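To monitor a run, the two output lines can be parsed with a few lines of Python. This is only a sketch: the regular expression assumes the "name: value" layout shown in the example lines, and the function names are hypothetical. The second helper assumes that the temperature is multiplied by the annealing schedule factor once per step, as described for field T.

```python
import re

# Matches "name: number" pairs such as "T: 0.100E+01" or "F:30.8120".
PAIR = re.compile(r"(\w+):\s*([-+]?\d[\d.]*(?:[Ee][-+]?\d+)?)")


def parse_step(status_line, chi_line):
    """Turn the two lines printed for one temperature step into a dict."""
    fields = {key: float(value) for key, value in PAIR.findall(status_line)}
    # One chi value per fitted curve follows the colon.
    fields["chi"] = [float(x) for x in chi_line.split(":", 1)[1].split()]
    return fields


def expected_temperature(step, initial_temperature, schedule_factor):
    """Temperature at a given step for a geometric cooling schedule."""
    return initial_temperature * schedule_factor ** (step - 1)


step = parse_step(
    "j: 1 T: 0.100E+01 Suc: 1000 Eva: 2711 CPU: 0.503E+02 F:30.8120 Pen: 28.0621",
    "The best chi values: 1.65827",
)
print(step["T"], step["Suc"], step["chi"])
```

Plotting Suc against j from the parsed values is a simple way to check whether the number of successes decreases slowly, as recommended in the table.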
CORAL uses the SAXS experimental data files (*.dat) in ASCII format
containing 3 columns: (1) experimental scattering vector, (2) experimental intensity
and (3) experimental errors.
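A three-column file of this kind can be read with plain Python, for example to inspect the data before a run. The sketch below skips every line that does not start with three numbers; treating such lines as headers or comments is an assumption of this example, not a statement about how CORAL reads its input.

```python
import io


def read_dat(stream):
    """Read s, I(s) and error columns; skip lines without three leading numbers."""
    s, intensity, error = [], [], []
    for line in stream:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            row = [float(p) for p in parts[:3]]
        except ValueError:
            continue  # assumed to be a header or comment line
        s.append(row[0])
        intensity.append(row[1])
        error.append(row[2])
    return s, intensity, error


# Made-up numbers, used only to demonstrate the layout.
sample = io.StringIO(
    "Hypothetical sample header\n"
    "0.010 1250.0 12.5\n"
    "0.012 1240.0 12.1\n"
)
s, intensity, error = read_dat(sample)
print(len(s), s[0], intensity[0], error[0])
```

For a file on disk, the same function can be called on an open file object, e.g. `with open(name) as f: read_dat(f)`.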
The initial configuration is specified in a configuration file, whose format is
demonstrated by the example of a complex consisting of two proteins A and B (i.e. two chains).
A contains three domains a1.pdb, a2.pdb and a3.pdb;
20 aa are missing at the N-terminus and the two linkers are 25
and 30 aa long. B contains two domains b1.pdb and b2.pdb;
10 aa are missing at the C-terminus and the linker is 15 aa long:
NTER 20
a1.pdb
LINK 25
a2.pdb
LINK 30
a3.pdb
b1.pdb
LINK 15
b2.pdb
CTER 10
Note: any two pdb files not separated by a LINK line are assumed to belong
to different chains. If symmetry is applied, the configuration should describe
the asymmetric part only.
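The chain-splitting rule in the note can be checked with a small Python parser. This is an illustrative sketch based only on the keywords shown in the example (NTER, LINK, CTER); the function name and the returned dictionary layout are assumptions and do not reflect CORAL's internal data structures.

```python
def parse_config(text):
    """Split a CORAL-style configuration into chains of domains and linkers."""
    chains = []
    current = None
    continues_chain = False  # True right after NTER or LINK

    def new_chain(nter=0):
        chain = {"nter": nter, "domains": [], "linkers": [], "cter": 0}
        chains.append(chain)
        return chain

    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        keyword = tokens[0].upper()
        if keyword == "NTER":
            current = new_chain(int(tokens[1]))
            continues_chain = True
        elif keyword == "LINK":
            current["linkers"].append(int(tokens[1]))
            continues_chain = True
        elif keyword == "CTER":
            current["cter"] = int(tokens[1])
            current, continues_chain = None, False  # chain is complete
        else:
            # A pdb file not preceded by NTER or LINK starts a new chain.
            if current is None or not continues_chain:
                current = new_chain()
            current["domains"].append(tokens[0])
            continues_chain = False
    return chains


example = """NTER 20
a1.pdb
LINK 25
a2.pdb
LINK 30
a3.pdb
b1.pdb
LINK 15
b2.pdb
CTER 10"""

for chain in parse_config(example):
    print(chain)
```

Applied to the example above, the sketch yields two chains: A with three domains, linkers of 25 and 30 aa and an N-terminal portion of 20 aa, and B with two domains, a 15 aa linker and a C-terminal portion of 10 aa.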
An optional contact conditions file has a format
similar to that of SASREF; the only
difference is that it refers to chains instead of subunits (domains). The following
conditions require the distance of 7 Å between the residues 25 and 115 from the
same chain and the distance of 5 Å between the residue 40 from the first chain
and 50 from the second.
If two (or more) alternatives are given after the line with the keyword
"dist", the program compares the better (smaller) distance among them with
the specified one.
After each simulated annealing step, CORAL creates a set of output files.
Each filename consists of a customizable prefix
with an extension appended. If a prefix has been used before, existing
files will be overwritten without warning.
Fit of the scattering curve computed from a construct
versus the corresponding experimental data. i stands for the
construct number. Columns in the output file
are: 's', 'Iexp' and 'Icomp'.
A glutamate decarboxylase (Gad) hexamer with three calmodulin (CaM) molecules bound
to three pairs of Gad C-terminal peptides is used for the sample run. The structures
of the homohexameric Gad core and of the 1:2 complex of CaM with the C-terminal peptide are
known. The peptide is connected to the Gad core domain by a 22 aa linker.
The configuration file (config.con) is as follows:
m1.pdb
LINK 22
pept1a.pdb
m2.pdb
LINK 22
pept2a.pdb
cama.pdb
m3.pdb
LINK 22
pept1b.pdb
m4.pdb
LINK 22
pept2b.pdb
camb.pdb
m5.pdb
LINK 22
pept1c.pdb
m6.pdb
LINK 22
pept2c.pdb
camc.pdb
Here, m?.pdb and pept??.pdb are the atomic models of the Gad
core domains and their C-terminal portions, respectively, and cam?.pdb are
three copies of the CaM molecule.
A listing of questions / answers of CORAL in USER mode:
$ coral
Computation mode (User or Expert) ...... < User >:
Log file name .......................... < .log >: gadcam
Project identificator .................................. : gadcam
Enter project description .............. : Gad hexamer + 3 CaM
Random sequence initialized from ....................... : 191723
File name with objects info ............ < .con >: config
Coordinates of the 1-st subunit evaluated from ......... : m1.pdb
3512 atoms read, center at -30.70 -8.02 10.84
Fix the subunit at original position? [ Y / N ] < No >: Y
Coordinates of the 2-nd subunit evaluated from ......... : pept1a.pdb
234 atoms read, center at -58.01 33.40 4.22
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 3-rd subunit evaluated from ......... : m2.pdb
3512 atoms read, center at -8.40 30.59 -10.84
Fix the subunit at original position? [ Y / N ] < No >: Y
Coordinates of the 4-th subunit evaluated from ......... : pept2a.pdb
234 atoms read, center at -66.51 30.35 8.17
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 5-th subunit evaluated from ......... : cama.pdb
1170 atoms read, center at -67.64 33.48 -2.84
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 6-th subunit evaluated from ......... : m3.pdb
3512 atoms read, center at 8.40 30.59 10.84
Fix the subunit at original position? [ Y / N ] < No >: Y
Coordinates of the 7-th subunit evaluated from ......... : pept1b.pdb
234 atoms read, center at 57.93 33.53 4.22
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 8-th subunit evaluated from ......... : m4.pdb
3512 atoms read, center at 30.70 -8.02 -10.84
Fix the subunit at original position? [ Y / N ] < No >: Y
Coordinates of the 9-th subunit evaluated from ......... : pept2b.pdb
234 atoms read, center at 59.54 42.42 8.17
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 10-th subunit evaluated from ........ : camb.pdb
1170 atoms read, center at 62.81 41.84 -2.84
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 11-th subunit evaluated from ........ : m5.pdb
3512 atoms read, center at 22.29 -22.57 10.84
Fix the subunit at original position? [ Y / N ] < No >: Y
Coordinates of the 12-th subunit evaluated from ........ : pept1c.pdb
234 atoms read, center at 0.07 -66.94 4.22
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 13-th subunit evaluated from ........ : m6.pdb
3512 atoms read, center at -22.29 -22.57 -10.84
Fix the subunit at original position? [ Y / N ] < No >: Y
Coordinates of the 14-th subunit evaluated from ........ : pept2c.pdb
234 atoms read, center at 6.97 -72.77 8.17
Fix the subunit at original position? [ Y / N ] < No >:
Coordinates of the 15-th subunit evaluated from ........ : camc.pdb
1170 atoms read, center at 4.83 -75.32 -2.84
Fix the subunit at original position? [ Y / N ] < No >:
Pair of domains to group .. < 0, 0 >: 2,4
Pair of domains to group .. < 0, 0 >: 2,5
Pair of domains to group .. < 0, 0 >: 4,5
Pair of domains to group .. < 0, 0 >: 7,9
Pair of domains to group .. < 0, 0 >: 7,10
Pair of domains to group .. < 0, 0 >: 9,10
Pair of domains to group .. < 0, 0 >: 12,14
Pair of domains to group .. < 0, 0 >: 12,15
Pair of domains to group .. < 0, 0 >: 14,15
Pair of domains to group .. < 0, 0 >:
Number of backbone atoms generated ..................... : 3372
Averaged formfactors of DRs used
DR formfactor multiplier ............................... : 1.000
Symmetry: P1...19 or Pn2 (n=1,..,12) ... < P1 >:
Cross penalty .......................................... : 6.383e-2
Cross penalty weight ................................... : 100.0
Shift penalty .......................................... : 3.182e-3
Shift penalty weight ................................... : 1.000
File name, contacts conditions, CR for none < .cnd >:
Input total number of scattering curves < 1 >:
Input first & last residues in 1-st construct < 1, 3372 >:
Enter file name, 1-st experimental data < .dat >: gad_cam-mer.dat
Number of experimental points found .................... : 984
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2)
2* sin(theta)/lambda [1/angstrom] (3)
2* sin(theta)/lambda [1/nm ] (4) < 1 >:
Fitting range in fractions of Smax ..... < 1.000 >:
Experimental radius of gyration ........................ : 57.80
Number of points in the Guinier Plot ................... : 10
Maximum s-vector in master grid ........................ : 0.4478
Number of points in partial amplitudes ................. : 101
Maximum order of harmonics ............................. : 14
Computing X-ray Alms for m1.pdb
ALMGRZ --- : 181800 summation coefficients used
Computing X-ray Alms for pept1a.pdb
Computing X-ray Alms for m2.pdb
Computing X-ray Alms for pept2a.pdb
Computing X-ray Alms for cama.pdb
Computing X-ray Alms for m3.pdb
Computing X-ray Alms for pept1b.pdb
Computing X-ray Alms for m4.pdb
Computing X-ray Alms for pept2b.pdb
Computing X-ray Alms for camb.pdb
Computing X-ray Alms for m5.pdb
Computing X-ray Alms for pept1c.pdb
Computing X-ray Alms for m6.pdb
Computing X-ray Alms for pept2c.pdb
Computing X-ray Alms for camc.pdb
Total penalty .......................................... : 6.386
1-st curve:
NEXP reduced to ........................................ : 983
Theoretical points from 5 to 101 used
The best chi values:11.03272
Initial fVal ........................................... : 128.1
Spatial step in angstroems ............. < 5.000 >:
Angular step in degrees ................ < 20.00 >:
Initial annealing temperature .......................... : 10.00
Annealing schedule factor .............................. : 0.9000
Max # of iterations at each T .......................... : 51600
Max # of successes at each T ........................... : 5160
Min # of successes to continue ......................... : 100
Max # of annealing steps ............................... : 100
==== Simulated annealing procedure started ====
j: 1 T: 0.100E+02 Suc: 1000 Eva: 1165 CPU: 0.107E+03 F:15.8416 Pen: 1.2895
The best chi values: 3.81472
j: 2 T: 0.900E+01 Suc: 1000 Eva: 2337 CPU: 0.214E+03 F:15.8416 Pen: 1.2895
The best chi values: 3.81472
j: 3 T: 0.810E+01 Suc: 1000 Eva: 3477 CPU: 0.317E+03 F:14.4729 Pen: 1.9555
The best chi values: 3.53799
...
Note that the six core domains of Gad are fixed so that the hexamer remains intact,
and each of the three CaM molecules is coupled with two C-terminal peptides to keep their known arrangement.
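The numbers printed in the listing are consistent with a target function equal to the squared chi value plus the weighted penalties. The Python sketch below reproduces this arithmetic for the single-curve sample run. The relation is inferred from the printed values rather than taken from a documented formula, and summing over several curves is an assumption of this example.

```python
def target_function(chi_values, weighted_penalties):
    """Sum of squared chi values plus the sum of weight * penalty terms."""
    fit_term = sum(chi * chi for chi in chi_values)
    penalty_term = sum(weight * value for weight, value in weighted_penalties)
    return fit_term + penalty_term


# Initial state of the sample run: cross penalty 6.383e-2 (weight 100.0),
# shift penalty 3.182e-3 (weight 1.0), chi 11.03272.
penalties = [(100.0, 6.383e-2), (1.0, 3.182e-3)]
total_penalty = target_function([], penalties)
initial_fval = target_function([11.03272], penalties)
print(round(total_penalty, 3), round(initial_fval, 1))

# First annealing step: chi 3.81472, accumulated penalty 1.2895.
first_step = target_function([3.81472], [(1.0, 1.2895)])
print(round(first_step, 4))
```

The computed values agree with the printed "Total penalty" (6.386), "Initial fVal" (128.1) and the F value of the first step (15.8416) to the displayed precision.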