The use of GASBOR is similar to that of DAMMIN or DAMMIF, and most
parameters have the same meaning. The most important difference is that
the protein structure is represented not by dummy spheres on a lattice
(called dummy atoms in DAMMIN/DAMMIF, although they do not correspond to real atoms),
but by an ensemble of dummy residues (corresponding to average
residue densities) placed anywhere in continuous space, with a preferred
number of close neighbours for each residue. The centers
of these residues approximate the positions of the C-α
atoms in the protein structure. The number of residues should be equal to
that in the protein.
Note, however, that these residues are anonymous, in the sense
that their ordinal numbers in the model have nothing to do with the numbering
in the primary sequence of the protein!
Accordingly, the program does not subtract any Porod constant from
the experimental data. In DAMMIN, it was
recommended to discard the high-angle portions of the scattering patterns;
in GASBOR, on the contrary, one should use them. The program is
able to fit the data up to a resolution of about 5 Å, i.e. momentum
transfer s = 4π sin(θ)/λ ≈ 1.2 Å⁻¹.
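As a quick check of the figures quoted above, the link between real-space resolution d and momentum transfer s can be sketched with the standard relation d = 2π/s. The function names are illustrative only and are not part of GASBOR.

```python
import math


def s_from_resolution(d_angstrom: float) -> float:
    """Momentum transfer s (in 1/Å) for a resolution d (in Å), using d = 2*pi/s."""
    return 2.0 * math.pi / d_angstrom


def resolution_from_s(s_inv_angstrom: float) -> float:
    """Resolution d (in Å) reached at momentum transfer s (in 1/Å)."""
    return 2.0 * math.pi / s_inv_angstrom


if __name__ == "__main__":
    # 5 Å corresponds to s of roughly 1.2-1.3 1/Å, in line with the text.
    print(f"s for d = 5 Å: {s_from_resolution(5.0):.2f} 1/Å")
    print(f"d for s = 1.2 1/Å: {resolution_from_s(1.2):.2f} Å")
```

Under this relation, data extending to s ≈ 1.2 Å⁻¹ carry information down to a little over 5 Å, which is why the high-angle portion of the curve matters for a residue-level model.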
Prefix to prepend to output filenames. Default is the name of the
GASBOR input file without extension.
-sy <SYMMETRY>
Specify the point symmetry of the particle. Point groups
P1, ..., P19, Pn2 (n = 2, ..., 12), P23, P432 or PICO (icosahedral)
are supported. By default, no symmetry is enforced (P1).
-id <DESCRIPTION>
Project description. By default, the command line content is used.
-an <ANISOMETRY>
Particle anisometry: oblate (O), prolate (P) or unknown (default).
-dr <DIRECTION>
Direction of anisometry, applicable with P2 symmetry only: along (L), across (C) or
unknown (default).
-un <UNIT>
Angular unit of the input file, either 1 [1/Angstrom] or 2 [1/nm]; undefined by default.
-h
Print a summary of arguments and options and exit.
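The value lists above can be checked before launching a run. The following is a minimal sketch that tests a symmetry string against the point groups listed for -sy (P1 to P19, Pn2 with n = 2 to 12, P23, P432, PICO). The function name and the case-insensitive matching are assumptions made for illustration; GASBOR's own parsing may differ.

```python
import re

# Point groups as listed in the -sy description above.
_FIXED_GROUPS = {"P23", "P432", "PICO"}
_PN = re.compile(r"P([1-9]\d?)")    # candidates for P1 ... P19
_PN2 = re.compile(r"P([1-9]\d?)2")  # candidates for Pn2, n = 2 ... 12


def is_supported_symmetry(name: str) -> bool:
    """Return True if `name` is one of the point groups listed for -sy."""
    name = name.strip().upper()
    if name in _FIXED_GROUPS:
        return True
    m = _PN.fullmatch(name)
    if m and 1 <= int(m.group(1)) <= 19:
        return True
    m = _PN2.fullmatch(name)
    return bool(m) and 2 <= int(m.group(1)) <= 12


if __name__ == "__main__":
    for candidate in ["P1", "P2", "P19", "P20", "P22", "P122", "P132", "P432", "pico"]:
        print(candidate, is_supported_symmetry(candidate))
```

Note that a string such as P12 is accepted here as a member of the P1 to P19 family.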
There are two versions of GASBOR, one fitting
the intensity in reciprocal space (GASBORI), and the other
fitting the real space P(r) function (GASBORP). The
algorithms of the two versions are similar. The reciprocal space version
is slower but usually yields better fits to the experimental data. The
real space version is much faster, and should be used when the number of dummy
residues makes the runtime excessive (the runtime is proportional to the square of
the number of dummy residues).
In addition, the reciprocal space version is also available in an implementation
that accounts for oligomeric equilibrium (GASBORMX). In this case, an
ab initio model of a symmetric oligomer is built while assuming some fraction
of monomers in solution (i.e. a polydisperse sample).
After starting GASBOR one may specify:
| Prompt | Possible value(s) | Default value | Description |
|---|---|---|---|
| Computation mode | User or Expert | User | After choosing Expert, GASBOR will let you configure additional expert mode parameters. |
| | | | This question is only asked by GASBORMX, which may fit a concentration series of an oligomeric equilibrium. |
| Input data, GNOM output file name | filename, like: lyz.out | | Input file with valid GNOM output. If GASBOR doesn't accept the file, check that the GNOM run has finished and that the P(r) function has been written to the file. This question is asked once per curve, i.e. as many times as there are curves to fit by GASBORMX. |
| Angular units in the input file | 1 or 2 | 1 | 1 means that the data unit is Å⁻¹; 2 means that the data unit is nm⁻¹. This question is asked once per curve, i.e. as many times as there are curves to fit by GASBORMX. |
| Portion of the curve to be fitted | 0.001-1.0 | 1.0 for entire curve | Whether the curve should be fitted in its entirety, or just a part of it. This question is asked once per curve, i.e. as many times as there are curves to fit by GASBORMX. |
| Volume fraction of monomer (if known) | -1.0; (0.0, 1.0) | -1.0 if unknown | If a positive number (below 1.0) is given, this volume fraction of the monomer is kept fixed in the course of modelling. This question is only asked by GASBORMX. |
| Initial DRM | filename, like: gasbor.pdb | none | Enter a file name if you want to start with a model from a previous GASBOR run. Otherwise just press CR. |
| Symmetry: P1...19 or Pn2 (n=1,..,12) or P23 or P432 or PICO | P1...P19 or P12...P122 or P23 or P432 or PICO | P2 for GASBORMX, otherwise P1 | Particle symmetry to be enforced. The number of residues given further refers to a single asymmetric unit (monomer). |
| Number of residues in asymmetric part | integer > 0 | none | Number of residues within a single asymmetric unit. |
| Fibonacci grid order | 0...18 | order that gives a number of waters close to the number of dummy residues | Order of the Fibonacci grid used to generate dummy waters. |
| Expected particle shape: <P>rolate, <O>blate, or <U>nknown | P, O or U | U | Constrains the particle shape if it is known to be significantly non-globular (non-spherical). Gives more accurate results in this case. |
| | | | Regularized intensity is recomputed to have this many points for fitting. |
| Radius of the search volume | positive real number | Dmax/2 | Radius of the volume in which dummy atoms will be placed. Limits the sampling space. |
| Histogram penalty weight | positive real number | 1.000e-3 | Weight of the penalty applied when the histogram of inter-residue distances differs from that expected for a protein. |
| Bond length penalty weight | positive real number | 1.000e-2 | Penalty for bond lengths other than 3.8 Å. |
| Discontiguity penalty weight | positive real number | 1.000e-2 | Penalty for disconnected dummy residues. |
| Peripheral penalty weight | positive real number | 1.0 | Penalty term that ensures a compact arrangement of DRs at the beginning. The weight is gradually reduced in the course of simulated annealing. |
| Contrast of the hydration layer | positive real number | 3.000e-2 | Contrast of the hydration layer relative to the solvent. |
| Sequence file name | any filename: lyzozyme.seq | none | File with the protein sequence, used to compute the sequence-specific dummy residue form factors. Besides other limitations, lines in this file must not exceed 256 characters. |
| Weight | 0, 1, 2, 3, 4, 5 or 6 | 2 | 0: weight the I(s) fit according to s²; 1: as above, with constant for s<MaxPor; 2: as above, with average for s<MaxPor; 3: weight I(s) proportionally to s; 4: as above, with constant for s<MaxI*s; 5: as above, with average for s<MaxI*s; 6: compute the fit in logarithmic scale. |
| Account for constant background | Yes or No | Yes | Whether a constant background should be subtracted when fitting. |
| Initial scale factor | positive real number | depends on input | Initial scaling factor for fitting the experimental data. |
| Fixing threshold for Rf | | 0.0 | Obsolete. |
| Fixing threshold for PenCha | | 0.0 | Obsolete. |
| Fixing threshold for PenLen | | 0.0 | Obsolete. |
| Initial annealing temperature | positive real number | 1.000e-3 | The initial temperature of the annealing process defines the probability of jumping into a state of higher pseudo-energy (worse score) on each Monte-Carlo step. |
| Annealing schedule factor | positive real number < 1.0 | 0.9000 | The temperature is multiplied by this factor after each round of simulated annealing, to decrease it. |
| # of independent atoms to modify | integer > 0 | 1 | Number of atoms to reposition on each annealing step. |
| Max # of iterations at each T | integer ≥ 0 | 45000 | Each round of simulated annealing terminates after at most this many iterations, and the temperature is decreased. |
| Max # of successes at each T | integer > 0 | 4500 | Each round of simulated annealing terminates early after this many successful iterations, and the temperature is decreased. |
| Min # of successes to continue | integer > 1 | 45 | The program terminates after a round of simulated annealing yields fewer than this many successes. |
| Max # of annealing steps | integer > 0 | 100 | Maximum number of annealing steps, after which the program always terminates. |
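The annealing parameters described above interact as follows. This is a minimal sketch, not GASBOR's actual implementation. The geometric cooling and the termination rules follow the parameter descriptions, while the Metropolis-style acceptance rule is an assumption: the descriptions say only that the temperature defines the probability of accepting a worse state. All function names are illustrative.

```python
import math
import random


def temperature(step: int, t0: float = 1.0e-3, factor: float = 0.9) -> float:
    """Temperature at annealing step `step` (1-based): t0 * factor**(step - 1)."""
    return t0 * factor ** (step - 1)


def accept(delta: float, t: float, rng: random.Random) -> bool:
    """Assumed Metropolis-style rule: improvements are always accepted,
    a worse state (delta > 0) is accepted with probability exp(-delta / t)."""
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / t)


def round_is_over(iterations: int, successes: int,
                  max_iter: int = 45000, max_succ: int = 4500) -> bool:
    """A round at fixed T ends on either the iteration or the success limit."""
    return iterations >= max_iter or successes >= max_succ


def should_stop(step: int, successes: int,
                min_succ: int = 45, max_steps: int = 100) -> bool:
    """The run ends when a round yields too few successes or the step limit is hit."""
    return successes < min_succ or step >= max_steps


if __name__ == "__main__":
    for j in (1, 36, 56):
        print(f"j = {j:3d}  T = {temperature(j):.3e}")
```

With the default start temperature 1.000e-3 and schedule factor 0.9, this schedule gives about 2.50e-5 at step 36 and 3.04e-6 at step 56, the same T values that appear in the report lines of the two example runs in this manual.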
After printing the program version number and querying or printing all
parameters, GASBOR will display a message that the simulated
annealing procedure has started. After each round of simulated
annealing at a new temperature, it will print a report line (see the example output below).
As the water shell may be reasonably represented with the ratio of
number of residues/number of waters not exceeding 3, the program
may currently handle proteins with a total number of residues not
exceeding 6000 (i.e. total MM not exceeding ~700 kDa).
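The two limits quoted above can be related with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in this manual, and uses illustrative function names.

```python
# Assumption: average mass of an amino acid residue, in daltons.
AVERAGE_RESIDUE_MASS_DA = 110.0
MAX_RESIDUES = 6000
MAX_RESIDUES_PER_WATER = 3.0


def approx_mass_kda(n_residues: int) -> float:
    """Rough molecular mass in kDa for a protein with n_residues residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def within_limits(n_residues: int, n_waters: int) -> bool:
    """Check the residue count and the residues/waters ratio quoted in the text."""
    ratio_ok = n_residues / n_waters <= MAX_RESIDUES_PER_WATER
    return n_residues <= MAX_RESIDUES and ratio_ok


if __name__ == "__main__":
    print(f"{MAX_RESIDUES} residues ~ {approx_mass_kda(MAX_RESIDUES):.0f} kDa")
    # Lysozyme example from this manual: 129 residues, 90 dummy waters.
    print(within_limits(129, 90))
```

With this assumed residue mass, 6000 residues come to about 660 kDa, consistent with the approximate 700 kDa limit.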
Speed
O((number of dummy atoms)²)
A GASBOR run on lysozyme (129 residues) on a
PIV-2.2 GHz machine required less than an hour of CPU
time using GASBORI and less than 20 min using
GASBORP. The CPU time grows quadratically with the
number of residues, so runs on proteins
with high molecular mass may take a long time.
For large proteins (>2000 amino acids),
DAMMIF/DAMMIN
is recommended: it will run much faster and give similar results. The
influence of the internal structure is less important for large
macromolecules, so the shape approximation is adequate.
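The quadratic growth of CPU time can be turned into a rough estimate. The sketch below simply scales a reference timing by the square of the residue ratio. It ignores differences in hardware and in the number of annealing steps, so it gives an order of magnitude only. The function name is illustrative.

```python
def scaled_cpu_time(t_ref: float, n_ref: int, n: int) -> float:
    """Scale a reference CPU time t_ref, measured for n_ref residues, to n residues,
    assuming the time grows with the square of the number of residues."""
    return t_ref * (n / n_ref) ** 2


if __name__ == "__main__":
    # Reference from the text: lysozyme, 129 residues, under 1 hour with GASBORI.
    for n in (129, 258, 680, 2000):
        factor = scaled_cpu_time(1.0, 129, n)
        print(f"{n:5d} residues: about {factor:6.1f} x the lysozyme CPU time")
```

For 2000 residues the factor is roughly 240, which illustrates why DAMMIF/DAMMIN is suggested for proteins of that size.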
Lysozyme has no symmetry and 129 residues:
enter P1 for symmetry, 129 for the number of residues,
and default answers to all other questions.
You may also use the command line:
$ gasbori gnlyzfu.out 129
Here is the resulting output:
*** Ab inito reconstruction of a protein structure ***
*** by a chain-like ensemble of dummy residues ***
*** Version 2.2i build 31.07.08 ***
*** Last modified --- 31/07/08 20:00 ***
*** Please reference: D.I.Svergun, M.V.Petoukhov & ***
*** M.H.J.Koch (2001) Biophys. J. 80, 2946-2953 ***
*** Copyright (c) ATSAS Team ***
*** EMBL, Hamburg Outstation, 2000 - 2008 ***
Type gasbori /help for batch mode use
=== GASBOR Version 2.2i build 31.07.08 started on 29-Sep-2009 13:37:34
Project identificator .................................. : gnlyzf
Enter project description .............. :
Random sequence initialized from ....................... : 133734
** Information read from the GNOM file **
Data set title: Angular axis n01000.sax Datafile n10000.sub
Raw data file name: lyzful.dat
Maximum diameter of the particle ....................... : 50.00
Solution at Alpha = 0.500E+00 Rg : 0.144E+02 I(0) : 0.526E+03
Radius of gyration ..................................... : 14.40
Number of GNOM data points ............................. : 230
Maximum s value [1/angstrom] ........................... : 1.316
Number of Shannon channels ............................. : 20.95
Number of knots in the curve to fit .................... : 42
Symmetry: P1...19 or Pn2 (n=1,..,12)
Number of equivalent positions ......................... : 1
Number of dummy waters ................................ : 90
Excluded volume per residue ............................ : 28.73
Radius of the search volume ............................ : 25.00
Histogram penalty weight ............................... : 1.000e-3
Bond length penalty weight ............................. : 1.000e-2
Discontiguity penalty weight ........................... : 1.000e-2
Peripheral penalty weight .............................. : 1.000
Expected particle shape: <P>rolate, <O>blate,
Contrast of the hydration layer ........................ : 3.000e-2
Computation of the initial intensity ...
Histogram penalty value ................................ : 36.62
Bond length penalty value .............................. : 1.930
Initial DRM # of graphs ................................ : 61
Discontiguity value .................................. : 1.196
Peripheral penalty value ............................... : 0.2645
Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
*** Accounting for constant background ***
Initial scale factor ................................... : 1.409e-4
Constant background subtracted ......................... : -1.095
Initial R^2 factor ..................................... : 5.796e-2
Initial R factor ..................................... : 0.2408
Initial penalty ........................................ : 0.3324
Initial fVal ........................................... : 0.3904
R-factor fixing threshold .............................. : 0.0
Fixing threshold PenCha ................................ : 0.0
Fixing threshold PenLen ................................ : 0.0
Initial annealing temperature .......................... : 1.000e-3
Annealing schedule factor .............................. : 0.9000
# of independent atoms to modify ....................... : 1
Max # of iterations at each T .......................... : 55000
Max # of successes at each T ........................... : 5500
Min # of successes to continue ......................... : 55
Max # of annealing steps ............................... : 100
==== Simulated annealing procedure started ====
j: 1 T: 0.100E-02 Suc: 5500 Eva: 11556 CPU: 0.329E+01 SqF: 0.5116
Rf: 0.10745 His: 26.97 Bnd: 1.419 Dis:0.0807 Per :0.2082
...
j: 36 T: 0.250E-04 Suc: 55 Eva: 1425327 CPU: 0.378E+03 SqF: 0.0936
Rf: 0.02932 His: 6.93 Bnd: 0.067 Dis:0.0000 Per :0.4966
Final Chi against raw data ............................. : 0.9592
=== GASBOR Version 2.2i build 31.07.08 finished on 29-Sep-2009 13:43:55
Use in the batch mode:
gasbori <Inp_File> <Num_DRs> [/key1 <key1>]...
[/keyN <keyN>]
where the compulsory arguments Inp_File and Num_DRs
are the name of a GNOM output file (extension .out)
and the number of dummy residues in asymmetric part
The following program options can be given as keys
with their values (defaults are given in brackets)
/lo Log file name (same as the GNOM file name)
/sy Particle symmetry (P1)
/id Project description (command line content)
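The initial values printed in the lysozyme log above can be cross-checked numerically. The decomposition below, a penalty equal to the weighted sum of the four penalty values and fVal equal to the R^2 factor plus that penalty, is inferred from the numbers in the log and is not taken from the GASBOR source, so treat it as an assumption. It reproduces the printed values to within rounding.

```python
# Weights and initial penalty values as printed in the lysozyme log.
weights = {
    "histogram": 1.000e-3,
    "bond_length": 1.000e-2,
    "discontiguity": 1.000e-2,
    "peripheral": 1.000,
}
values = {
    "histogram": 36.62,
    "bond_length": 1.930,
    "discontiguity": 1.196,
    "peripheral": 0.2645,
}

r_factor = 0.2408   # "Initial R factor"
r_squared = 5.796e-2  # "Initial R^2 factor"

# Assumed form: total penalty is the weighted sum of the individual terms.
penalty = sum(weights[name] * values[name] for name in weights)
# Assumed form: the goal function is the R^2 factor plus the total penalty.
f_val = r_squared + penalty

if __name__ == "__main__":
    print(f"R factor squared : {r_factor ** 2:.4f}  (log: 5.796e-2)")
    print(f"penalty          : {penalty:.4f}  (log: 0.3324)")
    print(f"fVal             : {f_val:.4f}  (log: 0.3904)")
```

The same arithmetic applied to the transketolase log in this manual (penalty values 37.38, 1.604, 2.191 and 0.2647) gives a penalty of about 0.3400, which also agrees with the printed value.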
Transketolase is a homodimer in solution, and each monomer has 680 residues,
giving a total of 1360 residues:
enter P2 for symmetry, 680 for the number of residues, and
default answers to all other questions.
*** Ab inito reconstruction of a protein structure ***
*** by a chain-like ensemble of dummy residues ***
*** Version 2.2i build 21.06.06 ***
*** Last modified --- 21/06/06 12:00 ***
*** Please reference: D.I.Svergun, M.V.Petoukhov & ***
*** M.H.J.Koch (2001) Biophys. J. 80, 2946-2953 ***
*** Copyright (c) ATSAS Team ***
*** EMBL, Hamburg Outstation, 2000 - 2005 ***
Type gasbori /help for batch mode use
=== GASBOR Version 2.2i build 21.06.06 started on 06-Oct-2009 16:42:28
Computation mode (User or Expert) ...... < User >:
Log file name .......................... < .log >: log
Input data, GNOM output file name ...... < .out >: 1trk.out
Project identificator .................................. : log
Enter project description .............. : project
Random sequence initialized from ....................... : 164228
** Information read from the GNOM file **
Data set title: Transketolase collated from n85, o14+o16 6-11-98
Raw data file name: trkexp.dat
Maximum diameter of the particle ....................... : 12.00
Solution at Alpha = .164E+01 Rg : .336E+01 I(0) : .190E+03
Radius of gyration ..................................... : 3.360
Number of GNOM data points ............................. : 283
Angular units in the input file :
4*pi*sin(theta)/lambda [1/angstrom] (1)
4*pi*sin(theta)/lambda [1/nm ] (2) < 2 >: 2
Angular units multiplied by ............................ : 0.1000
Maximum diameter divided by ............................ : 0.1000
Maximum s value [1/angstrom] ........................... : 0.3418
Number of Shannon channels ............................. : 13.06
Portion of the curve to be fitted ...... < 1.000 >:
Number of knots in the curve to fit .................... : 26
Initial DRM (CR for random) ............ < .pdb >:
Symmetry: P1...19 or Pn2 (n=1,..,12)
or P23 or P432 or PICO ................. < P1 >: P2
Number of equivalent positions ......................... : 2
Number of residues in asymmetric part .. < 517 >: 680
Fibonacci grid order ................... < 15 >:
Number of dummy waters ................................ : 988
Excluded volume per residue ............................ : 28.73
Radius of the search volume ............................ : 60.00
Histogram penalty weight ............................... : 1.000e-3
Bond length penalty weight ............................. : 1.000e-2
Discontiguity penalty weight ........................... : 1.000e-2
Peripheral penalty weight .............................. : 1.000
Expected particle shape: <P>rolate, <O>blate,
or <U>nknown .......................... < Unknown >:
Contrast of the hydration layer ........................ : 3.000e-2
Computation of the initial intensity ...
Histogram penalty value ................................ : 37.38
Bond length penalty value .............................. : 1.604
Initial DRM # of graphs ................................ : 708
Discontiguity value .................................. : 2.191
Peripheral penalty value ............................... : 0.2647
Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
*** Accounting for constant background ***
Initial scale factor ................................... : 5.042e-7
Constant background subtracted ......................... : 0.3339
Initial R^2 factor ..................................... : 3.837e-2
Initial R factor ..................................... : 0.1959
Initial penalty ........................................ : 0.3400
Initial fVal ........................................... : 0.3784
R-factor fixing threshold .............................. : 0.0
Fixing threshold PenCha ................................ : 0.0
Fixing threshold PenLen ................................ : 0.0
Initial annealing temperature .......................... : 1.000e-3
Annealing schedule factor .............................. : 0.9000
# of independent atoms to modify ....................... : 1
Max # of iterations at each T .......................... : 130000
Max # of successes at each T ........................... : 13000
Min # of successes to continue ......................... : 130
Max # of annealing steps ............................... : 100
==== Simulated annealing procedure started ====
j: 1 T: 0.100E-02 Suc: 13000 Eva: 14987 CPU: 0.142E+03 SqF: 0.5510
Rf: 0.11559 His: 36.01 Bnd: 1.958 Dis:0.4537 Per :0.2301
...
j: 56 T: 0.304E-05 Suc: 63 Eva: 3730743 CPU: 0.386E+05 SqF: 0.0789
Rf: 0.01835 His: 5.43 Bnd: 0.046 Dis:0.0000 Per :0.3158
Final Chi against raw data ............................. : 1.211
=== GASBOR Version 2.2i build 21.06.06 finished on 07-Oct-2009 03:25:52
Use in the batch mode:
gasbori <Inp_File> <Num_DRs> [/key1 <key1>]...
[/keyN <keyN>]
where the compulsory arguments Inp_File and Num_DRs
are the name of a GNOM output file (extension .out)
and the number of dummy residues in asymmetric part
The following program options can be given as keys
with their values (defaults are given in brackets)
/lo Log file name (same as the GNOM file name)
/sy Particle symmetry (P1)
/id Project description (command line content)
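The unit handling and the "Number of Shannon channels" lines in the two logs above can be reproduced with the standard estimate N = Dmax * smax / π. The sketch below uses the values printed in the logs. The observation that the number of knots equals twice the number of Shannon channels, rounded, is read off these two examples only and is an assumption rather than documented behaviour. Function names are illustrative.

```python
import math


def nm_to_angstrom_units(dmax_nm: float, smax_inv_nm: float) -> tuple[float, float]:
    """Convert Dmax from nm to Å and s from 1/nm to 1/Å
    (the log reports s multiplied by 0.1 and Dmax divided by 0.1)."""
    return dmax_nm / 0.1, smax_inv_nm * 0.1


def shannon_channels(dmax: float, smax: float) -> float:
    """Number of Shannon channels for Dmax (Å) and maximum s (1/Å)."""
    return dmax * smax / math.pi


if __name__ == "__main__":
    # Lysozyme log: Dmax = 50.00 Å, maximum s = 1.316 1/Å.
    n_lys = shannon_channels(50.00, 1.316)
    # Transketolase log: Dmax = 12.00 nm, maximum s = 0.3418 1/Å after conversion.
    dmax_trk, _ = nm_to_angstrom_units(12.00, 3.418)
    n_trk = shannon_channels(dmax_trk, 0.3418)
    for label, n in (("lysozyme", n_lys), ("transketolase", n_trk)):
        print(f"{label:14s} channels = {n:.2f}, 2 x channels rounded = {round(2 * n)}")
```

The results agree with the logged values (20.95 and 13.06 channels, 42 and 26 knots) to within the rounding of the printed maximum s. The converted Dmax of 120 Å also matches the logged search radius of 60.00, i.e. Dmax/2.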