GASBOR manual

gasbor

Documentation written by M.Petoukhov, D.I.Svergun, and M.J.Gajda.
Post all your questions about GASBOR to the ATSAS Forum.

© ATSAS Team, 2000-2013

Introduction

GASBOR is a program for ab initio reconstruction of protein structure by a chain-like ensemble of dummy residues. It was published in the paper "Determination of domain structure of proteins from X-ray solution scattering" by D.I. Svergun, M.V. Petoukhov & M.H.J. Koch.

Algorithm description

The use of GASBOR is similar to that of DAMMIN or DAMMIF, and most parameters have the same meaning. The most important difference is that the protein structure is represented not by dummy spheres on a lattice (called dummy atoms in DAMMIN/DAMMIF, although they do not correspond to real atoms), but by an ensemble of dummy residues (corresponding to average residue densities) placed anywhere in continuous space, with a preferred number of close-distance neighbours for each residue. The centers of these residues aim to approximate the positions of the C-α atoms in the protein structure. The number of residues should be equal to that in the protein.

Note, however, that these residues are anonymous, in the sense that their ordinal numbers in the model have nothing to do with the numbering of the primary sequence of the protein!

Accordingly, the program does not subtract any Porod constant from the experimental data. In DAMMIN, it was recommended to discard the high-angle portions of the scattering patterns; in GASBOR, on the contrary, one should use them. The program is able to fit the data up to a resolution of 5 Å, i.e. momentum transfer s = 4π sin(θ)/λ = 1.2 Å⁻¹.
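
The relation between momentum transfer and nominal resolution can be checked numerically. The sketch below is illustrative only: it assumes the common convention d = 2π/s, which this manual does not state explicitly, and the function names are hypothetical.

```python
import math

def momentum_transfer(theta_rad, wavelength):
    """s = 4*pi*sin(theta)/lambda, with theta (in radians) as in the formula above."""
    return 4.0 * math.pi * math.sin(theta_rad) / wavelength

def nominal_resolution(s):
    """Assumed convention: d = 2*pi/s, in the length unit matching 1/s."""
    return 2.0 * math.pi / s

# Resolution in Angstrom for s = 1.2 1/Angstrom under the assumed convention.
print(f"{nominal_resolution(1.2):.2f}")
```

Under this convention, s = 1.2 Å⁻¹ corresponds to about 5.2 Å, which is consistent with the 5 Å quoted above.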

Running gasbor

Command-Line Arguments and Options

Major parameters can be specified from the command line (see below) to run the program in batch mode.
Usage:

$ gasbor <GNOMFILE> <Num_DRs> [OPTIONS]

where gasbor should be replaced by the full program name (e.g. gasborp).

GASBOR requires the following command line arguments:

Argument    Description
GNOMFILE    A relative or absolute path to a GNOM output file.
Num_DRs     Number of dummy residues in the asymmetric part.

GASBOR recognizes the following command-line options.

Option               Description
-lo <LOG_FILE>       Prefix to prepend to output filenames. Default is the name of the GASBOR input file without extension.
-sy <SYMMETRY>       Specify the point symmetry of the particle. Point groups P1, ..., P19, Pn2 (n = 2, ..., 12), P23, P432 or PICO (icosahedral) are supported. By default, no symmetry is enforced (P1).
-id <DESCRIPTION>    Project description. By default, the command line content is used.
-an <ANISOMETRY>     Particle anisometry: oblate (O), prolate (P) or unknown (default).
-dr <DIRECTION>      Direction of anisometry, applicable with P2 symmetry only: along (L), across (C) or unknown (default).
-un <UNIT>           Angular unit of the input file, either 1 [1/Angstrom] or 2 [1/nm]; undefined by default.
-h                   Print a summary of arguments and options and exit.
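
For scripted batch runs, the argument list can be assembled programmatically. The following is a minimal sketch that uses only the arguments and options listed above; the helper name gasbor_argv and its keyword names are hypothetical, and the file name 1trk.out is borrowed from the Transketolase example later in this manual. The command is only built here, not executed.

```python
def gasbor_argv(program, gnom_file, num_drs, *, prefix=None, symmetry=None,
                description=None, anisometry=None, direction=None, unit=None):
    """Build an argument list: <program> <GNOMFILE> <Num_DRs> [OPTIONS]."""
    argv = [program, gnom_file, str(num_drs)]
    options = [
        ("-lo", prefix),       # prefix for output filenames
        ("-sy", symmetry),     # point symmetry, e.g. "P2"
        ("-id", description),  # project description
        ("-an", anisometry),   # "O" or "P"
        ("-dr", direction),    # "L" or "C" (P2 symmetry only)
        ("-un", unit),         # 1 [1/Angstrom] or 2 [1/nm]
    ]
    for flag, value in options:
        if value is not None:  # omitted options keep the program defaults
            argv += [flag, str(value)]
    return argv

cmd = gasbor_argv("gasbori", "1trk.out", 680, symmetry="P2", unit=2)
print(" ".join(cmd))
```

A list like this can be passed to subprocess.run; keeping it as a list avoids shell quoting problems with descriptions that contain spaces.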

Interactive Configuration

GASBOR reads in output files of GNOM.

There are two versions of GASBOR: one fits the intensity in reciprocal space (GASBORI), and the other fits the real-space P(r) function (GASBORP). The algorithms of the two versions are similar. The reciprocal-space version is slower but usually yields better fits to the experimental data. The real-space version is much faster and should be used when the number of dummy residues makes the runtime excessive (the runtime is proportional to the square of the number of dummy residues).

In addition, the reciprocal-space version is available in an implementation that accounts for oligomeric equilibrium (GASBORMX). In this case, an ab initio model of a symmetric oligomer is built while assuming that some fraction of monomers is present in solution (i.e. a polydisperse sample).

After starting GASBOR one may specify:

Each entry below gives the prompt, its possible value(s), its default value (where one exists) and a description.

Computation mode
    Values: User or Expert. Default: User.
    After choosing Expert, GASBOR will let you configure additional expert-mode parameters.

In User and Expert mode:

Project identificator
    Values: lyz or any other legal filename prefix.
    Filename for the log (here: lyz.log) and other output files.

Enter project description
    Values: any text, e.g. "Lysozyme at 3 mg/ml".
    Description that will be put into the log file.

Total number of curves to fit
    Values: 1 < integer < 10. Default: 1.
    This question is asked only by GASBORMX, which may fit a concentration series of an oligomeric equilibrium.

Input data, GNOM output file name
    Values: a filename, e.g. lyz.out.
    Input file with valid GNOM output. If GASBOR does not accept the file, check that the GNOM run has finished and that the P(r) function was written to the file. This question is asked once for each curve, i.e. as many times as the total number of curves to fit in GASBORMX.

Angular units in the input file
    Values: 1 or 2.
    1 means that the data unit is Å⁻¹; 2 means that the data unit is nm⁻¹. This question is asked once for each curve, i.e. as many times as the total number of curves to fit in GASBORMX.

Portion of the curve to be fitted
    Values: 0.001-1.0. Default: 1.0 (entire curve).
    Whether the curve should be fitted in its entirety or only in part. This question is asked once for each curve, i.e. as many times as the total number of curves to fit in GASBORMX.

Volume fraction of monomer (if known)
    Values: -1.0; (0.0, 1.0). Default: -1.0 (unknown).
    If a positive number (below 1.0) is given, this volume fraction of the monomer is kept fixed in the course of modelling. This question is asked only by GASBORMX.

Initial DRM
    Values: a filename, e.g. gasbor.pdb. Default: none.
    Enter a filename if you want to start with a model from a previous GASBOR run. Otherwise just press CR.

Symmetry: P1...19 or Pn2 (n=1,..,12) or P23 or P432 or PICO
    Values: P1...P19 or P12...P122 or P23 or P432 or PICO. Default: P2 for GASBORMX, otherwise P1.
    Particle symmetry to be enforced. The number of residues given afterwards refers to a single asymmetric unit (monomer).

Number of residues in asymmetric part
    Values: integer > 0. Default: none.
    Number of residues within a single asymmetric unit.

Fibonacci grid order
    Values: 0...18. Default: the order that gives a number of waters close to the number of dummy residues.
    Order of the Fibonacci grid used to generate dummy waters.

Expected particle shape: <P>rolate, <O>blate, or <U>nknown
    Values: P, O or U. Default: U.
    Constrains the particle shape if it is known to be significantly non-globular (non-spherical). Gives more accurate results in this case.

In Expert mode only:

Number of knots in the curve to fit
    Values: 11...201. Default: 42.
    The regularized intensity is recomputed to have this many points for fitting.

Radius of the search volume
    Values: positive real number. Default: Dmax/2.
    Radius of the volume in which dummy atoms will be placed. Limits the sampling space.

Histogram penalty weight
    Values: positive real number. Default: 1.000e-3.
    Weight of the penalty applied when the histogram of inter-residue distances looks different from that expected for a protein.

Bond length penalty weight
    Values: positive real number. Default: 1.000e-2.
    Penalty for bond lengths other than 3.8 Å.

Discontiguity penalty weight
    Values: positive real number. Default: 1.000e-2.
    Penalty for disconnected dummy residues.

Peripheral penalty weight
    Values: positive real number. Default: 1.0.
    Penalty term that ensures a compact arrangement of DRs at the beginning. The weight is gradually reduced in the course of simulated annealing.

Contrast of the hydration layer
    Values: positive real number. Default: 3.000e-2.
    Contrast of the hydration layer relative to the solvent.

Sequence file name
    Values: any filename, e.g. lyzozyme.seq. Default: none.
    File with the protein sequence, used to compute sequence-specific dummy residue form factors. Among other limitations, lines in this file must not exceed 256 characters.

Weight
    Values: 0...6. Default: 2.
    0    weight the I(s) fit according to s²
    1    as above, with constant for s<MaxPor
    2    as above, with average for s<MaxPor
    3    weight I(s) proportionally to s
    4    as above, with constant for s<MaxI*s
    5    as above, with average for s<MaxI*s
    6    compute the fit in logarithmic scale

Account for constant background
    Values: Yes or No. Default: Yes.
    Whether a constant background should be subtracted when fitting.

Initial scale factor
    Values: positive real number. Default: depends on input.
    Initial scaling factor for fitting the experimental data.

Fixing threshold for Rf
    Default: 0.0.
    Obsolete.

Fixing threshold for PenCha
    Default: 0.0.
    Obsolete.

Fixing threshold for PenLen
    Default: 0.0.
    Obsolete.

Initial annealing temperature
    Values: positive real number. Default: 1.000e-3.
    The initial temperature of the annealing process defines the probability of jumping into a state of higher pseudo-energy (worse score) on each Monte Carlo step.

Annealing schedule factor
    Values: positive real number < 1.0. Default: 0.9000.
    The temperature is multiplied by this factor after each round of simulated annealing in order to decrease it.

# of independent atoms to modify
    Values: integer > 0. Default: 1.
    Number of atoms to reposition on each annealing step.

Max # of iterations at each T
    Values: integer ≥ 0. Default: 45000.
    Each round of simulated annealing terminates after at most this many iterations, and the temperature is then decreased.

Max # of successes at each T
    Values: integer > 0. Default: 4500.
    Each round of simulated annealing terminates early after this many successful iterations, and the temperature is then decreased.

Min # of successes to continue
    Values: integer > 1. Default: 45.
    The program terminates after a round of simulated annealing gives fewer than this many successes.

Max # of annealing steps
    Values: integer > 0. Default: 100.
    Maximum number of annealing steps, after which the program always terminates.
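
The effect of the initial annealing temperature and the annealing schedule factor can be illustrated with a few lines of code. This is a sketch of a plain geometric cooling schedule, assuming the factor is applied exactly once per round; the function name is hypothetical. The temperatures printed in the sample logs later in this manual (0.250E-04 at j = 36 and 0.304E-05 at j = 56) are consistent with this assumption.

```python
def temperature(step, t0=1.000e-3, factor=0.9000):
    """Temperature of annealing round `step` (1-based) under geometric cooling."""
    return t0 * factor ** (step - 1)

# Print the temperature for the first round and for the last rounds
# shown in the two example logs.
for j in (1, 36, 56):
    print(f"j: {j:3d}  T: {temperature(j):.3E}")
```

With the default factor of 0.9, the temperature drops by roughly a factor of 40 over the 36 rounds of the lysozyme example.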

Runtime Output

After printing the program version number and querying or printing all parameters, GASBOR displays the message "Simulated annealing procedure started". After each round of simulated annealing at a new temperature, it prints a two-line report:

 j:   1 T: 0.100E-02 Suc:  5500 Eva:    11544 CPU:  0.427E+02 SqF: 0.5172
  Rf: 0.08396 His: 26.20 Bnd: 1.302 Dis:0.1593 Per :0.2196

Report header    Columns    Description
j:               4-7        Iteration number.
T:               11-20      Temperature of the iteration.
Suc:             27-31      Number of successes in the given iteration.
Eva:             38-45      Total number of function evaluations until the end of this iteration.
CPU:             52-61      Total CPU time from the beginning of the run until the end of this iteration.
SqF:             68-73      Square root of the target function at the end of the iteration.
Rf:              6-13       R-factor penalty at the end of the iteration.
His:             19-24      Histogram penalty at the end of the iteration.
Bnd:             30-35      Bond length penalty at the end of the iteration.
Dis:             41-45      Discontiguity penalty at the end of the iteration.
Per :            53-58      Peripheral penalty at the end of the iteration.

After the run is completed, the final χ² against the data is printed to the output.
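
When monitoring long runs, it can be convenient to turn a report into named numbers. The sketch below parses the sample report shown above with a regular expression rather than with the fixed column positions; the function name parse_report is hypothetical, and the approach assumes that every field has the form "label: number".

```python
import re

# A label of letters, an optional space, a colon, then a decimal number
# with an optional exponent (e.g. 0.100E-02).
FIELD = re.compile(r"([A-Za-z]+)\s*:\s*([-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?)")

def parse_report(text):
    """Return a dict mapping report labels (j, T, Suc, ...) to floats."""
    return {label: float(value) for label, value in FIELD.findall(text)}

sample = """ j:   1 T: 0.100E-02 Suc:  5500 Eva:    11544 CPU:  0.427E+02 SqF: 0.5172
  Rf: 0.08396 His: 26.20 Bnd: 1.302 Dis:0.1593 Per :0.2196"""

report = parse_report(sample)
print(report["Suc"], report["SqF"], report["Per"])
```

The regular expression tolerates the irregular spacing around the colons ("Dis:0.1593", "Per :0.2196"), which a naive split on whitespace would not.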

gasbor Input Files

The only input file is a GNOM output file containing both the regularized scattering curve and the P(r) function (the latter is used by the real-space version of GASBOR).

gasbor Output Files

After the program is finished, you will get the files:

Filename        Description
<name>.log      Log file.
<name>.fit      Fit to the data desmeared and smoothed by GNOM (GASBORI).
<name>-i.fit    Fits to the data desmeared and smoothed by GNOM (GASBORMX), where i runs from 1 to the total number of curves to fit.
<name>.hst      Fit to the GNOM data in real space (GASBORP).
<name>.fir      Fit to the raw experimental data.
<name>-i.fir    Fits to the raw experimental data (GASBORMX), where i runs from 1 to the total number of curves to fit.
<name>.pdb      Resulting model in a PDB-like format that can be viewed e.g. with RasMol in spacefill mode or with MASSHA.

PDB output

The PDB-like output file from GASBOR contains:

Atoms                  Meaning
C-α atoms (code CA)    Positions of dummy residues.
H atoms                Positions of dummy bound waters.
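
Dummy residues and dummy waters in such a file can be told apart by atom name. The following is a minimal sketch, assuming that the records follow the standard fixed-column PDB layout (atom name in columns 13-16) and start with ATOM or HETATM; neither detail is stated in this manual. The demo records, including the residue names GLY and HOH and all coordinates, are made up for illustration.

```python
def count_atom_names(pdb_lines):
    """Count ATOM/HETATM records by atom name (PDB columns 13-16)."""
    counts = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            counts[name] = counts.get(name, 0) + 1
    return counts

def demo_record(serial, name, res_name, x, y, z):
    """Build a made-up fixed-column ATOM record, for demonstration only."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} A{serial:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

demo = [
    demo_record(1, "CA", "GLY", 0.0, 0.0, 0.0),  # dummy residues
    demo_record(2, "CA", "GLY", 3.8, 0.0, 0.0),
    demo_record(3, "CA", "GLY", 7.6, 0.0, 0.0),
    demo_record(4, "H", "HOH", 0.0, 5.0, 0.0),   # dummy waters
    demo_record(5, "H", "HOH", 3.8, 5.0, 0.0),
]

counts = count_atom_names(demo)
print(counts)
```

For a real model, the CA count should match the number of dummy residues reported in the log, which gives a quick sanity check on the file.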

Limitations

Maximum number of dummy residues and waters
    Limit: dummy atoms < 8000.
    As the water shell may be reasonably represented with a ratio of number of residues to number of waters not exceeding 3, the program can currently handle proteins with a total number of residues not exceeding 6000 (i.e. a total MM not exceeding ~700 kDa).

Speed
    Limit: O(N²), where N is the number of dummy atoms.
    A GASBOR run on lysozyme (129 residues) on a PIV 2.2 GHz machine required less than an hour of CPU time using GASBORI and less than 20 min using GASBORP. The CPU time grows quadratically with the number of residues, so runs on proteins with high molecular mass may take a long time.

For large proteins (>2000 amino acids), DAMMIF/DAMMIN is recommended: it will run much faster and give similar results. For large macromolecules the influence of the internal structure is less important, and the shape approximation does a good job.
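
The two limits above can be turned into quick back-of-the-envelope checks. The sketch below is an illustration under simplifying assumptions: it takes the smallest water shell allowed by the quoted ratio of 3, and it scales CPU time purely with the square of the total number of residues, ignoring waters, symmetry and the number of annealing rounds. The function names are hypothetical.

```python
import math

MAX_DUMMY_ATOMS = 8000      # limit quoted above
MAX_RESIDUES_PER_WATER = 3  # residues/waters ratio quoted above

def total_dummy_atoms(n_residues):
    """Residues plus the smallest water shell allowed by the quoted ratio."""
    waters = math.ceil(n_residues / MAX_RESIDUES_PER_WATER)
    return n_residues + waters

def relative_cpu_time(n_residues, n_reference=129):
    """Quadratic scaling relative to a reference run (lysozyme, 129 residues)."""
    return (n_residues / n_reference) ** 2

print(total_dummy_atoms(6000))             # compare with MAX_DUMMY_ATOMS
print(f"{relative_cpu_time(1360):.0f}x")   # transketolase dimer vs lysozyme
```

For the 1360-residue transketolase dimer this predicts a run about 111 times longer than for lysozyme; the CPU values in the two example logs below (0.378E+03 and 0.386E+05) differ by a factor of roughly 100, which is of the same order.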

Examples

Lysozyme

Lysozyme has no symmetry and 129 residues.
Enter P1 symmetry, 129 residues and the default answers to all other questions.

You may also use the command line:

$ gasbori gnlyzfu.out 129

Here is the resulting output:


  ***  Ab inito reconstruction of a protein structure    ***
  ***   by a chain-like ensemble of dummy residues       ***
  ***        Version 2.2i build 31.07.08                 ***
  ***      Last modified        ---   31/07/08 20:00     ***
  ***  Please reference: D.I.Svergun, M.V.Petoukhov &    ***
  ***   M.H.J.Koch (2001) Biophys. J. 80, 2946-2953      ***
  ***   Copyright (c) ATSAS Team                         ***
  ***   EMBL, Hamburg Outstation, 2000 - 2008            ***

   Type gasbori /help for batch mode use

  === GASBOR Version 2.2i build 31.07.08 started on   29-Sep-2009   13:37:34

 Project identificator .................................. : gnlyzf
 Enter project description .............. :
 Random sequence initialized from ....................... : 133734
  ** Information read from the GNOM file **
 Data set title:    Angular axis n01000.sax             Datafile n10000.sub
 Raw data file name:  lyzful.dat
 Maximum diameter of the particle ....................... : 50.00
  Solution at Alpha =  0.500E+00   Rg :  0.144E+02   I(0) :   0.526E+03
 Radius of gyration ..................................... : 14.40
 Number of GNOM data points ............................. : 230
 Maximum s value [1/angstrom] ........................... : 1.316
 Number of Shannon channels ............................. : 20.95
 Number of knots in the curve to fit .................... : 42
 Symmetry: P1...19 or Pn2 (n=1,..,12)
 Number of equivalent positions ......................... : 1
  Number of dummy waters ................................ : 90
 Excluded volume per residue ............................ : 28.73
 Radius of the search volume ............................ : 25.00
 Histogram penalty weight ............................... : 1.000e-3
 Bond length penalty weight ............................. : 1.000e-2
 Discontiguity penalty weight ........................... : 1.000e-2
 Peripheral penalty weight .............................. : 1.000
 Expected particle shape: <P>rolate, <O>blate,
 Contrast of the hydration layer ........................ : 3.000e-2
  Computation of the initial intensity ...
 Histogram penalty value ................................ : 36.62
 Bond length penalty value .............................. : 1.930
 Initial DRM # of graphs ................................ : 61
 Discontiguity   value .................................. : 1.196
 Peripheral penalty value ............................... : 0.2645
 Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
 *** Accounting for constant background ***
 Initial scale factor ................................... : 1.409e-4
 Constant background subtracted ......................... : -1.095
 Initial R^2 factor ..................................... : 5.796e-2
 Initial R   factor ..................................... : 0.2408
 Initial penalty ........................................ : 0.3324
 Initial fVal ........................................... : 0.3904
 R-factor fixing threshold .............................. : 0.0
 Fixing threshold PenCha ................................ : 0.0
 Fixing threshold PenLen ................................ : 0.0
 Initial annealing temperature .......................... : 1.000e-3
 Annealing schedule factor .............................. : 0.9000
 # of independent atoms to modify ....................... : 1
 Max # of iterations at each T .......................... : 55000
 Max # of successes at each T ........................... : 5500
 Min # of successes to continue ......................... : 55
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E-02 Suc:  5500 Eva:    11556 CPU:  0.329E+01 SqF: 0.5116
  Rf: 0.10745 His: 26.97 Bnd: 1.419 Dis:0.0807 Per :0.2082
...
 j:  36 T: 0.250E-04 Suc:    55 Eva:  1425327 CPU:  0.378E+03 SqF: 0.0936
  Rf: 0.02932 His:  6.93 Bnd: 0.067 Dis:0.0000 Per :0.4966

 Final Chi against raw data ............................. : 0.9592

   === GASBOR Version 2.2i build 31.07.08 finished on   29-Sep-2009   13:43:55

  Use in the batch mode:
  gasbori <Inp_File> <Num_DRs> [/key1 <key1>]...
  [/keyN <keyN>]
  where the compulsory arguments Inp_File and Num_DRs
  are the name of a GNOM output file (extension .out)
  and the number of dummy residues in asymmetric part

  The following program options can be given as keys
  with their values (defaults are given in brackets)
  /lo   Log file name   (same as the GNOM file name)
  /sy   Particle symmetry                       (P1)
  /id   Project description   (command line content)
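
The initial values printed in the log above can be cross-checked by hand. The sketch below reflects an observation about this sample log, not a documented formula: the logged "Initial penalty" matches the sum of the four penalty values multiplied by their weights, "Initial fVal" matches that penalty plus the initial R² factor, and SqF in the first report matches the square root of Rf² plus the weighted penalties. All numbers are copied from the log; the function name is hypothetical.

```python
import math

# Penalty weights as printed in the log (Peripheral weight at its initial value).
WEIGHTS = {"His": 1.000e-3, "Bnd": 1.000e-2, "Dis": 1.000e-2, "Per": 1.000}

def weighted_penalty(values, weights=WEIGHTS):
    """Sum of penalty values multiplied by their weights."""
    return sum(weights[name] * values[name] for name in weights)

# Initial penalty values and R^2 factor from the lysozyme log.
initial = {"His": 36.62, "Bnd": 1.930, "Dis": 1.196, "Per": 0.2645}
r_squared = 5.796e-2

penalty = weighted_penalty(initial)   # log: Initial penalty 0.3324
f_val = r_squared + penalty           # log: Initial fVal 0.3904
r_factor = math.sqrt(r_squared)       # log: Initial R factor 0.2408

# First report (j = 1): Rf and penalties, compared with SqF 0.5116.
first = {"His": 26.97, "Bnd": 1.419, "Dis": 0.0807, "Per": 0.2082}
sqf = math.sqrt(0.10745 ** 2 + weighted_penalty(first))

print(f"{penalty:.4f} {f_val:.4f} {r_factor:.4f} {sqf:.4f}")
```

The agreement holds only while the peripheral weight is at its initial value; as noted in the parameter description, that weight is gradually reduced during annealing, so later reports cannot be reproduced with a fixed weight of 1.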

Transketolase

Transketolase is a homodimer in solution, and each monomer has 680 residues, giving a total of 1360 residues.
Enter P2 for the symmetry, 680 for the number of residues and the default answers to all other questions.


  ***  Ab inito reconstruction of a protein structure    ***
  ***   by a chain-like ensemble of dummy residues       ***
  ***        Version 2.2i build 21.06.06                 ***
  ***      Last modified        ---   21/06/06 12:00     ***
  ***  Please reference: D.I.Svergun, M.V.Petoukhov &    ***
  ***   M.H.J.Koch (2001) Biophys. J. 80, 2946-2953      ***
  ***   Copyright (c) ATSAS Team                         ***
  ***   EMBL, Hamburg Outstation, 2000 - 2005            ***

   Type gasbori /help for batch mode use

  === GASBOR Version 2.2i build 21.06.06 started on   06-Oct-2009   16:42:28

 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: log
 Input data, GNOM output file name ...... <         .out >: 1trk.out
 Project identificator .................................. : log
 Enter project description .............. : project
 Random sequence initialized from ....................... : 164228
  ** Information read from the GNOM file **
 Data set title:    Transketolase collated from n85, o14+o16   6-11-98
 Raw data file name:  trkexp.dat
 Maximum diameter of the particle ....................... : 12.00
  Solution at Alpha =   .164E+01   Rg :   .336E+01   I(0) :    .190E+03
 Radius of gyration ..................................... : 3.360
 Number of GNOM data points ............................. : 283
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)  <            2 >: 2
 Angular units multiplied by ............................ : 0.1000
 Maximum diameter divided by ............................ : 0.1000
 Maximum s value [1/angstrom] ........................... : 0.3418
 Number of Shannon channels ............................. : 13.06
 Portion of the curve to be fitted ...... <        1.000 >:
 Number of knots in the curve to fit .................... : 26
 Initial DRM (CR for random) ............ <         .pdb >:
 Symmetry: P1...19 or Pn2 (n=1,..,12)
 or P23 or P432 or PICO ................. <           P1 >: P2
 Number of equivalent positions ......................... : 2
 Number of residues in asymmetric part .. <          517 >: 680
 Fibonacci grid order ................... <           15 >:
 Number of dummy waters ................................ : 988
 Excluded volume per residue ............................ : 28.73
 Radius of the search volume ............................ : 60.00
 Histogram penalty weight ............................... : 1.000e-3
 Bond length penalty weight ............................. : 1.000e-2
 Discontiguity penalty weight ........................... : 1.000e-2
 Peripheral penalty weight .............................. : 1.000
 Expected particle shape: <P>rolate, <O>blate,
  or <U>nknown .......................... <      Unknown >:
 Contrast of the hydration layer ........................ : 3.000e-2
  Computation of the initial intensity ...
 Histogram penalty value ................................ : 37.38
 Bond length penalty value .............................. : 1.604
 Initial DRM # of graphs ................................ : 708
 Discontiguity   value .................................. : 2.191
 Peripheral penalty value ............................... : 0.2647
 Weight: 0-2 = s^2, 3-5 = s, 6 = log .................... : 2
 *** Accounting for constant background ***
 Initial scale factor ................................... : 5.042e-7
 Constant background subtracted ......................... : 0.3339
 Initial R^2 factor ..................................... : 3.837e-2
 Initial R   factor ..................................... : 0.1959
 Initial penalty ........................................ : 0.3400
 Initial fVal ........................................... : 0.3784
 R-factor fixing threshold .............................. : 0.0
 Fixing threshold PenCha ................................ : 0.0
 Fixing threshold PenLen ................................ : 0.0
 Initial annealing temperature .......................... : 1.000e-3
 Annealing schedule factor .............................. : 0.9000
 # of independent atoms to modify ....................... : 1
 Max # of iterations at each T .......................... : 130000
 Max # of successes at each T ........................... : 13000
 Min # of successes to continue ......................... : 130
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E-02 Suc: 13000 Eva:    14987 CPU:  0.142E+03 SqF: 0.5510
  Rf: 0.11559 His: 36.01 Bnd: 1.958 Dis:0.4537 Per :0.2301
...
 j:  56 T: 0.304E-05 Suc:    63 Eva:  3730743 CPU:  0.386E+05 SqF: 0.0789
  Rf: 0.01835 His:  5.43 Bnd: 0.046 Dis:0.0000 Per :0.3158

 Final Chi against raw data ............................. : 1.211

   === GASBOR Version 2.2i build 21.06.06 finished on   07-Oct-2009   03:25:52

  Use in the batch mode:
  gasbori <Inp_File> <Num_DRs> [/key1 <key1>]...
  [/keyN <keyN>]
  where the compulsory arguments Inp_File and Num_DRs
  are the name of a GNOM output file (extension .out)
  and the number of dummy residues in asymmetric part

  The following program options can be given as keys
  with their values (defaults are given in brackets)
  /lo   Log file name   (same as the GNOM file name)
  /sy   Particle symmetry                       (P1)
  /id   Project description   (command line content)
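
The unit handling in the log above can be followed step by step. The sketch below assumes that choosing angular unit 2 converts nm to Å by multiplying Dmax by 10 and s by 0.1, as the lines "Angular units multiplied by" and "Maximum diameter divided by" suggest, and that the number of Shannon channels is Dmax·smax/π, which reproduces the values logged in both examples. The function names are hypothetical, and the input smax of 3.418 nm⁻¹ is inferred from the logged 0.3418 Å⁻¹.

```python
import math

def to_angstrom(d_max, s_max, unit):
    """Convert (Dmax, smax) to Angstrom units; unit 1 = 1/Angstrom, 2 = 1/nm."""
    if unit == 2:
        return d_max * 10.0, s_max * 0.1
    return d_max, s_max

def shannon_channels(d_max, s_max):
    """Assumed definition: Dmax * smax / pi (Dmax and 1/smax in the same unit)."""
    return d_max * s_max / math.pi

# Transketolase: GNOM file in nm (Dmax 12.00, inferred smax 3.418 1/nm).
d_trk, s_trk = to_angstrom(12.00, 3.418, unit=2)
# Lysozyme: values from the first example log, already in Angstrom units.
d_lyz, s_lyz = to_angstrom(50.00, 1.316, unit=1)

print(f"{shannon_channels(d_trk, s_trk):.2f}  radius {d_trk / 2:.2f}")
print(f"{shannon_channels(d_lyz, s_lyz):.2f}  radius {d_lyz / 2:.2f}")
```

The radii computed as Dmax/2 (60 and 25 Å) agree with the "Radius of the search volume" lines of the two logs, matching the default given in the parameter table.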

References

  1. Svergun, D.I., Petoukhov, M.V. & Koch, M.H.J. (2001) Determination of domain structure of proteins from X-ray solution scattering. Biophys. J. 80, 2946-2953.
  2. Petoukhov, M.V., Franke, D., Shkumatov, A.V., Tria, G., Kikhney, A.G., Gajda, M., Gorba, C., Mertens, H.D.T., Konarev, P.V. & Svergun, D.I. (2012) New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Cryst. 45, 342-350.

  Last modified: April 11, 2013

© BioSAXS group 2013