EMBL Hamburg Biological
Small Angle Scattering
BioSAXS
 

BUNCH manual


Written by M.V. Petoukhov and D.I. Svergun.
Post all your questions about BUNCH to the ATSAS Forum.

© ATSAS Team, 2003-2009


Manual

The following describes the method implemented in BUNCH, the dialog prompt, and the required input and produced output files.

BUNCH implements the algorithm described by Petoukhov, M.V. & Svergun, D.I. (2005) "Global rigid body modeling of macromolecular complexes against small-angle scattering data." Biophys J. 89, 1237-1250.

Introduction

BUNCH models multidomain proteins against SAXS data using a combined rigid-body and ab initio approach. When the structures of the individual domains are available, the program can determine the three-dimensional domain structure of a protein based on multiple scattering data sets from deletion mutants.

A simulated annealing protocol is used to find the optimal positions and orientations of the available high-resolution domain models, together with probable conformations of the dummy-residue (DR) chains attached to the appropriate terminal residues of the domains, so as to fit the experimental scattering data from all constructs simultaneously. Domain/loop arrangements with steric clashes, DR loops with an improper distribution of bond or dihedral angles, and overly extended loops are penalized.

The theoretical scattering patterns I(s) are calculated using spherical harmonics, both from the available high-resolution coordinates of the domains with known structure and from the portions of unknown structure, which are represented as dummy residues. The partial scattering amplitudes A_lm(s) of a domain in an arbitrary position and orientation depend on its partial amplitudes in the reference position and on three rotational and three translational parameters. The reference partial amplitudes are precomputed by the program CRYSOL. The partial amplitudes of the dummy residues forming the unknown parts are calculated using the form factor of a dummy residue.
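Once all partial amplitudes refer to the same frame, they simply add up, and the orientationally averaged intensity follows from the standard spherical-harmonics expression I(s) = 2π² Σ_l Σ_m |A_lm(s)|². The NumPy sketch below illustrates only this last step (rotation and translation of amplitudes are not shown). The function name and the array layout [s, l, m + lmax] are hypothetical choices for this illustration, not the CRYSOL .alm format.

```python
import numpy as np

def total_intensity(partial_amplitudes, lmax):
    """Sum partial amplitudes of all bodies (domains and dummy residues) and return I(s).

    Hypothetical layout: each array has shape (n_s, lmax + 1, 2 * lmax + 1) and
    entry [k, l, m + lmax] holds A_lm(s_k); entries with |m| > l are assumed zero.
    """
    expected = (lmax + 1, 2 * lmax + 1)
    total = np.zeros_like(partial_amplitudes[0], dtype=complex)
    for amp in partial_amplitudes:
        if amp.shape[1:] != expected:
            raise ValueError("unexpected amplitude array shape")
        total += amp  # amplitudes (not intensities) are additive
    # I(s) = 2 * pi^2 * sum over l and m of |A_lm(s)|^2
    return 2.0 * np.pi**2 * np.sum(np.abs(total) ** 2, axis=(1, 2))
```

Because amplitudes rather than intensities are summed, two identical bodies give four times the intensity of one, not twice: the cross terms carry the information about the relative arrangement of the domains.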

Symmetry can be taken into account (it must be the same for all deletion mutants): BUNCH then searches for the configuration of the asymmetric part and generates the rest according to the symmetry rules.
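The axis convention given for the Symmetry prompt in the dialog table (n-fold axis along Z, additional two-fold axis along Y) can be illustrated with a short NumPy sketch that generates the symmetry-related copies of the asymmetric part. It only shows the geometry; the function names are hypothetical and the way BUNCH applies the operators internally is an assumption.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the Z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Two-fold rotation about Y: (x, y, z) -> (-x, y, -z)
ROT_Y_180 = np.diag([-1.0, 1.0, -1.0])

def symmetry_mates(coords, n, dihedral=False):
    """Return all copies of coords (N x 3 array) for Pn, or for Pn2 if dihedral=True."""
    ops = [rot_z(2.0 * np.pi * k / n) for k in range(n)]
    if dihedral:
        # Pn2: every n-fold copy is also flipped by the two-fold axis along Y
        ops += [ROT_Y_180 @ op for op in ops]
    # coords holds row vectors, so R @ v for each row becomes coords @ R.T
    return [coords @ op.T for op in ops]
```

For P2 (as in the GST-DHFR example), the second copy is the first rotated by 180° about Z; P222 corresponds to n = 2 with dihedral=True and yields four copies.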

Please refer to the paper cited above for further details about the implemented algorithm.

Running BUNCH

Interactive Configuration

BUNCH can only be run in dialog mode; no command-line arguments are accepted. Before running BUNCH, a starting model must be prepared with pre_bunch (see Starting model below). There are two modes, EXPERT and USER. In EXPERT mode the user can adjust any program parameter. In USER mode fewer questions are asked because default values are used for most parameters, so the user only needs to provide basic input. The default settings are the same in both modes.

BUNCH interactive prompt:

Screen text | Default | Asked in USER mode? | Description
Computation mode (User or Expert) | USER | Y | Mode selection.
Log file name | N/A | Y | Project identifier; used as a prefix for all output file names.
Enter project description | N/A | Y | Any text; it is stored in the log file.
Initial structure | N/A | Y | Name of the input file for BUNCH, previously generated with pre_bunch.
DR formfactor multiplier | 1.0 | N | Weight of the DR form factors. For instance, an increased value (~1.2) makes it possible to account for extra hydration if the loops are known to be exposed to the solvent. A negative value means that individual (primary sequence-based) form factors are used instead of the averaged one.
Symmetry: P1...19 or Pn2 (n=1,..,12) | P1 | Y | Supported symmetries are P1, P2-P19 (up to nineteen-fold), P222 and P32-P(12)2. The n-fold axis is typically Z; if there is an additional two-fold axis, it coincides with Y.
Angles penalty weight | 10.0 | N | How strongly the bond angles penalty influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If the resulting bond angles do not look good, try increasing this weight.
Dihedrals penalty weight | 1.0 | N | How strongly the dihedral angles penalty influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If the resulting dihedral angles do not look good, try increasing this weight.
Cross penalty weight | 100.0 | N | How strongly the cross (clash) penalty influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. If clashes between the loops or domains are observed, try increasing this weight.
Extended loops penalty weight | 1.0 | N | Weight of the penalty that keeps the radii of gyration (Rg) of the missing portions "moderate". Increase this weight if the missing portions are known to form folded domains. Decrease it or switch the penalty off if the loops are known to be extended or disordered.
Distances penalty weight | 10.0 | N | Weight of the penalty ensuring that the histogram of distances between the 20 closest DRs along the chain is compatible with the averaged distribution of 20 successive CA atoms in the backbones of disordered loops.
Shift penalty weight | 1.0 | N | How strongly a shift of the entire protein away from the origin influences the acceptance or rejection of a mutation. A value of 0.0 disables the penalty. If unsure, use the default value. This penalty keeps the model close to the origin so that the higher-order harmonics are not lost and the scattering is computed accurately.
File name, contact conditions, CR for none <.cnd> | empty | Y | If information on contacting residues is available, it may be used as a modeling restraint. It is provided in a file with a special format (see Contacts below). Press Enter (CR) to skip; by default no information is given.
Contacts penalty weight | 10.0 | N | How strongly improper contacts influence the acceptance or rejection of a mutation. If unsure, use the default value. If the desired interfaces are not obtained, try increasing this weight. Asked only if a contact conditions file is provided.
Input total number of scattering curves | 1 | Y | If, in addition to the entire multidomain protein, scattering curves of its partial constructs (deletion mutants) are available, they can be fitted simultaneously, assuming the same arrangement of domains in all constructs.
Use Kratky geometry | N | N | If the answer is Yes, the computed curves are smeared to fit data from a Kratky camera.
Input first & last residues in 1-st construct | 1,var | Y | Range of residues present in the given construct (scattering curve). Asked once per construct, i.e. as many times as the total number of scattering curves given in the previous answer. The default is from residue 1 to the last residue of the full-length protein.
Enter file name, 1-st experimental data <.dat> | N/A | Y | Name of the data file containing the experimental SAXS profile of the construct. Asked for each construct.
Angular units in the input file: (1) 4*pi*sin(theta)/lambda [1/angstrom], (2) 4*pi*sin(theta)/lambda [1/nm], (3) 2*sin(theta)/lambda [1/angstrom], (4) 2*sin(theta)/lambda [1/nm] | 1 | Y | Formula for the scattering vector in the data file and its units. Asked for each construct.
Fitting range in fractions of Smax | 1.0 | Y | Fraction of the scattering curve to fit, starting at the first point. The default is the entire curve. Asked for each construct.
Amplitudes, 1-st subunit <.alm> | N/A | Y | Name of the file with the partial scattering amplitudes of a domain computed by CRYSOL. Asked once per domain.
Fix the subunit at this position? [Y/N] | N | Y | The fixation option may be used to keep the desired relative arrangement of certain domains, e.g. to keep a known dimerization interface. Asked for each domain.
Angular step in degrees | 20.0 | Y | Maximal random rotation angle of a chain portion at a single modification of the system during simulated annealing.
Initial annealing temperature | 1.0 | N | Starting temperature of the simulated annealing protocol.
Annealing schedule factor | 0.9 | N | Factor by which the temperature is decreased at each step; 0.9 is a good average value. For slower cooling, increase the value (e.g. to 0.95).
Max # of iterations at each T | var | N | Finish the temperature step and cool after at most this many iterations. The default value is MAX(5000*number_of_unfixed_domains, 50*number_of_amino_acids).
Max # of successes at each T | var | N | Finish the temperature step and cool after at most this many successful mutations. The default value is MAX(500*number_of_unfixed_domains, 5*number_of_amino_acids).
Min # of successes to continue | var | N | Stop simulated annealing if fewer than this many successful mutations are achieved within a single temperature step. The default value is MAX(50*number_of_unfixed_domains, 0.5*number_of_amino_acids, 100).
Max # of annealing steps | 100 | N | Stop if simulated annealing has not finished after this many steps. The slower the system is cooled, the more temperature steps are required.
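The way the annealing parameters above terminate a temperature step and the whole run can be illustrated with a generic Metropolis simulated-annealing loop. This is a sketch, not BUNCH's code: the toy one-dimensional objective and all names are hypothetical, and BUNCH's real moves (rotations and translations of domains, DR chain modifications) and acceptance details may differ. Counting accepted moves as "successes" is an assumption.

```python
import math
import random

def anneal(f, x0, step, t0=1.0, factor=0.9, max_iter=10000,
           max_succ=1000, min_succ=100, max_steps=100, seed=0):
    """Generic simulated annealing with BUNCH-like stopping rules (sketch)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0                                    # Initial annealing temperature
    temperatures = []
    for _ in range(max_steps):                # Max # of annealing steps
        temperatures.append(t)
        successes = 0
        for _ in range(max_iter):             # Max # of iterations at each T
            y = x + rng.uniform(-step, step)  # random modification of the system
            fy = f(y)
            # Metropolis criterion: always accept improvements,
            # accept worse moves with probability exp(-(fy - fx) / t)
            if fy < fx or rng.random() < math.exp(-(fy - fx) / t):
                x, fx = y, fy
                successes += 1
                if fx < best_f:
                    best_x, best_f = x, fx
                if successes >= max_succ:     # Max # of successes at each T
                    break
        if successes < min_succ:              # Min # of successes to continue
            break
        t *= factor                           # Annealing schedule factor
    return best_x, best_f, temperatures
```

With the default schedule factor, the recorded temperatures are 1.0, 0.9, 0.81, ..., matching the T column of the runtime output in the examples below.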

Runtime Output

At runtime, two lines of output are generated for each temperature step:

 j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 CPU:  0.503E+02 F:30.8120 Pen: 28.0621
 The best chi values: 1.65827

The fields can be interpreted as follows, top-left to bottom-right:

Field | Description
j | Step number; starts at 1 and increases monotonically.
T | Temperature; starts at the initial annealing temperature and decreases at each step by the annealing schedule factor.
Suc | Number of successful mutations in this temperature step, bounded by the minimum and maximum number of successes. The number of successes should decrease slowly, and the first few steps should be terminated by the maximum-number-of-successes criterion. If instead the maximum number of iterations is reached, or the number of successes suddenly drops by a large amount, the system should probably be cooled more slowly.
Eva | Accumulated number of function evaluations.
CPU | Elapsed wall-clock time since the annealing procedure was started.
F | Best target function value obtained so far.
Pen | Accumulated penalty value of the best target function.
The best chi values | For each curve, the χ value of the best target function.
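To follow these fields over a long run (for example, to check that Suc stays at its maximum during the first steps), the log can be parsed with a few regular expressions. This is a hypothetical helper based only on the two-line format shown above; that several χ values are separated by whitespace when more than one curve is fitted is an assumption.

```python
import re

# One line per temperature step, e.g. " j:   1 T: 0.100E+01 Suc:  1000 ..."
STEP_RE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<Suc>\d+)\s+"
    r"Eva:\s*(?P<Eva>\d+)\s+CPU:\s*(?P<CPU>\S+)\s+F:\s*(?P<F>\S+)\s+Pen:\s*(?P<Pen>\S+)"
)
CHI_RE = re.compile(r"The best chi values:\s*(?P<chis>.*)")

def parse_steps(lines):
    """Collect one dict per annealing step from BUNCH screen or .log output."""
    steps = []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            steps.append({
                "j": int(m["j"]), "T": float(m["T"]), "Suc": int(m["Suc"]),
                "Eva": int(m["Eva"]), "CPU": float(m["CPU"]),
                "F": float(m["F"]), "Pen": float(m["Pen"]), "chi": [],
            })
            continue
        m = CHI_RE.search(line)
        if m and steps:  # chi lines printed before annealing starts are ignored
            steps[-1]["chi"] = [float(v) for v in m["chis"].split()]
    return steps
```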

BUNCH Input Files

Besides the starting model, BUNCH reads SAXS experimental data files (*.dat) in ASCII format with three columns: (1) scattering vector, (2) experimental intensity and (3) experimental errors. It also reads binary files (*.alm) with partial scattering amplitudes computed by CRYSOL.
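The three-column data files and the four angular-unit options offered by the dialog can be handled as in the sketch below, which also computes a χ value with an optimal scale factor, χ² = 1/(N-1) Σ [(Iexp - c·Icalc)/σ]². This is the common ATSAS-style definition; whether BUNCH normalizes χ exactly this way is an assumption. The function names are hypothetical, and header lines are simply skipped.

```python
import numpy as np

# Factors converting each angular-unit option of the dialog
# to s = 4*pi*sin(theta)/lambda in 1/angstrom.
UNIT_FACTORS = {
    1: 1.0,                # 4*pi*sin(theta)/lambda [1/angstrom]
    2: 0.1,                # 4*pi*sin(theta)/lambda [1/nm]
    3: 2.0 * np.pi,        # 2*sin(theta)/lambda    [1/angstrom]
    4: 2.0 * np.pi * 0.1,  # 2*sin(theta)/lambda    [1/nm]
}

def parse_dat(lines, units=1):
    """Return (s, I, sigma) from numeric rows; non-numeric header lines are skipped."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            rows.append([float(p) for p in parts[:3]])
        except ValueError:
            continue  # e.g. a text header with three words
    data = np.array(rows, dtype=float)
    return data[:, 0] * UNIT_FACTORS[units], data[:, 1], data[:, 2]

def chi(i_exp, i_calc, sigma):
    """Chi between data and model, using the least-squares scale factor c."""
    w = 1.0 / sigma**2
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    resid = (i_exp - c * i_calc) / sigma
    return np.sqrt(np.sum(resid**2) / (len(i_exp) - 1))
```

Usage, for a file such as Gst-D_med.dat from the examples: `with open("Gst-D_med.dat") as fh: s, i, err = parse_dat(fh, units=1)`.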

Starting model

The initial approximation is generated by the auxiliary program pre_bunch, which writes a PDB file containing a single CA chain whose length equals that of the full-length sequence (even if there are several symmetry-related polypeptide chains). In a two-domain protein, the gap between the domains must be at least four amino acids.

pre_bunch prompt:

Screen text | Default | Description
Input sequence file name | N/A | Name of the file with the full-length one-letter sequence of a single (monomeric) chain of the multidomain protein. Lines in this file must not exceed 256 characters.
Number of domains | N/A | Number of domains (separate PDB files with rigid bodies) in a monomer.
Input pdb file name | N/A | Asked successively for each domain. NOTE: the sequence as it appears in the PDB file MUST exactly match the corresponding part of the input sequence.
Shift the structure to the origin? [Y/N] | Y | For monomers (P1), it is recommended to start from a model centered at the origin. If, however, symmetry is present and the multimerization domains are to be fixed, the shift should not be applied.
Output pdb file name | N/A | This file is used as the starting model for BUNCH.
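Before running pre_bunch, it can help to check that each domain sequence matches the full-length sequence exactly and that the domains are far enough apart. The sketch below is a hypothetical helper, not part of ATSAS. It assumes the domains are listed in sequence order and applies the four-residue minimum gap to every pair of consecutive domains, which generalizes the two-domain rule stated above. In the GST-DHFR example the domains span residues 1-218 and 229-387, leaving a gap of 10 residues.

```python
def locate_domains(sequence, domain_sequences, min_gap=4):
    """Return the 1-based (first, last) residue range of each domain in the full sequence.

    Raises ValueError if a domain does not occur exactly in the sequence, or if
    two consecutive domains are separated by fewer than min_gap residues.
    """
    ranges = []
    start_search = 0
    for dom in domain_sequences:
        idx = sequence.find(dom, start_search)
        if idx < 0:
            raise ValueError("domain sequence does not exactly match the full-length sequence")
        ranges.append((idx + 1, idx + len(dom)))
        start_search = idx + len(dom)
    for (_, end1), (start2, _) in zip(ranges, ranges[1:]):
        gap = start2 - end1 - 1  # number of residues left for the DR chain
        if gap < min_gap:
            raise ValueError(f"gap of {gap} residues between domains, at least {min_gap} required")
    return ranges
```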

Contacts

The optional contact conditions file has a format similar to that used by SASREF, the only difference being that it refers to chains instead of subunits (domains). Each line following a "dist" keyword gives, for each of the two partners, the chain number followed by the first and last residue of a range. The following conditions require a distance of 7 Å between residues 25 and 115 of the same chain, and a distance of 5 Å between residues 40 of two symmetry-related chains.

      dist 7.0
      1 25 25 1 115 115
      dist 5.0
      1 40 40 2 40 40

If two (or more) alternatives are given after the line with the keyword "dist", the program compares the better (smaller) distance among them with the specified one.
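A contact conditions file in this format can be read and checked against a model as sketched below. The helper names are hypothetical. Taking the smallest CA-CA distance within each residue range, and treating a condition as met when the best alternative is not longer than the target, are assumptions made for illustration; BUNCH uses the conditions as a penalty, whose exact form is not described here.

```python
import math

def parse_cnd(lines):
    """Return a list of (target_distance, [(ch1, r1a, r1b, ch2, r2a, r2b), ...])."""
    conditions = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0].lower() == "dist":
            conditions.append((float(parts[1]), []))
        else:
            # an alternative belonging to the most recent "dist" line
            conditions[-1][1].append(tuple(int(p) for p in parts[:6]))
    return conditions

def min_distance(ca, ch1, r1a, r1b, ch2, r2a, r2b):
    """Smallest CA-CA distance between two residue ranges.

    ca maps (chain number, ordinal residue number) -> (x, y, z).
    """
    return min(math.dist(ca[(ch1, i)], ca[(ch2, k)])
               for i in range(r1a, r1b + 1) for k in range(r2a, r2b + 1))

def condition_met(ca, target, alternatives):
    """Compare the best (smallest) distance among the alternatives with the target."""
    return min(min_distance(ca, *alt) for alt in alternatives) <= target
```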

Important: here, the residue number is the ordinal number of the CA atom in the PDB file; e.g., in the following file, Pro32 has residue number 2.

ATOM 1  N   GLY A 31 -6.047 33.786  1.442
ATOM 2  CA  GLY A 31 -5.711 33.334  0.066
ATOM 3  C   GLY A 31 -4.332 32.718  0.000
ATOM 4  O   GLY A 31 -3.676 32.483  0.995
ATOM 5  N   PRO A 32 -3.874 32.485 -1.215
ATOM 6  CA  PRO A 32 -2.562 31.874 -1.416
ATOM 7  C   PRO A 32 -1.444 32.754 -0.866
ATOM 8  O   PRO A 32 -1.566 33.990 -0.808
ATOM 9  CB  PRO A 32 -2.464 31.760 -2.936
ATOM 10 CG  PRO A 32 -3.446 32.698 -3.473
ATOM 11 CD  PRO A 32 -4.564 32.799 -2.483
ATOM 12 N   LEU A 33 -0.348 32.111 -0.506
ATOM 13 CA  LEU A 33  0.834 32.815 -0.070
ATOM 14 C   LEU A 33  1.392 33.614 -1.230
ATOM 15 O   LEU A 33  1.470 33.154 -2.364
ATOM 16 CB  LEU A 33  1.900 31.869  0.390
ATOM 17 CG  LEU A 33  1.537 31.036  1.611
ATOM 18 CD1 LEU A 33  2.576 29.958  1.797
ATOM 19 CD2 LEU A 33  1.490 31.984  2.815
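The mapping from residue numbers as written in the PDB file to the ordinal CA numbers used in the contact file can be computed with a small helper such as the hypothetical one below. It splits on whitespace, which works for simplified records like those above; real PDB files are fixed-column (atom name in columns 13-16, chain identifier in column 22, residue number in columns 23-26), so production code should slice by column instead.

```python
def ca_ordinals(pdb_lines):
    """Map (chain ID, residue number in the PDB file) -> ordinal CA number (1-based)."""
    mapping = {}
    count = 0
    for line in pdb_lines:
        fields = line.split()
        # keep only ATOM records whose atom name is CA
        if len(fields) < 6 or fields[0] != "ATOM" or fields[2] != "CA":
            continue
        count += 1  # CA atoms are counted over the whole file
        mapping[(fields[4], int(fields[5]))] = count
    return mapping
```

Applied to the fragment above, Pro32 (chain A) maps to 2 and Leu33 to 3.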

BUNCH Output Files

After each simulated annealing step, BUNCH writes a set of output files. Each file name consists of the user-defined prefix (the log file name given at the start) followed by an extension. If a prefix has been used before, the existing files are overwritten without warning.

Extension | Description
.log | Contains the same information as the screen output; updated during program execution.
.pdb | Current model of the entire multidomain protein. The REMARK section of the file contains information about the application used and about the model parameters, e.g. penalties and χ.
-i.fit | Fit of the scattering curve computed for a construct to the corresponding experimental data; i is the construct number. The columns are s, Iexp and Icomp.
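For scripted post-processing, the expected file names and the contents of a -i.fit file can be handled as sketched below. The helpers are hypothetical; the name pattern prefix-i.fit is inferred from the extension in the table, and skipping non-numeric lines is a precaution in case the file contains a header.

```python
def output_files(prefix, n_curves):
    """File names BUNCH is expected to write for a given prefix."""
    return ([prefix + ".log", prefix + ".pdb"]
            + [f"{prefix}-{i}.fit" for i in range(1, n_curves + 1)])

def parse_fit(lines):
    """Return the s, Iexp and Icomp columns of a -i.fit file, skipping non-numeric lines."""
    s, i_exp, i_comp = [], [], []
    for line in lines:
        parts = line.split()
        try:
            a, b, c = (float(p) for p in parts[:3])
        except ValueError:  # header text, empty line or too few columns
            continue
        s.append(a)
        i_exp.append(b)
        i_comp.append(c)
    return s, i_exp, i_comp
```

For the first example below, `output_files("gstdh1", 1)` gives gstdh1.log, gstdh1.pdb and gstdh1-1.fit.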

Examples

A dimeric GST-DHFR fusion protein with a two-fold symmetry axis is used for the sample runs. Atomic models are available for both domains. The PDB file of the GST monomer is positioned and oriented so that the correct dimer is obtained by rotating it by 180° about the Z axis. In the first example (USER mode with fixation), BUNCH is run with the fixation option to keep the GST dimer intact. In the second example (USER mode with contacts), the proper interface between the GST monomers is ensured by contact conditions. The contacts (Asp77 with Pro86 and Met69 with Gly97) are given in the file contacts.cnd.

The files 1gtaz100.alm and 1ra900.alm, containing the scattering amplitudes of the GST and DHFR monomers, were computed with CRYSOL.

The pre_bunch questions and answers for this example are as follows:

$ pre_bunch
Input sequence file name ............... < .seq >: gst-dhfr
Number of residues read ................................ : 387
Number of domains ...................... < 0 >: 2
Input pdb file name .................... < .pdb >: 1gtaz1
Input pdb file name .................... < .pdb >: 1ra9
Shift the structure to the origin ? [ Y / N ] < Yes >: N
Output pdb file name ................... < .pdb >: gst-dhfr_ini
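Since pre_bunch and BUNCH accept no command-line arguments, repeated runs are sometimes scripted by feeding the answers to the prompts through standard input. The sketch below assumes (this is not documented above) that the programs read plain lines from stdin and that the prompt order matches the listing; the order can change with the input (for example, extra questions in EXPERT mode), so verify it interactively first. The names are hypothetical.

```python
import shutil
import subprocess

# Answers to the pre_bunch prompts, in the order shown in the listing above.
PRE_BUNCH_ANSWERS = [
    "gst-dhfr",      # Input sequence file name
    "2",             # Number of domains
    "1gtaz1",        # Input pdb file name (domain 1)
    "1ra9",          # Input pdb file name (domain 2)
    "N",             # Shift the structure to the origin?
    "gst-dhfr_ini",  # Output pdb file name
]

def answers_text(answers):
    """One answer per line, as if typed at the prompt."""
    return "\n".join(answers) + "\n"

def run_dialog(program, answers):
    """Feed the answers to an interactive program via stdin (assumes it reads stdin)."""
    if shutil.which(program) is None:
        raise FileNotFoundError(f"{program} not found on PATH")
    return subprocess.run([program], input=answers_text(answers),
                          text=True, capture_output=True, check=False)
```

Usage: `result = run_dialog("pre_bunch", PRE_BUNCH_ANSWERS)`, then inspect `result.stdout` to confirm that each prompt received the intended answer.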

USER mode with fixation

A listing of questions/answers for a sample run in the USER mode using the fixation option:

$ bunch
 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: gstdh1
 Project identificator .................................. : gstdh1
 Enter project description .............. : Gst-Dhfr with fixation of Gst
 Random sequence initialized from ....................... : 122227
 Initial structure ...................... <         .pdb >: Gst-Dhfr_ini
  LOADAM --W- : rAtom not assigned
 Number of atoms read ................................... : 387
 Center of the initial structure :    -0.9539   7.8390   5.4511
 Maximum radius ......................................... : 43.23
Averaged formfactors of DRs used
 DR formfactor multiplier ............................... : 1.000
 Symmetry: P1...19 or Pn2 (n=1,..,12) ... <           P1 >: p2
 Angles penalty ......................................... : 21.29
 Dihedrals penalty ...................................... : 0.9778
 Angles penalty weight .................................. : 10.00
 Dihedrals penalty weight ............................... : 1.000
 Cross penalty .......................................... : 0.4853
 Cross penalty weight ................................... : 100.0
 Extended loops penalty ................................. : 0.0
 Extended loops penalty weight .......................... : 1.000
 Distances penalty ...................................... : 0.6956
 Distances penalty weight ............................... : 10.00
 Shift penalty .......................................... : 0.2543
 Shift penalty weight ................................... : 1.000
 File name, contacts conditions, CR for none <         .cnd >:
 Input total number of scattering curves  <            1 >: 1
 Input first & last residues in 1-st construct <            1,         387 >:
 Enter file name, 2-nd experimental data  <         .dat >: Gst-D_med
 Number of experimental points found .................... : 263
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)
 2*   sin(theta)/lambda [1/angstrom] (3)
 2*   sin(theta)/lambda [1/nm      ] (4)  <            1 >:
 Fitting range in fractions of Smax ..... <        1.000 >:
 Experimental radius of gyration ........................ : 41.44
 Number of points in the Guinier Plot ................... : 15
 Amplitudes, 1-st subunit ............... <         .alm >: 1gtaz100
 Maximum order of harmonics ............................. : 15
 Number of points in partial amplitudes ................. : 51
 Current subunit:   1786 atoms read, center at   -12.40    0.00    0.00
 Residues in the full-length protein ........ : 1    -      218
 Fix the subunit at this position? [ Y / N ] <           No >: Y
 ALMGRZ --- :  110976 summation coefficients used
 Amplitudes, 2-nd subunit ............... <         .alm >: 1ra900
 Current subunit:   1299 atoms read, center at    14.93   18.33   12.16
 Residues in the full-length protein ........ : 229    -    387
 Fix the subunit at this position? [ Y / N ] <           No >:
 Total penalty .......................................... : 269.6
 1-st curve:
 NEXP reduced to ........................................ : 257
 Theoretical  points from            2  to           51  used
 The best chi values: 4.76847
 Initial fVal ........................................... : 292.4
 Angular step in degrees ................ <        20.00 >:
 Initial annealing temperature .......................... : 1.000
 Annealing schedule factor .............................. : 0.9000
 Max # of iterations at each T .......................... : 10000
 Max # of successes at each T ........................... : 1000
 Min # of successes to continue ......................... : 100
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E+01 Suc:  1000 Eva:     2711 CPU:  0.503E+02 F:30.8120 Pen: 28.0621
 The best chi values: 1.65827
 j:   2 T: 0.900E+00 Suc:  1000 Eva:     5545 CPU:  0.103E+03 F:28.9356 Pen: 28.1204
 The best chi values: 0.90284
 j:   3 T: 0.810E+00 Suc:  1000 Eva:     8360 CPU:  0.157E+03 F:28.2545 Pen: 26.8202
 The best chi values: 1.19763
 ...

USER mode with contacts

A listing of questions/answers for a sample run in the USER mode using the contact conditions:

$ bunch
 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: gstdh2
 Project identificator .................................. : gstdh2
 Enter project description .............. : Gst-Dhfr with contacts
 Random sequence initialized from ....................... : 123503
 Initial structure ...................... <         .pdb >: Gst-Dhfr_ini
  LOADAM --W- : rAtom not assigned
 Number of atoms read ................................... : 387
 Center of the initial structure :    -0.9539   7.8390   5.4511
 Maximum radius ......................................... : 43.23
Averaged formfactors of DRs used
 DR formfactor multiplier ............................... : 1.000
 Symmetry: P1...19 or Pn2 (n=1,..,12) ... <           P1 >: p2
 Angles penalty ......................................... : 21.29
 Dihedrals penalty ...................................... : 0.9778
 Angles penalty weight .................................. : 10.00
 Dihedrals penalty weight ............................... : 1.000
 Cross penalty .......................................... : 0.4853
 Cross penalty weight ................................... : 100.0
 Extended loops penalty ................................. : 0.0
 Extended loops penalty weight .......................... : 1.000
 Distances penalty ...................................... : 0.6956
 Distances penalty weight ............................... : 10.00
 Shift penalty .......................................... : 0.2543
 Shift penalty weight ................................... : 1.000
 File name, contacts conditions, CR for none <         .cnd >: contacts
Condition #  1: Distance   7.000
   Between chain #  1, Residues from ASP    77 to ASP    77
       and chain #  2, Residues from PRO    86 to PRO    86
Condition #  2: Distance   7.000
   Between chain #  1, Residues from MET    69 to MET    69
       and chain #  2, Residues from GLY    97 to GLY    97
 Contacts conditions penalty ............................ : 0.2311
 Input total number of scattering curves  <            1 >:
 Input first & last residues in 1-st construct <            1,         387 >:
 Enter file name, 1-st experimental data  <         .dat >: Gst-D_med
 Number of experimental points found .................... : 263
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)
 2*   sin(theta)/lambda [1/angstrom] (3)
 2*   sin(theta)/lambda [1/nm      ] (4)  <            1 >:
 Fitting range in fractions of Smax ..... <        1.000 >:
 Experimental radius of gyration ........................ : 41.44
 Number of points in the Guinier Plot ................... : 15
 Amplitudes, 1-st subunit ............... <         .alm >: 1gtaz100
 Maximum order of harmonics ............................. : 15
 Number of points in partial amplitudes ................. : 51
 Current subunit:   1786 atoms read, center at   -12.40    0.00    0.00
 Residues in the full-length protein ........ : 1    -      218
 Fix the subunit at this position? [ Y / N ] <           No >:
 ALMGRZ --- :  110976 summation coefficients used
 Amplitudes, 2-nd subunit ............... <         .alm >: 1ra900
 Current subunit:   1299 atoms read, center at    14.93   18.33   12.16
 Residues in the full-length protein ........ : 229    -    387
 Fix the subunit at this position? [ Y / N ] <           No >:
 Total penalty .......................................... : 272.0
 1-st curve:
 NEXP reduced to ........................................ : 257
 Theoretical  points from            2  to           51  used
 The best chi values: 4.76847
 Initial fVal ........................................... : 294.7
 Angular step in degrees ................ <        20.00 >:
 Initial annealing temperature .......................... : 1.000
 Annealing schedule factor .............................. : 0.9000
 Max # of iterations at each T .......................... : 10000
 Max # of successes at each T ........................... : 1000
 Min # of successes to continue ......................... : 100
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E+01 Suc:  1000 Eva:     3211 CPU:  0.601E+02 F:38.0726 Pen: 32.0230
 The best chi values: 2.45959
 j:   2 T: 0.900E+00 Suc:  1000 Eva:     6869 CPU:  0.129E+03 F:31.7038 Pen: 30.8787
 The best chi values: 0.90834
 j:   3 T: 0.810E+00 Suc:  1000 Eva:    10697 CPU:  0.204E+03 F:29.7097 Pen: 27.9605
 The best chi values: 1.32260
 ...

  Last modified: April 11, 2013

© BioSAXS group 2013