SASREF 6 manual

sasref

Written by M.V. Petoukhov and D.I. Svergun.
Post all your questions about SASREF to the ATSAS Forum.

Manual
Examples
- Constructing a Complex

Manual

The following describes the method implemented in SASREF, details of the dialog prompt as well as the required input and the produced output files.

If you use results from SASREF in your own publication, please cite:

Petoukhov, M.V. & Svergun, D.I. (2005) Global rigid body modeling of macromolecular complexes against small-angle scattering data. Biophys J. 89, 1237-1250.

Introduction

SASREF performs quaternary structure modeling of a complex formed by subunits with known atomic structure against a SAXS data set. It can also fit multiple SAXS data sets from subcomplexes simultaneously, if available, and account for the particle symmetry.

A simulated annealing protocol is employed to construct an interconnected ensemble of subunits without steric clashes, while minimizing the discrepancy between the experimental scattering data and the curves calculated from the appropriate subunit assemblies.

The theoretical scattering patterns I(s) are expressed in terms of spherical harmonics from the partial scattering amplitudes of the subunits A_lm(s) in their given positions and orientations. The amplitudes of a subunit in an arbitrary position and orientation depend on its scattering amplitudes in the reference position and on three rotational and three translational parameters. The reference partial scattering amplitudes of the subunits have to be precomputed with the program CRYSOL (recommended values are lm=15, ns=51).
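
For orientation, the spherical harmonics expression mentioned above can be sketched as follows. This is the standard multipole form, written here under the assumption that the complex consists of K subunits; the labels K, k and L (the maximum harmonic order, i.e. the CRYSOL lm value) are introduced for this sketch only.

```latex
I(s) = 2\pi^{2} \sum_{l=0}^{L} \sum_{m=-l}^{l}
       \left| \sum_{k=1}^{K} A_{lm}^{(k)}(s) \right|^{2}
```

Here A_lm^(k)(s) denotes the partial amplitudes of the k-th subunit after applying its three rotations and three translations to the reference amplitudes.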

Symmetry, if present (it must be the same for all subcomplexes), can be taken into account: SASREF then searches for the arrangement of subunits inside the asymmetric part, and the rest is generated according to the symmetry rules.

Please refer to the paper cited above for further details about the implemented algorithm.

Running sasref

Interactive Configuration

SASREF can only be run in dialog mode; no command line arguments are accepted. There are two modes, EXPERT and USER. In EXPERT mode, the user can adjust any program parameter. In USER mode, fewer questions are asked because default values are used for most program parameters, and the user only needs to provide basic input. The default settings are the same in both modes.

SASREF interactive prompt:

Screen Text	Default	Asked in `USER`-mode?	Description
`Computation mode (User or Expert)`	`USER`	Y	Mode selection.
`Log file name`	N/A	Y	Project identifier, will be used as a prefix for all output file names.
`Enter project description`	N/A	Y	Any text that will be stored in the log file.
`Input total number of subunits`	`1`	Y	Number of separate rigid bodies in the asymmetric part of the complex.
`Symmetry: P1...19 or Pn2 (n=1,..,12)`	`P1`	Y	Supported symmetries are: `P1, P2-P19 (nineteen-fold), P222, P32-P(12)2`, as well as cubic types (`P23, P432`) and icosahedral symmetry (`Pico`). The n-fold axis is typically Z, if there is in addition a two-fold axis, it coincides with Y.
`Input total number of scattering curves`	`1`	Y	If in addition to the whole complex, some sub-complexes (and their scattering profiles) are available, they can be fitted simultaneously assuming the same arrangement of subunits in all the constructs.
`Use Kratky Geometry`	`N`	N	If the answer is `Yes`, the computed curves will be smeared to fit the data from Kratky camera.
`Input first & last subunits in 1-st construct`	`1,`var	Y	The range of subunits present in the given construct (one scattering curve=one construct; one subunit=one rigid body=one pdb file). This question is asked once per construct, i.e. as many times as the `total number of scattering curves` entered earlier. The default answer is from `1` to the `total number of scattering curves`.
`Enter file name, 1-st experimental data <dat >`	N/A	Y	The name of the data file containing the experimental SAXS profile of a certain construct. The question is asked for each construct.
`Angular units in the input file : 4pisin(theta)/lambda [1/angstrom] (1) 4pisin(theta)/lambda [1/nm ] (2) 2* sin(theta)/lambda [1/angstrom] (3) 2* sin(theta)/lambda [1/nm ] (4)`	`1`	Y	Formula for the scattering vector in the data file and its units. The question is asked for each construct.
`Fitting range in fractions of Smax`	`1.0`	Y	Fraction of the scattering curve to fit, starting at the first point. The default `1.0` fits the entire curve. The question is asked for each construct.
`Amplitudes, 1-st subunit <alm>`	N/A	Y	The name of the file with partial scattering amplitudes of a certain subunit computed by CRYSOL. This question is asked once per subunit, i.e. as many times as the `total number of subunits`.
`Initial rotation by alpha`	`0.0`	Y	The user can specify an arbitrary initial rotation by Euler angle Alpha. By default, no rotation is made, i.e. the reference orientation in the PDB file is used as a starting one. This question is asked for each subunit.
`Initial rotation by beta`	`0.0`	Y	The user can specify an arbitrary initial rotation by Euler angle Beta. This question is asked for each subunit.
`Initial rotation by gamma`	`0.0`	Y	The user can specify an arbitrary initial rotation by Euler angle Gamma. This question is asked for each subunit.
`Initial shift along X`	var	Y	The user can specify an arbitrary initial shift along the X-axis of the orthogonal coordinate system. By default, the subunit is shifted to the position as it appears in the PDB file. Another reasonable option is to place the subunit at the origin (`0.0 0.0 0.0`) and let the program build the complex "from scratch". This question is asked for each subunit.
`Initial shift along Y`	var	Y	The user can specify an arbitrary initial shift along the Y-axis of the orthogonal coordinate system. This question is asked for each subunit.
`Initial shift along Z`	var	Y	The user can specify an arbitrary initial shift along the Z-axis of the orthogonal coordinate system. This question is asked for each subunit.
`Movements limitations of subunit: N/F/X/Y/Z/D?`	`N`	Y	It is possible to fix the subunit in its original position and orientation (`F`), e.g. to keep a desired mutual arrangement between certain subunits, or to move/rotate the subunit only along specified axes: `X`, `Y`, `Z` or the cube's diagonal (`D`). If the answer is `N`, no restrictions are applied. This question is asked for each subunit.
`Spatial step in angstroems`	`5.0`	N	Maximal random shift of a subunit at a single modification of the system in the course of simulated annealing. This question is asked for each subunit.
`Angular step in degrees`	`20.0`	N	Maximal random rotation angle of a subunit at a single modification of the system in the course of simulated annealing. Setting it to zero may be useful to keep the mutual orientations of certain subunits, e.g. if NMR RDC data are available. This question is asked for each subunit.
`Cross penalty weight`	`10.0`	N	How much the Cross Penalty shall influence the acceptance or rejection of a mutation. A value of `0.0` disables the penalty. If unsure, use the default value. If clashes between the subunits are observed, try increasing this penalty weight.
`Disconnectivity penalty weight`	`10.0`	N	How much the Disconnectivity Penalty shall influence the acceptance or rejection of a mutation. A value of `0.0` disables the penalty. If unsure, use the default value. If the subunits in the resulting arrangement are not interconnected, try increasing this penalty weight.
`File name, contacts conditions, CR for none <.cnd >`	`empty`	Y	If the information on interface between certain subunits in terms of contacting residues is available, it may be used as a modeling restraint. The information is provided in a file with special format. By default no information is given.
`Contacts penalty weight`	`10.0`	N	How much improper contacts shall influence the acceptance or rejection of a mutation. If unsure, use the default value. If desired interfaces are not obtained, try increasing this penalty weight. This question is only asked if the contacts conditions file is provided.
`Expected particle shape: Prolate, Oblate, or Unknown`	`UNKNOWN`	Y	If, due to prior studies, it is known that the particle's shape shall be either `PROLATE` or `OBLATE`, one may use the anisometry option to enforce a penalty on particles that do not correspond with the expected anisometry. By default, anisometry is '`UNKNOWN`'.
`Anisometry penalty weight`	`1.0`	N	How much improper anisometry shall influence the acceptance or rejection of a mutation. If unsure, use the default value. This question is skipped if the Expected particle shape is '`UNKNOWN`'.
`Expected direction of anisometry: aLong Z, aCross Z, or Unknown`	`UNKNOWN`	Y	This question is only asked if the Expected particle shape is not '`UNKNOWN`' and the symmetry is '`P2`'. The user can specify whether the symmetry axis coincides with (`ALONG`) or is perpendicular to (`ACROSS`) the anisometry axis.
`Shift penalty weight`	`1.0`	N	How much shift from the origin of the entire complex shall influence the acceptance or rejection of a mutation. A value of `0.0` disables the penalty. If unsure, use the default value. This penalty is necessary to keep the model close to the origin so that the higher order harmonics are not lost and the scattering is computed accurately.
`Initial annealing temperature`	`10.0`	N	Starting temperature of simulated annealing protocol.
`Annealing schedule factor`	`0.9`	N	Factor by which the temperature is decreased; 0.9 is a good average value. If slower cooling is wanted increase the value (e.g. to 0.95).
`Max # of iterations at each T`	`var`	N	Finalize temperature step and cool after this many iterations at the latest. The default value is `5000* total number of subunits`.
`Max # of successes at each T`	`var`	N	Finalize temperature step and cool after at most this many successful mutations. The default value is `500* total number of subunits`.
`Min # of successes to continue`	`var`	N	Stop simulated annealing if fewer than this many successful mutations occur within a single temperature step. The default value is `50* total number of subunits`.
`Max # of annealing steps`	`100`	N	Stop if simulated annealing is not finished after this many steps. The slower the system is cooled, the more temperature steps are required.
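
The interplay of the annealing parameters in the table above can be illustrated with a generic Metropolis loop. This is a minimal sketch, not SASREF's actual implementation: the callbacks `target` and `mutate` are hypothetical, and only the default limits (5000, 500 and 50 times the number of subunits, initial temperature 10.0, schedule factor 0.9, at most 100 steps) are taken from the table.

```python
import math
import random


def anneal(target, mutate, state, n_subunits, t0=10.0, factor=0.9,
           max_steps=100, rng=None):
    """Generic simulated annealing with the documented default limits.

    `target(state)` returns the value to minimise and `mutate(state, rng)`
    returns a randomly modified copy; both are hypothetical callbacks.
    """
    rng = rng or random.Random(0)
    max_iter = 5000 * n_subunits   # "Max # of iterations at each T"
    max_succ = 500 * n_subunits    # "Max # of successes at each T"
    min_succ = 50 * n_subunits     # "Min # of successes to continue"

    f_cur = target(state)
    best, f_best = state, f_cur
    temperature = t0
    for _ in range(max_steps):
        successes = 0
        for _ in range(max_iter):
            candidate = mutate(state, rng)
            f_new = target(candidate)
            # Metropolis rule: accept improvements, sometimes accept worse.
            accept = f_new <= f_cur or \
                rng.random() < math.exp((f_cur - f_new) / temperature)
            if accept:
                state, f_cur = candidate, f_new
                successes += 1
                if f_cur < f_best:
                    best, f_best = state, f_cur
                if successes >= max_succ:
                    break          # enough successes: cool early
        if successes < min_succ:
            break                  # too few accepted moves: stop annealing
        temperature *= factor      # annealing schedule factor
    return best, f_best
```

For example, `anneal(lambda x: (x - 3.0) ** 2, lambda x, rng: x + rng.uniform(-0.5, 0.5), state=0.0, n_subunits=1)` minimises a toy one-dimensional function. A schedule factor closer to 1 cools more slowly and therefore needs more temperature steps, as noted for `Max # of annealing steps`.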

Runtime Output

At runtime, two lines of output are generated for each temperature step:

j:   4 T: 0.729E+01 Suc:  1000 Eva:    12497 CPU:  0.208E+03 F:99.4301 Pen: 13.803
The best chi values:11.64871 5.96331

The fields can be interpreted as follows, top-left to bottom-right:

Field	Description
`j`	Step number. Starts at 1, increases monotonically.
`T`	Temperature measure, starts at an arbitrary high value, decreases each step by the `annealing schedule factor`.
`Suc`	Number of successful mutations in this temperature step. Limited by the minimum and maximum number of successes. The number of successes should decrease slowly; the first few steps should be terminated by the maximum number of successes criterion. If the maximum number of iterations is reached instead, or the number of successes suddenly drops by a large amount, the system should probably be cooled more slowly.
`Eva`	Accumulated number of function evaluations.
`CPU`	Elapsed wall-clock time since the annealing procedure was started.
`F`	The best target function value obtained so far.
`Pen`	Accumulated penalty value of the best target function.
`The best chi values`	For each curve out of `total number of curves`, the χ value of the best target function is given.
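
When monitoring long runs, it can be convenient to extract these fields from the log file. The following is a small sketch based only on the two sample lines shown above; the function names are hypothetical, and the assumption that χ values are always printed with five decimals (which allows adjacent values without a separating space to be split) is inferred from the examples in this manual rather than documented.

```python
import re

STEP_RE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<Suc>\d+)\s+"
    r"Eva:\s*(?P<Eva>\d+)\s+CPU:\s*(?P<CPU>\S+)\s+"
    r"F:\s*(?P<F>\S+)\s+Pen:\s*(?P<Pen>\S+)"
)
# Assumption: every chi value has exactly five digits after the decimal point.
CHI_RE = re.compile(r"\d+\.\d{5}")


def parse_step(line):
    """Return the fields of one 'j: ...' progress line as a dict, or None."""
    match = STEP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "j": int(fields["j"]),
        "T": float(fields["T"]),
        "Suc": int(fields["Suc"]),
        "Eva": int(fields["Eva"]),
        "CPU": float(fields["CPU"]),
        "F": float(fields["F"]),
        "Pen": float(fields["Pen"]),
    }


def parse_chi(line):
    """Return the chi values of a 'The best chi values' line, or None."""
    _, marker, tail = line.partition("The best chi values:")
    if not marker:
        return None
    return [float(value) for value in CHI_RE.findall(tail)]
```

Applied to each line of the log, `parse_step` yields the `Suc` counts needed to judge whether the system is being cooled too quickly.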

sasref Input Files

SASREF uses the following input files: experimental SAXS data files (*.dat) in ASCII format containing three columns, (1) experimental scattering vector, (2) experimental intensity and (3) experimental errors; binary files with partial scattering amplitudes computed by CRYSOL; and an optional contacts conditions file in the following format:

      dist 7.0
      1 0 0 2 1 1
      dist 5.0
      2 0 0 3 1 1
      dist 7.0
      1 342 342 2 25 25
      1 350 350 2 17 17
      dist 6.0
      1 290 297 2  64 79
      dist 7.0
      1 1 0 3 1 0

A line "dist 7.0" means that the minimum distance between the CA atoms of the residues (or the P atoms of the nucleotides) specified in the following lines should not exceed 7 Å. Each line without the keyword "dist" contains six numbers. The first and the fourth numbers are the ordinal numbers of the two subunits in contact. The second and third numbers give the range of residues/nucleotides in the first subunit, and the fifth and sixth numbers give the range in the second subunit; the contact may be formed by any residue/nucleotide of the first range with any residue/nucleotide of the second range. A value of 0 stands for the last residue/nucleotide of the subunit.

If two (or more) alternatives are given after the line with the keyword "dist", the program compares the better (smaller) distance among them with the specified one.
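
The file layout described above can be checked before a run with a short parser. This is a sketch under the stated format only; the function names are hypothetical, and `is_satisfied` merely illustrates the "smallest distance among the alternatives" comparison, not the penalty that SASREF actually computes.

```python
def parse_contacts(text):
    """Parse contacts conditions into a list of (distance, alternatives).

    Each alternative is a tuple of six integers:
    (subunit1, first1, last1, subunit2, first2, last2).
    """
    conditions = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0].lower() == "dist":
            conditions.append((float(tokens[1]), []))
            continue
        if not conditions:
            raise ValueError("range line found before any 'dist' line")
        if len(tokens) != 6:
            raise ValueError(f"expected six integers, got: {raw!r}")
        conditions[-1][1].append(tuple(int(token) for token in tokens))
    return conditions


def resolve_range(first, last, n_residues):
    """Replace 0 by the ordinal number of the last residue/nucleotide."""
    return (first or n_residues, last or n_residues)


def is_satisfied(dist, alternative_distances):
    """True if the smallest distance among the alternatives is within dist."""
    return min(alternative_distances) <= dist
```

For instance, the range `1 0` resolves to the whole subunit, while `0 0` selects only its last residue/nucleotide.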

Important: here, residue/nucleotide number is the ordinal number of CA (or P) atom in the PDB file, i.e. in the following file, Pro32 will have residue number equal to 2.

ATOM 1  N   GLY A 31 -6.047 33.786  1.442
ATOM 2  CA  GLY A 31 -5.711 33.334  0.066
ATOM 3  C   GLY A 31 -4.332 32.718  0.000
ATOM 4  O   GLY A 31 -3.676 32.483  0.995
ATOM 5  N   PRO A 32 -3.874 32.485 -1.215
ATOM 6  CA  PRO A 32 -2.562 31.874 -1.416
ATOM 7  C   PRO A 32 -1.444 32.754 -0.866
ATOM 8  O   PRO A 32 -1.566 33.990 -0.808
ATOM 9  CB  PRO A 32 -2.464 31.760 -2.936
ATOM 10 CG  PRO A 32 -3.446 32.698 -3.473
ATOM 11 CD  PRO A 32 -4.564 32.799 -2.483
ATOM 12 N   LEU A 33 -0.348 32.111 -0.506
ATOM 13 CA  LEU A 33  0.834 32.815 -0.070
ATOM 14 C   LEU A 33  1.392 33.614 -1.230
ATOM 15 O   LEU A 33  1.470 33.154 -2.364
ATOM 16 CB  LEU A 33  1.900 31.869  0.390
ATOM 17 CG  LEU A 33  1.537 31.036  1.611
ATOM 18 CD1 LEU A 33  2.576 29.958  1.797
ATOM 19 CD2 LEU A 33  1.490 31.984  2.815
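
To look up which ordinal number belongs to a given residue, the CA and P atoms can be counted in file order. The sketch below assumes standard fixed-column PDB `ATOM` records (the excerpt above is abbreviated and does not preserve the columns) and counts CA and P atoms of `ATOM` records only; whether SASREF treats special cases in the same way is not stated in this manual. The function name is hypothetical.

```python
def ordinal_residue_numbers(pdb_lines):
    """Map ordinal number (1-based) to (residue name, chain, PDB number).

    Only ATOM records whose atom name is CA or P are counted, in the
    order in which they appear in the file.
    """
    mapping = {}
    ordinal = 0
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()      # PDB columns 13-16
        if atom_name in ("CA", "P"):
            ordinal += 1
            residue_name = line[17:20].strip()   # columns 18-20
            chain = line[21]                     # column 22
            residue_number = int(line[22:26])    # columns 23-26
            mapping[ordinal] = (residue_name, chain, residue_number)
    return mapping
```

With the full-width version of the records above, ordinal number 2 maps to PRO 32 of chain A, in agreement with the note on Pro32.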

If, for instance, three domains form one polypeptide chain and nothing is missing between the C- and N-termini of subsequent pdb files, the simplest form of the contacts conditions file would be:

     dist 4.0
     1 0 0 2 1 1
     dist 4.0
     2 0 0 3 1 1
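
For chains with more domains, the same pattern can be generated rather than typed. This is a minimal sketch: the function name is hypothetical, and the default distance of 4.0 simply repeats the value used in the example above.

```python
def chain_conditions(n_domains, dist=4.0):
    """Link the last residue of domain k to the first residue of domain k+1."""
    lines = []
    for k in range(1, n_domains):
        lines.append(f"dist {dist}")
        lines.append(f"{k} 0 0 {k + 1} 1 1")
    return "\n".join(lines) + "\n" if lines else ""
```

Calling `chain_conditions(3)` reproduces the four lines shown above (without the leading indentation).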

sasref Output Files

After each simulated annealing step, SASREF creates a set of output files. Each file name consists of a customizable prefix with an extension appended. If a prefix has been used before, existing files will be overwritten without warning.

Extension	Description
`.log`	Contains the same information as the screen output and is updated during execution of the program.
`.pdb`	The current model of the entire complex. The `REMARK` section of the file contains information about the application used and about the parameters of the model, e.g. penalties and χ.
`-i.fit`	Fit of the scattering curve computed from the complex (subcomplex) versus the corresponding experimental data. i stands for the `construct` number. Columns in the output file are: '`s`', '`I_exp`' and '`I_comp`'.
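
The three columns of a fit file can be read back for plotting or a quick quality check. The sketch below assumes whitespace-separated columns and skips any non-numeric header lines; the function names are hypothetical. The relative deviation it computes is an illustrative figure only and is not the χ value reported by SASREF, which also depends on the experimental errors.

```python
def parse_fit(lines):
    """Return a list of (s, I_exp, I_comp) tuples from fit-file lines."""
    rows = []
    for line in lines:
        parts = line.split()
        try:
            values = [float(part) for part in parts[:3]]
        except ValueError:
            continue               # header or other non-numeric line
        if len(values) == 3:
            rows.append(tuple(values))
    return rows


def relative_deviation(rows):
    """Sum of |I_exp - I_comp| divided by the sum of |I_exp|."""
    numerator = sum(abs(i_exp - i_comp) for _, i_exp, i_comp in rows)
    denominator = sum(abs(i_exp) for _, i_exp, _ in rows)
    return numerator / denominator
```

A typical use would be `with open(name) as handle: rows = parse_fit(handle)`, where `name` is the fit file of the construct of interest.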

Examples

Constructing a Complex

This example uses a simulated complex constructed from the crystallographic coordinates of two proximal monomers of glutamyl-tRNA synthetase complexed with tRNA (PDB entry 1g59). The entire dimer has a molecular weight of 156 kDa and contains 468 amino acids and 75 bases per monomer; the monomers are related by a two-fold symmetry axis. The theoretical scattering curves of the dimeric tRNA and the entire complex computed by CRYSOL are stored in the files trnadim.dat and complex.dat, respectively.

trna.pdb and prot.pdb are the structures of monomeric tRNA and protein in arbitrary orientations, both centered at the origin. The files trna.alm and prot.alm contain the scattering amplitudes of the above monomers calculated using CRYSOL.

Additional information on the contacts between tRNA and protein (U513 with Pro303 and A573 with Gly121) is given in the file contacts.cnd.

A listing of questions/answers for a sample run in the USER mode is as follows:

 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: nucpro
 Project identificator .................................. : nucpro
 Enter project description .............. : dimeric protein-RNA complex
 Random sequence initialized from ....................... : 164152
 Input total number of subunits ......... <            1 >: 2
 Symmetry: P1...19 or Pn2 (n=1,..,12) ... <           P1 >: p2
 Input total number of scattering curves  <            1 >: 2
 Input first & last subunits in 1-st construct <            1,           2 >: 1,1
 Enter file name, 1-st experimental data  <         .dat >: trnadim
 Number of experimental points found .................... : 201
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)
 2*   sin(theta)/lambda [1/angstrom] (3)
 2*   sin(theta)/lambda [1/nm      ] (4)  <            1 >:
 Fitting range in fractions of Smax ..... <        1.000 >:
 Experimental radius of gyration ........................ : 29.40
 Number of points in the Guinier Plot ................... : 29
 Input first & last subunits in 2-nd construct <            1,           2 >:
 Enter file name, 2-nd experimental data  <         .dat >: complex
 Number of experimental points found .................... : 201
 Angular units in the input file :
 4*pi*sin(theta)/lambda [1/angstrom] (1)
 4*pi*sin(theta)/lambda [1/nm      ] (2)
 2*   sin(theta)/lambda [1/angstrom] (3)
 2*   sin(theta)/lambda [1/nm      ] (4)  <            1 >:
 Fitting range in fractions of Smax ..... <        1.000 >:
 Experimental radius of gyration ........................ : 42.51
 Number of points in the Guinier Plot ................... : 21
 Amplitudes, 1-st subunit ............... <         .alm >: trna
 Maximum order of harmonics ............................. : 15
 Number of points in partial amplitudes ................. : 51
  SASREF --W- Lm reduced to compute cross term
 Current subunit:   1597 atoms read, center at     0.00    0.00    0.00
 Initial rotation by alpha .............. <          0.0 >:
 Initial rotation by beta ............... <          0.0 >:
 Initial rotation by gamma .............. <          0.0 >:
 Initial shift along X .................. <     8.140e-6 >: 0
 Initial shift along Y .................. <     1.209e-4 >: 0
 Initial shift along Z .................. <    -3.757e-6 >: 0
 Fix the subunit at this position? [ Y / N ] <           No >:
 ALMGRZ --- :   91800 summation coefficients used
 Amplitudes, 2-nd subunit ............... <         .alm >: prot
  SASREF --W- Lm reduced to compute cross term
 Current subunit:   3813 atoms read, center at     0.00    0.00    0.00
 Initial rotation by alpha .............. <          0.0 >:
 Initial rotation by beta ............... <          0.0 >:
 Initial rotation by gamma .............. <          0.0 >:
 Initial shift along X .................. <    -1.668e-4 >: 0
 Initial shift along Y .................. <     2.098e-5 >: 0
 Initial shift along Z .................. <    -6.819e-6 >: 0
 Fix the subunit at this position? [ Y / N ] <           No >:
 Cross value ............................................ : 14.17
 Discontiguity value .................................... : 0.0
 File name, contacts conditions, CR for none <         .cnd >: contacts
 Condition #  1: Distance   5.000
    Between subunit #  1, Residues from  P     U A 513  to  P     U A 513
        and subunit #  2, Residues from  CA  PRO B 303  to  CA  PRO B 303
 Condition #  2: Distance   5.500
    Between subunit #  1, Residues from  P     A A 573  to  P     A A 573
        and subunit #  2, Residues from  CA  GLY B 121  to  CA  GLY B 121
 Contacts conditions penalty ............................ : 42.36
 Expected particle shape: Prolate, Oblate,
  or Unknown .......................... <      Unknown >:
 Shift penalty is normalized by ......................... : 30.48
 Shift penalty .......................................... : 0.0
 Shift penalty weight ................................... : 1.000
 Total penalty .......................................... : 565.3
 1-st curve:
 NEXP reduced to ........................................ : 200
 Theoretical  points from            1  to           51  used
 2-nd curve:
 NEXP reduced to ........................................ : 200
 Theoretical  points from            1  to           51  used
 The best chi values:11.1205510.89429
 Initial fVal ........................................... : 686.5
 Initial annealing temperature .......................... : 10.00
 Annealing schedule factor .............................. : 0.9000
 Max # of iterations at each T .......................... : 10000
 Max # of successes at each T ........................... : 1000
 Min # of successes to continue ......................... : 100
 Max # of annealing steps ............................... : 100
  ====  Simulated annealing procedure started  ====
 j:   1 T: 0.100E+02 Suc:  1000 Eva:     2884 CPU:  0.747E+02 F: 7.7639 Pen:  0.7008
 The best chi values: 2.32386 2.95396
 j:   2 T: 0.900E+01 Suc:  1000 Eva:     6153 CPU:  0.159E+03 F: 7.7639 Pen:  0.7008
 The best chi values: 2.32386 2.95396
 ...
