
SASREF CV/MX manual


Written by M.V. Petoukhov
Post all your questions about SASREF to the ATSAS Forum.


Manual

The following describes two special versions of the rigid body modelling program SASREF: SASREFCV (formerly known as SASREF7), for fitting SANS contrast variation series (optionally combined with SAXS data), and SASREFMX, for the structural analysis of transient complexes and weak oligomers from polydisperse data. In the latter case, rigid body modelling is coupled with mixture analysis, whereby the volume fractions of the dissociation products are estimated. The two approaches have much in common, so this manual describes the dialog prompt, the required configuration / input files and the produced output for both programs.

If you use results from SASREFCV in your own publication, please cite:
Petoukhov, M.V. & Svergun, D.I. (2006) Joint use of small-angle X-ray and neutron scattering to study biological macromolecules in solution. Eur. Biophys. J. 35(7), 567-576.

If you use results from SASREFMX in your own publication, please cite:
Petoukhov, M.V., Franke, D., Shkumatov, A.V., Tria, G., Kikhney, A.G., Gajda, M., Gorba, C., Mertens, H.D.T., Konarev, P.V. & Svergun, D.I. (2012) New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Cryst. 45, 342-350.

Introduction

SASREF CV / MX perform quaternary structure modelling of a complex particle formed by subunits with known atomic structures against SAS data from a contrast variation series (SASREFCV) or from a polydisperse system (SASREFMX). Multiple data sets can be fitted simultaneously, e.g. data at different D₂O contents and/or with different subunit perdeuteration (SASREFCV), or profiles recorded under different conditions (concentration, temperature, pH, ionic strength) that change the affinity of the complex (SASREFMX). Both programs can account for symmetry, which may be specific to individual subunits and data sets.

A simulated annealing protocol is employed to construct an interconnected ensemble of subunits without steric clashes, while minimizing the discrepancy between the experimental scattering data and the curves computed from the corresponding subunit assemblies. In SASREFMX, each experimental data set is fitted by a linear combination of the profiles computed from the intact particle and from the dissociation products.
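The linear-combination fit can be illustrated with a minimal sketch. It assumes that the curves of the intact particle and of the dissociation products are already computed on the same s-grid and scaled so that each corresponds to a unit volume fraction. The helper name `fit_mixture`, the grid scan over the volume fraction v and the χ definition χ² = Σ[(I_exp - c·I_mod)/σ]²/(N - 1) are illustrative choices, not necessarily how SASREFMX computes the fractions internally.

```python
import math


def fit_mixture(i_exp, sigma, i_complex, i_products, steps=1000):
    """Fit I_exp ~ c * (v * I_complex + (1 - v) * I_products).

    Hypothetical helper: scans the volume fraction v of the intact particle
    on a regular grid in [0, 1]; for each v the overall scale c has a
    closed-form weighted least-squares solution. Returns (v, c, chi).
    """
    n = len(i_exp)
    best = None
    for k in range(steps + 1):
        v = k / steps
        model = [v * a + (1.0 - v) * b for a, b in zip(i_complex, i_products)]
        # Weighted least-squares scale: c = sum(I*m/s^2) / sum(m^2/s^2)
        num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, model, sigma))
        den = sum(m * m / s ** 2 for m, s in zip(model, sigma))
        c = num / den if den > 0 else 0.0
        chi2 = sum(((e - c * m) / s) ** 2
                   for e, m, s in zip(i_exp, model, sigma)) / (n - 1)
        if best is None or chi2 < best[2]:
            best = (v, c, chi2)
    v, c, chi2 = best
    return v, c, math.sqrt(chi2)


# Synthetic check: 40 % intact particle, overall scale factor 2
i_complex = [10.0, 6.0, 3.0, 1.0]
i_products = [4.0, 3.0, 2.0, 1.0]
i_exp = [2.0 * (0.4 * a + 0.6 * b) for a, b in zip(i_complex, i_products)]
print(fit_mixture(i_exp, [1.0] * 4, i_complex, i_products))
```

With several data sets, such a fit would be done per curve, giving one volume fraction per curve, analogous to the "Associated volume fractions" reported by SASREFMX at runtime.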

For further details on the rigid body modelling approach, please refer to the SASREF manual and to the papers cited above.

Running SASREF CV / MX

Interactive Configuration

SASREF CV / MX can only be run in dialog mode; no command line arguments are accepted. As in MONSA, a significant part of the user input is provided via configuration files. There are two modes, EXPERT and USER. In EXPERT mode, the user can adjust more parameters. In USER mode, fewer questions are asked and default values are used for most of the program parameters. The default settings are the same in both modes.

SASREF CV / MX interactive prompt:

Screen Text	Default	Asked in `USER`-mode?	Description
`Computation mode (User or Expert)`	`USER`	Y	Mode selection.
`Log file name`	N/A	Y	Project identifier, will be used as a prefix for all output file names.
`Enter project description`	N/A	Y	Any text that will be stored in the log file.
`Symmetry: Pn(2) (n=2-6)`	`P1`	Y	"Master" (highest order) symmetry. Individual subunits or scattering profiles may have lower order symmetry. Supported symmetries are `P1` (no symmetry), `P2`-`P6`, `P222` and `P32`-`P62`. The n-fold axis is typically Z; if there is in addition a two-fold axis, it coincides with Y.
`File name with curves info`	N/A	Y	Configuration file for the scattering profiles to be fitted
`File name with smearing parameters`	`empty`	Y	If required, SASREF smears the theoretical curves using the resolution function introduced by J. Skov Pedersen et al. (1990), J. Appl. Cryst., 23, 321. This is mostly needed for SANS data but may also be applied to SAXS data collected with a non-point source. Please refer to the MONSA manual for an explanation of the file format. If no file name is provided, no smearing is applied.
`File name with subunits info`	N/A	Y	Configuration file for the atomic models of subunits
`File name with cross-dependencies`	N/A	Y	Configuration file with the cross-correlation table between the scattering curves and the contributing subunits.
`Cross penalty weight`	`10.0`	N	How much the Cross Penalty shall influence the acceptance or rejection of a mutation. A value of `0.0` disables the penalty. If unsure, use the default value. If clashes between the subunits are observed, try increasing this penalty weight.
`Disconnectivity penalty weight`	`10.0`	N	How much the Disconnectivity Penalty shall influence the acceptance or rejection of a mutation. A value of `0.0` disables the penalty. If unsure, use the default value. If a disconnected arrangement of the subunits is observed, try increasing this penalty weight.
`File name, contacts conditions, CR for none <.cnd>`	`empty`	Y	If information on the interface between certain subunits is available in terms of contacting residues, it may be used as a modelling restraint. The information is provided in a file with a special format. By default, no such information is given.
`Contacts penalty weight`	`10.0`	N	How much improper contacts shall influence the acceptance or rejection of a mutation. If unsure, use the default value. If desired interfaces are not obtained, try increasing this penalty weight. This question is only asked if the contacts conditions file is provided.
`Expected particle shape: Prolate, Oblate, or Unknown`	`UNKNOWN`	Y	If, due to prior studies, it is known that the particle's shape shall be either `PROLATE` or `OBLATE`, one may use the anisometry option to enforce a penalty on particles that do not correspond with the expected anisometry. By default, anisometry is '`UNKNOWN`'.
`Anisometry penalty weight`	`1.0`	N	How much improper anisometry shall influence the acceptance or rejection of a mutation. If unsure, use the default value. This question is skipped if the Expected particle shape is '`UNKNOWN`'.
`Expected direction of anisometry: aLong Z, aCross Z, or Unknown`	`UNKNOWN`	Y	This question is only asked if the Expected particle shape is not '`UNKNOWN`' and the symmetry is '`P2`'. The user can specify whether the symmetry axis is parallel (`ALONG`) or perpendicular (`ACROSS`) to the anisometry axis.
`Shift penalty weight`	`1.0`	N	How much a shift of the entire complex away from the origin shall influence the acceptance or rejection of a mutation. A value of `0.0` disables the penalty. If unsure, use the default value. This penalty keeps the model close to the origin so that the higher order harmonics are not lost and the scattering is computed accurately.
`Spatial step in angstroems`	`5.0`	N	Maximum random shift of a subunit in a single modification of the system during simulated annealing. This question is asked for each subunit.
`Angular step in degrees`	`20.0`	N	Maximum random rotation angle of a subunit in a single modification of the system during simulated annealing. Setting it to zero may be useful to keep the mutual orientations of certain subunits fixed, e.g. if NMR RDC data are available. This question is asked for each subunit.
`Initial annealing temperature`	`10.0`	N	Starting temperature of simulated annealing protocol.
`Annealing schedule factor`	`0.9`	N	Factor by which the temperature is multiplied after each step; 0.9 is a good average value. For slower cooling, increase the value (e.g. to 0.95).
`Max # of iterations at each T`	`var`	N	Finalize temperature step and cool after this many iterations at the latest. The default value is `5000* total number of subunits`.
`Max # of successes at each T`	`var`	N	Finalize temperature step and cool after at most this many successful mutations. The default value is `500* total number of subunits`.
`Min # of successes to continue`	`var`	N	Stop simulated annealing if not at least this many successful mutations within a single temperature step can be done. The default value is `50* total number of subunits`.
`Max # of annealing steps`	`100`	N	Stop if simulated annealing is not finished after this many steps. The slower the system is cooled, the more temperature steps are required.
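The default limits and the cooling schedule from the table above can be sketched as follows. This assumes that the temperature at step j (counted from 1) is T0 · factor^(j-1), which agrees with the sample runtime line shown below (j = 4, T = 0.729E+01 for T0 = 10.0 and factor 0.9). The function `annealing_defaults` is hypothetical, and `n_subunits` stands for the "total number of subunits" used in the table; whether symmetry mates are counted is not stated here.

```python
def annealing_defaults(n_subunits, t0=10.0, factor=0.9, max_steps=100):
    """Return the documented default limits and a geometric cooling schedule.

    Assumption: the temperature at step j (1-based) is t0 * factor**(j - 1).
    """
    limits = {
        "max_iterations_per_T": 5000 * n_subunits,
        "max_successes_per_T": 500 * n_subunits,
        "min_successes_to_continue": 50 * n_subunits,
    }
    temperatures = [t0 * factor ** (j - 1) for j in range(1, max_steps + 1)]
    return limits, temperatures


limits, temps = annealing_defaults(n_subunits=3)
print(limits)  # {'max_iterations_per_T': 15000, 'max_successes_per_T': 1500, 'min_successes_to_continue': 150}
print(round(temps[3], 3))  # temperature at step j = 4: 7.29
```

Raising the schedule factor to 0.95 stretches this geometric decay, which is why slower cooling typically needs more annealing steps before the stopping criteria are met.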

Runtime Output

On runtime, two lines of output will be generated for each temperature step:

j:   4 T: 0.729E+01 Suc:  1000 Eva:    12497 CPU:  0.208E+03 F:99.4301 Pen: 13.803
The best chi values:11.64871 5.96331

The fields can be interpreted as follows, top-left to bottom-right:

Field	Description
`j`	Step number. Starts at 1, increases monotonically.
`T`	Temperature measure, starts at an arbitrary high value, decreases each step by the `annealing schedule factor`.
`Suc`	Number of successful mutations in this temperature step, limited by the minimum and maximum number of successes. The number of successes should decrease slowly, and the first few steps should be terminated by the maximum number of successes criterion. If instead the maximum number of iterations is reached, or the number of successes drops suddenly by a large amount, the system should probably be cooled more slowly.
`Eva`	Accumulated number of function evaluations.
`CPU`	Elapsed wall-clock time since the annealing procedure was started.
`F`	The best target function value obtained so far.
`Pen`	Accumulated penalty value of the best target function.
`The best chi values`	The χ value of each fitted curve (one value per curve) for the best target function obtained so far.
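To monitor a run, for example to check that `Suc` decreases slowly from step to step, the runtime lines can be parsed. The sketch below assumes the layout of the sample lines above (labels followed by values, variable spacing); the exact formatting may differ between program versions. The helpers `parse_step` and `parse_values` are hypothetical.

```python
import re

# Layout assumed from the sample runtime line; spacing may vary
STEP_RE = re.compile(
    r"j:\s*(?P<j>\d+)\s+T:\s*(?P<T>\S+)\s+Suc:\s*(?P<suc>\d+)\s+"
    r"Eva:\s*(?P<eva>\d+)\s+CPU:\s*(?P<cpu>\S+)\s+F:\s*(?P<f>\S+)\s+"
    r"Pen:\s*(?P<pen>\S+)"
)


def parse_step(line):
    """Return the fields of a temperature-step line as a dict, or None."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    return {
        "j": int(m["j"]), "T": float(m["T"]), "suc": int(m["suc"]),
        "eva": int(m["eva"]), "cpu": float(m["cpu"]),
        "F": float(m["f"]), "pen": float(m["pen"]),
    }


def parse_values(line, label):
    """Return the numbers following `label` on a line, or [] if absent."""
    _, found, tail = line.partition(label)
    return [float(x) for x in tail.split()] if found else []


step = parse_step("j:   4 T: 0.729E+01 Suc:  1000 Eva:    12497 "
                  "CPU:  0.208E+03 F:99.4301 Pen: 13.803")
chis = parse_values("The best chi values:11.64871 5.96331", "The best chi values:")
```

The same `parse_values` call with the label `Associated volume fractions:` would read the additional SASREFMX line.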

SASREFMX additionally outputs the volume fraction of the intact construct for each of the fitted curves:

 Associated volume fractions: 0.42157 0.64983

SASREF CV / MX Input Files

Three mandatory configuration files must be created, containing information about:

(a) - the scattering data,
(b) - the subunits (rigid bodies) and
(c) - the contribution of each subunit to each scattering curve.

The data control file (curves.con in the following examples) has the following format:

 - The first line contains one integer K (total number of scattering curves
   for SASREFCV and total number of scattering curves+1 for SASREFMX)
 - K lines, each containing 8 parameters related to the scattering data set:

Field	Acceptable values	Description
1.	`N/A`	File name with the experimental data `(*.dat)` in ascii format containing 3 columns: (1) experimental scattering vector, (2) experimental intensity and (3) experimental errors
2.	`[-1.0, 0.0-1.0]`	D2O fraction in the solvent or `-1.0`, if X-ray scattering data
3.	`[P1-P6, P222-P62]`	Symmetry of the given construct under the given conditions (may differ from the overall symmetry)
4.	`[1,2]`	Angular units: `1` = 4πsin(θ)/λ in Å^-1, `2` = 4πsin(θ)/λ in nm^-1
5.	`[0.1-1.0]`	Fraction of the curve to be fitted
6.	`[0-15]`	Setting number: number of the column in the optional "Resolution file" containing the information for smearing. This value must be `0` for X-ray curves and for the neutron curves, for which smearing information is not available. Neutron scattering curves with the same non-zero setting number must have the same angular axis and number of experimental points.
7.	`[0.0-1.0]`	Weight of the curve in the target function.
8.	`[Y/N]`	If `Y`, a constant background will be automatically adjusted for this curve (this could be useful for example to correct for incoherent background in neutron data)

In SASREFMX, the last line describes the dissociation products; all its values except the symmetry are dummies.
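Before starting a long run it can help to check the data control file against the rules listed above. The sketch below is a hypothetical validator (`parse_curves_con` and the `Curve` class are illustrative names); it assumes whitespace-separated fields and checks only the constraints stated in this manual, not everything the program itself may verify.

```python
from dataclasses import dataclass

# Symmetries listed in this manual: P1-P6, P222, P32-P62
SYMMETRIES = {f"P{n}" for n in range(1, 7)} | {"P222"} | {f"P{n}2" for n in range(3, 7)}


@dataclass
class Curve:
    filename: str
    d2o: float           # D2O fraction, or -1.0 for X-ray data
    symmetry: str
    units: int           # 1: 1/Angstrom, 2: 1/nm
    fraction: float      # fraction of the curve to be fitted
    setting: int         # smearing column in the resolution file, 0 = none
    weight: float
    fit_background: bool

    @property
    def is_xray(self) -> bool:
        return self.d2o == -1.0


def parse_curves_con(text):
    rows = [line.split() for line in text.splitlines() if line.strip()]
    k = int(rows[0][0])
    curves = []
    for fields in rows[1:]:
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {fields}")
        name, d2o, sym, units, frac, setting, weight, bg = fields
        c = Curve(name, float(d2o), sym.upper(), int(units), float(frac),
                  int(setting), float(weight), bg.upper() == "Y")
        if c.symmetry not in SYMMETRIES:
            raise ValueError(f"{name}: unsupported symmetry {sym}")
        if not (c.is_xray or 0.0 <= c.d2o <= 1.0):
            raise ValueError(f"{name}: D2O fraction must be -1.0 or within 0-1")
        if c.units not in (1, 2):
            raise ValueError(f"{name}: angular units must be 1 or 2")
        if c.is_xray and c.setting != 0:
            raise ValueError(f"{name}: X-ray curves need setting number 0")
        curves.append(c)
    if len(curves) != k:
        raise ValueError(f"header announces {k} curves, found {len(curves)}")
    return curves
```

For SASREFMX input, the last (dummy) entry passes the same checks, since only its symmetry is meaningful.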

The subunits control file (subunits.con in the examples) describes the rigid bodies. Its format is the following:

 - The first line contains one integer M (total number of subunits)
 - M lines, each containing 4 parameters related to the subunit:

Field	Acceptable values	Description
1.	`N/A`	PDB file name
2.	`[Y/N]`	Whether or not to shift the subunit to the origin at the beginning
3.	`[N/F/X/Y/Z/D]`	Movements limitations. `N`='No limitations'; `F`='subunit will be fixed'; `X`='rotations/translations along X axis only'; `Y`='rotations/translations along Y axis only'; `Z`='rotations/translations along Z axis only'; `D`='rotations/translations along (1,1,1) vector only';
4.	`[P1-P6, P222-P62]`	Symmetry applied to the given subunit (may differ from the overall symmetry)
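A matching sketch for the subunits control file translates the movement codes into the meanings listed in the table. The function `parse_subunits_con` is hypothetical and assumes whitespace-separated fields.

```python
# Movement codes and meanings as listed in the table above
MOVEMENTS = {
    "N": "no limitations",
    "F": "fixed",
    "X": "rotations/translations along X only",
    "Y": "rotations/translations along Y only",
    "Z": "rotations/translations along Z only",
    "D": "rotations/translations along (1,1,1) only",
}
SYMMETRIES = {f"P{n}" for n in range(1, 7)} | {"P222"} | {f"P{n}2" for n in range(3, 7)}


def parse_subunits_con(text):
    rows = [line.split() for line in text.splitlines() if line.strip()]
    m = int(rows[0][0])
    subunits = []
    for pdb, center, move, sym in (r[:4] for r in rows[1:]):
        move, sym = move.upper(), sym.upper()
        if move not in MOVEMENTS:
            raise ValueError(f"{pdb}: unknown movement code {move}")
        if sym not in SYMMETRIES:
            raise ValueError(f"{pdb}: unsupported symmetry {sym}")
        subunits.append({"pdb": pdb, "center": center.upper() == "Y",
                         "movement": MOVEMENTS[move], "symmetry": sym})
    if len(subunits) != m:
        raise ValueError(f"header announces {m} subunits, found {len(subunits)}")
    return subunits
```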

The cross-correlation file (table.con in the examples) contains a table that relates the subunits to the scattering profiles. The number of its columns equals the total number of subunits (M) and the number of its rows equals the total number of scattering curves (K). The value in the i-th column and j-th row gives the contribution of the i-th subunit to the j-th scattering data set. For a SANS curve, this value ([0.0-1.0] or -1.0) is the subunit perdeuteration (the D₂O content of the medium in which the protein was expressed), whereby -1.0 means that the given subunit is not present in the corresponding construct. For an X-ray curve, 0.0 is used if the subunit is present (and -1.0 if it is absent).
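The table semantics can be expressed as a short sketch: read the K x M table, check its dimensions, and list which subunits (with which perdeuteration) contribute to a given curve. The helper names `parse_table` and `subunits_in_curve` are hypothetical.

```python
def parse_table(text, n_curves, n_subunits):
    """Read table.con into a K x M list of floats and check its dimensions."""
    rows = [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]
    if len(rows) != n_curves:
        raise ValueError(f"expected {n_curves} rows, found {len(rows)}")
    for j, row in enumerate(rows, start=1):
        if len(row) != n_subunits:
            raise ValueError(f"row {j}: expected {n_subunits} values, found {len(row)}")
    return rows


def subunits_in_curve(row):
    """Map subunit number -> perdeuteration for subunits present (value != -1.0)."""
    return {i: v for i, v in enumerate(row, start=1) if v != -1.0}


# First, second and eighth rows of the T7 example further below
rows = parse_table("0.0  0.0 -1.0\n0.0  0.0  0.0\n0.0  0.5  0.0\n", 3, 3)
print(subunits_in_curve(rows[0]))  # {1: 0.0, 2: 0.0}  (DNA absent)
```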

In SASREFMX, the last row describes the dissociation products, which contribute to all the curves. Here, only -1, 0 or a positive integer is allowed: -1 means that the subunit is not among the dissociation products; 0 means that the subunit is part of a sub-complex which is a dissociation product of the larger assembly (there can be at most one such sub-complex); a positive integer gives the molar ratio (stoichiometry) of the subunit in the original (fully dissociated) sample.
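The three kinds of entries in the dissociation row can be classified with a small sketch; applied to the last rows of the SASREFMX examples below, it distinguishes free subunits, a dissociated sub-complex and absent subunits. The function `describe_dissociation_row` is a hypothetical helper.

```python
def describe_dissociation_row(row):
    """Classify the entries of the SASREFMX dissociation-products row."""
    absent, subcomplex, free = [], [], {}
    for i, v in enumerate(row, start=1):
        if v == -1:
            absent.append(i)        # not among the dissociation products
        elif v == 0:
            subcomplex.append(i)    # part of the (single) dissociated sub-complex
        elif v > 0 and v == int(v):
            free[i] = int(v)        # free subunit; value = molar ratio
        else:
            raise ValueError(f"subunit {i}: invalid value {v}")
    return {"absent": absent, "subcomplex": subcomplex, "free": free}


print(describe_dissociation_row([1.0, 1.0]))  # transient heterodimer: A and B free, 1:1
```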

Distance restraints may be imposed on the model using contacts conditions file (optional) in the following format:

      dist 7.0
      1 0 0 2 1 1
      dist 5.0
      2 0 0 3 1 1
      dist 7.0
      1 342 342 2 25 25
      1 350 350 2 17 17
      dist 6.0
      1 290 297 2  64 79
      dist 7.0
      1 1 0 3 1 0

"dist 7.0" means that the minimum distance between CA atoms of the residues (or P atoms of the nucleotides) specified in the following lines should not exceed 7 Å. Each line not containing the keyword "dist" holds six numbers: the ordinal number of the 1st subunit, the first and last residue of a range in the 1st subunit, the ordinal number of the 2nd subunit, and the first and last residue of a range in the 2nd subunit. The restraint requires a contact between any residue/nucleotide of the 1st subunit in the first range and any residue/nucleotide of the 2nd subunit in the second range. A residue number of 0 denotes the last residue/nucleotide of the subunit.

If two (or more) alternatives are given after the line with the keyword "dist", the program compares the better (smaller) distance among them with the specified one.
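The restraint logic described above can be sketched in a few lines: parse the file into "dist" groups, take for each alternative the minimum distance between the selected CA/P atoms, keep the best (smallest) alternative and compare it with the limit. This is an illustration only; the helpers `parse_contacts` and `violated` are hypothetical, and it is an assumption that residue numbers in the .cnd file can be matched directly to the residue numbers used as keys in `coords`.

```python
import math


def parse_contacts(text):
    """Parse a .cnd file into [(max_dist, [(s1, a1, b1, s2, a2, b2), ...]), ...]."""
    groups = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].lower() == "dist":
            groups.append((float(fields[1]), []))
        elif groups:
            groups[-1][1].append(tuple(int(x) for x in fields[:6]))
        else:
            raise ValueError("contact line before the first 'dist' keyword")
    return groups


def _select(coords, first, last):
    """coords: {residue_number: (x, y, z)} of one subunit; 0 means last residue."""
    last_res = max(coords)
    first = last_res if first == 0 else first
    last = last_res if last == 0 else last
    return [xyz for res, xyz in coords.items() if first <= res <= last]


def contact_distance(subunits, alt):
    s1, a1, b1, s2, a2, b2 = alt
    pts1, pts2 = _select(subunits[s1], a1, b1), _select(subunits[s2], a2, b2)
    return min((math.dist(p, q) for p in pts1 for q in pts2), default=math.inf)


def violated(subunits, groups):
    """Return (limit, best_distance) for each restraint that is not satisfied."""
    out = []
    for max_dist, alternatives in groups:
        best = min(contact_distance(subunits, a) for a in alternatives)
        if best > max_dist:
            out.append((max_dist, best))
    return out
```

Usage on toy coordinates: with `subunits = {1: {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0)}, 2: {1: (10.0, 0.0, 0.0), 2: (20.0, 0.0, 0.0)}}`, the restraint `dist 7.0` / `1 0 0 2 1 1` is satisfied (6.2 Å between the last residue of subunit 1 and residue 1 of subunit 2), whereas the same contact under `dist 5.0` is reported as violated.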

Please refer to the SASREF manual for more details. Important: the numbering of the subunits differs slightly here, because distinct symmetries can be applied to individual subunits in SASREF CV / MX. First, all subunits of the asymmetric part are listed (as in the conventional SASREF version), followed by all symmetry mates of the first subunit, then those of the second, and so on until the last one.
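The numbering rule can be made concrete with a sketch that lists the global index of every subunit copy. It assumes that the number of copies of a subunit follows from its own symmetry as the order of the point group (n for Pn, 2n for Pn2, 4 for P222); any interplay with the master symmetry is not modelled. The helpers `symmetry_order` and `global_numbering` are hypothetical.

```python
def symmetry_order(sym):
    """Number of copies generated by a symmetry: n for Pn, 2n for Pn2, 4 for P222."""
    sym = sym.upper()
    if sym == "P222":
        return 4
    n = int(sym[1])
    return 2 * n if len(sym) == 3 and sym[2] == "2" else n


def global_numbering(symmetries):
    """List (global_index, subunit, copy); copy 0 is the asymmetric-part subunit."""
    order = [(i, 0) for i in range(1, len(symmetries) + 1)]
    for i, sym in enumerate(symmetries, start=1):
        order += [(i, c) for c in range(1, symmetry_order(sym))]
    return [(k, s, c) for k, (s, c) in enumerate(order, start=1)]


# Subunit 1 with P2, subunit 2 with P1, subunit 3 with P3
for entry in global_numbering(["P2", "P1", "P3"]):
    print(entry)
```

Here the asymmetric part gets numbers 1-3, the mate of subunit 1 becomes number 4 and the two mates of subunit 3 become numbers 5 and 6; these global numbers would then be used as subunit ordinals in the contacts conditions file.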

SASREF CV / MX Output Files

After each simulated annealing step, SASREF CV / MX writes a set of output files. Each file name consists of the user-defined prefix (the log file name entered at the prompt) followed by an extension. If a prefix has been used before, existing files are overwritten without warning.

Extension	Description
`.log`	Contains the same information as the screen output and is updated during execution of the program.
`.pdb`	The current model of the entire complex. The `REMARK` section of the file contains information about the application used and about the parameters of the model, e.g. penalties and χ.
`-i.fit`	Fit of the scattering curve computed from the complex (or sub-complex) against the corresponding experimental data, where i is the `construct` number. The columns in the output file are '`s`', '`I_exp`' and '`I_comp`'.
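For plotting or further analysis, the three columns of a `.fit` file can be read with a tolerant parser. This sketch assumes whitespace-separated numeric rows and simply skips any non-numeric header or remark lines, since the exact header layout is not documented here. Note that the file contains no error column, so χ cannot be recomputed from it alone. The function `parse_fit` is hypothetical.

```python
def parse_fit(text):
    """Read the s, I_exp and I_comp columns of a SASREF CV / MX .fit file."""
    s, i_exp, i_comp = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            a, b, c = (float(x) for x in fields[:3])
        except ValueError:
            continue  # header or remark line
        s.append(a)
        i_exp.append(b)
        i_comp.append(c)
    return s, i_exp, i_comp


s, i_exp, i_comp = parse_fit("s I_exp I_comp\n0.01 100.0 98.0\n0.02 80.0 81.0\n")
```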

Examples

Building a Complex against X-ray and Contrast Variation SANS Data Sets

A simulated example of the T7 DNA polymerase ternary complex with DCTP (PDB entry 1t8e). The PDB files containing the atomic coordinates of the three subunits are:

phsave1.pdb   -   Polymerase
phsave2.pdb   -   DCTP
phsave3.pdb   -   DNA

The simulated SAS data comprise 17 curves in total: 2 X-ray profiles (from the entire complex and from the binary construct without DNA) and 15 neutron scattering curves from the complex (5 D₂O contents: 0, 40, 55, 70 and 100% D₂O, times 3 perdeuteration levels of DCTP: 0, 50 and 100%):

x-prot.dat       X-ray, binary complex (no DNA)
x-compl.dat      X-ray, ternary complex
complh_0.dat     ternary complex with protonated DCTP in 0%D2O
complh_40.dat                                         in 40%D2O
complh_55.dat                                         in 55%D2O
complh_70.dat                                         in 70%D2O
complh_100.dat                                        in 100%D2O
compl50d_0.dat                    50% deuterated DCTP in 0%D2O
compl50d_40.dat                                       in 40%D2O
compl50d_55.dat                                       in 55%D2O
compl50d_70.dat                                       in 70%D2O
compl50d_100.dat                                      in 100%D2O
compl100d_0.dat                 fully deuterated DCTP in 0%D2O
compl100d_40.dat                                      in 40%D2O
compl100d_55.dat                                      in 55%D2O
compl100d_70.dat                                      in 70%D2O
compl100d_100.dat                                     in 100%D2O

Content of the curves.con file:

17
x-prot.dat       -1.00 P1 1 1.0 0 1.0 y
x-compl.dat      -1.00 P1 1 1.0 0 1.0 y
complh_0.dat      0.00 P1 1 1.0 0 1.0 y
complh_40.dat     0.40 P1 1 1.0 0 1.0 y
complh_55.dat     0.55 P1 1 1.0 0 1.0 y
complh_70.dat     0.70 P1 1 1.0 0 1.0 y
complh_100.dat    1.00 P1 1 1.0 0 1.0 y
compl50d_0.dat    0.00 P1 1 1.0 0 1.0 y
compl50d_40.dat   0.40 P1 1 1.0 0 1.0 y
compl50d_55.dat   0.55 P1 1 1.0 0 1.0 y
compl50d_70.dat   0.70 P1 1 1.0 0 1.0 y
compl50d_100.dat  1.00 P1 1 1.0 0 1.0 y
compl100d_0.dat   0.00 P1 1 1.0 0 1.0 y
compl100d_40.dat  0.40 P1 1 1.0 0 1.0 y
compl100d_55.dat  0.55 P1 1 1.0 0 1.0 y
compl100d_70.dat  0.70 P1 1 1.0 0 1.0 y
compl100d_100.dat 1.00 P1 1 1.0 0 1.0 y

Content of the subunits.con file:

3
phsave1.pdb Y F P1
phsave2.pdb Y N P1
phsave3.pdb Y N P1

Content of the table.con file:

0.0  0.0 -1.0 
0.0  0.0  0.0 
0.0  0.0  0.0 
0.0  0.0  0.0 
0.0  0.0  0.0 
0.0  0.0  0.0 
0.0  0.0  0.0 
0.0  0.5  0.0 
0.0  0.5  0.0 
0.0  0.5  0.0 
0.0  0.5  0.0 
0.0  0.5  0.0 
0.0  1.0  0.0 
0.0  1.0  0.0 
0.0  1.0  0.0 
0.0  1.0  0.0 
0.0  1.0  0.0
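Because the 15 neutron curves follow a regular pattern (3 perdeuteration levels of DCTP times 5 D₂O contents), the curves.con and table.con contents above can also be generated rather than typed by hand. The function `t7_config` is a hypothetical helper; it writes single-space-separated fields, assuming that the column alignment in the listings above is only for readability.

```python
D2O_LEVELS = [0, 40, 55, 70, 100]                                     # % D2O in the solvent
DCTP_SERIES = [("complh", 0.0), ("compl50d", 0.5), ("compl100d", 1.0)]  # DCTP perdeuteration


def t7_config():
    """Return (curves.con text, table.con text) for the T7 example."""
    curves = ["x-prot.dat -1.00 P1 1 1.0 0 1.0 y",
              "x-compl.dat -1.00 P1 1 1.0 0 1.0 y"]
    table = ["0.0 0.0 -1.0",   # X-ray, binary complex: DNA (subunit 3) absent
             "0.0 0.0 0.0"]    # X-ray, ternary complex
    for prefix, deut in DCTP_SERIES:
        for d2o in D2O_LEVELS:
            curves.append(f"{prefix}_{d2o}.dat {d2o / 100:.2f} P1 1 1.0 0 1.0 y")
            table.append(f"0.0 {deut:.1f} 0.0")  # only DCTP (subunit 2) is deuterated
    return "\n".join([str(len(curves))] + curves), "\n".join(table)


curves_con, table_con = t7_config()
print(curves_con)
print(table_con)
```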

A listing of questions/answers for a sample run in the USER mode is as follows:

 Computation mode (User or Expert) ...... <         User >:
 Log file name .......................... <         .log >: test1
 Enter project description .............. : T7 DNA POLYMERASE WITH DCTP, 17 curves
 Symmetry: Pn(2) (n=2-6) ................ <           P1 >:
 File name with curves info ............. <         .con >: curves
 File name with smearing parameters ..... <         .res >:
 File name with subunits info ........... <         .con >: subunits
 File name with cross-dependencies ...... <         .con >: table
 File name, contacts conditions, CR for none <         .cnd >:
 Expected particle shape: <P>rolate, <O>blate,
  or <U>nknown .......................... <      Unknown >:
 ...

Quaternary Structure Analysis of a Weak Tetramer

Let weak_tetramer.dat be a SAXS profile from a polydisperse sample containing a tetramer with P222 symmetry in equilibrium with its monomer (i.e. an oligomeric equilibrium). The atomic structure of the monomer is contained in monomer.pdb. The content of the curves.con file is then:

2
weak_tetramer.dat -1.00 P222 1 1.0 0 1.0 y
dummy.dat         -1.00 P1   1 1.0 0 1.0 y

Content of the subunits.con file:

1
monomer.pdb Y N P222

Content of the table.con file:

0.0   
0.0

Modelling of a Transient Heterodimer

An equimolar mixture of proteins A and B yields an equilibrium between the AB complex and its unbound components. A concentration series with distinct volume fractions of the intact complex can be fitted simultaneously. The content of the configuration files in this case is as follows.

curves.con:

4
transient_c1.dat -1.00 P1 1 1.0 0 1.0 y
transient_c2.dat -1.00 P1 1 1.0 0 1.0 y
transient_c3.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat        -1.00 P1 1 1.0 0 1.0 y

subunits.con:

2
a.pdb Y N P1
b.pdb Y N P1

table.con:

0.0 0.0  
0.0 0.0  
0.0 0.0  
1.0 1.0
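The SASREFMX examples share one pattern for X-ray data: one all-zero table row per measured curve (every subunit present in the intact complex), followed by a dummy curve line and the dissociation-products row. A hypothetical helper `mx_configs` can produce both files; the symmetry arguments mirror the weak tetramer example (P222 for the data curve, P1 for the dummy line).

```python
def mx_configs(data_files, dissociation_row, complex_sym="P1", products_sym="P1"):
    """Build curves.con and table.con text for X-ray data of a polydisperse system."""
    n_sub = len(dissociation_row)
    curves = [str(len(data_files) + 1)]
    curves += [f"{name} -1.00 {complex_sym} 1 1.0 0 1.0 y" for name in data_files]
    # Last line: dissociation products; only the symmetry field is meaningful
    curves.append(f"dummy.dat -1.00 {products_sym} 1 1.0 0 1.0 y")
    # One row per data set: all subunits present in the intact complex (X-ray -> 0.0)
    table = [" ".join(["0.0"] * n_sub) for _ in data_files]
    table.append(" ".join(f"{v:.1f}" for v in dissociation_row))
    return "\n".join(curves), "\n".join(table)


# Transient heterodimer: three concentrations, A and B free at a 1:1 ratio
curves_con, table_con = mx_configs(
    ["transient_c1.dat", "transient_c2.dat", "transient_c3.dat"], [1, 1])
```

The weak tetramer example corresponds to `mx_configs(["weak_tetramer.dat"], [0], complex_sym="P222")`.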

Complex with Excess of a Component in Solution

If an excess of subunit C is needed to form a stable complex ABC, the resulting sample will be polydisperse. The content of the configuration files in this case is as follows.

curves.con:

2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat   -1.00 P1 1 1.0 0 1.0 y

subunits.con:

3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1

table.con:

0.0 0.0 0.0 
-1.0 -1.0 0.0

Presence of a Sub-Complex in the Mixture

If subunit C is lacking (e.g. in a stoichiometry study), some amount of the sub-complex AB might be present in the mixture with the ternary complex ABC. The content of the configuration files is then as follows.

curves.con:

2
mixture.dat -1.00 P1 1 1.0 0 1.0 y
dummy.dat   -1.00 P1 1 1.0 0 1.0 y

subunits.con:

3
a.pdb Y N P1
b.pdb Y N P1
c.pdb Y N P1

table.con:

0.0 0.0 0.0 
0.0 0.0 -1.0
