MONSA manual

monsa

Written by D.I. Svergun. Contribution of M.V. Petoukhov.
Post all your questions about MONSA to the ATSAS Forum.

Manual

The following sections briefly describe the method implemented in MONSA, its usage in dialog mode, and the required input and produced output files.

MONSA implements the algorithm described by:

Svergun, D.I. (1999) Restoring low resolution structure of biological macromolecules from solution scattering using simulated annealing. Biophys. J. 76, 2879-2886.

Svergun, D.I. & Nierhaus, K.H. (2000) A map of protein-rRNA distribution in the 70S Escherichia coli ribosome. J. Biol. Chem. 275, 14432-14439.

The users are referred to these papers for details.

Introduction

MONSA is an extended version of DAMMIN for multiphase bead modelling which allows one to fit simultaneously multiple curves (e.g. from X-ray and/or neutron contrast variation series).

MONSA reads in multiple data sets and information about the contrasts and volume fractions of the phases in a particle. The program can simultaneously fit data recorded at different instrumental settings and also with different radiations (e.g. X-rays and neutrons). The structure of the input data is therefore somewhat complicated.

The program requires:

a MASTER file (file *.mst) containing the general phase information and references to CONTROL file(s);
CONTROL file(s) (*.con) containing the smearing information for the given setting, information about contrasts and references to DATA files (*.dat);
DATA files (*.dat), containing raw experimental data at different contrasts;
a PDB-like file defining the number of phases and the SEARCH VOLUME for the model.

Interactive Configuration

Screen Text	Default	Description
`Log file name:`	N/A	An identifier (up to six characters) used as the prefix of all output file names
`Project description:`	N/A	Text description of the problem
`Master file name:`	N/A	Name of the master file
`Maximum order of harmonics:`	`14`	The more harmonics, the more accurate the reconstruction becomes, but the slower the process. May be between 5 and 20
`DAM coordinates file name:`	N/A	Name of the Search Volume file generated by BODIES.
`Symmetry: Pn or Pn2 (n=1,2,3,4,5,6):`	`P1`	Specify the symmetry to enforce on the particle.
`Reset (unfix) all atoms [ Y / N ]:`	`No`	If 'Y', the phases indices allowed for the atoms in the pdb file are set to.
`Atomic radius:`	`var`	If the file is prepared by BODIES, the value is read from the file.
`Atomic volume:`	`var`	This is (4/3) × π × r³ / 0.74 (volume per sphere for dense packing).
`Preference for non-solvent contacts:`	`0.3`	With a value of 0.0, the phase of the atom (solvent or protein) does not influence the looseness penalty weight. When this value is increased, non-solvent contacts are preferred through the calculation of the looseness penalty weight. If unsure, use the default value.
`Looseness penalty weight:`	`50`	How much the Looseness Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value. If sharp edges are observed instead of smooth surfaces, try decreasing this penalty weight.
`Discontiguity penalty weight:`	`50`	How much the Discontiguity Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value.
`Randomize the initial DAM [ Y / N ]`	`Yes`	If 'Y', the starting model is randomized
`Fix the overall scale factor [ Y / N ]`	`No`	If No (recommended), the overall scale factor, as well as the individual relative scale factors for all the data sets, will be determined automatically. If the scale factor is known (data on absolute scale), it may be fixed and entered manually.
`Volume fraction penalty weight`	`50`	How much the Volume Fraction Penalty should influence the acceptance or rejection of phase changes.
`Rg penalty weight`	`0.0`	How much the radius of gyration penalty should influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty.
`Center penalty weight (negative = WeiPer):`	`0.0`	How much the Center Penalty shall influence the acceptance or rejection of phase changes. A value of 0.0 disables the penalty. If unsure, use the default value.
`Initial annealing temperature :`	`10`	If the value is too high, it could take ages for the system to cool down. If the value is too low, the system can be trapped in a local minimum. If unsure use the default value.
`Annealing schedule factor :`	`0.9`	Factor by which the temperature is decreased; 0.95 is a good average value. Faster cooling for smaller systems is possible (0.9), but slower cooling (0.99) needs to be applied more often.
`Max # of iteration at each T:`	`var`	Finalize temperature step and cool after this many iterations at the latest.
`Max # of successes at each T:`	`var`	Finalize temperature step and cool after at most this many successful phase changes.
`Min # of successes to continue:`	`var`	Stop if fewer than this many successful phase changes occur within a single temperature step.
`Number of annealing steps:`	`100`	Stop after this number of steps if the system has not cooled down before then.
`Plot the final fits [ Y / N ]:`	`No`	Display the final fits.
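
The `Atomic volume` default in the table above follows directly from the atomic radius. A minimal Python sketch of that formula is given here for checking values by hand; the function name and the 5 Å radius are illustrative assumptions and are not part of MONSA.

```python
import math

PACKING_FRACTION = 0.74  # dense packing of equal spheres, as quoted in the table


def atomic_volume(radius):
    """Volume per dummy atom: (4/3) * pi * r**3 / 0.74."""
    return (4.0 / 3.0) * math.pi * radius ** 3 / PACKING_FRACTION


# Illustrative radius of 5 Angstrom (hypothetical value, not a MONSA default)
print(f"{atomic_volume(5.0):.1f}")
```

The volume is in the cube of whatever unit the radius is given in.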

Runtime Output

At runtime, two lines of output are generated for each temperature step:

jAnn: 1 T: 0.100E+02 iSuc: 11718 nEva: 12542 CPU: 0.4056E+02

SqfVal: 22.8539 Rf: 22.25999 Los: 0.1312 Dis: 0.0464 Sca: 0.342E+01

The fields can be interpreted as follows, top-left to bottom-right:

Field	Description
`jAnn`	Step number. Starts at 1, increases monotonically.
`T`	Temperature measure; starts at a high value (the initial annealing temperature) and is multiplied by the annealing schedule factor at each step.
`iSuc`	Number of successful phase changes in this temperature step. The number of successes should decrease slowly, and the first few steps should be terminated by the maximum number of successes criterion. If the maximum number of iterations per step is reached instead, or the number of successes drops suddenly by a large amount, the system should probably be cooled more slowly.
`nEva`	Accumulated number of function evaluations.
`CPU`	Elapsed CPU cycles since the annealing procedure was started.
`SqfVal`	Goodness of the model (fit + penalties).
`Rf`	Goodness of fit of simulated data versus experimental data, does not take penalties into account.
`Los`	Contribution of Looseness Penalty, not taking the Looseness Penalty Weight into account.
`Dis`	Contribution of Discontiguity Penalty, not taking the Discontiguity Penalty Weight into account.
`Sca`	Scale factor
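
The `T` column can be predicted from the `Initial annealing temperature` and the `Annealing schedule factor`. The sketch below assumes plain geometric cooling, which is consistent with the values 0.100E+02, 0.900E+01, 0.810E+01, ... shown in the example log; the function name is illustrative.

```python
def annealing_temperatures(t_initial=10.0, factor=0.9, n_steps=100):
    """Temperature of each annealing step, assuming geometric cooling."""
    return [t_initial * factor ** step for step in range(n_steps)]


temps = annealing_temperatures(n_steps=8)
for j_ann, temperature in enumerate(temps, start=1):
    print(f"jAnn: {j_ann:3d}  T: {temperature:.3g}")
```

With a factor of 0.99 instead of 0.9, the same number of steps covers a much narrower temperature range, so slower cooling usually needs more annealing steps.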

MONSA Input Files

Master File

The master file contains the general phase information: volumes of the different phases, radii of gyration, connectivity etc. It has the following structure:

 Line 1    Title (up to 80 characters)
 Line 2    Four theoretical volumes
           of individual phases (required)
 Line 3    Four theoretical radii of gyration in Ångström (even if your data are in nm^-1)
           of individual phases (optional)
 Line 4    Connectivity indicators of phases (required):
           '1' for 'interconnected', '0' for 'disconnected', '-1' for 'symmetry defined'
 Line 5    Control file name and Npts for Guinier fit
           (no fit if the latter is equal to '-1')
  ... OPTIONAL ...
 Line 6    Control file name and Npts for Guinier fit
           (no fit if the latter is equal to '-1')
...

  etc      Erroneous lines skipped; read to the end

The program works with up to four-component particles. If the number of components (phases) is less than four, just put zeroes for the values required for this phase.

Control File

The control file contains the smearing information for the given setting, information about contrasts and references to the data file. It has the following structure:

 Line 1    Resolution file name, resolution setting number (free format)
 Line 2    Output file name for the fits (not used)        (free format)
 Line 3    Title                                           (character*80)
 Line 4    Number of points in the setting                 (free format)
           (put a negative number to indicate nm^-1 as angular units)
 Line 5    Data file name, contrasts and constants         (free format)
  etc      Erroneous lines skipped; read to the end

The information about the data sets is given in the format:

Filename    Dro1        Dro2       Dro3       Dro4      Mult  Const   Weight

Field	Description
`Filename`	Filename of the scattering pattern (up to 15 characters).
`Dron`	Contrast of the nth phase.
`Mult`	The scattering pattern is multiplied by this factor after constant subtraction.
`Const`	Constant subtracted from the scattering pattern.
`Weight`	Relative weight of the data set.
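
The order of operations for `Const` and `Mult` matters: the constant is subtracted first, then the result is multiplied. The following Python sketch parses one data-set line in the format above and applies that correction. The class and function names are illustrative, and the parser simply ignores any tokens after `Weight`; it is not MONSA's own reader.

```python
import shlex
from dataclasses import dataclass


@dataclass
class DataSetEntry:
    filename: str
    contrasts: tuple  # Dro1..Dro4
    mult: float
    const: float
    weight: float


def parse_data_set_line(line):
    """Split 'Filename Dro1 Dro2 Dro3 Dro4 Mult Const Weight' (filename in quotes)."""
    tokens = shlex.split(line)
    values = [float(token) for token in tokens[1:8]]
    return DataSetEntry(tokens[0], tuple(values[:4]), values[4], values[5], values[6])


def rescale_intensities(intensities, entry):
    """Subtract Const first, then multiply by Mult."""
    return [(value - entry.const) * entry.mult for value in intensities]


entry = parse_data_set_line("'infr1.dat'  1.00  1.00  0.00  0.00  1.e-6  0.0  1.00")
print(entry)
print(rescale_intensities([2.0e6, 4.0e6], entry))
```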

Smearing

If required, MONSA smears the theoretical curves using the resolution function introduced by J. Skov Pedersen et al. (1990), J. Appl. Cryst., 23, 321. Several subroutines for data smearing were provided by J. Skov Pedersen and modified for use in MONSA. The resolution file must have the following format (the numbers describe a setting at the RISOE SANS instrument):

Row	Value	Description
1	`0.8`	Effective collimation slit diameter in cm.
2	`0.35`	Effective sample diameter in cm.
3	`300`	Collimation distance in cm.
4	`105`	Sample-detector distance in cm.
5	`3`	λ in Å
6	`0.18`	δ(λ)/λ
7	`1.1`	Pixel size in cm.
8	`0.0000`	Averaging error (accounted for in Pixel size).

Diagram of a SANS instrument showing the lengths required for the ill.res file

If the file is corrupted or does not exist, no smearing is performed. An example of the resolution file is given below. The resolution setting number is the number of the column to use in the resolution file.

 0.00001, 0.00001,   0.00001 , 0.8    , 0.8
 0.00001, 0.00001,   0.00001 , 0.30   , 0.35
 1100.  , 200.   ,    100.   , 300.   , 100
  180.  , 125.   ,    100.   , 110.   , 100
  6.0   ,  5.6   ,     1.    ,  3.22  , 6.
  0.10  ,  0.09  ,    0.01   , 0.18   , 0.18
 0.0001 ,  1.57  ,    0.01   , 1.1    , 1.1
 0.0000 , 0.0000 ,    0.0000 , 0.0000 , 0.0000
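
To see which instrument parameters a given setting number selects, the resolution file can be read column-wise. The sketch below is only an illustration: it assumes comma-separated values and that setting numbers count columns from 1, and the dictionary keys are descriptive names chosen here rather than identifiers used by MONSA.

```python
ROW_NAMES = [
    "collimation_slit_diameter_cm",
    "sample_diameter_cm",
    "collimation_distance_cm",
    "sample_detector_distance_cm",
    "wavelength_angstrom",
    "relative_wavelength_spread",
    "pixel_size_cm",
    "averaging_error",
]

EXAMPLE_RES = """\
 0.00001, 0.00001,   0.00001 , 0.8    , 0.8
 0.00001, 0.00001,   0.00001 , 0.30   , 0.35
 1100.  , 200.   ,    100.   , 300.   , 100
  180.  , 125.   ,    100.   , 110.   , 100
  6.0   ,  5.6   ,     1.    ,  3.22  , 6.
  0.10  ,  0.09  ,    0.01   , 0.18   , 0.18
 0.0001 ,  1.57  ,    0.01   , 1.1    , 1.1
 0.0000 , 0.0000 ,    0.0000 , 0.0000 , 0.0000
"""


def read_resolution_setting(text, setting):
    """Return the eight parameters of one setting (assumed 1-based column index)."""
    rows = [line for line in text.splitlines() if line.strip()]
    if len(rows) < len(ROW_NAMES):
        raise ValueError("resolution file needs eight rows")
    values = {}
    for name, row in zip(ROW_NAMES, rows):
        columns = [float(field) for field in row.split(",")]
        values[name] = columns[setting - 1]
    return values


for name, value in read_resolution_setting(EXAMPLE_RES, 4).items():
    print(f"{name:30s} {value}")
```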

Data Files

The experimental data files must have the following structure:

            1st line     - comment
            2nd line etc - s, I(s), Err(s) in free format

where s = 4 × π × sin(θ) / λ in Å^-1, I(s) is the experimental intensity and Err(s) is its standard deviation.
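
For data stored against scattering angle, s can be computed from the formula above. This is a small sketch that takes θ as half the scattering angle 2θ (the usual convention) and also shows the Å^-1 to nm^-1 conversion; the function names are illustrative.

```python
import math


def momentum_transfer(two_theta_deg, wavelength):
    """s = 4 * pi * sin(theta) / lambda, with theta half the scattering angle.

    The result is in inverse units of the wavelength (A^-1 for lambda in A).
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def per_angstrom_to_per_nm(s):
    """1 A^-1 equals 10 nm^-1."""
    return s * 10.0


s_value = momentum_transfer(two_theta_deg=1.0, wavelength=1.5)
print(f"s = {s_value:.5f} A^-1 = {per_angstrom_to_per_nm(s_value):.4f} nm^-1")
```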

Search Volume File

The input file defining the search volume is a PDB-like file containing the coordinates of dummy atoms with the extra "phase" information telling to which phase the atom belongs. The file looks like this:

0         1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456

ATOM      1  CA  ASP    1      -17.000 -16.957-101.666  1.00 20.00   3 3012
ATOM      2  CA  ASP    1      -17.000   -.957-101.666  1.00 20.00 1 3 3012
ATOM      3  CA  ASP    1      -17.000  15.043-101.666  1.00 20.00 0 1 3012
ATOM      4  CA  ASP    1       -1.000 -16.957-101.666  1.00 20.00   2 3012
ATOM      5  CA  ASP    1       -1.000   -.957-101.666  1.00 20.00   1 202

The characters 1 to 65 in a line are as in a normal PDB file.

Column 67 (iCore): if '1', the phase of this atom is fixed and will not be changed during the search ("core atom"); if iCore is '0' or ' ', the phase of the atom is free to change. The core indicators may be recomputed automatically when loading the model, so that iCore is set to 1 if an atom is surrounded only by atoms of the same phase. In this case, the program will change only the interface atoms. This option may be useful if a preliminary model is available.

Column 69 (iPhas): the phase index of the atom (ordinal number in the iAllo array).

Column 71 (nAllo): the number of phases allowed for the given atom.

Columns 72 onwards (iAllo): the indices of the allowed phases, such that iAllo(iPhas) is the phase of the atom.

This system allows one, if required, to select the phases which can occupy any given point. In the above example of a two-phase system:

Atom 1: free  atom of phase 2
Atom 2: fixed atom of phase 2
Atom 3: free  atom of phase 0 (solvent)
Atom 4: free  atom of phase 1
Atom 5: free  atom of phase 0 (solvent; could be only solvent or phase 2)
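
The interpretation of the five atoms listed above can be reproduced with a few lines of Python. This sketch assumes that the column numbers are read against the ruler printed above the example (which starts at 0), that each field is a single digit, and that iPhas is a 1-based ordinal into iAllo. Only the trailing phase fields of each record are reproduced; the leading PDB part is replaced by a padded placeholder of the same width.

```python
def parse_phase_fields(line):
    """Return (fixed, phase, allowed_phases) from the MONSA-specific columns."""
    fixed = line[67] == "1"                 # iCore: '1' fixed, '0' or ' ' free
    i_phas = int(line[69])                  # ordinal number in iAllo
    n_allo = int(line[71])                  # number of allowed phases
    allowed = [int(ch) for ch in line[72:72 + n_allo]]
    phase = allowed[i_phas - 1]             # iAllo(iPhas), 1-based ordinal
    return fixed, phase, allowed


# Placeholder for the ordinary PDB part of the record (ruler columns 0..65)
HEAD = "ATOM      1  CA  ASP    1".ljust(66)
# Trailing fields of atoms 1 to 5 from the example above
TAILS = ["   3 3012", " 1 3 3012", " 0 1 3012", "   2 3012", "   1 202"]

results = [parse_phase_fields(HEAD + tail) for tail in TAILS]
for number, (fixed, phase, allowed) in enumerate(results, start=1):
    state = "fixed" if fixed else "free"
    print(f"Atom {number}: {state} atom of phase {phase}, allowed phases {allowed}")
```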

In most cases, however, the user does not need to learn the structure of this file. The program BODIES can generate an ellipsoidal (or spherical) search volume for a given number of phases and a given number of dummy atoms. In the general case, one can always use a spherical search volume with the diameter equal to Dmax, as in DAMMIN. MONSA automatically determines the number of phases in the search model when reading this file. The number of dummy atoms in the search volume must not exceed 10000!

The distribution package includes an example of a batch file containing the required answers. Typing monsa144 < test.ans will run the structure determination for the supplied example in the batch mode (may take a day of CPU on a PC!).

The example is taken from the article (a 30S ribosomal subunit-like particle with simulated proteins inside). The model is given in the file model.pdb (phase 1 - proteins, phase 2 - RNA), the initial search volume in the file sph105-2.pdb (a sphere with diameter 210 Å, two-phase system; generated by BODIES). The scattering curves *.dat are computed from this model (see the example control file) and randomized.

NOTE that for any solution obtained using this method, an enantiomorph would yield the same scattering patterns! It was also observed (quite seldom) in test examples that one phase was enantiomorphous whereas the others were not.

MONSA Output Files

With each successful run, MONSA creates a set of output files. Each filename starts with a customizable prefix followed by an extension. If a prefix has been used before, existing files will be overwritten without warning.

Extension	Description
`.log`	Contains the same information as the screen output and is updated during execution of the program.
`-0.pdb`	This pdb file contains the beads of the solvent (a.k.a. the search volume).
`-1.pdb : -n.pdb`	These pdb files contain the beads of each individual phase.
`.pdb`	This pdb file contains the beads of all the phases and the solvent (a.k.a. the search volume). The beads of the different phases and the solvent are distinguished by their chain number. The header of the file contains information about the application used and about invariants of the particle, e.g. R_g, volume and molecular mass of the particle.
`.fit`	Fit of the simulated scattering curve versus a smoothed version of the experimental data. Columns in the output file are: '`s`', '`c.I_exp`', '`c.ErrI_exp`' and '`I_FIT`'.

Generating a Search Volume

In previous releases two helper applications, DAMESV and DAMEMB, were included to generate suitable search volumes for MONSA. This functionality was integrated into the search-volume mode of BODIES.

Example

Master file for the test example: contrast variation simulated data of a 30S ribosomal subunit-like particle consisting of "RNA" (phase 2, density = 4.0) with some "proteins" inside (phase 1; density = 2.0)

Master file for quazi-30S model randomized data to s=0.2
 3.7e5   8.7e5    0.00  0.0              ! Desired Volumes
 49.0     61.0    0.00  0.0              ! Desired Rgs
  0        1      0      0               ! Connectivity
'test.con'    10                         ! Control file name; Rgs will be
                                         ! computed from 10 first points
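
The master file layout can be checked before a run with a small reader. The following Python sketch parses the example above; it treats everything after `!` as a comment (an assumption based on this example) and skips lines it cannot interpret, in the spirit of "erroneous lines skipped". Function and key names are illustrative.

```python
import shlex

EXAMPLE_MASTER = "\n".join([
    "Master file for quazi-30S model randomized data to s=0.2",
    " 3.7e5   8.7e5    0.00  0.0              ! Desired Volumes",
    " 49.0     61.0    0.00  0.0              ! Desired Rgs",
    "  0        1      0      0               ! Connectivity",
    "'test.con'    10                         ! Control file name; Rgs will be",
    "                                         ! computed from 10 first points",
])


def strip_comment(line):
    """Drop a trailing '! ...' comment (assumed convention)."""
    return line.split("!", 1)[0].strip()


def parse_master(text):
    lines = text.splitlines()
    master = {
        "title": lines[0].strip(),
        "volumes": [float(v) for v in strip_comment(lines[1]).split()],
        "rgs": [float(v) for v in strip_comment(lines[2]).split()],
        "connectivity": [int(v) for v in strip_comment(lines[3]).split()],
        "controls": [],
    }
    for line in lines[4:]:
        tokens = shlex.split(strip_comment(line))
        if len(tokens) != 2:
            continue  # comment-only or malformed line
        try:
            n_guinier = int(tokens[1])
        except ValueError:
            continue
        master["controls"].append((tokens[0], n_guinier))
    return master


master = parse_master(EXAMPLE_MASTER)
print(master["volumes"], master["connectivity"], master["controls"])
```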

Control file for the test example

  'Point collimation'   1                             !! No smearing
  'test.fit'                                          !! Output fits
  Test for 30S -- use randomized data up to 0.2       !! Title
   98                                                 !! Number of points
'0r1.dat'    2.00       4.00       0.00     0.00      1.000    0.0    1.00    0
'2r1.dat'    0.00       2.00       0.00     0.00      1.000    0.0    1.00    0
'4r1.dat'   -2.00       0.00       0.00     0.00      1.000    0.0    1.00    0
'6r1.dat'   -4.00      -2.00       0.00     0.00      1.000    0.0    1.00    0
'infr1.dat'  1.00       1.00       0.00     0.00      1.e-6    0.0    1.00    0

Here, the data sets '?r1.dat' correspond to the scattering patterns from the test body in solvents with density 0.0, 2.0, 4.0, 6.0. The set 'infr1.dat' corresponds to "shape scattering" (infinite contrast). Note that the test would have worked also without the 'infinite contrast' data. Please note:

filename should be given in quotes (up to 15 characters);
put zeroes as contrasts for phases, which are not present;
all files in the setting MUST have the same number of points and the same angular axis; if you have data sets on a different angular grid, put them in a separate setting;
from each data set, a constant "Const" will be subtracted and the result will be multiplied by "Mult";
the data sets will be weighted with the relative weight "Weight" in the total discrepancy; reducing the weight is equivalent to increasing errors in the data file;
the number of points must not exceed 2048; choose the value so that the maximal s value becomes 2.5 nm^-1.
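
The contrast columns of the example control file follow from the densities quoted for the test model: each contrast is the phase density minus the solvent density. A short Python sketch reproducing the four finite-contrast rows (names are illustrative):

```python
# Densities of the test model: "proteins" (phase 1) and "RNA" (phase 2)
PHASE_DENSITIES = (2.0, 4.0)


def contrasts(solvent_density, phase_densities=PHASE_DENSITIES):
    """Contrast of each phase = phase density - solvent density."""
    return tuple(density - solvent_density for density in phase_densities)


for solvent in (0.0, 2.0, 4.0, 6.0):
    dro1, dro2 = contrasts(solvent)
    print(f"solvent {solvent:3.1f}: Dro1 = {dro1:5.2f}  Dro2 = {dro2:5.2f}")
```

The remaining two contrast columns stay at zero because phases 3 and 4 are not present in this two-phase example.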

Example of input data

 Randomized data,   RELERR=  3.00 %, file 0.dat         12-NOV-1998   13:22:35
   .600000E-02   .176494E+14   .504240E+12
   .800000E-02   .168392E+14   .486090E+12
   .100000E-01   .159999E+14   .463710E+12
.....  SKIPPED FOR BREVITY ......
   .194000E+00   .628594E+10   .184596E+09
   .196000E+00   .582946E+10   .179298E+09
   .198000E+00   .591612E+10   .173796E+09
   .200000E+00   .570405E+10   .168174E+09
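
A data file of this form (one comment line, then s, I(s), Err(s) in free format) can be read with a few lines of Python, for example to check the number of points or the relative errors before a run. This is a sketch only: the function name is illustrative, and skipping unparsable lines is a choice made here, not a statement about how MONSA treats them.

```python
EXAMPLE_DATA = "\n".join([
    " Randomized data,   RELERR=  3.00 %, file 0.dat         12-NOV-1998   13:22:35",
    "   .600000E-02   .176494E+14   .504240E+12",
    "   .800000E-02   .168392E+14   .486090E+12",
    "   .100000E-01   .159999E+14   .463710E+12",
])


def read_data_file(text):
    """Return (comment, [(s, intensity, error), ...]) from a 3-column data file."""
    lines = text.splitlines()
    comment = lines[0].strip()
    points = []
    for line in lines[1:]:
        fields = line.split()
        try:
            s, intensity, error = (float(field) for field in fields[:3])
        except ValueError:
            continue  # blank or non-numeric line
        points.append((s, intensity, error))
    return comment, points


comment, points = read_data_file(EXAMPLE_DATA)
print(comment)
for s, intensity, error in points:
    print(f"s = {s:.3f}  I = {intensity:.4e}  relative error = {error / intensity:.3f}")
```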

After the configuration, the program computes the parameters for the initial state and the simulated annealing procedure starts:

     ---  Starting values  ---
  Total scale factor      :    3.51404919007708
  Function value          :    733.688635068192
  Overall discrepancy     :    696.618264908644
  SQRT(Overall discr.)    :    26.3935269509144
  DAM looseness           :   0.137137795235494
  DAM discontiguity       :   6.681519817391703E-002
  Overall penalty         :    37.0703701595471
 jAnn:   1  T: 0.100E+02  iSuc: 11718  nEva:    12513  CPU:  0.4555E+02
  SqfVal: 23.3509  Rf: 22.69190  Los: 0.1314 Dis: 0.0517  Sca: 0.338E+01
 jAnn:   2  T: 0.900E+01  iSuc: 11718  nEva:    25119  CPU:  0.9059E+02
  SqfVal: 22.7818  Rf: 22.15299  Los: 0.1243 Dis: 0.0272  Sca: 0.341E+01
 jAnn:   3  T: 0.810E+01  iSuc: 11718  nEva:    37867  CPU:  0.1366E+03
  SqfVal: 22.5775  Rf: 22.01942  Los: 0.1295 Dis: 0.0268  Sca: 0.327E+01
 jAnn:   4  T: 0.729E+01  iSuc: 11718  nEva:    50732  CPU:  0.1830E+03
  SqfVal: 22.5775  Rf: 22.01942  Los: 0.1295 Dis: 0.0268  Sca: 0.327E+01
 jAnn:   5  T: 0.656E+01  iSuc: 11718  nEva:    63648  CPU:  0.2309E+03
  SqfVal: 22.5775  Rf: 22.01942  Los: 0.1295 Dis: 0.0268  Sca: 0.327E+01
 jAnn:   6  T: 0.590E+01  iSuc: 11718  nEva:    76727  CPU:  0.2778E+03
  SqfVal: 22.3977  Rf: 21.72769  Los: 0.1368 Dis: 0.0467  Sca: 0.330E+01
 jAnn:   7  T: 0.531E+01  iSuc: 11718  nEva:    89852  CPU:  0.3235E+03
  SqfVal: 22.2560  Rf: 21.66409  Los: 0.1292 Dis: 0.0197  Sca: 0.329E+01
 jAnn:   8  T: 0.478E+01  iSuc: 11718  nEva:   103078  CPU:  0.3704E+03
  SqfVal: 21.9930  Rf: 21.34937  Los: 0.1377 Dis: 0.0354  Sca: 0.322E+01
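
The two-line records above are easy to post-process, for instance to follow `iSuc` and `Rf` over the course of a run. The Python sketch below extracts the named fields from one record; the regular expression is written for the layout shown here and is not an official log format specification.

```python
import re

# Matches "name: value" pairs such as "T: 0.100E+02" or "iSuc: 11718"
FIELD = re.compile(r"(\w+):\s*([-+0-9.Ee]+)")


def parse_step(first_line, second_line):
    """Collect all 'name: value' fields of one temperature step into a dict."""
    pairs = FIELD.findall(first_line + " " + second_line)
    return {name: float(value) for name, value in pairs}


step = parse_step(
    " jAnn:   1  T: 0.100E+02  iSuc: 11718  nEva:    12513  CPU:  0.4555E+02",
    "  SqfVal: 23.3509  Rf: 22.69190  Los: 0.1314 Dis: 0.0517  Sca: 0.338E+01",
)
print(step)
```

Applied to every pair of lines in the `.log` file, this gives a table in which a sudden drop of `iSuc` or a stagnating `Rf` can be spotted quickly.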