EMBL Hamburg Biological
Small Angle Scattering
BioSAXS
 

DAMCLUST manual

damclust

Written by M. Petoukhov.
Post all your questions about DAMCLUST to the ATSAS Forum.

© ATSAS Team, 2009-2012


Manual

The following describes the program DAMCLUST to cluster multiple models. Details are given on how to run the program and what the input and output files are.

If you use results from DAMCLUST in your own publication, please cite: Petoukhov, M.V., Franke, D., Shkumatov, A.V., Tria, G., Kikhney, A.G., Gajda, M., Gorba, C., Mertens, H.D.T., Konarev, P.V. and Svergun, D.I. (2012) New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Cryst. 45, 342-350. © International Union of Crystallography.

Introduction

The program DAMCLUST is used for post-processing of SAS-based models. It clusters the models obtained in multiple ab initio low-resolution reconstructions (e.g. by DAMMIN and/or GASBOR) or the results of rigid body modelling (e.g. by SASREF/BUNCH/CORAL). The clustering algorithm is based on the approach described in Kelley, L. A., Gardner, S. P. & Sutcliffe, M. J. (1996). Protein Eng. 9, 1063-1065. Clustering the models into groups of similar models and comparing the representatives of these groups allows one to assess the ambiguity of SAS-driven 3D modelling results.

In the general case, the distances between the individual models/clusters (the clustering criterion) are expressed in terms of the normalized spatial discrepancy (NSD, see also Kozin & Svergun (2001) Automated matching of high- and low-resolution structural models. J. Appl. Cryst. 34, 33-41). For rigid body models (or arbitrary atomic structures with one-to-one correspondence), clustering can also be based on the RMSD between the atomic coordinates.
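As an illustration of the RMSD criterion, the following is a minimal Python sketch for two coordinate sets with one-to-one correspondence. The function name `rmsd` and the toy coordinates are assumptions made for this example only. The sketch does not superimpose the models first and is not taken from the DAMCLUST source.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Illustrative helper (not part of ATSAS): atom i of coords_a is assumed
    to correspond to atom i of coords_b, and no superposition is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy data: the second model is the first one shifted by 2 units along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(model_a, model_b))  # 2.0
```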

Running DAMCLUST

Usage:

$ damclust <INPUTFILE(S)> [OPTIONS]

Here, if a single INPUTFILE is provided, it is treated as a list of N PDB file names (one per line) to be analyzed; otherwise, the model file names are given directly on the command line. The OPTIONS for damclust are described in the following section.

Command-Line Arguments and Options

DAMCLUST recognizes the following command-line options.

Short option  Long option                  Description
-p            --prefix [PREFIX]            Prefix to prepend to output filenames. May include absolute or relative paths; all directory components must exist.
-a            --atomic                     Consider input files as atomic models, use RMSD instead of NSD.
-t            --type <TYPE>                Atomic types used for distance calculation, one of 'ALL' (default) or 'BACKBONE'.
-s            --symmetry=<PXY>             Symmetry, where PXY is one of: Pn (n=1, ..., 19), Pn2 (n=2, ..., 12); default: P1.
              --superposition=<S>          Either ALL (to superpose the whole molecule) or BACKBONE to consider just the backbone atoms (default: ALL).
-e            --enantiomorphs=<YES|NO>     Also search enantiomorphs, either YES or NO (default: YES).
-q            --quiet                      Suppress screen output. By default, the runtime information is printed to screen.
-v            --version                    Print version information and exit.
-h            --help                       Print a summary of arguments and options and exit.

Runtime Output

At runtime, the following information is typically printed to the screen unless the quiet option is specified:

 Number of files to cluster: ............................ : 11
 t02.pdb vs t01.pdb ..................................... : 1.097
 t03.pdb vs t01.pdb ..................................... : 1.139
 t03.pdb vs t02.pdb ..................................... : 1.117
 t04.pdb vs t01.pdb ..................................... : 1.208
 ...
 t11.pdb vs t07.pdb ..................................... : 1.151
 t11.pdb vs t08.pdb ..................................... : 1.432
 t11.pdb vs t09.pdb ..................................... : 1.121
 t11.pdb vs t10.pdb ..................................... : 1.219

  -- Clustering --
 Min average spread of non-isolated clusters ............ : 1.030
 Max average spread of non-isolated clusters ............ : 1.204

  -- Averaging --
 Read file .............................................. : t10.pdb
 Read file .............................................. : qwet04r.pdb
 Read file .............................................. : qwet07r.pdb
  -- Filtering --
 Read file .............................................. : qwet10-avr.pdb
 Number of atoms ........................................ : 959
 Number of phases ....................................... : 1
 Minimum number of contacts ............................. : 2
 Maximum number of contacts ............................. : 12
 Selected contact threshold ............................. : 4
 Atomic radius .......................................... : 2.500
 Excluded volume per atom ............................... : 88.45
 Maximum radius ......................................... : 64.92
 Average excluded volume ................................ : 0.0
 Selected cut-off volume ................................ : 4.241e+4
 Final contact threshold ................................ : 4
 Final cut-off volume ................................... : 4.241e+4
 Final number of atoms .................................. : 482
 Final volume ........................................... : 4.263e+4
 Wrote file ............................................. : qwet10-flt.pdb

  -- Averaging --
 Read file .............................................. : t02.pdb
 Read file .............................................. : qwet11r.pdb
 Read file .............................................. : qwet09r.pdb
 Read file .............................................. : qwet06r.pdb
 Read file .............................................. : qwet03r.pdb
 Read file .............................................. : qwet01r.pdb
 Read file .............................................. : qwet05r.pdb
  -- Filtering --
 Read file .............................................. : qwet02-avr.pdb
 Number of atoms ........................................ : 1146
 Number of phases ....................................... : 1
 Minimum number of contacts ............................. : 3
 Maximum number of contacts ............................. : 12
 Selected contact threshold ............................. : 5
 Atomic radius .......................................... : 2.750
 Excluded volume per atom ............................... : 117.7
 Maximum radius ......................................... : 66.18
 Average excluded volume ................................ : 0.0
 Selected cut-off volume ................................ : 6.745e+4
 Final contact threshold ................................ : 5
 Final cut-off volume ................................... : 6.745e+4
 Final number of atoms .................................. : 574
 Final volume ........................................... : 6.757e+4
 Wrote file ............................................. : qwet02-flt.pdb

As a first step, all the models are compared pairwise using either the NSD or the RMSD criterion. At the next stage, the actual clustering is done based on the obtained distances between the models. Finally, the models within each non-isolated cluster (i.e. a cluster containing more than one model) are averaged using the DAMAVER approach.
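The clustering stage can be pictured with a small Python sketch of agglomerative clustering on a table of pairwise distances. It assumes average linkage, where the distance between two clusters is the mean of all cross-cluster pairwise distances. That choice is consistent with the numbers in the example log of this manual, but it is an assumption about the implementation. The function names and the toy distances are hypothetical.

```python
from itertools import combinations

def average_linkage(dist, n):
    """Merge n single-model clusters step by step (illustrative sketch).

    dist maps frozenset({i, j}) to the distance between models i and j.
    Returns a list of (merge_distance, clusters_after_merge) tuples.
    """
    clusters = [[i] for i in range(n)]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [dist[frozenset((i, j))]
                     for i in clusters[a] for j in clusters[b]]
            d = sum(pairs) / len(pairs)  # average linkage (assumed)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        history.append((d, [sorted(c) for c in clusters]))
    return history

def spread(cluster, dist):
    """Mean pairwise distance within a cluster; 0.0 for an isolated model."""
    pairs = [dist[frozenset(p)] for p in combinations(cluster, 2)]
    return sum(pairs) / len(pairs) if pairs else 0.0

# Hypothetical distances between four models (not taken from a real run).
toy = {
    frozenset((0, 1)): 1.0, frozenset((2, 3)): 1.2,
    frozenset((0, 2)): 2.0, frozenset((0, 3)): 2.2,
    frozenset((1, 2)): 2.4, frozenset((1, 3)): 2.6,
}
history = average_linkage(toy, 4)
for merge_distance, clusters in history:
    print(round(merge_distance, 3), clusters)
```

In this toy case, models 0 and 1 merge first, then models 2 and 3, and the last step joins the two pairs.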

DAMCLUST Input Files

If a list file is given to DAMCLUST, it must contain the names of the PDB files, one per line:

filename1.pdb
filename2.pdb
[...]
filenameN.pdb

Alternatively, the file names of the models in PDB format can be given directly on the command line:

$ damclust filename1.pdb filename2.pdb [...] filenameN.pdb
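When there are many models, the list file can be generated rather than typed by hand. The following Python sketch writes one matching file name per line. The helper name `write_model_list` and the file name `models.txt` are assumptions made for this example.

```python
import glob

def write_model_list(pattern, list_path):
    """Write all file names matching pattern to list_path, one per line.

    Illustrative helper (not part of ATSAS). Names are sorted so that the
    order is reproducible. Returns the list of names written.
    """
    names = sorted(glob.glob(pattern))
    with open(list_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    return names

# Example call (hypothetical file names):
# write_model_list("t*.pdb", "models.txt")
```

The resulting file would then be passed as the single INPUTFILE, e.g. `damclust models.txt`.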

DAMCLUST Output Files

The application creates a set of output files, where each filename starts with a customizable prefix. If a particular prefix has been used before, existing files will be overwritten without further notification.

Extension             Description
damclust.log          Contains the information about the clustering.
<filenameI>r.pdb      A model filenameI.pdb superimposed with the representative (most typical model, filenameJ.pdb) of the corresponding cluster.
<filenameJ>-avr.pdb   Total occupied volume of the cluster.
<filenameJ>-flt.pdb   Average model of the cluster.
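The final cluster summary in the log file can be extracted with a few lines of Python. This sketch assumes the layout of the `Cluster` lines shown in the example log of this manual; other DAMCLUST versions may format them differently. The function name and dictionary keys are illustrative.

```python
import re

# Matches e.g. " Cluster  2 (representative, deviation): t10.pdb   1.079..."
# and " Cluster  1 (isolated): t08.pdb" (layout assumed from the example log).
CLUSTER_LINE = re.compile(
    r"^\s*Cluster\s+(\d+)\s+\((isolated|representative, deviation)\):"
    r"\s+(\S+)(?:\s+([-+0-9.eE]+))?\s*$"
)

def parse_clusters(log_text):
    """Return one dictionary per 'Cluster' summary line found in log_text."""
    clusters = []
    for line in log_text.splitlines():
        match = CLUSTER_LINE.match(line)
        if match:
            number, kind, name, deviation = match.groups()
            clusters.append({
                "cluster": int(number),
                "isolated": kind == "isolated",
                "model": name,
                "deviation": float(deviation) if deviation else None,
            })
    return clusters

# Summary lines copied from the example log in this manual.
SAMPLE = """\
 Cluster  1 (isolated): t08.pdb
 Cluster  2 (representative, deviation): t10.pdb   1.0790361484879400
 Cluster  3 (representative, deviation): t02.pdb   1.0905511694672707
"""
for entry in parse_clusters(SAMPLE):
    print(entry)
```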

Example

Cluster 11 ab initio models, t01.pdb, ..., t11.pdb. The files are available in the documentation directory of the ATSAS package.

$ damclust t*.pdb -p qwe -t backbone 
 ...

The program considers all the PDB files beginning with 't' in the working directory and uses the prefix 'qwe'. The superpositions are made using the backbone atoms only (CA or P). Three clusters will be found, based on t08.pdb (isolated), t02.pdb and t10.pdb. The content of the resulting qwedamclust.log looks like this:


Models to cluster:
 t01.pdb
 t02.pdb
 t03.pdb
 t04.pdb
 t05.pdb
 t06.pdb
 t07.pdb
 t08.pdb
 t09.pdb
 t10.pdb
 t11.pdb

 Step number ............................................ : 2
 Distance between merged clusters ....................... : 1.030
 Average dist within new cluster ........................ : 1.030
 Distance: Cluster members
   0.0000:  1
   0.0000:  2
   0.0000:  3
   0.0000:  5
   0.0000:  6
   0.0000:  7
   0.0000:  8
   0.0000:  9
   1.0301: 10  4
   0.0000: 11
 Average spread of non-isolated clusters ................ : 1.030

 Step number ............................................ : 3
 Distance between merged clusters ....................... : 1.036
 Average dist within new cluster ........................ : 1.036
 Distance: Cluster members
   0.0000:  1
   0.0000:  3
   0.0000:  5
   1.0364:  6  2
   0.0000:  7
   0.0000:  8
   0.0000:  9
   1.0301: 10  4
   0.0000: 11
 Average spread of non-isolated clusters ................ : 1.033

 Step number ............................................ : 4
 Distance between merged clusters ....................... : 1.080
 Average dist within new cluster ........................ : 1.065
 Distance: Cluster members
   0.0000:  1
   0.0000:  3
   0.0000:  5
   0.0000:  7
   0.0000:  8
   1.0654:  9  6  2
   1.0301: 10  4
   0.0000: 11
 Average spread of non-isolated clusters ................ : 1.048

 Step number ............................................ : 5
 Distance between merged clusters ....................... : 1.105
 Average dist within new cluster ........................ : 1.085
 Distance: Cluster members
   0.0000:  1
   0.0000:  5
   0.0000:  7
   0.0000:  8
   1.0852:  9  6  2  3
   1.0301: 10  4
   0.0000: 11
 Average spread of non-isolated clusters ................ : 1.058

 Step number ............................................ : 6
 Distance between merged clusters ....................... : 1.116
 Average dist within new cluster ........................ : 1.098
 Distance: Cluster members
   0.0000:  5
   0.0000:  7
   0.0000:  8
   1.0976:  9  6  2  3  1
   1.0301: 10  4
   0.0000: 11
 Average spread of non-isolated clusters ................ : 1.064

 Step number ............................................ : 7
 Distance between merged clusters ....................... : 1.132
 Average dist within new cluster ........................ : 1.098
 Distance: Cluster members
   0.0000:  5
   0.0000:  8
   1.0976:  9  6  2  3  1
   1.0977: 10  4  7
   0.0000: 11
 Average spread of non-isolated clusters ................ : 1.098

 Step number ............................................ : 8
 Distance between merged clusters ....................... : 1.159
 Average dist within new cluster ........................ : 1.118
 Distance: Cluster members
   0.0000:  5
   0.0000:  8
   1.0977: 10  4  7
   1.1182: 11  9  6  2  3  1
 Average spread of non-isolated clusters ................ : 1.108

 Step number ............................................ : 9
 Distance between merged clusters ....................... : 1.166
 Average dist within new cluster ........................ : 1.132
 Distance: Cluster members
   0.0000:  8
   1.0977: 10  4  7
   1.1318: 11  9  6  2  3  1  5
 Average spread of non-isolated clusters ................ : 1.115

 Step number ............................................ : 10
 Distance between merged clusters ....................... : 1.185
 Average dist within new cluster ........................ : 1.155
 Distance: Cluster members
   0.0000:  8
   1.1546: 11  9  6  2  3  1  5 10  4  7
 Average spread of non-isolated clusters ................ : 1.155

 Step number ............................................ : 11
 Distance between merged clusters ....................... : 1.428
 Average dist within new cluster ........................ : 1.204
 Distance: Cluster members
   1.2043: 11  9  6  2  3  1  5 10  4  7  8
 Average spread of non-isolated clusters ................ : 1.204

 #, Target:            2   11.000000000000000
 #, Target:            3   10.163438099352419
 #, Target:            4   9.9123836436851640
 #, Target:            5   9.4226964087284255
 #, Target:            6   8.7426749447360272
 #, Target:            7   9.4893537087629056
 #, Target:            8   9.0217354577394140
 #, Target:            9   8.3748909486974661
 #, Target:           10   9.4305501739640558
 #, Target:           11   11.000000000000000
 Best-Cut step .......................................... : 9

 Cluster  1 (isolated): t08.pdb
 Cluster  2 (representative, deviation): t10.pdb   1.0790361484879400
 Cluster  3 (representative, deviation): t02.pdb   1.0905511694672707

 Distances between the representatives
       (Cluster1, Cluster2, Distance):
           1           2   1.5301106568497673
           1           3   1.3252443394045939
           2           3   1.1638338063874381
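The `Target` values in the log above appear consistent with the penalty function of Kelley et al. (1996): the average spread is rescaled linearly to the range 1 to N-1 (N being the number of models) and the number of clusters at that step is added. This reading is inferred from the printed numbers and is not a statement about the DAMCLUST source code. The Python sketch below recomputes the penalty from the rounded spreads in the log; small deviations from the printed targets are expected because of that rounding. The function name is an assumption.

```python
def kelley_penalty(spreads, cluster_counts):
    """Penalty per step: rescaled average spread plus number of clusters.

    spreads and cluster_counts hold one value per clustering step; with N
    models there are N-1 steps. Assumed form, after Kelley et al. (1996).
    """
    n_models = len(spreads) + 1
    low, high = min(spreads), max(spreads)
    scale = (n_models - 2) / (high - low)
    return [scale * (s - low) + 1 + k
            for s, k in zip(spreads, cluster_counts)]

# Values read from the example log: steps 2..11, the "Average spread of
# non-isolated clusters" per step, and the number of clusters listed.
steps = list(range(2, 12))
spreads = [1.030, 1.033, 1.048, 1.058, 1.064,
           1.098, 1.108, 1.115, 1.155, 1.204]
cluster_counts = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

penalties = kelley_penalty(spreads, cluster_counts)
best_step = steps[penalties.index(min(penalties))]
for step, value in zip(steps, penalties):
    print(step, round(value, 3))
print("step with the lowest penalty:", best_step)
```

With these inputs the lowest penalty falls on step 9, which agrees with the `Best-Cut step` reported in the log.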


  Last modified: July 18, 2017

© BioSAXS group 2017