This manual describes DAMCLUST, a program for clustering multiple models.
It explains how to run the program and describes its input and
output files.
DAMCLUST is used for post-processing of SAS-based models. It clusters
the models obtained from multiple ab initio low-resolution reconstructions
(e.g. by DAMMIN and/or GASBOR) or from rigid body modelling
(e.g. by SASREF, BUNCH or CORAL).
The clustering algorithm is based on the approach described in Kelley, L. A., Gardner, S. P.
& Sutcliffe, M. J. (1996). Protein Eng. 9, 1063-1065.
Clustering the models into groups of similar models and comparing the group representatives
allows one to assess the ambiguity of SAS-driven 3D modelling results.
If a single INPUTFILE is provided, it is treated as a list
of N PDB file names (one per line) to be analyzed; otherwise,
the model file names are given directly on the command line.
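As an illustration of the list-file form, the following Python sketch writes one PDB file name per line and assembles the matching command line. The list file name `models.txt` and the helper `write_model_list` are hypothetical, chosen only for this example, and the command is built but not executed.

```python
import tempfile
from pathlib import Path

def write_model_list(model_names, list_path):
    """Write one PDB file name per line and return the damclust argument list."""
    path = Path(list_path)
    path.write_text("".join(f"{name}\n" for name in model_names))
    # A single input file is interpreted as a list of model file names.
    return ["damclust", str(path)]

# Demonstration in a temporary directory (hypothetical file name models.txt).
with tempfile.TemporaryDirectory() as tmp:
    names = [f"t{i:02d}.pdb" for i in range(1, 12)]  # t01.pdb ... t11.pdb
    argv = write_model_list(names, Path(tmp) / "models.txt")
    print(len(Path(argv[1]).read_text().splitlines()), "model names written")
```

Passing the file names directly on the command line is equivalent; the list file is mainly convenient when the set of models is long or generated by another script.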
OPTIONS for damclust are described in the following section.
At run time, the following information is typically printed to the screen
unless the quiet option is specified:
Number of files to cluster: ............................ : 11
t02.pdb vs t01.pdb ..................................... : 1.097
t03.pdb vs t01.pdb ..................................... : 1.139
t03.pdb vs t02.pdb ..................................... : 1.117
t04.pdb vs t01.pdb ..................................... : 1.208
...
t11.pdb vs t07.pdb ..................................... : 1.151
t11.pdb vs t08.pdb ..................................... : 1.432
t11.pdb vs t09.pdb ..................................... : 1.121
t11.pdb vs t10.pdb ..................................... : 1.219
-- Clustering --
Min average spread of non-isolated clusters ............ : 1.030
Max average spread of non-isolated clusters ............ : 1.204
-- Averaging --
Read file .............................................. : t10.pdb
Read file .............................................. : qwet04r.pdb
Read file .............................................. : qwet07r.pdb
-- Filtering --
Read file .............................................. : qwet10-avr.pdb
Number of atoms ........................................ : 959
Number of phases ....................................... : 1
Minimum number of contacts ............................. : 2
Maximum number of contacts ............................. : 12
Selected contact threshold ............................. : 4
Atomic radius .......................................... : 2.500
Excluded volume per atom ............................... : 88.45
Maximum radius ......................................... : 64.92
Average excluded volume ................................ : 0.0
Selected cut-off volume ................................ : 4.241e+4
Final contact threshold ................................ : 4
Final cut-off volume ................................... : 4.241e+4
Final number of atoms .................................. : 482
Final volume ........................................... : 4.263e+4
Wrote file ............................................. : qwet10-flt.pdb
-- Averaging --
Read file .............................................. : t02.pdb
Read file .............................................. : qwet11r.pdb
Read file .............................................. : qwet09r.pdb
Read file .............................................. : qwet06r.pdb
Read file .............................................. : qwet03r.pdb
Read file .............................................. : qwet01r.pdb
Read file .............................................. : qwet05r.pdb
-- Filtering --
Read file .............................................. : qwet02-avr.pdb
Number of atoms ........................................ : 1146
Number of phases ....................................... : 1
Minimum number of contacts ............................. : 3
Maximum number of contacts ............................. : 12
Selected contact threshold ............................. : 5
Atomic radius .......................................... : 2.750
Excluded volume per atom ............................... : 117.7
Maximum radius ......................................... : 66.18
Average excluded volume ................................ : 0.0
Selected cut-off volume ................................ : 6.745e+4
Final contact threshold ................................ : 5
Final cut-off volume ................................... : 6.745e+4
Final number of atoms .................................. : 574
Final volume ........................................... : 6.757e+4
Wrote file ............................................. : qwet02-flt.pdb
As a first step, all the models are compared pairwise using either
the NSD or the RMSD criterion. At the next stage,
the actual clustering is done based on the obtained distances between
the models. Finally, the models within each non-isolated cluster
(i.e. a cluster containing more than one model) are averaged using
the DAMAVER approach.
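The clustering stage can be illustrated with a minimal Python sketch of average-linkage agglomerative clustering over a table of pairwise distances. This is an illustration under stated assumptions, not DAMCLUST's actual implementation: the function names are hypothetical, and the three distances are the t01/t02/t03 values from the screen output above, so the result does not reproduce the full 11-model run.

```python
from itertools import combinations

def mean_between(a, b, dist):
    """Average distance over all cross pairs of clusters a and b."""
    return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

def mean_within(cluster, dist):
    """Average pairwise distance inside one cluster (0.0 if isolated)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

def average_linkage(models, dist):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[m] for m in models]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: mean_between(pair[0], pair[1], dist))
        merged = a + b
        history.append((mean_between(a, b, dist), mean_within(merged, dist), merged))
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
    return history

# Pairwise distances taken from the screen output (three models only).
dist = {
    frozenset(("t01", "t02")): 1.097,
    frozenset(("t01", "t03")): 1.139,
    frozenset(("t02", "t03")): 1.117,
}
for merge_dist, spread, members in average_linkage(["t01", "t02", "t03"], dist):
    print(f"merged at {merge_dist:.3f}, average distance within {spread:.3f}: {members}")
```

Each history entry mirrors the two quantities reported per step in the log file: the distance between the merged clusters and the average distance within the new cluster.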
The application creates a set of output files, where each filename starts
with a customizable prefix. If a particular prefix
has been used before, existing files will be overwritten without further
notification.
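Because existing files are overwritten silently, it can be worth checking for earlier output before reusing a prefix. The following Python sketch assumes only that output file names start with the prefix, as stated above; the helper name `existing_outputs` is hypothetical, and the check also reports unrelated files that happen to share the prefix.

```python
import glob
import os

def existing_outputs(prefix, directory="."):
    """Return sorted paths in `directory` whose file names start with `prefix`."""
    pattern = os.path.join(glob.escape(directory), glob.escape(prefix) + "*")
    return sorted(glob.glob(pattern))

clashes = existing_outputs("qwe")
if clashes:
    print("Files that a new run with prefix 'qwe' may overwrite:")
    for path in clashes:
        print("  " + path)
```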
Cluster 11 ab initio models, t01.pdb, ..., t11.pdb.
The files are available in the documentation directory of the ATSAS package.
$ damclust t*.pdb -p qwe -t backbone
...
The program considers all PDB files in the working directory whose names
begin with 't' and uses the prefix 'qwe'. The superpositions are made using
the backbone atoms only (CA or P). Three clusters are found, with
representatives t08.pdb (isolated), t02.pdb and t10.pdb.
The content of the resulting qwedamclust.log looks like this:
Models to cluster:
t01.pdb
t02.pdb
t03.pdb
t04.pdb
t05.pdb
t06.pdb
t07.pdb
t08.pdb
t09.pdb
t10.pdb
t11.pdb
Step number ............................................ : 2
Distance between merged clusters ....................... : 1.030
Average dist within new cluster ........................ : 1.030
Distance: Cluster members
0.0000: 1
0.0000: 2
0.0000: 3
0.0000: 5
0.0000: 6
0.0000: 7
0.0000: 8
0.0000: 9
1.0301: 10 4
0.0000: 11
Average spread of non-isolated clusters ................ : 1.030
Step number ............................................ : 3
Distance between merged clusters ....................... : 1.036
Average dist within new cluster ........................ : 1.036
Distance: Cluster members
0.0000: 1
0.0000: 3
0.0000: 5
1.0364: 6 2
0.0000: 7
0.0000: 8
0.0000: 9
1.0301: 10 4
0.0000: 11
Average spread of non-isolated clusters ................ : 1.033
Step number ............................................ : 4
Distance between merged clusters ....................... : 1.080
Average dist within new cluster ........................ : 1.065
Distance: Cluster members
0.0000: 1
0.0000: 3
0.0000: 5
0.0000: 7
0.0000: 8
1.0654: 9 6 2
1.0301: 10 4
0.0000: 11
Average spread of non-isolated clusters ................ : 1.048
Step number ............................................ : 5
Distance between merged clusters ....................... : 1.105
Average dist within new cluster ........................ : 1.085
Distance: Cluster members
0.0000: 1
0.0000: 5
0.0000: 7
0.0000: 8
1.0852: 9 6 2 3
1.0301: 10 4
0.0000: 11
Average spread of non-isolated clusters ................ : 1.058
Step number ............................................ : 6
Distance between merged clusters ....................... : 1.116
Average dist within new cluster ........................ : 1.098
Distance: Cluster members
0.0000: 5
0.0000: 7
0.0000: 8
1.0976: 9 6 2 3 1
1.0301: 10 4
0.0000: 11
Average spread of non-isolated clusters ................ : 1.064
Step number ............................................ : 7
Distance between merged clusters ....................... : 1.132
Average dist within new cluster ........................ : 1.098
Distance: Cluster members
0.0000: 5
0.0000: 8
1.0976: 9 6 2 3 1
1.0977: 10 4 7
0.0000: 11
Average spread of non-isolated clusters ................ : 1.098
Step number ............................................ : 8
Distance between merged clusters ....................... : 1.159
Average dist within new cluster ........................ : 1.118
Distance: Cluster members
0.0000: 5
0.0000: 8
1.0977: 10 4 7
1.1182: 11 9 6 2 3 1
Average spread of non-isolated clusters ................ : 1.108
Step number ............................................ : 9
Distance between merged clusters ....................... : 1.166
Average dist within new cluster ........................ : 1.132
Distance: Cluster members
0.0000: 8
1.0977: 10 4 7
1.1318: 11 9 6 2 3 1 5
Average spread of non-isolated clusters ................ : 1.115
Step number ............................................ : 10
Distance between merged clusters ....................... : 1.185
Average dist within new cluster ........................ : 1.155
Distance: Cluster members
0.0000: 8
1.1546: 11 9 6 2 3 1 5 10 4 7
Average spread of non-isolated clusters ................ : 1.155
Step number ............................................ : 11
Distance between merged clusters ....................... : 1.428
Average dist within new cluster ........................ : 1.204
Distance: Cluster members
1.2043: 11 9 6 2 3 1 5 10 4 7 8
Average spread of non-isolated clusters ................ : 1.204
#, Target: 2 11.000000000000000
#, Target: 3 10.163438099352419
#, Target: 4 9.9123836436851640
#, Target: 5 9.4226964087284255
#, Target: 6 8.7426749447360272
#, Target: 7 9.4893537087629056
#, Target: 8 9.0217354577394140
#, Target: 9 8.3748909486974661
#, Target: 10 9.4305501739640558
#, Target: 11 11.000000000000000
Best-Cut step .......................................... : 9
Cluster 1 (isolated): t08.pdb
Cluster 2 (representative, deviation): t10.pdb 1.0790361484879400
Cluster 3 (representative, deviation): t02.pdb 1.0905511694672707
Distances between the representatives
(Cluster1, Cluster2, Distance):
1 2 1.5301106568497673
1 3 1.3252443394045939
2 3 1.1638338063874381
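The `Target` values and the `Best-Cut step` in the log above appear consistent with the penalty function of Kelley et al. (1996), in which the average spread is rescaled to the range 1 to N-1 and added to the number of clusters remaining at that step. The Python sketch below is a reconstruction under that assumption, not DAMCLUST's own code. It uses the rounded within-cluster distances printed in the log, so its values agree with the logged ones only to about two decimal places. The names `log_spreads` and `kelley_penalty` are hypothetical.

```python
# step -> average within-cluster distances of the non-isolated clusters,
# copied from the "Distance: Cluster members" tables in the log
log_spreads = {
    2: [1.0301],
    3: [1.0364, 1.0301],
    4: [1.0654, 1.0301],
    5: [1.0852, 1.0301],
    6: [1.0976, 1.0301],
    7: [1.0976, 1.0977],
    8: [1.0977, 1.1182],
    9: [1.0977, 1.1318],
    10: [1.1546],
    11: [1.2043],
}
N_MODELS = 11

def kelley_penalty(spreads_by_step, n_models):
    """Penalty per step: normalized average spread plus number of clusters."""
    avg = {step: sum(v) / len(v) for step, v in spreads_by_step.items()}
    lo, hi = min(avg.values()), max(avg.values())
    penalty = {}
    for step, value in avg.items():
        # Rescale the average spread linearly to the range 1 .. n_models - 1.
        normalized = (n_models - 2) * (value - lo) / (hi - lo) + 1
        # In this log the first merge is step 2, which leaves n_models - 1 clusters.
        n_clusters = n_models - step + 1
        penalty[step] = normalized + n_clusters
    return penalty

penalty = kelley_penalty(log_spreads, N_MODELS)
best_step = min(penalty, key=penalty.get)
for step in sorted(penalty):
    print(f"step {step:2d}: penalty {penalty[step]:.3f}")
print("step with the lowest penalty:", best_step)
```

Under these assumptions the lowest penalty falls on step 9, which leaves three clusters and matches the `Best-Cut step` reported in the log.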