Determination of optimal X-ray data collection strategy for protein crystals

User's manual for Version 3.1 /29.01.2007/






BEST is a program for optimal planning of X-ray data collection from protein crystals. The method is based on modelling the statistical characteristics of the data yet to be collected, using information derived from a few initial images. Diffraction anisotropy, crystal radiation damage, anomalous scattering, geometrical restrictions (e.g. spot overlapping) and hardware limitations are taken into account. The functions of the program may be summarised as follows:

Before run

One or (better) two initial images, taken 90 degrees apart, have to be measured and processed with XDS, DENZO or MOSFLM.

How to run

BEST can be run from the CCP4I interface: choose the module "Data Collection".
BEST can also be run from the command line.


All BEST predictions are limited by the resolution of the initial images. BEST assumes that the exposed volume of the crystal does not vary strongly with the spindle axis position. The program also assumes that the initial images were processed correctly, that the space group is known and that the crystal mosaicity is estimated properly. The detector is considered to have a circular area with the direct beam projecting onto the centre. No detector offsets, linear or 2-theta, are supported. The detector must be described in the detector configuration file (see Configuring detectors).

Author information

Users are welcome to report any bugs or suggested changes to the authors:

Alexander N. Popov
EMBL Hamburg Outstation,
c/o DESY, Notkestrasse 85,
22603 Hamburg, Germany
Tel. +49-40-89902-183, Fax +49-40-89902-149

Gleb P. Bourenkov
EMBL Hamburg Outstation,
c/o DESY, Notkestrasse 85,
22603 Hamburg, Germany
Tel. +49-40-89902-120, Fax +49-40-89902-149


If you use BEST please refer to one of these papers:
A.N. Popov and G.P. Bourenkov "Choice of data-collection parameters based on statistic modelling" Acta Cryst. (2003). D59, 1145-1153
G.P. Bourenkov and A.N. Popov "A quantitative approach to data-collection strategies" Acta Cryst. (2006). D62, 58-64


Configuring best

The latest version of BEST can be downloaded from here and installed following the instructions therein.
The following environment is compulsory:
The following environment variables are optional and only useful when BEST is used with data processed by HKL/DENZO:

Configuring detectors

The file $besthome/detector-inf.dat contains specific information on the area detectors used by BEST. The distributed file contains pre-configured examples for the most commonly used area detectors (ADSC, MAR, RAXIS). Nevertheless, the exact values of the parameters (such as the exact pixel size) should be re-defined for each particular physical detector.

plotmtv program

BEST generates some useful graphs which can be displayed using the plotmtv program.
plotmtv is a multipurpose X11 plotting program, freely available on the net for all X11-capable platforms.
Provided plotmtv is in the path, BEST can invoke it automatically (see Show Graphs).

CCP4I interface to BEST

Run BEST to: Optimize data collection or Estimate data statistics
Optimize data collection
Supplies the optimal plan of data collection and statistical predictions according to the plan.
Estimate data statistics
Estimates data statistics according to a set of data collection parameters defined by the user.
Show Graphs
The graphs generated by BEST will be displayed (requires plotmtv).

Input data

Input from XDS or HKL/DENZO or MOSFLM
Change Symmetry
BEST can optionally calculate the strategy in a symmetry different from the space group used in processing the reference data. Note that this is only possible if the unit cell parameters (exactly, in a standard setting) and the lattice type permit this operation. E.g., I432 can be reduced to I23, I422, I4 or I222 (but not to P2 or P1); P3 can be increased to P321, P312, P6 or P622, or reduced to P1, but not to P2. Otherwise, please re-index. Note that, e.g., HKL2000 always writes the lowest possible space group of a crystal system to the .x file.

Reference image parameters
ESRF .xml in
The reference image parameters can optionally be loaded from the xml file. The following tags are resolved:
Detector
Name of the detector in use. In addition to the type of detector, the name encodes unique properties of each device and/or data format (see the detector-inf.dat file).
Exposure time
Exposure time of initial image(s) in seconds.
Pre-set counts
If exposure is timed by counts ("dose mode" in some data collection software), the pre-set counts (kHz-sec) should be specified in addition to equivalent time in seconds. In the output data collection plan the exposure time per image will be replaced by the pre-set counts per image.
If the 'multi-read' (or 'dezingering') option was used for measuring the image(s), give the number of detector read-outs.

Radiation damage parameters

BEST uses a simple parametric model to describe the variation of diffraction intensity as a function of X-ray dose. The model takes into account both overall decay, i.e. the resolution-dependent falloff in <I>(D) as a function of X-ray dose D, and radiation-induced non-isomorphism, i.e. the increase in <|I(D1)-I(D2)|> for symmetry-equivalent reflections measured after different cumulative doses D1 and D2. If radiation damage correction is toggled on, BEST will calculate the strategy in such a way that the scan speed is steadily increased in order to compensate for the falloff in <I>(D) and make <I/SigI>(D) in the last resolution shell approximately constant with D. For a given rotation range (e.g. one providing a complete data set), such compensation is only possible down to a certain resolution limit and <I/SigI> value. This method necessarily requires collecting consecutive wedges of data with different exposure times, so the resulting data collection plans are complicated (see Output plan parameters). Radiation-induced non-isomorphism is taken into account in calculating predicted statistical characteristics of the data such as Chi2, Rmerge and Ranom (see Program output).
This option enables dose rate calculations using RADDOSE (courtesy of E. Garman and R. Ravelli). It requires the crystal composition, the photon flux, and the beam and crystal dimensions to be known. Fill in the (self-explanatory) required data fields and press the "Run RADDOSE" button. The Dose rate and Shape Factor fields will be filled in automatically.
Note: The complete log file of the last RADDOSE run will appear under the CCP4I "View files from job" pull-down menu only after the BEST job has been started.

Dose rate
X-ray dose rate in Gray/second. This is defined by the photon flux density and crystal composition.
Shape factor
Equals 1 if the crystal is smaller than the beam, or equals the ratio of the crystal size to the beam size (in the direction normal to the rotation axis) if the beam is smaller than the crystal.
The default value of 1 means that the average decay rate, as determined empirically on a fairly large number of structures, will be applied. Increase it (up to 2) if you know that your samples are significantly more sensitive to radiation than others, or reduce it (down to 0.5) for radiation-hard crystals (like trypsin).
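
The shape factor rule stated above is simple enough to write down. The following is only a minimal sketch of that rule, not code taken from BEST or RADDOSE; the function name and argument names are hypothetical, and both sizes are assumed to be given in the same units, measured normal to the rotation axis.

```python
def shape_factor(crystal_size: float, beam_size: float) -> float:
    """Shape factor as described in the text (hypothetical helper).

    Both sizes are measured in the direction normal to the rotation
    axis and must use the same units (e.g. mm).
    """
    if crystal_size <= 0 or beam_size <= 0:
        raise ValueError("sizes must be positive")
    if crystal_size <= beam_size:
        # Crystal is fully bathed in the beam.
        return 1.0
    # Beam is smaller than the crystal: ratio of crystal size to beam size.
    return crystal_size / beam_size
```

For example, a 0.2 mm crystal in a 0.1 mm beam gives a shape factor of 2, while a 0.05 mm crystal in the same beam gives 1.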

Optimize data collection

Major optimization parameters

Target <I/SigI> in the last shell.
This is compulsory input. The strategy will be calculated so as to provide the required <I/SigI> (for merged data) in the last resolution shell. Choose a value of e.g. 2 for collecting high resolution data (to be used for structure refinement). A higher value, e.g. 10 to 20, may be required for anomalous scattering phasing, depending on the strength of the anomalous signal etc. There is a special option max for the target <I/SigI> (see the note on maximizing <I/SigI> below). The Anomalous data button toggles the Friedel law on/off in all calculations (data completeness and redundancy) as well as the calculation of the (special) Ranom merging statistic (see the Output section).
Maximum resolution (Å)
This is compulsory input. It defaults to dminref, the resolution limit at the edge of the detector for the reference frames. No calculations will be done at a resolution higher than dminref. If a value smaller than dminref is input, BEST will reset it to dminref. Changing this value to anything > dminref will restrict the strategy calculations to a resolution not higher than the input value, i.e. the output strategy will be calculated for the requested resolution if this can be reached at all (subject to radiation damage, the total exposure time / measurement time limit, and/or the dynamic range of the detector). If the requested <I/SigI> cannot be achieved at the resolution limit specified, BEST will define the resolution limit automatically. In any case, BEST will report in the log file how the resolution limit has been chosen.
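
To make the relation between the input value and dminref concrete, here is a small sketch. It assumes the geometry stated earlier in this manual (circular detector, direct beam at the centre, no offsets) and uses Bragg's law to estimate the resolution at the detector edge; the function names are hypothetical and this is not necessarily how BEST itself derives dminref.

```python
import math


def edge_resolution(wavelength: float, distance: float, radius: float) -> float:
    """Resolution (Angstrom) at the edge of a centred circular detector.

    wavelength in Angstrom; distance and radius in the same length unit.
    Uses Bragg's law: d = lambda / (2 sin(theta)), with tan(2 theta) = radius / distance.
    """
    two_theta = math.atan(radius / distance)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


def effective_max_resolution(requested: float, dmin_ref: float) -> float:
    """Requests beyond the reference-image limit are reset to dmin_ref.

    'Higher resolution' means a smaller d value, so the larger of the
    two numbers is the limit that is actually used.
    """
    return max(requested, dmin_ref)
```

With a reference limit of 2.0 Å, a request for 1.5 Å is reset to 2.0 Å, whereas a request for 2.5 Å is kept as it is.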
Minimize exposure/measurement time, and limit it to {value} {units}
Choosing exposure makes BEST minimize the total exposure time; choosing measurement makes it minimize the total measurement time (i.e. including the detector readout + overhead time). The input time limit will affect the choice of the resolution limit. With radiation damage correction enabled, the minimization option should stay on exposure, and usually there is no need to input a time limit: BEST will automatically find the limiting exposure time that does not degrade the data quality. You may still specify a time limit if necessary. If radiation damage correction is disabled and no time limit is given, BEST will calculate the strategy for the resolution limit specified regardless of how long the data collection will take (not recommended). If a time limit is given, BEST will determine the resolution limit according to it (only if data collection with the requested <I/SigI> down to dminref cannot be done faster, of course).
Note on maximizing <I/SigI>: Typing max in the Target <I/SigI> field will switch on the optimization of <I/SigI> at the maximum resolution, subject to radiation damage (when specified), time limits (when specified) or the dynamic range of the detector. This may result in data collection strategies that are not really useful, e.g. a ridiculously low <I/SigI> at the resolution limit, or extreme overload problems at low resolution. Please analyze the output carefully before using strategies created in this way!

Rotation range parameters

By default, the optimal (with respect to spot overlaps and data statistics) total rotation range is chosen automatically to provide not less than 99% data completeness. If increasing the redundancy is necessary to achieve the requested <I/SigI>, or if it is more efficient than increasing the exposure time, BEST will extend the rotation range automatically. You may affect this behaviour by specifying the target completeness or redundancy, or by specifying explicitly the rotation range you want to measure. In the latter case, give the starting angle and the rotation range (not the final angle!).
Notes on rotation range:
1) BEST counts completeness not as a fraction of all unique reflections, but as a fraction of the unique reflections that lie outside the blind region.
2) BEST optimizes the data multiplicity (or redundancy) for the highest resolution shells only. This may not necessarily be optimal for obtaining very accurate low-resolution data. If you want to collect data in one go both for phasing at lower resolution and for structure refinement at high resolution (not recommended!), please request high redundancy explicitly.
3) Increasing the redundancy just to have it slightly higher may be severely counter-productive, e.g. it may force the data collection into orientations with heavy overlaps. The plots that BEST produces show whether this is the case.
4) If a redundancy lower than optimal is requested, BEST will ignore the request. This can be overcome by specifying the rotation range explicitly.
Minimum rotation range/frame {number} This sets the lower limit BEST will use. Increasing this value results in fewer frames to collect, but it may lead to overlap problems and degrade the data resolution.

Output plan parameters

Complexity level of data collection strategy {single line | few lines | complicated}
With the complicated option chosen, BEST calculates the optimal rotation range/frame and exposure time in small wedges in such a way that the average signal-to-noise ratio stays constant over all the frames in the data set. This is the only truly correct way of using BEST, especially when the radiation damage correction is enabled. The resulting data collection plan may consist of many wedges, so that collecting the data manually via a GUI (and processing them as well) may become painful. Switching to few lines simplifies the plan so that it can be typed into a data collection GUI without strongly sacrificing the data quality. The single line option simplifies the plan down to a single line. This may still be reasonable, but only for low-dose data collection with radiation damage correction disabled.
Save strategy to file
The strategy may optionally be stored in a separate ASCII file. This is meant to be loadable (via an appropriate filter) into automatic data collection and/or processing software. It can also be loaded into BEST (in Estimate Data Statistics mode) in order to compare the predictions with the results of data processing later on.

Rotation speed/exposure time limitations

Maximum scan speed
Minimum exposure time/frame
When the required exposures are too short or the scan speed is too fast for the hardware, BEST will calculate an appropriate transmission factor for the attenuator. The transmission is relative to the conditions used for measuring the reference frames.
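
The idea behind the transmission factor can be illustrated with a short sketch. It rests on an assumption that is not stated in this manual: that the recorded signal scales with the product of transmission and exposure time, so a frame stretched to the shortest time the hardware allows needs the beam attenuated by the same ratio. The function and parameter names are hypothetical and BEST's own calculation may differ.

```python
def attenuate_for_limits(exposure_s: float, width_deg: float,
                         min_exposure_s: float, max_speed_deg_s: float):
    """Return (exposure time to use, transmission factor) for one frame.

    ASSUMPTION: signal is proportional to transmission * exposure time.
    The transmission is relative to the reference-frame conditions.
    """
    # Shortest exposure the hardware allows for this frame width:
    # limited both by the minimum exposure time and by the maximum scan speed.
    t_allowed = max(min_exposure_s, width_deg / max_speed_deg_s)
    if exposure_s >= t_allowed:
        return exposure_s, 1.0  # no attenuation needed
    # Stretch the exposure to the allowed minimum and attenuate to compensate.
    return t_allowed, exposure_s / t_allowed
```

For instance, a planned 0.05 s exposure over 1 degree, with a 0.1 s minimum exposure and a 5 deg/s maximum speed, becomes a 0.2 s exposure at a transmission of about 0.25.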

Estimate data statistics

This will calculate predicted data statistics on the basis of the reference data, the radiation damage parameters (if enabled), and the data collection parameters specified in the table. The table can be filled from the file "bst input" and/or edited manually. The Reset button clears the table. Successive wedges must be continuous in Phi (scan axis). Pressing one of the Start Phi, #frames or Width buttons will modify the corresponding column to make the total range continuous.
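
The continuity requirement can be checked, or enforced by adjusting the start angles, as in the following sketch. It only mirrors what the text says about the Start Phi button; the representation of a wedge as a (start Phi, number of frames, width) tuple and the function names are assumptions made for illustration.

```python
def is_continuous(wedges, tol=1e-6):
    """True if each wedge starts where the previous one ends.

    wedges: list of (start_phi_deg, n_frames, width_deg) tuples.
    """
    for (s0, n0, w0), (s1, _, _) in zip(wedges, wedges[1:]):
        if abs((s0 + n0 * w0) - s1) > tol:
            return False
    return True


def make_continuous(wedges):
    """Recompute the start Phi of every wedge after the first one."""
    result = []
    next_start = None
    for start, n_frames, width in wedges:
        if next_start is not None:
            start = next_start
        result.append((start, n_frames, width))
        next_start = start + n_frames * width
    return result
```

Given the wedges (60.0, 10, 0.5) and (70.0, 20, 1.0), the second start angle would be moved to 65.0 so that the total range becomes continuous.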

Program output

When using the CCP4I interface, use View Files from Job -> View Log File to access the output of BEST (log file). Further output files assigned to the BEST CCP4I job after successful completion are {job#}_best.mtv (plotmtv file, see Graphs) and, optionally, #_raddose.out (if RADDOSE has been used) and an ASCII strategy output file.

Log file

The data collection strategy and the expected data statistics are summarized in the log file.
First, the program prints warnings about the quality of the input data, e.g. if there are doubts about the consistency of the input data, or if the amount and quality of the input data are not sufficient for accurate strategy determination. If the data do not permit determination of an anisotropic B-factor, BEST will continue using isotropic scaling, but will suggest a spindle position where additional reference image(s) should be collected. Using isotropic approximations adversely affects both the quality of the predictions and the quality of the resultant data. Persistence of the warnings despite adding the recommended data usually indicates either inconsistent indexing between different wedges of reference data, or more severe problems with the data, such as crystal miscentering or extreme anisotropy. The warnings should be taken into account.

Plan of data collection

First, BEST reports how the resolution limit for data collection has been chosen. There are the following possibilities:
1) "by the initial image resolution"
- this indicates that the crystal may diffract to higher resolution. It may be worth re-collecting the reference images using a shorter detector distance and re-calculating the strategy. BEST generates a plot of the total exposure/data collection time required for the requested statistics versus resolution, with an exponential extrapolation. This can be used to estimate how far the crystal may diffract, and to choose the resolution limit for re-characterizing the crystal.
2) "by the detector dynamical range"
- it is impossible to collect any data at higher resolution due to detector overload by the background intensity.
3) "by the radiation damage"
- the data can be collected with the requested statistics only up to this resolution due to radiation damage; the resolution is lower than that requested by the user.
4) "according to user's choice"
5) "according to the given max. time"
- data collection at higher resolution would require a longer total exposure/measurement time than that input by the user.

Attenuation = Factor
The data collection plan below assumes that the incident X-ray beam will be attenuated by a Factor relative to the intensity used for measuring the reference frames, in order to fulfil the rotation speed/exposure time limitations.

The plan of data collection is presented in the form of a table. Each line corresponds to a wedge of the data and contains all necessary parameters: starting Phi, number of frames, oscillation width, exposure time, and the detector distance. The last column contains "Yes" or "No" to label the wedges where spatial overlap of reflections will occur. The number of wedges (single, few or many) is controlled by the 'complexity level of data collection strategy' parameter or the '-e none/mean/full' command line argument.
The wedges form a continuous rotation range, i.e. the starting Phi of every consecutive wedge equals exactly the end of the previous one, e.g.:

The following table contains overall plan parameters and basic overall/outer shell data statistics, e.g.:
Resolution limit:             2.17 Angstrom
Anomalous data:               No
Phi_start - Phi_finish:       60.00 - 135.45
Total rotation range:         75.45 degree
Total N. of images:           136
Overall Completeness:         98.4%
Redundancy:                   3.03
R-factor (outer shell):       6.6% (43.5%)
I/Sigma (outer shell):        21.0 (2.6)
Rel. decrease of intensity:   0.243 for outer resolution shell
Total Exposure time:          620.3 sec (0.172 hour)
Total Data Collection time:   960.3 sec (0.267 hour)

The following table, "Data collection statistics according to the plan", presents the estimated data statistics in resolution shells:
Resolution     Compl.  Average            I/Sigma  I/Sigma  Chi**2  R-fact  Overload
Lower  Upper   %       Intensity  Sigma            /Chi             %       %
All data       98.4    1219.7     58.

Columns contents:
"Average intensity"
average intensity (LP-corrected) on a scale corresponding to the first wedge in the data collection plan. An integration software-dependent factor is applied, such that the scale approximately matches that of the output of the corresponding scaling software (XSCALE/SCALA/SCALEPACK).
"Average sigma"
expected value of the standard uncertainty in the merged data, i.e. <SigI> = < ( Sum( SigI_hkl^-2 ) )^-1/2 >. Here SigI_hkl is the expected standard deviation of an individual intensity observation I_hkl, the summation runs over the symmetry-equivalent reflections, and the averaging is done in resolution shells. Average sigma includes no contribution induced by radiation damage.
"I/Sigma/Chi"
approximates I/SigI in the merged data after correcting the sigmas for the variation arising from radiation damage (e.g. by Chi**2 or Normal Probability Analysis).
"Chi**2"
expected value of a Chi**2 test applied when merging the data, Chi**2 = < Var(I_hkl - <I_hkl>) / SigI_hkl^2 >. Var(I_hkl - <I_hkl>) is the expected apparent variance of equivalent observations about a "unique" hkl intensity produced by merging. Chi**2 = 1 when radiation damage correction is disabled, and the column is omitted from the table in this case.
"R-fact"
expectation value of R-merge, i.e. the expected absolute deviation <|I_hkl - <I>_hkl|> / <I_hkl>. This includes contributions of both measurement errors and radiation damage-induced variance.
"Ranom"
characterizes the expected noise in the merged anomalous difference data, Ranom = |<I>_hkl - <I>_-h-k-l| / ( <I>_hkl + <I>_-h-k-l ). This is calculated under the assumption that there is no anomalous scattering in the data, i.e. the measurement errors and radiation-induced errors are the only source of apparent differences between I_hkl and I_-h-k-l. Ranom, rather than Rmerge, should be used for comparison with estimates of the expected anomalous signal (as offered, e.g., by a number of web services).
"Overload %"
expected percentage of overloaded reflections. If the expected percentage of overloads exceeds 5% at any resolution, BEST will propose a separate plan for a low-resolution pass at an appropriate resolution, with short exposure times and/or attenuation.
NOTE: It is possible that the plan for a low-resolution pass is generated despite the fact that all values in the "Overload" column for the high-resolution pass are below 5%. This means that a value > 5% is expected for a resolution shell thinner than the binning used in the table; a low-resolution pass is required.
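
As an illustration of the definitions above, the sketch below evaluates the merged sigma and an R-merge value for measured symmetry-equivalent observations. It is not part of BEST (which predicts these quantities before the data exist); the function names are hypothetical, and an unweighted mean of the equivalents is assumed for R-merge.

```python
def merged_sigma(sigmas):
    """Standard uncertainty of a merged intensity: (Sum SigI^-2)^-1/2."""
    return sum(s ** -2 for s in sigmas) ** -0.5


def r_merge(groups):
    """R-merge over groups of symmetry-equivalent intensities.

    groups: list of lists, one inner list per unique reflection.
    ASSUMPTION: the merged intensity is the unweighted mean of each group.
    """
    numerator = 0.0
    denominator = 0.0
    for observations in groups:
        mean = sum(observations) / len(observations)
        numerator += sum(abs(i - mean) for i in observations)
        denominator += sum(observations)
    return numerator / denominator
```

Four equivalent observations, each with sigma 2, merge to a sigma of 1, which is the usual gain from fourfold redundancy.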

Resolution    Compl.  Redund.  DataCol_Time  Exposure_Time  I/Io   Attenuation
              %                h:min         h:min
12.0  7.24    90.8    2.91     0:1.60        0:0.02         1.000  0.1E+01
7.24  5.66    95.5    2.94     0:1.62        0:0.04         0.999  0.1E+01
5.66  4.81    93.3    2.98     0:1.61        0:0.03         0.999  0.1E+01
4.81  4.25    98.6    2.91     0:1.62        0:0.03         0.999  0.1E+01
4.25  3.85    96.7    2.99     0:1.63        0:0.04         0.998  0.1E+01
3.85  3.54    96.1    3.04     0:1.65        0:0.06         0.997  0.1E+01
3.54  3.30    96.6    3.03     0:1.68        0:0.10         0.994  0.1E+01
3.30  3.10    97.4    3.02     0:1.75        0:0.16         0.989  0.1E+01
3.10  2.94    97.8    2.96     0:2.04        0:0.25         0.982  0.1E+01
2.94  2.79    98.6    3.00     0:2.52        0:0.40         0.968  0.1E+01
2.79  2.67    98.6    3.04     0:3.00        0:0.58         0.949  0.1E+01
2.67  2.56    100.    3.07     0:3.67        0:0.84         0.922  0.1E+01
2.56  2.47    100.    3.09     0:4.24        0:1.20         0.881  0.1E+01
2.47  2.38    99.5    3.01     0:4.99        0:1.74         0.821  0.1E+01
2.38  2.30    99.2    3.06     0:6.35        0:2.60         0.731  0.1E+01
2.30  2.23    100.    2.96     0:9.60        0:4.60         0.553  0.1E+01
2.23  2.17    100.    3.02     0:16.0        0:10.3         0.243  0.1E+01

To help the user choose the data collection resolution most suitable for the current structure study, BEST estimates and prints a table of the total data collection time and total exposure time needed to provide the target statistics (ratio I/Sigma) for different outer resolution shells. If radiation damage modelling is activated, BEST also prints an estimate of the relative decrease of intensity for the given resolution shell at the end of data collection.
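
One way to use this table is to pick the highest resolution that still fits the available beam time. The sketch below does this for the example values shown in this section; it reads the DataCol_Time entries as zero hours plus decimal minutes, which is an interpretation of the "h:min" column rather than something the manual states. The function name is hypothetical.

```python
# (outer resolution limit in Angstrom, total data collection time in minutes),
# transcribed from the example table in this section.
SHELLS = [
    (7.24, 1.60), (5.66, 1.62), (4.81, 1.61), (4.25, 1.62), (3.85, 1.63),
    (3.54, 1.65), (3.30, 1.68), (3.10, 1.75), (2.94, 2.04), (2.79, 2.52),
    (2.67, 3.00), (2.56, 3.67), (2.47, 4.24), (2.38, 4.99), (2.30, 6.35),
    (2.23, 9.60), (2.17, 16.0),
]


def best_resolution_within(shells, budget_min):
    """Highest resolution (smallest d) whose total time fits the budget.

    Returns None if no listed resolution can be reached in time.
    """
    feasible = [d for d, minutes in shells if minutes <= budget_min]
    return min(feasible) if feasible else None
```

With a budget of 5 minutes this selects 2.38 Å; going to 2.17 Å would take 16 minutes of data collection according to the same table.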


About Graphs

XML output

Most of the information appearing in the standard/log file output generated by BEST can optionally be written to an XML-formatted file. This option (-dna {file}) is only available on the command line.

Command line interface

Compulsory arguments1 FILES


Measuring initial images

Exposure time

The initial image(s) would usually be measured with a short exposure time. The presence of overloaded spots severely biases the predictions. The images that contain no visible reflections at the edge of the detector are adequate for BEST. However, in total there should be enough measurable reflections to index the pattern.

The detector distance

BEST will calculate neither plans nor statistics predictions for a resolution extending beyond the limit defined by the detector distance at which the initial image(s) were measured. Thus, the initial image(s) would usually be measured at a relatively short distance (see Overlaps). BEST is then used to define the resolution of the final data collection at a distance longer than that for the initial frame(s).

Rotation range

For DENZO input, unless the Use sca file (keyword is ) option is being used, BEST reconstructs the intensities from partials. This works well only when the rotation range is wider than half of the crystal mosaicity. The initial image(s) would usually be measured with a relatively large rotation (see Overlaps).


Overlaps

Ideally, the reflections should not overlap. If overlaps appear at low resolution, the distance should be increased until they are resolved. High-resolution data collection will not be possible anyway. A small fraction of overlapping reflections at high resolution can be tolerated. If severe overlapping cannot be avoided when the rotation range is wider than half of the mosaicity, several thin frames covering a range of about twice the mosaicity should be measured and the partials summed (e.g., using SCALEPACK). In this case the Use sca file (keyword is ) option of BEST should be used.
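
The two rules of thumb on rotation range and thin frames can be written as a short sketch. The function names are hypothetical, and "about twice the mosaicity" is taken literally as a factor of 2, which is an approximation of the guidance given above.

```python
import math


def partials_reconstructable(rotation_deg: float, mosaicity_deg: float) -> bool:
    """Rule from the text: reconstruction from partials works well only
    if the rotation range is wider than half of the mosaicity."""
    return rotation_deg > 0.5 * mosaicity_deg


def thin_frames_needed(mosaicity_deg: float, frame_width_deg: float) -> int:
    """Number of thin frames covering about twice the mosaicity
    (ASSUMPTION: exactly 2 x mosaicity, rounded up to whole frames)."""
    return math.ceil(2.0 * mosaicity_deg / frame_width_deg)
```

For a mosaicity of 0.5 degrees, thin frames of 0.25 degrees would require four frames to cover the suggested range.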

Number of frames

For triclinic and monoclinic crystals, a minimum of two frames measured at approximately orthogonal spindle axis positions is always required. For higher symmetries a single frame may be sufficient, depending on the number of reflections. If there is not enough input data, BEST will suggest a spindle value at which further frame(s) should be measured. However, the more data that are input, the higher the accuracy of the predictions. Every plan is supplied with an estimate of the uncertainty in the expected error (Scaling error in the log file). The necessary amount of input data can be guided by this estimate.

Evaluating initial image(s)

The symmetry of the crystal must be known prior to running BEST. The initial image(s) should be indexed and integrated correctly. BEST will issue a warning if it suspects that the indexing is wrong (with DENZO input), but it might not necessarily be able to detect misindexing. No checks are done with either XDS or MOSFLM input. If more than one frame is used, the indexing need not be rigorously consistent (e.g., turning a trigonal lattice by 60 degrees is permitted, permuting orthorhombic axes is not). Reasonable estimates of mosaicity and spot size should be made, according to the processing software you are using. For example, overestimating the mosaicity by ten per cent will translate into a signal-to-noise ratio less than 5% higher than predicted. A proper value should be given for the air absorption length in DENZO (equivalent to 1/AIR in XDS). Note that both the DENZO and XDS defaults correspond to a wavelength of 1.54 Å.