Quality of X-ray Data



One of the commonly used indicators of the quality of X-ray diffraction data is the so-called symmetry R-factor (Rsym) or merging R-factor (Rmerge). It is simply the sum of the absolute differences of all measurements from the average value of the respective reflection, divided by the sum of all measurements (see equation 1).

Eq. 1 : $R_{\rm merge} = \frac{\sum_{\rm hkl}\sum_{\rm i} \mid I_{\rm i}({\rm hkl}) - \overline{I({\rm hkl})} \mid}{\sum_{\rm hkl}\sum_{\rm i} I_{\rm i}({\rm hkl})}$

However, the principal problem with Rmerge as a quality indicator is that it is inherently dependent on the redundancy (or multiplicity) of the data. The more often a given reflection is observed, the higher the Rmerge will be on average, even though by simple statistical reasoning the average value of the measurements becomes more precise.
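The dependence on multiplicity can be illustrated with a small simulation. This is only a sketch under stated assumptions: every reflection is given the same true intensity and purely Gaussian errors, and the function names, the intensity of 100 and the sigma of 10 are hypothetical choices made for illustration, not part of the program described on this page.

```python
import random


def rmerge(groups):
    """Rmerge (equation 1) for a list of lists, one list of
    repeated intensity measurements per unique reflection."""
    numerator = 0.0
    denominator = 0.0
    for observations in groups:
        mean = sum(observations) / len(observations)
        numerator += sum(abs(x - mean) for x in observations)
        denominator += sum(observations)
    return numerator / denominator


def simulate_rmerge(multiplicity, n_reflections=2000,
                    true_intensity=100.0, sigma=10.0, seed=0):
    """Simulate n_reflections reflections, each measured `multiplicity`
    times with Gaussian noise (an assumption), and return Rmerge."""
    rng = random.Random(seed)
    groups = [
        [rng.gauss(true_intensity, sigma) for _ in range(multiplicity)]
        for _ in range(n_reflections)
    ]
    return rmerge(groups)


if __name__ == "__main__":
    # The error of each single measurement is identical in all runs;
    # only the number of repeated measurements changes.
    for n in (2, 4, 8, 32):
        print(n, round(simulate_rmerge(n), 4))
```

Although the individual measurements are equally good in every run, the printed Rmerge rises as the multiplicity increases.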

Furthermore, if one compares the equations for Rmerge and the canonical standard deviation, one can derive that, provided a reflection is observed often enough, the Rmerge for this reflection is 0.7979 times the standard deviation divided by the mean intensity of this reflection. In other words, if I/sigma(I) is about 2.0 (which is commonly used to define the high resolution limit of a data set), the Rmerge cannot be better than about 40%, assuming statistical errors only.
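The factor 0.7979 can be traced as follows, assuming purely Gaussian (statistical) errors with standard deviation $\sigma$ on each measurement of a reflection with true mean intensity $\langle I \rangle$. These symbols are introduced here for illustration only. The mean absolute deviation of a Gaussian variable from its mean is

$$ E \mid I_{\rm i} - \langle I \rangle \mid \; = \sqrt{2/\pi}\;\sigma \approx 0.7979\,\sigma $$

When the mean is estimated from N observations, each deviation $I_{\rm i} - \overline{I}$ has variance $\sigma^2 (N-1)/N$, so that

$$ R_{\rm merge} \approx 0.7979\,\sqrt{\frac{N-1}{N}}\;\frac{\sigma}{\langle I \rangle} $$

This expression grows with N and approaches $0.7979\,\sigma/\langle I \rangle$ for large N. For $\langle I \rangle/\sigma = 2$ the limit is $0.7979/2 \approx 0.40$.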

Two other R-factors that should be better suited to describe the quality of diffraction data are the so-called redundancy-independent merging R-factor (Rr.i.m.) and the precision-indicating merging R-factor (Rp.i.m.). Rr.i.m. contains the redundancy N (or multiplicity) of the observed reflection and is basically the conventional Rmerge made independent of how often a given reflection has been observed. For that reason it gives higher values than Rmerge, especially at low redundancy. (Maybe that is the reason why it hasn't become popular ... just guessing.) Rp.i.m. also contains the redundancy N and indicates how precisely the averaged intensity has been determined. Assuming statistical errors only, this should probably be the one to use, since structures are usually solved and refined using averaged measurements. The equations for the two R-factors are given below (equations 2 and 3).

Eq. 2 : $R_{\rm r.i.m.} = \frac{\sum_{\rm hkl}\sqrt{\frac{N}{N-1}}\sum_{\rm i} \mid I_{\rm i}({\rm hkl}) - \overline{I({\rm hkl})} \mid}{\sum_{\rm hkl}\sum_{\rm i} I_{\rm i}({\rm hkl})}$

Eq. 3 : $R_{\rm p.i.m.} = \frac{\sum_{\rm hkl}\sqrt{\frac{1}{N-1}}\sum_{\rm i} \mid I_{\rm i}({\rm hkl}) - \overline{I({\rm hkl})} \mid}{\sum_{\rm hkl}\sum_{\rm i} I_{\rm i}({\rm hkl})}$
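Equations 1 to 3 can be evaluated together in a few lines. The following is a minimal sketch and not the Fortran program offered on this page. The function name and the input layout (pairs of a unique reflection index and a scaled, unmerged intensity) are assumptions made for illustration. Reflections measured only once are skipped, which is also an assumption, made because N-1 would be zero in equations 2 and 3.

```python
from collections import defaultdict
from math import sqrt


def merging_r_factors(observations):
    """Return Rmerge, Rr.i.m. and Rp.i.m. (equations 1 to 3).

    observations: iterable of ((h, k, l), intensity) pairs, scaled but
    unmerged, with symmetry equivalents already mapped onto the same
    (h, k, l) index (assumed input layout).
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    sum_merge = sum_rim = sum_pim = sum_intensity = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # assumption: skip singly measured reflections
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        sum_merge += abs_dev                       # equation 1
        sum_rim += sqrt(n / (n - 1)) * abs_dev     # equation 2
        sum_pim += sqrt(1.0 / (n - 1)) * abs_dev   # equation 3
        sum_intensity += sum(intensities)

    if sum_intensity == 0.0:
        raise ValueError("no reflection was measured more than once")

    return {
        "Rmerge": sum_merge / sum_intensity,
        "Rrim": sum_rim / sum_intensity,
        "Rpim": sum_pim / sum_intensity,
    }


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    example = [
        ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
        ((0, 1, 0), 50.0), ((0, 1, 0), 50.0), ((0, 1, 0), 56.0),
        ((0, 0, 1), 20.0),  # measured once, therefore skipped
    ]
    for name, value in merging_r_factors(example).items():
        print(name, round(value, 4))
```

For the made-up intensities in the usage example, Rp.i.m. is the smallest and Rr.i.m. the largest of the three values, as expected from the factors in front of the inner sums.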

Unfortunately, some scaling programs such as SCALEPACK don't provide these data. SCALA, however, calculates both Rr.i.m. and Rp.i.m. (Rr.i.m. is also called Rmeas following the paper by Diederichs and Karplus). In order to provide these numbers, I have written a program in (almost) standard Fortran 77 that can be downloaded from this site. Together with the short set of instructions, it should be possible to calculate all of them very easily. The program has been tested extensively on a Silicon Graphics platform (IRIX 6.2), and it also compiles without problems under Digital UNIX V4.0D. It should compile on other platforms such as Linux and Windows using the compilers ifort, gfortran and g95 as well, but this has not been verified.



Case 1 : SCALEPACK is your favorite scaling program

- Once you have scaled and merged your data using SCALEPACK, re-run SCALEPACK with the line NO MERGE ORIGINAL INDEX added. This writes out a reflection file with scaled but unmerged intensities.

- Run an executable version of the program RMERGE, which you can download here. You need a FORTRAN compiler to build the executable, of course. The input to the program is pretty much self-explanatory. The output will be written to the screen and to a file called rmerge.data.



Case 2 : SCALA is your favorite scaling program

You are lucky, because SCALA will do the job for you automatically.



Case 3 : XDS/XSCALE

XDS/XSCALE will provide Rr.i.m. (called Rmeas) but not Rp.i.m. Thanks to Kay Diederichs, the current version is now also able to read and process the file XDS_ASCII.HKL.



Just one last thing: I would really appreciate it if you would drop me a line at msweiss when you have picked up the program. Also, let me know if there are any problems.



More information about this can be found in the following papers:

M.S. Weiss (2001). Global indicators of X-ray data quality. J. Appl. Cryst. 34, 130-135.

M.S. Weiss and R. Hilgenfeld (1997). On the use of the merging R factor as a quality indicator for X-ray data. J. Appl. Cryst. 30, 203-205.

K. Diederichs and P.A. Karplus (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269-275.