DARA is a DAtabase for RApid search of structural neighbours using solution small angle X-ray scattering (SAXS) data. DARA is a web server that queries over 150 000 scattering patterns pre-computed from the high-resolution structures of macromolecules and biological assemblies in the Protein Data Bank to find the nearest neighbours of a given experimental or theoretical SAXS profile. Structures with identical or very similar scattering patterns are grouped into over 85 500 clusters to make the output more user-friendly. Three types of macromolecules are taken into account: proteins, nucleic acids and protein:nucleic acid complexes. The search combines principal component analysis of the scattering patterns with k-d trees for almost instantaneous identification of similar scattering patterns. Identification of the best scattering equivalents provides a rapid automatic structural assessment of macromolecules based on the experimental SAXS profile.
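The combination of principal component analysis and a k-d tree described above can be illustrated with a minimal sketch. This is not DARA's implementation: the toy Guinier-like curves, the use of log-intensities, the number of components and the helper names `build_index` and `nearest` are all assumptions made for illustration. It relies on NumPy and SciPy.

```python
import numpy as np
from scipy.spatial import cKDTree


def build_index(patterns, n_components=3):
    """Project log-intensity patterns onto principal components and index them."""
    X = np.log(patterns)  # assumption: logs compress the dynamic range of I(s)
    mean = X.mean(axis=0)
    # PCA via SVD of the centred data; rows of vt are the principal axes
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = vt[:n_components]
    tree = cKDTree((X - mean) @ axes.T)
    return mean, axes, tree


def nearest(query, mean, axes, tree, k=3):
    """Return distances and indices of the k stored patterns closest to query."""
    point = (np.log(query) - mean) @ axes.T
    return tree.query(point, k=k)


# Toy "database": curves I(s) = exp(-(s*Rg)^2 / 3) for a range of Rg values (nm)
s = np.linspace(0.1, 3.0, 50)
rg_values = np.linspace(1.0, 5.0, 41)
database = np.array([np.exp(-(s * rg) ** 2 / 3.0) for rg in rg_values])

mean, axes, tree = build_index(database)
dist, idx = nearest(database[17], mean, axes, tree, k=3)
print(idx, dist)
```

Reducing each pattern to a few principal components keeps the k-d tree low-dimensional, which is where such trees answer nearest-neighbour queries quickly.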
Angular units of the input data: either inverse nanometres (1/nm) or inverse angstroms (1/Å). Angular units are typically not stored in the input file. Once an input file in GNOM format is chosen, an attempt to guess the angular units is made based on the Dmax value in the GNOM file: if Dmax > 20 then 1/Å is recommended; 1/nm otherwise.
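The unit-guessing rule above can be written as a short sketch. The function names are hypothetical, and the conversion helper is included only because 1/Å and 1/nm differ by a factor of ten.

```python
def guess_angular_units(dmax):
    """Apply the rule from the text: Dmax > 20 suggests 1/Å, otherwise 1/nm."""
    return "1/A" if dmax > 20 else "1/nm"


def to_inverse_nm(s_values, units):
    """Convert momentum transfer values to 1/nm (1 Å = 0.1 nm, so 1/Å = 10/nm)."""
    factor = 10.0 if units == "1/A" else 1.0
    return [s * factor for s in s_values]


units = guess_angular_units(85.0)        # a Dmax this large is most likely in Å
s_nm = to_inverse_nm([0.01, 0.02], units)
print(units, s_nm)
```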
Number of neighbours to show
How many neighbours to show in the output. One of: 1, 5, 10, 25, 50, 100. Default is 10.
It is recommended to submit experimental data in GNOM format (*.out). Both the "old" (version 4.x) and the "new" (version 5.0) formats are supported. For the search of neighbours DARA uses regularized data IREG(s) where s = 4πsin(θ)/λ, 2θ is the scattering angle, λ is the wavelength. If the input data file contains results of multiple GNOM runs then the last run is taken into account.
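As a small worked example of the definition s = 4πsin(θ)/λ, the sketch below computes the momentum transfer from the full scattering angle 2θ. The function name and the example wavelength of 0.15 nm are assumptions chosen for illustration.

```python
import math


def momentum_transfer(two_theta_deg, wavelength_nm):
    """Return s = 4*pi*sin(theta)/lambda in 1/nm, given 2*theta in degrees."""
    theta = math.radians(two_theta_deg) / 2.0  # half of the scattering angle
    return 4.0 * math.pi * math.sin(theta) / wavelength_nm


# Scattering angle of 1 degree at an assumed wavelength of 0.15 nm
print(momentum_transfer(1.0, 0.15))
```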
Experimental data in a three-column format (*.dat) can be submitted as well. The first column should contain the momentum transfer s, the second column the background-subtracted experimental intensities I(s), and the third column the experimental errors. In this case an automatic GNOM run will be performed to obtain IREG(s) extrapolated to s = 0.
Theoretical data (*.int) can be submitted as well. The first column should contain the momentum transfer s and the second column the theoretical intensities I(s); other columns are ignored.
The input data should extend to smax > 0.4 nm⁻¹. Wide-angle data above s = 10 nm⁻¹ are ignored.
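The three-column layout and the range rules above can be checked locally before submission. The sketch below is a hypothetical helper, not part of DARA; it assumes s is already in 1/nm and that header or comment lines are simply non-numeric.

```python
def load_three_column(text, s_min_required=0.4, s_cutoff=10.0):
    """Parse 's I(s) error' rows (s in 1/nm) and apply the range rules."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        try:
            values = [float(p) for p in parts[:3]]
        except ValueError:
            continue  # skip header or comment lines
        if len(values) < 3:
            continue  # skip blank or incomplete lines
        rows.append(tuple(values))
    if not rows or max(r[0] for r in rows) <= s_min_required:
        raise ValueError("data must extend beyond s = 0.4 1/nm")
    # Wide-angle points above the cutoff are dropped, as the server ignores them
    return [r for r in rows if r[0] <= s_cutoff]


sample = "# s I err\n0.1 100.0 1.0\n0.5 50.0 0.8\n12.0 0.1 0.05\n"
print(load_three_column(sample))
```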
Atomic models in PDB format (*.pdb) can be submitted as well. In this case an automatic CRYSOL run is performed to obtain the theoretical intensities. Submitted files should be smaller than 1 MB. For larger models the theoretical intensities should be computed locally with CRYSOL.
Logarithmic plot of the experimental input data (blue) and the scattering pattern computed from the neighbour structure (red).
Reduced chi-square goodness of fit, i.e. mean square weighted deviation between the experimental input data and the scattering pattern computed from the neighbour structure. A value close to 1.0 indicates a good fit (assuming the experimental errors are correct). A value much higher than 1.0 indicates a poor fit (or that the experimental errors are underestimated). A value below 1.0 indicates overestimated errors.
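A reduced chi-square of this kind can be sketched as follows. The exact formula used by DARA is not given here, so two details are assumptions: the computed pattern is scaled to the data by weighted least squares, and the sum is normalised by N - 1 because one parameter (the scale) is fitted.

```python
def reduced_chi_square(i_exp, sigma, i_calc):
    """Weighted mean square deviation between data and a scaled model curve."""
    # Least-squares scale factor that minimises the weighted residuals
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    total = sum(((e - scale * c) / s) ** 2
                for e, c, s in zip(i_exp, i_calc, sigma))
    return total / (len(i_exp) - 1)


# Each point deviates from the scaled model by exactly one sigma
print(reduced_chi_square([3.0, 1.0, 3.0, 1.0], [1.0] * 4, [1.0] * 4))
```

In this toy case the result is somewhat above 1.0 only because of the N - 1 normalisation with four points; with many points and correct errors a good fit tends towards 1.0.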
For simulated input data (*.int or *.pdb) the R-factor is computed instead of chi-square. A value of zero indicates an identical fit.
ID of the structure and a link to the respective RCSB PDB page. If the selected biological assembly differs from the one shown by default on the RCSB page, this is indicated in the title of the link (hover the mouse pointer over the link to see it). Structures with very similar scattering patterns are clustered; if the selected model has neighbours with identical or very similar scattering patterns, this is indicated with a plus sign to the left of the PDB ID. Click the plus sign to expand the list of all PDB IDs.
A thumbnail image of the selected biological assembly structure (taken from RCSB). Clicking the thumbnail downloads the particular biological assembly (or the first model of the ensemble in the case of NMR) in *.gz format. For proteins and protein:nucleic acid complexes the percentage of alpha helices (α) and beta sheets (β) is shown below.
Molecular weight of the structure in kDa.
Hydrated volume of the structure in nm³.
Radius of gyration of the structure in nm: the root mean square distance from the center of mass, weighted by the scattering length density.
Maximum intra-particle distance of the structure in nm.
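The radius of gyration and the maximum intra-particle distance listed above can be illustrated on a set of weighted points. This is a sketch under simplifying assumptions: the structure is represented by point scatterers, the `weights` stand in for the scattering length density, and the function names are hypothetical.

```python
import math
from itertools import combinations


def radius_of_gyration(points, weights):
    """Root of the weighted mean squared distance from the center of mass."""
    total = sum(weights)
    centre = [sum(w * p[i] for p, w in zip(points, weights)) / total
              for i in range(3)]
    mean_sq = sum(w * sum((p[i] - centre[i]) ** 2 for i in range(3))
                  for p, w in zip(points, weights)) / total
    return math.sqrt(mean_sq)


def max_distance(points):
    """Largest distance between any two points (brute force over all pairs)."""
    return max(math.dist(a, b) for a, b in combinations(points, 2))


# Two equal scatterers 2 nm apart
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(radius_of_gyration(points, [1.0, 1.0]), max_distance(points))
```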
The DARA web server provides a simple mechanism for trying out sample data. To see the sample input data, click "Sample input: SAXS data from protein solutions". A list of six data sets will be shown. Each data set has a button for automatic loading of the data and a link to download the data. The following data are available:
Bovine serum albumin (BSA) in HEPES Download