
DARA manual

Written by A.G. Kikhney, A. Panjkovich and A.V. Sokolova.
Post all your questions about DARA to the ATSAS Forum.

Manual
- Introduction
- Running DARA
Examples

Manual

The following sections briefly describe the method implemented in DARA, explain how to use the DARA web interface, and detail the required input and the produced output.

If you use results from DARA in your own publication, please cite:

Kikhney, A.G., Panjkovich, A., Sokolova, A.V. & Svergun, D.I. (2016) DARA: a web server for rapid search of structural neighbours using solution small angle X-ray scattering data. Bioinformatics 32(4), 616-618.

Introduction

DARA is a DAtabase for RApid search of structural neighbours using solution small angle X-ray scattering (SAXS) data. DARA is a web server that queries over 150 000 scattering patterns pre-computed from the high resolution structures of macromolecules and biological assemblies in the Protein Data Bank to find the nearest neighbours of a given experimental or theoretical SAXS data set. Structures with identical or very similar scattering patterns are grouped into over 85 500 clusters to keep the output readable. Three types of macromolecules are taken into account: proteins, nucleic acids and protein:nucleic acid complexes. The search combines principal component analysis of the scattering patterns with k-d trees for almost instantaneous identification of similar scattering patterns. Identifying the best scattering equivalents provides a rapid automatic structural assessment of macromolecules based on the experimental SAXS profile.
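The k-d tree part of such a search can be illustrated with a small pure-Python sketch. It assumes each scattering pattern has already been reduced to a short vector of principal-component coordinates. The names `build_kdtree` and `nearest`, the two-dimensional coordinates and the labels are hypothetical and are not taken from the DARA implementation.

```python
import heapq
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, ...]


class Node(NamedTuple):
    point: Point
    label: str
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build_kdtree(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Recursively split (point, label) pairs at the median of one axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build_kdtree(items[:mid], depth + 1),
                build_kdtree(items[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(root: Optional[Node], query: Point, k: int) -> List[Tuple[float, str]]:
    """Return up to k (squared distance, label) pairs, closest first."""
    heap: List[Tuple[float, str]] = []  # max-heap via negated distances

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.label))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.label))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if a closer point could lie beyond it.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg_d, label) for neg_d, label in heap)


# Hypothetical reduced coordinates for five database entries.
entries = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((2.0, 0.5), "c"),
           ((5.0, 5.0), "d"), ((0.2, 0.1), "e")]
tree = build_kdtree(entries)
print([label for _, label in nearest(tree, (0.1, 0.0), 2)])  # ['a', 'e']
```

The pruning test in `visit` is what makes the lookup fast: whole subtrees are skipped when the splitting plane is farther away than the current k-th best match.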

Running DARA

DARA is a web server available at dara.embl-hamburg.de. It is free and open to all users and there is no login requirement.

Arguments and Options

DARA takes as input experimental data in three-column (*.dat) or GNOM (*.out) format, simulated data (*.int), or atomic models (*.pdb).

Option	Description
Angular units	Angular units of the input data: either inverse nanometres (1/nm) or inverse angstroms (1/Å). Angular units are typically not stored in the input file. Once an input file in GNOM format is chosen, an attempt to guess the angular units is made based on the D_max value in the GNOM file: if D_max > 20 then 1/Å is recommended; 1/nm otherwise.
Macromolecule type	One of: protein; nucleic acid; protein:nucleic acid.
Number of neighbours to show	How many neighbours to show in the output. One of: 1, 5, 10, 25, 50, 100. Default is 10.
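The unit-guessing rule described for GNOM files can be written down in a few lines. This is only a sketch of the stated threshold (D_max > 20 suggests 1/Å, since such a value is more plausibly a distance in angstroms than in nanometres). The function names and the helper that converts s to 1/nm are illustrative assumptions, not part of DARA.

```python
from typing import List


def guess_angular_units(d_max: float) -> str:
    """Recommend angular units from the D_max value found in a GNOM file."""
    return "1/Å" if d_max > 20 else "1/nm"


def to_inverse_nm(s_values: List[float], units: str) -> List[float]:
    """Convert momentum transfer values to 1/nm (1 Å^-1 = 10 nm^-1)."""
    if units == "1/Å":
        return [s * 10.0 for s in s_values]
    if units == "1/nm":
        return list(s_values)
    raise ValueError(f"unknown angular units: {units!r}")


print(guess_angular_units(85.0))  # 1/Å
print(guess_angular_units(8.5))   # 1/nm
```

The guess is only a recommendation; a very small particle measured in angstroms or a very large one measured in nanometres would be misclassified, so the selected units should always be checked.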

DARA Input Files

It is recommended to submit experimental data in GNOM format (*.out). Both the "old" (version 4.x) and the "new" (version 5.0) formats are supported. For the search of neighbours DARA uses regularized data I_REG(s) where s = 4πsin(θ)/λ, 2θ is the scattering angle, λ is the wavelength. If the input data file contains results of multiple GNOM runs then the last run is taken into account.
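As a quick illustration of the definition s = 4πsin(θ)/λ, the following sketch converts a scattering angle 2θ into a momentum transfer. The function name and the example wavelength are assumptions chosen for illustration; the unit of s is the inverse of the wavelength unit.

```python
import math


def momentum_transfer(two_theta_deg: float, wavelength: float) -> float:
    """Return s = 4*pi*sin(theta)/lambda for a scattering angle 2*theta in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


# Example: a wavelength of 0.15 nm (assumed) gives s in nm^-1.
for two_theta in (0.1, 1.0, 5.0):
    print(two_theta, momentum_transfer(two_theta, 0.15))
```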

Experimental data in a three-column format (*.dat) can be submitted as well. The first column should contain the momentum transfer s, the second column the background-subtracted experimental intensities I(s), and the third column the experimental errors. In this case an automatic GNOM run will be performed to obtain I_REG(s) extrapolated to s = 0.

Theoretical data (*.int) can be submitted as well. The first column should contain the momentum transfer s and the second column the theoretical intensities I(s); other columns are ignored.
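To check a file against the column layouts described above before uploading it, a minimal reader along the following lines may help. It is a sketch under the assumption that data rows are whitespace-separated numbers and that any other lines (titles, comments) can be skipped. The function name `parse_columns` is hypothetical and this is not the parser used by DARA.

```python
from typing import List, Tuple


def parse_columns(text: str, n_columns: int) -> List[Tuple[float, ...]]:
    """Keep the first n_columns numbers of every line that starts with n_columns numbers."""
    rows: List[Tuple[float, ...]] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < n_columns:
            continue  # blank or too-short line
        try:
            rows.append(tuple(float(f) for f in fields[:n_columns]))
        except ValueError:
            continue  # title or comment line
    return rows


dat_text = """Sample description line
0.10 1000.0 5.0
0.20  800.0 4.0
"""
# Three columns for *.dat: s, I(s), error.
print(parse_columns(dat_text, 3))
# Two columns for *.int: s, I(s); any further columns are dropped.
print(parse_columns("0.10 1000.0 1.0 2.0 3.0\n", 2))
```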

The input data should extend to s_max > 0.4 nm^-1. Wide-angle data above s = 10 nm^-1 are ignored.
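The two range limits can be checked locally before submission. The sketch below assumes s is already expressed in nm^-1; the constant and function names are illustrative, and the way DARA itself applies the limits is not specified here.

```python
from typing import List, Tuple

S_MAX_REQUIRED = 0.4   # nm^-1, the data must extend beyond this value
S_CUTOFF = 10.0        # nm^-1, points above this value are ignored


def prepare_range(s_values: List[float],
                  intensities: List[float]) -> List[Tuple[float, float]]:
    """Reject data that end too early and drop points above the wide-angle cutoff."""
    if not s_values or max(s_values) <= S_MAX_REQUIRED:
        raise ValueError("data must extend to s_max > 0.4 nm^-1")
    return [(s, i) for s, i in zip(s_values, intensities) if s <= S_CUTOFF]


print(prepare_range([0.1, 0.5, 12.0], [100.0, 50.0, 0.1]))
# [(0.1, 100.0), (0.5, 50.0)]
```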

Atomic models in PDB format (*.pdb) can be submitted as well. In this case an automatic CRYSOL run is performed to obtain the theoretical intensities. Submitted files should be smaller than 1 MB. For larger models the theoretical intensities should be computed locally with CRYSOL.

DARA Output

Field	Description
Fit	Logarithmic plot of the experimental input data (blue) and the scattering pattern computed from the neighbour structure (red).
χ²	Reduced chi-square goodness of fit, i.e. mean square weighted deviation between the experimental input data and the scattering pattern computed from the neighbour structure. A value close to 1.0 indicates a good fit (assuming the experimental errors are correct). A value much higher than 1.0 indicates a poor fit (or that the experimental errors are underestimated). A value below 1.0 indicates overestimated errors.
R-factor	In case of simulated input data (.int or .pdb) the R-factor is computed instead of chi-square. Zero value indicates an identical fit.
PDB ID	ID of the structure and a link to the respective RCSB PDB page. If the selected biological assembly differs from the one shown by default on the RCSB page, this is indicated in the title of the link (hover the mouse pointer over the link to see it). Structures with very similar scattering patterns are clustered; if the selected model has neighbours with identical or very similar scattering patterns, this is indicated by a plus sign to the left of the PDB ID. Click the plus sign to expand the list of all PDB IDs.
Download model	A thumbnail image of the selected biological assembly structure (taken from RCSB). Clicking the thumbnail downloads the particular biological assembly (or the first model of the ensemble for NMR structures) in *.gz format. For proteins and protein:nucleic acid complexes, the percentages of alpha helices (α) and beta sheets (β) are shown below the thumbnail.
MW	Molecular weight of the structure in kDa.
Volume	Hydrated volume of the structure in nm³.
R_g	Radius of gyration of the structure in nm: the root mean square distance from the center of mass, weighted by the scattering length density.
D_max	Maximum intra-particle distance of the structure in nm.
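For readers who want to reproduce a reduced chi-square value of this kind locally, a minimal sketch follows. It assumes the computed curve is first scaled to the experimental data by weighted least squares and that the sum is normalised by N - 1; the exact scaling and normalisation used by DARA are not stated in this manual, and the function name is hypothetical.

```python
from typing import Sequence


def reduced_chi_square(i_exp: Sequence[float],
                       sigma: Sequence[float],
                       i_calc: Sequence[float]) -> float:
    """Weighted mean square deviation after least-squares scaling of i_calc."""
    weights = [1.0 / (s * s) for s in sigma]
    # Scale factor c minimising sum(w * (i_exp - c * i_calc)^2).
    numerator = sum(w * e * c for w, e, c in zip(weights, i_exp, i_calc))
    denominator = sum(w * c * c for w, c in zip(weights, i_calc))
    scale = numerator / denominator
    residuals = sum(w * (e - scale * c) ** 2
                    for w, e, c in zip(weights, i_exp, i_calc))
    return residuals / (len(i_exp) - 1)  # assumed normalisation: N - 1


# Two points scattered by one sigma around the scaled model.
print(reduced_chi_square([9.0, 11.0], [1.0, 1.0], [1.0, 1.0]))  # 2.0
```

Halving the errors in this example would quadruple the result, which is why a high value can signal underestimated errors rather than a poor model.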

Examples

The DARA web server provides a simple mechanism to try out sample data. To see the sample input data, click Sample input: SAXS data from protein solutions. A list of data sets will be shown. Each data set has a button for automatic loading of the data and a link to download the data. The following data are available:

- Bovine serum albumin (BSA) in HEPES: experimental data
- Apoferritin in PBS: experimental data
- Glucose isomerase in PBS: experimental data from SASBDB ID: SASDAK6
- Catalase in PBS: experimental data from SASBDB ID: SASDA92
- SRB2m RNA in HEPES: experimental data from SASBDB ID: SASDA54
- LytTR-comcde (DNA:protein) complex in MES: experimental data from SASBDB ID: SASDAC7
- Ubiquitin: theoretical data from PDB ID: 1UBQ
- Alcohol dehydrogenase (ADH): theoretical data from PDB ID: 4W6Z

Last modified: October 7, 2016