SREFLEX manual

SREFLEX

Written by A. Panjkovich.
Post all your questions about SREFLEX to the ATSAS Forum.


Manual

The following sections briefly describe the method implemented in SREFLEX, how to run SREFLEX from the command line, the required input files and the output files it produces.

If you use SREFLEX in your work, please cite:
Panjkovich A. and Svergun D.I. (2016) Deciphering conformational transitions of proteins by small angle X-ray scattering and normal mode analysis. Phys. Chem. Chem. Phys., 18, 5707-5719 DOI:10.1039/c5cp04540a

Introduction

The SREFLEX program uses normal mode analysis (NMA) in Cartesian space to estimate the flexibility of high-resolution models of biological macromolecules and to improve their agreement with experimental small angle X-ray and neutron scattering (SAXS and SANS) data. The method starts from a given structural conformation and an experimental SAS profile that it does not fit well. The structure is partitioned into pseudo-domains, either from user input or automatically from the protein dynamics predicted by NMA. The algorithm proceeds hierarchically, probing large rearrangements first and then progressing to smaller, more localized movements. The output consists of a set of structural models representing possible conformational changes that improve the agreement with the experimental SAS profile. A separate mode ("pool" mode) generates a pool of conformers from an initial structure file.

Running sreflex

Usage:

$ sreflex [OPTIONS] <SAS FILE> <COORD FILE>

SREFLEX expects the following command line arguments:

Argument	Description
`SAS FILE`	The filename of a SAS experimental curve.
`COORD FILE`	High-resolution atomic model in PDB format.

SREFLEX recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.

Short Option	Long Option	Description
`-p`	`--prefix=<PREFIX>`	Prefix to prepend to the output directory; the default is `wd_sreflex`.
`-P`	`--pool=<INT>`	Specify the maximum number of conformers to generate from the initial structure. Independent of scattering data.
`-q`	`--quiet`	Suppress screen output.
`-t`	`--threads=<INT>`	Select the number of CPU cores/threads to use. By default all available cores are used; that number is also the upper limit for this parameter.
`-r`	`--ratio=<FLOAT>`	Convergence ratio, default 0.7, range [0.5:0.9].
`-f`	`--first=<INT>`	First SAS input data point to consider, default 1.
`-N`	`--neutron`	Work with SANS data.
`-n`	`--nmtop=<INT>`	Top normal mode to consider, default is 16, range [9:64].
`-s`	`--skip=<STAGE>`	Skip RESTRAINED or UNRESTRAINED refinement stage.
`-v`	`--version`	Print version information and exit.
`-h`	`--help`	Print a summary of options and exit.

Additionally, the following arguments can be forwarded to CRYSOL/CRYSON to modify how theoretical intensities are calculated: --lm, --fb, --sm, --ns, --un, --dns, --dro, --cst and --eh. Please see the CRYSOL manual for more details.
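The documented ranges for `--ratio` and `--nmtop` can be checked in a wrapper script before a run is started. The following is a minimal sketch only: `check_sreflex_opts` is a hypothetical helper, not part of ATSAS, and it only prints the two options so that they can be pasted into a `sreflex` command line.

```shell
# Hypothetical helper (not part of ATSAS): check --ratio and --nmtop values
# against the ranges given in the options table, then print the options.
check_sreflex_opts() {
    ratio=$1   # convergence ratio, documented range [0.5:0.9]
    nmtop=$2   # top normal mode, documented range [9:64]

    # awk does the floating-point comparison that the POSIX test utility cannot.
    if ! awk -v r="$ratio" \
        'BEGIN { exit !(r ~ /^[0-9]+(\.[0-9]+)?$/ && r >= 0.5 && r <= 0.9) }'; then
        echo "ratio '$ratio' is outside [0.5:0.9]" >&2
        return 1
    fi

    # Reject anything that is not a plain non-negative integer.
    case $nmtop in
        ''|*[!0-9]*) echo "nmtop '$nmtop' is not an integer" >&2; return 1 ;;
    esac
    if [ "$nmtop" -lt 9 ] || [ "$nmtop" -gt 64 ]; then
        echo "nmtop '$nmtop' is outside [9:64]" >&2
        return 1
    fi

    echo "--ratio=$ratio --nmtop=$nmtop"
}

# Possible use (left as a comment so the sketch runs without ATSAS installed):
#   sreflex $(check_sreflex_opts 0.7 16) closed1akeA.dat open4akeA.pdb
```

The function returns a non-zero status and prints nothing on standard output when a value is out of range, so a calling script can stop before invoking the program.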

sreflex Input Files

SREFLEX reads SAS experimental data files (*.dat) in ASCII format containing 3 columns: (1) experimental scattering vector, (2) experimental intensity and (3) experimental errors.
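As a quick sanity check of the three-column layout described above, the lines of a data file can be counted with awk. This is a sketch under assumptions: `count_sas_rows` is a hypothetical helper, and how SREFLEX itself treats header or malformed lines is not covered here; the function only reports how many lines look like three numeric columns and how many do not.

```shell
# Hypothetical helper: count lines that hold exactly three numeric columns
# (s, I, error) and lines that hold anything else. Blank lines are ignored.
count_sas_rows() {
    awk '
        function isnum(x) {
            return x ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        NF == 0 { next }
        NF == 3 && isnum($1) && isnum($2) && isnum($3) { ok++; next }
        { other++ }
        END { printf "data=%d other=%d\n", ok, other }
    ' "$1"
}

# Possible use:
#   count_sas_rows closed1akeA.dat
```

A non-zero `other` count is only a hint to inspect the file by hand; it does not by itself mean that the file is unusable.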

SREFLEX expects atomic coordinates in Protein Data Bank (PDB) format. NMA calculations are based on backbone atoms (centroids): alpha carbons (CA) for proteins and sugar C1' for nucleotides. Please note that SREFLEX parses centroid atoms from the ATOM records. The program will not work properly if input coordinates lack backbone atoms. Non-solvent HETATM entries are grouped by residue identifier and number. These are then associated with the closest ATOM centroid in the structure for the application of rotations and translations. Theoretical scattering calculations consider all non-H ATOM and non-solvent HETATM entries.
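Because the centroids are taken from ATOM records, it can be useful to confirm that a coordinate file contains them before starting a run. The sketch below assumes the standard fixed-column PDB layout (record name in columns 1-6, atom name in columns 13-16); `count_centroids` is a hypothetical helper and does not reproduce SREFLEX's own parser.

```shell
# Hypothetical helper: count CA and C1' atoms found in ATOM records.
# HETATM records are skipped, so a calcium ion named CA is not counted.
count_centroids() {
    awk '
        substr($0, 1, 6) == "ATOM  " {
            name = substr($0, 13, 4)      # atom name field, columns 13-16
            gsub(/ /, "", name)           # strip padding spaces
            if (name == "CA") ca++
            else if (name == "C1\047") c1++   # \047 is the single quote
        }
        END { printf "CA=%d C1prime=%d\n", ca, c1 }
    ' "$1"
}

# Possible use:
#   count_centroids open4akeA.pdb
```

If both counts are zero, the file lacks the backbone atoms that the NMA calculation relies on.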

Definition of pseudo-domains

The restrained refinement stage that SREFLEX performs first treats the input structure as a set of rigid bodies or pseudo-domains. By default, SREFLEX will partition the structure based on its NMA and define (pseudo-) domains that can move relatively independently of each other. The algorithm is explained in detail in the SREFLEX article. This automatic partitioning procedure is the default behaviour if a single PDB file is provided as the <COORD FILE> argument.

Results may be improved if the user defines custom domains beforehand; these definitions can be based, for example, on structural or evolutionary information. In that case, the coordinates for the user-defined domains should be provided to SREFLEX as separate PDB files in a comma-separated list (no spaces) in place of the <COORD FILE> argument. This will deactivate the automatic partitioning procedure, and each PDB file provided by the user will be considered as a pseudo-domain or rigid body during the initial restrained refinement stage.
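When the domain files are collected by a script, the comma-separated list can be assembled programmatically. A minimal sketch: `join_domains` is a hypothetical helper, and rejecting filenames that contain commas or spaces is an assumption made here because such names cannot be expressed unambiguously in the list.

```shell
# Hypothetical helper: join domain PDB filenames with commas and no spaces
# to form the <COORD FILE> argument.
join_domains() {
    list=
    for f in "$@"; do
        case $f in
            *,*|*' '*)
                echo "unsupported character in filename: $f" >&2
                return 1 ;;
        esac
        list=${list:+$list,}$f    # add a comma only between entries
    done
    printf '%s\n' "$list"
}

# Possible use (commented out so the sketch runs without ATSAS installed):
#   sreflex closed1akeA.dat "$(join_domains open1.pdb open2.pdb)"
```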

sreflex Output Files

Upon execution, SREFLEX will create a directory to store results. The name of the output directory can be set through the prefix argument. If the directory exists already, a number will be appended to create a new directory.

Output	Description
`log.txt`	Contains the same information as the screen output and is updated during execution of the program.
`report.txt`	Summarizes execution details and results for each generated model.
`models/*.pdb`	Models are created for both restrained (`rc.pdb`) and unrestrained (`uc.pdb`) stages in PDB format. Within each coordinate file, the `REMARK` section contains further details.
`fits/*.fit`	Fit of the corresponding model's simulated scattering curve versus the experimental data. Columns in the output file are: '`s`', '`I_exp`' and '`I_sim`'.
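The three columns of a fit file can be post-processed with standard tools. The sketch below computes a simple R-factor, sum|I_exp - I_sim| / sum|I_exp|, over the numeric rows. This is an illustrative measure chosen here, not the statistic that SREFLEX reports, and `fit_rfactor` is a hypothetical helper; rows whose first three fields do not start like numbers (for example a header line) are skipped.

```shell
# Hypothetical helper: summarise a .fit file with columns s, I_exp, I_sim.
fit_rfactor() {
    awk '
        function abs(x) { return x < 0 ? -x : x }
        # keep only rows whose first three fields start like numbers
        NF >= 3 && $1 ~ /^[-+.0-9]/ && $2 ~ /^[-+.0-9]/ && $3 ~ /^[-+.0-9]/ {
            num += abs($2 - $3)
            den += abs($2)
            n++
        }
        END {
            if (n == 0 || den == 0) { print "no data"; exit 1 }
            printf "points=%d R=%.4f\n", n, num / den
        }
    ' "$1"
}

# Possible use, looping over all fits of a run in the default output directory:
#   for f in wd_sreflex/fits/*.fit; do printf '%s ' "$f"; fit_rfactor "$f"; done
```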

Examples

The examples are based on the known conformational change that adenylate kinase undergoes upon catalytic activity. PDB entry 1ake (chain A) was used to simulate a SAXS profile for the closed conformation (available as example file closed1akeA.dat). PDB entry 4ake (chain A) corresponds to the open conformational state (available as example file open4akeA.pdb). SREFLEX will refine the 4ake open conformation until it matches the SAXS profile of the closed conformation. The complete run should take a few minutes on a normal desktop computer.

Default run

Use SREFLEX to model the conformational change of adenylate kinase:

$ sreflex closed1akeA.dat open4akeA.pdb

SREFLEX will create a directory called wd_sreflex where the output files (report.txt, log.txt, models and fits) can be found upon completion of the run.

Custom domain definitions

To illustrate the definition of custom structural domains, open4akeA.pdb was separated into two files (open1.pdb and open2.pdb) corresponding to the SCOP domain classification of the 4ake PDB entry. In this case, SREFLEX will refine the structure considering both PDB files as rigid bodies for the initial restrained refinement stage, disabling the automatic structure partitioning approach. The --prefix option is used to store the results in a new directory named customDomains. Note that there are no spaces separating the PDB filenames in the <COORD FILE> argument:

$ sreflex --prefix=customDomains closed1akeA.dat open1.pdb,open2.pdb

CRYSOL batch arguments

Command line arguments can be forwarded to CRYSOL to directly modify how theoretical intensities are calculated during refinement.

$ sreflex --lm 20 --sm 1.0 closed1akeA.dat open4akeA.pdb

Please use these and other options with caution, as they can prevent the program from working properly.
