ARP/wARP User Guide
Version 7.2
August 21, 2011
Contents
1 General information
1.1 Introduction
1.2 Major changes in Version 7.2
1.3 Latest News, Bug Reports and Troubleshooting
1.4 Distribution
2 Installing ARP/wARP
2.1 Intel Mac OSX Installation
2.2 Command Line Installation on Mac OSX or Linux
3 Using ARP/wARP
3.1 Automated Model Building
3.1.1 Running protein model building from the GUI, ARP/wARP
Classic
3.1.2 Command line model building, auto_tracing.sh
3.1.3 Remote submission of a model building task
3.1.3.1 Submitting from the GUI
3.1.3.2 Submitting from a web browser
3.1.4 Output files, short log file
3.1.5 Running protein model building from the GUI, ARP/wARP
Expert System
3.1.6 Running flex-wARP from command line
3.2 Automated Construction of Helical and Beta-Stranded Fragments
3.2.1 Building secondary structure from the GUI, ARP/wARP
Quick Fold
3.2.1.1 Output files, short log file
3.2.2 Building secondary structure from the command line,
auto_albe.sh
3.3 Automated Loop Building
3.3.1 Running loop building from the GUI, ARP/wARP Loops
3.4 Automated Building of Poly-Nucleotides
3.4.1 Running nucleotide building from the GUI, ARP/wARP
DNA/RNA
3.4.1.1 Output files, short Log File
3.4.2 Running nucleotide building from the command line,
auto_nuce.sh
3.5 Automated Ligand Building
3.5.1 Running ligand building from the GUI, ARP/wARP Ligands
3.5.1.1 Output files, short Log File
3.5.2 Running ligand building from the command line,
auto_ligand.sh
3.6 Automated Solvent Building
3.6.1 Running solvent building from the GUI, ARP/wARP Solvent
3.6.1.1 Output files, short log file
3.6.2 Running solvent building from command line, auto_solvent.sh
3.7 ARP/wARP molecular graphics: ARP Navigator
3.7.1 Main Menu
3.7.2 Mouse and Keyboard functions
3.7.2.1 Rotation
3.7.2.2 Translation
3.7.2.3 Scaling
3.7.2.4 Clip planes
3.7.2.5 Map contouring
3.7.2.6 Map extent
3.7.2.7 Mouse Actions
3.7.2.8 Keyboard Actions
3.7.3 Object Buttons
3.7.4 Quick Actions
4 Additional Remarks
4.1 Quality of the X-ray Data
5 Citing ARP/wARP
6 Other References
7 Acknowledgements
Chapter 1
General information
1.1 Introduction
ARP/wARP is a software project for automated protein model building and
structure refinement. It is based on a unified approach to the structure solution
process. It combines electron density interpretation using the concept of the
hybrid model, pattern recognition in an electron density map and maximum
likelihood model parameter refinement with REFMAC.
The ARP/wARP software is under continuous development. Its present
release, version 7.2, can be used in the following ways:
- Automated protein chain tracing in the density map and model
building (GUI modules ARP/wARP Classic, ARP/wARP Expert System,
command line modules auto_tracing.sh and auto_flex_warp.sh).
This constructs polypeptide fragments (both main and side chains)
for the cases of MR solutions or MAD/M(S)IR(AS) phases. The Classic
and the auto_tracing.sh modules use a pre-defined number of cycles
for model update, refinement and chain tracing, while Expert System
and auto_flex_warp.sh define the sequence of steps on the fly. X-ray
data to 2.7 Å resolution or higher are required although partial model
building can sometimes be achieved at a resolution of 3.0 Å or lower.
- Free atoms density modification (GUI module ARP/wARP Classic).
This method provides improvement in a density map by free-atoms
update and requires an input PDB and X-ray data to about 2.5 Å
resolution or higher.
- Automated building of alpha-helical and beta-stranded fragments
(GUI module ARP/wARP Quick Fold, command line module
auto_albe.sh). This constructs helical and beta-stranded polypeptide
fragments (main chain and CB atoms) in low-resolution density maps.
Phased X-ray data to 4.5 Å resolution or higher are required. This
module is automatically invoked in protein chain tracing (the first item
above) when the resolution of the data is 2.7 Å or lower.
- Building poorly defined loops in a protein model (GUI module Loops).
This will generate a set of candidate loops for a short stretch of
missing residues given the anchors and the sequence of the missing
residues. Each candidate will have both main chain and side chain
atoms. The user can ask the module to choose a single solution or
to suggest several loops. A protein model and X-ray data to 3.0
Å resolution or higher are required. This module is automatically
invoked in protein chain tracing (the first item above).
- A prototype software for building poly-nucleotide fragments, DNA
or RNA (GUI module ARP/wARP DNA/RNA, command line module
auto_nuce.sh). This will produce a set of poly-nucleotide chains with
guessed bases (A or C, i.e. large or small); the nucleotide sequence is
not yet used. Phased X-ray data to about 3.5 Å resolution or higher are
required.
- Building bound ligands (GUI module ARP/wARP Ligands, command
line module auto_ligand.sh). This constructs a ligand in a difference
electron density map, after the protein model has been completed
and refined, given a template search ligand or a list of putative
ligands (cocktail screening). X-ray data to 3.0 Å resolution or higher
are required.
- Building the solvent structure (GUI module ARP/wARP Solvent,
command line module auto_solvent.sh). This builds a solvent
structure after the protein model has been refined. X-ray data to 2.5 Å
resolution or higher are required.
- A molecular graphics ARP/wARP front-end, which allows the
display of molecules and electron densities (GUI module ARP
Navigator, executable program arpnavigator). It is a high-quality
3D molecular viewer and a user-friendly interface to ARP/wARP
functionalities, allowing macromolecular models, ligands and
solvents to be viewed as they are built.
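As a quick orientation, the resolution limits quoted in the list above can be collected into a small lookup. The following is only a sketch in POSIX shell: the function name applicable_modules and the short module labels are invented for illustration and are not part of ARP/wARP. The limits are copied from the text above, and "or higher" is read as a numerically smaller or equal resolution value in Å.

```shell
#!/bin/sh
# Hypothetical helper: list the modules whose stated resolution requirement
# is met by data extending to the given resolution (in Angstrom).
# Note: the guide says partial chain tracing can sometimes still succeed
# at 3.0 A or lower; this lookup only applies the nominal limits.
applicable_modules() {
    awk -v d="$1" 'BEGIN {
        n = split("tracing:2.7 free_atoms:2.5 helices_strands:4.5 loops:3.0 nucleotides:3.5 ligands:3.0 solvent:2.5", items, " ")
        out = ""
        for (i = 1; i <= n; i++) {
            split(items[i], kv, ":")
            if (d + 0 <= kv[2] + 0) out = out (out == "" ? "" : " ") kv[1]
        }
        print out
    }'
}

applicable_modules 3.2
```

With data to 3.2 Å, for example, only the helix/strand and nucleotide limits are met, so the call prints "helices_strands nucleotides".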
1.2 Major changes in Version 7.2
- The functionality of the graphics front-end, ARP Navigator, has
been considerably extended. New file formats and display styles
of molecules are supported. Publication quality images in standard
colours are easier to produce.
- Version 7.2 uses new techniques and approaches for building models
of large, low-resolution protein structures using non-crystallographic
symmetry (NCS) and motif searching (GUI module ARP/wARP
Classic, command line module auto_tracing.sh). NCS-related parts
of a structure are rarely built in the same way during model
building. A beneficial side effect of this is that each copy provides
information that is not present in another copy. By combining this
intrinsic information, the model building process is improved and the
overall completeness of built structures at low resolution is increased.
Furthermore, identified NCS-relations are used as restraints for
refinement in REFMAC. Coupled with novel implementations for
enhanced protein chain tracing, the use of NCS provides, on average,
the following improvement: 5% more built residues, 25% longer built
fragments and 10% more residues docked.
- Refinement procedures during automated model building have been
enhanced in the new versions of our preferred refinement engine,
REFMAC.
- The automated ligand building can model bound ligands that are
partially ordered. The electron density map can be automatically
screened for density corresponding to any of a cocktail of potential
ligands and those that fit best are automatically built. Furthermore,
in instances where the density is insufficient to allow accurate
identification of atomic coordinates of particular ligands, "partial"
ligands, comprised only of the fragments whose atoms are in
unequivocal density, can be modelled.
- Supported computer platforms are Mac PowerPC, Mac Intel and
Linux (including 32- and 64-bit versions).
- The ARP/wARP installer has been updated on all platforms. The Mac
OSX installer is now a native application packaged in a DMG file.
Installation on Linux has also been simplified and users no longer
need to manually edit their shell startup files.
- CCP4 6.1.13 or higher is the recommended version to use with
ARP/wARP 7.2; the recommended version of Refmac is 5.5.0109 or
higher.
1.3 Latest News, Bug Reports and Troubleshooting
For the latest news and announcements please visit the ARP/wARP page
(www.arp-warp.org). Some problems and tips can be found on the Frequently
Asked Questions link. The developers will greatly appreciate all bug reports or
suggested changes from the users.
1.4 Distribution
The ARP/wARP package (either for download or for remote execution of protein
model building) is freely available to academic users provided that they agree to
the ARP/wARP license conditions and the applications of ARP/wARP are
properly cited. Please consult the ARP/wARP log file for the most relevant
citation.
Industrial users are requested to obtain a commercial license via the ARP/wARP web
page.
Chapter 2
Installing ARP/wARP
It is recommended that CCP4 is first installed from a dmg file (available
from the CCP4 web page at http://www.ccp4.ac.uk). There could be
problems installing ARP/wARP onto a copy of CCP4 installed using 64-bit
Fink.
CCP4 6.1.13 (and higher) is the recommended version to use with
ARP/wARP 7.2, although older versions of CCP4 may be compatible with some
of the ARP/wARP modules.
2.1 Intel Mac OSX Installation
- Download arpwarp_7.2.dmg from the ARP/wARP web page.
- Double click on the downloaded file.
- Double click on the ARPwARP installer.
- Agree to the ARP/wARP license.
- Select a destination drive.
- Choose destination directory if the default /Applications is not
suitable.
- ARP/wARP should install and start automatically.
If there are any problems, we encourage you to save the installation log that is
displayed and send it to the ARP/wARP developers using the link on the
ARP/wARP homepage. Reporting bugs improves the software and helps other
researchers.
2.2 Command Line Installation on Mac OSX or Linux
- Download the full ARP/wARP package arp_warp_7.2.tar.gz from
the ARP/wARP web page and save it in a location of your choice.
Next, type:
% gunzip arp_warp_7.2.tar.gz
% tar xvf arp_warp_7.2.tar
The distribution will unpack into a directory called arp_warp_7.2 that
will contain all the required files and subdirectories. install.sh is an
installation script that helps you set the appropriate environment variables.
The README file will walk you through the installation process.
ARP_wARP_CCP4I-v5.tar.gz includes everything necessary to run ARP/wARP
from the CCP4i interface of version 5; ARP_wARP_CCP4I-v6.tar.gz
includes everything to run tasks from the CCP4i interface of version
6.
- Go to the directory arp_warp_7.2 and run the install.sh script there by
typing
% ./install.sh
Unless you are already an experienced ARP/wARP user, you should try to
get started with the test files provided in the directory arp_warp_7.2/examples.
These include the data for protein chain tracing (also with NCS), helix/strands
search, nucleotides, ligand and solvent building. A README file is included
that explains in more detail which data are to be used for which
task.
If things do not work as expected please consult your more experienced
colleagues, system manager or the ARP/wARP developers.
Chapter 3
Using ARP/wARP
3.1 Automated Model Building
3.1.1 Running protein model building from the GUI, ARP/wARP
Classic
This module of ARP/wARP provides the execution of the following
tasks:
1. automated protein building starting from experimental phases
2. automated protein building starting from existing model
3. improvement of maps by atoms update and refinement
Applications 1 and 2 (the so-called warpNtrace protocol) start with input
experimental / density modified phases or an available (preliminary refined or
partially autotraced) model. The warpNtrace protocol aims to deliver an
essentially complete model and obviously an improved map by utilising the idea
of the hybrid model in which protein and free atoms can co-exist. warpNtrace
keeps whatever was recognised as protein (in a form of polypeptide fragments)
and the rest as free atoms and refines this hybrid model during a ‘big’ cycle,
consisting of several (default is 5) ARP/REFMAC update/refinement
cycles. At the end of each ‘big’ cycle the map is interpreted anew - a
completely new polypeptide model is constructed with hopefully more
residues in fewer fragments. This whole procedure is iterated (default is 10
times).
The output of warpNtrace is a set of refined polypeptide fragments. If the
sequence is available, the traced fragments will be docked in sequence and side
chains will be built during the iterative refinement procedure. Loops
will be built during the procedure, if possible. After the last building
cycle the fragments will be arranged to form a globular structure (or, for
a case of NCS, several NCS-related structures). The remainder of the
structure (cis-prolines, poorly ordered loops and terminal residues for each
fragment) will have to be completed by the user manually. Since the output
model is refined, its accuracy is comparable to that of the final refined
structure. Mis-tracing (incorrect tracing of polypeptide fragments) is not
impossible but should normally not exceed 1% of the whole structure
with X-ray data to about 2.5 Å resolution or higher. A rough guess of the
correctness of the model is printed after every model building cycle,
e.g.:
% Chains 12, Residues 434, Estimated correctness of the model 99.1 %
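When many jobs are monitored, it can be handy to pull the three numbers out of such a summary line. The snippet below is a minimal sketch that assumes the line has exactly the wording shown in the example above; the function name parse_build_summary is hypothetical and not part of ARP/wARP.

```shell
#!/bin/sh
# Hypothetical helper: extract "chains residues correctness" from a summary
# line of the form shown above. Assumes that exact wording.
parse_build_summary() {
    line="$1"
    chains=$(printf '%s\n' "$line" | sed -n 's/.*Chains \([0-9][0-9]*\),.*/\1/p')
    residues=$(printf '%s\n' "$line" | sed -n 's/.*Residues \([0-9][0-9]*\),.*/\1/p')
    correct=$(printf '%s\n' "$line" | sed -n 's/.*correctness of the model \([0-9.][0-9.]*\) *%.*/\1/p')
    printf '%s %s %s\n' "$chains" "$residues" "$correct"
}

parse_build_summary '% Chains 12, Residues 434, Estimated correctness of the model 99.1 %'
```

For the example line this prints "12 434 99.1". If the wording of the log line differs in your version, the patterns need to be adapted.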
Application 3 includes no model building but still may provide improvement
of the density map. The map is first interpreted as a pseudo protein model,
consisting of unconnected free atoms. This model is then refined and updated
with iterative cycles of ARP/REFMAC.
Application 1 is described in detail below; input to applications 2 and 3 is
very similar and should be straightforward to figure out.
- Launch the ARP/wARP Classic window within the CCP4i GUI.
- Provide required input:
-
ARP/wARP for
- Choose applications 1 to 3 as described above.
-
in
- X-ray data in the MTZ format containing structure factor
amplitudes, their standard deviations, phases and figures of
merit. If pre-weighted structure factor amplitudes are to be used
to construct initial map, please check the corresponding box in
ARP/wARP flow parameters (see below).
-
Sigma PHIB FOM
- If the MTZ column labels for structure factor
amplitudes, their standard deviations, phases and figures of
merit have obvious names, they will be recognised automatically.
Otherwise please use the scrolling button, navigate to List All
Labels and choose the appropriate ones.
-
file in
- Provide the sequence file in the following format (pir):
- The first line should start with ‘>’
- The second line should be blank
- The sequence (1 letter code) starts from the third line. The
space characters hereafter are ignored.
-
Dock the autotraced chains
- Should the sequence not be available, please
un-check this box in ARP/wARP flow parameters.
-
Total residues in the AU / number of molecules
- Provide the total number
of residues in the asymmetric unit. The number of molecules
is 1 for a monomer. In the case of NCS the number of
molecules should be the number of NCS related molecules (e.g. if
you have 2 molecules in the AU with 200 residues each, enter
400 for the total number of residues and 2 for the number of
molecules). If you have a hetero-multimer, e.g. 3α/3β structure, the
NCS order is 3 but please make sure that the sequence file
contains both α and β sequences separated by about 10 alanines:
SEQUENCE_OF_α_SUBUNIT_AAAAAAAAAA_SEQUENCE_OF_β_SUBUNIT
-
Cycles of autobuilding / total cycles
- The default is 10 building cycles
separated with 5 ARP/REFMAC atom update cycles (thus
making 50 cycles in total). In cases of good starting phases the
autobuilding may converge faster; in cases of poorer phases more
cycles may be required. You can always submit warpNtrace
for further cycles using the output of the previous tracing
(protocol automated model building starting from existing
model).
-
Protocol for REFMAC5 / Rfree
- The refinement target gives three
choices:
- The default is to use maximum likelihood target.
- The second choice allows the user to use the SAD target. This
function is based on REFMAC5 developments by Skubak
& Pannu and allows refinement against the F+/F- data
when these are available. A prerequisite when this option is
activated is to also provide a PDB file with the anomalous
scatterers and to define the extent of the ‘anomalous signal’
either by providing the wavelength or the measured f′ and f′′
values. At this point we only provide the ability for one type
of atom to be defined when f′ and f′′ values are used. If you have
more than one type of atom, choose the wavelength to fetch
theoretical values; that should work well in practice.
- The third choice is the ‘Phased ML’ function, which we
strongly recommend NOT using in the case of SAD data.
If MAD or MIRAS data are available, you should use it in
conjunction with good quality phase error estimates in the
form of HL coefficients, preferably calculated in a reliable
manner, by e.g. SHARP.
The default is not to use Rfree, since the number of traced residues
serves as an excellent indicator of the success of the job. You can
turn the use of Rfree on if you wish.
-
Se-Methionine
- If you have Se-methionine substituted protein, regardless
of the use of the refinement function, you can check the box thus
asking ARP/wARP to build and refine Se-Met residues and use these
for better refinement results.
- Now you are ready to start the job: Click on Run and choose Run
now.
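The sequence file layout described above (first line starting with ‘>’, blank second line, one-letter sequence from the third line, different subunits of a hetero-multimer separated by about 10 alanines) can be produced with a few lines of shell. This is a sketch only: the function name write_pir, the title text after ‘>’ and the toy sequences are hypothetical, and the whole sequence is written on a single line because the guide does not state how line breaks are treated.

```shell
#!/bin/sh
# Hypothetical helper: write a sequence file in the layout described above.
#   $1 = output file, $2 = title placed after '>' (an assumption),
#   $3... = one-letter sequences of the distinct subunits.
write_pir() {
    out="$1"; title="$2"; shift 2
    linker="AAAAAAAAAA"          # 10 alanines between different subunits
    seq=""
    for s in "$@"; do
        if [ -z "$seq" ]; then seq="$s"; else seq="${seq}${linker}${s}"; fi
    done
    {
        printf '>%s\n' "$title"  # line 1 starts with '>'
        printf '\n'              # line 2 is blank
        printf '%s\n' "$seq"     # sequence starts on line 3
    } > "$out"
}

# Example with two toy subunit sequences (not real proteins):
write_pir ./example_heteromer.pir example MKVLA GSTPE
```

For a homo-oligomer, a single sequence is passed and no linker is inserted; the number of copies is then given separately as the number of molecules.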
There are a number of additional parameters that you normally should not
worry about. A brief description is given below.
- ARP/wARP flow parameters:
-
Use conditional restraints for free atoms
- This allows restraints to be
used to keep free atoms in reasonable places, and it is on by
default.
-
Use Non-Crystallographic Symmetry Restraints
- Indicate
to REFMAC that it should use NCS restraints. This option is new
in version 7.2 and is set by default for X-ray data with resolution
worse than 2.3 Å.
-
Use Non-Crystallographic Symmetry information to extend chains
-
Extend chains using information provided by related parts of the
structure. This option is new in version 7.2 and is set by default
for X-ray data with resolution worse than 2.3 Å.
-
Use Loopy to build loops
- This option allows the loop-filling mode to
be invoked throughout the iterations. The default is on.
-
Dock the autotraced chains to sequence
- The default is to dock the
fragments starting from building cycle 0. This may be changed,
although doing so may not be advantageous. Should the sequence not be
available, the docking can be disabled by un-checking the box.
-
Search for helices and strands before each building cycle
-
This is the default for resolution of 2.7 Å or worse. Enabling this
option may also be advantageous for model building at higher
resolution at modest expense of the CPU. Should the model from
helix/strands tracing be more complete than the model from
warpNtrace, the appropriate message will be printed at the end
of the short log file.
-
Pre-weighted Fobs for initial map calculation
- Checking this box
will result in a pull-down menu asking for the FBEST label.
-
Number of ARP/REFMAC refinement cycles between autobuilding
-
The default is 5. In cases of poor convergence you can try to
increase this number to 10.
-
Skip the autobuilding for the first cycles
- Checking this box will
disable the autotracing for the provided number of cycles. This
was sometimes advantageous with earlier ARP/wARP versions
when the initial phases were poor.
-
Randomisation of atomic positions
- This also was sometimes
advantageous with earlier ARP/wARP versions when the initial
model bias was high. The default is not to randomise.
-
Iterate the tracing
- Each main chain tracing is carried out in several
rounds. The module will decide on its own how many iterations
are needed. The default maximum number is 5 and it is not
recommended to change this value.
-
Density thresholds for atom removal and addition
-
These parameters are set to 1.0 and 3.2, respectively. In cases
of poor convergence, particularly when the number of both
added and removed atoms is considerably less than the number
requested (as can be seen from the log file), the threshold for
atom removal can be slightly increased. Also, at a resolution of 2.5 Å
and lower it may be advantageous to decrease the threshold for
atom addition from 3.2 to 3.0 or 2.8.
-
Increase in the number of atoms to be added and removed as compared to the automatically set values
-
The default is 1 (no increase) and it is not recommended to change
this. This option is provided primarily for experienced users.
-
Disable Wilson plot statistics check
- The current Wilson
plot checking routine is probably too stringent. You may disable
the check and the warnings if you are sure that the X-ray data is
of high quality. However, we strongly recommend that you do not disable
the check and that, in case of warnings, you inspect the plot before
proceeding.
- Refmac parameters:
-
Attempt to correct for data collected from a twinned crystal
-
Refmac will attempt fully automated twin refinement (new in version
7.2). This option is incompatible with SAD refinement.
-
Cycles of refinement in each Refmac run
-
Refmac is invoked to refine the hybrid model before the density
maps are computed. The default is 1 cycle if the data extend to a
resolution of 2.3 Å or better, otherwise 3 cycles. There is usually
no need to change this parameter.
-
Damp shifts
- The default is 1.00 for both types of shifts. There is
usually no need to change these parameters.
-
Matrix weight for Xray / Geometry
- The default
is automatic weighting. This proved to work well and there is no
need to change this parameter.
-
Scaling model
- The default is to use simple scaling of the low angle
part of the X-ray data. You can change this to bulk solvent
correction if you are sure that your low angle data below about 8
Å resolution are complete and correct.
-
Scaling B factor
- The default is to use anisotropic B factor for scaling
the X-ray data. You can choose isotropic scaling B factor if your
data are systematically incomplete (e.g. a cone is missing in
reciprocal space).
-
Data with free R label
- This option appears if the free R flag has been
chosen for refinement of the protein part of the model. Here you
can provide a column label for the free R flag.
-
Use of free R reflections
- This option also appears if the free R flag
has been chosen. The scaling and calculation of σA coefficients by
Refmac can be computed on the basis of the free reflections (this
is the default) or using all reflections.
-
Solvent mask correction
- The default is to use solvent mask
correction within Refmac.
- Crystal parameters:
-
Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content
-
are derived automatically from the MTZ file and the total number
of residues in the asymmetric unit. They are displayed for information
only and cannot be changed. However, you may want to check
whether their values conform to your expectations.
-
Resolution
- By default all data present in the MTZ file will be used.
You can check the box and then narrow the range if you are aware
of certain deficiencies of your data.
- Submit a remote job at the Hamburg Cluster:
- Checking this button will activate remote submission. This is
described below in Section 3.1.3.
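Several of the defaults listed above depend on the resolution of the data. The sketch below summarises them as stated in this guide; it is not taken from the ARP/wARP scripts, and the function name arpwarp_defaults is hypothetical. The guide is not fully consistent about data at exactly 2.3 Å for the NCS options, so that boundary is left as an open assumption here (the sketch treats NCS as off at exactly 2.3 Å).

```shell
#!/bin/sh
# Hypothetical summary of resolution-dependent defaults described above.
#   $1 = high-resolution limit of the data in Angstrom
arpwarp_defaults() {
    awk -v d="$1" 'BEGIN {
        ncs  = (d + 0 > 2.3)  ? 1 : 0   # NCS restraints and extension: data worse than 2.3 A
        albe = (d + 0 >= 2.7) ? 1 : 0   # helix/strand search: data at 2.7 A or worse
        cyc  = (d + 0 <= 2.3) ? 1 : 3   # Refmac cycles per run: 1 at 2.3 A or better, else 3
        printf "ncs=%d albe=%d refmac_cycles=%d\n", ncs, albe, cyc
    }'
}

arpwarp_defaults 2.5
```

For data to 2.5 Å the call prints "ncs=1 albe=0 refmac_cycles=3". The fixed defaults (10 building cycles with 5 ARP/REFMAC cycles in between, 50 cycles in total) do not depend on resolution and are therefore not included.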
3.1.2 Command line model building, auto_tracing.sh
The script auto_tracing.sh in the $warpbin directory allows running the
automated model building from the command line without the use of the GUI.
The use of auto_tracing.sh is fairly simple. If invoked without arguments the
script will print help information.
Usage:
auto_tracing.sh \
datafile {mtzfile} \
[residues {number_of_residues_in_AU}] \
[workdir {FULLPATH_WORKING_DIRECTORY}] \
[fp {fp_label}] [sigfp {sigfp_label}] [freelabin {freer_label}] \
[fbest {weighted_amplitude_label}] [phibest {phibest_label}] [fom {fom_label}] \
[modelin {input_PDB_file_to_use_as_initial_model}] \
[seqin {sequence_file_for_one_NCS_copy}] \
[cgr {number_of_NCS_copies (if seqin is provided, default is 1) }] \
[buildingcycles {the_number_of_autobuilding_cycles (default is 10) }] \
[resol {'rmin rmax' (default is the full resolution range) }] \
[albe {1 to_always_invoke_albe (default is 0 for resol < 2.7A, else 1) }] \
[restraints {1 to use conditional restraints, default is 1 }] \
[twin {1 to try de-twinning and twin refinement, default is 0 }] \
[sad {1 to turn on the SAD function refinement, \
needs also 'wavelength' and 'heavyin' on input, default is 0 }] \
[compareto {PDB_file_for_comparison}] \
[parfile {parfilename_if_only_parfile_is_to_be_created}] \
- Optional command line arguments are given in square brackets
- Possible combinations of MTZ labels are:
For start from phases:
fp/sigfp/phibest/fom or fbest/sigfp/phibest to build initial free-atoms model
and fp/sigfp to refine the model
If 'fbest' is given, 'fom' will be ignored
For start from a model:
fp/sigfp to refine the model
- All input files are assumed to be located in working directory
unless they are given with full path
- If workdir is not given, the current directory will be assumed
- All output files will be written into workdir/subdirectory
Additional useful tips:
- Normally the job runs in a subdirectory called YYYYMMDD_HHMMSS
To run the job in the current directory use: auto_tracing.sh jobId '.'
- If you invoke auto_tracing.sh from another script and the keywords with
double-word argument are not properly understood, e.g. resol '20.0 2.5',
try resol 20.0;2.5
- If you have a par file from an earlier version of ARP/wARP and would like to
re-run that job now, use: auto_tracing.sh defaults OLD_PAR_FILE
This will create a par file compatible with the current ARP/wARP version
and the keywords, which are new to OLD_PAR_FILE will take their default values
- NCS-based chain extension and NCS restraints with Refmac are applied
automatically if the resolution of the data is equal to or lower than 2.3 A.
Input 'ncsextension 1/0' to apply / not apply NCS extension regardless of the
resolution of the data. Input 'ncsrestraints 1/0' has similar effect
Required keyword is: datafile (followed by the mtz-file name with the full
path).
Optional keywords include: residues (followed by the number of residues),
workdir (followed by the absolute path to the working directory), fp (followed
by the fp label), sigfp (followed by the sigfp label), freelabin (followed by the
Rfree label), fbest (followed by the label for the fom-weighted structure factor
amplitudes to be used for initial map calculation), phibest (followed by the best
phi label), fom (followed by the figure of merit label), modelin (followed by a
starting pdb-file with the full path), seqin (followed by a sequence-file
name with the full path), cgr (followed by the number of NCS-related
copies), buildingcycles (followed by the number of building cycles),
resol (followed by the resolution limit), albe (followed by the flag to
enable or not helix/strands building), similarly for restraints, twin and
sad. There are additional parameters, which can be customised, and
an experienced user should have no problem in figuring out how to
do this. Alternatively, please contact the ARP/wARP developers for
advice.
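To make the keyword/value convention concrete, the following sketch assembles a command line for a start from experimental phases and prints it instead of running it. Only keywords from the usage text above are used. The file paths and the column labels FP, SIGFP, PHIB and FOM are hypothetical placeholders, and the function name build_tracing_command is invented for this example.

```shell
#!/bin/sh
# Dry-run sketch: print an auto_tracing.sh command line without executing it.
#   $1 = MTZ file (full path), $2 = sequence file (full path),
#   $3 = number of NCS copies
# Column labels below are placeholders; use the labels present in your MTZ file.
build_tracing_command() {
    mtz="$1"; seq="$2"; copies="$3"
    printf '%s' "auto_tracing.sh"
    printf ' %s' datafile "$mtz" \
        fp FP sigfp SIGFP phibest PHIB fom FOM \
        seqin "$seq" cgr "$copies" buildingcycles 10
    printf '\n'
}

build_tracing_command /data/xtal/phases.mtz /data/xtal/protein.pir 2
```

Printing the command first makes it easy to check the labels and paths before the job is actually launched. For a start from a model, phibest and fom would be dropped and modelin added, as described in the usage text.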
If auto_tracing.sh is called with an option parfile, the script will create a
parameter file and a directory in the workdir whose name will be printed. The
job can subsequently be launched by:
% $warpbin/warp_tracing.sh NAME_OF_PARFILE
If auto_tracing.sh is called without an option parfile, it will also launch
the job. The log files and additional output files as well as the building results
can be found in the directory created.
3.1.3 Remote submission of a model building task
This option offers you the following possibilities:
- Your model building will run using external computational facilities,
where the CPU performance may be superior to your local
installation.
- You can be assured that the most recent working executables will be
used, should you have a problem with your local installation.
- Should the task crash, an automatic notification will be forwarded to
the ARP/wARP developers who can then promptly help you (unless
you have declared your task to be confidential, see below).
- If you wish, you can share the results of the completed task with
software developers.
3.1.3.1 Submitting from the GUI
Clicking on the button with “Submit the job for remote execution at the
Hamburg cluster” within the main ARP/wARP Classic GUI panel allows one to
execute an autotracing task remotely. The panel will expand and ask for an
email address to be provided. Please also choose one of the options from
the drop down menu to indicate how you would like your data to be
handled.
The options are:
1. The data must be kept confidential and deleted after the job has
finished.
2. The data can be made available to ARP/wARP, Auto-Rickshaw or
Refmac developers.
3. The data can be archived and made available to any software
developer that requests them (this is default).
Option 2 restricts data sharing to the ARP/wARP, Auto-Rickshaw
and Refmac development teams. Option 3 extends the sharing to anyone who
requests the data. In case of option 1 only the short log file, Wilson/omega log
files and the parameter file will be kept by the ARP/wARP developers, all other
data (input PDB, PIR and MTZ files) as well as log files will be automatically
deleted one week after the job has finished.
Once the job has been submitted for remote execution, the GUI window will
indicate that the job has finished. Please inspect the log file from the pull-down
menu option “View files from job” for further instructions. An email will be sent
to you at the email address that you entered in the GUI window. Please follow
the instructions in the email (http link, login and password) to connect to the
Hamburg cluster. You can then monitor the log file in your browser window. As
soon as the job is finished, you will be provided with a link to the results that you
can then download. Keep in mind that once the job is finished, your data will be
kept for one week only. Make sure that you download your data within that
time.
The remote job submission relies on the curl software installed at your site.
Availability of curl is checked while installing ARP/wARP and a warning is
given if curl is not available.
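If the installation warning was missed, the presence of curl can be checked by hand. The snippet below is a generic shell check and not part of the ARP/wARP installer; the function name have_tool is hypothetical.

```shell
#!/bin/sh
# Generic check whether a command is available on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool curl; then
    echo "curl found: remote submission should be possible"
else
    echo "curl not found: install it before using remote submission"
fi
```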
3.1.3.2 Submitting from a web browser
Navigate your browser to:
or choose model building via the web at:
- View the Disclaimer as well as the ARP/wARP and the CCP4
licensing conditions.
- Proceed with the remote services to Step One.
- Choose the model building protocol (start from experimental phases
or existing model).
- Enter the email address to which instructions on how to view the
results will be sent.
- Provide your MTZ file by using the ‘Browse’ button; the file must have
the extension .mtz.
- Click ‘Proceed to Step Two’.
- Enter starting model (unless you have chosen a protocol to start from
experimental phases).
- Enter the total number of residues and the number of chemically
identical molecules in the asymmetric unit. Please make sure you
enter these two numbers correctly. If, for example, the asymmetric unit
contains a dimer with each subunit having 50 residues, then you enter
100 and 2, respectively.
- Enter MTZ labels. FP and SIGFP are compulsory for model building
starting from the existing model. PHI is additionally needed (and
FOM is optional) for start from experimental phases.
- Click on ‘I agree to cite the required references and would like to
proceed with ARP/wARP remote services’. This uploads the files to
the cluster in Hamburg, launches the job and, after a delay of a few
minutes, sends you an email with instructions for viewing.
- Please follow the instructions in the email (http link, login and
password) to connect to the Hamburg cluster. You can then monitor
the log file in your browser window. As soon as the job is finished, you
will be provided with a link to the results that you can then download.
Keep in mind that once the job is finished, your data will be kept for one week only.
Make sure that you download your data within that time.
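The two numbers requested in Step Two follow directly from the composition of the asymmetric unit. The sketch below reproduces the dimer example given above and lists the MTZ labels needed for each protocol. The function names and the protocol keys 'model' and 'phases' are hypothetical and serve only as an illustration; they are not ARP/wARP keywords.

```python
def asu_numbers(residues_per_molecule, copies):
    """Return (total residues, number of identical molecules) for the AU."""
    if residues_per_molecule <= 0 or copies <= 0:
        raise ValueError("both values must be positive")
    return residues_per_molecule * copies, copies


def required_labels(protocol):
    """Return (compulsory, optional) MTZ labels for a protocol.

    'model'  : start from an existing model
    'phases' : start from experimental phases
    (These two keys are illustrative names only.)
    """
    if protocol == "model":
        return ["FP", "SIGFP"], []
    if protocol == "phases":
        return ["FP", "SIGFP", "PHI"], ["FOM"]
    raise ValueError("unknown protocol: %s" % protocol)


# A dimer with 50 residues per subunit: enter 100 and 2.
print(asu_numbers(50, 2))
print(required_labels("phases"))
```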
3.1.4 Output files, short log file
The following information could be useful when interpreting the log messages
that are produced when running ARP/wARP.
-
Checking the estimated content
- Should the solvent content be too high
or too low (e.g. you have mis-typed the total number of residues
expected in the AU), ARP/wARP will re-set it to approximately 50%.
The target number of residues will be reset accordingly.
-
Checking the provided sequence file
- Should the sequence length, the
number of molecules in the AU and the total number of residues in
the AU not match each other, the number of molecules in the AU will
be reset accordingly. Should the sequence file not be interpretable (e.g.
contain unexpected characters), an error message will be given.
-
Input MTZ file
- We have observed that MTZ files sometimes do not
have proper headers, e.g. a non-standard space group name or a zero
space group number. ARP/wARP therefore always runs the CAD program
to fix the header, so the MTZ file may have the extension .mtz.cad.
-
Space group number
- ARP/wARP version 7.2 supports all standard
non-centrosymmetric space groups, P1bar and several non-standard
space groups (e.g. 1017 or 2017). The space group is figured out solely
from the symmetry operators stored in the MTZ file header.
-
Input files
- The ASCII files (sequence, input PDB or input file with heavy
atoms) are always converted to Unix line endings (line feed), so they have the
extension _lf.
-
Checking whether input PDB contains ligands
- This check is performed if
an initial model is provided. Should the model contain ligands
unknown to the Refmac library, they are renamed to free DUM
atoms. This should not affect the model building performance, but a
warning is printed.
-
R factor after Refmac before model building
- If the initial model
is available, a number of restrained refinement cycles with Refmac is
carried out until R factor convergence.
-
Building cycle zero
- Normally one should expect a considerable part of the
structure built already at the starting building cycle. If this is not
the case, observe the situation for a few further building cycles. If,
however, there is essentially nothing autotraced for further building
cycles, please inspect whether the initial phases are sufficiently good.
-
Search for helices and strands
- The module for building helical and
beta-stranded fragments is invoked if requested or by default with
data at 2.7 Å resolution or lower. The number of built helical/stranded
residues and chain fragments is printed.
-
Rounds within building cycle
- Each cycle of the main chain tracing is
carried out in several rounds. Normally each successive round should
result in more residues and in fewer fragments. The maximum length
of the traced fragment and the score of the model building are also
printed for information.
-
Chains, residues and estimated correctness of the model
- The
output from the best tracing round is processed further. Fragments
of 4 residues or shorter are converted to free atoms. In addition,
the terminal residues of the fragments are removed. The rest is kept
and used to provide restraints for subsequent ARP/REFMAC cycles.
The value of the estimated correctness of the model should steadily
approach 100% if the tracing is successful.
-
Residues docked into sequence
- If the
sequence is provided, the autotraced fragments are docked into it and
the side chains are built and refined in real space. The results of this
are printed out. If the sequence is not provided, side chain guesses
only (GLY/ALA/SER/VAL) are built and refined.
-
Loop building
- This is invoked if the sequence is available and if the tracing
score is above 0.85. It is also invoked after the last building cycle.
-
R factor after Refmac during the iterations
- The value of the R factor
typically oscillates. It goes up after each tracing cycle (because
the model is entirely rebuilt) and then decreases during the
ARP/REFMAC refinement and update cycles. At the end of the
procedure it should reach a value typical for a restrained refinement.
-
Sequence coverage
- If the sequence is provided, the ratio of the number of
docked residues to the total number of traced residues is printed. A
value higher than 0.8 is deemed good convergence. All free atoms
are then removed from the file and the task is directed into a few
cycles of restrained refinement with solvent search. If, however, the
value of sequence coverage is lower than 0.8, the free atoms (DUM)
are left in the file. You can inspect the density maps, start changing
the model on the graphics or, alternatively, submit another model
building task using the output of this job.
-
Job termination
- The statement Task completed successfully indicates that the
job is finished with no error. An error statement:
QUITTING ... ARP/wARP module stopped with an error message: name_of_the_program
indicates that one of the modules of the task has terminated with an error
message. Please refer to the specified log file.
-
CPU requirements
- Automated protein model building may be time consuming.
Using a standard protocol of 10 building cycles interspaced with 5
ARP/REFMAC cycles, one should expect a job for a structure of 500
residues to be completed within about 1 hour (subject to the power of the
computer you are using).
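The sequence coverage decision described above can be written down as a short sketch. This is not code from ARP/wARP; the function names are hypothetical, and the behaviour at a coverage of exactly 0.8 is an assumption because the text only specifies values higher or lower than 0.8.

```python
def sequence_coverage(docked_residues, traced_residues):
    """Ratio of docked residues to the total number of traced residues."""
    if traced_residues <= 0:
        raise ValueError("no residues were traced")
    return docked_residues / traced_residues


def next_step(coverage, threshold=0.8):
    """Describe what happens to the free atoms for a given coverage.

    Assumption: a coverage exactly equal to the threshold is treated
    as not converged.
    """
    if coverage > threshold:
        return "free atoms removed; restrained refinement with solvent search"
    return "free atoms (DUM) kept; inspect maps or resubmit with this output"


coverage = sequence_coverage(docked_residues=450, traced_residues=480)
print(round(coverage, 2), "->", next_step(coverage))
```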
3.1.5 Running protein model building from the GUI, ARP/wARP Expert
System
This protocol has not changed since version 7.1, except for changes in the
underlying model building programs. The Expert System has the same aims as
ARP/wARP Classic: to automatically build protein structures, starting either
from molecular replacement models or experimental electron density
maps.
The main difference is that, when a model is ‘more or less
complete’, this module uses the typically available second CPU core to start a new job that
cleans up the model, adds waters, refines it and makes it available to the
user. In parallel, the old job continues to see whether it can find a better
solution with more residues, but the user does not need to wait for it to
finish.
Another difference concerns the sequence file. If you have hetero-multimers
in the asymmetric unit of your crystals, you should add each sequence
separately, by clicking the Add Input PIR file button. Then, you can define any
stoichiometry for complicated hetero-multimers. For each defined sequence the
user can select from a pull-down menu the number of copies in the asymmetric
unit. Based on that and the contents of the PIR file the contents of the AU in
residues will be calculated automatically.
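The automatic calculation of the AU content amounts to summing, over all defined sequences, the sequence length multiplied by the number of copies. A minimal sketch under that assumption is given below; the function name asu_content and the example sequences are hypothetical, and the sequences are passed as plain strings rather than read from PIR files.

```python
def asu_content(components):
    """Total number of residues in the AU for (sequence, copies) pairs."""
    total = 0
    for sequence, copies in components:
        if copies < 1:
            raise ValueError("each sequence needs at least one copy")
        # Count letters only, so whitespace and a PIR-style
        # terminating '*' are ignored.
        residues = [c for c in sequence if c.isalpha()]
        total += len(residues) * copies
    return total


# Hypothetical hetero-multimer: two copies of chain A, one of chain B.
chain_a = "MKTAYIAKQR"   # 10 residues
chain_b = "GSHMLE*"      # 6 residues
print(asu_content([(chain_a, 2), (chain_b, 1)]))  # 26
```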
The input files are identical to those with the ARP/wARP Classic
module.
There is a dedicated option to select that the Methionines are Se-Met residues
if the dataset comes from a SAD or MAD experiment on a selenium edge; the
SAD and TWIN functions are also implemented.
The numbers of refinement and building cycles are not fixed, but are decided
on the fly based on the program's progress. The Decision parameters
define these limits. If you leave the mouse over one of the input fields, a help
text will appear explaining the use of each decision parameter.
The parameter maximum number of processes in parallel is important. When
the Expert System decides that it has reached a more-or-less useful model, it will
spawn a ‘cleaning up and completion’ process. However, it will continue the
iterative building in parallel. If the iterative building results in a better model, a
new ‘cleaning up and completion’ process will be requested, possibly before the
previous ‘cleaning up and completion’ process has finished. If you have only two
processors (typical these days in dual-core systems), the new process
will be ‘queued’; when the previous one has finished, the new one will
start.
3.1.6 Running flex-wARP from command line
To get on-line help, please type:
python $pywarpbin/CAutoPyWARP.pyc --help
3.2 Automated Construction of Helical and Beta-Stranded Fragments
3.2.1 Building secondary structure from the GUI, ARP/wARP Quick
Fold
The procedure for building secondary structural elements is based on the use
of discriminant analysis in a successive filtering scheme taking into account
the geometry of alpha-helical and beta-stranded main-chain fragments.
The electron density map is first analysed and a suitable threshold is
automatically selected. In the next step stereochemical information on the
helix and strand geometry is used; sets of overlapping fragments are
constructed and filtered based on their geometric likelihood. All fragments that
overlap at a particular location of a helix or a strand undergo an ensemble
averaging process to provide the best estimate of CA positions. The output
fragments are then regularised and the chain direction is chosen on the
basis of their fit to the density. Finally the fragments are refined in real
space.
The accuracy of the resulting model depends on many parameters. The
module should be able to build helices and strands at resolutions as low as 4.5 Å.
However, it may not result in complete helical/stranded structure and it may
also contain parts that are mis-interpreted. The expected top performance is the
correct location of 90% of the helices and 50% of the strands. The procedure is
relatively fast and takes only a few minutes for proteins of moderate size (up to
500 residues).
The secondary structure recognition module is optimised to address lower
resolution data and hard cases where, e.g. the full model building protocol has
not been successful. For a resolution higher than 2.6 Å the module will
automatically trim the resolution and Wilson B-factor of the data to approach its
design conditions.
- Launch ARP/wARP Quick Fold window within the CCP4i GUI.
- Provide required input:
-
MTZ in
- X-ray data in the MTZ format containing structure factor
amplitudes and their standard deviations, phases and figures of merit.
-
Fobs Sigma Phib FOM
- If the MTZ column labels for structure factor
amplitudes, their standard deviations, phases and figures of
merit have obvious names, they will be recognised automatically.
Otherwise please use the scrolling button, navigate to List All
Labels and choose appropriate ones.
-
Output PDB file
- Provide the PDB file name where the constructed
secondary structure fragments will be output to.
- Set parameters:
-
Number of residues
- Provide the expected number of residues in the
asymmetric unit. This is optional but, if given, should be a good
guess within ± 20% of the true number.
-
Do NOT build beta-strands
- If you have real doubts about your
structure having a fold with a significant content of beta-strands,
you can deactivate their construction by checking the box.
- Now you are ready to start the job: Click on Run and choose Run
now.
There are a number of additional parameters that you normally should not
worry about. A brief description is given below:
- Crystal parameters:
-
Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content
-
are derived automatically from the MTZ file and the total number
of residues in the asymmetric unit. They are displayed for information
only and cannot be changed. However, you may want to check
whether their values conform to your expectations.
-
Resolution
- By default all data present in the MTZ file will be used.
You can check the box and then narrow the range if you are aware
of certain deficiencies of your data.
- Coordinate comparison:
-
Compare with an already deposited protein for validation or testing
-
If you have the final model and would like to check the installation
and the performance of the software, you can check this box.
You will then have to provide a PDB file that will be used for
comparison.
3.2.1.1 Output files, short log file
The following information could be useful when interpreting the log messages
that are produced when running Quick Fold.
-
Checking the estimated content
- Should the solvent content be too high
or too low (e.g. you have mis-typed the total number of residues
expected in the AU), ARP/wARP will re-set it to approximately 50%.
The target number of residues will be reset accordingly.
-
Residues and chain fragments
- The important numbers are highlighted in
red/bold in the short log file, indicating the number of residues and
the number of fragments into which these residues are arranged. The
higher the values of the Connectivity index and the Tracing score, the
more complete and reliable the resulting model is. The length of the
longest chain is also printed.
-
Further extension of the model
- You may try to feed the PDB output of the
module into Classic or flex-wARP. However, subject to the resolution
of the data, this may not provide enough seed for subsequent
automatic tracing of the full chain.
-
Job termination
- The statement Task completed successfully indicates that the
job has finished with no error. An error statement:
QUITTING ... ARP/wARP module stopped with an error message: name_of_the_program
indicates that one of the modules of the task has terminated with an error
message. Please refer to the specified log file.
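When many jobs are run, the two termination statements can be searched for automatically. The sketch below classifies the text of a short log file. It assumes that the name of the failing program follows the final colon on the same line as the error statement, as in the example above; the function name job_status is hypothetical.

```python
SUCCESS = "Task completed successfully"
FAILURE = "ARP/wARP module stopped with an error message"


def job_status(log_text):
    """Classify a short log as 'failed', 'finished' or 'unknown'."""
    for line in log_text.splitlines():
        if FAILURE in line:
            # Assumption: the failing program name follows the final colon.
            program = line.rsplit(":", 1)[-1].strip()
            return "failed", program
    if SUCCESS in log_text:
        return "finished", None
    return "unknown", None


example = ("QUITTING ... ARP/wARP module stopped with an error message: "
           "name_of_the_program")
print(job_status(example))
```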
3.2.2 Building secondary structure from the command line, auto_albe.sh
The script auto_albe.sh (where albe stands for alpha-beta) in the $warpbin
directory allows you to run the secondary structure building as a single-line
command without the use of the GUI. The use of auto_albe.sh is fairly
simple. The script prints out help information if it is invoked without
arguments.
Usage:
$warpbin/auto_albe.sh \
datafile {mtzfile} \
[residues {number_of_residues_in_AU}] \
[workdir {FULLPATH_WORKING_DIRECTORY}] \
[helixfileout {output_PDB_file}] \
[jobId {desired_job_id_used_for_subdirectory_naming}] \
[fp {fp label} sigfp {sigfp label} phib {phi label}] \
[fom {fom label}] (input ’fom none’ if no fom is to be used) \
[compareto {PDB_file_for_comparison}] \
[nostrands {0 or 1, default=0}] \
[parfile {parfilename_if_only_parfile_is_to_be_created}]
- Optional command line arguments are given in square parentheses
- All input files are assumed to be located in working directory
unless they are given with full path
- If workdir is not given, the current directory will be assumed
- All output files will be written into workdir/subdirectory
Required keyword is: datafile (followed by the mtz-file name with the full
path).
Optional keywords include: residues (the expected number of residues in the
asymmetric unit), workdir (followed by the full path to the working directory),
helixfileout (the name of the PDB file to which both the helical and
stranded fragments will be written), jobId (if you wish the working
sub-directory to have a particular name), fp (followed by the fp label), sigfp
(followed by the sigfp label), phib (followed by the phibest label) and fom (followed
by the fom label). The defaults are FP, SIGFP, PHI and FOM, respectively.
Alternatively, if the mtz file contains only one column for structure factor
amplitudes and only one column for their standard deviations, these
will be taken. If you wish FOM not to be used, please input fom none.
For test purposes, the constructed helices/strands can be compared to
known reference models (hand- or pre-fitted). The required keyword is
compareto (followed by the full-path name of a PDB file). You can also
enable/disable the construction of strands using the keyword nostrands, the
default is 0 (build the strands). If auto_albe.sh is called with an option
parfile, the script will create a parameter file and a directory in the
workdir whose name will be printed. The job can subsequently be launched
by:
% $warpbin/warp_albe.sh NAME_OF_PARFILE
If auto_albe.sh is called without an option parfile, it will also launch the job.
The log files and additional output files as well as the building results can be
found in the directory created.
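If auto_albe.sh is to be called from a script, the keyword/value pairs listed in the usage above can be assembled programmatically. The following is a sketch only: the helper albe_command is hypothetical, it uses only the keywords documented above, and it assumes that the environment variable warpbin is set unless a directory is passed explicitly.

```python
import os


def albe_command(datafile, warpbin=None, **options):
    """Assemble the argument list for auto_albe.sh (keyword/value pairs)."""
    allowed = ["residues", "workdir", "helixfileout", "jobId", "fp",
               "sigfp", "phib", "fom", "compareto", "nostrands", "parfile"]
    if warpbin is None:
        warpbin = os.environ.get("warpbin", "")
    script = (warpbin.rstrip("/") + "/auto_albe.sh") if warpbin else "auto_albe.sh"
    command = [script, "datafile", datafile]
    for key in allowed:
        if key in options:
            command += [key, str(options.pop(key))]
    if options:
        # Anything left over is not a documented keyword.
        raise ValueError("unknown keyword(s): %s" % ", ".join(sorted(options)))
    return command


print(albe_command("/data/test.mtz", warpbin="/opt/warp/bin",
                   residues=250, fom="none", nostrands=1))
```

The resulting list can be passed to subprocess.call to launch the job.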
3.3 Automated Loop Building
3.3.1 Running loop building from the GUI, ARP/wARP Loops
This module tries to find likely loops to connect fragments of a partial protein
structure based on the sequence and the density map. It builds the loops in three
phases. First, a tree of possible CAs between the fragments is built; next, the
unlikely ones are removed and the rest of the main chain atoms are determined; and
finally, the best loops are selected. The tree can be built towards either the
C-terminus or the N-terminus of the protein, or both. The built loops are ordered
(in descending order) according to the density correlation at the main chain
atoms (including CB if present), the correlation of the side chains, or a
combination of both. If the number of loops exceeds the chosen number, only the
best are saved to file.
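The final ordering and truncation step can be pictured as follows. This is an illustrative sketch, not the module's code: the dictionary keys 'main' and 'side', the function name and, in particular, the weighted sum used for the combined score are assumptions, since the way the two correlations are combined is not specified here.

```python
def select_best_loops(loops, max_loops, mode="main", weight=0.5):
    """Sort loops by density correlation (descending) and keep the best.

    loops  : list of dicts with 'main' and 'side' correlation values
    mode   : 'main', 'side' or 'both'
    weight : share given to 'main' when mode is 'both' (an assumption)
    """
    def score(loop):
        if mode == "main":
            return loop["main"]
        if mode == "side":
            return loop["side"]
        if mode == "both":
            return weight * loop["main"] + (1.0 - weight) * loop["side"]
        raise ValueError("unknown mode: %s" % mode)

    ranked = sorted(loops, key=score, reverse=True)
    return ranked[:max_loops]


candidates = [
    {"name": "a", "main": 0.6, "side": 0.9},
    {"name": "b", "main": 0.8, "side": 0.2},
    {"name": "c", "main": 0.7, "side": 0.7},
]
print([loop["name"] for loop in select_best_loops(candidates, 2)])
```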
- Launch the ARP/wARP Loops window within the CCP4i GUI
- Provide required input:
-
Building loops
- Select whether to start from a map or an mtz file.
-
Mode loop building
- Select whether to try to build all loops in the
PDB file (a sequence file will be needed) or to build a specific
loop
-
MTZ in
- X-ray data in the MTZ format containing structure factor
amplitudes and their standard deviations.
-
Fmap PHImap
- If the MTZ column labels for structure factor
amplitudes and their standard deviations have obvious names,
they will be recognised automatically. Otherwise please use
the scrolling button, navigate to List All Labels and choose
appropriate ones.
-
Protein model for loop building
-
Provide the PDB file with coordinates of the protein. Note that
the module will only attempt to build missing loops and will not
rebuild any of the existing residues.
-
New loops output file
- Provide the name of the PDB file where the
built loops will be written to.
-
Protein and new loops combined output
- Provide the name of the
PDB file where the protein model together with the built loops
will be written to.
- Click on Run and choose Run now
There are a number of options that can be added. A brief description is given
below.
- Definition of loop:
-
Build a loop
- Provide anchor residues of a fragment on the N and
the C terminus side of the protein. If you want to rebuild some
terminal residues, you need to remove them from the input PDB
file. Provide the length of the loop including the two anchor
points.
- Selecting best loops:
-
Deviation distance loop connection
- Set the allowed error in the
CA-CA distance.
-
CA density correlation threshold
- This number sets the number of
best loops kept based on the density correlation of the CAs only.
-
Structural threshold
- Set the threshold for the minimal value of the
log likelihood of the structure. Set the minimum value if you
want to ensure that at least a certain number of loops are kept after
pruning. Set the maximum value if you want to ensure that the
number of loops does not exceed a certain amount after structural
pruning.
-
Main chain density correlation
- This parameter sets the number of
best loops kept.
- Selecting best CAs:
-
Likelihood threshold
- This is the threshold for a CA to represent the
fifth CA of a penta-peptide, based on density correlation, CA-CA
distance and structure.
-
Minimum distance CAs
- Measures the minimal distance between
CAs from the same shell. The CA with the best likelihood is kept.
- Generating CAs:
-
Select generation CA shell
- By default a shell with a uniform and
regular distribution of CAs at exactly CA-CA distance is
generated. You can also choose a uniform and random
distribution of the CAs. In that case the shell is generated with a
given thickness.
-
Number of CAs
- Number of CAs generated within a shell.
-
CA-CA distance
- Distance to use between successive CAs.
-
Keep CAs with negative density halfway
- Default for this option is
not to keep the atoms.
- Crystal parameters:
-
Space group and Cell
- are derived automatically from the MTZ and
the PDB files, displayed for information only and cannot be
changed. However, you may want to check whether their values
conform to your expectations.
- Log files of Loopy:
-
Message level
- Choose a value between 0 and 9, the default is 4
-
Abort level
- If a message at this level is encountered, the module will
abort. The default value is 8.
-
Message file
- Name for the message file (plain text).
-
XML output file
- Name for the XML message file (xml format).
3.4 Automated Building of Poly-Nucleotides
3.4.1 Running nucleotide building from the GUI, ARP/wARP DNA/RNA
This module builds fragments of DNA or RNA. The input is an MTZ file
containing the phases from which the map best describing the nucleotide region
can be computed. Thus the map could be a difference map (e.g. after the protein
model is completed) or a sigma-weighted map for the whole asymmetric
unit. The nucleotide building procedure within ARP/wARP Version 7.2
proceeds in several steps: first it locates putative phosphates in the density
map, then uses them in a manner analogous to the CA-candidates for
protein chain tracing. After the nucleotide fragments are obtained, a likely
base is built and refined in real space. The type of the base is currently
limited to A (large) or C (small) and the nucleotide sequence is not yet
used.
The produced poly-nucleotides are quite accurate; a typical rmsd for the built
backbone atoms is 0.6 Å with X-ray data extending to around 3.0 Å resolution.
The method is not sensitive to a particular DNA or RNA conformation. The
module is not very CPU efficient and may take about 10 minutes for a
20-nucleotide structure.
- Launch the ARP/wARP DNA/RNA window within the CCP4i GUI
- Provide required input:
-
MTZ in
- X-ray data in the MTZ format containing structure factor
amplitudes and their standard deviations.
-
Fobs Sigma PHIB FOM
- If the MTZ column labels for structure factor
amplitudes and their standard deviations have obvious names,
they will be recognised automatically. Otherwise please use
the scrolling button, navigate to List All Labels and choose
appropriate ones. FOM is optional and could be omitted if Fobs
are already FOM-weighted.
-
Output PDB file
- Provide the PDB file name where the constructed
polynucleotide fragments will be output to.
- Click on Run and choose Run now
There are a number of options that can be added. A brief description is given
below.
- Crystal parameters:
-
Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content
-
are derived automatically from the MTZ file and the total number
of residues in the asymmetric unit. They are displayed for information
only and cannot be changed. However, you may want to check
whether their values conform to your expectations. Obviously, if
you entered zeros as the expected number of residues and nucleotides,
the solvent content will be displayed as 1.0 but you should not
worry about this.
-
Resolution
- By default all reflections present in the MTZ file will be
used. You can check the box (Use reflections between) and then
narrow the range if you are aware of certain deficiencies of your
data.
3.4.1.1 Output files, short Log File
The following information could be useful when interpreting the log messages
that are produced when building DNA/RNA.
-
Checking the estimated content
- Should the solvent content be too high
or too low (e.g. you have mis-typed the total number of residues
expected in the AU), ARP/wARP will re-set it to approximately 50%.
The target number of residues will be reset accordingly.
-
Phosphate candidates
- The identified number of phosphate candidates is
typically 100 times higher than the number of nucleotides in the
structure.
-
Nucleotides and chain fragments
- The important numbers are highlighted
in red/bold in the short log file, indicating the number of nucleotides
and the number of fragments into which these residues are arranged.
The length of the longest chain is also printed.
-
Job termination
- The statement Task completed successfully indicates that the
job has finished with no error. An error statement
QUITTING ... ARP/wARP module stopped with an error message: name_of_the_program
indicates that one of the modules of the task has terminated with an error
message. Please refer to the specified log file.
3.4.2 Running nucleotide building from the command line, auto_nuce.sh
The script auto_nuce.sh in the $warpbin directory allows you to run the
nucleotide building as a single-line command without the use of the
GUI. The use of auto_nuce.sh is fairly simple. The script prints out help
information if it is invoked without arguments.
Usage:
$warpbin/auto_nuce.sh \
datafile {mtzfile} \
[residues {number_of_protein_residues_in_AU}] \
[nucleotides {number_of_nucleotides_in_AU}] \
[workdir {FULLPATH_WORKING_DIRECTORY}] \
[fp {fp_label}] [sigfp {sigfp_label}] [fbest {weighted_amplitude_label}] \
[phib {phib_label}] [fom {fom_label}] \
[resol {’rmin rmax’ (default is the full resolution range) }] \
[compareto {PDB_file_for_comparison}] \
[parfile {parfilename_if_only_parfile_is_to_be_created}] \
- Optional command line arguments are given in square parentheses
- Possible combinations of MTZ labels for map calculation are:
fp/sigfp/phib/fom or
fbest/sigfp/phib if fbest is already fom-weighted.
- In the latter case if ’fbest’ is given, ’fom’ will be ignored
- All input files are assumed to be located in working directory
unless they are given with full path
- If workdir is not given, the current directory will be assumed
- All output files will be written into workdir/subdirectory
Required keyword is: datafile (followed by the mtz-file name with the full
path). Unlike the functionality offered from the CCP4 GUI, datafile can
also be a density map.
Optional keywords include: residues (the expected number of residues in the
asymmetric unit), nucleotides (the expected number of nucleotides in the
asymmetric unit), workdir (followed by the full path to the working directory),
fp (followed by the fp label), sigfp (followed by the sigfp label), phib (followed
by the phibest label) and fom (followed by the fom label). The defaults are FP,
SIGFP, PHI and FOM, respectively. Alternatively, if the mtz file contains only one
column for structure factor amplitudes and only one column for their standard
deviations, these will be taken. If you wish FOM not to be used, please supply fbest.
You can set resol (followed by the resolution limits). For test purposes, the
constructed model can be compared to a known reference model. The
required keyword is compareto (followed by the full-path name of a PDB
file).
If auto_nuce.sh is called with an option ‘parfile’, the script will create a
parameter file and a directory in the workdir whose name will be printed. The
job can subsequently be launched by:
% $warpbin/warp_nuce.sh NAME_OF_PARFILE
If auto_nuce.sh is called without an option ‘parfile’, it will also launch the job.
The log files and additional output files as well as the building results can be
found in the directory created.
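The two permitted label combinations (fp/sigfp/phib/fom, or fbest/sigfp/phib with fom ignored) can be encoded in a small helper when the command line is built from a script. This sketch is not part of ARP/wARP; the function name nuce_label_args is hypothetical, and dropping fp whenever fbest is supplied is an assumption based on the combinations listed in the usage.

```python
def nuce_label_args(fp=None, sigfp=None, phib=None, fom=None, fbest=None):
    """Return label keyword/value pairs for the auto_nuce.sh command line."""
    labels = [("sigfp", sigfp), ("phib", phib)]
    if fbest is not None:
        # fbest/sigfp/phib: fbest is already FOM-weighted, so fom is ignored.
        # Assumption: fp is not needed in this combination either.
        labels.insert(0, ("fbest", fbest))
    else:
        # fp/sigfp/phib/fom
        labels.insert(0, ("fp", fp))
        labels.append(("fom", fom))
    args = []
    for keyword, value in labels:
        if value is not None:  # omitted labels fall back to the script defaults
            args += [keyword, value]
    return args


print(nuce_label_args(fbest="FWT", sigfp="SIGFP", phib="PHWT", fom="FOM"))
```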
3.5 Automated Ligand Building
3.5.1 Running ligand building from the GUI, ARP/wARP Ligands
The ligand building procedure within ARP/wARP Version 7.2 proceeds
in three steps: first it locates the binding site in the difference density
map, then builds there a number of putative ligand models and, finally,
selects the best model, which is geometrised and fitted in real space into the
density.
The binding region is selected automatically by matching the ligand's
shape-related properties to the regions of high density. The chosen region is
parameterised by a sparse set of putative positions (grid nodes) for the ligand
atoms. For the construction of the ligand into this sparse set two algorithms
are used. One exploits the combinatorial assignment of the ligand atom
identities to the grid nodes, ‘label swap’. Another algorithm maximises the
overlap between the sparse set and the ligand model by a random search in
conformational space. The output from both algorithms is merged and then
undergoes a last stage of real-space refinement before the final model is
selected.
The accuracy of ligand building is mainly dependent on ligand size and the
resolution of the X-ray data. As a rough guide, about 75% of well-ordered
ligands of a size around 20 to 40 non-hydrogen atoms should be built within
r.m.s.d. of 1.0 Å from their correct location. Thus the constructed models should
be accurate enough for REFMAC5 to straightforwardly refine the protein-ligand
complex. The procedure can be iterated to locate additional ligands, if any are
present.
The ARP/wARP ligand building module requires the X-ray data (in MTZ
format), the built protein without ligands (in PDB format) and a template model
of the ligand to build (in PDB format). Options include the possibility to specify
the binding site and the number of starting grids, the ability to compare the run
result to some reference ligand(s), and the possibility to build a ligand taken from
a list of candidates (‘cocktail’). In the latter case the coordinates of the ligand
candidates should be concatenated into a single PDB file. The different ligands
must be distinguished by their residue name (columns 18-20), chain identifier
(column 22) or residue sequence number (columns 23-26). ARP/wARP will
automatically choose the best-matching ligand candidate and will attempt to
build it at the binding site, either determined automatically or supplied
by the user. However, since this feature is new, the specification of the
binding site (see below) is recommended. One can also specify that only
well-resolved parts of a partially occupied ligand are built and this can be done
automatically.
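Before submitting a ‘cocktail’ file it can be useful to confirm that the candidates are indeed distinguishable by the fields named above. The sketch below groups the ATOM/HETATM records of a concatenated PDB file by residue name, chain identifier and residue sequence number, read from the fixed PDB columns. It is an illustration only; the function name split_cocktail is hypothetical and the grouping is not necessarily identical to what ARP/wARP does internally.

```python
from collections import OrderedDict


def split_cocktail(pdb_lines):
    """Group ATOM/HETATM records into ligand candidates.

    The key is (residue name, chain identifier, residue number), taken
    from PDB columns 18-20, 22 and 23-26 (counted from 1).
    """
    ligands = OrderedDict()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (line[17:20].strip(), line[21:22], line[22:26].strip())
        ligands.setdefault(key, []).append(line.rstrip("\n"))
    return ligands


def summarise(pdb_lines):
    """Print one line per ligand candidate with its number of atoms."""
    for key, atoms in split_cocktail(pdb_lines).items():
        print(key, len(atoms), "atoms")
```

Calling summarise on the lines of the cocktail file should list each candidate exactly once; a candidate that appears merged with another has no distinguishing field.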
- Launch the ARP/wARP Ligands window within the CCP4i GUI
- Provide required input:
-
MTZ in
- X-ray data in the MTZ format containing structure factor
amplitudes and their standard deviations.
-
Fobs Sigma
- If the MTZ column labels for structure factor amplitudes
and their standard deviations have obvious names, they will
be recognised automatically. Otherwise please use the scrolling
button, navigate to List All Labels and choose appropriate ones.
-
Protein model without ligand
- Provide the PDB file with coordinates
of the protein only. If the file contains solvent atoms, free
atoms or fragments of other ligands, please make sure that their
location is not overlapping with the supposed location of the
ligand or have them removed prior to running ligand building.
-
Ligand molecule coordinates
- Stereochemical information about the
ligand to be built is read in the form of a PDB file. This file
should contain the ligand molecule only. The molecule can be in
any conformation. However, the interatomic distances, bond
angles and chirality (if present) should correspond sensibly
to the target stereochemistry of the ligand to be
built. Please also check that there is atom-bonded connectivity
throughout the whole target ligand molecule (i.e. you do not
accidentally have several unconnected clusters of atoms) and
that there are no atoms that are too close to each other (distance
< 0.6 Å).
- Click on Run and choose Run now.
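The check for atoms that lie too close to each other in the ligand template (distance < 0.6 Å) is easy to perform beforehand. The following sketch reads coordinates from the fixed PDB columns and lists offending pairs. It is an illustration under the assumption of a standard PDB layout; the function names are hypothetical and connectivity is not checked.

```python
import math


def read_atoms(pdb_lines):
    """Return (atom name, x, y, z) for each ATOM/HETATM record."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed columns: name 13-16, x 31-38, y 39-46, z 47-54
            atoms.append((line[12:16].strip(), float(line[30:38]),
                          float(line[38:46]), float(line[46:54])))
    return atoms


def too_close_pairs(atoms, min_distance=0.6):
    """List atom pairs separated by less than min_distance (in Angstrom)."""
    pairs = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(atoms[i][1:], atoms[j][1:])))
            if d < min_distance:
                pairs.append((atoms[i][0], atoms[j][0], d))
    return pairs
```

An empty list returned by too_close_pairs(read_atoms(lines)) indicates that no pair of atoms in the template violates the 0.6 Å limit.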
There are a number of options that can be added either in the main
GUI panel (scrolling bar Build the ligand) or under the Parameters section.
You normally should not need to worry about these (unless you want
the ligand to be built around a known location or you would like to
screen a list of candidates, ‘ligand cocktail’). A brief description is given
below.
- Optional parameters:
-
Build the ligand
- (Binding site location)
-
In the most likely place of the complete asymmetric unit
-
(default)
-
around the same approximate place as a previous ligand
-
The binding site is defined by the position of a compound
known to bind at the desired location. If you use this option,
the region is required in the form of a PDB file (previous ligand
coordinates).
-
around an approximate XYZ position
- The
binding site is defined by (X, Y, Z) Cartesian coordinates and
a search radius in (option Search for the ligand around).
-
Refmac5
- By default the fast protocol is used (1 cycle of refinement). If your
PDB file needs considerable pre-refinement with Refmac before the
difference electron density map can be computed, you can choose the
slow protocol (3 cycles of refinement).
-
Free R Flag
- The default is not to use R-free for Refmac refinement. You can
choose to use R-free; this will cause additional options to appear within
the section Refmac parameters.
-
Ligand building cycles
- defines the number of grid parameterisations of
the binding region. The default value is 2. There is one run of each
ligand building algorithm for each starting grid, therefore the CPU
time required for building is proportional to this number of
cycles.
-
Assume partial occupancy of ligand
- Check this box if you wish to model
a partially occupied ligand.
- Refmac parameters:
-
Cycles of refinement for Refmac run
- Refmac is invoked to refine
your protein part of the structure before the difference density
map is computed. The default is 1 cycle for the fast protocol and
3 cycles for the slow protocol, see above.
-
Matrix weight for Xray / Geometry
- The default is automatic
weighting and there is no need to change this parameter.
-
Input a user-defined library file
- If your input protein is already
a protein-ligand complex, Refmac will have to refine both
entities together in order to obtain a difference electron density
map. If you already have a Refmac-style cif library for the
ligand that is already present, you can input it here. Otherwise, Refmac
will use its own library if it knows the ligand. If it does not, it will
generate a cif file for the ligand and proceed.
- Crystal parameters:
-
Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content
-
are derived automatically from the MTZ file and the total number
of residues in the asymmetric unit. They are displayed for information
only and cannot be changed. However, you may want to check
whether their values conform to your expectations.
-
Resolution
- By default all reflections present in the MTZ file will be
used. You can check the box (Use reflections between) and then
narrow the range if you are aware of certain deficiencies of your
data.
- Test and comparison parameters:
-
Compare with an already fitted ligand
- If you have the final model of
the ligand in the correct orientation and would like to check the
installation and the performance of the software, you can check
this box. You will then have to provide a PDB file that will be
used for comparison.
3.5.1.1 Output files, short Log File
The following information could be useful when interpreting the log messages
that are produced when building ligands.
-
Refinement with refmac
- The R factor (and R free if requested) are printed
after refinement of the protein part only with Refmac. Check that
the value of the R factor is reasonable. A value higher than about
30% may indicate that the computed difference map may be too
noisy for location of the ligand. A failure may indicate invalid atom
nomenclature in your PDB file.
-
The ligandbuild program
- The mapping of the difference density synthesis
parameterised with grid points onto the ligand atoms (ligandbuild
and M_ligandbuild) is run as many times as defined by the number of
ligand building cycles. A failure may indicate incorrect identification
of the binding site. This can be amended by defining the binding site
manually prior to the run (see above).
-
Real space fit
- Up to 108 top constructed ligand models undergo a
real-space refinement with respect to the difference density map. The
best solution is output. If the test and comparison option is selected,
the r.m.s.d. to the reference PDB file (XYZREF) is also printed. There
will be a warning given if the stereochemistry of the constructed
ligand is poor. A warning will also be given if the constructed ligand
molecule has severe steric clashes, which may be a sign of incorrect
ligand building. You may want to inspect the ligand and the density
and, if a part of the ligand is clearly disordered, try
removing it from the ligand target PDB file and re-running the job.
-
Job termination
- The statement Task completed successfully indicates that the
job has finished with no error. An error statement:
QUITTING ... ARP/wARP module stopped with an error message: name_of_the_program
indicates that one of the modules of the task has terminated with an error
message. Please refer to the specified log file.
3.5.2 Running ligand building from the command line, auto_ligand.sh
The script auto_ligand.sh in the $warpbin directory allows you to run the
ligand building as a single-line command without the use of the GUI. The use of
auto_ligand.sh is fairly simple. The script prints out help information if it is
invoked without arguments.
Usage:
auto_ligand.sh \
datafile {either mtzfile or mapfile} \
protein {starting_PDB_file_without_ligand} \
ligand {PDB_file_with_ligand_to_fit} \
[workdir {FULLPATH_WORKING_DIRECTORY}] \
[ligandfileout {output_PDB_file}] \
[fp {fp_label}] [sigfp {sigfp_label}] [freer {freer_label}] \
[nligandcycles {number_of_ligandbuild_cycles (default is 2)}] \
[search_model {PDB_file_with_model_at_expected_ligand_site}] \
[search_position {X Y Z}] \
[search_radius {radius_in_angstroms}] \
[reflist {textfile_with_FULLPATHnames_of_fitted_ligands_for_comparison}] \
[extralibrary {user_defined_library_for_Refmac5}] \
[partial {0 for modelling the whole ligand and 4 or higher number to \
model partially occupied ligand (giving 4 would mean to consider \
4-atoms as the smallest ligand fragment)}] \
[parfile {parfilename_if_only_parfile_is_to_be_created}]
- Optional command line arguments are given in square brackets
- All input files are assumed to be located in working directory
unless they are given with full path
- If workdir is not given, the current directory will be assumed
- All output files will be written into workdir/subdirectory
Required keywords are: datafile (followed by the mtz-file name with the
full path), protein (followed by the pdb-file name of the protein model
without the ligand with the full path) and ligand (followed by the pdb-file
containing the ligand(s) description with the full path). Unlike the
functionality offered by the CCP4 GUI, datafile can also be a density
map.
Optional keywords include: workdir (followed by the full path to the
working directory), fp (followed by the fp label), sigfp (followed by the
sigfp label). The defaults are FP and SIGFP, respectively. Alternatively, if
the mtz file contains only one column for structure factor amplitudes
and only one column for their standard deviations, these will be taken.
The number of ligand building cycles (default is 2) can be changed with
keyword nligandcycles. The approximate location of the binding site can be
supplied by the user either by providing the pdb-file(s) of a ligand (or
just a list of atoms) located at the binding site (search_model), or by
specifying the (XYZ) coordinates of a point defining the binding region using
search_position and search_radius (default value for the latter is 5 Å). For test
purposes, the constructed ligand can be compared to known reference models
(hand- or pre-fitted). The required keyword is reflist (followed by the
full-path name of a text file, containing a list of pdb-files with the reference
ligands and their absolute paths). Building of a partially occupied ligand
can be requested using the keyword partial followed by a number 4
or higher. A user-defined ligand library can be input using keyword
extralibrary.
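The keywords above can be assembled programmatically, which helps when the same job is repeated for many data sets. The following Python sketch only builds and prints the command line; it does not run ARP/wARP. The keyword names come from the usage text above, whereas the function name, the file paths and the choice to pass the X, Y and Z values of search_position as three separate arguments are assumptions.

```python
import shlex

# Keyword names as listed in the auto_ligand.sh usage text.
REQUIRED = ("datafile", "protein", "ligand")
OPTIONAL = ("workdir", "ligandfileout", "fp", "sigfp", "freer",
            "nligandcycles", "search_model", "search_position",
            "search_radius", "reflist", "extralibrary", "partial", "parfile")


def auto_ligand_command(**keywords):
    """Return the auto_ligand.sh argument list for the given keywords."""
    missing = [k for k in REQUIRED if k not in keywords]
    if missing:
        raise ValueError("missing required keyword(s): " + ", ".join(missing))
    unknown = [k for k in keywords if k not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError("unknown keyword(s): " + ", ".join(unknown))
    partial = keywords.get("partial")
    if partial is not None and partial != 0 and partial < 4:
        raise ValueError("partial must be 0 or a number 4 or higher")

    command = ["auto_ligand.sh"]
    for key in REQUIRED + OPTIONAL:
        if key in keywords:
            value = keywords[key]
            # A tuple or list (e.g. X, Y, Z) becomes several arguments.
            values = value if isinstance(value, (list, tuple)) else [value]
            command.append(key)
            command.extend(str(v) for v in values)
    return command


# Hypothetical paths and coordinates, for illustration only.
cmd = auto_ligand_command(
    datafile="/data/job1/data.mtz",
    protein="/data/job1/protein.pdb",
    ligand="/data/job1/ligand.pdb",
    search_position=(12.5, 3.0, -7.25),
    search_radius=5,
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, prefixed with the $warpbin directory if the script is not on the search path.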
To build the ligand from a list of candidates (‘cocktail’), the coordinates of the
ligand candidates should be concatenated into one file specified by the above
mentioned keyword ligand. The different ligands must be distinguished by their
residue name (columns 18-20) in the concatenated pdb file (different chain
identifier or residue sequence number will do as well, however we recommend
using different residue names). ARP/wARP will automatically choose the
best-matching ligand candidate and will attempt to build it at the binding site,
either determined automatically or supplied by the user. Supplying the binding
site using search_model or search_position keywords is an alternative to this
method.
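Preparing a cocktail file by hand is error-prone, because each candidate must carry its own residue name in columns 18-20. The following Python sketch (not part of ARP/wARP; the function name is hypothetical) concatenates candidate ligands and overwrites those columns. It keeps only ATOM and HETATM records, which is an assumption about what the concatenated ligand file needs to contain.

```python
def make_cocktail(candidates):
    """Concatenate ligand candidates into one PDB text.

    candidates: list of (residue_name, pdb_text) pairs; every residue
    name must be unique and at most three characters long.
    """
    names = [name for name, _ in candidates]
    if len(set(names)) != len(names):
        raise ValueError("residue names must be unique")

    out = []
    for resname, text in candidates:
        if not 1 <= len(resname) <= 3:
            raise ValueError("residue name must have 1-3 characters")
        for line in text.splitlines():
            if line.startswith(("ATOM", "HETATM")):
                # Columns 18-20 (1-based) are characters 17..19 (0-based).
                out.append(line[:17] + resname.rjust(3) + line[20:])
    return "\n".join(out) + "\n"
```

The returned text can be written to a file and passed to auto_ligand.sh with the ligand keyword.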
To build a partially occupied ligand, enter the keyword partial with the
appropriate parameter defining the size of the smallest ligand fragment.
ARP/wARP will automatically choose the best-matching ligand fragment and
will attempt to build it at the binding site, either determined automatically or
supplied by the user.
If auto_ligand.sh is called with an option parfile, the script will create a
parameter file and a directory in the workdir whose name will be printed. The
job can subsequently be launched by:
% $warpbin/warp_ligand.sh NAME_OF_PARFILE
If auto_ligand.sh is called without an option parfile, it will also launch the job.
The log files and additional output files as well as the building results can be
found in the directory created.
3.6 Automated Solvent Building
3.6.1 Running solvent building from the GUI, ARP/wARP Solvent
Within the solvent building module, restrained reciprocal-space refinement is
carried out with REFMAC while ARP/wARP automatically adjusts
the solvent structure. The resolution of the data should be 2.5 Å or
higher. The output is the protein model with the solvent molecules transformed
with symmetry operations to lie around the protein.
The ARP/wARP solvent building module requires the X-ray data (in MTZ
format) and the protein model (in PDB format) without solvent or with a partial
solvent model.
- Launch the ARP/wARP Solvent window within the CCP4i GUI.
- Provide required input:
-
MTZ in
- X-ray data in the MTZ format containing structure factor
amplitudes and their standard deviations.
-
Fobs Sigma
- If the MTZ column labels for structure factor amplitudes
and their standard deviations have obvious names, they will
be recognised automatically. Otherwise please use the scrolling
button, navigate to List All Labels and choose the appropriate ones.
-
Starting model in
- Provide the PDB file with coordinates of the
protein only. If the file already contains some solvent sites, these
will be updated during the iterative solvent building.
-
Output model
- Provide the name of the file to which the output PDB of the
protein with the built solvent will be written.
- Click on Run and choose Run now
There are a number of options that can be added. A brief description is given
below.
- Required parameters:
-
ARP/REFMAC refinement cycles
- By default 20 cycles will be carried
out. However, the job may finish earlier if converged. Please
monitor R factor / R free for convergence.
-
Free R flag
- It is advantageous to use R free flag for solvent building.
Should you choose to use R-free, this will cause additional
options to appear within the section ’Refmac parameters’. The
default is not to use R free.
- ARP/wARP flow parameters:
-
Add atoms
- This is followed by two numbers defining the thresholds
(in sigmas of the density above the mean) for addition and
removal of solvent atoms. The defaults are 3.4 and 1.0,
respectively, which should work for most cases.
-
Disable Wilson plot statistics check
- The current Wilson
plot checking routine is probably too stringent. You may disable
the check and the warnings if you are sure that the X-ray data is
of high quality. However, we strongly recommend that you do not disable
the check and, in case of warnings, inspect the plot and only then
proceed.
- Refmac parameters:
-
Cycles of refinement in each Refmac run
- Refmac is invoked
to refine the model before the density maps are computed. The
default is 1 cycle and there is usually no need to change this.
-
Matrix weight for Xray / Geometry
- The default
is automatic weighting. This proved to work well and, probably,
there is no need to change this parameter.
-
Scaling model
- The default is to use simple scaling of the low angle
part of the X-ray data. You can change this to bulk solvent
correction if you are sure that your low angle data below about 8
Å resolution are complete and correct.
-
Scaling B factor
- The default is to use anisotropic B factor for scaling
the X-ray data. You can choose isotropic scaling B factor if your
data are systematically incomplete (e.g. a cone is missing in
reciprocal space).
-
Scaling and sigmaa calculations
- This parameter appears if the
free R flag is chosen for refinement of the protein part of the
model. The scaling and the calculation of sigmaA coefficients by Refmac
can be based on the free reflections (this is
the default) or on all reflections.
-
TLS refinement
- The default is not to do a TLS refinement of the
model.
-
Input a user-defined library file
- If you already have a Refmac-style
cif library for, e.g. your already present ligand, you can input it
here.
- Crystal parameters:
-
Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content
-
are derived automatically from the MTZ and the PDB files, displayed
for information only and cannot be changed. However, you may
want to check whether their values conform to your expectations.
-
Resolution
- By default all reflections present in the MTZ file will be
used. You can check the box (Use reflections between) and then
narrow the range if you are aware of certain deficiencies of your
data.
3.6.1.1 Output files, short log file
-
Refinement with REFMAC
- The R factor (and R free if requested) are
printed after refinement of the protein with Refmac. Check that the
value of the R factor is decreasing upon solvent building.
-
Job termination
- The statement Task completed successfully indicates that the
job has finished with no error. An error statement
QUITTING ARP/wARP module stopped with an error message: name_of_the_program
indicates that one of the modules of the task has terminated with an error
message. Please refer to the specified log file.
3.6.2 Running solvent building from command line, auto_solvent.sh
The script auto_solvent.sh in the $warpbin directory allows you to run the
solvent building as a single-line command without the use of the GUI. The use of
auto_solvent.sh is fairly simple. The script prints out help information if it is
invoked without arguments.
$warpbin/auto_solvent.sh \
datafile {mtzfile} \
protein {starting_PDB_file} \
[workdir {FULLPATH_WORKING_DIRECTORY}] \
[solventfileout {output_PDB_file}] \
[fp {fp_label}] [sigfp {sigfp_label}] [freer {freer_label}] \
[restrcyc {number_of_cycles (default is 20) }] \
[extralibrary {user_defined_library_for_Refmac5}] \
[tlsin {fixed pre-refined TLS tensors from Refmac5}] \
[parfile {parfilename_if_only_parfile_is_to_be_created}]
- Optional command line arguments are given in square brackets
- All input files are assumed to be located in working directory
unless they are given with full path
- If workdir is not given, the current directory will be assumed
- All output files will be written into workdir/subdirectory
Required keywords are: datafile (followed by the mtz-file name with the
full path) and protein (followed by the pdb-file name of the protein model with
the full path).
Optional keywords include: workdir (followed by the full path to the
working directory), solventfileout (followed by the name of the PDB file where
the output will be written), fp (followed by the fp label), sigfp (followed by the
sigfp label) and freer (followed by the Rfree label). The defaults for the first two
are FP and SIGFP, respectively. Alternatively, if the mtz file contains only one
column for structure factor amplitudes and only one column for their standard
deviations, these will be taken. The number of cycles (default is 20) can be
changed with keyword restrcyc. The user-defined library and the tls-tensor
for Refmac can be supplied by using the keywords extralibrary and
tlsin.
If auto_solvent.sh is called with an option parfile, the script will create a
parameter file and a directory in the workdir whose name will be printed. The
job can subsequently be launched by:
% $warpbin/warp_solvent.sh NAME_OF_PARFILE
If auto_solvent.sh is called without an option parfile, it will also launch the
job. The log files and additional output files as well as the building results can be
found in the directory created.
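Because input files outside the working directory must be given with their full path, it can be convenient to resolve paths before calling the script. The following Python sketch builds (but does not run) an auto_solvent.sh command line. The keyword names are taken from the usage text above; the function name, the example file names and column label, and the decision to make every input file argument absolute are assumptions.

```python
import os
import shlex

# Keywords whose values are input paths and are made absolute here (an assumption).
FILE_KEYWORDS = ("datafile", "protein", "workdir", "extralibrary", "tlsin")
# Remaining keywords from the usage text, passed through unchanged.
OTHER_KEYWORDS = ("solventfileout", "fp", "sigfp", "freer", "restrcyc", "parfile")


def auto_solvent_command(datafile, protein, **options):
    """Return the auto_solvent.sh argument list with full input paths."""
    unknown = [k for k in options if k not in FILE_KEYWORDS + OTHER_KEYWORDS]
    if unknown:
        raise ValueError("unknown keyword(s): " + ", ".join(unknown))

    keywords = dict(datafile=datafile, protein=protein, **options)
    command = ["auto_solvent.sh"]
    for key, value in keywords.items():
        if key in FILE_KEYWORDS:
            value = os.path.abspath(value)  # resolve relative to current directory
        command += [key, str(value)]
    return command


# Hypothetical file names and free R label, for illustration only.
cmd = auto_solvent_command("data.mtz", "protein.pdb",
                           restrcyc=10, freer="FreeR_flag")
print(shlex.join(cmd))
```

Adding parfile="NAME_OF_PARFILE" to the call would produce the command for the two-step route, in which the job is launched afterwards with warp_solvent.sh.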
3.7 ARP/wARP molecular graphics: ARP Navigator
The graphical front-end to ARP/wARP Version 7.2 is an OpenGL/X-window
based graphics program that can be launched by pressing the ARP Navigator
button in the CCP4 GUI. The program can also be started from the command line
by typing arpnavigator.
3.7.1 Main Menu
The main menu sits at the top of the ARP Navigator screen.
3.7.2 Mouse and Keyboard functions
3.7.2.1 Rotation
-
Left mouse button pressed and mouse moved
- the scene rotates about the
x and y axes (screen plane).
-
Left mouse button + r-key pressed and mouse moved left-right
- the
scene rotates about the z axis (perpendicular to screen plane).
3.7.2.2 Translation
-
Right mouse button pressed and mouse moved
- the scene is translated in
the xy-plane (screen plane; maps are infinitely repeated).
-
Left mouse button + t-key pressed and mouse moved
- an alternative way
to translate the scene in the xy-plane.
-
Left mouse button + z-key pressed and mouse moved up-down
- the
scene is translated in z-direction (perpendicular to screen plane).
3.7.2.3 Scaling
-
Middle mouse button pressed and mouse moved left-right
- zooming, the
scene is scaled and a scale-o-meter is shown on the right.
-
Left mouse button + s-key pressed and mouse moved
- an alternative way
to zoom.
3.7.2.4 Clip planes
-
Left mouse button + f-key pressed and mouse moved left-right
- changes
the front clip position.
-
Left mouse button + b-key pressed and mouse moved left-right
- changes
the back clip position.
-
Left mouse button + g-key pressed and mouse moved left-right
- changes
the front and back clip position together.
-
Left mouse button + d-key pressed and mouse moved left-right
- changes
the position of the rotation-center (similar to translation).
3.7.2.5 Map contouring
The mouse wheel is used for changing the contour level of a map. The map must
be activated by pressing the corresponding object button at the bottom of the
graphics window.
-
Left mouse button + c-key pressed and mouse moved up-down
- An
alternative way to change the contour level.
3.7.2.6 Map extent
-
Left mouse button + e-key pressed and mouse moved
- size increases.
3.7.2.7 Mouse Actions
-
Left mouse button pressed in graphics area
- marks atoms or density
(switch this in Options menu). Double-click will also centre on atoms.
-
Right mouse button pressed on top of an object button
- opens the Mini
menu of the related object (Parameters, close, save, etc.).
-
Right mouse button pressed in graphics area
- opens the Quick actions
menu.
3.7.2.8 Keyboard Actions
-
w
- Hide the menu and all attached information as long as pressed
-
W (=shift-w)
- Lock the function of ’w’ and do not show the menu when
released. To unlock, press ’w’ or ’shift-w’ again, then the menu will be
visible again.
-
G (=shift-g)
- Launch a goto-atom dialog (see ’goto atom’ below).
-
C (=shift-c)
- Center on the last mark set irrespective of whether this was an
atom or a density region.
-
D (=shift-d)
- Activate the display of distances between the most recent
mark and all other marks set so far.
-
m
- Toggle the control of a detached model: move the model only vs move
the crystal frame alone with the model fixed.
-
k
- Toggle the control of a detached model: move the model and the crystal
frame together vs move the crystal frame alone.
3.7.3 Object Buttons
When a file is loaded and put on display, there will be small boxes appearing in
the bottom left corner representing each of the graphical objects. Only one object
can be active at a time.
An object can be made active by clicking on the box with the left mouse
button. A little eye symbol shows whether this object is currently on display or if
it’s hidden. Clicking with the right mouse button on this box will pull out
the mini-menu with actions applied to this object only (see also Mini
menu).
3.7.4 Quick Actions
When the right mouse button is pressed with no movement, then a green button
box is displayed that contains functionalities to be applied ’ad-hoc’ and with no
input dialog.
-
Goto Atom
- This button launches the ’goto-atom’ dialog as ’shift-g’ does.
The goto-atom dialog expects that atoms are specified as e.g. CA/123/A
for the CA atom of residue 123 in chain A. Just specifying CA/123
means the first occurrence of CA in residue 123. Specifying /123/
means the first atom in residue 123. Typing //Z will be interpreted
as the first atom of chain Z. The program will centre on the atom if
found. In case the atom cannot be found, the dialog gets coloured in
pink.
-
Real Space Refine Ligand
- The ligand to be refined is a detached molecule
and there is one density map on display. The ligand gets refined
to that density map locally and the initial ligand position must be
in the radius of convergence. The output will replace the detached
model. Please note that the refinement is restrained to the ligand
stereochemistry which is derived from the input ligand model.
Thus repeatedly taking the ligand out and then
refining it back into its density will successively change the ligand’s
stereochemistry.
-
Find Ligand Binding Site
- The ligand to be located is a detached molecule
and there is one density map on display. Furthermore all other models
displayed are taken as occupants of space and the binding site can not
intersect with them. In return a dummy atom model of the located
density blob is shown.
-
Fit Ligand Here
- The ligand to be fit is the detached model, there is at least
one density map on display that has one of its blobs marked. The
output will replace the detached model.
-
Build Helices
- At least one density map must be on display (or activated).
Helices are built and side chains are modelled up to C-gamma atoms.
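The atom specification accepted by the Goto Atom dialog above (atom name, residue number and chain separated by slashes, with empty fields matching anything) can be illustrated with a short Python sketch. This is only a model of the syntax as described in this guide, not the code used by ARP Navigator; the function names and the tuple layout of the atom list are hypothetical.

```python
def parse_goto(spec):
    """Split 'NAME/RESNUM/CHAIN' into (name, residue_number, chain).

    Missing or empty fields are returned as None and match anything.
    """
    parts = spec.strip().split("/")
    if len(parts) > 3:
        raise ValueError("expected at most two '/' separators: %r" % spec)
    parts += [""] * (3 - len(parts))
    name, resnum, chain = (p.strip() or None for p in parts)
    return name, (int(resnum) if resnum is not None else None), chain


def first_match(atoms, spec):
    """Return the first (name, residue_number, chain) tuple matching spec."""
    name, resnum, chain = parse_goto(spec)
    for atom in atoms:
        a_name, a_resnum, a_chain = atom
        if name is not None and a_name != name:
            continue
        if resnum is not None and a_resnum != resnum:
            continue
        if chain is not None and a_chain != chain:
            continue
        return atom
    return None  # the dialog would be coloured pink in this case


# A tiny hypothetical model, listed in file order.
atoms = [("N", 122, "A"), ("N", 123, "A"), ("CA", 123, "A"),
         ("N", 1, "Z"), ("CA", 1, "Z")]
for spec in ("CA/123/A", "CA/123", "/123/", "//Z"):
    print(spec, "->", first_match(atoms, spec))
```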
Chapter 4
Additional Remarks
4.1 Quality of the X-ray Data
The X-ray data should be as complete as possible, especially in the low resolution
range (5 Å and worse). If the low resolution strong data are systematically
incomplete (e.g. missing or overloaded reflections), the density map, even in the
case of a good model, may be discontinuous and inconsistent with the model.
Because ARP/wARP involves model building on the basis of density maps,
such discontinuity can lead to slower convergence or adversely affect the
performance.
ARP/wARP automatically checks the fit of your data to the expected Wilson
plot and will report problems if necessary. If it suggests cutting the data from the high
resolution side, follow the suggestion. If it suggests cutting the data from the low
resolution side, do so, but do not cut to a resolution below 8 or 10 Å. If
it suggests ignoring all data, or there are still other complaints after the
cut, you may consider inspecting your data processing. The current
version of the ARP/wARP Wilson plot check might be too stringent.
Nevertheless the user is advised to visually inspect the Wilson plot and apply
his/her critical judgment whether or not the data should be cut. It has
sometimes proved beneficial to cut the data which were flagged as poor,
though in some cases the presence of these data was crucial for the model
building.
Chapter 5
Citing ARP/wARP
Please cite the applications of ARP/wARP that you have used. Please consult the
ARP/wARP log file for the most relevant citation.
Chapter 6
Other References
The most recent overview of ARP/wARP can be found in:
- Langer, G., Cohen, S.X., Lamzin, V.S. & Perrakis, A. (2008) Automated
macromolecular model building for X-ray crystallography using
ARP/wARP version 7. Nature Protocols. 3, 1171-1179.
Applications are presented in:
- Mooij, W.T., Cohen, S.X., Joosten, K., Murshudov, G.N. & Perrakis,
A. (2009) Conditional Restraints: Restraining the free atoms in
ARP/wARP. Structure. 17, 183-189 (protein model building)
- Hattne, J. & Lamzin, V.S. (2008) Pattern recognition-based detection of
planar objects in 3D electron density maps. Acta Cryst. D64, 834-842
(nucleotide building)
- Joosten, K., Cohen, S.X., Emsley, P., Mooij, W., Lamzin, V.S. & Perrakis,
A. (2008) A knowledge-driven approach for crystallographic protein
model completion. Acta Cryst. D64, 416-424 (protein model building,
loops)
- Cohen, S.X., Jelloul M.B., Long, F., Vagin, A., Knipscheer, P., Lebbink,
J., Sixma, T.K., Lamzin, V.S., Murshudov, G.N. & Perrakis, A. (2008)
ARP/wARP and molecular replacement: the next generation, Acta
Cryst. D64, 49-60 (protein model building)
- Evrard, G.X., Langer, G.G., Perrakis, A. & Lamzin, V.S. (2007)
Assessment of automatic ligand building in ARP/wARP. Acta Cryst.
D63, 108-117. (ligand building)
- Zwart, P.H., Langer, G.G. & Lamzin, V.S. (2004) Modelling bound
ligands in protein crystal structures. Acta Cryst. D60, 2230-2239.
(ligand building)
- Cohen, S.X., Morris, R.J., Fernandez, F.J., Ben Jelloul, M., Kakaris,
M., Parthasarathy, V., Lamzin, V.S., Kleywegt, G.J. & Perrakis, A.
(2004) Towards complete validated models in the next generation of
ARP/wARP. Acta Cryst. D60, 2222-2229. (side chains)
- Morris, R.J., Zwart, P.H., Cohen, S., Fernandez, F.J., Kakaris, M.,
Kirillova, O., Vonrhein, C., Perrakis, A. & Lamzin, V.S. (2004) Breaking
good resolutions with ARP/wARP. J. Synchr. Rad. 11, 56-59. (helices
and strands, protein model building)
- Morris, R.J., Perrakis, A. & Lamzin, V.S. (2003) ARP/wARP and
automatic interpretation of protein electron density maps. In Meth.
Enz. (Carter, C. & Sweet, B. eds.) 374, 229-244. (protein model
building)
- Morris, R.J., Perrakis, A. & Lamzin, V.S. (2002) ARP/wARP’s
model-building algorithms. I. The main chain. Acta Crystallogr. D58,
968-975. (protein model building)
- Perrakis, A., Harkiolaki, M., Wilson, K.S. and Lamzin, V.S. (2001)
ARP/wARP and molecular replacement. Acta Cryst. D57, 1445-1450.
(protein model building)
- Lamzin, V.S., Perrakis, A. & Wilson, K.S. (2001) The ARP/WARP
suite for automated construction and refinement of protein models. In
Int. Tables for Crystallography. Vol. F: Crystallography of biological
macromolecules (Rossmann, M.G. & Arnold, E. eds.), Dordrecht,
Kluwer Academic Publishers, The Netherlands, pp. 720-722. (solvent)
- Perrakis, A., Morris, R. and Lamzin, V.S. (1999). Automated protein
model building combined with iterative structure refinement. Nature
Struct. Biol. 6, 458-463. (protein model building)
- Perrakis, A., Sixma, T.K., Wilson, K.S. and Lamzin, V.S. (1997) wARP:
improvement and extension of crystallographic phases by weighted
averaging of multiple refined dummy atomic models. Acta Cryst.
D53, 448-455. (protein model building)
- Lamzin, V.S. and Wilson, K.S. (1993) Automated refinement of protein
models. Acta Cryst. D49, 129-149. (model update and solvent)
For other publications please refer to the references therein or to the
ARP/wARP web page.
Chapter 7
Acknowledgements
The current ARP/wARP developers are:
The Hamburg team (European Molecular Biology Laboratory (EMBL)
Hamburg Outstation, c/o DESY, Notkestrasse 85, 22603 Hamburg, Germany):
- Victor S. Lamzin (tel +49-40-89902-121, email:
victor@embl-hamburg.de)
- Ciaran Carolan
- Saul Hazledine
- Philipp Heuser
- Tim Wiegels
The Amsterdam team (Molecular Carcinogenesis Programme, Netherlands
Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands):
- Anastassis Perrakis (tel. +31-20-512-1951, email: a.perrakis@nki.nl)
- Krista Joosten
Former members
- Serge X. Cohen, Helene Doerksen, Guillaume X. Evrard, Francisco
Fernandez, Johan Hattne, Marouane Ben Jelloul, Matheos Kakaris,
Olga V. Kirillova, Gerrit G. Langer, Wijnand Mooij, Richard J. Morris,
Parthasarathy Venkataraman, Tilo Strutz, Petrus H. Zwart
The authors are especially grateful to:
- Keith S Wilson (York, UK), one of the originators of the software,
and Zbyszek Dauter (Argonne, USA) for significant contributions at
earlier stages of the software development.
- The REFMAC developers team led by Garib Murshudov
(York-Cambridge, UK).
- The CCP4 developers currently led by Eugene Krissinel (Didcot, UK)
- Many of our collaborators and active users — a comprehensive list is
very long !
We would also like to take this opportunity to thank the following for their support of
ARP/wARP: the EMBL and the NKI, for hosting the research groups; the EMBL,
for hosting the ARP/wARP download servers and remote computational
infrastructure; funding agencies, for research and infrastructure grants; and our
industrial users, for generating a license income which strengthens our
ability to keep to our commitment for free distribution to the academic
community.