Copyright © 2009, The ARP/wARP development team

This user guide covers the fundamentals of the ARP/wARP software suite and most of its applications. If you cannot find the answers to your questions here, please visit the ARP/wARP web page http://www.arp-warp.org


Chapter 1. General information

Introduction

ARP/wARP is a software project for automated protein model building and structure refinement. It is based on a unified approach to the structure solution process by combining electron density interpretation using the concept of the hybrid model, pattern recognition in an electron density map and maximum likelihood model parameter refinement with REFMAC. The ARP/wARP software is under continuous development. Its present release, version 7.1, can be used in the following ways:

  1. Automated protein chain tracing in the density map and model building (GUI modules ARP/wARP Classic, ARP/wARP Expert System, command line modules auto_tracing.sh and auto_flex_warp.sh). This constructs polypeptide fragments (both main and side chains) for the cases of MR solutions or MAD/M(S)IR(AS) phases. The Classic and the auto_tracing.sh modules use a pre-defined number of cycles for model update, refinement and chain tracing, while Expert System and auto_flex_warp.sh define the sequence of steps on the fly. X-ray data to 2.7 Å resolution or higher are required, although partial model building can sometimes be achieved at lower resolution.

  2. Free atoms density modification (GUI module ARP/wARP Classic). This improves the density map by free-atoms update (no model building) and requires an input PDB file and X-ray data to about 2.5 Å resolution or higher.

  3. Automated building of alpha-helical and beta-stranded fragments (GUI module ARP/wARP Quick Fold, command line module auto_albe.sh). This constructs helical and beta-stranded polypeptide fragments (main chain and CB atoms) in low-resolution density maps. Phased X-ray data to 4.5 Å resolution or higher are required.

  4. Building and real space refinement of side chains for a polypeptide model (GUI module Side Chains). This will keep the main chain atoms of the input model untouched, dock its fragments in sequence and build side chains in the most appropriate conformations. This application is almost independent of the resolution of the data.

  5. Building missing loops in a protein model (GUI module Loops). This will generate a set of candidate loops for a short stretch of missing residues, given the anchors and the sequence of the missing residues. Each candidate will have both main chain and side chain atoms. The user can ask the module to choose a single solution or to suggest several loops. A protein model and X-ray data to 3.0 Å resolution or higher are required.

  6. Prototype software for building poly-nucleotide fragments, DNA or RNA (GUI module ARP/wARP DNA/RNA, command line module auto_nuce.sh). This will produce a set of polynucleotide chains with guessed bases (A or C, i.e. large or small); the nucleotide sequence is not yet used. Phased X-ray data to about 3.5 Å resolution or higher are required.

  7. Building bound ligands (GUI module ARP/wARP Ligands, command line module auto_ligand.sh). This constructs a ligand in a difference electron density map, after the protein model has been completed and refined, given a template search ligand or a list of putative ligands (cocktail screening). X-ray data to 3.0 Å resolution or higher are required.

  8. Building the solvent structure (GUI module ARP/wARP Solvent, command line module auto_solvent.sh). This builds a solvent structure after the protein model has been refined. X-ray data to 2.5 Å resolution or higher are required.

  9. A handy prototype of the molecular graphics front-end to ARP/wARP, which allows the display of molecules and electron densities (GUI module ARP Navigator, executable program arpnavigator). It can also execute some fast ARP/wARP tasks (building helices/strands, ligands or solvent) and show the results in real time as they are computed.
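The resolution requirements quoted in the list above can be collected into a small lookup table. The Python sketch below is only an illustration: the names RESOLUTION_LIMIT and applicable_tasks and the short task labels are invented here, while the limits themselves are the ones stated above. Side chain building is left out because it is described as almost independent of resolution, and ARP Navigator because no limit is given for it.

```python
# Resolution limits (in Angstrom) quoted in the list of ARP/wARP 7.1 applications.
# The labels on the left are illustrative shorthand, not official module names.
RESOLUTION_LIMIT = {
    "protein chain tracing": 2.7,
    "free atoms density modification": 2.5,
    "helix/strand building": 4.5,
    "loop building": 3.0,
    "DNA/RNA building": 3.5,
    "ligand building": 3.0,
    "solvent building": 2.5,
}


def applicable_tasks(data_resolution):
    """Return the tasks whose stated limit is met by the data.

    A numerically smaller value means higher resolution, so a task
    qualifies when data_resolution <= its limit.
    """
    return sorted(
        task for task, limit in RESOLUTION_LIMIT.items() if data_resolution <= limit
    )


if __name__ == "__main__":
    for task in applicable_tasks(3.2):
        print(task)
```

Treat the result as a first orientation only; as noted above, partial model building can sometimes be achieved at lower resolution than the stated limit.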

Major changes in Version 7.1

Latest News and Bug Report

For the latest news and announcements please visit the ARP/wARP page (www.arp-warp.org). The developers will greatly appreciate all bug reports or suggested changes from the users.

Distribution

The ARP/wARP package is freely available to academic users provided that they agree to the ARP/wARP license conditions and the applications of ARP/wARP are properly cited. Please consult the ARP/wARP log file for the most relevant citation.

Industrial users are requested to obtain a commercial license via the ARP/wARP web page.


Chapter 2. Installing ARP/wARP

The installation of ARP/wARP should be straightforward; please follow the procedure described below:

  1. Install (or make sure you have installed) the CCP4 suite (it can be fetched from the CCP4 web page http://www.ccp4.ac.uk). CCP4 6.0 (and higher) is the recommended version to use with ARP/wARP 7.1, although older versions of CCP4 may be compatible with some of the ARP/wARP modules.

  2. Download the full ARP/wARP package arp_warp_7.1.tar.gz from the ARP/wARP web page and save it in a location of your choice. Next, type:

% gunzip arp_warp_7.1.tar.gz

% tar xvf arp_warp_7.1.tar

The distribution will unpack into a directory called arp_warp_7.1 that contains all the required files and subdirectories. install.sh is an installation script to help you set the appropriate environment variables. README will walk you through the installation process. ARP_wARP_CCP4I-v5.tar.gz includes everything necessary to run ARP/wARP from the CCP4i interface of version 5; ARP_wARP_CCP4I-v6.tar.gz includes everything necessary to run tasks from the CCP4i interface of version 6.

  3. Go to the directory arp_warp_7.1 and run the install.sh script there by typing:

% ./install.sh

The script will check the following (please inspect its output, it should look like the one given in Appendix A):

Finally, the install.sh script will output one line that must be added to your .cshrc, .bashrc or .zshrc file in order to set up the proper environment for ARP/wARP. The setup script will automatically recognise the machine type upon login and use the correct ARP/wARP binaries. The install.sh script should finish with no warnings and with the statement of successful installation:

*** INSTALLATION OF ARP/wARP 7.1 HAS BEEN SUCCESSFUL ***

  4. Modify your .cshrc / .bashrc as suggested in the output of install.sh. Important: source CCP4 first and ARP/wARP second, e.g.:

source /Users/testuser/ccp4/ccp4-6.0.2/include/ccp4.setup

source /Users/testuser/arp_warp_7.1/arpwarp_setup.csh

  5. By default the install.sh script will attempt to uninstall any previous ARP/wARP installation from the CCP4i GUI and install the 7.1 version there. Should this not be successful due to, e.g., ownership permissions (please inspect the output when running install.sh), or should you explicitly tell the script not to (re)install the GUI by typing ./install.sh no-gui, you may need to set up CCP4i manually. To do this, first start CCP4i by typing ccp4i and set up your project if you are using CCP4i for the first time in your user account. Within CCP4i go to "System administration" and uninstall any earlier version of ARP/wARP. Restart CCP4i. Go to "System administration" again and install the GUI by navigating to the file arp_warp_7.1/ARP_wARP_CCP4I-v5.tar.gz or arp_warp_7.1/ARP_wARP_CCP4I-v6.tar.gz, depending on the CCP4 version that you use. The interface should gunzip, untar and install the content automatically. At this point, the installation is complete. Restart CCP4i and click on the desired ARP/wARP module’s button. The CCP4i Model Building GUI panel should show 9 ARP/wARP modules, see the figure to the right. If you observe additional ARP/wARP modules, the GUI may not function properly; in that case please un-install the previous ARP/wARP versions.

We recommend that the installation of the CCP4 GUI be done by the person who installed the CCP4 package, so that all users have an up-to-date interface and the correct permissions are set.

Unless you are already an experienced ARP/wARP user, you should get started with the test files provided in the directory arp_warp_7.1/examples. These include data for protein chain tracing, helix/strands search, and ligand and solvent building. A README file included with the example data explains in more detail which data are to be used for which task. If things do not work as expected, please consult your system manager first. If problems remain, please contact us at the ARP/wARP bulletin board (available from the ARP/wARP web page).


Chapter 3. Using ARP/wARP

Automated Model Building

Running model building from the GUI, ARP/wARP Classic

This module of ARP/wARP provides the execution of the following model building tasks:

(a) automated model building starting from experimental phases
(b) automated model building starting from existing model
(c) improvement of maps by atoms update and refinement

Applications (a) and (b) (the so-called warpNtrace protocol) start from input experimental / density modified phases or from an available (preliminarily refined or partially autotraced) model, and aim to deliver an essentially complete model together with an improved map. The warpNtrace protocol utilises the idea of the hybrid model, in which protein and free atoms can co-exist. warpNtrace keeps whatever was recognised as protein (in the form of polypeptide fragments), treats the rest as free atoms and refines this hybrid model during a 'big' cycle, consisting of several (default is 5) ARP/REFMAC update/refinement cycles. At the end of each 'big' cycle the map is interpreted anew: a completely new polypeptide model is constructed, ideally with more residues in fewer fragments. This whole procedure is iterated (default is 10 times).

The output of warpNtrace is a set of refined polypeptide fragments. If the sequence is available, the traced fragments will be docked in sequence and side chains will be built during the iterative refinement procedure. Loops will be built during the procedure, if possible. After the last building cycle the fragments will be arranged to form a globular structure (or, in the case of NCS, several NCS-related structures). The remainder of the structure (cis-prolines, poorly ordered loops and terminal residues for each fragment) will have to be completed manually by the user. Since the output model is refined, its accuracy is expected to be comparable to that of the final refined structure. Mis-tracing (incorrect tracing of polypeptide fragments) is possible but should normally not exceed 1 % of the whole structure with X-ray data to about 2.5 Å resolution or higher. An estimate of the correctness of the model is printed after every model building cycle, e.g.:

% Chains 12, Residues 434, Estimated correctness of the model 99.1 %
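If you want to follow the progress of a job with a script, the summary line shown above can be picked apart with a regular expression. The following is a minimal Python sketch, assuming that the line in your log file has the same wording as the example; the names SUMMARY and parse_summary are invented for this illustration and are not part of ARP/wARP.

```python
import re

# Pattern for the summary line shown above. The exact spacing in real log
# files may differ, so whitespace is matched loosely.
SUMMARY = re.compile(
    r"Chains\s+(\d+),\s*Residues\s+(\d+),\s*"
    r"Estimated correctness of the model\s+([\d.]+)\s*%"
)


def parse_summary(line):
    """Return (chains, residues, correctness_percent), or None if no match."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


if __name__ == "__main__":
    example = "% Chains 12, Residues 434, Estimated correctness of the model 99.1 %"
    print(parse_summary(example))
```

Applied to every matching line of a log file, this gives one tuple per building cycle, which makes it easy to see whether the number of residues grows and the number of chains falls.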

Application (c) includes no model building but still may provide improvement of the density map. The map is first interpreted as a pseudo protein model, consisting of unconnected free atoms. This model is then refined and updated with iterative cycles of ARP/REFMAC.

Application (a) is described in detail below. The input to applications (b) and (c) is very similar and should be straightforward to work out.

  • Launch the ARP/wARP Classic window within the CCP4i GUI.

  • Provide required input:

      o Run ARP/wARP for - Choose one of the applications (a) to (c) described above.

      o MTZ in - X-ray data in the MTZ format containing structure factor amplitudes, their standard deviations, phases and figures of merit. If pre-weighted structure factor amplitudes are to be used to construct the initial map, please check the corresponding box in ARP/wARP flow parameters (see below).

      o Fobs Sigma PHIB FOM - If the MTZ column labels for structure factor amplitudes, their standard deviations, phases and figures of merit have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose the appropriate ones.

      o Sequence file in - Provide the sequence file in the following format (pir): the first line should start with “> ”, the second line should be blank, and the sequence (1 letter code) starts from the third line. Space characters thereafter are ignored. Should the sequence not be available, please un-check the box ‘Dock the autotraced chains…’ in ARP/wARP flow parameters.

      o Total residues in the AU / number of molecules - For monomers provide the total number of residues in the asymmetric unit; the number of molecules is then 1. In the case of NCS, please still provide the total (!) number of residues in the asymmetric unit and also the number of NCS related molecules (e.g. if you have 2 molecules in the AU with 200 residues each, enter 400 for the number of residues). If you have a heteromultimer, e.g. a 3α/3β structure, the NCS order is 3, but please make sure that the sequence file contains both the α and β sequences separated by about 20 alanines: SEQUENCE_OF_α_SUBUNIT_AAAAAAAAAAAAAAAAAAAA_SEQUENCE_OF_β_SUBUNIT

      o Cycles of autobuilding / total cycles - The default is 10 building cycles separated by 5 ARP/REFMAC cycles (thus making 50 cycles in total). In cases of good starting phases the autobuilding may converge faster; in cases of poorer phases more cycles may be required. You can always submit warpNtrace for further cycles using the output of the previous tracing (protocol automated model building starting from existing model).

      o Protocol for REFMAC5 / Rfree - The refinement target gives three choices:

          1. The default is to use the maximum likelihood target.

          2. The second choice (new in version 7.1) allows the user to use the SAD target. This function is based on REFMAC5 developments by Skubak & Pannu and allows refinement against the F+/F- data, when these are available. A prerequisite when this option is activated is to also provide a PDB file with the anomalous scatterers and to define the extent of the 'anomalous signal', either by providing the wavelength or by providing measured f' and f'' values. At this point only one type of atom can be defined when f'/f'' values are used. If you have more than one type of atom, choose the wavelength option so that theoretical values are fetched; in practice that should work well.

          3. The third choice is the 'Phased ML' function, which we strongly recommend NOT to use in the case of SAD data. If MAD or MIRAS data are available, you should use it in conjunction with good quality phase error estimates in the form of HL coefficients, preferably calculated in a reliable manner, e.g. by SHARP.

        The default is not to use Rfree, since the number of traced residues serves as an excellent indicator of the success of the job. You can turn the use of Rfree on, but the authors have seen marginal cases (low resolution and hence low observation-to-parameter ratio) where this adversely affected the model building.

      o Se-Methionine - If you have Se-methionine substituted protein, regardless of the refinement function used, you can check the box to ask ARP/wARP to build and refine Se-Met residues and use these for better refinement results.

  • Now you are ready to start the job: click on Run and choose Run now.
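The sequence file layout and the residue count described under the required input lend themselves to a short script. The Python sketch below is an illustration under stated assumptions: the function names and the title text after "> " are invented here, and wrapping the sequence at 60 characters per line assumes that the sequence may continue over several lines, as is usual for PIR files. Only the three-line layout, the spacer of about 20 alanines and the way the total is counted are taken from the text above.

```python
def pir_text(sequence, title="protein"):
    """Format a one-letter sequence in the layout described above.

    Line 1 starts with '> ', line 2 is blank, the sequence starts on line 3.
    """
    cleaned = "".join(sequence.split()).upper()
    # Wrapping at 60 characters is a cosmetic choice made for this sketch.
    lines = [cleaned[i:i + 60] for i in range(0, len(cleaned), 60)]
    return "> " + title + "\n\n" + "\n".join(lines) + "\n"


def heteromultimer_sequence(subunit_sequences, linker_length=20):
    """Join the sequences of different subunits with a poly-alanine spacer."""
    return ("A" * linker_length).join(subunit_sequences)


def total_residues(residues_per_molecule, ncs_copies):
    """Total number of residues in the asymmetric unit."""
    return residues_per_molecule * ncs_copies


if __name__ == "__main__":
    # Two NCS copies of 200 residues each: enter 400 residues and 2 molecules.
    print(total_residues(200, 2))
    # Hypothetical alpha and beta subunit sequences joined for a heteromultimer.
    print(pir_text(heteromultimer_sequence(["MKVLAT", "GSTPQR"])))
```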

There are a number of additional parameters that you normally need not worry about. A brief description is given below.

  • ARP/wARP flow parameters:

      o Use conditional restraints for free atoms - This is a new option in version 7.1 that allows restraints to be used to keep free atoms in reasonable places. It is on by default.

      o Impose the use of conditional restraints for very large structures - This option forces the use of conditional restraints for very large structures, where ARP/wARP might otherwise choose to suppress them for CPU efficiency reasons. There is no need to use it unless you see a warning message that the restraints were turned off and you want to give them a try at the expense of increased execution time. The default is not to use it.

      o Use Loopy to build loops - This option, also new in version 7.1, allows the loop-filling mode to be invoked throughout the iterations. The default is on.

      o Dock the autotraced chains to sequence - The default is to dock the fragments starting from building cycle 0. This may be changed, although doing so may not be advantageous. Should the sequence not be available, the docking can be disabled by un-checking the box.

      o Search for helices and strands before each building cycle - This is the default for a resolution of 2.7 Å or lower. Enabling this option may also be advantageous for model building at higher resolution, at a modest cost in CPU time. Should the model from helix/strands tracing be more complete than the model from warpNtrace, an appropriate message will be printed at the end of the short log file.

      o Pre-weighted Fobs for initial map calculation - Checking this box will result in a pull-down menu asking for the FBEST label.

      o Number of ARP/REFMAC refinement cycles between autobuilding - The default is 5. In cases of poor convergence you can try to increase this number to 10.

      o Skip the autobuilding for the first cycles - Checking this box will disable the autotracing for the provided number of cycles. This was sometimes advantageous with earlier ARP/wARP versions when the initial phases were poor.

      o Randomisation of atomic positions - This also was sometimes advantageous with earlier ARP/wARP versions when the initial model bias was high. The default is not to randomise.

      o Iterate the tracing - Each main chain tracing is carried out in several rounds. The module will decide on its own how many iterations are needed. The default maximum number is 5 and it is not recommended to change this value.

      o Density thresholds for atom removal and addition - The thresholds are set to 1.0 for atom removal and 3.2 for atom addition. In cases of poor convergence, particularly when the number of both added and removed atoms is considerably less than the number “requested” (as can be seen from the log file), the threshold for atom removal can be slightly increased. Also, at a resolution of 2.5 Å and lower it may be advantageous to decrease the threshold for atom addition from 3.2 to 3.0 or 2.8.

      o Increase in the number of atoms to be added and removed as compared to the automatically set values - The default is 1 (no increase) and it is not recommended to change this. This option is provided primarily for experienced users.

      o Disable Wilson plot statistics check - The current Wilson plot checking routine is probably too stringent. You may disable the check and the warnings if you are sure that the X-ray data are of high quality. However, we strongly recommend that you do not disable the check and that, in case of warnings, you inspect the plot and only then proceed.

  • Refmac parameters:

      o Attempt to correct for data collected from a twinned crystal - Refmac will attempt fully automated handling of twinning (new in version 7.1). This option is incompatible with SAD refinement.

      o Cycles of refinement in each Refmac run - Refmac is invoked to refine the hybrid model before the density maps are computed. The default is 1 cycle if the data extend to a resolution of 2.3 Å or higher, otherwise 3 cycles. There is usually no need to change this parameter.

      o Damp shifts - The default is 0.99 for both types of shifts. There is usually no need to change these parameters.

      o Matrix weight for Xray / Geometry - The default is automatic weighting. This has proved to work well and there is no need to change this parameter.

      o Scaling model - The default is to use simple scaling of the low angle part of the X-ray data. You can change this to bulk solvent correction if you are sure that your low angle data below about 8 Å resolution are complete and correct.

      o Scaling B factor - The default is to use an anisotropic B factor for scaling the X-ray data. You can choose an isotropic scaling B factor if your data are systematically incomplete (e.g. a cone is missing in reciprocal space).

      o Data with free R label - This option appears if the free R flag has been chosen for refinement of the protein part of the model. Here you can provide a column label for the free R flag.

      o Use of free R reflections - This option also appears if the free R flag has been chosen. The scaling and the σA coefficients can be computed by Refmac on the basis of the free reflections (this is the default) or using all reflections.

      o Solvent mask correction - The default is to use solvent mask correction within Refmac.

  • Crystal parameters:

      o Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content - These are derived automatically from the MTZ file and the total number of residues in the asymmetric unit. They are displayed for information only and cannot be changed. However, you may want to check whether their values conform to your expectations.

      o Resolution - By default all data present in the MTZ file will be used. You can check the box and then narrow the range if you are aware of certain deficiencies of your data.

  • Submit a remote job at the Hamburg Cluster:

      o Checking this button will activate remote submission. This is described below in a separate section.
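Several of the defaults described above depend only on the resolution of the data or on simple arithmetic. The Python sketch below restates them for reference; the function names are invented for this illustration and the code does not read or write any ARP/wARP parameter file.

```python
def default_refmac_cycles(resolution):
    """Cycles in each Refmac run: 1 for data to 2.3 A or higher resolution, else 3."""
    return 1 if resolution <= 2.3 else 3


def helix_strand_search_by_default(resolution):
    """The helix/strand search is on by default at 2.7 A resolution or lower."""
    return resolution >= 2.7


def total_refinement_cycles(building_cycles=10, cycles_between_building=5):
    """Total ARP/REFMAC cycles: building cycles times refinement cycles in between."""
    return building_cycles * cycles_between_building


if __name__ == "__main__":
    for resolution in (1.8, 2.5, 3.0):
        print(
            resolution,
            default_refmac_cycles(resolution),
            helix_strand_search_by_default(resolution),
        )
    print(total_refinement_cycles())
```

Remember that a numerically larger value in Å means lower resolution, which is why the comparisons point in opposite directions.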

Running model building from the command line, auto_tracing.sh

The script auto_tracing.sh in the $warpbin directory allows running the automated model building from the command line without the use of the GUI. The use of auto_tracing.sh is fairly simple. If invoked without arguments the script will print help information.

Usage:

    $warpbin/auto_tracing.sh \
        datafile {mtzfile} \
        [residues {number_of_residues_in_AU}] \
        [workdir {FULLPATH_WORKING_DIRECTORY}] \
        [fp {fp_label}] [sigfp {sigfp_label}] [freelabin {freer_label}] \
        [fbest {weighted_amplitude_label}] [phibest {phibest_label}] [fom {fom_label}] \
        [modelin {input_PDB_file_to_use_as_initial_model}] \
        [seqin {sequence_file_for_one_NCS_copy}] \
        [cgr {number_of_NCS_copies (if seqin is provided, default is 1) }] \
        [buildingcycles {the_number_of_autobuilding_cycles (default is 10) }] \
        [resol {'rmin rmax' (default is the full resolution range) }] \
        [albe {1 to_always_invoke_albe, default is 0 for resol < 2.7A, else 1 }] \
        [restraints {1 to use conditional restraints, default is 1 }] \
        [twin {1 to try de-twining and twin refinement, default is 0 }] \
        [sad {1 to turn on the SAD function refinement, \
              needs also 'wavelength' and 'heavyin' on input, default is 0 }] \
        [compareto {PDB_file_for_comparison}] \
        [parfile {parfilename_if_only_parfile_is_to_be_created}] \

- Optional command line arguments are given in square brackets.

- Possible combinations of MTZ labels are:

  For start from phases: fp/sigfp/phibest/fom or fbest/sigfp/phibest to build the initial free-atoms model, and fp/sigfp to refine the model. If 'fbest' is given, 'fom' will be ignored.

  For start from a model: fp/sigfp to refine the model.

- All input files are assumed to be located in the working directory unless they are given with the full path.

- If workdir is not given, the current directory will be assumed.

- All output files will be written into workdir/subdirectory.

Additional useful tips:

- Normally the job runs in a subdirectory called YYYYMMDD_HHMMSS. To run the job in the current directory use: auto_tracing.sh jobId '.'

- If you invoke auto_tracing.sh from another script and the keywords with a double-word argument are not properly understood, e.g. resol '20.0 2.5', try resol 20.0;2.5

- If you have a par file from an earlier version of ARP/wARP and would like to re-run that job now, use: auto_tracing.sh defaults OLD_PAR_FILE. This will create a par file compatible with the current ARP/wARP version, and the keywords that are new to OLD_PAR_FILE will take their default values.

Required keyword is: datafile (followed by the mtz-file name with the full path).

Optional keywords include: residues (followed by the number of residues), workdir (followed by the absolute path to the working directory), fp (followed by the fp label), sigfp (followed by the sigfp label), freelabin (followed by the Rfree label), fbest (followed by the label for the fom-weighted structure factor amplitudes to be used for the initial map calculation), phibest (followed by the best phi label), fom (followed by the figure of merit label), modelin (followed by a starting pdb-file with the full path), seqin (followed by a sequence-file name with the full path), cgr (followed by the number of NCS-related copies), buildingcycles (followed by the number of building cycles), resol (followed by the resolution limits), albe (followed by the flag to enable or disable helix/strands building), and similarly restraints, twin and sad. There are additional parameters that can be customised, and an experienced user should have no problem in figuring out how to do this. Alternatively, please contact the ARP/wARP developers for advice.

If auto_tracing.sh is called with an option ‘parfile’, the script will create a parameter file and a directory in the workdir whose name will be printed. The job can subsequently be launched by:

% $warpbin/warp_tracing.sh NAME_OF_PARFILE

If auto_tracing.sh is called without an option ‘parfile’, it will also launch the job. The log files and additional output files as well as the building results can be found in the directory created.
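When auto_tracing.sh is driven from another program, it can help to assemble the keyword/value pairs as a list rather than as one long string. The Python sketch below does this using only the keywords that appear in the usage text above; the function name auto_tracing_argv, the ordering and the checks are choices made for this illustration and are not part of ARP/wARP. The script itself is not run here.

```python
# Keywords taken from the usage text of auto_tracing.sh printed above.
# 'wavelength' and 'heavyin' are the inputs named in the description of 'sad'.
KNOWN_KEYWORDS = (
    "datafile", "residues", "workdir", "fp", "sigfp", "freelabin", "fbest",
    "phibest", "fom", "modelin", "seqin", "cgr", "buildingcycles", "resol",
    "albe", "restraints", "twin", "sad", "wavelength", "heavyin",
    "compareto", "parfile",
)


def auto_tracing_argv(warpbin, **options):
    """Build an argument list for auto_tracing.sh from keyword/value pairs."""
    if "datafile" not in options:
        raise ValueError("datafile is the one required keyword")
    unknown = sorted(set(options) - set(KNOWN_KEYWORDS))
    if unknown:
        raise ValueError("keywords not in the usage text: " + ", ".join(unknown))
    argv = [warpbin.rstrip("/") + "/auto_tracing.sh"]
    # Emit the keywords in the order of the usage text for readability.
    for keyword in KNOWN_KEYWORDS:
        if keyword in options:
            argv += [keyword, str(options[keyword])]
    return argv


if __name__ == "__main__":
    # The paths and labels below are placeholders, not files shipped with ARP/wARP.
    print(auto_tracing_argv(
        "/path/to/warpbin",
        datafile="/data/mydata.mtz",
        seqin="/data/myprotein.pir",
        cgr=2,
        resol="20.0 2.5",
    ))
```

Passing such a list to Python's subprocess.run without a shell keeps a two-number value such as resol '20.0 2.5' together as a single argument, which avoids the quoting problem mentioned in the tips above.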

Remote submission of a model building task

This option offers you the following possibilities:

a) Your model building will run using external computational facilities, where the CPU performance may be superior to your local installation.

b) You can be assured that the most recent working executables will be used, should you have a problem with your local installation.

c) Should the task crash, an automatic notification will be forwarded to the ARP/wARP developers who can then promptly help you (unless you have declared your task to be confidential, see below).

d) If you wish, you can share the results of the completed task with software developers.

Submitting from the GUI.

Clicking on the button with "Submit the job for remote execution at the Hamburg cluster" within the main ARP/wARP Classic GUI panel allows one to execute an autotracing task remotely. The panel will expand and ask for an email address to be provided. Please also choose one of the options from the drop down menu to indicate how you would like your data to be handled.

The options are:

a) the data must be kept confidential and deleted after the job has finished

b) the data can be made available to ARP/wARP-AutoRickshaw-Refmac developers

c) the data can be archived and made available to any software developer that requests them (this is the default)

Option (b) restricts data sharing to the ARP/wARP, Auto-Rickshaw and Refmac development teams. Option (c) extends the sharing to anyone who requests the data. In the case of option (a) only the short log file, the Wilson log files and the parameter file will be kept by the ARP/wARP developers; all other data and log files will be automatically deleted after the job has finished.

Once the job has been submitted for remote execution, the GUI window will indicate that the job has finished. Please inspect the log file from the pull-down menu option "View files from job" for further instructions. An email will be sent to you at the email address that you entered in the GUI window. Please follow the instructions in the email (http link, login and password) to connect to the Hamburg cluster. You can then monitor the log file in your browser window. As soon as the job is finished, you will be provided with a link to the results that you can then download. Keep in mind that once the job is finished, your data will be kept for one week only. Make sure that you download your data within that time.

The remote job submission relies on the curl software installed at your site. Availability of curl is checked while installing ARP/wARP and a warning is given if curl is not available.
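If you are unsure whether curl is visible from your environment, you can check for it yourself before submitting. The following Python sketch uses the standard library function shutil.which; the function name curl_available is invented for this illustration, and the check is independent of the one performed by install.sh.

```python
import shutil


def curl_available():
    """Return True if an executable named 'curl' is found on the PATH."""
    return shutil.which("curl") is not None


if __name__ == "__main__":
    if curl_available():
        print("curl found")
    else:
        print("curl not found: remote submission from the GUI is unlikely to work")
```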

Submitting from a web browser.

  1. Navigate your browser to http://cluster.embl-hamburg.de/ARPwARP/remote-http.html or choose ‘Model building via the web’ from http://www.arp-warp.org

  2. View the Disclaimer as well as the ARP/wARP and the CCP4 licensing conditions.

  3. Proceed with the remote services to Step One

  4. Choose the model building protocol (start from experimental phases or existing model).

  5. Enter the email address to which instructions on how to view the results will be sent.

  6. Provide your MTZ file by using the ‘Browse’ button, the file must have an extension ‘.mtz’.

  7. Click ‘Proceed to Step Two’.

  8. Enter the starting model (unless you have chosen a protocol to start from experimental phases).

  9. Enter the total number of residues and the number of chemically identical molecules in the asymmetric unit. Please make sure you enter these two numbers correctly. If, for example, the asymmetric unit contains a dimer with each subunit having 50 residues, then you enter 100 and 2, respectively.

  10. Enter MTZ labels. FP and SIGFP are compulsory for model building starting from the existing model. PHI is additionally needed (and FOM is optional) for start from experimental phases.

  11. Click on ‘I agree to cite the required references and would like to proceed with ARP/wARP remote services’. This uploads the files to the cluster in Hamburg, launches the job and, after a delay of a few minutes, sends you an email with instructions for viewing.

  12. Please follow the instructions in the email (http link, login and password) to connect to the Hamburg cluster. You can then monitor the log file in your browser window. As soon as the job is finished, you will be provided with a link to the results that you can then download.

Keep in mind that once the job is finished, your data will be kept for one week only. Make sure that you download your data within that time.
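Before uploading, it can save a round trip to check the few formal requirements named in the steps above: the ‘.mtz’ extension and the labels needed for the chosen protocol. The Python sketch below is a local convenience check only; the names REQUIRED_LABELS and check_web_input and the protocol keys "model" and "phases" are invented for this illustration and do not correspond to anything on the web form.

```python
# Labels required for each protocol, as stated in the steps above.
# FOM is optional for the start from experimental phases and is not checked.
REQUIRED_LABELS = {
    "model": ("FP", "SIGFP"),
    "phases": ("FP", "SIGFP", "PHI"),
}


def check_web_input(protocol, mtz_name, labels):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    if not mtz_name.endswith(".mtz"):
        problems.append("file name must end in .mtz")
    for role in REQUIRED_LABELS[protocol]:
        if not labels.get(role):
            problems.append("missing label for " + role)
    return problems


if __name__ == "__main__":
    # Hypothetical file name and column labels.
    print(check_web_input("phases", "mydata.mtz", {"FP": "F", "SIGFP": "SIGF"}))
```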

OUTPUT files, short log file:

      o Checking the estimated content – Should the solvent content be too high or too low (e.g. because you have mis-typed the total number of residues expected in the AU), ARP/wARP will reset it to approximately 50%. The target number of residues will be reset accordingly.

      o Checking the provided sequence file – Should the sequence length, the number of molecules in the AU and the total number of residues in the AU not match each other, the number of molecules in the AU will be reset accordingly. Should the sequence file not be interpretable (e.g. because it contains unexpected characters), an error message will be given.

      o Input MTZ file – We have observed that MTZ files sometimes do not have proper headers, e.g. a non-standard space group name or a zero space group number. ARP/wARP always uses the CAD programme to fix the header, thus the MTZ file may have the extension .mtz.cad

      o Space group number – ARP/wARP version 7.1 supports all standard non-centrosymmetric space groups, P1bar and several non-standard space groups (e.g. 1017 or 2017). The space group is determined solely from the symmetry operators stored in the MTZ file header.

      o Input files – The ASCII files (sequence, input PDB or input file with heavy atoms) are always converted to Unix line feeds, thus they have the extension _lf.

      o Had to go as low as XXX sigma to complete atoms search – The initial free-atoms model (in case you started from experimental phases) is built into the starting density map. The density threshold is successively reduced. A typical value that you can see in the log file is 0.6 to 0.8 sigma. A lower value may indicate an overly flattened map or an overestimation of the number of residues in the asymmetric unit. If you suspect the latter, please check the derived solvent content.

      o Checking whether input PDB contains ligands – This check comes up if an initial model is available. Should the model contain ligands unknown to the Refmac library, they are renamed to free DUM atoms. This should not affect the model building performance, but a warning is printed.

      o R factor after Refmac before model building – If an initial model is available, a number of restrained refinement cycles with Refmac are carried out until the R factor converges.

      o Building cycle zero – Normally one should expect a considerable part of the structure to be built already in the starting building cycle. If this is not the case, observe the situation for a few further building cycles. If, however, essentially nothing is autotraced in the further building cycles, please check whether the initial phases are sufficiently good.

      o Search for helices and strands – The module for building helical and beta-stranded fragments is invoked if requested, or by default with data at 2.7 Å resolution or lower. The number of built helical/stranded residues and chain fragments is printed.

      o Rounds within building cycle – Each cycle of the main chain tracing is carried out in several rounds. Normally each successive round should result in more residues and in fewer fragments. The maximum length of the traced fragment and the score of the model building are also printed for information.

      o Chains, residues and estimated correctness of the model – The output from the best tracing round is processed further. Fragments of 4 residues or shorter are converted to free atoms. In addition, the terminal residues of the fragments are removed. The rest is kept and used to provide restraints for subsequent ARP/REFMAC cycles. The value of the estimated correctness of the model should steadily approach 100% if the tracing is successful.

      o Residues docked into sequence – If the sequence is provided, the autotraced fragments are docked into it and the side chains are built and refined in real space. The results of this are printed out. If the sequence is not provided, only side chain guesses (GLY/ALA/SER/VAL) are built and refined.

    1. o Loop building – This is invoked if the sequence is available and if the tracing score is above 0.85. It is also invoked after the last building cycle.

  14. o R factor after Refmac during the iterations -The value of the R factor typically oscillates. It goes up after each tracing cycle (because the model is entirely rebuilt) and then decreases during the ARP/REFMAC refinement and update cycles. At the end of the procedure it should reach a value typical for a restrained refinement.

  15. o Sequence coverage -If the sequence is provided, the ratio of the number of docked residues to the total number of traced residues is printed. A value higher than 0.8 is deemed as good convergence. All free atoms are then removed from the file and the task is directed into a few cycles of restrained refinement with solvent search. If, however, the value of sequence coverage is lower than 0.8, the free atoms (DUM) are left in the file. You can inspect the density maps, start changing the model on the graphics or, alternatively, submit another model building task using the output of this job.

    1. o Job termination – The statement Task completed successfully indicates that the job has finished with no error. An error statement QUITTING … ARP/wARP module stopped with an error message: name_of_the_programme indicates that one of the modules of the task has terminated with an error message. Please refer to the specified log file.

  16. o CPU requirements – The execution of an autotracing task is time consuming. Using a standard protocol of 10 building cycles interspaced with 5 ARP/REFMAC cycles, one should expect a job for a structure of 500 residues to be completed within about 1 hour (subject to the power of the computer you are using).
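
The sequence coverage check described in the list above can be expressed in a few lines. This is only an illustrative sketch: the function names are hypothetical and are not part of ARP/wARP, and the behaviour at a coverage of exactly 0.8 is an assumption, since the guide only describes values higher or lower than 0.8.

```python
def sequence_coverage(docked_residues: int, traced_residues: int) -> float:
    """Ratio of docked residues to the total number of traced residues."""
    if traced_residues <= 0:
        raise ValueError("traced_residues must be positive")
    return docked_residues / traced_residues


def follow_up(coverage: float, threshold: float = 0.8) -> str:
    """Return the follow-up described in the guide for a given coverage.

    Assumption: a coverage of exactly `threshold` is treated as not converged.
    """
    if coverage > threshold:
        return "remove free atoms; restrained refinement with solvent search"
    return "keep free (DUM) atoms; inspect maps or submit another task"


coverage = sequence_coverage(docked_residues=450, traced_residues=500)
print(coverage, "->", follow_up(coverage))
```

The numbers needed as input are the ones printed in the short log file under Residues docked into sequence and Chains, residues and estimated correctness of the model.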

Running model building from the GUI, ARP/wARP Expert System

This module remains experimental. It has the same aims as ARP/wARP Classic: to automatically build protein structures, starting either from molecular replacement models or from experimental electron density maps. The way information is presented to the user and some refinement choices differ from the 'Classic' module, but the functionality is similar.

A main difference is that when a model is 'more or less complete', this module will use the typically available second CPU core to start a new job that cleans up the model, adds waters, refines it and makes it available to the user. In parallel, the old job will continue to see whether it can find a better solution with more residues, but the user does not need to wait for that to finish.

Another difference concerns the Sequence file. If you have hetero-multimers in the asymmetric unit of your crystals, you should add each sequence separately, by clicking the Add Input PIR file button. Then, you can define any stoichiometry for complicated hetero-multimers. For each defined sequence the user can select from a pull-down menu the number of copies in the asymmetric unit. Based on that and the contents of the PIR file the contents of the AU in residues will be calculated automatically.

The input files are identical to the ARP/wARP Classic module.

There is a dedicated option to declare that the methionines are Se-Met residues if the dataset comes from a SAD or MAD experiment on a selenium edge. The SAD and TWIN functions, new in 7.1, are implemented in this module as well.

The Decision parameters are where the innovative choices for controlling ARP/wARP are given. The numbers of refinement and building cycles are not fixed, but are defined on the fly based on the program’s progress. The Decision parameters define these limits. If you leave the mouse over one of the input fields, a help text will appear explaining the use of that decision parameter.

The parameter maximum number of processes in parallel is important and is briefly explained below. When Flex-wARP decides that it has reached a more-or-less useful model, it will spawn a 'cleaning up and completion' process. However it will continue the iterative building in parallel. If the iterative building results in a better model, a new 'cleaning up and completion' process will be requested, possibly before the previous 'cleaning up and completion' process has finished. If you have only two processors (typical these days in dual core systems) the new process will be 'queued'; when the previous one is finished the new one will start.
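
The queueing behaviour described above can be pictured with a small model. This is a hypothetical sketch and not the Flex-wARP implementation: the class and method names are invented for illustration, and it assumes that the iterative building job itself occupies one of the allowed processes.

```python
from collections import deque


class CleanupScheduler:
    """Toy model of 'cleaning up and completion' jobs sharing limited CPUs."""

    def __init__(self, max_processes: int):
        # Assumption: the iterative building job always takes one process.
        self.cleanup_slots = max_processes - 1
        self.running = []
        self.queued = deque()

    def request(self, model_name: str) -> str:
        """Ask for a clean-up job for a newly built, better model."""
        if len(self.running) < self.cleanup_slots:
            self.running.append(model_name)
            return "started"
        self.queued.append(model_name)
        return "queued"

    def finish(self, model_name: str):
        """Mark a clean-up job as finished and start the next queued one."""
        self.running.remove(model_name)
        if self.queued:
            next_model = self.queued.popleft()
            self.running.append(next_model)
            return next_model
        return None


scheduler = CleanupScheduler(max_processes=2)
print(scheduler.request("model_1"))  # first clean-up job can start at once
print(scheduler.request("model_2"))  # no free process on a dual-core machine
print(scheduler.finish("model_1"))   # the queued job is started next
```

With a larger value of the parameter, the second request in this example would start immediately instead of waiting.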

We emphasise that the Expert System module is still ‘experimental’. We hope that in the future we will be able to offer more tricks and tips.

Running flex-wARP from command line, CAutoPyWARP.pyc

To get on-line help, please type:

python $pywarpbin/CAutoPyWARP.pyc --help

Automated Construction of Helical and Beta-Stranded Fragments

Building secondary structure from the GUI, ARP/wARP Quick Fold

The procedure for building secondary structural elements in ARP/wARP version 7.1 is based on the use of discriminant analysis in a successive filtering scheme taking into account the geometry of alpha-helical and beta-stranded main-chain fragments. The electron density map is first analysed and a suitable threshold is automatically selected. In the next step stereochemical information on the helix and strand geometry is used; sets of overlapping fragments are constructed and filtered based on their geometric likelihood. All fragments that overlap at a particular location of a helix or a strand undergo an ensemble averaging process to provide the best estimate of CA positions. The output fragments are then regularised and the chain direction is chosen on the basis of their fit to the density. Finally the fragments are refined in real space.
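
The ensemble averaging step can be illustrated as follows. This is a minimal sketch under stated assumptions: each fragment is represented as a mapping from a position index along the helix or strand to a CA coordinate, and a plain unweighted mean is used. The actual weighting scheme in ARP/wARP is not described here, and the function name is hypothetical.

```python
from collections import defaultdict


def average_ca_positions(fragments):
    """Average the CA coordinates proposed by overlapping fragments.

    fragments: list of dicts {position_index: (x, y, z)}.
    Returns {position_index: (x, y, z)} with the unweighted mean per index.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for fragment in fragments:
        for index, (x, y, z) in fragment.items():
            acc = sums[index]
            acc[0] += x
            acc[1] += y
            acc[2] += z
            acc[3] += 1
    return {
        index: (sx / n, sy / n, sz / n)
        for index, (sx, sy, sz, n) in sorted(sums.items())
    }


# Two fragments that overlap at position 1.
fragments = [
    {0: (0.0, 0.0, 0.0), 1: (2.0, 0.0, 0.0)},
    {1: (4.0, 2.0, 0.0), 2: (6.0, 0.0, 0.0)},
]
print(average_ca_positions(fragments))
```

Positions covered by a single fragment are returned unchanged, while overlapping positions are replaced by the mean of all contributing fragments.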

The accuracy of the resulting model depends on many parameters. The module should be able to build helices and strands at resolutions as low as 4.5 Å. However, it may not produce a complete helical/stranded structure, and the result may contain parts that are mis-interpreted. The expected top performance is the correct location of 90% of the helices and 50% of the strands. The procedure is relatively fast and takes only a few minutes for proteins of moderate size (up to 500 residues).

The secondary structure recognition module is optimised to address lower resolution data and hard cases where, e.g. the straight model building protocol (Classic or flex-wARP) has not been successful. For a resolution higher than 2.6 Å the module will automatically trim the resolution and Wilson B-factor of the data to approach its design conditions.

  • Launch ARP/wARP Quick Fold window within the CCP4i GUI.

    • Provide required input:

      1. o MTZ in -X-ray data in the MTZ format containing structure factor amplitudes and their standard deviations, phases and foms.

      2. o Fobs Sigma Phib FOM -If the MTZ column labels for structure factor amplitudes, their standard deviations, phases and figures of merit have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose appropriate ones.

      3. o Output PDB file -Provide the PDB file name where the constructed secondary structure fragments will be output to.

Parameters:

  1. o Number of residues -Provide the expected number of residues in the asymmetric unit. This should at least be a good guess within ±20% of the true number. If the number is too low, the model completeness may be lower. If the number is too high, this may result in tracing mistakes and excessive CPU time.

  2. o Do NOT build beta-strands -If you have real doubts about your structure having a fold with a significant content of beta-strands, you can deactivate their construction by checking the box.

• Now you are ready to start the job: Click on Run and choose Run now

There are a number of additional parameters that you normally should not need to worry about. A brief description is given below.

    • Crystal parameters:

      1. o Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content are derived automatically from the MTZ file and the total number of residues in the asymmetric unit. They are displayed for information only and cannot be changed. However, you may want to check whether their values conform to your expectations.

      2. o Resolution -By default all data present in the MTZ file will be used. You can check the box and then narrow the range if you are aware of certain deficiencies of your data.

    • Coordinate comparison:

    • o Compare with an already deposited protein for validation or testing -If you have the final model and would like to check the installation and the performance of the software, you can check this box. You will then have to provide a PDB file that will be used for comparison.
    • OUTPUT files, short Log File:

        1. o Checking the estimated content – Should the solvent content be too high or too low (e.g. you have mis-typed the total number of residues expected in the AU), ARP/wARP will re-set it to approximately 50%. The target number of residues will be reset accordingly.

      1. o Residues and chain fragments -The important numbers are highlighted in red/bold in the short log file, indicating the number of residues and the number of fragments into which these residues are arranged. The higher the values of the Connectivity index and the Tracing score, the more complete and reliable the resulting model is. The length of the longest chain is also printed.

      2. o Further extension of the model – You may try to feed the PDB output of the module into Classic or flex-wARP. However, subject to the resolution of the data, this may not provide enough seed for subsequent automatic tracing of the full chain.

      3. o Job termination – The statement Task completed successfully indicates that the job has finished with no error. An error statement QUITTING … ARP/wARP module stopped with an error message: name_of_the_program indicates that one of the modules of the task has terminated with an error message. Please refer to the specified log file.
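
When several jobs are run in a batch, the two termination statements described above can be checked automatically. The sketch below only searches for the phrases quoted in this guide; the function name is hypothetical, and the exact layout of the log file is an assumption.

```python
def job_status(log_text: str) -> str:
    """Classify a short log file by the termination statements in the guide.

    Assumption: the statements appear verbatim somewhere in the log text.
    """
    if "QUITTING" in log_text:
        return "error"
    if "Task completed successfully" in log_text:
        return "success"
    return "unknown"


example_log = "Building cycle 0\nTask completed successfully\n"
print(job_status(example_log))
```

A result of "unknown" would typically mean that the job is still running or was interrupted before either statement was written.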

Building secondary structure from the command line, auto_albe.sh

The script auto_albe.sh (where ‘albe’ stands for alpha-beta) in the $warpbin directory allows you to run the secondary structure building as a single-line command without the use of the GUI. The use of auto_albe.sh is fairly simple. The script prints out help information if it is invoked without arguments.

Usage:

$warpbin/auto_albe.sh \
    datafile {mtzfile} \
    [residues {number_of_residues_in_AU}] \
    [workdir {FULLPATH_WORKING_DIRECTORY}] \
    [helixfileout {output_PDB_file}] \
    [jobId {desired_job_id_used_for_subdirectory_naming}] \
    [fp {fp label} sigfp {sigfp label} phib {phi label}] \
    [fom {fom label}] (input 'fom none' if no fom is to be used) \
    [compareto {PDB_file_for_comparison}] \
    [nostrands {0 or 1, default=0}] \
    [parfile {parfilename_if_only_parfile_is_to_be_created}]

- Optional command line arguments are given in square brackets
- All input files are assumed to be located in the working directory unless they are given with full path
- If workdir is not given, the current directory will be assumed
- All output files will be written into workdir/subdirectory

Required keyword is: datafile (followed by the mtz-file name with the full path).

Optional keywords include: residues (the expected number of residues in the asymmetric unit), workdir (followed by the full path to the working directory), helixfileout (the name of the PDB file to which the traced helical and stranded fragments will be written), jobId (if you wish the working sub-directory to have a particular name), fp (followed by the fp label), sigfp (followed by the sigfp label), phib (followed by the phibest label) and fom (followed by the fom label). The defaults for the four labels are FP, SIGFP, PHI and FOM, respectively. Alternatively, if the mtz file contains only one column for structure factor amplitudes and only one column for their standard deviations, these will be taken. If you wish FOM not to be used, please input ‘fom none’. For test purposes, the constructed helices/strands can be compared to known reference models (hand- or pre-fitted). The required keyword is compareto (followed by the full-path name of a PDB file). You can also enable or disable the construction of strands using the keyword nostrands; the default is 0 (build the strands).

If auto_albe.sh is called with an option ‘parfile’, the script will create a parameter file and a directory in the workdir whose name will be printed. The job can subsequently be launched by:

% $warpbin/warp_albe.sh NAME_OF_PARFILE

If auto_albe.sh is called without an option ‘parfile’, it will also launch the job. The log files and additional output files as well as the building results can be found in the directory created.
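
For scripted use, the keyword/value command line described above can be assembled programmatically. The following is a sketch only: the helper name and the example paths are hypothetical, and only the keywords listed in the usage text are accepted.

```python
ALBE_KEYWORDS = (
    "residues", "workdir", "helixfileout", "jobId", "fp", "sigfp",
    "phib", "fom", "compareto", "nostrands", "parfile",
)


def build_auto_albe_command(warpbin, datafile, **options):
    """Return an argument list for auto_albe.sh.

    warpbin  : directory holding the ARP/wARP scripts ($warpbin)
    datafile : MTZ file name with full path (the only required keyword)
    options  : optional keywords from the usage text, e.g. residues=300
    """
    unknown = set(options) - set(ALBE_KEYWORDS)
    if unknown:
        raise ValueError(f"unknown keyword(s): {sorted(unknown)}")
    command = [f"{warpbin}/auto_albe.sh", "datafile", str(datafile)]
    for keyword in ALBE_KEYWORDS:
        value = options.get(keyword)
        if value is not None:
            command += [keyword, str(value)]
    return command


# Hypothetical paths, for illustration only.
command = build_auto_albe_command(
    "/opt/arpwarp/bin", "/data/phased.mtz",
    residues=300, fom="none", nostrands=1,
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run from the Python standard library; giving a parfile value makes the script create only the parameter file, as described above.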

Automated Side Chain Building

Running side chain building from the GUI, ARP/wARP Side Chains

This side-chain building module can be used either to try and improve the density fit of existing side chains, or in the absence of side chains to perform side-chain docking and build the side chains from scratch:

  1. In the first mode, the amino-acid type from the PDB is used for each residue to determine the best fitting rotamer and to refine this rotamer in real-space. When the resulting side-chain conformer fits the density better than the original side chain (using a real space density correlation function as the criterion for the fit), the new side chain conformer will replace the old version in the PDB.

  2. In the second mode, no knowledge on the amino acid type of the residues is assumed, but the sequence is read from the sequence file(s). This mode docks the main-chain fragments into sequence. The side chains docking can be performed using free atoms or also by using the electron density only (slower but may provide better results for low resolution data). The module will then build rotamers and refine them in real space by torsional refinement if the user chooses to do so.
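
The acceptance rule of the first mode (a new conformer replaces the old one only if it fits the density better) can be sketched as below. This is an assumption-laden illustration: a plain Pearson correlation between observed and model-calculated density samples stands in for the real space density correlation function, whose exact form is not given in this guide, and the function names are hypothetical.

```python
import math


def density_correlation(observed, calculated):
    """Pearson correlation between two equally long lists of density values."""
    n = len(observed)
    if n != len(calculated) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0 or var_c == 0:
        return 0.0  # flat density carries no information about the fit
    return cov / math.sqrt(var_o * var_c)


def keep_new_conformer(cc_old: float, cc_new: float) -> bool:
    """Replace the side chain only if the new conformer fits strictly better."""
    return cc_new > cc_old


print(keep_new_conformer(cc_old=0.71, cc_new=0.78))
```

Because the rule is a strict comparison, a side chain that already fits well is left untouched when the refined rotamer gives no improvement.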

    • Launch the ARP/wARP SNOW window within the CCP4i GUI

      1. o Perform ... Select the mode in which you would like to run the program. The default is to rebuild the side chains, but you can also choose for sequence docking based on free atoms, density or a combination of the two.

      2. o Using ... Select whether the map will be input or will have to be calculated from an MTZ file.

    • Provide required input:

      1. o MTZ in -X-ray data in the MTZ format containing structure factor amplitudes and their standard deviations.

      2. o Fmap PHImap – If the MTZ column labels for structure factor amplitudes and their standard deviations have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose appropriate ones.

      3. o FOM -Optional. Set this value, when ‘Use figure of merit for weighting amplitudes’ was selected

      4. o PDB in -Provide the PDB file with coordinates of the protein. For sequence docking, this might only contain poly-GLY fragments. When rebuilding the side chains, the PDB should be as complete as possible.

      5. o Sequence file -Provide the sequence file(s) of the protein structure. Only needed for sequence docking.

      6. o PDB out -Provide the name of the PDB file where the protein model together with the (re)built side chains will be written to.

  • Click on Run and choose Run now

There are a number of options that can be added. A brief description is given below.

    • Actions (side-chain rebuilding):

    • o Density fit -Select the method to compute the density fit during rotamer fitting and real-space torsional refinement. Note: ‘weighted means’ is implemented in a computationally efficient way and is very fast, while the correlation coefficient is a bit slower, but makes it easier to compare results for different parts of the protein structure.
    • Actions (sequence docking):

      1. o Sequence multiplicity -Provide the number of molecules in the asymmetric unit.

      2. o Methionine -Indicate whether Methionine is in fact Seleno-Methionine.

      3. o Build Rotamers -Indicate whether only the best rotamer should be found, or whether torsional refinement should be run afterwards.

      4. o Figure of merit -Indicate whether to use the figure of merit for weighting amplitudes.

      5. o Build free atoms -Indicate whether to place free atoms into the density map. Only use when free atoms are used to create the side-chains instead of the density fit alone.

      6. o Rearrange fragments -Select to rearrange the fragments in space to get a globular molecule

      7. o Accept fragments -Use this threshold to determine whether a docked fragment can be trusted.

    • Crystal parameters:

      1. o Cell -Set the cell parameters of the crystal

      2. o ARP/wARP asymmetric unit -This asymmetric unit is needed for building free atoms.

Automated Loop Building

Running loop building from the GUI, ARP/wARP Loops

This module tries to find likely loops to connect fragments of a partial protein structure based on the sequence and the density map. It builds the loops in three phases. First a tree of possible CAs between the fragments is built, next the unlikely ones are removed and the rest of the main chain atoms are determined, and finally the best loops are selected. The tree can be built either towards the C-terminus or the N-terminus of the protein, or both. The built loops are ordered (in descending order) according to the density correlation at the main chain atoms (including CB if present), the correlation of the side chains, or a combination of both. If the number of loops exceeds the chosen number, only the best are saved to file.
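
The final ordering and truncation step can be illustrated with a short sketch. It assumes that the combined score is a simple weighted sum of the main chain and side chain correlations; the function name, the tuple layout and the example values are hypothetical.

```python
def rank_loops(loops, main_weight=1.0, side_weight=0.0, max_loops=None):
    """Order candidate loops by a weighted density correlation, best first.

    loops     : list of (name, main_chain_cc, side_chain_cc) tuples
    max_loops : keep only this many loops (None keeps all)
    """
    ranked = sorted(
        loops,
        key=lambda loop: main_weight * loop[1] + side_weight * loop[2],
        reverse=True,
    )
    return ranked if max_loops is None else ranked[:max_loops]


candidates = [("a", 0.6, 0.9), ("b", 0.8, 0.2), ("c", 0.7, 0.7)]
# Main chain only, keep the two best loops.
print([name for name, _, _ in rank_loops(candidates, 1.0, 0.0, max_loops=2)])
# Equal weights for main and side chain atoms.
print([name for name, _, _ in rank_loops(candidates, 0.5, 0.5)])
```

The example shows how the choice of weights can change which loop is ranked first.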

  • Launch the ARP/wARP Loops window within the CCP4i GUI

    • Provide required input:

      1. o Building loops -Select whether to start from a map or an mtz file.

      2. o Mode loop building -Select whether to try to build all loops in the PDB file (a sequence file will be needed) or to build a specific loop

        1. o Number of loops – In case of building a single loop, select the number of loops you would like the program to save. It might very well be that the number of loops left after pruning is less than this number. If no loops are found at all, play with the parameters, specifically those in the folder "Selecting best CAs".

      3. o MTZ in -X-ray data in the MTZ format containing structure factor amplitudes and their standard deviations.

      4. o Fmap PHImap – If the MTZ column labels for structure factor amplitudes and their standard deviations have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose appropriate ones.

      5. o Protein model for loop building -Provide the PDB file with coordinates of the protein. Note that the module will only attempt to build missing loops and will not rebuild any of the existing residues.

      6. o New loops output file -Provide the name of the PDB file where the built loops will be written to.

      7. o Protein and new loops combined output -Provide the name of the PDB file where the protein model together with the built loops will be written to.

  • Click on Run and choose Run now

There are a number of options that can be added. A brief description is given below.

    • Definition of loop:

      1. o Build a loop -Provide anchor residues of a fragment on the N and the C terminus side of the protein. If you want to rebuild some terminal residues, you need to remove them from the input PDB file. Provide the length of the loop including the two anchor points.

      2. o The loop sequence -Provide amino acid sequence (one letter code) of the residues in the loop including the two anchor points.

      3. o Loop building in both directions -If selected (default), trees of possible loops are generated starting from the N terminus anchor as well as from the C terminus anchor. The best loops are selected from the combined set. If you do not choose to build both ways, you can indicate whether you want to build the tree towards the C terminus of the protein, or towards the N terminus.

    • Selecting best loops:

      1. o Deviation distance loop connection -Set the allowed error in the CA-CA distance.

      2. o CA density correlation threshold -This number sets the number of best loops kept based on the density correlation of the CAs only.

      3. o Structural threshold -Set the threshold for the minimal value for the log likelihood of this structure. Set the minimum value, if you want to ensure to keep at least a certain number of loops after pruning. Set the maximum value, if you want to ensure that the number of loops doesn't exceed a certain amount after structural pruning.

      4. o Main chain density correlation -This parameter sets the number of best loops kept.

      5. o Selecting the best loop -The two numbers define the weights for the main and side chain atoms, respectively.

    • Selecting best CAs:

      1. o Likelihood threshold -This is the threshold for a CA to represent the fifth CA of a penta-peptide, based on density correlation, CA-CA distance and structure.

      2. o Minimum distance CAs -Measures the minimal distance between CAs from the same shell. The CA with the best likelihood is kept.

    • Generating CAs:

      1. o Select generation CA shell -By default a shell with a uniform and regular distribution of CAs at exactly CA-CA distance is generated. You can also choose for a uniform and random distribution of the CAs. In that case the shell is generated with a given thickness.

      2. o Number of CAs -Number of CAs generated within a shell.

      3. o CA-CA distance -Distance to use between successive CAs.

      4. o Keep CAs with negative density halfway -Default for this option is false.

    • Density handling:

    • o Interpolation method -Choose the method used to determine the density correlation from the map. The quick, but less accurate option is Cubic Interpolation. More accurate, but significantly slower is Best, which is based on a Gaussian density correlation function to simulate the shape of an atom. The difference in execution speed between the options can be large. The side chain correlation at the end is always determined using the Gaussian function.
    • Crystal parameters:

    • o Space group and Cell are derived automatically from the MTZ and the PDB files, displayed for information only and cannot be changed. However, you may want to check whether their values conform to your expectations.
    • Log files of Loopy:

      1. o Message level -Choose a value between 0 and 9, the default is 4

      2. o Abort level -If a message at this level is encountered, the module will abort. The default value is 8.

      3. o Message file -Name for the message file (plain text).

      4. o XML output file -Name for the XML message file (xml format).

Automated Building of Poly-Nucleotides

Running nucleotide building from the GUI, ARP/wARP DNA/RNA

This is a prototype module for building fragments of DNA or RNA. The input is an MTZ file containing the phases from which the map best describing the nucleotide region can be computed. Thus the map could be a difference map (e.g. after the protein model is completed) or a sigma-weighted map for the whole asymmetric unit. The nucleotide building procedure within ARP/wARP Version 7.1 proceeds in several steps: first it locates putative phosphates in the density map, then uses them in a manner analogous to the CA-candidates for protein chain tracing. After the nucleotide fragments are obtained, a likely base is built and refined in real space. The type of the base is currently limited to A (large) or C (small) and the nucleotide sequence is not yet used.

The produced poly-nucleotides are quite accurate: a typical rmsd for the built backbone atoms is 0.6 Å with X-ray data extending to around 3.0 Å resolution. The method is not sensitive to a particular DNA or RNA conformation. The module is not very CPU efficient and may take about 10 minutes for a 20-nucleotide structure.

  • Launch the ARP/wARP DNA/RNA window within the CCP4i GUI

    • Provide required input:

      1. o MTZ in – X-ray data in the MTZ format containing structure factor amplitudes and their standard deviations.

      2. o Fobs Sigma PHIB FOM – If the MTZ column labels for structure factor amplitudes and their standard deviations have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose appropriate ones. FOM is optional and can be omitted if Fobs are already FOM-weighted.

      3. o Output PDB file -Provide the PDB file name where the constructed polynucleotide fragments will be output to.

  • Click on Run and choose Run now

There are a number of options that can be added. A brief description is given below.

  1. o Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content are derived automatically from the MTZ file and the total number of residues in the asymmetric unit. They are displayed for information only and cannot be changed. However, you may want to check whether their values conform to your expectations. Obviously, if you entered zeros as the expected number of residues and nucleotides, the solvent content will be displayed as 1.0 but you should not worry about this.

  2. o Resolution -By default all reflections present in the MTZ file will be used. You can check the box (Use reflections between) and then narrow the range if you are aware of certain deficiencies of your data.

OUTPUT files, short Log File:

    1. o Checking the estimated content – Should the solvent content be too high or too low (e.g. you have mis-typed the total number of residues expected in the AU), ARP/wARP will re-set it to approximately 50%. The target number of residues will be reset accordingly.

  1. o Phosphate candidates – The identified number of phosphate candidates is typically 100 times higher than the number of nucleotides in the structure.

  2. o Nucleotides and chain fragments -The important numbers are highlighted in red/bold in the short log file, indicating the number of nucleotides and the number of fragments into which these residues are arranged. The length of the longest chain is also printed.

  3. o Job termination – The statement Task completed successfully indicates that the job has finished with no error. An error statement QUITTING … ARP/wARP module stopped with an error message: name_of_the_program indicates that one of the modules of the task has terminated with an error message. Please refer to the specified log file.

Running nucleotide building from the command line, auto_nuce.sh

The script auto_nuce.sh in the $warpbin directory allows you to run the nucleotide building as a single-line command without the use of the GUI. The use of auto_nuce.sh is fairly simple. The script prints out help information if it is invoked without arguments.

Usage:

$warpbin/auto_nuce.sh \
    datafile {mtzfile} \
    [residues {number_of_protein_residues_in_AU}] \
    [nucleotides {number_of_nucleotides_in_AU}] \
    [workdir {FULLPATH_WORKING_DIRECTORY}] \
    [fp {fp_label}] [sigfp {sigfp_label}] [fbest {weighted_amplitude_label}] \
    [phib {phib_label}] [fom {fom_label}] \
    [resol {'rmin rmax' (default is the full resolution range) }] \
    [compareto {PDB_file_for_comparison}] \
    [parfile {parfilename_if_only_parfile_is_to_be_created}]

- Optional command line arguments are given in square brackets
- Possible combinations of MTZ labels for map calculation are: fp/sigfp/phib/fom or fbest/sigfp/phib if fbest is already fom-weighted.
- In the latter case if 'fbest' is given, 'fom' will be ignored
- All input files are assumed to be located in the working directory unless they are given with full path
- If workdir is not given, the current directory will be assumed
- All output files will be written into workdir/subdirectory

Required keyword is: datafile (followed by the mtz-file name with the full path).

Optional keywords include: residues (the expected number of protein residues in the asymmetric unit), nucleotides (the expected number of nucleotides in the asymmetric unit), workdir (followed by the full path to the working directory), fp (followed by the fp label), sigfp (followed by the sigfp label), phib (followed by the phibest label) and fom (followed by the fom label). The defaults are FP, SIGFP, PHI and FOM, respectively. Alternatively, if the mtz file contains only one column for structure factor amplitudes and only one column for their standard deviations, these will be taken. If you wish FOM not to be used, please supply amplitudes that are already FOM-weighted with fbest; fom is then ignored. You can set resol (followed by the resolution limits 'rmin rmax'). For test purposes, the constructed model can be compared to a known reference model. The required keyword is compareto (followed by the full-path name of a PDB file).

If auto_nuce.sh is called with an option ‘parfile’, the script will create a parameter file and a directory in the workdir whose name will be printed. The job can subsequently be launched by:

% $warpbin/warp_nuce.sh NAME_OF_PARFILE

If auto_nuce.sh is called without an option ‘parfile’, it will also launch the job. The log files and additional output files as well as the building results can be found in the directory created.
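
As with the other command line scripts, the auto_nuce.sh call can be assembled from a script. The sketch below applies the label rule stated above (fom is ignored when fbest is given). The helper name and the example paths and labels are hypothetical, and passing the two resolution limits as one space-separated argument is an assumption based on the quoted 'rmin rmax' form in the usage text.

```python
def build_auto_nuce_command(warpbin, datafile, residues=None, nucleotides=None,
                            workdir=None, fp=None, sigfp=None, fbest=None,
                            phib=None, fom=None, resol=None, compareto=None,
                            parfile=None):
    """Return an argument list for auto_nuce.sh.

    resol, if given, is a pair of resolution limits passed on in the given
    order as a single 'rmin rmax' argument.
    """
    if fbest is not None:
        fom = None  # 'fom' is ignored by the script when 'fbest' is given
    if resol is not None:
        resol = f"{resol[0]} {resol[1]}"
    options = [
        ("residues", residues), ("nucleotides", nucleotides),
        ("workdir", workdir), ("fp", fp), ("sigfp", sigfp),
        ("fbest", fbest), ("phib", phib), ("fom", fom),
        ("resol", resol), ("compareto", compareto), ("parfile", parfile),
    ]
    command = [f"{warpbin}/auto_nuce.sh", "datafile", str(datafile)]
    for keyword, value in options:
        if value is not None:
            command += [keyword, str(value)]
    return command


# Hypothetical paths and column labels, for illustration only.
print(build_auto_nuce_command(
    "/opt/arpwarp/bin", "/data/complex.mtz", nucleotides=20,
    fbest="FWT", sigfp="SIGFP", phib="PHWT", resol=(20.0, 3.0),
))
```

Keeping the arguments in a list avoids shell quoting problems with the space inside the resol value.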

Automated Ligand Building

Running ligand building from the GUI, ARP/wARP Ligands

The ligand building procedure within ARP/wARP Version 7.1 proceeds in three steps: first it locates the binding site in the difference density map, then builds a number of putative ligand models there and, finally, selects the best model, which is geometrised and fitted into the density in real space. The binding region is selected automatically by matching the ligand’s shape-related properties to the regions of high density. The chosen region is parameterised by a sparse set of putative positions (grid nodes) for the ligand atoms. The stereochemical information and van der Waals repulsions, in combination with the electron density, allow one to obtain a suitable estimate of the position, orientation and conformation of the ligand. Two algorithms are used to construct the ligand into this sparse set. One algorithm exploits the combinatorial assignment of the ligand atom identities to the grid nodes, ‘label swap’. The other algorithm maximises the overlap between the sparse set and the ligand model by a random search in conformational space. The output from both algorithms undergoes a last stage of real-space refinement before the final model is selected.

The accuracy of ligand building depends mainly on the ligand size and the resolution of the X-ray data. As a rough guide, about 75% of well-ordered ligands of a size up to 20 non-hydrogen atoms should be built within an r.m.s.d. of 1.0 Å from their correct location. For larger ligands, this 'success rate' decreases to about 50%. With an r.m.s.d. of 1.0 Å or less the constructed models should be accurate enough for REFMAC5 to straightforwardly refine the protein-ligand complex. The procedure can be iterated to locate additional ligands, if any are present.
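For reference, the r.m.s.d. criterion quoted above can be evaluated as in the following minimal sketch. It assumes that the built and the reference ligand list the same atoms in the same order and that no superposition is applied; the function name is hypothetical and the code is not part of ARP/wARP.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each list holds (x, y, z) tuples in Å; no superposition is performed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

A ligand displaced by exactly 1 Å as a rigid body gives an r.m.s.d. of 1.0 Å, which is the limit quoted in the rough guide above.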

The ARP/wARP ligand building module requires the X-ray data (in MTZ format), the built protein without ligands (in PDB format) and a template model of the ligand to build (in PDB format). Options include the possibility to specify the binding site and the number of starting grids, the ability to compare the run result to some reference ligand(s), and the possibility to build a ligand taken from a list of candidates ('cocktail'). In the latter case the coordinates of the ligand candidates should be concatenated into a single PDB file. The different ligands must be distinguished by their residue name (columns 18-20), chain identifier (column 22) or residue sequence number (columns 23-26). ARP/wARP will automatically choose the best-matching ligand candidate and will attempt to build it at the binding site, either determined automatically or supplied by the user. However, since this feature is new, the specification of the binding site (see below) is recommended.
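Before submitting a 'cocktail' file it may be useful to check which candidates it actually contains. The sketch below lists the distinct combinations of residue name, chain identifier and residue sequence number, read from the fixed PDB columns mentioned above. The function name is hypothetical and the check is not part of ARP/wARP.

```python
def list_candidates(pdb_text):
    """Return distinct (residue name, chain, residue number) keys in file order.

    Uses the fixed PDB columns: residue name 18-20, chain identifier 22,
    residue sequence number 23-26 (1-based, inclusive).
    """
    seen = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            key = (
                line[17:20].strip(),  # columns 18-20
                line[21:22].strip(),  # column 22
                line[22:26].strip(),  # columns 23-26
            )
            if key not in seen:
                seen.append(key)
    return seen
```

If the number of keys returned is smaller than the number of ligands that were concatenated, at least two candidates share the same identifiers and cannot be told apart.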

  • Launch the ARP/wARP Ligands window within the CCP4i GUI

    • Provide required input:

      1. MTZ in - X-ray data in the MTZ format containing structure factor amplitudes and their standard deviations.

      2. Fobs Sigma - If the MTZ column labels for structure factor amplitudes and their standard deviations have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose the appropriate ones.

      3. Protein model without ligand - Provide the PDB file with coordinates of the protein only. If the file contains solvent atoms, free atoms or fragments of other ligands, please make sure that they do not overlap with the supposed location of the ligand, or remove them prior to running ligand building.

      4. Ligand molecule coordinates - Stereochemical information about the ligand to be built is read in the form of a PDB file. This file should contain the ligand molecule only. The molecule can be in any conformation. However, the interatomic distances, bond angles and chirality (if present) should correspond sensibly to the target stereochemistry of the ligand to be built. Please also check that there is bonded connectivity throughout the whole target ligand molecule (i.e. you do not accidentally have several unconnected clusters of atoms) and that no atoms are too close to each other (distance < 0.6 Å).

  • Click on Run and choose Run now
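The two checks recommended for the ligand template (no atoms closer than 0.6 Å, and a single connected cluster of atoms) can be sketched as follows. The function name is hypothetical, and the 2.0 Å distance used to decide whether two atoms are bonded is an assumption for illustration only, not an ARP/wARP criterion; it may need adjusting for ligands with long bonds.

```python
import math


def check_ligand(coords, min_dist=0.6, bond_cutoff=2.0):
    """Return (close_pairs, n_clusters) for a list of (x, y, z) tuples in Å.

    close_pairs: index pairs of atoms closer than `min_dist`.
    n_clusters:  number of groups connected by distances <= `bond_cutoff`
                 (bond_cutoff is an assumed, illustrative value).
    """
    n = len(coords)
    parent = list(range(n))  # union-find forest over atom indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    close_pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d < min_dist:
                close_pairs.append((i, j))
            if d <= bond_cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    n_clusters = len({find(i) for i in range(n)})
    return close_pairs, n_clusters
```

A usable template should give an empty list of close pairs and exactly one cluster.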

There are a number of options that can be set either in the main GUI panel (scrolling bar Build the ligand) or under the Parameters section. You normally should not need to worry about these (unless you want the ligand to be built around a known location or you would like to screen a list of candidates, 'ligand cocktail'). A brief description is given below.

Optional parameters:

  • Build the ligand (Binding site location):

    • In the most likely place of the complete asymmetric unit (default).

    • Around the same approximate place as a previous ligand - The binding site is defined by the position of a compound known to bind at the desired location. If you use this option, the region is required in the form of a PDB file (previous ligand coordinates).

    • Around an approximate XYZ position - The binding site is defined by (X, Y, Z) Cartesian coordinates and a search radius in Å (option Search for the ligand around).

  1. Refmac5 - By default the fast protocol is used (1 cycle of refinement). If your PDB file needs considerable pre-refinement with Refmac before the difference electron density map can be computed, you can choose the slow protocol (3 cycles of refinement).

  2. Free R Flag - The default is not to use "R-free" for Refmac refinement. You can choose to use R-free; this will cause additional options to appear within the section "Refmac parameters".

  3. Ligand building cycles - Defines the number of grid parameterisations of the binding region. The default value is 2. There is one run of each ligand building algorithm for each starting grid, therefore the CPU time required for building is proportional to this number of cycles. If this matters for large ligands you can set the number of ligand building cycles to 1.

    • Refmac parameters:

      1. Cycles of refinement in each Refmac run - Refmac is invoked to refine your protein part of the structure before the difference density map is computed. The default is 1 cycle for the fast protocol and 3 cycles for the slow protocol, see above.

      2. Matrix weight for Xray / Geometry - The default is automatic weighting and there is no need to change this parameter.

      3. Input a user-defined library file - In case your input protein is already a protein-ligand complex then Refmac will have to refine both entities together in order to obtain a difference electron density map. If you already have a Refmac-style cif library for your already present ligand, you can input it here. Otherwise, Refmac will use its own library if it knows the ligand. If it does not, it will generate a cif file for the ligand and proceed.

    • Crystal parameters:

      1. Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content are derived automatically from the MTZ file and the total number of residues in the asymmetric unit. They are displayed for information only and cannot be changed. However, you may want to check whether their values conform to your expectations.

      2. Resolution - By default all reflections present in the MTZ file will be used. You can check the box (Use reflections between) and then narrow the range if you are aware of certain deficiencies of your data.

    • Test and comparison parameters:

      1. Compare with an already fitted ligand - If you have the final model of the ligand in the correct orientation and would like to check the installation and the performance of the software, you can check this box. You will then have to provide a PDB file that will be used for comparison.

    • OUTPUT files, short Log File:

      1. Refinement with refmac - The R factor (and R free if requested) are printed after refinement of the protein part only with Refmac. Check that the value of the R factor is reasonable. A value higher than about 30% may indicate that the computed difference map is too noisy for location of the ligand. A failure may indicate invalid atom nomenclature in your PDB file.

      2. The ligandbuild program - The mapping of the difference density synthesis parameterised with grid points onto the ligand atoms (ligandbuild and M_ligandbuild) is run as many times as defined by the number of ligand building cycles. A failure may indicate incorrect identification of the binding site. This can be amended by defining the binding site manually prior to the run (see above).

      3. Real space fit - Up to 108 top constructed ligand models undergo a real-space refinement with respect to the difference density map. The best solution is output. If the test and comparison option is selected, the r.m.s.d. to the reference PDB file (XYZREF) is also printed. A warning will be given if the stereochemistry of the constructed ligand is poor. A warning will also be given if the constructed ligand molecule has severe steric clashes, which may be a sign of incorrect ligand building. You may want to inspect the ligand and the density and, if a part of the ligand is clearly disordered, try removing it from the ligand target PDB file and re-running the job.

      4. Job termination - The statement Task completed successfully indicates that the job has finished with no error. An error statement QUITTING … ARP/wARP module stopped with an error message: name_of_the_program indicates that one of the modules of the task has terminated with an error message. Please refer to the specified log file.

Running ligand building from the command line, auto_ligand.sh

The script auto_ligand.sh in the $warpbin directory allows you to run the ligand building as a single-line command without the use of the GUI. The use of auto_ligand.sh is fairly simple. The script prints out help information if it is invoked without arguments.

$warpbin/auto_ligand.sh \
    datafile {either mtzfile or mapfile} \
    protein {starting_PDB_file_without_ligand} \
    ligand {PDB_file_with_ligand_to_fit} \
    [workdir {FULLPATH_WORKING_DIRECTORY}] \
    [ligandfileout {output_PDB_file}] \
    [fp {fp_label}] [sigfp {sigfp_label}] [freer {freer_label}] \
    [nligandcycles {number_of_ligandbuild_cycles (default is 2)}] \
    [search_model {PDB_file_with_model_at_expected_ligand_site}] \
    [search_position {X Y Z}] \
    [search_radius {radius_in_angstroms}] \
    [reflist {textfile_with_FULLPATHnames_of_fitted_ligands_for_comparison}] \
    [extralibrary {user_defined_library_for_Refmac5}] \
    [parfile {parfilename_if_only_parfile_is_to_be_created}]

Required keywords are: datafile (followed by the mtz-file name with the full path), protein (followed by the pdb-file name of the protein model without the ligand with the full path) and ligand (followed by the pdb-file containing the ligand(s) description with the full path).

Optional keywords include: workdir (followed by the full path to the working directory), fp (followed by the fp label) and sigfp (followed by the sigfp label). The defaults for the last two are FP and SIGFP, respectively. Alternatively, if the mtz file contains only one column for structure factor amplitudes and only one column for their standard deviations, these will be taken. The number of ligand building cycles (default is 2) can be changed with the keyword nligandcycles. The approximate location of the binding site can be supplied by the user either by providing the pdb-file(s) of a ligand (or just a list of atoms) located at the binding site (search_model), or by specifying the (XYZ) coordinates of a point defining the binding region using search_position and search_radius (the default value for the latter is 5 Å). For test purposes, the constructed ligand can be compared to known reference models (hand- or pre-fitted). The required keyword is reflist (followed by the full-path name of a text file containing a list of pdb-files with the reference ligands and their absolute paths). A user-defined ligand library can be input using the keyword extralibrary.
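When auto_ligand.sh is driven from another program, the keyword/value pairs described above can be assembled as an argument list. The following is a minimal sketch: the function name is hypothetical, only a subset of the keywords is covered, and two points are assumptions rather than documented behaviour, namely that search_model and search_position are treated as mutually exclusive and that X, Y and Z are passed as three separate arguments. The command is only assembled here, not executed.

```python
def auto_ligand_command(warpbin, datafile, protein, ligand, workdir=None,
                        nligandcycles=None, search_model=None,
                        search_position=None, search_radius=None):
    """Assemble an auto_ligand.sh argument list from keyword/value pairs."""
    cmd = [f"{warpbin}/auto_ligand.sh",
           "datafile", datafile, "protein", protein, "ligand", ligand]
    if workdir is not None:
        cmd += ["workdir", workdir]
    if nligandcycles is not None:
        cmd += ["nligandcycles", str(nligandcycles)]
    if search_model is not None and search_position is not None:
        # Assumption: the binding site is given in one of the two ways only.
        raise ValueError("give either search_model or search_position")
    if search_model is not None:
        cmd += ["search_model", search_model]
    if search_position is not None:
        if len(search_position) != 3:
            raise ValueError("search_position needs X, Y and Z")
        # Assumption: X, Y and Z are passed as three separate arguments.
        cmd += ["search_position"] + [str(v) for v in search_position]
        if search_radius is not None:
            cmd += ["search_radius", str(search_radius)]
    return cmd
```

The resulting list could be passed to subprocess.run once the paths and the handling of search_position have been checked against the help text printed by auto_ligand.sh.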

To build the ligand from a list of candidates ('cocktail'), the coordinates of the ligand candidates should be concatenated into one file specified by the above-mentioned keyword ligand. The different ligands must be distinguished by their residue name (columns 18-20) in the concatenated pdb file (a different chain identifier or residue sequence number will do as well, but we recommend using different residue names). ARP/wARP will automatically choose the best-matching ligand candidate and will attempt to build it at the binding site, either determined automatically or supplied by the user. However, since this feature is new, supplying the binding site using the search_model or search_position keywords is recommended.

If auto_ligand.sh is called with an option ‘parfile’, the script will create a parameter file and a directory in the workdir whose name will be printed. The job can subsequently be launched by:

% $warpbin/warp_ligand.sh NAME_OF_PARFILE

If auto_ligand.sh is called without an option ‘parfile’, it will also launch the job. The log files and additional output files as well as the building results can be found in the directory created.

Automated Solvent Building

Running solvent building from the GUI, ARP/wARP Solvent

Within the solvent building module, restrained reciprocal space refinement is carried out with REFMAC while ARP/wARP automatically adjusts the solvent structure. The resolution of the data should be 2.5 Å or higher. The output is the protein model with the solvent molecules transformed by symmetry operations to lie around the protein.

The ARP/wARP solvent building module requires the X-ray data (in MTZ format) and the protein model (in PDB format) without solvent or with a partial solvent model.

  • Launch the ARP/wARP Solvent window within the CCP4i GUI

    • Provide required input:

      1. MTZ in - X-ray data in the MTZ format containing structure factor amplitudes and their standard deviations.

      2. Fobs Sigma - If the MTZ column labels for structure factor amplitudes and their standard deviations have obvious names, they will be recognised automatically. Otherwise please use the scrolling button, navigate to List All Labels and choose the appropriate ones.

      3. Starting model in - Provide the PDB file with coordinates of the protein only. If the file already contains some solvent sites, these will be updated during the iterative solvent building.

      4. Output model - Provide the name of the file to which the output PDB of the protein with the built solvent will be written.

  • Click on Run and choose Run now

There are a number of options that can be added. A brief description is given below.

    • Required parameters:

      1. ARP/REFMAC refinement cycles - By default 20 cycles will be carried out. However, the job may finish earlier if converged. Please monitor R factor / R free for convergence.

      2. Free R flag - It is advantageous to use the R free flag for solvent building. You can choose to use R-free; this will cause additional options to appear within the section "Refmac parameters". The default is not to use R free.

    • ARP/wARP flow parameters:

      1. Add atoms - This is followed by two numbers defining the threshold (in sigmas of the density above the mean) for addition and removal of solvent atoms. The defaults are 3.4 and 1.0, respectively, which should work for most cases.

      2. Disable Wilson plot statistics check - The current Wilson plot checking routine is probably too stringent. You may disable the check and the warnings if you are sure that the X-ray data are of high quality. However, we strongly recommend that you do not disable the check and that, in case of warnings, you inspect the plot before proceeding.

    • Refmac parameters:

      1. Cycles of refinement in each Refmac run - Refmac is invoked to refine the model before the density maps are computed. The default is 1 cycle and there is usually no need to change this.

      2. Matrix weight for Xray / Geometry - The default is automatic weighting. This has proved to work well and there is probably no need to change this parameter.

      3. Scaling model - The default is to use simple scaling of the low angle part of the X-ray data. You can change this to bulk solvent correction if you are sure that your low angle data below about 8 Å resolution are complete and correct.

      4. Scaling B factor - The default is to use an anisotropic B factor for scaling the X-ray data. You can choose an isotropic scaling B factor if your data are systematically incomplete (e.g. a cone is missing in reciprocal space).

      5. Data with free R label - This parameter appears if the free R flag is chosen for refinement of the protein part of the model. Here you can provide a column label for the free R flag.

      6. Scaling and sigmaa calculations - This parameter also appears if the free R flag is chosen for refinement of the protein part of the model. The scaling and the calculation of σA coefficients by Refmac can be based on the free reflections (this is the default) or on all reflections.

      7. TLS refinement - The default is not to do a TLS refinement of the model.

      8. Input a user-defined library file - If you already have a Refmac-style cif library for, e.g. your already present ligand, you can input it here.

    • Crystal parameters:

      1. Space group, Cell, ARP/wARP asymmetric unit, Wilson B factor and Solvent content are derived automatically from the MTZ and the PDB files, displayed for information only and cannot be changed. However, you may want to check whether their values conform to your expectations.

      2. Resolution - By default all reflections present in the MTZ file will be used. You can check the box (Use reflections between) and then narrow the range if you are aware of certain deficiencies of your data.

    • OUTPUT files, short Log File:

      1. Refinement with refmac - The R factor (and R free if requested) are printed after refinement of the protein with Refmac. Check that the value of the R factor decreases upon solvent building.

      2. Job termination - The statement Task completed successfully indicates that the job has finished with no error. An error statement QUITTING … ARP/wARP module stopped with an error message: name_of_the_program indicates that one of the modules of the task has terminated with an error message. Please refer to the specified log file.
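The Add atoms thresholds listed under the ARP/wARP flow parameters above are expressed in sigmas of the density above the mean. As a minimal sketch of what such a threshold means in absolute density units, the hypothetical helper below converts a sigma level using the mean and standard deviation of a set of map values. It is an illustration only; which map each threshold is applied to is not stated here and is not assumed by the code.

```python
import statistics


def sigma_to_absolute(map_values, n_sigma):
    """Convert a threshold given in sigmas above the mean to map units."""
    mean = statistics.fmean(map_values)
    sigma = statistics.pstdev(map_values)  # population standard deviation
    return mean + n_sigma * sigma
```

For a map with mean 1.0 and standard deviation 1.0, the default thresholds of 3.4 and 1.0 sigma correspond to absolute levels of 4.4 and 2.0, respectively.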

Running solvent building from command line (auto_solvent.sh)

The script auto_solvent.sh in the $warpbin directory allows you to run the solvent building as a single-line command without the use of the GUI. The use of auto_solvent.sh is fairly simple. The script prints out help information if it is invoked without arguments.

$warpbin/auto_solvent.sh \
    datafile {mtzfile} \
    protein {starting_PDB_file} \
    [workdir {FULLPATH_WORKING_DIRECTORY}] \
    [solventfileout {output_PDB_file}] \
    [fp {fp_label}] [sigfp {sigfp_label}] [freer {freer_label}] \
    [restrcyc {number_of_cycles (default is 20) }] \
    [extralibrary {user_defined_library_for_Refmac5}] \
    [tlsin {fixed pre-refined TLS tensors from Refmac5}] \
    [parfile {parfilename_if_only_parfile_is_to_be_created}]

  • Optional command line arguments are given in square brackets.

  • All input files are assumed to be located in the working directory unless they are given with the full path.

  • If workdir is not given, the current directory will be assumed.

  • All output files will be written into workdir/subdirectory.

Required keywords are: datafile (followed by the mtz-file name with the full path) and protein (followed by the pdb-file name of the protein model with the full path).

Optional keywords include: workdir (followed by the full path to the working directory), solventfileout (followed by the name of the PDB file where the output will be written), fp (followed by the fp label), sigfp (followed by the sigfp label) and freer (followed by the Rfree label). The defaults for the first two are FP and SIGFP, respectively. Alternatively, if the mtz file contains only one column for structure factor amplitudes and only one column for their standard deviations, these will be taken. The number of cycles (default is 20) can be changed with keyword restrcyc. The user-defined library and the tls-tensor for Refmac can be supplied by using the keywords extralibrary and tlsin.

If auto_solvent.sh is called with an option ‘parfile’, the script will create a parameter file and a directory in the workdir whose name will be printed. The job can subsequently be launched by:

% $warpbin/warp_solvent.sh NAME_OF_PARFILE

If auto_solvent.sh is called without an option ‘parfile’, it will also launch the job. The log files and additional output files as well as the building results can be found in the directory created.
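The two-step use of the parfile keyword (prepare first, launch later) can be sketched as a pair of argument lists. The function names are hypothetical and only a subset of the keywords is shown. The path passed to the second step is deliberately left to the caller, because the location of the created parameter file is printed by auto_solvent.sh and is not guessed here.

```python
def prepare_solvent_parfile(warpbin, datafile, protein, parfile,
                            workdir=None, restrcyc=None):
    """Step 1: argument list that only creates the parameter file."""
    cmd = [f"{warpbin}/auto_solvent.sh",
           "datafile", datafile, "protein", protein]
    if workdir is not None:
        cmd += ["workdir", workdir]
    if restrcyc is not None:
        cmd += ["restrcyc", str(restrcyc)]
    cmd += ["parfile", parfile]  # with parfile given, the job is not launched
    return cmd


def launch_solvent(warpbin, parfile_path):
    """Step 2: argument list that launches the job from the parameter file.

    parfile_path must be taken from the output printed in step 1.
    """
    return [f"{warpbin}/warp_solvent.sh", parfile_path]
```

Separating the two steps makes it possible to inspect or archive the parameter file before the job is started.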

ARP/wARP molecular graphics: ARP Navigator

Running small ARP/wARP tasks and viewing models and density maps

The proto-version of the graphical front-end to ARP/wARP Version 7.1 is an OpenGL/X-window based graphics program that can be launched by pressing the GUI ‘ARP Navigator’ button. The program can also be started from the command line by typing ‘arpnavigator’.

Mouse and Keyboard functions

Rotation

  • Left mouse button pressed and mouse moved: the scene rotates about the x and y axes (screen plane).

  • Left mouse button + r-key pressed and mouse moved left-right: the scene rotates about the z axis (perpendicular to screen plane).

Translation

  • Right mouse button pressed and mouse moved: the scene is translated in the xy-plane (screen plane; maps are infinitely repeated).

  • Alternatively: Left mouse button + t-key pressed and mouse moved.

  • Left mouse button + z-key pressed and mouse moved up-down: the scene is translated in z-direction (perpendicular to screen plane).

Scaling

  • Middle mouse button pressed and mouse moved left-right: zooming, the scene is scaled and a scale-o-meter is shown on the right.

  • Alternatively: Left mouse button + s-key pressed and mouse moved.

Clip planes

  • Left mouse button + f-key pressed and mouse moved left-right: changes the front clip position.

  • Left mouse button + b-key pressed and mouse moved left-right: changes the back clip position.

  • Left mouse button + g-key pressed and mouse moved left-right: changes the front and back clip position together.

  • Left mouse button + d-key pressed and mouse moved left-right: changes the position of the rotation-center (similar to translation).

Map contouring

  • The mouse wheel is used for changing the contour level of a map. The map must be activated by pressing the corresponding object button at the bottom of the graphics window.

  • Alternatively: Left mouse button + c-key pressed and mouse moved up-down.

Map extent

  • Left mouse button + e-key pressed and mouse moved up-down/left-right (size increases right/up).

Menus

  • Left mouse button pressed on top of menu items: this alone operates the menu and activates/deactivates object buttons at the bottom of the window.

  • Left mouse button pressed in graphics area: marks atoms or density (switch this in Options menu). Double-click will also centre on atoms.

  • Right mouse button pressed on top of an object button: opens the Mini menu of the related object (Parameters, close, save, etc.).

  • Right mouse button pressed in graphics area: opens the Quick actions menu.

The keyboard alone can have the following functions:

  • w: Hide the menu and all attached information as long as pressed

  • W (=shift-w): Lock the function of 'w' and do not show the menu when released. To unlock, press 'w' or 'shift-w' again, then the menu will be visible again.

  • G (=shift-g): Launch a goto-atom dialog (see 'goto atom' below).

  • C (=shift-c): Center on the last mark set irrespective of whether this was an atom or a density region.

  • D (=shift-d): Activate the display of distances between the most recent mark and all other marks set so far.

  • m: Toggle the control of a detached model: move the model only vs move the crystal frame alone with the model fixed.

  • k: Toggle the control of a detached model: move the model and the crystal frame together vs move the crystal frame alone.

The ArpNavigator Menu

  • About: This menu item contains the 'about' information of the program.

  • Quit ArpNavigator: To exit the program using the mouse.

The Files Menu

  • Open MTZ File: Open an MTZ file that contains structure factor amplitudes and phases to feed into fft. Will show as a map. The first part will be an intuitive file browser, then you are asked to select labels, resolution range and colour. Working default values are provided.

  • Open MAP File: Open a map. Use the file browser, then go with defaults or choose different values for colour, contour level, etc.

  • Open PDB File: Open a coordinate file.

Note: When a file is loaded and put on display, there will be little buttons appearing in the bottom left corner representing each of the graphical objects. Only one object can be active at a time.

An object can be made active by clicking on the button with the left mouse button. A little eye symbol shows whether this object is currently on display or if it's hidden. Clicking with the right mouse button on this button will pull out the mini-menu with actions applied to this object only (see also Mini menu).

  • Duplicate Object: If an item that is already loaded should be duplicated, e.g. a map that is to be looked at with different contour level and colour.

  • Close File: Delete an object from memory and remove it from the screen - all changes applied will be lost!

  • Close All Files: The real clean - again all changes will be lost.

  • Save File As: Choose a name for an object and make a copy of what is on the screen (for maps and models).

  • Open Status File: Opens a status file saved previously. This will reproduce most of what the screen looked like at the time the status was saved. All files must be in their places still. The file-name suffix is '.vst'.

  • Save Status: Saves the current status into a file whose name must be chosen.

The Mini-Menus

Each object loaded has its own menu of selected actions:

  • Save File As, Close File: Basic operations.

  • Detach Molecule: This detaches a model from the crystal frame. It can then be moved independently of the frame. An orange button box appears that allows you to control whether the crystal frame is moved, the model is moved, or both are moved together. A fourth button resets the position of the detached model to the start. When a model is detached, the respective menu item is renamed to 'Fix Molecule', which puts the model back into the frame.

  • Duplicate Object: Makes an exact copy.

  • Parameters: Starts a dialog window with plenty of settings.

  • Fit To Screen, Center On, Hide: More basic operations.

The Tasks Menu

  • Fit a Ligand: This runs the ARP/wARP ligand building software as an external program in a separate thread. The same files are required as for the CCP4 GUI. The defaults usually work well, but they can be changed. When run now is pressed, the job starts with auto_ligand.sh. If this is successful, i.e. the parameter file could be created, a live button appears in the top right corner, through which the parameter file can be viewed. The short log file of the ligand job appears instantly and the calculated data and structures show up on the screen as soon as they are ready. A job may be killed as long as it is running. The window cannot be closed while a job is running.

  • Build Helices and Strands: This will run the ARP/wARP secondary structure modeller as an external job.

Note: In contrast to the functionality offered from the CCP4 GUI, the above two tasks will also accept density maps as input.

  • Model Solvent: This runs the solvent building module of ARP/wARP.

The Display Menu

  • Global Parameters: This allows you to change the background colour and the depth fog. You can also switch the perspective distortions on and off. The changes become active only when the 'Apply' button is pressed.

  • Map Parameters: The display parameters can be changed for the active map object only. The window can stay open even when the map is no longer the active object, and it will vanish when the map is deleted. Here you can change the map colour and whether the map is displayed as a mesh or a solid body. You can also clip the density to a model of a ligand. The structures will show up in a browse button next to 'clip to'. The clip radius can be set. The contour level and extension can also be set here by typing them. If the contouring is changed in the graphics using the mouse wheel, these values will change in the map parameters menu as well.

  • Model Parameters: The display parameters can be changed for the active model object only. A dialog window appears where various settings can be found to suit a special purpose.

  • Show Graphics Status: This activates the display of the status information on graphics in a separate little window (e.g. centre and eye position).

  • Show Scale-o-meter: This is a toggle button to activate/deactivate the meter bar on the right to show distance units at the current scale. Off by default.

  • Show Axes Orientation: This is a toggle button to activate/deactivate the display of the xyz axes, with letters and in colours, in the top right corner of the graphics window. On by default.

  • Show Contour Levels: Switch on the display of all contour levels of maps loaded at the top right. On by default. Auto activates when a contour level is changed.

  • Show Clipping Info: Displays graphical information about the clipping planes in relation to centre and eye-position. Off by default. Auto-activates when clipping is changed.

  • Hide Object: Takes the active object off the screen, but does not delete it. The little eye symbol changes to closed. If one changes to a hidden item as active object, then the menu item will read as 'Show object'.

  • Hide All But Active Object: If for an isolated view you want to just look at the active object, then pressing this will take all objects except it off the screen without deleting them. All their eye symbols change to the closed state.

  • Reset Display: This resets the display to a defined hard coded position, orientation and scale factor. Observe the status bar on the right.

The Options Menu

  • Centre On Last Mark: This will translate to the position of the last mark set (atom or density).

  • Centre On Active Model: This will translate the centre position of the visible volume to the centre of mass of the model that is currently the active one.

  • Centre On New Models: This is a toggle button with a little indicator field. When clicked, it changes the behaviour of the viewer in that it will activate/deactivate the automatic centring on every newly loaded model. The default of this is 'activated'.

  • Fit Active Model To Screen: This attempts to set the scale factor and the centre position such that the active model is completely visible in x and y direction. It also adjusts the orientation to align the model such that its longest principal axes are in the xy-plane.

  • Mark Atoms: This is a toggle button that activates/deactivates the single click marking/labelling of atoms with the left mouse button.

  • Clear Atom Marks: All atom marks are deleted.

  • Clear Atom Distances: All distance lines between marks (atoms and density) are deleted.

  • Mark Density Point: This is a toggle button that activates/deactivates the single click marking of density regions with the left mouse button.

  • Clear Marked Density Points: All marks on density are deleted.

  • Save Screenshot (graphics): This will read out the screen pixel buffer and create a bitmap. A file browser pops up that lets you choose or type a file name to use for the new image file.

  • Save Screenshot (all): As above but will also include all elements of the menu, status bar and object related buttons.

Note: Screenshots will only produce files in uncompressed bmp-format.

The Help Menu

  • Help Screen: When clicking on this item, a text view window pops up that contains this help text.

The quick actions

When the right mouse button is pressed with no movement, then a green button box is displayed that contains functionalities to be applied 'ad-hoc' and with no input dialog.

  • Goto Atom: This button launches the 'goto-atom' dialog as 'shift-g' does.

  • Real Space Refine Ligand: The ligand to be refined is a detached molecule and there is one density map on display. The ligand gets refined to that density map locally and the initial ligand position must be in the radius of convergence. The output will replace the detached model.

  • Find Ligand Binding Site: The ligand to be located is a detached molecule and there is one density map on display. Furthermore, all other models displayed are taken as occupants of space and the binding site cannot intersect with them. In return, a dummy atom model of the located density blob is shown.

  • Fit Ligand Here: The ligand to be fit is the detached model, there is at least one density map on display that has one of its blobs marked. The output will replace the detached model.

  • Build Helices: At least one density map must be on display (or activated). Helices are built and side chains are modelled up to C-gamma.

Dialog windows

The goto-atom dialog expects atoms to be specified as, e.g., CA/123/A for the CA atom of residue 123 in chain A. Specifying just CA/123 means the first occurrence of CA in residue 123. Specifying /123/ means the first atom in residue 123. Typing //Z is interpreted as the first atom of chain Z. If the atom is found, the program centres on it. If the atom cannot be found, the dialog turns pink.
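
The matching rules above can be illustrated with a minimal Python sketch. This is not the viewer's own code: the names AtomSpec, parse_atom_spec and find_first are hypothetical, atoms are assumed to be simple (name, residue number, chain) tuples in file order, and residue numbers are assumed to be plain integers without insertion codes.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class AtomSpec(NamedTuple):
    """One parsed goto-atom query; None means 'not specified'."""
    atom: Optional[str]
    residue: Optional[int]
    chain: Optional[str]


def parse_atom_spec(text: str) -> AtomSpec:
    """Split 'ATOM/RESIDUE/CHAIN' into its three optional fields."""
    parts = text.strip().split("/")
    if len(parts) > 3:
        raise ValueError(f"too many '/' separators in {text!r}")
    # Pad so that 'CA/123' is treated like 'CA/123/'.
    parts += [""] * (3 - len(parts))
    atom, residue, chain = (p.strip() for p in parts)
    return AtomSpec(
        atom=atom or None,
        residue=int(residue) if residue else None,  # assumes integer numbering
        chain=chain or None,
    )


def find_first(
    atoms: Iterable[Tuple[str, int, str]], spec: AtomSpec
) -> Optional[Tuple[str, int, str]]:
    """Return the first atom matching every specified field, or None."""
    for name, resnum, chain in atoms:
        if spec.atom is not None and name != spec.atom:
            continue
        if spec.residue is not None and resnum != spec.residue:
            continue
        if spec.chain is not None and chain != spec.chain:
            continue
        return (name, resnum, chain)
    return None  # the dialog would signal this case by turning pink
```

For example, parse_atom_spec("//Z") leaves the atom and residue fields unspecified, so find_first returns the first atom of chain Z in the list it is given.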


Chapter 4. Additional Remarks

Quality of the X-ray Data

The X-ray data should be as complete as possible, especially in the low resolution range (5 Å and lower). Ideally the X-ray data should have no low resolution cutoff. If the strong low resolution data are systematically incomplete (e.g. missing or overloaded reflections), the density map may be discontinuous and inconsistent with the model, even when the model is good. Because ARP/wARP updates the model on the basis of density maps, such discontinuity can lead to slow convergence or even to non-interpretable maps.

ARP/wARP automatically checks the fit of your data to the expected Wilson plot and reports any problems. If it suggests cutting the data on the high resolution side, follow the suggestion. If it suggests cutting the data on the low resolution side, do so, but do not cut to a resolution below 8 or 10 Å. If it suggests ignoring all data, or if there are still complaints after the cut, recollect or reprocess your data. The current version of the ARP/wARP Wilson plot check might be too stringent. In any case, the user is advised to inspect the Wilson plot visually and to judge critically whether or not the data should be cut. Cutting the data that were flagged as poor has sometimes proved beneficial, though in some cases the presence of these data was crucial for the model building.
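
For readers who wish to inspect the Wilson plot themselves, the following minimal Python sketch shows the standard straight-line fit. It is not the check that ARP/wARP performs. It assumes that the value ln(<I>/Σf²) (mean intensity divided by the sum of squared scattering factors) has already been computed for each resolution shell; the function name wilson_fit and the input layout are hypothetical. Under the Wilson relation ln(<I>/Σf²) = ln K - 2B sin²θ/λ², with sin²θ/λ² = 1/(4d²), the slope of the line gives the overall B factor, and shells with large residuals are the ones that deviate from the expected behaviour.

```python
def wilson_fit(shells):
    """Fit a straight line to Wilson plot points.

    shells: iterable of (d_spacing_in_angstrom, ln_ratio) pairs, where
    ln_ratio is ln(<I> / sum f^2) for that shell (computed elsewhere).
    Returns (b_factor, ln_scale, residuals), residuals in input order.
    """
    shells = list(shells)
    if len(shells) < 2:
        raise ValueError("need at least two resolution shells")

    # x axis of the Wilson plot: (sin(theta)/lambda)^2 = 1 / (4 d^2)
    xs = [1.0 / (4.0 * d * d) for d, _ in shells]
    ys = [y for _, y in shells]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all shells have the same resolution")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                 # equals -2B
    ln_scale = mean_y - slope * mean_x
    b_factor = -slope / 2.0

    # Deviation of each shell from the fitted line.
    residuals = [y - (ln_scale + slope * x) for x, y in zip(xs, ys)]
    return b_factor, ln_scale, residuals
```

Because protein Wilson plots are usually far from linear at low resolution, a sketch like this is only meaningful for the higher resolution shells; which shells to include is left to the user's judgment.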


Chapter 5. References

The most recent overview of ARP/wARP can be found in:

  • Langer, G., Cohen, S.X., Lamzin, V.S. & Perrakis, A. (2008) Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nature Protocols. 3, 1171-1179.

Applications are presented in:

  • Mooij, W.T., Cohen, S.X., Joosten, K., Murshudov, G.N. & Perrakis, A. (2009) Conditional Restraints: Restraining the free atoms in ARP/wARP. Structure. 17, 183-189 (protein model building)

  • Hattne, J. & Lamzin, V.S. (2008) Pattern recognition-based detection of planar objects in 3D electron density maps. Acta Cryst. D64, 834-842 (nucleotide building)

  • Joosten, K., Cohen, S.X., Emsley, P., Mooij, W., Lamzin, V.S. & Perrakis, A. (2008) A knowledge-driven approach for crystallographic protein model completion. Acta Cryst. D64, 416-424 (protein model building, loops)

  • Cohen, S.X., Ben Jelloul, M., Long, F., Vagin, A., Knipscheer, P., Lebbink, J., Sixma, T.K., Lamzin, V.S., Murshudov, G.N. & Perrakis, A. (2008) ARP/wARP and molecular replacement: the next generation. Acta Cryst. D64, 49-60 (protein model building)

  • Evrard, G.X., Langer, G.G., Perrakis, A. & Lamzin, V.S. (2007) Assessment of automatic ligand building in ARP/wARP. Acta Cryst. D63, 108-117. (ligand building)

  • Zwart, P.H., Langer, G.G. & Lamzin, V.S. (2004) Modelling bound ligands in protein crystal structures. Acta Cryst. D60, 2230-2239. (ligand building)

  • Cohen, S.X., Morris, R.J., Fernandez, F.J., Ben Jelloul, M., Kakaris, M., Parthasarathy, V., Lamzin, V.S., Kleywegt, G.J. & Perrakis, A. (2004) Towards complete validated models in the next generation of ARP/wARP. Acta Cryst. D60, 2222-2229. (side chains)

  • Morris, R.J., Zwart, P.H., Cohen, S., Fernandez, F.J., Kakaris, M., Kirillova, O., Vonrhein, C., Perrakis, A. & Lamzin, V.S. (2004) Breaking good resolutions with ARP/wARP. J. Synchr. Rad. 11, 56-59. (helices and strands, protein model building)

  • Morris, R.J., Perrakis, A. & Lamzin, V.S. (2003) ARP/wARP and automatic interpretation of protein electron density maps. In Meth. Enz. (Carter, C. & Sweet, B. eds.) 374, 229-244. (protein model building)

  • Morris, R.J., Perrakis, A. & Lamzin, V.S. (2002) ARP/wARP's model-building algorithms. I. The main chain. Acta Cryst. D58, 968-975. (protein model building)

  • Perrakis, A., Harkiolaki, M., Wilson, K.S. & Lamzin, V.S. (2001) ARP/wARP and molecular replacement. Acta Cryst. D57, 1445-1450. (protein model building)

  • Lamzin, V.S., Perrakis, A. & Wilson, K.S. (2001) The ARP/wARP suite for automated construction and refinement of protein models. In Int. Tables for Crystallography. Vol. F: Crystallography of biological macromolecules (Rossmann, M.G. & Arnold, E. eds.), Dordrecht, Kluwer Academic Publishers, The Netherlands, pp. 720-722. (solvent)

  • Perrakis, A., Morris, R. & Lamzin, V.S. (1999) Automated protein model building combined with iterative structure refinement. Nature Struct. Biol. 6, 458-463. (protein model building)

  • Perrakis, A., Sixma, T.K., Wilson, K.S. & Lamzin, V.S. (1997) wARP: improvement and extension of crystallographic phases by weighted averaging of multiple refined dummy atomic models. Acta Cryst. D53, 448-455. (protein model building)

  • Lamzin, V.S. & Wilson, K.S. (1993) Automated refinement of protein models. Acta Cryst. D49, 129-149. (model update and solvent)

For other publications please refer to the references therein or to the ARP/wARP web page.


Chapter 6. Author-Abuse Information and Acknowledgements

The authors to abuse of...

The Hamburg team (European Molecular Biology Laboratory (EMBL) Hamburg Outstation, c/o DESY, Notkestrasse 85, 22603 Hamburg, Germany):

Victor S. Lamzin (tel +49-40-89902-121, fax +49-40-89902-149, email: victor@embl-hamburg.de)

Ciaran Carolan
Helene Dörksen
Philipp Heuser
Gerrit G. Langer
Tim Wiegels

The Amsterdam team (Molecular Carcinogenesis Programme, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands):

Anastassis Perrakis (tel. +31-20-512-1951, fax +31-20-512-1954, email: a.perrakis@nki.nl)

Krista Joosten
Robbie Joosten

Former members

Serge X. Cohen
Guillaume X. Evrard
Francisco Fernandez
Johan Hattne
Marouane Ben Jelloul
Matheos Kakaris
Olga V. Kirillova
Wijnand Mooij
Richard J. Morris
Parthasarathy Venkataraman
Tilo Strutz
Petrus H. Zwart

The authors are especially grateful to:

  • Keith S. Wilson (York, UK), one of the originators of the software.

  • Zbyszek Dauter (Argonne, USA) for significant contributions at earlier stages of the software development.

  • The REFMAC developers team led by Garib Murshudov (York, UK).

  • The CCP4 developers, particularly Liz Potterton, Kevin Cowtan and Eleanor Dodson (York, UK) and Peter Briggs (Daresbury, UK).

  • Many of our collaborators and active users; a comprehensive list would be very long!

  • We would also like to take this opportunity to acknowledge the support of ARP/wARP by: the NIH and the EU commission, for research and infrastructure grants; the EMBL and the NKI, for hosting the research groups; and our industrial users, for generating a license income which strengthens our ability to keep our commitment to free distribution to the academic community.


Appendix A. Example output from install.sh on a Mac

ARP/wARP installer is checking your c-shell... 

c-shell is installed on your machine at /bin/csh

Your login shell is: /bin/tcsh 

Checking permissions for /dev/null -OK
Checking availability of sed command -OK
Checking availability of tail command -OK
Checking availability of awk command -OK
Checking decimal separator -OK
Checking ARP/wARP directory path -OK
Checking ARP/wARP directory structure -OK
Checking java installation - installed version is “1.5.0_19”
Checking java version number -OK
Checking python installation - installed version is 2.5.1
Checking python version number -OK
Python executables are available in /Users/testuser/arp_warp_7.1/byte-code/python-2.5
Checking operating system name -Darwin
Checking processor type -i386
ARP/wARP version 7.1 executables for this platform are available in /Users/testuser/arp_warp_7.1/bin/bin-i386-Darwin

Installing script and data files for: bin-athlon-Linux bin-i386-Darwin bin-i686-Linux bin-ia64-Linux bin-powerpc-Darwin bin-x86_64-Linux

Checking CCP4 & ARP/wARP installation -OK
Checking refmac5 installation - installed version is 5.5.0101
Checking refmac5 version number -OK

Testing the possibility of remote job submission - please wait...

curl seems to be able to communicate with the cluster at EMBL-Hamburg
The remote job submission of ARP/wARP 7.0.1 from CCP4i should work

===========================================================================
Remember that you have accepted the terms of the license agreement when you downloaded ARP/wARP 7.1.
Proceeding with installation reinforces your acceptance of the license agreement
A main obligation of this agreement is that any reference to the software for crystallographic computations will cite one or more ARP/wARP publications as set forth in the manual and on http://www.arp-warp.org
===========================================================================

*** IMPORTANT FOR SETUP ***
Do not forget to add this line in your .cshrc file:
source /Users/testuser/arp_warp_7.1/arpwarp_setup.csh

*** INSTALLATION OF ARP/wARP 7.1 HAS BEEN SUCCESSFUL ***

An attempt will be made to install the graphical user interface automatically.
Checking availability of tclsh command -OK
The $CCP4I_TOP dir has an ARP/wARP GUI, which is owned and is writeable by you (the current user) (version is 7.0 or higher).
Uninstalling previous GUI
Installing GUI for ARP/wARP 7.1
CCP4 interface version detected is : 2.0.4
Using tarball ARP_wARP_CCP4I6.tar.gz (for ccp4 version 6 and newer).

*** INSTALLATION OF ARP/wARP 7.1 GUI HAS BEEN SUCCESSFUL ***