EMBL Hamburg Biological
Small Angle Scattering

EFAMIX manual


Written by P.V. Konarev, V.V. Volkov and D.I. Svergun.
Post all your questions about EFAMIX to the ATSAS Forum.

© ATSAS Team, 2005-2021

Table of Contents

  • Examples
  • Manual

    Introduction

    EFAMIX implements an algorithm for restoring the scattering profiles of the individual components of protein mixtures using evolving factor analysis (EFA).

    Both the scattering profiles of the individual components and the corresponding concentration (volume fraction) profiles are restored. The method is based on the singular value decomposition (SVD) [1] of the multi-frame data set. The fundamental idea of EFA [2] is to follow the evolution of the rank of the data matrix as a function of an ordered variable (e.g. the frame number), which is done by repeating the SVD on a data matrix that grows one frame at a time.
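    The forward and backward EFA calculations described above can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the EFAMIX implementation: the function name evolving_singular_values, the number of tracked singular values n_sv and the layout of the data matrix (one buffer-subtracted curve per column, columns ordered by frame number) are assumptions made for the example.

```python
import numpy as np


def evolving_singular_values(data, n_sv=4):
    """Forward and backward evolving factor analysis (EFA).

    data: array of shape (n_points, n_frames); each column is one
          buffer-subtracted SAXS curve, columns ordered by frame number.
    Returns (forward, backward), each of shape (n_frames, n_sv):
      forward[k]  -> largest singular values of frames 0..k
      backward[k] -> largest singular values of frames k..n_frames-1
    Entries stay zero where the sub-matrix has fewer singular values.
    """
    n_frames = data.shape[1]
    forward = np.zeros((n_frames, n_sv))
    backward = np.zeros((n_frames, n_sv))
    for k in range(n_frames):
        # Forward: the sub-matrix grows from the first frame up to frame k.
        sv = np.linalg.svd(data[:, :k + 1], compute_uv=False)[:n_sv]
        forward[k, :sv.size] = sv
        # Backward: the sub-matrix grows from the last frame down to frame k.
        sv = np.linalg.svd(data[:, k:], compute_uv=False)[:n_sv]
        backward[k, :sv.size] = sv
    return forward, backward
```

    Plotting the logarithm of each column of forward and backward against the frame number gives the usual EFA plots: each new component shows up as one more singular value rising above the noise floor.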

    To correctly associate the appearance of a given component with its disappearance, one assumes that the first component to appear in the system is also the first to disappear, the second component disappears next, and so on. The region of existence of a component, called its concentration window, is thus defined for the i-th component as extending from the point where the rank rises to i in the forward EFA calculation to the point where the rank rises to N - i + 1 in the backward calculation, where N is the number of components [2].
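    Given the evolving singular values, the windowing rule above can be written as a short function. This is a hypothetical sketch: the function concentration_windows and the fixed noise threshold used to decide which singular values count towards the rank are assumptions, and EFAMIX may determine the rank differently.

```python
import numpy as np


def concentration_windows(forward, backward, n_comp, threshold):
    """Estimate the concentration window (start, end) of each component.

    forward, backward: evolving singular values, shape (n_frames, n_sv),
        with backward[k] computed from frames k..n_frames-1.
    threshold: singular values above it are counted as signal (assumed to
        be known, e.g. from the noise level of buffer frames).
    Returns a list of (start, end) frame indices (0-based, inclusive).
    """
    rank_f = (forward > threshold).sum(axis=1)   # rises as frames are added
    rank_b = (backward > threshold).sum(axis=1)  # falls as k moves to the end
    windows = []
    for i in range(1, n_comp + 1):
        need_f, need_b = i, n_comp - i + 1
        if not (rank_f >= need_f).any() or not (rank_b >= need_b).any():
            raise ValueError(f"rank never reaches the level needed for component {i}")
        # Component i appears where the forward rank first reaches i ...
        start = int(np.argmax(rank_f >= need_f))
        # ... and disappears where the backward rank (read from the end)
        # reaches n_comp - i + 1, i.e. the last frame with at least that rank.
        end = int(np.nonzero(rank_b >= need_b)[0].max())
        windows.append((start, end))
    return windows
```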

    Outside its concentration window a component is absent, so its concentration there is zero. SEC-SAXS data sets are a typical example of such data. In the next step, the algorithm calculates the rotation matrix R, from which first the concentration matrix and then the individual scattering profiles are restored.
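    One simple way to obtain R under the zero-concentration constraint is sketched below. It assumes that the data matrix D (points x frames) is approximated by its first N SVD components, D ≈ U S V^T, and factorized as D = A C^T with concentrations C = V R and profiles A = U S R^-T (the inverse transpose of R). Each column of R is taken as the approximate null vector of the rows of V that lie outside the corresponding concentration window. This is a generic EFA rotation, not necessarily the procedure implemented in EFAMIX, and the function name efa_rotation is hypothetical.

```python
import numpy as np


def efa_rotation(data, windows):
    """Split data (n_points, n_frames) into component profiles and
    concentrations using zero-concentration constraints.

    windows: one (start, end) pair of 0-based, inclusive frame indices per
        component; each window must leave some frames outside it.
    Returns (profiles, conc) with data ~= profiles @ conc.T.
    """
    n_comp = len(windows)
    n_frames = data.shape[1]
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    u, s, v = u[:, :n_comp], s[:n_comp], vt[:n_comp].T  # keep N components
    rotation = np.zeros((n_comp, n_comp))
    for j, (start, end) in enumerate(windows):
        outside = np.ones(n_frames, dtype=bool)
        outside[start:end + 1] = False
        # Column j of R must make V @ r_j (the concentration of component j)
        # vanish outside its window: use the right singular vector of
        # V[outside] with the smallest singular value.
        r = np.linalg.svd(v[outside])[2][-1]
        # The sign is arbitrary; choose it so the concentration is positive.
        if (v @ r).sum() < 0:
            r = -r
        rotation[:, j] = r
    conc = v @ rotation                                    # C = V R
    profiles = u @ np.diag(s) @ np.linalg.inv(rotation).T  # A = U S R^-T
    return profiles, conc
```

    The overall scale of each component remains undetermined (a larger concentration can be traded for a smaller profile), so the restored curves are defined only up to a constant factor per component.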

    EFAMIX can be run as a standalone application or called from the program CHROMIXS via the menu 'File->Configure EFAMIX'.

    If you use results from EFAMIX in your publication, please cite:

    P.V. Konarev, M.A. Graewert, C.M. Jeffries, M. Fukuda, T.A. Cheremnykh, V.V. Volkov and D.I. Svergun. (2022) EFAMIX, a tool to decompose inline chromatography SAXS data from partially overlapping components. Protein Science 31, 269-282.

    Running EFAMIX

    Usage:

    $ efamix [OPTIONS] []

    OPTIONS known by EFAMIX are described in this section.

    In general, command-line options can be used to make choices about the parameters of the algorithm, while the interactive configuration is used to govern the data processing.

    If no OPTIONS are given, the configuration is done in fully interactive mode.

    Command-Line Options

    EFAMIX recognizes the following command-line options. Mandatory arguments to long options are mandatory for short options too.

    Short Option   Long Option   Description
    -m --mode=<MODE> Mode of data processing: AUTOMATIC (all calculations are made automatically using the default parameter values) or INTERACTIVE. Default is 'INTERACTIVE'. See the examples below.
    -n --Ncomp=<NUMBER> Specify the number of components in the protein mixture
    --Nframes_total=<NUMBER> Specify the total number of time frames (number of data curves) in the SEC-SAXS data set.
    --Nframe_start=<NUMBER> Specify the starting frame number to include.
    --Nframe_end=<NUMBER> Specify the final frame number to include.
    --Nbufframe_start=<NUMBER> Specify the starting buffer frame number to average.
    --Nbufframe_end=<NUMBER> Specify the final buffer frame number to average.
    --Nbeg=<NUMBER> Specify the first data point in SAXS curve to be processed.
    --Nend=<NUMBER> Specify the last data point in SAXS curve to be processed.
    -p --prefix=<NAME> Specify the prefix to prepend to all output file names (default: 'efamix').
    -d --dir-name=<NAME> Specify the path to the directory with the SAXS data set (default: '.', i.e. the current directory).
    -d --data=<NAME> Specify the root names of the experimental data files (default: 'img_0002').
    -p --profile=<NAME> Specify the file name of the elution profile obtained by CHROMIXS (default: 'chromixs.dat') for interactive selection of concentration windows for the components using sasplot.
    -c --conc-file=<NAME> Specify the name of the file containing the concentration windows of the components (default: ' ').
    -s --sasplot=<TRUE/FALSE> Specify whether sasplot (Windows version) is called for the interactive selection of concentration windows (default: FALSE).
    -s --show-progress=<TRUE/FALSE> Specify whether run progress information is displayed (default: FALSE).
    -e --error-weighting=<TRUE/FALSE> Specify if the error weighting of input SEC-SAXS data is applied during the data decomposition. (default: TRUE)
    -w --write-fit=<TRUE/FALSE> Specify whether fit files are written to the output (default: FALSE).
    -b --brief=<TRUE/FALSE> Write the component and concentration profiles either to separate files, one per component (--brief=FALSE), or to common files containing all components (--brief=TRUE) (default: FALSE).
    -v --version Print version information and exit.
    -h --help Print a summary of arguments and options and exit.

    Interactive Configuration

    If the corresponding options are omitted, the settings available through command-line options can also be configured interactively, as shown in the table below. Otherwise these questions are skipped.

    Screen Text   Default   Description
    Enter number of components in the system? 2 Number of components in the system. Default value is set to 2. It can be set between 2 and 4.
    Enter number of frames for SAXS data set? 3000 Total number of time frames (number of data curves) in the SEC-SAXS data set. Default value is set to 3000.
    Enter number of sample frame start? 1 The starting frame number to include. Default value is set to 1.
    Enter number of sample frame end? 3000 The final frame number to include. Default value is set to 3000.
    Enter starting buffer frame number to average? 1 The starting buffer frame number to average. Default value is set to 1.
    Enter final buffer frame number to average? 200 The final buffer frame number to average. Default value is set to 200.
    Enter number of SAXS data point - Nbeg? 1 The first data point number in SAXS curve to be processed. Default value is set to 1.
    Enter number of SAXS data point - Nend? 3000 The last data point number in SAXS curve to be processed. Default value is set to 3000.
    Do you want to select windows via sasplot [ Y / N ] ? No The option to use sasplot (Windows version) in interactive mode to select the concentration windows of the components. Default value is set to 'No'.
    Enter conc window start for comp1 [VALUE] ? ---- The start frame number of the concentration window for component 1. Estimated automatically by EFA. It can also be set interactively via sasplot or defined in a concentration window file specified with the 'conc-file' command-line option.
    Enter conc window end for comp1 [VALUE] ? ---- The final frame number of the concentration window for component 1. Estimated automatically by EFA. It can also be set interactively via sasplot or defined in a concentration window file specified with the 'conc-file' command-line option.
    Enter conc window start for comp2 [VALUE] ? ---- The start frame number of the concentration window for component 2. Estimated automatically by EFA. It can also be set interactively via sasplot or defined in a concentration window file specified with the 'conc-file' command-line option.
    Enter conc window end for comp2 [VALUE] ? ---- The final frame number of the concentration window for component 2. Estimated automatically by EFA. It can also be set interactively via sasplot or defined in a concentration window file specified with the 'conc-file' command-line option.

    EFAMIX Input Files

    EFAMIX takes as input the experimental files of a SEC-SAXS data set in ASCII format. The program must be given the location of the files (option '--dir-name') and their root name (option '--data'). The files must be numbered as 'root-name_NNNNN.dat', where 'root-name' is defined by the option '--data' and NNNNN is the frame number, running in ascending order from '00001' to the total number of frames.
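    The file-naming convention and the buffer averaging controlled by '--Nbufframe_start' and '--Nbufframe_end' can be illustrated as follows. This sketch assumes that all frames share the same s-grid and that buffer subtraction is a plain subtraction of the averaged buffer curve; the helper names frame_filename and subtract_buffer are hypothetical, and the error propagation done by EFAMIX is not reproduced.

```python
import numpy as np


def frame_filename(root_name, frame):
    """Input file name for one frame: 'root-name_NNNNN.dat' (five digits)."""
    return f"{root_name}_{frame:05d}.dat"


def subtract_buffer(intensities, buf_start, buf_end, first_frame=1):
    """Subtract the averaged buffer curve from every frame.

    intensities: array (n_frames, n_points); row 0 holds frame first_frame.
    buf_start, buf_end: 1-based frame numbers of the buffer range (inclusive).
    """
    lo = buf_start - first_frame
    hi = buf_end - first_frame + 1  # +1 because the range is inclusive
    buffer_curve = intensities[lo:hi].mean(axis=0)
    return intensities - buffer_curve
```

    For example, [frame_filename("img_0002", k) for k in range(1100, 1601)] lists the file names covered by --Nframe_start=1100 --Nframe_end=1600.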

    Optionally, the concentration windows of the components may be provided in a configuration file (option '--conc-file') with the following format; if no file is given, the windows are estimated automatically by EFA:

       Comp1
       140 280
       Comp2
       220 380
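    A concentration window file in this format can be written and read back with a few lines of Python. The helpers below are hypothetical and infer the format only from the example above (a 'CompN' label line followed by a line with the start and end frame numbers); whether EFAMIX accepts comments or other layouts is not covered here.

```python
def format_conc_windows(windows):
    """windows: list of (start, end) frame numbers, component 1 first."""
    lines = []
    for i, (start, end) in enumerate(windows, start=1):
        lines.append(f"Comp{i}")
        lines.append(f"{start} {end}")
    return "\n".join(lines) + "\n"


def parse_conc_windows(text):
    """Inverse of format_conc_windows; indentation and blank lines are ignored."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected pairs of 'CompN' and 'start end' lines")
    windows = []
    for label, values in zip(lines[0::2], lines[1::2]):
        if not label.startswith("Comp"):
            raise ValueError(f"expected a 'CompN' label, got {label!r}")
        start, end = (int(v) for v in values.split())
        windows.append((start, end))
    return windows
```

    Writing format_conc_windows([(140, 280), (220, 380)]) to 'conc_window.con' reproduces the example above, which can then be passed with --conc-file=conc_window.con.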
    

    EFAMIX Output Files

    With each successful run, EFAMIX creates a set of output files in several subfolders ("Component_and_Concentration_profiles", "Individual_frames_subtracted", "Restored_individual_frames_subtracted", "Singular_value_EFA_plots"). Each file name starts with a customizable prefix to which an extension is appended. If a prefix has been used before, existing files are overwritten without warning.

    Subfolder: "Component_and_Concentration_profiles"
    Extension   Description
    prefix_component_profiles.dat The scattering profiles of the components restored by EFAMIX. The first column is the S-axis, the second column is the scattering intensity of component 1, the third column that of component 2, etc. If the option '--brief' is disabled, the profiles are saved in a separate file for each component. Subfolder: "Component_and_Concentration_profiles"
    prefix_concentration_profiles.dat The concentration profiles of the components restored by EFAMIX. The first column is the frame number (starting from Nframe_start), the second column is the concentration profile of component 1, the third column that of component 2, etc. The last column contains the total concentration profile of all components. If the option '--brief' is disabled, the profiles are saved in a separate file for each component. Subfolder: "Component_and_Concentration_profiles"
    prefix.log Contains the information about the input parameters and the estimated concentration window numbers for the components. It is updated during execution of the program.
    data_NNNNN_sub.dat The experimental data with the subtracted buffer signal. The file numbering starts from Nframe_start and ends with Nframe_end. They are created only if the option 'write-fit' is enabled. Subfolder: "Individual_frames_subtracted"
    data_NNNNN_sub_restored.dat The data restored by EFAMIX (fit files). The file numbering starts from Nframe_start and ends with Nframe_end. They are created only if the option 'write-fit' is enabled. Subfolder: "Restored_individual_frames_subtracted"
    prefix_forwards*.dat The information is written to the files prefix-'_diag_forwards_N.dat' and prefix-'_grad_forwards_N.dat', where N is the number of components (between 2 and 4). They contain the evolving singular values and their first derivatives (gradients) obtained in the forward direction. The files are created only if the option 'write-fit' is enabled. Subfolder: "Singular_value_EFA_plots"
    prefix_backwards*.dat The information is written to the files prefix-'_diag_backwards_N.dat' and prefix-'_grad_backwards_N.dat', where N is the number of components (between 2 and 4). They contain the evolving singular values and their first derivatives (gradients) obtained in the backward direction. The files are created only if the option 'write-fit' is enabled. Subfolder: "Singular_value_EFA_plots"
    prefix_conc_window*.dat The information is written to the file prefix-'_conc_window_N.dat', where N is the number of components (between 2 and 4). It contains the sizes of the concentration windows of the components, either estimated by EFA or defined in interactive mode. The files are created only if the option 'write-fit' is enabled. Subfolder: "Singular_value_EFA_plots"
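    For a run with --brief=TRUE, the two common profile files can be loaded with NumPy according to the column layout described above. This sketch assumes the files contain only whitespace-separated numeric columns with no header lines (if they do, pass skiprows to np.loadtxt); the function names are hypothetical.

```python
import numpy as np


def read_component_profiles(path):
    """Load prefix_component_profiles.dat written with --brief=TRUE.

    path: file name (or anything np.loadtxt accepts).
    Returns s (first column) and an array whose column j is the
    intensity of component j + 1.
    """
    table = np.loadtxt(path, ndmin=2)
    return table[:, 0], table[:, 1:]


def read_concentration_profiles(path):
    """Load prefix_concentration_profiles.dat written with --brief=TRUE.

    Returns frame numbers, per-component concentrations (one column per
    component) and the total concentration (last column).
    """
    table = np.loadtxt(path, ndmin=2)
    frames = table[:, 0].astype(int)
    return frames, table[:, 1:-1], table[:, -1]
```

    For the first example below, the component profiles would be read with read_component_profiles("Component_and_Concentration_profiles/aldolase_component_profiles.dat").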

    Examples

    Please note that the prefixes in the examples may be chosen arbitrarily. The values below are chosen for maximum clarity only.

    SEC-SAXS data set

    Use EFAMIX in AUTOMATIC mode to restore the scattering profiles and concentrations of the components of a two-component protein mixture. The concentration windows of the components are estimated automatically by EFA:

    $ efamix --dir-name=C:\My_SEC-SAXS_data\ --data=img_0002 --Ncomp=2 --mode=automatic --Nframes_total=3000 --Nframe_start=1100 --Nframe_end=1600 --Nbufframe_start=1 --Nbufframe_end=500 --Nbeg=21 --Nend=1400 --prefix=aldolase

    Time-resolved SAXS data set

    EFAMIX can also process time-resolved SAXS data sets in which the components appear one after another and disappear in the same order. Before the run, an SVD analysis of the data set should be performed to ensure that the number of independent components in the system lies between 2 and 4. Here the concentration windows are read from the file 'conc_window.con':

    $ efamix --dir-name=C:\My_Time-resolved-SAXS_data\ --data=prion_0005 --Ncomp=3 --mode=automatic --Nframes_total=2000 --Nframe_start=800 --Nframe_end=1800 --Nbufframe_start=1 --Nbufframe_end=300 --Nbeg=1 --Nend=2500 --conc-file=conc_window.con --prefix=prion-dyn
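    The preliminary SVD check mentioned above can be sketched as counting the singular values that rise above the noise level. The criterion below is an assumption, not part of EFAMIX: for an m x n matrix of independent Gaussian noise with standard deviation sigma, the largest singular value is approximately sigma * (sqrt(m) + sqrt(n)), and a heuristic margin (default 1.2) is added to allow for finite-size fluctuations. Real SAXS errors vary with s, so a single sigma is a simplification; the function name count_components is hypothetical.

```python
import numpy as np


def count_components(data, sigma, margin=1.2):
    """Count singular values of data that lie clearly above the noise.

    data: (n_points, n_frames) buffer-subtracted intensities.
    sigma: assumed noise standard deviation, the same for every point.
    margin: heuristic safety factor above the pure-noise edge.
    """
    m, n = data.shape
    s = np.linalg.svd(data, compute_uv=False)
    # For an m x n matrix of i.i.d. Gaussian noise the largest singular
    # value is close to sigma * (sqrt(m) + sqrt(n)).
    noise_edge = sigma * (np.sqrt(m) + np.sqrt(n))
    return int((s > margin * noise_edge).sum())
```

    If the count falls outside the range 2 to 4, the data set does not meet the requirement stated above for an EFAMIX run.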

    IEC-SAXS data set

    For best results, run EFAMIX in INTERACTIVE mode and customize the input parameters as required. With the following command, all fit files are written to the output, the progress of the run is shown on the screen, and sasplot (Windows version only) is called for the interactive selection of the concentration windows from the elution profile of the IEC-SAXS data set (file 'chromixs.dat'):

    $ efamix --dir-name=D:\My-IEX_SAXS-data\ --data=run_0008 --mode=interactive --prefix=fab_fc --sasplot --profile=chromixs.dat --write-fit --show-progress --brief

    EFAMIX is also integrated into CHROMIXS GUI and can be called from CHROMIXS via menu 'File->Configure EFAMIX'.

    References:

    [1] Golub G.H., Reinsch C. Singular value decomposition and least squares solutions // Numer. Math. (1970), V. 14, p. 403-420. https://doi.org/10.1007/BF02163027

    [2] Keller H.R., Massart D.L. Evolving factor analysis // Chemometrics and Intelligent Laboratory Systems (1992), V. 12, Issue 3, p. 209-224. https://doi.org/10.1016/0169-7439(92)80002-L


      Last modified: April 26, 2021

    © BioSAXS group 2021