Tuesday 20 August 2013

Command-line Options

The behaviour of all of the programs can be modified by setting command-line options. Some of these are generic and apply to most or all SLiMSuite programs (see the rje.py documentation, or the General Command-line Options section below), whereas others are program-specific.

Setting commandline options

Commandline options have two parts: the argument and the value. These can be fed to programs in one of two formats:

argument=value
-argument value

These two lines have equivalent functions. The two styles can be mixed within a program call, e.g.

python program.py arg1=val1 -arg2 val2

Options can also be supplied within *.ini files (see below).
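
The two formats can be illustrated with a minimal sketch of a parser. This is not the rje.py implementation: the function name parse_options is hypothetical, and treating an argument with no value as a switch set to "T" is an assumption made for the example.

```python
def parse_options(tokens):
    """Return an ordered list of (argument, value) pairs from command-line tokens."""
    pairs = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-") and "=" not in token:
            # "-argument value" style: the value is the next token.
            arg = token.lstrip("-")
            has_value = (
                i + 1 < len(tokens)
                and "=" not in tokens[i + 1]
                and not tokens[i + 1].startswith("-")
            )
            if has_value:
                pairs.append((arg, tokens[i + 1]))
                i += 2
                continue
            # Assumption: a dash-style argument with no value is a switch.
            pairs.append((arg, "T"))
        elif "=" in token:
            # "argument=value" style: split on the first "=" only.
            arg, value = token.split("=", 1)
            pairs.append((arg, value))
        else:
            # Assumption: a bare word is also treated as a switch.
            pairs.append((token, "T"))
        i += 1
    return pairs


print(parse_options(["arg1=val1", "-arg2", "val2"]))
# [('arg1', 'val1'), ('arg2', 'val2')]
```

Note that this sketch cannot read a negative number in the dash style (e.g. -v -1); the argument=value form (v=-1) avoids the ambiguity.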

Option Types

There are essentially three types of command-line option:

  1. Those that require a value (numerical or text), option=X. Those that require a filename as the value will be written: option=FILE. Those that require a directory path as the value will be written: option=PATH. Those that lead to an accessory application (rather than just its path) may also be listed as option=COMMAND. Paths and filenames should always use forward slash (/) separators, whatever the operating system.
  2. True/False (On/Off) options, option=T/F. For these options:
    • option=F and option=False are the same and turn the option off.
    • option (or -option), option=T and option=True are the same and turn the option on.
  3. List options. These are like the value options but have multiple values, separated by commas: option=X,Y. Where .. is used, the number of elements is optional, e.g. option=X,Y,..,Z could take option=X or option=A,B,C,D. Where option=LIST is used, the number of elements is optional and LIST could actually be the name of a file containing the list of elements.
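
As a rough illustration of how the True/False and list types might be interpreted once the raw text value is in hand, consider the sketch below. The helper names as_bool and as_list are hypothetical, and the case-insensitive matching and the one-element-per-line file format are assumptions rather than documented behaviour.

```python
import os


def as_bool(value):
    """Interpret a T/F option value; a bare option is assumed to arrive as 'T'."""
    text = value.strip().lower()
    if text in ("t", "true"):
        return True
    if text in ("f", "false"):
        return False
    raise ValueError("Not a True/False value: %r" % value)


def as_list(value):
    """Interpret a list option: comma-separated values, or a file of elements."""
    if os.path.isfile(value):
        # Assumption: a list file holds one element per line.
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [item for item in value.split(",") if item]


print(as_bool("T"), as_bool("False"))
print(as_list("A,B,C,D"))
```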

Long option values, whitespace and special characters

Some characters, such as whitespace, commas, pipes (“|”) and ampersands, will be interpreted by UNIX in particular ways from the commandline. If you have such characters within the option value, then either place the settings in an INI file (see below) or enclose the option value in quotes. If the value contains whitespace, double quotes will be needed even within an INI file, as whitespace is used to delimit commandline options, e.g.

python program.py option="Two words" limits="2,3"

NB. For PATH variables, directories should be separated by a forward slash (/). If paths contain spaces, they must be enclosed in double quotes:

path="example path".

It is recommended that paths do not contain spaces, as correct behaviour cannot be guaranteed if they do.
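
The effect of quoting can be previewed with Python's standard shlex module, which splits a string using rules similar to a UNIX shell. This only shows how the shell is likely to divide the command into tokens; it says nothing about how the programs handle them afterwards.

```python
import shlex

# Quoted values survive as single tokens, with the quotes removed.
quoted = shlex.split('python program.py option="Two words" limits="2,3"')
print(quoted)
# ['python', 'program.py', 'option=Two words', 'limits=2,3']

# Without quotes, the second word becomes a separate token.
unquoted = shlex.split('python program.py option=Two words')
print(unquoted)
# ['python', 'program.py', 'option=Two', 'words']
```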

INI Files

As well as feeding commands in on the command-line, any options listed can also be saved in a plain text file and called using the option ini=FILE. The precedence of loading default run settings from ini files is slightly complex but (hopefully) makes sense once it is clear that there are two kinds of precedence being invoked:

  1. For each ini file there is a directory precedence determining where to look for that file. Once the file is found, commands from that file will be read in and the program will stop looking for other versions of the file. Each ini file is looked for:
    • in the current directory from which the run command is being executed
    • in the directory containing the program being run. (Under usual circumstances, it is not recommended to put ini files in these directories; use the following location instead.)
    • in the settings/ directory of the distribution. This is the recommended location for default ini files and universal default values for all runs should be put here.
  2. For each ini file that is read in, each command has a setting precedence as described below, such that later values will over-rule earlier values for the same argument. Default ini files (if present) are read in the following order:
    • Global defaults are read from a defaults.ini file. (This is recommended.)
    • System defaults are read from an rje.ini file. (This file is not recommended and is largely for development reasons.)
    • Program defaults are read from the file named after the program (e.g. haqesac.ini for HAQESAC). (This will be the same root filename as the default *.log file if you are not sure.)

For example, if you are running haqesac.py in a directory containing haqesac.ini, the full list of commandline arguments will be any in PATH/settings/defaults.ini (if it exists) plus any in PATH/settings/rje.ini (if it exists) plus the contents of ./haqesac.ini plus the options given on the commandline. If, on the other hand, there is no ./haqesac.ini file, options will instead be read from PATH/settings/haqesac.ini (if it exists). (PATH/ is determined from the path given to haqesac.py.) If any of these files have been placed in tools/ instead (not recommended), these will be used in place of those from settings/.
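
The two kinds of precedence can be sketched as a pair of small functions. This is an illustration under stated assumptions, not the actual rje.py code: the function names are hypothetical, and the settings/ directory is assumed to sit beside the directory that holds the program (e.g. PATH/tools/ and PATH/settings/).

```python
import os


def find_ini(name, run_dir, program_dir):
    """Return the first existing copy of an ini file, or None if there is none."""
    # Assumption: settings/ is a sibling of the directory holding the program.
    root = os.path.dirname(os.path.abspath(program_dir))
    settings_dir = os.path.join(root, "settings")
    # Directory precedence: run directory, program directory, then settings/.
    for directory in (run_dir, program_dir, settings_dir):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate  # stop looking once a version is found
    return None


def default_ini_files(program, run_dir, program_dir):
    """List the default ini files that exist, in the order they are read."""
    # Setting precedence: later files over-rule earlier ones.
    names = ["defaults.ini", "rje.ini", program + ".ini"]
    found = [find_ini(name, run_dir, program_dir) for name in names]
    return [path for path in found if path is not None]
```

Calling default_ini_files("haqesac", ".", "PATH/tools") would then return the files to load, in order, before the command-line options themselves are applied.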

It is recommended that a defaults.ini file is made and placed in the settings/ directory. This file should contain the paths to the External Programs used by RJE programs:

blastpath=PATH
blast+path=PATH
fastapath=PATH
clustalw=COMMAND
muscle=COMMAND

Note that the first three are just paths to the programs, while for ClustalW and MUSCLE the actual program commands themselves must be included. This is to make it easier to replace these programs with alternatives.

If running in Windows, it is also advisable to add the win32=T command to the defaults.ini file.

INI File formatting

INI files are simple plain text files. Several commands can be put on a single line, although it is generally clearer to stick to one command per line. Any text on a line following a hash (#) will be treated as a comment and ignored unless it is part of an option value in double quotes. This allows INI files to be documented.
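
These formatting rules are close to what the standard shlex module does when asked to honour comments, so a reader for INI lines could be sketched as follows. The function name read_ini_lines, the title option and the example path are all hypothetical, and the real parser may differ in detail (for instance, shlex also treats single quotes as quoting characters).

```python
import shlex


def read_ini_lines(lines):
    """Collect the option tokens from the lines of an INI file."""
    tokens = []
    for line in lines:
        # comments=True drops everything after a '#' that is not inside quotes.
        tokens.extend(shlex.split(line, comments=True))
    return tokens


example = [
    '# Paths to external programs',
    'blastpath=/usr/local/blast/   # trailing comment',
    'i=0 v=1',
    'title="Run #1 test"',
]
print(read_ini_lines(example))
# ['blastpath=/usr/local/blast/', 'i=0', 'v=1', 'title=Run #1 test']
```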

Option Precedence

Later options will supersede earlier ones if they are mutually exclusive. Options from an INI file will be inserted into the list at the point the ini=FILE command is called. (Default *.ini files are read in the order listed above, i.e. options from the defaults.ini file are read first, followed by the program.ini file.) This means that ini file options can be over-ruled, e.g. program.py ini=eg.ini i=1 will supersede any interactivity setting in eg.ini with i=1, whereas program.py i=1 ini=eg.ini will use any interactivity setting in eg.ini and over-rule i=1.
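
The ordering rule can be made concrete with a short sketch in which ini=FILE is expanded at the point where it appears and later values simply overwrite earlier ones. The function name resolve_options is hypothetical, and the contents given for eg.ini are invented for the example; a dictionary stands in for reading the file.

```python
def resolve_options(tokens, ini_files):
    """Apply options left to right, expanding ini=FILE where it appears."""
    settings = {}
    for token in tokens:
        arg, _, value = token.partition("=")
        if arg == "ini":
            # Options from the ini file are inserted at this point in the list.
            for ini_token in ini_files.get(value, []):
                ini_arg, _, ini_value = ini_token.partition("=")
                settings[ini_arg] = ini_value
        else:
            # A later value over-rules any earlier value for the same argument.
            settings[arg] = value
    return settings


ini_files = {"eg.ini": ["i=0", "v=1"]}  # hypothetical contents of eg.ini

print(resolve_options(["ini=eg.ini", "i=1"], ini_files)["i"])  # 1
print(resolve_options(["i=1", "ini=eg.ini"], ini_files)["i"])  # 0
```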

Interactivity and Verbosity settings

By default, the programs are generally set up to run through to completion without any user interaction if given all the options they need. For more interaction with a program as it runs, use the argument i=1.

python xxx.py commandlist i=1

Both the level of interactivity and the amount printed to screen can be altered, using the interactivity i=X and verbosity v=X command-line options, respectively, where X is the level from none (-1) to lots (2+). Although in theory i=-1 and v=-1 will ask for nothing and show nothing, there is a chance that some print statements will have escaped in these early versions of the program. There is also the possibility that accessory programs may print things to the screen beyond the control of the calling program. Please report any that you spot!
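
One way to picture verbosity levels is as a threshold that each message must meet before it is printed. The sketch below is purely illustrative: the function name report and the idea that every message carries its own level are assumptions, not a description of how the programs are written.

```python
def report(message, level=0, verbosity=0):
    """Print message if the verbosity setting reaches the message's level."""
    if verbosity >= level:
        print(message)
        return True
    return False


report("Run complete.", level=0, verbosity=0)     # printed at the default v=0
report("Detailed progress", level=2, verbosity=0)  # suppressed unless v=2 or more
report("Run complete.", level=0, verbosity=-1)     # suppressed: v=-1 shows nothing
```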

Please report any irritations and suggestions for changes to what is printed at different verbosity levels.

General Command-line Options

Along with some of the options listed above, there are a number of core options that are used in many or all of the SLiMSuite programs. Defaults are given in square brackets.

NOTE: Default settings might vary between programs. To set global defaults, it is recommended to put these options in the defaults.ini file.

Help and Program Logs

help            : Prints help documentation to screen.
v=X             : Sets screen verbosity (-1 for silent) [0]
i=X             : Sets interactivity (-1 for full auto) [0]
silent=T/F      : If set to True will not write to screen or log. [False]
log=FILE        : Redirect log to FILE [program.log]
newlog=T/F      : Create new log file. [False]
errorlog=FILE   : If given, will write errors to an additional error file. [None]

General Input/Output Options

outfile=FILE    : This will set the 'root' filename for (non-log) output files in most programs (FILE.*) [None]
basefile=FILE   : Equivalent of log=FILE outfile=FILE. [None]
force=T/F       : Force to regenerate data rather than keep old results. [False]
append=T/F      : Append to results files rather than overwrite. [False]
backups=T/F     : If True, option given to back up certain files if append=F. [True]
delimit=X       : Sets standard delimiter for results output files. [varies]
mysql=T/F       : “MySQL output” with lowercase headers that lack spacers. (Not all programs) [False]

System settings

win32=T/F       : Run in Win32 Mode for Windows operation. [False]
memsaver=T/F    : Run in “Memory Saver” mode. Varies with program. [False]
runpath=PATH    : Run program as if in given path (log files and some programs only) [PATH called from]
rpath=COMMAND   : Path to installation of R. ['R']
maxbin=X        : Maximum number of trials for using binomial (else use Poisson) [∞]

Forking Options

forks=X         : Number of forks. (Some programs only.) [0]
killforks=X     : Number of seconds of inactivity before killing forks. [3600]
noforks=T/F     : Over-ride and cancel forking if True. [False]

This information is also available by printing the __doc__ attribute of the rje.py module at a Python prompt (print rje.__doc__), or using the help option: python rje.py help. Please contact me if you want any further details of a specific option and/or advice as to when (not) to use it.
