Update Cluster Guide, authored by Trisha Sheth
# cluster_chain_stage

active_stage = cluster_chain_stage(ctrl_chain)

Returns the active stage for each chain in ctrl_chain.
# cluster_cleanup

```
ctrl = cluster_cleanup(...)
```

Cleans up (deletes temporary files and jobs) a batch, list of batches, or a batch chain.
# cluster_compile

```
cluster_compile(...)
```

Compiles the cluster job function cluster_job.m.
# cluster_error_mask

```
cluster_error_mask(error_mask)
```

If error_mask is passed in, interprets the error mask and prints out information about each error that occurred. If no input arguments are given, this function sets all the error mask variables in the caller's workspace.
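Based on the description above, both call forms can be sketched as follows (the mask value 96 is an arbitrary illustration; the exact printed output depends on the toolbox):

```
cluster_error_mask          % no arguments: define the error mask variables in the caller's workspace
cluster_error_mask(96)      % interpret mask 96 and print information about each error bit that is set
```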
# cluster_exec_task

```
cluster_exec_task(ctrl,task_ids,run_mode)
```

Executes task(s) in the current Matlab session. This is useful for debugging, or to avoid long queue wait times when only a few jobs have failed and need to be rerun. For example, to run task TASK_ID from batch BATCH_ID:

cluster_exec_task(BATCH_ID, TASK_ID)
# cluster_get_batch

Gets a batch and populates a ctrl structure for that batch.

update_mode should be set to 0 if this is not the process running the batch, because two processes accessing the status files at the same time can cause errors in them.
# cluster_get_batch_list

```
ctrls = cluster_get_batch_list(param)
```

Get a list of all batches. Leave param undefined to use gRadar.cluster.data_location to search for batches.
# cluster_get_chain_list

```
cluster_get_chain_list(param)
```

Prints a list of all chains. Leave param undefined to use gRadar.cluster.data_location to search for chains.
# cluster_hold

```
ctrls = cluster_hold(batch_id,hold_state)
```

Sends a hold command to cluster_run and causes it to enter debug mode.
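Assuming a hold_state of 1 places the hold and 0 releases it (these semantics are an assumption; they are not documented in this section), usage might look like:

```
ctrls = cluster_hold(BATCH_ID,1);  % place a hold on the batch (hold_state semantics assumed)
ctrls = cluster_hold(BATCH_ID,0);  % release the hold
```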
# cluster_job
```
cluster_job
```

Matlab function that runs the job on the cluster. This is the function which is compiled and calls the desired task functions.
# cluster_job.sh
```
cluster_job.sh
```

Bash script used by the torque and slurm cluster types. This is the job which is called by torque or slurm. This bash script calls cluster_job.m (which, when compiled, creates run_cluster_job.sh).
# cluster_load_chain

```
ctrl = cluster_load_chain([],chain_id)
```

Loads a batch chain from a chain file. chain_id is a unique positive integer.
# cluster_new_batch

```
cluster_new_batch
```

Creates a new batch.
This creates a new batch directory and support files in the batch directory. The default parameters are loaded from gRadar.cluster.
# cluster_new_task

```
ctrl = cluster_new_task(ctrl,sparam,dparam,varargin)
```

Creates a new task in the batch specified by ctrl. The dparam_save option can be set to zero to speed up task creation if many tasks will be created. If this option is used, the cluster_save_dparam function must be called after all the tasks are created, or the dynamic parameters will not be saved to disk (and the tasks will therefore not have access to them when they run).
Input arguments:
|Name|Description|
|----|-----------|
|ctrl|batch structure array (may not be a batch ID)|
|sparam|static param structure (only has an effect for the first task that is created for the batch)|
|dparam|dynamic param structure that will be merged with the sparam structure when the task is run|
|varargin|name-value pairs (e.g. 'dparam_save',0 could be passed in as the 4th and 5th arguments to turn off dynamic parameter saving during task creation)|
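The dparam_save workflow described above can be sketched as follows (my_task and its idx field are hypothetical names used only for illustration):

```
% Create many tasks quickly by deferring the dynamic parameter save,
% then write dparam to disk once at the end with cluster_save_dparam.
for task_idx = 1:100
  dparam.argsin{1}.my_task.idx = task_idx;  % hypothetical per-task field
  ctrl = cluster_new_task(ctrl,sparam,dparam,'dparam_save',0);
end
ctrl = cluster_save_dparam(ctrl);
```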
sparam and dparam each have the same fields. Each field only needs to be set in one or the other structure. For example, one could create the task like this:
```
sparam.task_function = @qlook;
sparam.num_args_out = 1;
dparam.cpu_time = 100;
dparam.notes = 'Test';
param = struct('qlook',struct('some_static_field',3));
sparam.argsin{1} = param;
dparam.argsin{1}.qlook.some_dynamic_field = 2;
```
Note that sparam is only written out for the very first task. Any changes to sparam in subsequent tasks will have no effect.
|Field Name|Description|
|----------|-----------|
|task_function|function handle of the job; this function handle tells cluster_job.m what to run|
|argsin|cell vector of input arguments (default is {})|
|num_args_out|number of output arguments to expect (default is 0)|
|notes|optional note to print after successful submission of the job; default is '' (nothing is written out)|
|cpu_time|maximum cpu time of this task in seconds|
|mem|maximum memory usage of this task in bytes (default is 0)|
|file_success|cell array of files that must exist for the task to be considered a success. If a file is a .mat file, then it must have the file_version variable and not be marked for deletion.|

sparam and dparam will be merged when the task runs. The merging works across cell arrays too. Therefore, setting:
```
sparam.argsin{1}.my_task.sfield = 1;
dparam.argsin{1}.my_task.dfield = 2;
```
will cause the first argument to the task (argsin{1}) to be a structure array with the field "my_task", which is itself a structure array with two fields: sfield and dfield. For example, if your task is a function:
```
function success = my_task(param)
```
then inside your function you will have these fields:
```
param.my_task.sfield = 1;
param.my_task.dfield = 2;
```
# cluster_print
```
cluster_print(ctrl,ids,print_mode,ids_type)
```
Prints or gets information about a particular task or set of tasks in a batch specified by ctrl. The type of ID defaults to a task ID (ids_type equal to 0). If ids_type is set to 1, then it looks up by the torque job ID.
**cluster_print(ctrl,task_id,1)**

Prints full information about one task.

**[in,out] = cluster_print(ctrl,task_id,0)**

Returns input and output information for one task.

**cluster_print(ctrl,task_ids,2)**

Prints tables and returns a struct with these fields for each task ID specified:
|field|description|
|-----|-----------|
|task_id|The task ID is the index into the task. It can be from 1 to N where N is the number of tasks in the batch.|
|job_id|The ID of the job. This is the ID used by the cluster interface.|
|job_status|Job status|
|error_flag|Job error state|
|cpu_time_req|CPU time requested in minutes|
|cpu_time|CPU time currently used in minutes|
|memory_req|Memory requested in MB|
|memory|Memory currently being used in MB|
|schedule|Scheduled time to run as a date string|
# cluster_print_chain
```
cluster_print_chain(ctrl_chain)
```
Prints and gets job state, cpu, and memory information about all jobs in a control chain. cluster_print_chain uses the saved chain information unless a chain cell array is passed in. This occasionally causes erroneous reporting. For example, if the chain is saved in the debug state, but is actually running on the cluster, the running and queued task totals will be zero and these jobs will all be marked as pending.
```
Number of tasks: 771, 198/315/255/3 C/R/Q/T, 0 error, 0 retries
```
The task output information stands for: C/R/Q/T = Complete/Running/Queued/Pending.
# cluster_reset

```
ctrl_chain = cluster_reset(ctrl_chain)
```

Resets fields in a control chain so that the control chain may be run again with cluster_run. Usually this is done after some tasks in the control chain failed and another attempt is being made to run them.
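A typical retry sequence, based on the description above, might look like:

```
ctrl_chain = cluster_reset(ctrl_chain);   % clear state so the failed tasks can run again
ctrl_chain = cluster_run(ctrl_chain,0);   % resubmit in non-blocking mode
```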
# cluster_run

```
ctrl_chain = cluster_run(ctrl_chain, cluster_run_mode)
```

Runs a list of chains (each chain is a list of batches that must be run in serial) or a batch. This function has blocking/polling and non-blocking modes. If the ctrl_chain has been run before, but the ctrl_chain cell array does not have the most recent state information in it, use cluster_run_mode 2 (non-blocking) or 3 (blocking) to tell cluster_run to first query the state of every batch before running it. This query is very slow for large batches, so it is much more efficient to keep track of the ctrl_chain with all the state information intact. If you have done this, then use cluster_run_mode 0 (non-blocking) or 1 (blocking) to skip the query step.
A typical example of where cluster_run_mode is required would be if cluster_run …
Typically, if you are just polling the status of your jobs rather than blocking until they are all done, you would run:
```
ctrl_chain = cluster_load_chain(CHAIN_NUMBER); % CHAIN_NUMBER should be your most recent saved version of this chain
ctrl_chain = cluster_run(ctrl_chain,0);
cluster_save_chain(ctrl_chain); % Note the CHAIN_NUMBER since this will be what you load the next time you poll.
```
Then you would repeat these commands each time you want to poll the status of the job chains.
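If instead you want to block until every task in the chain completes, the same sequence can be used with cluster_run_mode 1:

```
ctrl_chain = cluster_load_chain(CHAIN_NUMBER);  % CHAIN_NUMBER from the last save
ctrl_chain = cluster_run(ctrl_chain,1);         % blocking mode: returns when all tasks are done
cluster_save_chain(ctrl_chain);
```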
# cluster_save_chain

```
[chain_fn,chain_id] = cluster_save_chain(ctrl_chain)
```

Saves a batch chain to a chain file. chain_id is a unique positive integer. This function also prints out all the batch files and directories that are required to run the chain. This is useful if the batch files need to be generated on one computer system and then moved to another computer system to be run.
# cluster_save_dparam

```
ctrl = cluster_save_dparam(ctrl);
```

Saves the dparam structure to the dynamic inputs file. This is needed when the dparam_save option is set to 0 when cluster_new_task is called.
# cluster_set_chain

```
ctrl_chain = cluster_set_chain(ctrl_chain,varargin)
```

Sets a parameter in every batch in a control chain. Note that some parameters, notably cluster.type, require additional manual functions to be run if changed with cluster_set_chain. If cluster.type is set to matlab, then "ctrl.cluster.jm = parcluster" may need to be run manually for each batch. If cluster.type is set to torque or slurm, then cluster_compile may need to be run manually if there have been code changes since the last compile.
For example, to change the type to debug mode:
```
ctrl_chain = cluster_set_chain(ctrl_chain,'cluster.type','debug')
```
# cluster_stop

```
cluster_stop(ctrl_chain,mode)
```

Stops/cancels/deletes jobs that are running in a cluster for the batches identified in ctrl_chain. The batches are also put on hold. To specify a batch or batches, the second argument can be set to 'batch' (default is 'chain'). Only useful for the 'torque', 'slurm', and 'matlab' modes of operation.
# cluster_submit_batch

```
ctrl = cluster_submit_batch(fun,block,argsin,num_argsout,cpu_time,mem)
```

Convenience function which creates a batch, adds a task to it, runs the batch, and then cleans up the batch. This can be used as a simple example of how to use the cluster interface.
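Following the argument list above, a one-task batch might be submitted like this (my_fun is a hypothetical user function; the resource values are arbitrary):

```
% Run my_fun(3) on the cluster, block until it finishes, expect one output
% argument, and request 3600 s of cpu time and 4e9 bytes of memory.
ctrl = cluster_submit_batch(@my_fun,true,{3},1,3600,4e9);
```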
# cluster_submit_job

```
ctrl = cluster_submit_job(ctrl,job_tasks,job_cpu,job_mem)
```

Submits a job to the cluster. For the debug type cluster, this function also executes the tasks in the job.
# cluster_update_batch

```
ctrl = cluster_task_status(ctrl,force_success_check)
```

Updates the status of the batch.
Another Matlab session is accessing the state of a batch and the cluster specifi…
If a job is completed, the error flags for the tasks executed in that job will be updated in the ctrl structure depending on the success condition and the contents of the task's out file.
# cluster_update_task

```
ctrl = cluster_update_task(ctrl,task_id)
```

Updates the status of the task.
A task's "error_mask" uses a binary mask to separate different types of errors. The Matlab commands bitor, bitand, and dec2bin are useful for interpreting this mask.
Task error_mask
|Error bit|Error decimal|Description|
|---------|-------------|-----------|
|b0000 0000 0000 0001|1|Output file out_TASKID does not exist|
|b0000 0000 0000 0010|2|Output file out_TASKID does not load properly|
|b0000 0000 0000 0100|4|argsout variable does not exist in output file out_TASKID|
|b0000 0000 0000 1000|8|argsout variable has wrong length|
|b0000 0000 0001 0000|16|errorstruct variable does not exist in output file out_TASKID|
|b0000 0000 0010 0000|32|errorstruct variable contains error|
|b0000 0000 0100 0000|64|success criteria error|
|b0000 0000 1000 0000|128|cluster killed error (torque only)|
|b0000 0001 0000 0000|256|wall time exceeded (torque only). This error may be fixed by adjusting the scripts which estimate the cpu time usage, or by increasing ctrl.cluster.cpu_time_mult.|
|b0000 0010 0000 0000|512|Task success criteria failed to evaluate|
|b0000 0100 0000 0000|1024|Output files that are used to check task success do not exist.|
|b0000 1000 0000 0000|2048|Output files that are used to check task success are corrupt and cannot be read.|
|b0001 0000 0000 0000|4096|Maximum memory exceeded error (job used more memory than was requested). This error can be fixed by adjusting the scripts which estimate the memory usage, increasing ctrl.cluster.mem_mult, or by setting ctrl.cluster.max_mem_mode to 'auto'.|
|b0010 0000 0000 0000|8192|Insufficient space for the Matlab Compiler Runtime (MCR) cache so the job could not be started. The primary and secondary caches are set by environment variables MCR_CACHE_ROOT and MCR_CACHE_ROOT2.|
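The bit operations mentioned above can decompose a mask value by hand; 96 is an arbitrary illustration:

```
error_mask = 96;               % 96 = 64 + 32
dec2bin(error_mask,16)         % shows which bits of the mask are set
bitand(error_mask,64) ~= 0     % true: success criteria error
bitand(error_mask,32) ~= 0     % true: errorstruct variable contains error
```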
# Cluster Settings
User settings used by cluster programs:
|Cluster Setting|Default|Description|
|---------------|-------|-----------|
|cluster.cluster_job_fn|fullfile(gRadar.path,'cluster','cluster_job.sh')|Sets the location of the cluster_job.sh bash program.|
|cluster.cpu_time_mult|1|Multiplication factor that will be applied to the supplied cpu time required for a task.|
|cluster.data_location|gRadar.cluster.data_location|Sets the location of the batch directories.|
|cluster.dbstop_if_error|true|Turns on "dbstop if error" when running tasks and using cluster.type debug.|
|cluster.desired_time_per_job|0|Sets the desired time per job. This controls how tasks will be divided into jobs. Setting to zero means one task per job.|
|cluster.file_check_pause|4|Sets the number of seconds to wait for the output file when it is expected.|
|cluster.file_version|'-v7'|Sets the version of the static (static.mat) and dynamic (dynamic.mat) input files and of the output files (out_TASKID.mat) used by the cluster. NOTE: -v6 does not support unicode characters. -v7 supports unicode characters. -v7.3 is HDF, but is generally not as efficient for this task.|
|cluster.force_compile|false|If true, cluster_compile will always recompile even if no files have changed.|
|cluster.hidden_depend_funs|{}|Cell array of hidden dependency functions needed by cluster_compile.|
|cluster.job_complete_pause|{}|For Matlab compiled jobs, this is the pause in seconds after the job completes. Useful on large file systems/clusters that may have long delay times for output files to show up on all computers.|
|cluster.matlab_mcr_path|matlabroot|Sets the location of the Matlab Compiler Runtime library installation. If you have Matlab installed, this is just your Matlab installation directory. Only used for torque and slurm.|
|cluster.max_jobs_active|1|Sets the maximum number of active (queued or running) jobs.|
|cluster.max_mem_mode|'debug'|String that controls how cluster_update_task.m will behave when the max memory is exceeded. Options are 'debug', which drops the session into debug mode when memory is exceeded, and 'auto', which doubles the memory request and resubmits the job.|
|cluster.max_retries|1|Sets the maximum number of retries per task before it gives up running that task.|
|cluster.max_time_per_job|86400|Sets the maximum time per job in seconds.|
|cluster.mcc|'system'|If set to 'system', mcc is run from the command line using the system() function. This is preferable because the Matlab Compiler license gets released when finished. However, mcc from the command line may not work. If so, set to 'eval' and mcc will be run from within Matlab. The downside is that the license will not be released until this Matlab session ends.|
|cluster.mcr_cache_root|'/tmp'|Sets the location of the Matlab Compiler Runtime temporary folder. Only used for torque and slurm.|
|cluster.mem_mult|1|Multiplication factor that will be applied to the supplied memory required for a task.|
|cluster.mem_to_ppn|[]|If not empty, this causes memory requirements to be converted into processor requirements. This is useful when Torque ignores the memory requirement. For example, with 46 processors per node and 120e9 bytes of memory per node available to the cluster, one would set cluster.mem_to_ppn = 120e9/46. The max_ppn parameter must be set as well.|
|cluster.max_ppn|[]|To be used when mem_to_ppn is set. This should usually be set equal to the number of cores on a processor. The valid range is 1 to the number of cores on a processor. Since there is no way with torque to ensure that a task requesting multiple nodes will have all the nodes on one machine, we are restricted to requesting one node. We cannot request more cores than this node has or the job will never run.|
|cluster.qsub_submit_arguments|'-m n -l nodes=1:ppn=%p,pmem=%m,walltime=%t'|Submit argument string for torque. Note memory and time requirements are inserted with regexprep inside cluster_submit_job.m.|
|cluster.ssh_hostname| |Optional cluster hostname. If empty or undefined, cluster commands are run on the local machine. If specified and not empty, cluster commands will be submitted using "ssh -p %d -o LogLevel=QUIET -t %s@%s COMMAND" where port=cluster.ssh_port, username=cluster.ssh_username, and hostname=cluster.ssh_hostname. This functionality requires that the user's login does not print any text to the screen because this will confuse the cluster software's interpretation of the output (i.e. this remote server implementation is not robust). Commands that print to the screen should have " &>/dev/null" and/or " >/dev/null" added to the end of each of them. Occasionally, the "ssh" command hangs, which causes the matlab process to hang. Run "ps -ef \| grep USERNAME \| grep ssh" several times over the course of one minute from the terminal; if you see the same command show up in the list every time, the command has probably hung and should be killed. To kill the process, run "kill PROCESS_ID". For example, "jpaden 9814 47277 0 01:00 pts/21 00:00:00 ssh -p 22 -o LogLevel=QUIET -t jpaden@karst.uits.iu.edu qstat -u jpaden </dev/null" is killed by "kill 9814".|
|cluster.ssh_hostname || Optional cluster hostname. If empty or undefined, then cluster commands are run on the local machine. If specified and not empty, cluster commands will be submitted using "ssh -p %d -o LogLevel=QUIET -t %s@%s "COMMAND" where the port=cluster.ssh_port, username=cluster.ssh_username and hostname=cluster.ssh_hostname. This functionality requires that the user's login does not print any text to the screen because this will confuse the cluster software's interpretation of the output (i.e. this remote server implementation is not robust). Commands that print to the screen should have " &>/dev/null" and/or " >/dev/null" added to the end of each of them. Occasionally, the "ssh" command hangs which causes the matlab process to hang. Run "ps -ef | grep USERNAME | grep ssh" several times over the course of one minute from the terminal and if you see the same command show up in the list every time, then the command has probably hung and should be killed. To kill the process, run "kill PROCESS_ID". For example, "jpaden 9814 47277 0 01:00 pts/21 00:00:00 ssh -p 22 -o LogLevel=QUIET -t jpaden@karst.uits.iu.edu qstat -u jpaden </dev/null" is killed by "kill 9814".
Also, no password should be required, so it is likely that a key should be setup. Reminder steps: Also, no password should be required, so it is likely that a key should be setup. Reminder steps:
ssh-keygen -t rsa ssh-keygen -t rsa
ssh username@hostname mkdir -p .ssh ssh username@hostname mkdir -p .ssh
cat .ssh/id_rsa.pub | ssh username@hostname 'cat >> .ssh/authorized_keys' cat .ssh/id_rsa.pub ssh username@hostname 'cat >> .ssh/authorized_keys'|
cluster.ssh_port 22 If ssh_hostname is specified, this ssh_port will be used. |cluster.ssh_port| 22 |If ssh_hostname is specified, this ssh_port will be used.|
cluster.ssh_username whoami If ssh_hostname is specified, this ssh_username will be used. The default username is the response from the whoami command on the local machine. |cluster.ssh_username| whoami| If ssh_hostname is specified, this ssh_username will be used. The default username is the response from the whoami command on the local machine.|
cluster.slurm_submit_arguments '-N 1 -n 1 --mem=%m --time=%t' Submit argument string for slurm. Note memory and time requirements are inserted with regexprep inside cluster_submit_job.m. QOS and partition requests can be added here. For example "-N 1 -n 1 --mem=%m --time=%t -p fat --qos=short" would submit to the "fat" partition and set the quality of service to "short". |cluster.slurm_submit_arguments| '-N 1 -n 1 --mem=%m --time=%t' |Submit argument string for slurm. Note memory and time requirements are inserted with regexprep inside cluster_submit_job.m. QOS and partition requests can be added here. For example "-N 1 -n 1 --mem=%m --time=%t -p fat --qos=short" would submit to the "fat" partition and set the quality of service to "short".|
cluster.stat_pause 1 Sets the number of seconds to pause between each status check. |cluster.stat_pause |1| Sets the number of seconds to pause between each status check.|
cluster.stop_on_error 1 If a Matlab error occurs in the execution of a task, then the submission script goes into debug mode in cluster_update_task.m. |cluster.stop_on_error| 1| If a Matlab error occurs in the execution of a task, then the submission script goes into debug mode in cluster_update_task.m.|
cluster.submit_pause 0 Sets the number of seconds to pause between each submission. |cluster.submit_pause| 0| Sets the number of seconds to pause between each submission.|
cluster.type 'debug' Sets the cluster type to run jobs (torque, matlab, slurm, debug). |cluster.type| 'debug' |Sets the cluster type to run jobs (torque, matlab, slurm, debug).|
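As an illustration of how the mem_to_ppn and max_ppn settings interact, here is a sketch in Python (not the toolbox's MATLAB code; the exact rounding and clamping behavior are assumptions):

```python
import math

def mem_to_ppn_request(task_mem_bytes, mem_to_ppn, max_ppn):
    """Sketch of converting a task's memory requirement into a processor
    (ppn) request, per the cluster.mem_to_ppn description above: with 46
    cores and 120e9 bytes per node, mem_to_ppn = 120e9/46, so a task's
    memory maps proportionally onto cores. The ceil and the clamp to
    [1, max_ppn] are illustrative assumptions."""
    ppn = math.ceil(task_mem_bytes / mem_to_ppn)
    return max(1, min(ppn, max_ppn))

mem_per_core = 120e9 / 46  # example values from the table above
print(mem_to_ppn_request(50e9, mem_per_core, 46))   # ~40% of node memory -> 20 cores
print(mem_to_ppn_request(500e9, mem_per_core, 46))  # clamped to one full node -> 46
```

Because only one node can be requested, any memory demand above a full node is clamped to max_ppn rather than spilling onto a second node.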
User settings used by user programs:
|Cluster Setting| Default| Description|
|------------------|-------------|-----------------|
|cluster.rerun_only| false| This is a property used by the user. Typically it means a task will not be created if its output already exists.|
Settings that should not be modified by the user. "BATCH" in the filenames below represents the directory name where the batch files are stored. BATCH is formatted like "batch_BB_tp20f818ed_5a79_4002_b41a_bf91ab030392", where "BB" represents the batch number (a positive integer starting at 1) and "tp20f818ed_5a79_4002_b41a_bf91ab030392" is a random string generated by the Matlab tempname.m command.
|Cluster Setting| Default| Description|
|--------------|---------------|-------------------|
|batch_dir| ctrl.cluster.data_location| Directory where batch files are stored.|
|job_id_fn| BATCH/job_id_file| File with all the job IDs for each task. One line per task, 20 characters. -1 means the task is waiting to be submitted.|
|batch_id| Lowest positive ID that is not used| An integer containing the batch ID for this batch.|
|in_fn_dir| BATCH/in| Inputs for each task: static.mat contains inputs that do not change; dynamic.mat contains inputs that do change, in variables called dparam_TASKID.|
|out_fn_dir| BATCH/out| Outputs and errors for each task: out_TASKID.mat file.|
|stdout_fn_dir| BATCH/stdout| Standard output for each task (slurm and torque only). Stored in stdout_TASKID.txt files; only one task per job (the one with the maximum task ID) will have the file.|
|error_fn_dir| BATCH/error| Standard error for each task (slurm and torque only). Stored in error_TASKID.txt files; only one task per job (the one with the maximum task ID) will have the file.|
|hold_fn| BATCH/hold_fn| Empty file that, if present, causes cluster_run to enter debug mode. Placed by cluster_hold.m.|
|job_id_list| NA| Cached contents of job_id_fn.|
|task_id| NA| Last task ID used.|
|submission_queue| NA| Vector representing the queue of task IDs waiting to be submitted. Index 1 is the front of the queue.|
|job_status| NA| String containing the job status for each task. 'T': waiting to be submitted, 'Q'/'R': active, 'C': complete.|
|error_mask| NA| Vector containing the error status for each task. 0 is no error.|
|retries| NA| Vector containing the number of retries used for each task.|
|active_jobs| NA| Number of active jobs.|
|notes| NA| Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a cell array; each cell contains the value of this field for the corresponding task (e.g. the first cell contains the "notes" for task_id 1).|
|cpu_time| NA| Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a vector; each element contains the value of this field for the corresponding task (e.g. the first element contains the "cpu_time" for task_id 1).|
|mem| NA| Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a vector; each element contains the value of this field for the corresponding task (e.g. the first element contains the "mem" for task_id 1).|
|success| NA| Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a cell array; each cell contains the value of this field for the corresponding task (e.g. the first cell contains the "success" for task_id 1).|
|file_success| NA| Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a cell array; each cell contains the value of this field for the corresponding task (e.g. the first cell contains the "file_success" for task_id 1).|
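For example, the job_status string described above can be summarized per the 'T'/'Q'/'R'/'C' convention (a Python sketch; the function name is illustrative and not part of the cluster toolbox):

```python
def summarize_job_status(job_status):
    """Count tasks by state in a ctrl.job_status string, where each
    character is one task: 'T' = waiting to be submitted, 'Q' or 'R' =
    active, 'C' = complete (per the table above)."""
    return {
        'waiting': job_status.count('T'),
        'active': sum(job_status.count(c) for c in 'QR'),
        'complete': job_status.count('C'),
    }

print(summarize_job_status('TTQRC'))  # {'waiting': 2, 'active': 2, 'complete': 1}
```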
The settings for each task are stored in the "in" directory within each batch directory. These are stored in the static.mat and dynamic.mat files in the "in" directory. The parameter structure for a given task is the result of calling merge_structs.m on static_param (stored in static.mat) and dparam{task_id} (stored in dynamic.mat). The combined result should contain all the input parameters for the task.
NOTE: static fields that do not change for each task should be stored in "static_param" to save space on disk. It is the job of the script that is creating the tasks to determine where to store each parameter (in sparam or in dparam).
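The merging described above can be sketched as follows (in Python, purely for illustration; the real routine is merge_structs.m in MATLAB, and dicts here stand in for structs):

```python
def merge_params(static_param, dparam_task):
    """Sketch of how a task's final parameters are formed: start from
    the shared static_param and let the task-specific dparam entry
    override field by field (dparam{TASKID} overrides static_param when
    a field is defined in both). Nested structs are not handled here."""
    merged = dict(static_param)
    merged.update(dparam_task or {})
    return merged

static_param = {'taskfunction': 'my_task', 'cpu_time': 60, 'notes': ''}
dparam = {1: {'notes': 'frame 1', 'cpu_time': 120}}
# cpu_time and notes come from dparam{1}; taskfunction stays static
print(merge_params(static_param, dparam[1]))
```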
# Input Files static.mat and dynamic.mat
The input files (in/static.mat and in/dynamic.mat) are created by cluster_new_task. Their contents are set by the sparam and dparam input arguments to cluster_new_task. Usually, the function call looks like this:
```
ctrl = cluster_new_task(ctrl,sparam,dparam,'dparam_save',0); % Creating multiple tasks in a batch
ctrl = cluster_new_task(ctrl,sparam,[]); % Creating a single task in a batch
```
The sparam input argument maps to the in/static.mat "static_param" variable.
The dparam input argument for task TASK_ID maps to the in/dynamic.mat "dparam{TASK_ID}" variable.
The contents of the in/static.mat and in/dynamic.mat files are very similar and will be merged when each task executes. The fields in the in/static.mat input file are:
|Cluster Setting| Default| Description|
|---------------|--------|-----------|
|static_param| NA| Structure with static parameters. These are parameters that do not change from task to task within one batch.|
|static_param.cpu_time| 0| The CPU time in seconds to be requested for this task.|
|static_param.file_success| {}| Used by cluster_update_task.m to determine if a task has completed successfully. Each task's file_success field is a cell array of files; each must contain a file_version field without a 'D' in the file_version to be considered successfully created. If all of the files in the cell array exist and meet this condition, then the task is considered to have run successfully.|
|static_param.file_version| ctrl.cluster.file_version| The version argument to the save function in Matlab. It should be set to '-v7.3' and is set to the value stored in ctrl.cluster.file_version when cluster_new_task is called.|
|static_param.mem| 0| The memory in bytes to be requested for this task.|
|static_param.notes| ''| The notes field is used for debugging and is printed out by cluster_print and other cluster functions to provide information to the operator about the tasks.|
|static_param.num_args_out| 0| The number of output arguments to expect for this task.|
|static_param.success| ''| Success criteria evaluation string. Used by cluster_update_task.m to determine if the task was successful. If the string evaluates to logical true, then the task is successful.|
|static_param.taskfunction| NA| String containing the function to be run.|

The fields in the in/dynamic.mat file are:
|Cluster Setting| Description|
|---------------|------------------|
|dparam| Cell array. Each cell contains a structure with the dynamic parameters for the corresponding task. The fields inside each cell are the same as for the in/static.mat file's static_param structure.|
|dparam{TASKID}| Structure with the same fields as the in/static.mat file's static_param structure. These fields do not need to be defined in both in/dynamic.mat and in/static.mat, but each must be defined in one or the other. If a field is defined in both files, dparam{TASKID} overrides the static parameters.|
|dparam{TASKID}.file_success| See in/static.mat description.|
|dparam{TASKID}.file_version| See in/static.mat description.|
|dparam{TASKID}.mem| See in/static.mat description.|
|dparam{TASKID}.notes| See in/static.mat description.|
|dparam{TASKID}.num_args_out| See in/static.mat description.|
|dparam{TASKID}.success| See in/static.mat description.|
|dparam{TASKID}.taskfunction| See in/static.mat description.|
# Output Files out_TASKID.mat
The output files (out/out_TASKID.mat) are created by cluster_job.m in regular operation and by cluster_exec_task.m in the debug run_mode 1. The fields in the out/out_TASKID.mat output files are:
|Cluster Setting| Description|
|------------------------|----------|
|argsout| Cell array containing all of the output arguments of the task function. The number of output arguments must be specified in the input files' num_args_out field.|
|cpu_time_actual| Double scalar containing the number of seconds it took to execute the task.|
|errorstruct| If no error occurs, this field is empty. If an error is thrown, the Matlab exception is contained in this field.|
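A minimal sketch of checking these output fields (in Python, with a dict standing in for the loaded .mat structure); note that cluster_update_task.m also applies the success and file_success criteria, so this is illustrative only:

```python
def check_task_output(out, num_args_out):
    """Check an out_TASKID.mat-style structure per the table above: the
    task failed if errorstruct is non-empty, and argsout should hold
    exactly num_args_out output arguments. Field names follow the table;
    the function itself is a hypothetical helper, not toolbox code."""
    if out.get('errorstruct'):
        return False
    return len(out.get('argsout', [])) == num_args_out

ok = {'argsout': [1, 2], 'cpu_time_actual': 3.5, 'errorstruct': None}
bad = {'argsout': [], 'cpu_time_actual': 0.1, 'errorstruct': {'message': 'failed'}}
print(check_task_output(ok, 2))   # True
print(check_task_output(bad, 2))  # False
```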
# Torque/PBS
## Common Commands
**Always available**
**qstat**
List all jobs
**qstat -q -u USERNAME and qstat -r -u USERNAME**
List all queued or running jobs owned by USERNAME
**qdel 'all'**
Deletes all jobs that you control (normally just your jobs unless administrator).
**pbsnodes -l**
Prints out node reservations
**Only available on PBS/torque**
**showstart JOBID**
Shows scheduled time for job.
**checkjob JOBID**
Prints out complete information about the job including scheduled time
## Other Commands
**qalter**
Change the hold state of a job
**qdel JOBID**
Deletes a specific job
**qsub**
Submit a job
## Interactive
Interactive mode lets you log in to the cluster node with the environment set up the same way it is when the job actually runs. This is helpful for debugging problems that have to do with running your code on the cluster nodes themselves.
To run interactively, set cluster.interactive to true and then run your batch as you normally would. For each job, the torque_submit_job function will print out the steps to run the job interactively. It is not critical, but to keep the torque submission routines in sync you need to assign the new_job_id as one of these steps. Look for the output from qsub like this:
```
qsub: waiting for job 2466505.m2 to start
... INTERACTIVE SESSION ...
qsub: job 2466505.m2 completed
```
In this case, run "new_job_id = 2466505" in Matlab. m2 is the queue and is not part of the job ID.
When debugging at IU, it is a good idea to add '-q debug' (short jobs) or '-q interactive' (long jobs) to your sched.submit_arguments because your job will get processed much faster than in the default queue.
# Slurm
## Common Commands
**scontrol show job JOBID**
List complete information for JOBID
**sinfo**
List all the compute nodes and their state
**squeue --job JOBID**
Show status information for JOBID
**sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j JOBID --allsteps**
List statistics about JOBID
## Other Commands
**sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed**
List information about all jobs
**sacctmgr list qos**
List information about the cluster, such as the maximum number of jobs that can be active/submitted at once, maximum cores, maximum CPU time, etc.
**sbatch**
Submit a job to the cluster to be run in the background
**scancel JOBID**
Cancel or delete JOBID
**scontrol show jobid -dd <jobid>**
List status information for a currently running job
**scontrol hold job_id**
Hold a job so that it does not run
**squeue -u <username>**
List all jobs
**srun**
Run a job on the cluster in interactive mode. For example: srun -N 1 -n 1 --mem=20000 --time=480 --pty bash
**sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps**
List detailed information about a job
On ollie at AWI some sacct commands do not run, but these commands can be used instead:
**sudo get_my_jobs.sh**
Get information about today's jobs
**sudo get_my_jobs.sh -t**
Get information about running jobs
# Matlab Compiler
The Matlab compiler is only used with the Torque and Slurm cluster interfaces. Also, it is only required when the code changes. The compiled code is cross platform (i.e. it is possible to compile on a Windows computer and then run the compiled code on a Linux computer). Once cluster_job.m is compiled, it can be used without requiring any Matlab license. For the compiler to work, the Matlab Compiler Runtime libraries (freely distributable) must be installed. If Matlab is installed on a machine, there is a copy of the MCR files (for that version of Matlab) already available and the Matlab installation directory can be used for the MCR path, as long as the same version of Matlab was used to compile the files. Inside cluster_new_batch, the matlabroot command is used to set ctrl.cluster.matlab_mcr_path.
If Matlab is not installed on the cluster, or the compiled version is different than the Matlab version installed, then gRadar.cluster.matlab_mcr_path should be set to the correct MCR path. Instructions to install the MCR libraries are available on Matlab's website (Matlab Runtime).
**Currently, Matlab is required to submit and track jobs using Slurm and Torque even if the compiler is not needed.** A feature needs to be added to support secure shell commands that would allow submission on cluster interfaces (i.e. head nodes) without a Matlab license. The task generation commands should also be compiled so that the dependent files (frames, records, gps, raw data, layer data, etc.) only need to be accessible from the cluster. A useful setup would be a generic single-job submission function that can run arbitrary functions through ssh; it would 1. compile the function, 2. copy the compiled function, inputs, and outputs via scp, and 3. call the compiled function via ssh. This would be useful if a user has a personal Matlab license but the cluster interface does not have a license and the personal computer does not have direct access to the cluster's file system. The Matlab Runtime/MCR libraries would still need to be installed on the cluster to run the compiled code.
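The three-step compile/copy/run workflow sketched above could look like the following command construction (a Python sketch; every name, path, and file here is hypothetical, and the run_FN.sh wrapper convention assumes a Linux mcc build):

```python
def remote_exec_commands(fn, run_dir, user, host, mcr_path):
    """Build the shell commands for the hypothetical ssh workflow above:
    1. compile fn with mcc, 2. copy the compiled function and inputs via
    scp, 3. run it over ssh with the MCR path. Illustrative only; this
    helper is not part of the cluster toolbox."""
    remote = f"{user}@{host}"
    return [
        f"mcc -m {fn}.m -d build",                    # 1. compile the function
        f"scp build/{fn} in.mat {remote}:{run_dir}/",  # 2. copy binary and inputs
        f"ssh {remote} 'cd {run_dir} && ./run_{fn}.sh {mcr_path} in.mat'",  # 3. run remotely
    ]

for cmd in remote_exec_commands('cluster_job', '/scratch/run1', 'user',
                                'cluster.example.edu', '/opt/mcr/v96'):
    print(cmd)
```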