There are three ways to monitor the overall status of the HiPAS cluster. Text Mode is the quickest.
If you do not have a DISPLAY redirected to your local machine, you can monitor the cluster in text mode by typing beostatus -c. You will see a screen very similar to top that automatically updates with information about the nodes.
If you do have a display redirected, you can use the graphical monitoring tool instead. Start it by typing beostatus. After a short delay, a color GUI appears that you can use to monitor various aspects of the cluster. You can change the style of the graphs from the Mode menu. To exit the program, either close the window or choose the exit entry from the File menu.
The cluster status can also be viewed online on the HiPAS status web page, which refreshes automatically every 5 seconds. Note that only node status is reported there; the output is identical to running beostatus from the command line. Job and queue status are coming soon.
Monitoring Job Status
Check the status of your job using
qstat. Here's an example with output:
    [bjosh@hipas]$ qsub myjob && watch qstat -n

    master:
                                                                    Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    15.hipas        bjosh    default  myjob          --   1  --     -- 00:01 Q    --
       --
The watch command re-runs qstat -n every 2 seconds by default, so you can follow the progress of your job. Press Control-C to stop watching.
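If you would rather wait for a job inside a script than watch it interactively, a polling loop can stand in for watch qstat. The sketch below is an illustration, not a HiPAS-provided tool: the names job_state, wait_for_job and POLL_SECONDS are hypothetical, and it assumes that qstat jobid prints Torque's default layout (two header lines, then a job row whose fifth field is the state letter) and exits with an error once the server no longer knows the job.

```shell
# Sketch only: job_state, wait_for_job and POLL_SECONDS are hypothetical
# names, not HiPAS or Torque commands.

# Print the state letter (Q, R, C, ...) of job $1, or nothing if qstat
# no longer knows the job. Assumes Torque's default qstat layout: two
# header lines followed by the job row, whose 5th field is the state.
job_state() {
  local out
  out=$(qstat "$1" 2>/dev/null) || return 0
  printf '%s\n' "$out" | awk 'NR == 3 { print $5 }'
}

# Poll every POLL_SECONDS seconds (default 10) until the job has
# completed (state C) or has disappeared from qstat.
wait_for_job() {
  local jobid=$1
  local interval=${POLL_SECONDS:-10}
  local state
  while true; do
    state=$(job_state "$jobid")
    if [ -z "$state" ] || [ "$state" = "C" ]; then
      echo "job $jobid finished"
      return 0
    fi
    sleep "$interval"
  done
}
```

A possible use is jobid=$(qsub myjob) && wait_for_job "$jobid", which works because qsub prints the new job ID on standard output. Polling every few seconds rather than continuously keeps the load on the Torque server low.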
Some Helpful Commands
||Display all running jobs, with node number for each.|
||Display status of all queues.|
||Display status of queued jobs.|
||Display very detailed information about JOBID.|
||Display status of all queues in more detail.|
||Display status of all nodes.|
How to Find Which Nodes Your Job is Using
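1. Note your job ID(s) from the qstat output.
2. Run qstat -f jobid and note the process ID(s) of your job(s), reported as session_id.
3. Run ps -ef | bpstat -P | grep yourname, replacing yourname with your username. The number of the node running each of your processes is displayed in the first column of the output.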
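Pulling the session ID out of qstat -f output can also be done in a script. This is a minimal sketch: the function name session_id_from_qstat_f is hypothetical, and it assumes qstat -f prints attributes as indented "name = value" lines, as Torque normally does. A job that has not started yet has no session ID, so in that case the function prints nothing.

```shell
# Sketch only: session_id_from_qstat_f is a hypothetical helper.
# Reads `qstat -f` output on stdin and prints the value of the
# session_id attribute (the process ID of the job's top-level shell).
# Assumes attributes appear as indented "name = value" lines; prints
# nothing for a job that has not started and so has no session ID.
session_id_from_qstat_f() {
  awk -F' = ' '$1 ~ /^[ \t]*session_id$/ { print $2; exit }'
}
```

For example, pid=$(qstat -f 15.hipas | session_id_from_qstat_f) followed by ps -ef | bpstat -P | grep -w "$pid" should narrow the listing to that process and its direct children.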
Where To Find Job Output
When your job terminates, Torque stores its output and error streams in files in the directory from which you submitted the job.
The output file is named [JOBNAME].o[JOBID] by default, where JOBID is the numeric part of the job ID (for example, myjob.o15). You can override this with the qsub -o PATH option.
The error file is named [JOBNAME].e[JOBID] by default. You can override this with the qsub -e PATH option.
The qsub -j oe option joins the output and error streams into a single file, the output file.
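The default naming rule can be written as a small helper, which may be handy when a script needs to locate a job's output. This is a sketch: default_output_files is a hypothetical name, not a Torque command, and it assumes, as Torque does, that only the sequence number before the first dot of the job ID appears in the file names.

```shell
# Sketch only: default_output_files is a hypothetical helper, not a
# Torque command. Given a job name and a job ID such as 15.hipas, it
# prints the default output and error file names. Assumes, as Torque
# does, that only the sequence number before the first dot is used.
default_output_files() {
  local name=$1 jobid=$2
  local seq=${jobid%%.*}   # 15.hipas -> 15
  printf '%s.o%s %s.e%s\n' "$name" "$seq" "$name" "$seq"
}
```

With this helper, default_output_files myjob 15.hipas prints myjob.o15 myjob.e15. To choose the file names yourself, a command such as qsub -o $HOME/myjob.out -e $HOME/myjob.err myjob would work, while qsub -j oe myjob sends both streams to the output file.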