There are three ways to monitor the overall status of the HiPAS cluster. Text Mode is the quickest.
If you do not have a DISPLAY redirected to your local machine, you can monitor the cluster by typing beostatus -c. You will see a screen very similar to top that automatically updates with information about the nodes.
If you do have a display redirected, you can use the graphical monitoring tool. You can access it by typing beostatus. After a short delay you will have a color GUI that you can use to monitor various aspects of the cluster. You can change the style of the graphs by clicking on the Mode menu. You can exit the program either by closing the window or by clicking on the File menu and selecting Quit.
To view the cluster status online, click here. The page refreshes automatically every 5 seconds. Note that only the node status is reported here; the output is identical to running beostatus from the command line. Job and queue status coming soon!
Check the status of your job using qstat. Here's an example with output:

    [bjosh@hipas]$ qsub myjob && watch qstat -n
    master:
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    15.hipas        bjosh    default  myjob          --   1  --     -- 00:01 Q    --
       --

The watch command is used to execute the qstat -n command every 2 seconds by default. This will help you see the progression of events. Press Control-C to interrupt watch.
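For unattended use, the same check can be scripted instead of watched. The following is a minimal sketch, not part of the HiPAS tooling: job_state and wait_for_job are hypothetical helper names, and the sketch assumes that the state letter sits in the tenth column, as in the listing above, and that a finished job either disappears from the listing or is shown with state C.

```shell
#!/bin/sh
# Print the state letter (Q, R, C, ...) of one job, reading
# "qstat -a"-style output on stdin. Assumption: the "S" column is
# the 10th whitespace-separated field, as in the listing above.
job_state() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Hypothetical helper: poll qstat until the job is gone from the
# listing or is reported as completed (state C).
# Usage: wait_for_job JOBID [SECONDS_BETWEEN_POLLS]
wait_for_job() {
    jobid=$1
    interval=${2:-10}
    while :; do
        state=$(qstat -a | job_state "$jobid")
        case "$state" in
            ""|C) break ;;
        esac
        echo "job $jobid state: $state"
        sleep "$interval"
    done
    echo "job $jobid is no longer queued or running"
}
```

For example, wait_for_job 15.hipas 30 would poll every 30 seconds. The job ID must be given exactly as it appears in the first column of the qstat output, since the match is on the whole field.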
Some Helpful commands
| Command | Purpose |
|---|---|
| ps -ef \| bpstat -P | Display all running jobs, with node number for each. |
| qstat -Q | Display status of all queues. |
| qstat -n | Display status of queued jobs. |
| qstat -f JOBID | Display very detailed information about JOBID. |
| qstat -Q -f | Display status of all queues in more detail. |
| pbsnodes -a | Display status of all nodes. |
To find out which node is running your job, start by listing all jobs:
qstat -an
Note your job ID(s).
qstat -f JOBID
Note the process ID(s) of your job(s).
ps -ef | bpstat -P | grep yourname
The number of the node running your job is displayed in the first column of the output.
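The last step can be made a little stricter than a plain grep, which also matches your user name when it appears in another user's command line. The sketch below is an illustration only: nodes_for_user is a hypothetical helper, and it assumes that bpstat -P puts the node number in the first column and that the user name from ps -ef follows in the second.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct node numbers that run
# processes owned by one user. Reads "ps -ef | bpstat -P" output on
# stdin. Assumption: field 1 is the node number added by bpstat -P
# and field 2 is the UID column of ps -ef.
nodes_for_user() {
    awk -v user="$1" '$2 == user { print $1 }' | sort -u
}
```

On the cluster this would be used as ps -ef | bpstat -P | nodes_for_user yourname. Lines that have no node number in the first column do not match and are skipped.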
When your job terminates, Torque stores its output and error streams in files in the script's working directory.
The output file is [JOBNAME].o[JOBID] by default. You can override that using the qsub -o PATH option.
The error file is [JOBNAME].e[JOBID] by default. You can override that using the qsub -e PATH option.
The qsub -j oe option can be used to join the output and error streams into a single file.
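As a quick illustration of the naming rules above, the sketch below computes the default file names for a job. The helper names are hypothetical. It assumes that the [JOBID] part of the file name is the numeric sequence number only, so job 15.hipas named myjob would write myjob.o15 and myjob.e15. The qsub lines in the comments use made-up paths.

```shell
#!/bin/sh
# Hypothetical helpers: print the default stdout/stderr file names
# for a job. Assumption: only the numeric part of the job ID
# (everything before the first dot) is used in the file name.
default_stdout_name() {
    printf '%s.o%s\n' "$1" "${2%%.*}"
}

default_stderr_name() {
    printf '%s.e%s\n' "$1" "${2%%.*}"
}

# Overriding the defaults at submission time (paths are examples only):
#   qsub -o results/myjob.out -e results/myjob.err myjob
# Joining both streams into the output file:
#   qsub -j oe myjob
```

For example, default_stdout_name myjob 15.hipas prints myjob.o15.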