Doing situps for MO'AB(s)

Job Wrangling

Details on a job

% checkjob <jobid>

For more detail, such as "Why isn't my job running on node X?":

% checkjob -v <jobid>

You can also do:

% mdiag -j <jobid>

Node Wrangling

Detailed node status:

% mdiag -n <nodeid>

Even more detailed node status:

% mdiag -n -v <nodeid>

Why is a node in the "Busy w/No Job" state?

See Soft Errors for how to diagnose this condition and correct it.

Node Draining and "Undraining"

Telling a node to drain:

% mnodectl compute-3-2.local -m state=Draining
state on node compute-3-2.local updated

Return it to the pool:

% mnodectl compute-3-2.local -m state=Idle
state on node compute-3-2.local updated
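When several nodes need draining at once, the two commands above can be wrapped in a small helper. This is only a sketch: the function names `set_node_state` and `drain_nodes` and the `DRY_RUN` variable are made up for illustration, and the only Moab call used is the `mnodectl <node> -m state=<state>` form shown above. With `DRY_RUN=1` the commands are printed rather than executed, so the loop can be checked before touching the cluster.

```shell
#!/bin/sh
# set_node_state NODE STATE (hypothetical helper)
# Wraps the mnodectl call shown above. Only the two states used on this
# page are accepted. With DRY_RUN=1 the command is printed, not run.
set_node_state() {
    target=$1
    state=$2
    case $state in
        Draining|Idle) ;;
        *) echo "unsupported state: $state" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "mnodectl $target -m state=$state"
    else
        mnodectl "$target" -m "state=$state"
    fi
}

# drain_nodes NODE... (hypothetical helper)
# Drains each node in turn and stops at the first failure.
drain_nodes() {
    for n in "$@"; do
        set_node_state "$n" Draining || return 1
    done
}

# Preview only: print the commands for two example nodes.
DRY_RUN=1
drain_nodes compute-3-1.local compute-3-2.local
```

Setting `DRY_RUN=0` (or leaving it unset) makes the same calls run `mnodectl` for real.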

Setting/Changing/Removing Reservations

Make a reservation for the "mops" queue on nodes compute-6-[1-4], starting at 1pm on Sep 8, 2008 and lasting 24 hours. The host expression is a regular expression, so the term "compute-6-1" on its own also matches "compute-6-10," "compute-6-11," etc. To match only single-digit node numbers, append ".local" so that the node number must be followed directly by the domain suffix. The leading "^" means that the hostname must start with "compute...":

% mrsvctl -c -a CLASS=mops -s 13:00:00_09/08/08 -d 24:00:00 -h '^compute-6-[1-4].local'

NOTE:     reservation mops.1278 created
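The matching behaviour described above can be tried out without Moab. The sketch below assumes that Moab's host expressions behave like POSIX extended regular expressions for these simple patterns, and uses `grep -E` as a stand-in. The node list and the `match_hosts` function are hypothetical.

```shell
#!/bin/sh
# Hypothetical node list, for illustration only.
hosts='compute-6-1.local
compute-6-4.local
compute-6-5.local
compute-6-10.local
compute-6-11.local'

# match_hosts PATTERN: print the hosts that match an extended regex.
match_hosts() {
    printf '%s\n' "$hosts" | grep -E -- "$1"
}

# Unanchored: also picks up compute-6-10 and compute-6-11.
echo "Unanchored 'compute-6-1':"
match_hosts 'compute-6-1'

# Anchored, with the suffix: only single-digit nodes 1 through 4.
echo "Anchored '^compute-6-[1-4].local':"
match_hosts '^compute-6-[1-4].local'
```

In a regular expression an unescaped "." matches any character, so `\.local` would be slightly stricter. With hostnames like the ones above the difference does not matter.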

Make a reservation for the "mops" queue on 16 nodes selected within compute-6-*, starting at 8am on Sep 15, 2008 and lasting 3 days:

% mrsvctl -c -a CLASS=mops -s 8:00:00_09/15/08 -d 3:00:00:00 -h 'compute-6-*' -t 16

NOTE:     reservation mops.1278 created
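The `-d` values above (24:00:00 and 3:00:00:00) appear to follow a [DD:]HH:MM:SS layout, where a fourth leading field counts days. Assuming that reading is right, a small helper can build the string from days and hours. The function name `to_moab_duration` is hypothetical and not part of Moab.

```shell
#!/bin/sh
# to_moab_duration DAYS HOURS (hypothetical helper)
# Prints a duration as DD:HH:00:00 when DAYS > 0, else HH:00:00.
# Pass plain integers without leading zeros (e.g. 8, not 08).
to_moab_duration() {
    days=$1
    hours=$2
    if [ "$days" -gt 0 ]; then
        printf '%d:%02d:00:00\n' "$days" "$hours"
    else
        printf '%d:00:00\n' "$hours"
    fi
}

to_moab_duration 0 24   # same form as the 24-hour reservation
to_moab_duration 3 0    # same form as the 3-day reservation
```

The result can be passed straight to `-d`, for example `-d "$(to_moab_duration 3 0)"`.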

Create a reservation for system downtime:

% mrsvctl -c -s 8:00:00_09/22/08 -d 9:00:00 -h ALL

NOTE:     reservation system.1279 created

Release an existing reservation:

% mrsvctl -r mops.1283

reservation mops.1283 successfully released

Move the start time of a reservation one hour later. The duration stays the same, so the entire reservation time slot moves. Note the use of "--flags=force":

% mrsvctl -m starttime+=1:00:00 --flags=force mops.2814

successfully changed starttime for rsv mops.2814

Change the duration of a reservation to three hours:

% mrsvctl -m duration=3:00:00 --flags=force mops.2814

successfully changed duration for rsv mops.2814

Examining current reservations:

% showres

Restarting MOAB

Tell MOAB to re-read its config file:

% mschedctl -R

Force MOAB to completely rebuild its state by stopping it, removing its checkpoint files (.moab.ck*), and starting it again:

% /etc/rc.d/init.d/moab stop; rm /opt/moab/.moab.ck*; /etc/rc.d/init.d/moab start
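Because the one-liner above deletes files, it can be worth previewing it first. The sketch below performs the same three steps using the paths from the command above. The function names `run` and `restart_moab_clean` and the `DRY_RUN` variable are made up for illustration. Unlike the one-liner, it stops early if the "stop" step fails, which is a deliberate change rather than something the original does.

```shell
#!/bin/sh
# run CMD...: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restart_moab_clean (hypothetical helper)
# Stop MOAB, remove its checkpoint files, start it again.
restart_moab_clean() {
    init=/etc/rc.d/init.d/moab   # init script path from the command above
    ckdir=/opt/moab              # directory holding the .moab.ck* files

    run "$init" stop || return 1

    # Remove only checkpoint files that actually exist; if the glob
    # matches nothing it stays literal and the -e test skips it.
    for f in "$ckdir"/.moab.ck*; do
        if [ -e "$f" ]; then
            run rm -- "$f"
        fi
    done

    run "$init" start
}

# Preview only: print what would be done.
DRY_RUN=1
restart_moab_clean
```

Unsetting `DRY_RUN` makes the function carry out the steps for real, so it needs the same privileges as the original command.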