Too many power outages in Baker Lab!
Reboot switches
- 8 switches support the clusters (out of 12 in the room).
Start headnodes, if not on already
- Only a few are on UPS. Those can obviously be left on.
- None should be set to auto-start on power-off.
Confirm headnodes accessible via SSH
PuTTY on Windows
Use FMPro to get connection info?! (not the right info there, though...)
Menu => List Machine / User Selections => List Group or selected criteria
- Machine function => Server HN
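A quick way to confirm the headnodes are answering before opening PuTTY sessions one by one is to probe TCP port 22 on each. This is only a rough sketch: the function names are made up for illustration, port 22 is assumed to be where sshd listens, and an open port only shows that something is listening, not that login works.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def check_headnodes(hosts, probe=ssh_port_open):
    """Map each hostname to True/False according to the probe result."""
    return {host: probe(host) for host in hosts}
```

Call check_headnodes() with the real headnode hostnames (not listed here) and follow up by hand on any that come back False.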
Start compute nodes
If nodes don't show up, consider:
- Restart Torque scheduler on problematic nodes.
- Try rebooting the switch the affected nodes are connected to, especially if the problematic nodes are grouped to a single switch.
- Hook up a monitor while one of the problematic nodes boots.
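To see whether the missing nodes cluster on a single switch, the list of down nodes can be grouped by switch. The sketch below is hedged: it assumes the text comes from something like `pbsnodes -l` with one "name state" pair per line, and the node-to-switch mapping is hypothetical and would have to be filled in by hand from the actual cabling.

```python
from collections import defaultdict


def parse_down_nodes(listing):
    """Take the first whitespace-separated field of each non-blank line as a node name."""
    nodes = []
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            nodes.append(fields[0])
    return nodes


def group_by_switch(nodes, node_to_switch):
    """Group node names by switch; nodes missing from the mapping go under 'unknown'."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node_to_switch.get(node, "unknown")].append(node)
    return dict(groups)
```

If most of the down nodes land under one switch, rebooting that switch is the first thing to try; if they are scattered, look at the individual nodes instead.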