You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

Too many power outages in Baker Lab!

Reboot switches

  • 8 switches support the clusters (out of 12 in the room).

Start headnodes, if not on already

  • Only a few are on UPS. Those can obviously be left on.
  • None should be set to auto-start on power-off.

Confirm headnodes accessible via SSH

PuTTY on Windows

Use FMPro to get connection info?! (not the right info there, though...)

Menu => List Machine / User Selections = > List Group or selected criteria

  • Machine function => Server HN

Start compute nodes

If nodes done show up, consider:

  • Restart Torque scheduler on problematic nodes.
  • Try rebooting the switch the affected nodes are connected to, especially if the problematic nodes are grouped to a single switch.
  • Hook up a monitor as one of the high nodes boot.
  • No labels