In summer 2013 and winter 2014, there were an inordinate number of power outages in Baker Lab and other Chem buildings! See also
ChemIT's record of recent power outages
Date | Outage duration | Cause | Official link | ChemIT notes |
---|---|---|---|---|
2/4/2017 Saturday | Perhaps up to 2 hours, starting Saturday, late afternoon: Sent: Saturday, February 4, 2017 5:46 PM Subject: CornellALERT: Power Outage - Ithaca Campus CornellALERT: ITHACA CAMPUS POWER OUTAGE ---------------------------------------------------------------------- A widespread power outage is currently impacting the entire Ithaca campus and parts of the Ithaca area. ============= Sent: Saturday, February 4, 2017 7:52 PM Subject: CornellALERT: UPDATE 7:45 p.m. Power Outage - Ithaca Campus CornellALERT: UPDATE 7:45 p.m. ITHACA CAMPUS POWER OUTAGE ---------------------------------------------------------------------- Power to Cornell’s Ithaca campus has been restored. | | https://itservicealerts.hosting.cornell.edu/view/4655 (The nominal start reported was delayed by ~1 hour: "Event: 2017-02-04 18:44:00" (6:44 PM). Per the report, largely resolved: "As of 7:45pm, Cornell reported that power had been restored.") | NetAdmin-L was alerted to a potential outage ~10 minutes before the Alert: From: On Behalf Of Jamie Rosner Duong Sent: Saturday, February 4, 2017 5:36 PM To: NET-ADMIN-L Subject: Power outage? Are parts of Cornell experiencing a power outage? ============== Early on, "NYSEG has estimated restoration around 8:15-8:30pm" |
4/11/16 Monday | About 40 minutes, starting shortly after noon. Official emails: Mon, Apr 11, 2016 at 12:25 PM: CornellALERT: ITHACA CAMPUS POWER OUTAGE ---------------------------------------------------------------------- There widespread power outage is impacting the entire Ithaca campus and parts of Ithaca. Service personnel are aware of the outage and responding. Additional reports of the outage are not needed. NYSEG and Cornell Utilities personnel are working to restore power as quickly as possible Mon, Apr 11, 2016 at 12:59 PM: CornellALERT: ITHACA CAMPUS POWER OUTAGE - UPDATE 12:48 ---------------------------------------------------------------------- As of 12:48 p.m on 04/11/16 Cornell Utilities reports that the power has been fully restored to the Ithaca campus. | Sun article, quoting Melissa Hines: http://cornellsun.com/2016/04/11/damaged-transmission-line-responsible-for-cornell-power-outage/ | | Servers mostly OK. Some UPS failings. Some router failings. See our notes on this particular outage. |
2/27/2014 Thursday | 2 minutes CU's record: Per 3/3/14 email: On Thursday 2/27/14 at 9:50am the campus experienced a disruption of electricity. The whole campus experienced an approximately 30 second outage. Many buildings on the central, west and north campus areas remained without power for 30 minutes to an hour. Oliver's record: Power outage at 9:53a. Restored at 9:55a (more than one minute). | Procedural error? Per 3/3/14 email: The outage was caused as a result of routine maintenance activities which were being conducted at the Campus’ main substation which takes power from the NYSEG transmission system and provides it to campus. This work has been conducted many times before without incident but in this case caused a major disruption of electricity supply. Staff from Utilities and external technical resources are investigating the root cause of this unexpected event. | No link to info in 3/3/14 email? Some delayed time-stamps, from the IT folks: http://www.it.cornell.edu/services/alert.cfm?id=3072 Initial timing info, from the power folks: http://www.cornell.edu/cuinfo/specialconditions/#2050 | Michael led our effort to initially evaluate and restore systems, with Oliver adding to documentation and to-do's. Lulu completed the cluster restoration efforts. Lost 1-2 hours each for Michael, Lulu, and Oliver. Cornell called it a "power blip". In Oliver's books, any outage longer than seconds is not a "blip". Q: Broke one GPU workstation? |
1/27/2014 Monday | 17-19 minutes CU's record: Power outage at 2:22p. Restored around 2:41p. Oliver's record: Power outage at 2:22p. Restored at 2:39p. | ? | http://www.it.cornell.edu/services/alert.cfm?id=3040 | Lulu, Michael, and Oliver shut down head nodes and other systems which were on UPS. (Those systems not on UPS shut down hard, per usual.) Lost 3 hours, for Lulu, Michael, and Oliver. Roger away on vacation (out of the U.S.) |
12/23/2013 Monday | 2 minutes CU's report: 08:36 AM - 8:38 AM (ChemIT staff not in yet.) | Procedural error? | http://www.it.cornell.edu/services/alert.cfm?id=2982 | Terrible timing, right before the longest staff holiday of the year. ChemIT staff not present during failure. Lost most of the day, for Roger and Oliver. Michael Hint and Lulu away on vacation (out of the U.S.) |
7/17/13 | Half a morning (~2 hours) CU's report: 8:45 AM - 10:45 AM | | http://www.it.cornell.edu/services/alert.cfm?id=2711 | |
Question: When power is initially restored, do you trust it? Or might it simply kick back off in some circumstances?
- Because we don't know the answer to this question following any specific power outage, we are reluctant to turn servers back on right away. Instead, we like to wait ~10-25 minutes.
Procedures and reminders
Reboot switches?
- 8 switches support the clusters (out of 12 in the room).
Start head nodes, if not on already
- Only a few are on UPS. Those can obviously be left on.
- None should be set to auto-start when power returns after an outage.
Confirm head nodes accessible via SSH
PuTTY on Windows
Use FMPro to get connection info?! (not the right info there, though...)
Menu => List Machine / User Selections => List Group or selected criteria
- Machine function => Server HN
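As a quick alternative to opening PuTTY sessions one by one, the SSH check above could be scripted. The following is only a minimal sketch: it tests whether TCP port 22 accepts a connection, which does not prove that logins work. The function names are hypothetical, and the host list would have to come from wherever the real head node names are kept.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something is listening on the SSH port; it does
    not attempt a login.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_head_nodes(hosts, probe=ssh_port_open):
    """Map each host name to True (reachable) or False (not reachable).

    `probe` is injectable so the logic can be tried without a network.
    """
    return {host: probe(host) for host in hosts}
```

Calling `check_head_nodes([...])` with the list of head node names returns a dictionary that can be printed or scanned for `False` entries before moving on to the compute nodes.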
Start compute nodes
If nodes don't show up, consider:
- Restart Torque scheduler on problematic nodes.
- Try rebooting the switch the affected nodes are connected to, especially if the problematic nodes are grouped to a single switch.
- Hook up a monitor as one of the affected nodes boots.
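To decide between restarting Torque on individual nodes and rebooting a switch, it may help to see whether the missing nodes all hang off one switch. The sketch below assumes the text output of `pbsnodes -l` (which lists nodes that are down, offline, or unknown) has been saved or captured, and that each line has the form "name, whitespace, state". The node names, switch names, and the node-to-switch map are all hypothetical; the real mapping would have to be maintained by hand.

```python
from collections import defaultdict


def parse_pbsnodes_list(text):
    """Parse `pbsnodes -l` style output into {node_name: state}.

    Assumes one node per line: name, whitespace, state. Blank or
    malformed lines are skipped.
    """
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            nodes[parts[0]] = parts[1]
    return nodes


def group_by_switch(problem_nodes, node_to_switch):
    """Group problem node names by the switch they are plugged into.

    Nodes missing from the map are filed under "unknown".
    """
    groups = defaultdict(list)
    for node in problem_nodes:
        groups[node_to_switch.get(node, "unknown")].append(node)
    return dict(groups)


def common_switch(groups):
    """Return the switch name if every problem node sits on one known
    switch, otherwise None."""
    if len(groups) == 1 and "unknown" not in groups:
        return next(iter(groups))
    return None
```

If `common_switch` returns a switch name, rebooting that switch is a reasonable first step; if it returns `None`, the problem nodes are spread out and restarting the scheduler on them individually is more likely to help.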