Historical Node Restarts

From athena

Date Time Node Action Reason/notes
3/19/08 18:00 3-21 powercord reboot; possibly due to power-strip testing
3/20/08 11:35 6-25 powercord reboot; appeared hung, tapping power button didn't affect it.
3/25/08 11:00 6-32 node-power reboot; showstate reported "down, with job". Unresponsive.
3/26/08 16:40 6-28 powercord reboot; E2110 MBE Error: DIMM 3 and 4 (died at 1am)
4/01/08 11:49 3-21 powercord reboot; E1410 CPU 2 IERR, E1410 CPU 1 IERR, E2119
4/01/08 22:00 6-29 powercord reboot 4/2; E1410 CPU 2 IERR, E1410 CPU 1 IERR, E2119
4/03/08 13:00 3-32 powercord reboot; probable overcommit_memory issue - set to 2 after reboot
4/03/08 14:00 3-24 powercord reboot; probable overcommit_memory issue - set to 2 after reboot
4/03/08 14:00 3-21 powercord reboot; probable overcommit_memory issue - cluster set to 2 after reboot
4/03/08 14:50 6-23 powercord reboot; kernel panic upon attaching USB keyboard
4/07/08 10:00 6-20 powercord reboot; E1422 CPU Machine Check, VGA screen displays nmi sync error.
4/08/08 11:15 6-26 powercord reboot; no obvious reason, died at 04:00
4/08/08 13:25 6-28 powercord reboot; E1410 CPU 2 IERR, E1410 CPU 1 IERR, E2119
4/14/08 13:25 6-20 powercord reboot; 6-20, 6-26, 3-21 between 4/09 and 4/14
4/16/08 17:00 Poly1 controlled reboot; E1211 ROMB Batt ... diagnostic repair attempt
5/05/08 09:00 many powercord reboot; 3-21,4-7,4-9,4-11,4-14,4-17,4-18,4-21,4-25,5-25,5-26,5-27,5-28,6-20
5/05/08 17:00 3-25 powercord reboot;
5/07/08 13:25 two powercord reboot; 5-30, 6-20
5/08/08 13:25 11:00 powercord reboot; 5-30, 5-32, 4-25 (5-30 was running same jobs as yesterday)
5/12/08 16:30 3-2 powercord reboot; no trouble indication on front panel, no job stuck running.
5/19/08 10:30 3-19,6-32 powercord reboot; 3-19 had usual messages, 6-32 was happily blue.
5/19/08 10:30 3-22 dead disk; awaiting replacement, replace and insert-ethers --replace at 11am 5/20
5/19/08 13:30 6-31 powercord reboot; no trouble indication on front panel.
5/20/08 11:00 3-10,3-27,4-32 powercord reboot; no trouble indication on front panel, simultaneous with restoring 3-22.
5/20/08 13:40 6-2,6-4 powercord reboot; 6-4 showed E2119 SBE (single-bit errors) on front panel.
5/20/08 16:00 5-30 powercord reboot; nothing on front panel. rebooted at 17:30
5/20/08 18:00 4-30,5-30,6-4 powercord reboot; 6-4 showing SBE after power cycle.
5/21/08 10:00 3-12,6-3 powercord reboot; 3-12 went down circa midnight, 6-3 circa 9am
5/27/08 10:55 6-28 powercord reboot; E1410 CPU 1 (and 2) IERR on panel, but suspiciously synchronous with a job finishing.
5/28/08 15:00 6-20 powercord reboot; E2119, E1410 CPU 1 (and 2) IERR on panel
5/29/08 02:00 6-29 powercord at 11am; E2119 SBC Mem, E1410 CPU 1 and 2 IERR on panel
6/02/08 12:01am 3-1,3-3 powercord at 11am; no errors on panels
6/03/08 10:00pm 3-30,3-32 remote power drop software testing (3-30 rebooted 6/4/08)
6/03/08 10:54pm athena0 powercord at 10:30am June 4; E2119 SBC Mem, E1410 CPU 1 and 2 IERR on panel
6/05/08 10:00pm 3-11,3-24,6-16 powercord at 8:30am June 6; Jeff Gardner testing
6/20/08 10:00am 3-21 powercord soft reboot. Died doing job 189855, "usual" E2119,E1410 errs on panel
6/23/08 10:00am 6-28 powercord soft reboot. "usual" E2119,E1410 errs on panel
6/23/08 10:00am various remote reboots Torque maintenance
7/3/08 10:00am 3-6 soft reboot done by Duncan, no data
7/7/08 11:00am 6-20 soft reboot "usual" E2119,E1410 errs on panel
7/23/08 17:15 6-26 hard reboot just hung, no amber displays
7/24/08 14:30 6-28 hard reboot just hung, no amber displays
8/4/08 13:00 6-20 hard reboot "usual" E2119,E1410 errs on panel (died 8/3)
8/4/08 13:00 3-21 hard reboot "usual" E2119,E1410 errs on panel (died 7/31)
8/25/08 11:00am 5-2,5-3,6-20 hard reboot 6-20 had "usual" E2119,E1410 errs on panel (died 8/22)
8/26/08 12:00pm 3-7 hard reboot just hung, no amber displays
8/27/08 10:00am 3-2,6-20 hard reboot 6-20 had "usual" E2119,E1410 errs on panel
8/28/08 10:00am 3-1 hard reboot no errors shown, had died circa 10pm 8/27
8/28/08 14:30 3-8,3-9 hard reboot no errors shown, had died circa 13:30 8/28
8/31/08 18:30 3-5 hard reboot E2119 only
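
Several nodes (for example 3-21, 6-20 and 6-28) recur in the log above. The following is a minimal sketch, not part of the original log, of how the rows could be tallied per node. It assumes each row starts with a date, a time and a node field separated by whitespace, and that node names have the form N-M. The names ROW, NODE and tally_restarts are hypothetical. Rows that list their nodes only in the notes column (such as the 5/05/08 "many" entry) are not counted by this sketch.

```python
import re
from collections import Counter

# Assumed row layout: date, time, node field, then free text (action and notes).
ROW = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2})\s+(\S+)\s+(\S+)\s+(.*)$")
# Assumed node-name shape, e.g. "3-21" or "6-28".
NODE = re.compile(r"^\d+-\d+$")


def tally_restarts(lines):
    """Count how many log rows name each node in the node column."""
    counts = Counter()
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue  # header or blank line
        # The node column may hold a comma-separated list such as "3-12,6-3".
        for token in m.group(3).split(","):
            if NODE.match(token):
                counts[token] += 1
    return counts


# A few rows copied from the log above.
sample = [
    "Date Time Node Action Reason/notes",
    "3/26/08 16:40 6-28 powercord reboot; E2110 MBE Error: DIMM 3 and 4 (died at 1am)",
    "4/08/08 13:25 6-28 powercord reboot; E1410 CPU 2 IERR, E1410 CPU 1 IERR, E2119",
    "4/16/08 17:00 Poly1 controlled reboot; E1211 ROMB Batt ... diagnostic repair attempt",
    "5/21/08 10:00 3-12,6-3 powercord reboot; 3-12 went down circa midnight, 6-3 circa 9am",
]

counts = tally_restarts(sample)
for node, n in counts.most_common():
    print(node, n)
```

Entries whose node column is not a node name (Poly1, athena0, "many", "various") are skipped on purpose, so the tally covers compute nodes only.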
