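One day the fan at the back (rear) of the case started making a loud noise.
Powering the server off and back on stopped the noise, but after a while it would start again, so I thought
  • I need to buy a new fan
  • It's winter, so there's probably no great hurry
and left the fan unpowered, closed the case lid, and kept running the server that way.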
At around 15:00 on December 22, the server logged the following messages and stopped responding.

/var/log/messages
Dec 22 10:02:56 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 10:02:56 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 10:02:56 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:10:26 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:10:26 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:10:26 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:17:56 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:17:56 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:17:56 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:24:11 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:24:11 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:24:11 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:29:11 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:29:11 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:29:11 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:35:26 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:35:26 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:35:26 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:42:56 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:42:56 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:42:56 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:50:26 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:50:26 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:50:26 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:52:56 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:52:56 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:52:56 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 15:00:26 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 15:00:26 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 15:00:26 mcelog: Processor 0 below trip temperature. Throttling disabled
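
To see how quickly the throttling events were piling up, the excerpt above can be summarized with a small script. This is only a minimal sketch: the function names are my own, the sample is a few lines copied from the log above, and because syslog timestamps carry no year, 2012 is assumed (taken from the mcelog TIME lines).

```python
from datetime import datetime

# A few lines copied from the /var/log/messages excerpt above.
SAMPLE = """\
Dec 22 14:10:26 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:10:26 mcelog: Please check your system cooling. Performance will be impacted
Dec 22 14:10:26 mcelog: Processor 0 below trip temperature. Throttling disabled
Dec 22 14:17:56 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
Dec 22 14:24:11 mcelog: Processor 0 heated above trip temperature. Throttling enabled.
"""


def throttle_times(text, year=2012):
    """Return the timestamps of every 'heated above trip temperature' line.

    The year is an assumption; classic syslog timestamps do not include one.
    Month names are parsed with %b, which expects an English (C) locale.
    """
    times = []
    for line in text.splitlines():
        if "heated above trip temperature" not in line:
            continue
        stamp = " ".join(line.split()[:3])  # e.g. "Dec 22 14:10:26"
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between each throttling event and the next one."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


times = throttle_times(SAMPLE)
gaps = gaps_in_seconds(times)
print(len(times), "throttling events, gaps in seconds:", gaps)
```

Fed the real file instead of the sample, the same two functions would show the gap between events over the whole afternoon.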

/var/log/mcelog
CPU 0 THERMAL EVENT TSC 9557f0a491a8cb
TIME 1356155513 Sat Dec 22 14:51:53 2012
Processor 0 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 8801000f MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
Hardware event. This is not a software error.
MCE 1
CPU 0 THERMAL EVENT TSC 9557f0a4980a8d
TIME 1356155513 Sat Dec 22 14:51:53 2012
Processor 0 below trip temperature. Throttling disabled
STATUS 8801000a MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
Hardware event. This is not a software error.
MCE 0
CPU 0 THERMAL EVENT TSC 9558bfcbfa2e21
TIME 1356155827 Sat Dec 22 14:57:07 2012
Processor 0 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 8801000f MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
Hardware event. This is not a software error.
MCE 1
CPU 0 THERMAL EVENT TSC 9558bfcc009e99
TIME 1356155827 Sat Dec 22 14:57:07 2012
Processor 0 below trip temperature. Throttling disabled
STATUS 8801000a MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 23
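
Each record in /var/log/mcelog carries a Unix timestamp on its TIME line, so the records can be turned into structured data. The following is a rough sketch rather than a real mcelog parser: the field names in the dictionaries are my own, and the UTC+9 offset assumes the server was logging in Japan Standard Time.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the server's local time zone is JST (UTC+9).
JST = timezone(timedelta(hours=9))

# Lines copied from the /var/log/mcelog excerpt above (shortened).
SAMPLE = """\
CPU 0 THERMAL EVENT TSC 9557f0a491a8cb
TIME 1356155513 Sat Dec 22 14:51:53 2012
Processor 0 heated above trip temperature. Throttling enabled.
STATUS 8801000f MCGSTATUS 0
CPU 0 THERMAL EVENT TSC 9557f0a4980a8d
TIME 1356155513 Sat Dec 22 14:51:53 2012
Processor 0 below trip temperature. Throttling disabled
STATUS 8801000a MCGSTATUS 0
"""


def parse_events(text):
    """Split the log into one dict per 'CPU n THERMAL EVENT' record."""
    events = []
    current = None
    for line in text.splitlines():
        header = re.match(r"CPU (\d+) THERMAL EVENT TSC ([0-9a-f]+)", line)
        if header:
            current = {"cpu": int(header.group(1)), "tsc": int(header.group(2), 16)}
            events.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first record header
        stamp = re.match(r"TIME (\d+)", line)
        if stamp:
            current["time"] = datetime.fromtimestamp(int(stamp.group(1)), tz=JST)
        elif "heated above trip temperature" in line:
            current["throttling"] = True
        elif "below trip temperature" in line:
            current["throttling"] = False
    return events


events = parse_events(SAMPLE)
for event in events:
    state = "enabled" if event["throttling"] else "disabled"
    print(event["time"].isoformat(), "CPU", event["cpu"], "throttling", state)
```

Converting the epoch value with the assumed offset gives the same wall-clock time that mcelog printed next to it, which is a quick sanity check on the time zone guess.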
 
Ah... it had gotten pretty hot in there, hadn't it...
For now I'm running the server with the lid off, left open.

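Lessons learned
  •  Running with the lid closed while the case fan isn't spinning doesn't work, even in winter
  •  Keep a spare on hand so a broken case fan can be replaced promptly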


The server whose case fan broke was the active one.
The standby server took over the address via VRRP and began operating.

/var/log/messages
Dec 22 15:10:25 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec 22 15:10:26 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Dec 22 15:10:26 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Dec 22 15:10:26 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.0.100
Dec 22 15:10:26 erogamescape15 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.0.100 added
Dec 22 15:10:27 erogamescape15 ntpd[10305]: Listening on interface #8 eth0, 192.168.0.100#123 Enabled
Dec 22 15:10:31 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.0.100
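
For reference, the takeover can be pulled out of the standby's log with a few lines of script. This is just a sketch that matches the message wording seen in the excerpt above; the function name and dictionary keys are my own, and other Keepalived versions may phrase these messages differently.

```python
import re

# Lines copied from the standby server's /var/log/messages excerpt above.
SAMPLE = """\
Dec 22 15:10:25 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec 22 15:10:26 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Dec 22 15:10:26 erogamescape15 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.0.100
"""

# Matches only Keepalived_vrrp lines in the format shown above.
LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) Keepalived_vrrp: "
    r"VRRP_Instance\((?P<instance>[^)]+)\) (?P<message>.*)$"
)


def failover_summary(text):
    """Collect who became MASTER, when, and which address was announced."""
    summary = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        message = match.group("message")
        if message == "Entering MASTER STATE":
            summary["host"] = match.group("host")
            summary["instance"] = match.group("instance")
            summary["became_master_at"] = match.group("stamp")
        arp = re.match(r"Sending gratuitous ARPs on (\S+) for (\S+)", message)
        if arp:
            summary["interface"] = arp.group(1)
            summary["vip"] = arp.group(2)
    return summary


summary = failover_summary(SAMPLE)
print(summary)
```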

It did begin operating, but... (to be continued)