Ganglia Debugging
It turns out Ganglia is kind of brainless (in a good way). When we reloaded the head node, we left the compute nodes up and running, and they kept reporting data to the newly loaded Ganglia daemon (gmond). However, their IP addresses weren't in the hosts file, because we hadn't re-inserted the nodes via insert-ethers.
Fortunately, I learned two things. First, when you restart gmond, it recollects data from its sources. Second, you can set the maximum time a down host stays in Ganglia's host list. I changed this from 0 (never remove) to 86400 seconds (1 day).
I opened the config file:
vi /etc/gmond.conf
and edited this line:
host_dmax = 86400 /*secs */
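If you'd rather not make the edit by hand (say, to repeat it after the next reload), the same change can be sketched as a small shell function. This is only a sketch: the function name set_host_dmax is made up for illustration, it assumes the host_dmax line sits on its own line in the format shown above, and the restart command in the usage comment is an assumption that varies by distribution.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ganglia): set host_dmax in a gmond config file.
#   $1 = path to the config file, e.g. /etc/gmond.conf
#   $2 = new value in seconds, e.g. 86400 (0 means never remove down hosts)
set_host_dmax() {
    conf="$1"
    secs="$2"
    # Replace only the number after "host_dmax =", leaving the rest of the
    # line (such as the /*secs */ comment) alone. A backup is kept as <file>.bak.
    sed -i.bak "s|^\([[:space:]]*host_dmax[[:space:]]*=[[:space:]]*\)[0-9][0-9]*|\1${secs}|" "$conf"
}

# Usage (as root). The restart command below is an assumption; use whatever
# your distribution provides for restarting gmond.
#   set_host_dmax /etc/gmond.conf 86400
#   service gmond restart
```

Restarting gmond afterwards matters for both reasons given above: it picks up the new setting, and it recollects data from the sources.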
Sources:
- http://ganglia.sourceforge.net/docs/ganglia.html (search for "How do I remove a host from the list?")