Ganglia Debugging

From athena

It turns out Ganglia is kind of brainless (in a good way). When we reloaded the head node, we left the compute nodes up and running, and they kept reporting data to the newly loaded Ganglia daemon (gmond). However, their IP addresses weren't in the hosts file, because we hadn't re-inserted the nodes (via insert-ethers).

Fortunately, I learned two things. First, when you restart gmond, it re-collects data from its sources. Second, you can set the maximum time a down host stays in Ganglia's host list. I changed this from 0 (never removed) to 86400 seconds (1 day).

 vi /etc/gmond.conf

I edited this line:

 host_dmax = 86400 /*secs */
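
The same edit can be scripted instead of done by hand in vi. The following is only a rough sketch: set_host_dmax is a hypothetical helper (not part of Ganglia), and it assumes the setting appears on a single line of the form "host_dmax = N", as in the line above.

```shell
#!/bin/sh
# Sketch: change the host_dmax value in a gmond configuration file.
# set_host_dmax is a hypothetical helper, not something Ganglia ships.
set_host_dmax() {
    conf="$1"      # path to the config file, e.g. /etc/gmond.conf
    seconds="$2"   # 0 means a down host is never dropped from the list

    # Keep a backup copy, then rewrite only the number on the host_dmax
    # line, leaving indentation and the trailing comment untouched.
    cp "$conf" "$conf.bak" &&
    sed "s|^\([[:space:]]*host_dmax[[:space:]]*=[[:space:]]*\)[0-9][0-9]*|\1$seconds|" \
        "$conf.bak" > "$conf"
}
```

Usage would be something like set_host_dmax /etc/gmond.conf 86400, followed by a restart of gmond so it picks up the new value. How gmond gets restarted depends on the init system of the head node, so that step is left out here.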

Sources: