Author Archives: admin

100 problems you will never know about

So I’ve been ranting, and ranting, and ranting about how unreliable Zabbix is, unless you go out of your way to force it to be reliable.

Things did improve with time. In 2.2, we could finally monitor stale items and triggers. The Zabbix folks even created an official template for that. That really was an improvement.

Or so I thought, until re-checking the template. Lo and behold:

<expression>{Template App Zabbix Server:zabbix[queue,10m].min(10m)}>100</expression>
<name>More than 100 items having missing data for more than 10 minutes</name>

Dammit. Why? Why shoot all new users in the foot? Why does there always have to be a pitfall?

That’s 100 items with missing data. 100 problems that a user will not be notified about, unless he edits the template by hand.

So, as usual, the fix: go into the “Template App Zabbix Server” configuration, and change the trigger to fire on “[queue,10m].min(10m) > 0”
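
If you would rather not click through the frontend, the same change can be sketched against an exported copy of the template. This is only a sketch under assumptions: the function name lower_queue_threshold is made up, it expects the export on stdin, and it handles both a literal > and an escaped &gt; since an export may contain either. It does not touch the trigger name, so that still has to be renamed by hand.

```shell
# Hypothetical helper (not part of Zabbix): read an exported template XML
# on stdin and lower the queue trigger threshold from 100 to 0.
lower_queue_threshold() {
    # First expression handles a literal ">", second an XML-escaped "&gt;".
    # In the replacement, "\&" is a literal ampersand.
    sed -e 's/\(zabbix\[queue,10m]\.min(10m)}\)>100/\1>0/' \
        -e 's/\(zabbix\[queue,10m]\.min(10m)}\)&gt;100/\1\&gt;0/'
}
```

Something like "lower_queue_threshold < exported.xml > patched.xml" (both file names are placeholders) would then give a file to import back.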

Zabbix unreachable hosts

The problem

Every now and then, a host in your Zabbix system will become unavailable. It results in notifications like this:

Trigger: Zabbix agent on myhost is unreachable for 5 minutes
Trigger status: PROBLEM

And logs like this:

1461:20140808:074300.517 Zabbix agent item "vfs.fs.size[/home/deploy/sites/mc/shared/voip,free]" on host "myhost" failed: first network error, wait for 15 seconds
1466:20140808:074345.134 temporarily disabling Zabbix agent checks on host "myhost": host unavailable

What could be the cause? No, it’s not because the host could not be reached. That would be too easy.

Continue reading

zabbix_sender failed items debug

Every once in a while, you need to send a value to Zabbix from a script. zabbix_sender comes to the rescue:
zabbix_sender -z "server IP address" -p 10051 -s "host in zabbix" -k "item key" -o "item value"

But what happens if, for some reason, Zabbix won’t accept the value (say, wrong item type)? It’ll just fail (non-zero exit code), without any error message. Well, let’s try to debug (notice the added “-vv”):

zabbix_sender -vv -z "server IP address" -p 10051 -s "host in zabbix" -k "item key" -o "item value"
Info from server: "Processed 1 Failed 1 Total 1 Seconds spent 0.003079"
sent: 1; skipped: 0; total: 1

Oh wow, that’s helpful. I already know it’s failed. I want to know why.
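
The response line does at least carry a count, so a wrapper script can notice the rejection even if it cannot tell why. A minimal sketch, assuming the "Info from server" format shown above; the function name sender_failed_count is made up for illustration:

```shell
# Hypothetical helper: read zabbix_sender -vv output on stdin and print
# the number after "Failed" from the "Info from server" line.
# Lines without "Failed <number>" produce no output.
sender_failed_count() {
    sed -n 's/.*Failed \([0-9][0-9]*\).*/\1/p'
}
```

Piping the zabbix_sender output from above through sender_failed_count prints the failed count, which a script can compare against 0 before deciding whether to alert.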

Continue reading

Cleaning up Zabbix database

The Zabbix database can grow considerably over time, especially if you use the default templates, where many items are checked at an interval of 30 seconds and aren’t even used in any triggers.
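
To get a feel for the numbers, here is a back-of-the-envelope sketch. It assumes the interval of 30 is in seconds and that every check stores exactly one history row; the function name rows_per_day is made up for illustration.

```shell
# Rows written to history per day by a single item,
# assuming one row per check and an interval given in seconds.
rows_per_day() {
    interval_seconds=$1
    # 86400 is the number of seconds in a day.
    echo $(( 86400 / interval_seconds ))
}
```

Under those assumptions, "rows_per_day 30" prints 2880, so a thousand such items would add close to three million rows a day.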

The biggest database I have had to manage so far was about 80 GB. And when I tried to upgrade, it took a day or two to run all the scripts. Of course, having to back up 80 GB of SQL data does not add to my enjoyment either.

So, there it goes – a few scripts, some stolen, er, copied from another person’s repo, some written by myself: github link

With that, the database shrank from 80 GB to 15 GB, quite an improvement. And the upgrade finished in a few hours.

NOTE: due to the way databases work (MySQL in particular), running these scripts won’t reduce the size of the Zabbix db if it’s already bloated. You will have to dump and reload the db after running them. What the scripts do is keep its size more or less constant if you run them regularly.

Zabbix housekeeper woes

Every item in Zabbix is configured to store its history and trends for its own individual period. The process that cleans out outdated data is the Housekeeper. If you monitor at least a hundred hosts, though, you probably know about it already, from having been bitten by its performance. It housekeeps and housekeeps and housekeeps, wasting CPU cycles and increasing the entropy of the Universe.

Continue reading