<p><strong>Huyabbix: Things you should have known about Zabbix</strong></p>
<p>Ilya Ivanov (ilya@devopsx.com), https://huyabbix.com/feed.xml, generated by Jekyll, last updated 2019-11-09T09:46:45+00:00</p>
<p><strong>100 problems you will never know about</strong></p>
<p>Ilya Ivanov, 2016-10-21, https://huyabbix.com/100-problems-you-will-never-know-about</p>
<p>So I’ve been <a href="https://huyabbix.com/zabbix-unsupported-items-or-why-you-should-upgrade-to-2-2-asap/">ranting</a>, and <a href="https://huyabbix.com/zabbix-unreachable-hosts/">ranting</a>, and <a href="https://github.com/burner1024/zabbix-unsupported-items">ranting</a> about how unreliable Zabbix is, unless you go out of your way to force it to be reliable.</p>
<p>Things did improve with time. In 2.2, finally we could monitor stale items and triggers. Zabbix folks even created an official <a href="https://share.zabbix.com/official-templates/zabbix-components/zabbix-server">template</a> for that. That really was an improvement.</p>
<p>Or so I thought, until re-checking the template. Lo and behold:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;expression&gt;</span>{Template App Zabbix Server:zabbix[queue,10m].min(10m)}&gt;100<span class="nt">&lt;/expression&gt;</span>
<span class="nt">&lt;name&gt;</span>More than 100 items having missing data for more than 10 minutes<span class="nt">&lt;/name&gt;</span>
</code></pre></div></div>
<p>Dammit. Why? Why shoot all new users in the foot? Why does there always have to be a pitfall?</p>
<p>That’s <strong>100</strong> items with missing data. 100 problems that a user will not be notified about, unless they edit the template by hand.</p>
<p>So, as usual, the fix: go into the <strong>Template_App_Zabbix_Server</strong> configuration, and change the trigger to fire on <code class="highlighter-rouge">[queue,10m].min(10m) &gt; 0</code>.</p>
<p><strong>Zabbix unreachable hosts</strong></p>
<p>Ilya Ivanov, 2014-08-14, https://huyabbix.com/zabbix-unreachable-hosts</p>
<p><strong>The problem</strong></p>
<p>Every now and then, a host in your Zabbix system will become unavailable. This results in notifications like this:</p>
<blockquote>
<p>Trigger: Zabbix agent on myhost is unreachable for 5 minutes
Trigger status: PROBLEM</p>
</blockquote>
<p>And logs like this:</p>
<pre><code class="language-log">1461:20140808:074300.517 Zabbix agent item "vfs.fs.size[/home/deploy/sites/mc/shared/voip,free]" on host "myhost" failed: first network error, wait for 15 seconds
1466:20140808:074345.134 temporarily disabling Zabbix agent checks on host "myhost": host unavailable
</code></pre>
<p>What could be the cause? No, it’s not because the host could not be reached. That would be too easy.</p>
<p>A quote from <a href="https://www.zabbix.com/documentation/2.2/manual/appendix/items/unreachability">Zabbix documentation</a>:</p>
<blockquote>
<p>A host is treated as unreachable after a failed agent check (network error, timeout).
…
After the UnreachablePeriod ends and the host has not reappeared, the host is treated as unavailable.</p>
</blockquote>
<p>Let me decipher this for you: <strong>when a single check fails, the whole host</strong> will be considered “unavailable”, and <strong>will not be monitored</strong> anymore.</p>
<p>An example of such a check could be <code class="highlighter-rouge">vfs.fs.size</code> on a network share that has gone stale. You lose all data and all monitoring of the host until you fix that single check.</p>
<p>Bad design. <strong>Bad</strong>. <strong><em>Really bad</em></strong>. (I hope some Zabbix developer will read this)</p>
<p><strong>The solution</strong></p>
<p>There’s none, actually. A workaround is to track down such checks and replace them with more reliable <code class="highlighter-rouge">UserParameter</code>s. In the network share example, something like this could be used instead of <code class="highlighter-rouge">vfs.fs.size</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">set</span> +m <span class="nt">-o</span> pipefail<span class="p">;</span> <span class="nb">timeout</span> <span class="nt">-s</span> 9 3 <span class="nb">df</span> <span class="nt">-h</span> | <span class="nb">grep</span> /my/mount/point | <span class="nb">awk</span> <span class="s1">'{print $5}'</span> | <span class="nb">grep</span> <span class="s1">'%'</span> | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'%'</span> <span class="o">||</span> <span class="nb">echo </span>100
</code></pre></div></div>
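<p>If the one-liner gets unwieldy inside <code class="highlighter-rouge">zabbix_agentd.conf</code>, the same idea can live in a small wrapper script. This is only a sketch under assumptions: the script path and the item key <code class="highlighter-rouge">custom.fs.pused</code> are made-up names, and it relies on GNU <code class="highlighter-rouge">timeout</code> and <code class="highlighter-rouge">df -P</code> being available on the monitored host.</p>

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. /usr/local/bin/fs_used_pct.sh (the path is an assumption).
# Possible wiring in zabbix_agentd.conf (the key name is made up):
#   UserParameter=custom.fs.pused[*],/usr/local/bin/fs_used_pct.sh "$1"

# Print the used space of a mount point as an integer percentage.
# Prints 100 if df fails, returns nothing usable, or is killed after 3 seconds.
used_pct() {
    pct=$(timeout -s KILL 3 df -P "$1" 2>/dev/null | awk 'NR==2 {print $5}' | tr -d '%')
    case "$pct" in
        ''|*[!0-9]*) echo 100 ;;   # empty or non-numeric: report a full partition
        *) echo "$pct" ;;
    esac
}

# Run only when a mount point is given on the command line.
if [ "$#" -gt 0 ]; then
    used_pct "$1"
fi
```

<p>Keeping the pipeline in a script rather than in the <code class="highlighter-rouge">UserParameter</code> line also sidesteps quoting trouble, since the agent treats <code class="highlighter-rouge">$1</code>, <code class="highlighter-rouge">$2</code> and so on as key parameters in flexible user parameters.</p>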
<p>It <strong>shall</strong> output a “Used” value in percent within 3 seconds (or report a full partition, so you’ll get an alert anyway). Not a real solution, but at least something to do.</p>
<p><strong>zabbix_sender failed items debug</strong></p>
<p>Ilya Ivanov, 2014-07-14, https://huyabbix.com/zabbix_sender-failed-items-debug</p>
<p>Every once in a while, you need to send a value to Zabbix from a script. zabbix_sender comes to the rescue:
<code class="highlighter-rouge">zabbix_sender -z "server IP address" -p 10051 -s "host in zabbix" -k "item key" -o "item value"</code></p>
<p>But what happens if, for some reason, Zabbix won’t accept the value (say, wrong item type)? It’ll just fail (non-zero exit status), without any error message. Well, let’s try to debug (notice the added <code class="highlighter-rouge">-vv</code>):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zabbix_sender <span class="nt">-vv</span> <span class="nt">-z</span> <span class="s2">"server IP address"</span> <span class="nt">-p</span> 10051 <span class="nt">-s</span> <span class="s2">"host in zabbix"</span> <span class="nt">-k</span> <span class="s2">"item key"</span> <span class="nt">-o</span> <span class="s2">"item value"</span>
Info from server: <span class="s2">"Processed 0 Failed 1 Total 1 Seconds spent 0.003079"</span>
sent: 1<span class="p">;</span> skipped: 0<span class="p">;</span> total: 1
</code></pre></div></div>
<p>Oh wow, that’s helpful. I already know it’s failed. I want to know why.</p>
<p>One would assume that if this information isn’t available from zabbix_sender directly, it must be in the Zabbix server logs (<code class="highlighter-rouge">/var/log/zabbix/zabbix_server.log</code> or something like that). <strong>One would be wrong.</strong></p>
<p>Then, one would think that increasing DebugLevel to maximum in zabbix_server.conf and restarting Zabbix server would surely shed some light on the issue. After digging through megabytes of irrelevant information, it would be clear that <strong>one is wrong again.</strong>
In summary, there’s no information available as to why a particular item fails to accept a value.
<strong>Zabbix forces you to scry.</strong></p>
<p>Oh, but wait, let’s make this even more interesting. Say, you need to send multiple items. Fortunately, <code class="highlighter-rouge">zabbix_sender</code> can accept properly formatted files (<code class="highlighter-rouge">$host $item $value</code>) on input:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zabbix_sender <span class="nt">-vv</span> <span class="nt">-z</span> <span class="s2">"server IP address"</span> <span class="nt">-p</span> 10051 <span class="nt">-i</span> zabbix.input
Info from server: <span class="s2">"Processed 245 Failed 5 Total 250 Seconds spent 0.003079"</span>
sent: 250<span class="p">;</span> skipped: 0<span class="p">;</span> total: 250
</code></pre></div></div>
<p>Marvellous, isn’t it? WHICH ONES? At this point, one wouldn’t hope to find this information in the logs. And that’s correct, Zabbix doesn’t log it.
The solution is to send values one by one and see how zabbix_sender responds to each. Not too difficult, and it can even be automated, but why, oh why, can’t it be printed to stderr or somewhere in the logs?
Good, transparent, clear logging is an essential component of any system that aims for “enterprise”. Sorry, but Zabbix isn’t there yet.</p>
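<p>For what it’s worth, the “one by one” approach could be automated roughly like this. It’s a sketch, not a polished tool: <code class="highlighter-rouge">ZBX_SERVER</code>, <code class="highlighter-rouge">ZBX_PORT</code> and <code class="highlighter-rouge">SENDER</code> are names invented here, it assumes plain unquoted <code class="highlighter-rouge">$host $item $value</code> lines, and it relies on zabbix_sender exiting non-zero when a value is rejected, as described above.</p>

```shell
#!/bin/sh
# Send each "host key value" line separately and print the lines that fail.
# ZBX_SERVER, ZBX_PORT and SENDER are illustrative variables, not zabbix_sender options.
ZBX_SERVER="${ZBX_SERVER:-127.0.0.1}"
ZBX_PORT="${ZBX_PORT:-10051}"
SENDER="${SENDER:-zabbix_sender}"

send_one_by_one() {
    # The "|| [ -n ... ]" part keeps a last line without a trailing newline.
    while read -r host key value || [ -n "$host" ]; do
        # Skip blank lines.
        [ -z "$host" ] && continue
        # One value per call, so a non-zero exit status points at exactly this line.
        if ! "$SENDER" -z "$ZBX_SERVER" -p "$ZBX_PORT" \
                -s "$host" -k "$key" -o "$value" >/dev/null 2>&1 </dev/null; then
            echo "FAILED: $host $key $value"
        fi
    done < "$1"
}

# Run only when an input file is given on the command line.
if [ "$#" -gt 0 ]; then
    send_one_by_one "$1"
fi
```

<p>Running it against the same <code class="highlighter-rouge">zabbix.input</code> file prints only the offending lines. It is much slower than a single batch, since every value costs a separate connection, so it’s best kept for debugging.</p>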
<p>Just so this isn’t a completely whining post, I’ll list some common issues to check:</p>
<ol>
<li>Hostname in <code class="highlighter-rouge">zabbix_agentd.conf</code> and hostname in the Zabbix database are different.</li>
<li>Typo in item key.</li>
<li>Item value/type mismatch.</li>
</ol>
<p><strong>Zabbix unsupported items (or why you should upgrade to 2.2 ASAP)</strong></p>
<p>Ilya Ivanov, 2014-05-29, https://huyabbix.com/zabbix-unsupported-items-or-why-you-should-upgrade-to-2-2-asap</p>
<blockquote>
<ul>
<li>Is my server up?</li>
<li>I don’t know.</li>
</ul>
</blockquote>
<p>Right. Exactly what I need from a monitoring system.</p>
<p>One of the biggest and longest-standing shortcomings of Zabbix was its handling of “unsupported” items (the other one is the lack of security, but that’s for another day). “Unsupported” means that, for whatever reason (network failure, unexpected results, zabbix-agent just going haywire, etc.), Zabbix server couldn’t retrieve a valid value from the agent. The triggers using that item would then switch into the “Unknown” state. And there was no alert, no state color change in the web interface, NOTHING. Honestly, I don’t know what the person designing that could possibly have been thinking.</p>
<p>Fortunately, after only 10 years or so of development and <a href="https://www.google.ru/search?client=safari&rls=en&q=zabbix+unknown+triggers&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=rfeGU-fTMYvK8geY0YCYCw#newwindow=1&q=zabbix+unknown+triggers&rls=en">LOTS</a>, <a href="https://www.google.ru/search?client=safari&rls=en&q=zabbix+unknown+triggers&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=rfeGU-fTMYvK8geY0YCYCw#newwindow=1&q=zabbix+unknown+triggers&rls=en">LOTS</a> of user input, a decision was made that a monitoring system should actually be reliable to be useful. Zabbix 2.2 introduced <a href="https://www.zabbix.com/documentation/2.2/manual/config/notifications/unsupported_item">new internal events</a> that allow sending alerts on unknown triggers and unsupported items. Of course, it couldn’t just have been done properly the first time; there’s a <a href="https://support.zabbix.com/browse/ZBX-7494">caveat</a>: the unknown triggers will disappear from the dashboard. It doesn’t matter much if you’re the sole Zabbix person – you’ll still get emails and will at least know about the problem. But in a big deployment with multiple roles, where not everyone is a Zabbix superadmin, it’s a drawback.</p>
<p>To summarize:</p>
<ul>
<li>If you’re on Zabbix 2.2, but don’t monitor unsupported items and unknown triggers yet, <a href="https://www.zabbix.com/documentation/2.2/manual/config/notifications/unsupported_item">configure it now</a>.</li>
<li>If you are on Zabbix 2.0 or earlier, upgrade now and <a href="https://www.zabbix.com/documentation/2.2/manual/config/notifications/unsupported_item">configure the alerts</a>.</li>
<li>If for some reason you’re stuck on an earlier version and can’t upgrade, take a look <a href="https://github.com/burner1024/zabbix-unsupported-items">here</a>. It’s a script that gathers the missing info and uses Zabbix to alert.</li>
</ul>
<p><strong>Zabbix and non-standard email server port</strong></p>
<p>Ilya Ivanov, 2014-04-24, https://huyabbix.com/zabbix-and-non-standard-email-server-port</p>
<p>The most used notification type in Zabbix (and other systems, likely) is email. So, suppose you’ve installed Zabbix server and want to enable the alerts. There’s only one minor issue: your SMTP runs on a non-standard port. For whatever reason. This can’t be a problem, can it?</p>
<p>Wrong!</p>
<p>Meet Zabbix: 15 years of development (9 since the first stable release), hundreds of developers, thousands of installations, hundreds of thousands of lines of code and NO OPTION TO CHANGE SMTP PORT.</p>
<p><img src="/assets/facepalm-1024x622.png" alt="Facepalm" /></p>
<p>That’s just beyond good and evil.</p>
<p>Anyway, what are the options? Let’s see:</p>
<ol>
<li>Route mail through a local SMTP server to the remote one (doesn’t help when it’s the local SMTP that is on a different port)</li>
<li>Redirect with iptables (for those on Linux)</li>
<li>Use custom media script</li>
</ol>
<p>Since my Zabbix installations are all Linux-based, the chosen solution is iptables. Supposing that the SMTP port is 20025, this is what is usually used for port redirection:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>iptables <span class="nt">-A</span> INPUT <span class="nt">-m</span> state <span class="nt">--state</span> NEW <span class="nt">-p</span> tcp <span class="nt">--dport</span> 20025 <span class="nt">-j</span> ACCEPT
iptables <span class="nt">-t</span> nat <span class="nt">-A</span> PREROUTING <span class="nt">-p</span> tcp <span class="nt">--dport</span> 25 <span class="nt">-j</span> REDIRECT <span class="nt">--to-ports</span> 20025
</code></pre></div></div>
<p>However, if you access the service from localhost, it won’t work, since locally generated packets don’t pass through PREROUTING. What you need to add is a rule in the OUTPUT chain of the nat table:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>iptables <span class="nt">-t</span> nat <span class="nt">-A</span> OUTPUT <span class="nt">-p</span> tcp <span class="nt">-d</span> 127.0.0.1 <span class="nt">--dport</span> 25 <span class="nt">-j</span> REDIRECT <span class="nt">--to-ports</span> 20025
</code></pre></div></div>
<p>Now the email server is accessible on port 25 from both inside and outside. And Zabbix is finally able to use it.</p>
<p><strong>Cleaning up Zabbix database</strong></p>
<p>Ilya Ivanov, 2014-03-22, https://huyabbix.com/cleaning-up-zabbix-database</p>
<p>A Zabbix database can grow quite considerably over time. Especially if you use the default templates, where many items are checked at an interval of 30 seconds and aren’t even used in triggers.</p>
<p>The biggest database I had to manage so far was about 80 GB. And when I tried to upgrade, it took a day or two to run all the scripts. Of course, having to back up 80 GB of SQL data does not add to my enjoyment either.</p>
<p>So, here it goes – a few scripts, some <del>stolen</del> copied from another person’s repo, some written by myself: <a href="https://github.com/burner1024/zabbix-sql">github link</a></p>
<p>With that, the database shrank from 80 GB to 15 GB, quite an improvement. And the upgrade finished in a few hours.</p>
<p><strong>NOTE</strong>: due to the way databases work (MySQL in particular), running these scripts won’t reduce the Zabbix DB size if it’s already bloated. You will have to dump and reload the DB after that. What the scripts do is keep its size more or less constant if you run them regularly.</p>
<p><strong>Zabbix housekeeper woes</strong></p>
<p>Ilya Ivanov, 2013-11-09, https://huyabbix.com/zabbix-housekeeper-woes</p>
<p>Every item in Zabbix is configured to store its history and trends for some individual period. The process that cleans out outdated data is the Housekeeper. Although if you monitor at least a hundred hosts, you probably know about it already. From having been bitten by its performance. It housekeeps and housekeeps and housekeeps, wasting CPU cycles and increasing the entropy of the Universe.</p>
<p>Really, it’s not that much of a task – look up an item’s storage intervals, delete everything too old, repeat. Ideally, a user doesn’t even need to know about its existence. Why is it that a Google search yields 15,000 results for “zabbix housekeeper”, then? It’s beyond my understanding. Housekeeper is a certified resource hog. An utter and complete failure of the developers.</p>
<p>Fortunately, it’s possible to disable it. But then we need to clean up old data somehow. Most people suggest database partitioning at this point. See <a href="https://www.zabbix.com/wiki/howto/db/postgres/partition">Zabbix wiki</a> for an example. If you run a really large environment (say, a thousand hosts), this might be your only choice. Looks a bit scary? Well, if you have just a few dozen or a few hundred hosts, you’ll probably be able to get away with a few SQL DELETE queries. Of course, by going that way, you lose the ability to control retention intervals for individual items. It’s worth it.</p>
<p>So let’s say we want to keep item history, alerts, acknowledges and events for a week, and keep the trends for 3 months. Here’s how to do it in PostgreSQL:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">delete</span> <span class="k">FROM</span> <span class="n">alerts</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">alerts</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">acknowledges</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">acknowledges</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">events</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">events</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">history</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">history</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">history_uint</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">history_uint</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">history_str</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">history_str</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">history_text</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">history_text</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">history_log</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">history_log</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'7 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">trends</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">trends</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'90 days'</span><span class="p">;</span>
<span class="k">delete</span> <span class="k">FROM</span> <span class="n">trends_uint</span> <span class="k">where</span> <span class="n">age</span><span class="p">(</span><span class="n">to_timestamp</span><span class="p">(</span><span class="n">trends_uint</span><span class="p">.</span><span class="n">clock</span><span class="p">))</span> <span class="o">></span> <span class="n">interval</span> <span class="s1">'90 days'</span><span class="p">;</span>
</code></pre></div></div>
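<p>If you’d rather run this from cron, one possible variation is to compute the cutoff once as a Unix timestamp and compare <code class="highlighter-rouge">clock</code> with it directly, which spares the database a function call per row. The sketch below only generates the statements. The function name is made up, the table list is the one used above, and the cutoff is counted from “now” rather than from midnight, so the results can differ slightly from the <code class="highlighter-rouge">age()</code> version.</p>

```shell
#!/bin/sh
# Print DELETE statements with precomputed cutoffs.
# cleanup_sql is a hypothetical helper; pass a Unix timestamp to override "now".
cleanup_sql() {
    now=${1:-$(date +%s)}
    week_ago=$((now - 7 * 86400))       # 7 days for history, alerts, events
    quarter_ago=$((now - 90 * 86400))   # 90 days for trends

    for t in alerts acknowledges events history history_uint \
             history_str history_text history_log; do
        echo "DELETE FROM $t WHERE clock < $week_ago;"
    done
    for t in trends trends_uint; do
        echo "DELETE FROM $t WHERE clock < $quarter_ago;"
    done
}

# With any argument, print the statements for the current time.
if [ "$#" -gt 0 ]; then
    cleanup_sql
fi
```

<p>The output can then be piped into the database, for example <code class="highlighter-rouge">cleanup_sql | psql -U zabbix -d zabbix</code>, where the user and database names are assumptions about your setup.</p>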
<p>That’s it. Be happy and keep your Housekeeper disabled.</p>
<p>Update 2014.05.20: proceed to the <a href="https://github.com/burner1024/zabbix-sql">github repo</a> for the MySQL version and other goodies.</p>