BecklerNET Health Checks can monitor your cron jobs and notify you when they don't run at expected times. Assuming curl or wget is available, you will not need to install any new software on your servers.
The principle of operation is simple: your cron job sends an HTTP request ("ping") to BecklerNET Health Checks every time it completes. When BecklerNET Health Checks does not receive the HTTP request at the expected time, it notifies you. This monitoring technique, sometimes called "heartbeat monitoring", is a type of dead man's switch. It can detect various failure modes, such as the machine being down, the cron daemon not running, or the job exiting with an error before it sends the ping.
Let's take a look at an example cron job:
# run backup.sh at 06:08 every day
8 6 * * * /home/me/backup.sh
To monitor it, first create a new Check in your BecklerNET Health Checks account.
After creating the check, copy the generated ping URL and update the job's definition:
# run backup.sh, then send a success signal to BecklerNET Health Checks
8 6 * * * /home/me/backup.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://healthchecks.charleshouse.familyds.net/ping/your-uuid-here
The extra curl call lets BecklerNET Health Checks know the cron job has run successfully. BecklerNET Health Checks keeps track of the received pings and notifies you as soon as a ping does not arrive on time.
Note: you can alternatively add the extra curl call as a final line inside the /home/me/backup.sh script to keep the cron job's definition clean and short.
You can use an HTTP client other than curl to send the HTTP request.
The extra options in the above example tell curl to retry failed HTTP requests, limit the maximum execution time, and silence output unless there is an error. Feel free to adjust the curl options to suit your needs.
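As a rough sketch, the same ping can be wrapped in a small shell function that prefers curl and falls back to wget. The function name send_ping is hypothetical, and the wget flags are only approximate counterparts of the curl options described above, not an exact equivalent.

```shell
#!/bin/sh
# send_ping URL (hypothetical helper)
# Sends the success signal with curl if it is available, otherwise with wget.
send_ping() {
    url=$1
    if command -v curl >/dev/null 2>&1; then
        # -f         exit with a non-zero status on HTTP errors
        # -sS        stay silent, but still print error messages
        # -m 10      give up on a request after 10 seconds
        # --retry 5  retry transient failures up to 5 times
        # -o /dev/null  discard the response body
        curl -fsS -m 10 --retry 5 -o /dev/null "$url"
    else
        # Approximate wget counterparts: quiet output, 10-second
        # network timeout, up to 5 tries, response body discarded.
        wget -q -T 10 -t 5 -O /dev/null "$url"
    fi
}
```

A script could then end with a call such as send_ping https://healthchecks.charleshouse.familyds.net/ping/your-uuid-here.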
Because of the && operator, curl runs only if /home/me/backup.sh exits with an exit code 0.
Grace Time is the amount of extra time to wait when a cron job is running late before declaring it as down. Set Grace Time to be above the expected duration of your cron job.
For example, let's say the cron job starts at 14:00 every day and takes between 15 and 25 minutes to complete. The grace time is set to 30 minutes. In this scenario, BecklerNET Health Checks will expect a ping to arrive at 14:00 but will not send any alerts yet. If there is no ping by 14:30, it will declare the job failed and send alerts.
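The arithmetic in this example can be sketched in a few lines of shell. The function name alert_deadline is hypothetical and only illustrates how the schedule and Grace Time combine; it is not a description of how BecklerNET Health Checks computes deadlines internally.

```shell
#!/bin/sh
# alert_deadline START GRACE_MINUTES (hypothetical helper)
# START is the scheduled time as HH:MM. Prints the HH:MM by which a ping
# must have arrived before the job is declared down.
alert_deadline() {
    hours=${1%%:*}
    minutes=${1##*:}
    # Drop one leading zero so values like "08" are not read as octal.
    hours=${hours#0}
    minutes=${minutes#0}
    total=$(( ${hours:-0} * 60 + ${minutes:-0} + $2 ))
    # Wrap around midnight and print as HH:MM.
    printf '%02d:%02d\n' $(( total / 60 % 24 )) $(( total % 60 ))
}

alert_deadline 14:00 30   # prints 14:30
```

With a job that takes up to 25 minutes, a ping arriving at 14:25 falls before the 14:30 deadline, so no alert is sent.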
BecklerNET Health Checks has integrations to deliver notifications over different channels: email, webhooks, SMS, chat messages, incident management systems, and more. You can and should set up multiple ways to get notified about job failures.
Additionally, to make sure no issues "slip through the cracks", in the Account Settings › Email Reports page you can configure BecklerNET Health Checks to send repeated email notifications every hour or every day as long as any of the jobs is down.
Classic cron implementations have a built-in method of notifying about cron job failures, the MAILTO variable:
MAILTO=email@example.org
8 6 * * * /home/me/backup.sh
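So why not just use that? There are several drawbacks. Cron sends an email when the job produces output, not specifically when it fails; sending the email requires a working mail transfer agent on the machine; and no email can be sent at all if the machine is down or the cron daemon is not running.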
If your cron job consistently pings BecklerNET Health Checks an hour early or an hour late, the likely cause is a timezone mismatch: your machine may be using a timezone different from what you have configured on BecklerNET Health Checks.
On modern GNU/Linux systems, you can look up the time zone by running the timedatectl status command and looking for "Time zone" in its output:
$ timedatectl status
Local time: C 2020-01-23 12:35:50 EET
Universal time: C 2020-01-23 10:35:50 UTC
RTC time: C 2020-01-23 10:35:50
Time zone: Europe/Riga (EET, +0200)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
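To compare the machine's time zone with the one configured for the check, the zone name can be pulled out of that output with a short helper. This is a sketch that assumes the "Time zone:" line has the format shown above; the function name system_timezone is hypothetical.

```shell
#!/bin/sh
# system_timezone (hypothetical helper)
# Prints only the zone name from the "Time zone:" line of timedatectl status.
system_timezone() {
    timedatectl status | sed -n 's/^ *Time zone: *\([^ ]*\).*/\1/p'
}
```

On the machine from the example above, calling system_timezone would print Europe/Riga.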
journalctl
On a systemd-based system, you can use the journalctl utility to see system logs, including logs from the cron daemon.
To see live logs:
journalctl -f
To see the logs from e.g. the last hour, and only from the cron daemon:
journalctl --since "1 hour ago" -t CRON
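To narrow the output down to a single job, the cron entries can be filtered by the job's command. The sketch below assumes the cron daemon logs under the CRON identifier, as in the command above (some distributions use a different identifier), and the function name cron_log_for is hypothetical.

```shell
#!/bin/sh
# cron_log_for PATTERN [SINCE] (hypothetical helper)
# Prints cron log lines that contain PATTERN as a fixed string,
# looking back SINCE (default: "1 day ago").
cron_log_for() {
    journalctl --no-pager --since "${2:-1 day ago}" -t CRON | grep -F -e "$1"
}
```

For example, cron_log_for backup.sh "1 week ago" would list the past week's cron entries that mention backup.sh.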