[Linux] Alarm remote agent with telegram

Hello everybody,

today we're going to look at how to implement a few useful agents that monitor the status of your machines and send alerts through Telegram.

Let’s start!

Machine alive

First of all, is the machine alive? We'll use ping to find out:

function check_alive(){
        # Send a single echo request; ping exits with 0 only if a reply arrives
        ping -c 1 "${machine}"
        alive=$?
}

If ping returns a non-zero exit code, the machine is unreachable.
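
As a variation, the check can be wrapped so that it fails fast on a dead host and reports the result directly as its exit status. This is only a sketch: it assumes the Linux iputils ping, where -W sets the reply timeout in seconds, and is_alive is a hypothetical helper name that is not part of the original script.

```shell
# Hypothetical helper: succeeds (exit status 0) only if the host answers one ping.
is_alive() {
    # -c 1: send a single packet; -W 2: wait at most 2 seconds for the reply
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Usage (commented out because it needs a reachable network):
# if is_alive host1; then echo "host1 is up"; else echo "host1 is down"; fi
```

Returning the status from the function avoids a global variable, at the cost of differing slightly from the style used in the rest of the script.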

Monitor Filesystem

One of the most important things to monitor is the state of the filesystems, so let's see how to implement it:

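function check_fs(){
        ret_fs=0
        # List the remote filesystems and keep the rows whose usage (column 5) is at or above the threshold
        fs_info=$(ssh -n "${user}@${machine}" "df -Ph" | awk "0 + \$5 >= ${busy_fs} { print }")
        echo "fs_info: ${fs_info}"
        if [[ -n ${fs_info} ]]
        then
                ret_fs=1
        fi
}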

This function reads the status of the remote filesystems and keeps every one whose usage is at or above ${busy_fs}, a threshold that the script derives from the configuration file. It then sets ret_fs as the alert state: 0 is OK, 1 is alarm.
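
To see what the awk filter does without touching a remote machine, it can be run on canned df output. The sketch below is illustrative only: filter_full_fs is a hypothetical helper, and the sample rows are made-up values, not output from a real host.

```shell
# Hypothetical helper: read df output on stdin and print "mount usage"
# for every filesystem whose usage is at or above the threshold in $1.
filter_full_fs() {
    # NR > 1 skips the header; "$5 + 0" turns "90%" into the number 90
    awk -v limit="$1" 'NR > 1 && $5 + 0 >= limit + 0 { print $6, $5 }'
}

# Made-up sample in the layout printed by df -Ph
sample_df='Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   45G  5.0G  90% /
/dev/sdb1       200G   40G  160G  20% /data
tmpfs           7.8G  7.0G  800M  90% /dev/shm'

printf '%s\n' "$sample_df" | filter_full_fs 80
# prints:
# / 90%
# /dev/shm 90%
```

Passing the threshold with -v keeps the awk program in single quotes, so no dollar sign needs escaping.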

Monitor CPU

Another thing you'll want to monitor is the CPU: activity near 100% causes problems such as overheating and a sluggish machine. We monitor the CPU this way:

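function check_cpu(){
        ret_cpu=0
        # The first vmstat report is an average since boot, so take a second sample one second later
        cpu_info=$(ssh -n "${user}@${machine}" "vmstat 1 2 | tail -1")
        # Column 15 of vmstat is "id", the idle CPU percentage
        free_cpu=$(echo ${cpu_info} | awk '{print $15}')
        if [[ ${free_cpu_perc} -gt ${free_cpu} ]]
        then
                ret_cpu=1
        fi
}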

The idea is the same as in the previous function: ret_cpu is set to 1 when the idle CPU percentage falls below the free_cpu_perc threshold, and stays 0 otherwise.
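
The comparison can be tried locally on a canned vmstat line. This is a minimal sketch: cpu_below_threshold is a hypothetical helper and the sample numbers are invented. It relies on the usual Linux vmstat layout, where column 15 is the idle percentage.

```shell
# Made-up last line of vmstat output; the 15th field (90) is the idle CPU percentage.
sample_line=' 1  0      0 812340  10240 402112    0    0     5     9  120  250  7  3 90  0  0'

# Hypothetical helper: succeeds when the idle percentage in line $1
# is lower than the minimum acceptable idle percentage $2.
cpu_below_threshold() {
    idle=$(printf '%s\n' "$1" | awk '{ print $15 }')
    [ "$idle" -lt "$2" ]
}

if cpu_below_threshold "$sample_line" 95; then
    echo "alarm"   # 90% idle is below the 95% threshold
else
    echo "ok"
fi
```

With a threshold of 5, as in the sample configuration, the same line would be reported as "ok".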

Monitor RAM

The last thing you usually want to monitor is the RAM. Here is the function:

function check_ram(){
        ret_ram=0
        # Fetch the MemTotal and MemAvailable lines (values in kB) from the remote machine
        mem_info=$(ssh -n "${user}@${machine}" "grep -E 'MemTotal|MemAvailable' /proc/meminfo")
        # The unquoted echo joins the two lines, so fields 2 and 5 are the two values
        total_mem=$(echo ${mem_info} | awk '{print $2}')
        free_mem=$(echo ${mem_info} | awk '{print $5}')
        free_perc=$((100*free_mem/total_mem))
        if [[ ${free_ram_perc} -gt ${free_perc} ]]
        then
                ret_ram=1
        fi
}

Again the idea is the same: ret_ram is set to 1 when the available RAM percentage falls below the free_ram_perc threshold, and stays 0 otherwise.
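
The percentage calculation can be checked on a canned /proc/meminfo excerpt. The sketch below is one possible variant, not the original method: free_ram_percent is a hypothetical helper that picks the fields by name instead of by position, and the sample values are invented.

```shell
# Made-up excerpt of /proc/meminfo (values in kB)
sample_meminfo='MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    4096000 kB'

# Hypothetical helper: read meminfo on stdin and print the available RAM
# as an integer percentage of the total.
free_ram_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", 100 * avail / total }'
}

printf '%s\n' "$sample_meminfo" | free_ram_percent
# prints 25, because 100 * 4096000 / 16384000 = 25
```

Matching on the field names means the result does not depend on the order in which the lines come back.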

Create alert and send it

Now we want to send an alert report so that the problems can be fixed. We define two functions: one that creates the report and one that sends it. Here is the first one:

function report(){
        if [[ ${ret_ram} -eq 1 ]]
        then
                echo "Free ram percentage of ${machine} is: ${free_perc}" >>"${basedir}/log.txt"
        fi
        if [[ ${ret_cpu} -eq 1 ]]
        then
                echo "Free cpu percentage of ${machine} is: ${free_cpu}" >>"${basedir}/log.txt"
        fi
        if [[ ${ret_fs} -eq 1 ]]
        then
                echo "Full filesystems of ${machine} are: ${fs_info}" >>"${basedir}/log.txt"
        fi
}

This function uses the return codes set by the previous checks to build the log that will be sent through Telegram. The second function sends it:

function send_alert(){
        # Send only if the log is not empty
        if [[ -s ${basedir}/log.txt ]]
        then
                token=$(cat "${basedir}/token.txt")
                message=$(cat "${basedir}/log.txt")
                url="https://api.telegram.org/bot${token}/sendMessage"
                while read -r id
                do
                        # Quote the arguments and URL-encode the text so that spaces and newlines survive
                        curl -s -X POST "${url}" -d "chat_id=${id}" --data-urlencode "text=${message}"
                done < "${basedir}/dist_list.txt"
        fi
}

This function reads the report created by the previous function and sends it through a Telegram bot to every chat id listed in dist_list.txt.
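
One detail worth spelling out: the report contains spaces, so it has to reach curl as a single argument. The small demonstration below uses a hypothetical count_args helper to show how the shell splits an unquoted variable into several arguments, while a quoted one stays whole.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

message='Free ram percentage of host1 is: 3'

count_args text=$message      # unquoted: the shell splits on spaces, prints 7
count_args "text=$message"    # quoted: one single argument, prints 1
```

The same rule applies to the text parameter given to curl: unquoted, only the first word would belong to the parameter and the remaining words would be read as extra arguments.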

Variable parser

We have defined a lot of functions, and they all rely on several variables, so here is the parser that sets them:

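function get_variables(){
        basedir=$(dirname "$0")
        # Each line has the form: machine user free_ram% free_cpu% free_fs%
        machine=$(echo "$line" | awk '{print $1}')
        user=$(echo "$line" | awk '{print $2}')
        free_ram_perc=$(echo "$line" | awk '{print $3}')
        free_cpu_perc=$(echo "$line" | awk '{print $4}')
        free_fs=$(echo "$line" | awk '{print $5}')
        # The filesystem check works on usage, so convert free space into used space
        busy_fs=$((100 - free_fs))
}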

This parser reads one line of the configuration and extracts each parameter from it. The next function checks that all the values are meaningful:

function check_variables(){
        check_var=0
        if [[ ${free_ram_perc} -lt 0 ]] || [[ ${free_ram_perc} -gt 100 ]]
        then
                echo "Ram percentage value is not valid"
                check_var=1
        fi
        if [[ ${free_cpu_perc} -lt 0 ]] || [[ ${free_cpu_perc} -gt 100 ]]
        then
                echo "Cpu percentage value is not valid"
                check_var=1
        fi
        if [[ ${busy_fs} -lt 0 ]] || [[ ${busy_fs} -gt 100 ]]
        then
                echo "Free fs value is not valid"
                check_var=1
        fi
}
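
The range checks above assume that every field is an integer, so a non-numeric value may slip through or cause an error. A stricter test could look like the sketch below, where is_percentage is a hypothetical helper that is not part of the original script.

```shell
# Hypothetical helper: succeeds only if $1 is an integer between 0 and 100.
is_percentage() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a character that is not a digit
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 100 ]
}

for value in 20 100 101 abc; do
    if is_percentage "$value"; then
        echo "$value is valid"
    else
        echo "$value is not valid"
    fi
done
```

The loop reports 20 and 100 as valid, and 101 and abc as not valid.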

Putting it all together: Main

Now we can define the main part, putting all the functions together:

# basedir must be known before the loop opens hostname.txt
basedir=$(dirname "$0")
while read -r line
do
        echo "$line"
        get_variables
        check_variables
        if [[ ${check_var} -eq 0 ]]
        then
                # Start each host with an empty log and clean alert flags
                >"${basedir}/log.txt"
                ret_ram=0; ret_cpu=0; ret_fs=0
                check_alive
                echo "$alive"
                if [[ ${alive} -eq 0 ]]
                then
                        check_ram
                        check_cpu
                        check_fs
                else
                        echo "Machine ${machine} doesn't respond to ping" >"${basedir}/log.txt"
                fi
                report
                send_alert
        fi
        echo "check of the machine ${machine} terminated"
done < "${basedir}/hostname.txt"

Configuration file

Now we need to create the configuration files:

$ vi hostname.txt
host1 Vito 5 5 20
host2 Vito 10 10 50
$ vi token.txt
aaaabbbbbcccc
$ vi dist_list.txt
12334
45566

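The first file lists all the machines to monitor, one per line, with this syntax: machine user free_ram% free_cpu% free_fs%. For example, the line host1 Vito 5 5 20 raises an alarm when less than 5% of the RAM is available, when the CPU is less than 5% idle, or when a filesystem is at least 80% full.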
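
A line in this format can also be split into its fields in one step with the shell's read builtin. The sketch below is an alternative to the awk calls in the parser, shown on the first sample line. The variable names match the ones used in the script.

```shell
line='host1 Vito 5 5 20'

# read splits the line on whitespace and assigns one field per variable
read -r machine user free_ram_perc free_cpu_perc free_fs <<EOF
$line
EOF

# The filesystem check works on usage, so convert free space into used space
busy_fs=$((100 - free_fs))

echo "machine=$machine user=$user ram=$free_ram_perc cpu=$free_cpu_perc busy_fs=$busy_fs"
# prints: machine=host1 user=Vito ram=5 cpu=5 busy_fs=80
```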

The second file holds the token of the Telegram bot. To generate the token, follow the official documentation.

The third file lists the Telegram ids of the operators who want to be alerted, one per line. You can find your id through @userinfobot.

SSH Passwordless

Lastly, you need to set up passwordless (key-based) SSH access from the monitoring machine to every machine that needs to be monitored.
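
One common way to do this is with the standard OpenSSH tools ssh-keygen and ssh-copy-id. The function below is only a sketch under assumptions: setup_passwordless_ssh is a hypothetical name, the key type and path are a choice made for this example, and the key is created without a passphrase so that the script can run unattended.

```shell
# Hypothetical helper: $1 = remote user, $2 = remote host
setup_passwordless_ssh() {
    key="$HOME/.ssh/id_ed25519"
    mkdir -p "$HOME/.ssh"
    # Create a key pair only if one does not exist yet (empty passphrase, so no prompt later)
    [ -f "$key" ] || ssh-keygen -t ed25519 -N "" -f "$key"
    # Install the public key on the remote host; this asks for the password one last time
    ssh-copy-id -i "$key.pub" "$1@$2"
}

# Usage (commented out because it needs a reachable host):
# setup_passwordless_ssh Vito host1
```

After this, ssh Vito@host1 should log in without asking for a password, which is what the check functions rely on.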

Git Project

I’ve published the project of the bot here: https://github.com/crujiff/alarmbot

Feel free to share or improve it.

Thanks for reading, I hope you find this useful!

Regards
