News
Apr 16

Fail Fast, Succeed Faster

Failures in automated IT processes are unfortunately inevitable: failing VPN tunnels, servers in planned or emergency maintenance, timeouts caused by heavy traffic, changed APIs, and so on.

But failing is not the real problem. The problem is not detecting such failures at all, or detecting them too late!

That is why Lomnido offers several concepts for handling failures, both expected and unexpected, whether they occur on remote systems or within Lomnido Workflows.

Here are some of the key features.

Integrate into your enterprise monitoring

Lomnido is not a black box. Of course, you can integrate the Lomnido appliance into your monitoring, whether it is Nagios, Opsview, Zabbix, …

Everything described in this article can be monitored from an external monitoring tool. Connector Status, Alerthandler, …: everything is accessible via REST web services or SNMP traps.
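As a sketch of how an external check could consume such a REST status, the following Python script maps a Connector status to the Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The endpoint URL, the JSON fields `name`, `running` and `status`, and the function names are hypothetical assumptions, not Lomnido's documented API.

```python
import json
import urllib.request

# Standard Nagios plugin exit codes.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def evaluate(connector: dict) -> tuple[int, str]:
    """Map a Connector status document to (exit code, status line).

    ASSUMPTION: hypothetical fields 'name', 'running' (bool) and
    'status' ('OK' | 'WARNING' | 'CRITICAL').
    """
    name = connector.get("name", "?")
    status = str(connector.get("status", "UNKNOWN")).upper()
    if status not in NAGIOS_CODES:
        status = "UNKNOWN"
    # Design choice of this sketch: a stopped Connector is always CRITICAL.
    if connector.get("running") is False:
        return NAGIOS_CODES["CRITICAL"], f"CRITICAL - connector {name} is stopped"
    return NAGIOS_CODES[status], f"{status} - connector {name}"


def fetch(url: str, timeout: float = 10.0) -> dict:
    """GET the status document from a (hypothetical) REST endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def main(url: str) -> int:
    """Print one status line and return the Nagios exit code."""
    try:
        code, line = evaluate(fetch(url))
    except (OSError, ValueError) as exc:  # network errors, invalid URL, bad JSON
        code, line = NAGIOS_CODES["UNKNOWN"], f"UNKNOWN - {exc}"
    print(line)
    return code
```

Wrapped as `sys.exit(main(url))`, this could serve as a plugin for Nagios or Nagios-compatible tools such as Opsview. Treating a stopped Connector as CRITICAL is a choice made for this sketch, not a documented Lomnido rule.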

Connector Status

A Connector in a Lomnido Workflow is the technical component that fetches messages from, or pushes messages to, a connected system. Lomnido provides HTTPS, (S)FTP, SQL, File Share, … and many more Connectors.

Running or Stopped

A Connector is either running or stopped. Running means that the Connector is waiting for messages to be pushed, or waiting for its next pull request to be executed. A Connector can be started or stopped manually, or it is stopped automatically due to errors.

Status OK/WARNING/CRITICAL

A Connector also has a status of OK, WARNING or CRITICAL. This status is set when a pull request fails or when pushing a message fails. This is what the Error Handler settings of a Connector look like:

So, after two unsuccessful attempts the Connector changes to status WARNING, and after five to status CRITICAL. Once the Connector reaches CRITICAL, it stops.
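The threshold logic of this example can be sketched in a few lines of Python. This is a minimal model under stated assumptions: failures are counted consecutively, a successful attempt resets the counter, and a manual start resets the status. `ConnectorState` and its members are hypothetical names, not Lomnido internals.

```python
from dataclasses import dataclass


@dataclass
class ConnectorState:
    """Sketch of the Error Handler thresholds from the example (2 and 5).

    ASSUMPTIONS: consecutive failures are counted, a success resets the
    counter, and a manual start() resets the status to OK.
    """
    warning_after: int = 2
    critical_after: int = 5
    failures: int = 0
    status: str = "OK"
    running: bool = True

    def record(self, success: bool) -> None:
        """Record the outcome of one pull or push attempt."""
        if not self.running:
            return  # a stopped Connector makes no further attempts
        if success:
            self.failures = 0
            self.status = "OK"
            return
        self.failures += 1
        if self.failures >= self.critical_after:
            self.status = "CRITICAL"
            self.running = False  # reaching CRITICAL stops the Connector
        elif self.failures >= self.warning_after:
            self.status = "WARNING"

    def start(self) -> None:
        """Manual start by an operator."""
        self.running = True
        self.failures = 0
        self.status = "OK"
```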

Message Queueing

Each outbound Connector has a FIFO queue for its messages. If a remote system is unavailable or does not respond as expected, all messages to this system are queued.

Combined with the Connector Status mechanism described above, this gives you very fast feedback when something is wrong.
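To illustrate the queueing behavior, here is a small Python sketch of an outbound FIFO queue. The `send` callable is a hypothetical stand-in for the actual transport. Stopping at the first failed delivery, so that later messages cannot overtake earlier ones, is an assumption about how ordering is preserved.

```python
from collections import deque
from typing import Callable


class OutboundQueue:
    """Sketch of an outbound Connector's FIFO queue (illustrative only).

    `send` is a hypothetical callable returning True when the remote
    system accepted the message.
    """

    def __init__(self, send: Callable[[str], bool]) -> None:
        self._send = send
        self._queue: deque[str] = deque()

    def enqueue(self, message: str) -> None:
        self._queue.append(message)

    def flush(self) -> int:
        """Deliver queued messages in FIFO order; return how many were sent."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # remote unavailable: keep this and all later messages
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)
```

While the remote system is down, `len(queue)` keeps growing, which is exactly the kind of signal worth watching from the outside.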

Failing messages in the Workflow

When a message runs through a Lomnido Workflow, it can fail for various reasons: unexpected, missing or wrong input data, or simply implementation errors. Failed messages can be identified quickly in the Message List. A failed message has the flags VALID = FALSE and ACKNOWLEDGED = FALSE.

The ACKNOWLEDGED flag is meant for operators, who can set it to “true”. We call this action “acknowledging a message”: it means that an operator has seen the error and (hopefully) taken action to prevent it in the future.
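The two flags can be modeled as in the following Python sketch, which filters the failed messages that still need an operator's attention. The `Message` class and the helper functions are hypothetical; only the flag semantics come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical Message List entry carrying the two flags."""
    msg_id: int
    valid: bool          # VALID = FALSE marks a failed message
    acknowledged: bool   # set to True by an operator


def failed_unacknowledged(messages: list[Message]) -> list[Message]:
    """Failed messages that no operator has acknowledged yet."""
    return [m for m in messages if not m.valid and not m.acknowledged]


def acknowledge(message: Message) -> None:
    """Operator action: acknowledge a message."""
    message.acknowledged = True
```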

Alerthandler

An Alerthandler is an object that notifies you when something happens. To do so, you configure a trigger for one of several event types. There are currently 21 trigger options available; rather than going into detail about all of them, let’s focus on the most important ones.

As you can see in the screenshot of the Alerthandler configuration page, you can configure at a fine-grained level in which environment and in which scope this Alerthandler should react. The scope depends on the trigger type and can, for example, be the name of a Connector (wildcards allowed).
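Scope matching could work roughly like the following Python sketch, which uses the shell-style wildcards (`*`, `?`) of the standard `fnmatch` module. Whether Lomnido's wildcard syntax behaves exactly like `fnmatch`, and the environment and Connector names in the example, are assumptions.

```python
from fnmatch import fnmatchcase


def alert_applies(environment: str, connector: str,
                  alert_environments: set[str], alert_scope: str) -> bool:
    """Decide whether an Alerthandler reacts to an event of a Connector.

    ASSUMPTIONS: environments are matched exactly; the scope is a
    case-sensitive Connector name pattern with shell-style wildcards.
    """
    return environment in alert_environments and fnmatchcase(connector, alert_scope)
```

For example, a scope of `sftp-*` restricted to a hypothetical `PROD` environment would cover every SFTP Connector in production but ignore the same Connectors in a test environment.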

Connector stopped
Raise an alarm when a connector is stopped.

Connector WARNING
When a connector enters the WARNING status.

Connector CRITICAL
When a connector enters the CRITICAL status.

Unacknowledged Messages
If more than X (configurable) messages have the flag ACKNOWLEDGED = FALSE.

Message Count
If a Connector holds more than X (configurable) messages in its FIFO queue. This can have various causes, but being informed about such a problem is the first step toward finding a solution.
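As a sketch of how these five triggers relate to Connector state, the following Python function compares two consecutive snapshots of a Connector, so that "stopped", WARNING and CRITICAL fire when the state is entered rather than on every check. The snapshot fields and the threshold parameters (standing in for the configurable X values) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConnectorSnapshot:
    """Hypothetical view of one Connector at a point in time."""
    name: str
    running: bool
    status: str   # "OK" | "WARNING" | "CRITICAL"
    queued: int   # messages waiting in the FIFO queue


def fired_triggers(prev: ConnectorSnapshot, cur: ConnectorSnapshot,
                   unacknowledged: int, max_unacknowledged: int,
                   max_queued: int) -> list[str]:
    """Return the triggers that fire between two consecutive checks.

    Status triggers fire on transitions ("is stopped", "enters"); the
    count triggers fire whenever the value is more than its threshold X.
    """
    fired = []
    if prev.running and not cur.running:
        fired.append("Connector stopped")
    if cur.status != prev.status and cur.status in ("WARNING", "CRITICAL"):
        fired.append(f"Connector {cur.status}")
    if unacknowledged > max_unacknowledged:
        fired.append("Unacknowledged Messages")
    if cur.queued > max_queued:
        fired.append("Message Count")
    return fired
```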

Alerthandler for Connected Systems

Lomnido can monitor not only itself and the messages coming into its Workflows, but also the connected systems. For this purpose, there are additional triggers:

LAST OK
If you know that messages are normally sent to Lomnido at regular intervals, you can define an Alerthandler that raises an alert if the last valid message is older than XX minutes. You can also define service times (e.g. Mon-Fri 08-16) so that no false positives are raised on weekends or public holidays.

So, if the remote system lacks a reliable error-reporting mechanism, this is a good alternative for detecting problems in the network or on the remote client.
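A LAST OK check with service times might look like the following Python sketch. The defaults mirror the Mon-Fri 08-16 example. One design choice is an assumption of this sketch, not documented Lomnido behavior: the age is measured from the later of the last valid message and the start of today's service window, so a quiet weekend does not raise an alert right at Monday 08:00.

```python
from datetime import date, datetime, time, timedelta


def last_ok_alert(now: datetime, last_valid: datetime, max_age: timedelta,
                  service_days: frozenset[int] = frozenset(range(5)),  # Mon=0 .. Fri=4
                  start: time = time(8), end: time = time(16),
                  holidays: frozenset[date] = frozenset()) -> bool:
    """Return True if a LAST OK alert should be raised at `now`.

    Outside service times (non-service days, holidays, before `start`,
    from `end` on) no alert is raised.
    """
    in_service = (now.weekday() in service_days
                  and now.date() not in holidays
                  and start <= now.time() < end)
    if not in_service:
        return False
    # ASSUMPTION: do not count time that elapsed outside today's window.
    window_start = datetime.combine(now.date(), start, tzinfo=now.tzinfo)
    reference = max(last_valid, window_start)
    return now - reference > max_age
```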

PORT OPEN
This checks whether a port on a remote server is accessible. Imagine that you have a ticket bridge to a remote server and must meet SLA times. If you only discover that the remote server is unavailable when you need to transfer a ticket, it is already too late. So, check the remote server regularly to detect errors in advance.
Of course, you could also add such a check to your monitoring system. But the Lomnido appliance often has access to network segments that your internal monitoring server cannot reach, because only the Lomnido appliance is allowed to connect. In such cases, this check is very useful.
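At its core, a PORT OPEN style check is a TCP connect with a timeout. This Python sketch uses only the standard `socket` module; the function name and the default timeout are illustrative assumptions. Note that a successful connect only proves that something accepts connections on the port, not that the service behind it works correctly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError)
        return False
```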

Summary

Lomnido is an all-in-one solution for connecting and synchronizing systems, and for getting informed when something behaves differently than expected. Connect Lomnido to your enterprise monitoring system and alert your operations team through the mechanisms they already know.

Interested in a live demo? Do you have a special use case to discuss? Get in touch!