# Monitoring

By default, all events sent to the bus are routed to the Rsb-Service-E2emon service. This is how the bus counts them and logs the last 100 events emitted by each node for debugging purposes.

You can visualise the number of events emitted by nodes over the last 30 days and be notified on Slack when thresholds are reached or when events fail to reach one or more destinations. The bus can also monitor custom metrics that you send to it for this purpose.

Both the number of events emitted and the events themselves can be sent to your Datadog account for longer retention and additional functionality.

## Number of events emitted

### Bus admin

Select a node that emits events in the list of nodes, then click on the menu Events and metrics. The line chart shows the number of events sent per event name over up to 30 days.

To create monitoring rules, click on the Edit Monitoring Rules button above the chart. You can set absolute bounds, relative bounds, or both. You will be alerted on Slack if the number of events sent in a day falls below or rises above a threshold. Relative bound rules compare the number of events sent yesterday with the number sent the day before.

Alerts are sent once a day at 9:00 AM (GMT+2) to the Slack channel of your choice. To define the Slack workspace and channel, go to the Event Destinations page then click on Rsb-Service-E2emon. In the Configuration data section, in the Data appended to events code area, insert the Slack API token and channel name as per the example below:

{
  "monitoring": {
    "credentials": {
      "slack_channel": "#slack-channel-name",
      "slack_token": "..."
    },
    "rules": {...}
  }
}

### Datadog integration

To send the number of events emitted to your Datadog account, access the Event Destinations page then click on Rsb-Service-E2emon. In the Configuration data section, in the Data appended to events code area, insert your Datadog credentials as per the example below. Tags are optional.

{
  "datadog": {
    "credentials": {
      "api_key": "examplekey",
      "app_key": "examplekey"
    },
    "tags": {
        "optional_tag_key_1": "tag_value_1",
        "optional_tag_key_2": "tag_value_2"
    }
  }
}

To generate Datadog API and application keys, you can follow the Datadog documentation.

By default, the node id and event name are sent. You can define additional tags as per the example above.

Note: If you are monitoring your events using both the bus admin and Datadog, the custom configurations code area should include both the datadog and monitoring objects.
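When both integrations are enabled, the two objects sit side by side in one JSON document. The Python sketch below assembles such a document. The helper name `build_e2emon_config` and its parameters are hypothetical and only illustrate the shape described above; they are not part of any bus client.

```python
import json


def build_e2emon_config(slack_channel, slack_token, api_key, app_key,
                        rules=None, tags=None):
    """Hypothetical helper: combine the monitoring and datadog objects."""
    config = {
        "monitoring": {
            "credentials": {
                "slack_channel": slack_channel,
                "slack_token": slack_token,
            },
        },
        "datadog": {
            "credentials": {"api_key": api_key, "app_key": app_key},
        },
    }
    if rules:  # monitoring rules are only added when provided
        config["monitoring"]["rules"] = rules
    if tags:  # Datadog tags are optional
        config["datadog"]["tags"] = dict(tags)
    return config


# Placeholder values only; replace them with your own credentials.
example = build_e2emon_config(
    "#slack-channel-name", "...", "examplekey", "examplekey",
    tags={"optional_tag_key_1": "tag_value_1"},
)
print(json.dumps(example, indent=2))
```

Generating the document with `json.dumps` also guards against syntax slips such as trailing commas, which strict JSON parsers reject.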

## Event logs

### Bus admin

The last 100 events emitted by a node are visible in the bus admin for debugging purposes.

Select a node that emits events in the list of nodes, then click on the menu Events and metrics. In the Recent events emitted section, you can see the last 100 events as they were received by the Rsb-Service-E2emon service.

### Datadog integration

To log events in your Datadog account, access the Event Destinations page.

  1. Add Rsb-Service-Event-Logging as an event destination, along with the name of the events you wish to log. You can use the wildcard * to log all events.

  2. In the Configuration data section, in the Stored data code area, insert your Datadog credentials as per the example below. The tags and alert_type fields are optional.

{
  "datadog": {
    "credentials": {
      "api_key": "examplekey",
      "app_key": "examplekey"
    },
    "tags": {
      "optional_tag_key_1": "tag_value_1",
      "optional_tag_key_2": "tag_value_2"
    },
    "alert_type": "info"
  }
}

To generate Datadog API and application keys, you can follow the Datadog documentation.

The service uses the Datadog Events API. The API is mapped to the event and config fields as follows:

| Datadog field | Source | Key |
| --- | --- | --- |
| alert_type | config | alert_type |
| tags | config | tags |
| aggregation_key | event | from |
| date_happened | event | created_at |
| text | event | . |
| title | event | event |
The text field contains the body of the event.
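The mapping can be sketched in Python as follows. This is an illustration of the documented field mapping, not the service's actual code. The function name `to_datadog_event` is hypothetical, and three conversions are assumptions: the event body is serialised as JSON for `text`, `created_at` is converted to a POSIX timestamp (the format the Datadog Events API uses for `date_happened`), and the tags object is flattened into `key:value` strings.

```python
import json
from datetime import datetime, timezone


def to_datadog_event(event, config):
    """Sketch of the documented field mapping (hypothetical helper)."""
    datadog_cfg = config.get("datadog", {})
    # Accept a trailing "Z" on Python versions older than 3.11.
    created_at = datetime.fromisoformat(event["created_at"].replace("Z", "+00:00"))
    if created_at.tzinfo is None:  # assumption: naive timestamps are UTC
        created_at = created_at.replace(tzinfo=timezone.utc)
    body = {
        "title": event["event"],
        "text": json.dumps(event),  # assumption: whole event, JSON-encoded
        "aggregation_key": event["from"],
        "date_happened": int(created_at.timestamp()),
    }
    if "alert_type" in datadog_cfg:  # optional in the configuration
        body["alert_type"] = datadog_cfg["alert_type"]
    if "tags" in datadog_cfg:  # optional in the configuration
        body["tags"] = [f"{k}:{v}" for k, v in datadog_cfg["tags"].items()]
    return body


# Illustrative event and configuration, using placeholder values.
sample_event = {
    "event": "OrderCreated",
    "from": "04f9f5d9-d84c-4636-acae-2067eee4d81f",
    "created_at": "2021-06-25T18:25:43.511Z",
    "payload": {"id": 1},
}
sample_config = {
    "datadog": {
        "tags": {"optional_tag_key_1": "tag_value_1"},
        "alert_type": "info",
    }
}
datadog_body = to_datadog_event(sample_event, sample_config)
print(datadog_body["title"], datadog_body["tags"])
```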

## Failed events

Events permanently rejected by one or more receiving nodes are stored in the sending node's failed queue.
The Failed Events page gives an overview of the nodes they failed to reach and why. You can view the failed events' details and requeue them individually.
On the Events and Metrics page, in the Queues section, you can see the number of events in the failed queue, requeue all events or purge the queue.

To be notified when an event fails to reach one or more destinations permanently, go to the Event Destinations page then click on Rsb-Service-E2emon. In the Configuration data section, in the Data appended to events code area (which may already contain information related to your monitoring alerts), set the "send_failed_event_notifications" flag to true and ensure that a Slack channel and token are defined as per the example below:

{
  "monitoring": {
    "credentials": {
      "slack_channel": "#slack-channel-name",
      "slack_token": "..."
    },
    "send_failed_event_notifications": true
  }
}
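Before pasting the configuration, it can help to check that the flag and the Slack credentials are all present. The Python sketch below does that for a configuration loaded as a dictionary. The function name `can_notify_failed_events` is hypothetical, and the check reflects only the requirements stated above, not any validation the bus itself performs.

```python
def can_notify_failed_events(config):
    """Return True when the flag is set and Slack credentials are defined."""
    monitoring = config.get("monitoring", {})
    credentials = monitoring.get("credentials", {})
    return (
        monitoring.get("send_failed_event_notifications") is True
        and bool(credentials.get("slack_channel"))
        and bool(credentials.get("slack_token"))
    )


# Placeholder configuration mirroring the example above.
failed_events_config = {
    "monitoring": {
        "credentials": {
            "slack_channel": "#slack-channel-name",
            "slack_token": "...",
        },
        "send_failed_event_notifications": True,
    }
}
print(can_notify_failed_events(failed_events_config))
```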

## Custom metrics

### Send metrics

To monitor custom metrics, send a MetricEvent with one or more metrics in the payload, at least once a day, following the example below:

{
    "events": ["MetricEvent"],
    "from": "04f9f5d9-d84c-4636-acae-2067eee4d81f",
    "reference": "123456",
    "created_at": "2021-06-25T18:25:43.511Z",
    "payload": [
        {
            "name": "SmsServiceCost",
            "venture": "04f9f5d9-d84c-4636-acae-2067eee4d81f",
            "value": 102.50,
            "type": "COUNT",
            "timestamp": "2021-06-25T18:25:43.511Z",
            "range": "",
            "tags": ["sms"]
        },
        {
            "name": "Revenue",
            "venture": "04f9f5d9-d84c-4636-acae-2067eee4d81f",
            "value": 1499.99,
            "type": "COUNT",
            "timestamp": "2021-06-25T18:25:43.511Z",
            "range": "",
            "tags": ["aws"]
        }
    ],
    "version": "2.0.0"
}

At the moment, COUNT is the only type available. If you send the same metric more than once in a day, its values are summed. The range and tags fields are not processed yet.
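The summing behaviour can be illustrated with a short Python sketch that totals COUNT metrics per name and day across several MetricEvent payloads. The function name `daily_totals` is hypothetical, and grouping by the date portion of each metric's `timestamp` is an assumption; the day boundaries the bus actually uses are not specified here.

```python
from collections import defaultdict


def daily_totals(metric_events):
    """Sum COUNT metric values per (name, day) across MetricEvents."""
    totals = defaultdict(float)
    for event in metric_events:
        for metric in event["payload"]:
            if metric["type"] != "COUNT":  # only COUNT is available
                continue
            day = metric["timestamp"][:10]  # assumption: date part of the ISO timestamp
            totals[(metric["name"], day)] += metric["value"]
    return dict(totals)


# Two events carrying the same metric on the same day (illustrative values).
morning = {"payload": [{"name": "SmsServiceCost", "value": 102.5, "type": "COUNT",
                        "timestamp": "2021-06-25T08:00:00.000Z"}]}
evening = {"payload": [{"name": "SmsServiceCost", "value": 50.0, "type": "COUNT",
                        "timestamp": "2021-06-25T18:25:43.511Z"}]}
totals = daily_totals([morning, evening])
print(totals)
```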

A trigger in the global configuration routes these events to the end-to-end monitoring service. You can then use the metric names to create rules.

### Set up alert thresholds for metrics

Go to the Event Destinations page then click on Rsb-Service-E2emon. In the Configuration data section, in the Data appended to events code area, insert Slack credentials and rules tailored to your metrics as per the example below:

{
  "monitoring": {
    "credentials": {
      "slack_channel": "#slack-channel-name",
      "slack_token": "..."
    },
    "rules": {
      "SmsServiceCost": {
        "absolute_min": 10,
        "absolute_max": 150,
        "lower_bound": 0.5,
        "upper_bound": 2
      },
      "Revenue": {
        "absolute_min": 500,
        "absolute_max": 2500,
        "lower_bound": 0.5,
        "upper_bound": 2
      }
    }
  }
}

In the credentials object, indicate the Slack channel where the alert notifications must be sent as well as the Slack token for the corresponding Slack workspace.

In the rules object, add one rule per metric. Each rule combines absolute bounds, which apply to the value for a single day, with relative bounds, which compare that day with the previous day. You need to set both absolute and relative bounds. Relative bounds are ratios written as decimals (e.g. 0.5 = 50%, 2 = 200%), and absolute bounds are decimal numbers as well.

In the example, you will be alerted in the cases below:

  • The SmsServiceCost yesterday was less than 10 or more than 150,
  • The SmsServiceCost yesterday was less than half or more than double the SmsServiceCost the day before,
  • The Revenue yesterday was less than 500 or more than 2500,
  • The Revenue yesterday was less than half or more than double the Revenue the day before.
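The alert conditions listed above can be sketched in Python as follows. This is a minimal illustration of how the bounds read, not the monitoring service's implementation. The function name `check_rules` is hypothetical, and two choices are assumptions: a metric with no value yesterday counts as 0, and the relative check is skipped when the day before has no positive value.

```python
def check_rules(rules, yesterday, day_before):
    """Return alert messages for metrics that break their bounds."""
    alerts = []
    for name, rule in rules.items():
        value = yesterday.get(name, 0)  # assumption: missing metric counts as 0
        previous = day_before.get(name, 0)
        if value < rule["absolute_min"]:
            alerts.append(f"{name}: {value} is below absolute_min")
        if value > rule["absolute_max"]:
            alerts.append(f"{name}: {value} is above absolute_max")
        if previous > 0:  # assumption: skip the relative check without a baseline
            ratio = value / previous
            if ratio < rule["lower_bound"]:
                alerts.append(f"{name}: ratio {ratio:.2f} is below lower_bound")
            if ratio > rule["upper_bound"]:
                alerts.append(f"{name}: ratio {ratio:.2f} is above upper_bound")
    return alerts


# Rules copied from the example above; the daily values are illustrative.
rules = {
    "SmsServiceCost": {"absolute_min": 10, "absolute_max": 150,
                       "lower_bound": 0.5, "upper_bound": 2},
    "Revenue": {"absolute_min": 500, "absolute_max": 2500,
                "lower_bound": 0.5, "upper_bound": 2},
}
alerts = check_rules(
    rules,
    yesterday={"SmsServiceCost": 160, "Revenue": 1000},
    day_before={"SmsServiceCost": 100, "Revenue": 2500},
)
for message in alerts:
    print(message)
```

With these illustrative values, SmsServiceCost breaks its absolute maximum (160 is more than 150) and Revenue breaks its lower relative bound (1000 is less than half of 2500).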