# Monitoring

You can visualise the number of events emitted and received by nodes over the last 30 days. You can also receive Slack notifications when event counts cross the thresholds you set and when events fail.

Both the number of events emitted and the events themselves can be sent to your Datadog account for longer retention and additional functionality.

# Number of events emitted or received

# Bus admin

To view the number of events emitted or received by a node over the past 30 days, select the node from the list, then click on the Events and Metrics menu.

To receive Slack notifications for abnormally low or excessively high numbers of events emitted or received, first go to the Details and Settings page to turn notifications on and provide your Slack details. Then go to the Events and Metrics page and click the Edit Monitoring Rules button above the chart to set the thresholds.

Alerts are sent once a day at 9:00 AM GMT+2.
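To make the schedule and the threshold idea concrete, here is a minimal Python sketch. It assumes a fixed UTC+2 offset (no daylight-saving adjustment) and models a monitoring rule as a simple minimum/maximum pair; both helper names are hypothetical and this is not how the bus admin evaluates rules internally.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "GMT+2" is treated as a fixed UTC+2 offset, with no DST handling.
ALERT_TZ = timezone(timedelta(hours=2))
ALERT_HOUR = 9


def next_alert_time(now: datetime) -> datetime:
    """Return the next daily 9:00 AM GMT+2 send time strictly after `now` (aware datetime)."""
    local_now = now.astimezone(ALERT_TZ)
    candidate = local_now.replace(hour=ALERT_HOUR, minute=0, second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate


def check_thresholds(count: int, low: int, high: int) -> Optional[str]:
    """Hypothetical rule check: classify an event count against min/max thresholds."""
    if count < low:
        return "low"
    if count > high:
        return "high"
    return None
```

For example, at 06:30 UTC (08:30 GMT+2) the next alert is due at 07:00 UTC the same day; at exactly 07:00 UTC it rolls over to the following day.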

# Datadog integration

# Setup

To send the number of events emitted, as well as other metrics, to your Datadog account, go to the Event Destinations page and add Rsb-Service-E2emon as a destination. In the Configuration data section, in the Data appended to events code area, insert your Datadog credentials as in the example below. Tags are optional.

```json
{
  "datadog": {
    "credentials": {
      "api_key": "examplekey",
      "app_key": "examplekey"
    },
    "tags": {
      "optional_tag_key_1": "tag_value_1",
      "optional_tag_key_2": "tag_value_2"
    }
  }
}
```

To generate Datadog API and application keys, follow the Datadog documentation.
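Because the code area expects valid JSON (a trailing comma after the last tag, for instance, makes it invalid), it can be safer to generate the configuration than to type it by hand. The following is a minimal Python sketch; the function name is hypothetical and it only produces the text to paste, it is not part of the bus.

```python
import json
from typing import Dict, Optional


def build_datadog_config(api_key: str, app_key: str,
                         tags: Optional[Dict[str, str]] = None) -> str:
    """Build the JSON for the "Data appended to events" code area.

    Follows the structure of the documented example; `tags` is optional
    and is omitted entirely when empty.
    """
    if not api_key or not app_key:
        raise ValueError("both api_key and app_key are required")
    datadog: dict = {"credentials": {"api_key": api_key, "app_key": app_key}}
    if tags:
        datadog["tags"] = dict(tags)
    # json.dumps never emits trailing commas, so the output is always valid JSON.
    return json.dumps({"datadog": datadog}, indent=2)
```

Calling `build_datadog_config("examplekey", "examplekey", {"team": "payments"})` returns a string with the same shape as the example, ready to paste.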

# Metrics

Here is the list of metrics that you will receive in Datadog:

Success metrics:

  • rsb.event.publish_count: number of events emitted by the node and received by the bus.
  • rsb.event.route_count: number of events routed to receiving nodes*.
  • rsb.event.dispatch_count: number of events received by nodes*, i.e. the nodes responded.
  • rsb.event.acknowledged_count: number of events successfully acknowledged by receiving nodes*, i.e. the response code was 2xx.

Error metrics:

  • rsb.event.publish_error_count: number of events emitted and rejected by the bus.
  • rsb.event.dispatch_error_count: number of events routed but not received by nodes*, i.e. the nodes did not respond.
  • rsb.event.reject_count: number of events rejected by receiving nodes*, i.e. the response code is not 2xx.

*Counts are per receiving node: an event that is routed to, received by, acknowledged by, rejected by or not received by 3 nodes counts as 3.
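The per-node counting rule, and how the success and error metrics relate, can be illustrated with a small Python model. It is a sketch derived only from the definitions above, assuming each routed delivery ends in exactly one outcome (no response, a 2xx response, or a non-2xx response) and ignoring retries; it is not the bus's implementation. Under those assumptions, route_count = dispatch_count + dispatch_error_count, and dispatch_count = acknowledged_count + reject_count.

```python
from collections import Counter
from typing import Iterable, Optional


def count_deliveries(statuses: Iterable[Optional[int]]) -> Counter:
    """Tally per-node metrics for one routed event (illustrative model only).

    `statuses` holds one entry per receiving node: the HTTP status code the
    node returned, or None if it did not respond. Each node counts once, so
    an event routed to 3 nodes adds 3 to route_count.
    """
    counts: Counter = Counter()
    for status in statuses:
        counts["route_count"] += 1
        if status is None:
            counts["dispatch_error_count"] += 1  # routed but not received
            continue
        counts["dispatch_count"] += 1  # the node responded
        if 200 <= status < 300:
            counts["acknowledged_count"] += 1
        else:
            counts["reject_count"] += 1
    return counts
```

For one event routed to three nodes that answer 200, answer 500, and do not respond, this gives route_count 3, dispatch_count 2, dispatch_error_count 1, acknowledged_count 1 and reject_count 1.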

# Filters

In Datadog, you can filter metrics in the "from" field by:

  • rsb_env: the bus environment, production or staging
  • emitter_id: the emitting node id
  • receiver_id: the receiving node id, available on dispatch and acknowledged metrics
  • event_name: the name of the event
  • any custom tag defined in the setup
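The same filters can be applied programmatically. The following Python sketch builds a Datadog metric query scoped by these tags and prepares a request to Datadog's v1 metrics query endpoint (/api/v1/query), authenticated with the DD-API-KEY and DD-APPLICATION-KEY headers. The helper names are hypothetical, the api.datadoghq.com host assumes the US1 Datadog site, and the `sum:` / `.as_count()` form assumes the metrics are submitted as counts. The request is only built, not sent.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: US1 site; other Datadog sites use a different host.
DATADOG_QUERY_URL = "https://api.datadoghq.com/api/v1/query"


def build_metric_query(metric: str, filters: Dict[str, str]) -> str:
    """Build a metric query scoped by tags such as rsb_env or emitter_id."""
    # Sorted so the same filters always produce the same query string.
    scope = ",".join(f"{key}:{value}" for key, value in sorted(filters.items())) or "*"
    return f"sum:{metric}{{{scope}}}.as_count()"


def build_query_request(query: str, start: int, end: int,
                        api_key: str, app_key: str) -> Request:
    """Prepare (but do not send) a GET request; start/end are POSIX seconds."""
    params = urlencode({"from": start, "to": end, "query": query})
    headers = {"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key}
    return Request(f"{DATADOG_QUERY_URL}?{params}", headers=headers)
```

For example, `build_metric_query("rsb.event.publish_count", {"rsb_env": "production", "emitter_id": "abc"})` yields `sum:rsb.event.publish_count{emitter_id:abc,rsb_env:production}.as_count()`. Passing the returned request to `urllib.request.urlopen` would send it.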

# Event logs

# Bus admin

The last 100 events emitted and received are visible in the bus admin on the Events and Metrics page for debugging purposes.

# Datadog integration

To log events in your Datadog account, access the Event Destinations page.

  1. Add Rsb-Service-Event-Logging as an event destination, along with the names of the events you wish to log. You can use the wildcard * to log all events.

  2. In the Configuration data section, in the Stored data code area, insert your Datadog credentials as in the example below. The tags and alert_type fields are optional.

```json
{
    "datadog": {
        "credentials": {
            "api_key": "examplekey",
            "app_key": "examplekey"
        },
        "tags": {
            "optional_tag_key_1": "tag_value_1",
            "optional_tag_key_2": "tag_value_2"
        },
        "alert_type": "info"
    }
}
```

To generate Datadog API and application keys, follow the Datadog documentation.

The service uses the Datadog Events API. Each Datadog field is filled from the event or from the configuration as follows:

| Datadog field | Source | Key |
|---|---|---|
| alert_type | config | alert_type |
| tags | config | tags |
| aggregation_key | event | from |
| date_happened | event | created_at |
| text | event | . |
| title | event | event |

The text field contains the body of the event.
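The mapping can be sketched in Python as a function that turns a bus event and the "datadog" configuration block into a request body for the Datadog Events API. This is an illustration, not the service's code: it assumes the whole event is serialised as JSON for text, that created_at is an ISO 8601 UTC timestamp converted to POSIX seconds for date_happened, and that the tags object is converted to Datadog's "key:value" string list. The event name OrderCreated and node id in the example are hypothetical.

```python
import json
from datetime import datetime
from typing import Any, Dict


def to_datadog_event(event: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Map a bus event and the "datadog" config block to an Events API body (sketch)."""
    created = datetime.fromisoformat(event["created_at"].replace("Z", "+00:00"))
    body: Dict[str, Any] = {
        "title": str(event["event"]),        # title <- event key "event"
        "text": json.dumps(event),           # text <- the whole event (".")
        "aggregation_key": event["from"],    # aggregation_key <- event key "from"
        "date_happened": int(created.timestamp()),  # assumption: POSIX seconds
    }
    if "alert_type" in config:
        body["alert_type"] = config["alert_type"]
    if "tags" in config:
        # Assumption: the tags object becomes Datadog "key:value" strings.
        body["tags"] = [f"{key}:{value}" for key, value in config["tags"].items()]
    return body
```

With the configuration from the example, alert_type is passed through unchanged and each optional tag becomes an entry such as `optional_tag_key_1:tag_value_1`.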

# Failed events

The Failed Events page shows, for an emitting node, the events that failed to reach one or more destinations and, for a receiving node, the events that the node rejected.

Failed events can be viewed and requeued individually or in batches from the overview section.

To be notified on Slack when an event fails, go to the Details and Settings page, turn notifications on, and provide Slack details.

# Custom metrics

To monitor custom metrics, send a MetricEvent with one or more metrics in the payload, at least once a day, following the example below:

```json
{
    "events": ["MetricEvent"],
    "from": "04f9f5d9-d84c-4636-acae-2067eee4d81f",
    "reference": "123456",
    "created_at": "2021-06-25T18:25:43.511Z",
    "payload": [
        {
            "name": "SmsServiceCost",
            "venture": "04f9f5d9-d84c-4636-acae-2067eee4d81f",
            "value": 102.50,
            "type": "COUNT",
            "timestamp": "2021-06-25T18:25:43.511Z",
            "range": "",
            "tags": ["sms"]
        },
        {
            "name": "Revenue",
            "venture": "04f9f5d9-d84c-4636-acae-2067eee4d81f",
            "value": 1499.99,
            "type": "COUNT",
            "timestamp": "2021-06-25T18:25:43.511Z",
            "range": "",
            "tags": ["aws"]
        }
    ],
    "version": "2.0.0"
}
```
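If you build MetricEvents in code, a helper like the following Python sketch can produce the structure shown in the example, with timestamps in the same millisecond UTC format. The function names are hypothetical, setting venture to the same id as from simply mirrors the example, and the sketch only builds the event: publishing it to the bus is not shown.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def iso_now() -> str:
    """UTC timestamp in the example's format, e.g. 2021-06-25T18:25:43.511Z."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_metric_event(node_id: str, reference: str, values: Dict[str, float],
                       tags: Optional[List[str]] = None) -> dict:
    """Build a MetricEvent with one COUNT metric per entry in `values` (sketch)."""
    now = iso_now()
    payload = [
        {
            "name": name,
            "venture": node_id,  # assumption: same id as "from", as in the example
            "value": value,
            "type": "COUNT",     # currently the only supported type
            "timestamp": now,
            "range": "",         # not processed yet
            "tags": list(tags or []),
        }
        for name, value in values.items()
    ]
    return {
        "events": ["MetricEvent"],
        "from": node_id,
        "reference": reference,
        "created_at": now,
        "payload": payload,
        "version": "2.0.0",
    }
```

For instance, `build_metric_event(node_id, "123456", {"SmsServiceCost": 102.50, "Revenue": 1499.99})` yields the same two metrics as the example, stamped with the current time.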

COUNT is currently the only available type. If you send the same metric more than once a day, its values are summed. The range and tags fields are not processed yet.
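The daily summing rule can be illustrated with a short Python sketch that totals metric values per name and day while ignoring range and tags. It assumes the day is taken from each metric's own timestamp and that timestamps are in UTC, as in the example; the service may instead group by receipt time or another timezone, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_totals(metrics: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Sum metric values per (name, day); range and tags are ignored."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for metric in metrics:
        # Assumption: UTC ISO 8601 timestamps, so the first 10 chars are YYYY-MM-DD.
        day = metric["timestamp"][:10]
        totals[(metric["name"], day)] += metric["value"]
    return dict(totals)
```

Sending Revenue with values 100 and 50 on the same day therefore counts as 150 for that day, while a value sent the next day starts a new total.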

A trigger in the global configuration routes these events to the end-to-end monitoring service, and the metric names can then be used to create monitoring rules.