10th Regional CMG Conference in Mumbai – Session 2

Statistical Data Analysis 101 for IT Professionals by Dr. Rajesh Mansharamani

Performance Monitoring using Prometheus by Amol Khanapurkar

ATA: Architecture-based Technology Advisor for Functional Application Domains by Dr. Shruti Kunde

Performance Monitoring using Prometheus by Amol Khanapurkar

In the second presentation, Amol Khanapurkar explained that Prometheus is one of the best-engineered tools for monitoring tens of thousands of system parameters at very short intervals.

Older monitoring tools such as Nagios and Zabbix were adequate for applications that did not have to handle the scale that modern Internet applications must cater to. Prometheus is an open-source monitoring tool originally built at SoundCloud and inspired by Borgmon, the system Google uses to monitor applications in its data centers. Its key goals are the ability to scale and ease of querying.

Some key aspects of Prometheus
1. It does not rely on distributed storage
2. It works in a Pull mode by default rather than by Push. Push mode is supported through an intermediate gateway.
3. It stores the data as time series.
4. It has a flexible querying language, PromQL, which enables easy querying of the metrics.
5. It can be connected to different graphing tools like Grafana for dashboarding
6. It can dynamically determine targets via service discovery
7. It has been written in Go.
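
To make the pull model and the time series storage from the list above concrete, here is a minimal Python sketch. It is not Prometheus code: the function names, the metric names and the in-memory store are all assumptions made for illustration, and the text format is simplified to bare "name value" lines without labels or timestamps.

```python
from collections import defaultdict


def render_metrics(values):
    """Target side: expose current values as simple 'name value' lines."""
    return "".join(f"{name} {value}\n" for name, value in sorted(values.items()))


def parse_metrics(text):
    """Server side: parse 'name value' lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape(fetch, store, timestamp):
    """Pull one snapshot from a target and append it to the time series store."""
    for name, value in parse_metrics(fetch()).items():
        store[name].append((timestamp, value))


# Hypothetical target whose state changes between two scrapes.
target_state = {"http_requests_total": 10, "queue_length": 3}
store = defaultdict(list)  # metric name -> list of (timestamp, value)

scrape(lambda: render_metrics(target_state), store, timestamp=0)
target_state["http_requests_total"] = 25
target_state["queue_length"] = 1
scrape(lambda: render_metrics(target_state), store, timestamp=15)

print(dict(store))
```

The point of the sketch is that the server decides when to collect: the target only exposes its current values, and each scrape adds one timestamped sample per metric to the corresponding series.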

A few features Prometheus does not support are
1. Raw Event/Log Collection
2. Request Tracing
3. Anomaly Detection
4. Durable Long Term Storage
5. Automatic Horizontal Scaling
6. User/Authentication Management

Key Components of Prometheus
1. The main component is the Prometheus server, which scrapes metrics data and stores it as time series data
2. Prometheus works with a concept called “Exporters”. These are agents running on the infrastructure that is to be monitored. Different “Exporters” are available for the standard platforms like HAProxy, StatsD, Graphite etc.
3. It has an Alert Manager to manage alerts that need to be generated based on the metrics. It provides features for dampening of alerts so as to prevent an alert flood.
4. It has client libraries which can be used for instrumenting client code.
5. It has a Pushgateway to support push-based metrics collection

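One way to picture the dampening of alerts mentioned above is grouping: many related alerts are collapsed into a single notification. The Python sketch below illustrates that idea only. It is not the Alert Manager implementation, and the function name, the alert dictionaries and the label values are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Collapse alerts that share the group_by labels into one notification."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return [
        {
            "group": dict(zip(group_by, key)),
            "count": len(members),
            "instances": sorted(a["labels"].get("instance", "") for a in members),
        }
        for key, members in groups.items()
    ]


# Hypothetical alerts: three hosts report the same problem, one reports another.
alerts = [
    {"labels": {"alertname": "HighLatency", "instance": "web-1"}},
    {"labels": {"alertname": "HighLatency", "instance": "web-2"}},
    {"labels": {"alertname": "DiskFull", "instance": "db-1"}},
    {"labels": {"alertname": "HighLatency", "instance": "web-3"}},
]

notifications = group_alerts(alerts, group_by=["alertname"])
for notification in notifications:
    print(notification)
```

Here four alerts result in two notifications, one per alert name, which is the kind of reduction that keeps an outage from turning into an alert flood.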
It supports four types of metrics
1. Counter
2. Gauge
3. Histogram
4. Summary
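
The difference between these types is easiest to see in code. The following Python sketch is a simplified illustration and not the API of any Prometheus client library: the class names mirror the metric types, but the methods and the bucket boundaries are assumptions. A summary is left out; it also tracks a count and a sum, but computes quantiles on the client side.

```python
import math


class Counter:
    """Monotonically increasing value, e.g. total requests served."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only go up")
        self.value += amount


class Gauge:
    """Value that can go up or down, e.g. current queue length."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations in cumulative buckets and tracks their sum."""

    def __init__(self, upper_bounds):
        # The +Inf bucket always exists and counts every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:  # cumulative: every bucket whose bound covers the value
                self.bucket_counts[i] += 1


requests = Counter()
requests.inc()
requests.inc(4)

queue = Gauge()
queue.set(7)
queue.dec(2)

latency = Histogram([0.1, 0.5, 1.0])  # hypothetical bucket bounds in seconds
for seconds in (0.05, 0.3, 0.7, 2.0):
    latency.observe(seconds)

print(requests.value, queue.value, latency.bucket_counts, latency.count)
```

Because the buckets are cumulative, the last bucket always equals the total number of observations, and the counts in between describe how the observed values are distributed.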

PromQL is a functional query language, playing the role that SQL plays for a relational database, used to slice and dice the data gathered in the server. It provides the standard arithmetic operators, functions such as ceil, floor, day_of_week, day_of_month, rate, irate, ln, log2 and log10, and aggregation operators such as sum and avg. It also provides the function predict_linear to determine the trend.
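
To give a feel for what rate and predict_linear compute, here is a rough Python sketch of the underlying arithmetic. It is an approximation under stated assumptions and not how Prometheus implements these functions: the real rate also handles counter resets and extrapolates to the edges of the time window, and the real predict_linear extrapolates from the query evaluation time rather than from the last sample. The sample data is invented for the example.

```python
def simple_rate(samples):
    """Per-second increase between the first and last sample of a counter."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line to the samples and extrapolate past the last one."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / sum(
        (t - mean_t) ** 2 for t, _ in samples
    )
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


# Hypothetical counter scraped every 15 seconds: (timestamp, value).
request_samples = [(0, 100), (15, 130), (30, 160), (45, 190)]
requests_per_second = simple_rate(request_samples)

# Hypothetical gauge of free disk space in MB, sampled every 60 seconds.
disk_free_samples = [(0, 1000), (60, 940), (120, 880), (180, 820)]
free_in_ten_minutes = predict_linear(disk_free_samples, seconds_ahead=600)

print(requests_per_second, free_in_ten_minutes)
```

A typical use of the trend is alerting ahead of time, for example warning when the predicted free disk space drops below zero within the next few hours.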

Alert Manager provides hooks to different services like email, webhooks, PagerDuty etc.