Deep observability insights at any scale with Consul
Consul helps you securely connect applications running in any environment, at any scale. Consul observability features enhance your service mesh capabilities with enriched metrics, logs, and distributed traces so you can improve performance and debug your distributed services with precision.
In this tutorial, you will enable observability features for your Consul data plane and control plane. You will use Grafana to explore dashboards that provide information regarding health, performance, security, and operations. In the process, you will learn how using these features can provide you with deep insights, reduce operational overhead, and contribute to a more holistic view of your service mesh applications.
Scenario overview
HashiCups is a coffee shop demo application. It has a microservices architecture and uses Consul service mesh to securely connect the services. At the beginning of this tutorial, you will use Terraform to deploy the HashiCups microservices, a self-managed Consul cluster, and an observability suite on Elastic Kubernetes Service (EKS).
You will enable Consul observability features for your service mesh environment that will provide insights into the health and performance of your data plane and control plane. You will use these features to diagnose and troubleshoot traffic problems between services on the data plane.
In this tutorial, you will:
- Deploy the following resources with Terraform:
- Elastic Kubernetes Service (EKS) cluster
- A self-managed Consul datacenter on EKS
- Grafana and Prometheus on EKS
- HashiCups demo application
- Perform the following Consul data plane procedures:
- Review and enable observability features
- Explore dashboards with Grafana
- Troubleshoot the HashiCups demo application
- Perform the following Consul control plane procedures:
- Review and enable observability features
- Explore dashboards with Grafana
- Perform the following HCP Consul procedures:
- Review and enable HCP observability features
- Explore dashboards with HCP Consul portal
- Clean up your demo environment
Prerequisites
For this tutorial, you will need:
- An AWS account configured for use with Terraform
- An HCP account
- aws-cli >= 2.0
- terraform >= 1.0
- consul >= 1.16.0
- consul-k8s >= 1.2.0
- git >= 2.0
- helm >= 3.0
- kubectl >= 1.24
Clone GitHub repository
Clone the GitHub repository containing the configuration files and resources.
Change into the directory that contains the complete configuration files for this tutorial.
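The two steps above can be sketched as follows. The repository URL and directory name here are placeholders, not the tutorial's actual repository — substitute the URL given in the tutorial.

```shell
# Placeholder URL -- replace with the repository named in the tutorial
git clone https://github.com/hashicorp-education/learn-consul-observability.git
cd learn-consul-observability
```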
Deploy infrastructure and demo application
With these Terraform configuration files, you are ready to deploy your infrastructure.
Initialize your Terraform configuration to download the necessary providers and modules.
Then, deploy the resources. Confirm the run by entering `yes`.
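Run from the directory containing the Terraform configuration, the two commands look like this:

```shell
terraform init    # downloads the required providers and modules
terraform apply   # review the plan, then confirm by entering: yes
```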
Note
The Terraform deployment could take up to 15 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for the environment to complete initialization.
Connect to your infrastructure
Now that you have deployed the Kubernetes cluster, configure `kubectl` to interact with it.
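One way to do this for EKS is shown below. The region and cluster name are assumptions — match them to your Terraform outputs.

```shell
# Region and cluster name are assumptions; check your Terraform outputs
aws eks update-kubeconfig --region us-east-1 --name <cluster-name>

# Verify connectivity to the cluster
kubectl get nodes
```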
Configure your CLI to interact with Consul datacenter
In this section, you will set environment variables in your terminal so the Consul CLI can interact with your Consul datacenter. The Consul CLI reads these environment variables for behavior defaults and references their values when you run `consul` commands.
Set the Consul destination address. By default, Consul listens on port `8500` for HTTP and `8501` for HTTPS.
Retrieve the ACL bootstrap token from the respective Kubernetes secret and set it as an environment variable.
Disable SSL verification checks to simplify communication with your Consul datacenter.
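As a sketch, the three environment variables might be set like this. The Kubernetes secret name in the comment is an assumption — check your Helm release for the exact name.

```shell
# Consul HTTPS API address (port 8501 by default)
export CONSUL_HTTP_ADDR=https://127.0.0.1:8501

# ACL bootstrap token -- in the tutorial this comes from a Kubernetes secret,
# for example (secret name is an assumption):
#   kubectl get secret consul-bootstrap-acl-token \
#     -o jsonpath='{.data.token}' | base64 -d
export CONSUL_HTTP_TOKEN="<paste-bootstrap-token-here>"

# Skip TLS certificate verification (development/demo only)
export CONSUL_HTTP_SSL_VERIFY=false
```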
Note
In a production environment, we recommend keeping SSL verification enabled. Only disable it for development and demonstration purposes when your Consul datacenter does not have TLS configured.
Run the `consul catalog services` CLI command to print all known services from your Consul catalog. This confirms that you can communicate with your Consul environment.
Enable Consul data plane observability features
The Consul data plane is responsible for authorizing, forwarding, and observing every network packet that flows between the services in your service mesh.
Consul data plane observability features provide detailed statistics and logging data so you can understand distributed traffic flow and debug problems as they occur.
Review and enable data plane metrics
Consul lets you expose Prometheus metrics for your service mesh applications and sidecars. Review the highlighted lines in the values file below to see the parameters that enable this feature.
Refer to the Consul metrics for Kubernetes documentation to learn more about metrics configuration options and details.
Configure your Consul cluster to let Prometheus collect metrics from your data plane.
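Applied with Helm, the update might look like the sketch below. The release name, namespace, and values file name are assumptions — use the names from your deployment.

```shell
# Release name, namespace, and values file are assumptions
helm upgrade consul hashicorp/consul \
  --namespace consul \
  --values values.yaml \
  --wait
```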
Note
The Helm upgrade could take up to 5 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for your updated Consul environment to become available.
Review the official Helm chart values to learn more about these settings.
Review and enable data plane logging
The `ProxyDefaults` configuration entry lets you set global defaults across all sidecar proxies in your Consul service mesh. The `proxy/proxy-defaults.yaml` file enables `accessLogs` for all of your Consul data plane sidecar proxies.
Review the Consul proxy defaults documentation to learn more.
Configure your proxy defaults to enable access logs.
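Applying the file is a single `kubectl` command; the commented YAML is a rough sketch of what such a `ProxyDefaults` resource contains, not the tutorial's exact file.

```shell
# proxy/proxy-defaults.yaml enables access logs globally, roughly:
#   apiVersion: consul.hashicorp.com/v1alpha1
#   kind: ProxyDefaults
#   metadata:
#     name: global
#   spec:
#     accessLogs:
#       enabled: true
kubectl apply -f proxy/proxy-defaults.yaml
```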
Restart sidecar proxies
You need to restart your sidecar proxies to apply the updated configuration. To do so, redeploy your HashiCups application.
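A sketch of the restart — the deployment names here are assumptions; list yours with `kubectl get deployments`.

```shell
# Deployment names are assumptions -- check with: kubectl get deployments
for d in nginx frontend public-api product-api payments postgres; do
  kubectl rollout restart deployment "$d"
done

# Wait for one of the rollouts to finish
kubectl rollout status deployment nginx
```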
Prometheus will now begin scraping the `/metrics` endpoint of each proxy sidecar on port `20200`. Refer to the Consul metrics for Kubernetes documentation to learn more about changing the Consul metrics collection default parameters.
Generate traffic in the demo application
In this section, you will visit your demo application, HashiCups, to generate traffic that will populate the Consul proxy metrics dashboards in Grafana.
Retrieve the HashiCups URL.
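For example, if the application is exposed through an API gateway service of type LoadBalancer, the URL can be read from the service's status. The service name is an assumption — check with `kubectl get svc`.

```shell
# Service name is an assumption -- check with: kubectl get svc
kubectl get svc api-gateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```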
Open the Consul API Gateway's URL in your browser and explore the HashiCups UI. Notice that the HashiCups UI displays an expected error message.
In the next section, you will use the data plane logs and metrics dashboards to troubleshoot the HashiCups demo application.
Explore Consul data plane metrics and logs with Grafana
Consul proxy access logs and proxy metrics provide you with detailed health and performance information for your service mesh applications. In this section, you will use Grafana to examine how this information provides insights into your distributed applications.
Event and error insights
Consul proxy access logs provide detailed event and error information for your service mesh applications. This includes upstream/downstream application connections, request status codes, errors, and additional information that you can use to gain deep insights into your distributed applications.
Navigate to the data plane logs dashboard.
Note
The Grafana dashboard may take a few moments to fully load in your browser.
In this scenario, notice that the `nginx` app is experiencing a large number of `503: Service Unavailable` errors. When you filter for the `503` response code in the raw logs, Grafana shows that `nginx` returns an error when it attempts to call the `/api` path. Referencing the HashiCups diagram, the `/api` path sends traffic to the `public-api` service.
Consul proxy access logs contain a large set of information that you can utilize to monitor your service mesh applications. Refer to the Consul access logs documentation for a complete list of available logs.
Health insights
Consul proxy metrics provide information for monitoring the health of your service mesh applications, such as requests by status code, upstream/downstream connections, rejected connections, and Envoy cluster state. The majority of these metrics are available for any service mesh application and require no additional service configuration.
Navigate to the data plane health monitoring dashboard.
Note
The Grafana dashboard may take a few moments to fully load in your browser.
In this scenario, notice that only 5 of the 6 HashiCups services are running and that the `public-api` service is not present in the list of active HTTP downstream connections. The status code dashboards also show a large number of `503: Service Unavailable` errors for the `nginx` service.
Consul proxy metrics contain a large set of statistics that you can use to monitor your service mesh applications. Refer to the Envoy proxy statistics overview for a complete list of available metrics.
Performance insights
Consul proxy metrics provide you with information for monitoring the performance of your service mesh applications such as network traffic statistics, CPU usage by pod, Envoy connections per second, and upstream/downstream connection data. The majority of these metrics are available for any service mesh applications and require no additional application configuration.
Navigate to the data plane performance monitoring dashboard.
Note
The Grafana dashboard may take a few moments to fully load in your browser.
In this scenario, notice that the `public-api` service is not present in the upstream requests dashboard. Even though the CPU and network usage dashboards show that the `public-api` pod is present, the pod is processing very little activity.
Consul proxy metrics contain a large set of statistics that you can utilize to monitor your service mesh applications. Refer to the Envoy proxy statistics overview for a complete list of available metrics.
Restore HashiCups functionality
In this section, you will restore HashiCups functionality by using the insights from the data plane metrics and log dashboards.
The data plane dashboards show that only 5 of the 6 HashiCups services are running in the service mesh and that the `public-api` service is not present in the list of active HTTP downstream connections. The CPU and network usage dashboards show that the `public-api` pod is present, but processing very little activity.
Based on this information, you can deduce that there is an error with the `public-api` service. List the pod details in the `default` namespace, where the HashiCups pods are running.
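For example:

```shell
kubectl get pods --namespace default
```

A pod with an injected sidecar shows `2/2` in the `READY` column; a `public-api` pod showing `1/1` points at the missing proxy.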
Notice that `public-api` has only one container in its pod. Since the service itself is running, this means that no Consul proxy sidecar exists in this pod.
Open the `hashicups/public-api.yaml` file and investigate the deployment resource configuration.
Notice that the Consul proxy sidecar annotation is set to `false`. This signals Consul not to inject a proxy sidecar into the `public-api` pod. Update this value to `true` and save your changes.
Redeploy your `public-api` deployment so Consul injects a proxy sidecar into its pod.
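For example — the label selector below is an assumption; adjust it to the labels in your manifests:

```shell
kubectl apply -f hashicups/public-api.yaml

# Label selector is an assumption -- expect READY 2/2 once the sidecar is injected
kubectl get pods -l app=public-api
```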
Open the HashiCups URL in your browser and refresh the HashiCups UI.
Notice that the HashiCups UI functions correctly. You have successfully resolved the problem using Consul's data plane observability.
Enable Consul control plane observability features
The Consul control plane is responsible for providing policy and configuration for all running data planes in your service mesh. The control plane turns all of your data planes into a distributed system.
Consul control plane observability features provide detailed statistics and logging data to give you insight into the operational health and performance of your Consul cluster.
Review and enable control plane metrics
Consul lets you expose Prometheus metrics for your service mesh applications and sidecars. Review the highlighted lines in the values file below to see the parameters that enable this feature.
Refer to the Consul metrics for Kubernetes documentation and official Helm chart values to learn more about metrics configuration options and details.
Configure your Consul cluster to let Prometheus collect metrics from your control plane.
Note
The Helm upgrade could take up to 5 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for your updated Consul environment to become available.
In addition to configuring Consul, you need to modify the anonymous ACL policy to allow `agent:read` permissions so Prometheus can scrape metrics from the secured Consul servers.
Review the Consul ACL Policies documentation to learn more.
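A sketch of the change with the Consul CLI — the policy name is an assumption, while the token ID shown is Consul's well-known anonymous token accessor:

```shell
# Create a policy granting agent:read on all agents (policy name is an assumption)
consul acl policy create -name "anonymous-agent-read" \
  -rules 'agent_prefix "" { policy = "read" }'

# Attach the policy to the built-in anonymous token
consul acl token update -id 00000000-0000-0000-0000-000000000002 \
  -policy-name "anonymous-agent-read"
```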
Note
In a production environment, we recommend using the Prometheus Consul Exporter for the most secure, restrictive access to Consul metrics on port `8501`.
Explore Consul control plane metrics and logs with Grafana
Consul control plane metrics and logs provide you with detailed health and performance information for your Consul servers. In this section, you will use Grafana to examine how this information provides insights into your Consul control plane.
Health and performance insights
Navigate to the control plane monitoring dashboard.
Note
The Grafana dashboard may take a few moments to fully load in your browser.
Notice that the example dashboard panes provide detailed performance insights for the Consul control plane.
Consul contains a large set of statistics that you can utilize to monitor your service mesh control plane. Refer to the Consul telemetry overview for a complete list and description of available metrics.
Event and error insights
Navigate to the control plane logs dashboard.
Note
The Grafana dashboard may take a few moments to fully load in your browser.
Notice that the example dashboard panes provide detailed event and error insights for your Consul control plane.
Enable HCP Consul Observability
The HCP Consul management plane provides deeper insight into your Consul deployments through cloud-based observability. It seamlessly links new and existing self-managed Consul clusters, simplifying observability for distributed Consul deployments.
Link your self-managed Consul cluster to HCP
Log in to the HCP cloud portal in your browser.
Click Get Started with Consul.
Click Self-Managed Consul and for linking method select Link existing. Click the Get Started button once complete.
Enter a name for your Consul cluster and select the Kubernetes runtime. We recommend using the cluster’s datacenter name as the cluster ID in this field. Click the Continue button once complete.
Select your preferred tool for updating your Consul deployment, Consul-K8S CLI or Helm, then perform only the first step, which sets the secrets used to authenticate with HCP.
Confirm that you set the Kubernetes secrets required for linking your self-managed Consul cluster to HCP Consul Central. You should find five secrets whose names start with `consul-hcp`.
Review and link your cluster to HCP Consul Central
Consul lets you connect your self-managed cluster with HCP Consul. Review the highlighted lines in the values file below to see the parameters that enable this feature.
Configure your Consul cluster to link to HCP Consul Central.
Note
The Helm upgrade could take up to 5 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for your updated Consul environment to become available.
Review the official Helm chart values to learn more about these settings.
Create intentions for the Consul telemetry collector
The Consul telemetry collector runs as a service in your mesh. To receive data plane metrics from your sidecar proxies, you need to create a service intention that authorizes proxies to push metrics to the collector.
Create intentions for the Consul telemetry collector.
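On Kubernetes, one way to express this intention is a `ServiceIntentions` resource like the sketch below. The source wildcard allows every mesh service to push metrics; in production you would list only the services you expect.

```shell
kubectl apply -f - <<EOF
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: consul-telemetry-collector
spec:
  destination:
    name: consul-telemetry-collector
  sources:
    - name: "*"
      action: allow
EOF
```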
Review the Consul telemetry collector documentation to learn more.
Restart sidecar proxies
You need to restart your sidecar proxies to apply the updated configuration. To do so, redeploy your HashiCups application.
Your sidecars will now begin forwarding metrics to your HCP observability dashboard.
Explore HCP Consul observability dashboard
HCP Consul control plane metrics provide you with detailed health and performance information for your self-managed or HCP-managed Consul servers. In this section, you will examine how these metrics and logs provide insights into your Consul control plane and data plane.
Return to the HCP dashboard page in your browser. It may take a moment to sync with your self-managed Consul cluster.
Click Observability on the navigation pane and explore the observability insights of your self-managed Consul cluster.
HCP Consul contains a large set of statistics that you can utilize to monitor your service mesh control plane. Refer to the HCP Consul observability documentation for a complete list and description of available metrics.
Clean up resources
Destroy the Terraform resources to clean up your environment. Confirm the destroy operation by entering `yes`.
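Run from the same directory you applied from:

```shell
terraform destroy   # confirm by entering: yes
```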
Note
Due to race conditions among the various cloud resources created in this tutorial, you may need to run the `destroy` operation twice to ensure all resources are properly removed.
Open the HCP Consul portal and unlink your self-managed cluster to clean up your HCP resources.
Next steps
In this tutorial, you enabled observability features for your Consul data plane and control plane to enhance the health and performance monitoring of your service mesh applications. You saw how these features can provide you with faster incident resolution, increased application understanding, and reduced operational overhead.
For more information about the topics covered in this tutorial, refer to the following resources: