Long-term Metrics storage for Prometheus with Grafana Mimir

In a previous article, I compared different long-term Prometheus storage options: AMP, Thanos, and Grafana Mimir. There is another article that explains how Thanos works and its different components.

In this article, I shall be talking about Grafana Mimir, which is an open-source alternative to Thanos.

The two have very similar feature sets, but Mimir has some important distinctions from Thanos. We shall compare their similarities and differences in a separate post.

Prometheus Storage Options: AMP, Thanos and Mimir.

What is Grafana Mimir

Grafana Mimir is an open-source, horizontally scalable, and highly available time-series database designed to store Prometheus metrics efficiently. It is a key component of the LGTM stack (Loki, Grafana, Tempo, and Mimir), which provides a comprehensive observability solution for logs, metrics, and traces. Mimir enhances Prometheus by enabling long-term storage, multi-tenancy, scalability, and high availability, making it suitable for enterprise-level monitoring needs.

Originally forked from Cortex, Mimir introduces various optimizations to improve performance and simplify operations while maintaining full PromQL compatibility. This allows users to seamlessly query and visualize metrics stored in Mimir using Grafana dashboards.

Key Features of Grafana Mimir  

– Horizontal Scalability: Can scale out to handle billions of active time-series.

– High Availability: Replication and sharding ensure resilience.

– Multi-tenancy: Supports multiple independent tenants.

– Long-Term Storage: Integrates with object storage like Amazon S3, Google Cloud Storage, and MinIO.

– Compatibility with Prometheus: Supports the same APIs and query language.

How Grafana Mimir Works with Object Storage  

Unlike Prometheus, which uses a local TSDB (Time-Series Database), Mimir separates ingestion from storage. It writes metric data to object storage, enabling long-term retention and cost efficiency.

How Metrics Are Processed

1. Metrics Ingestion from Prometheus:

  • Prometheus scrapes metrics from configured targets (e.g., Kubernetes nodes, applications, infrastructure services).
  • Instead of keeping data only locally, Prometheus remote-writes the metrics to Grafana Mimir via the Prometheus Remote Write API (a sample remote_write configuration follows after this list).
  • Mimir’s Distributor component receives and hashes the incoming data, ensuring it is routed to the correct Ingester.

2. Data Storage in Object Storage:

  • Ingester nodes temporarily store incoming data in memory, reducing query latency for recent metrics.
  • Data is periodically flushed to object storage in a compact format for long-term retention.
  • Supported object storage backends include Amazon S3, Google Cloud Storage, MinIO, and Azure Blob Storage.

3. Querying Data with Grafana:

  • When a user queries metrics in Grafana, the request is processed by Mimir’s Querier.
  • The querier fetches recent data from Ingesters (for low-latency queries) and older data from object storage via the Store-Gateway.
  • Grafana’s Prometheus data source remains fully compatible, allowing users to query Mimir with the same PromQL queries and aggregations they would use against Prometheus.
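
To make step 1 concrete, here is a minimal sketch of the Prometheus side of the pipeline. The hostname mimir:9009 is an assumption for illustration; /api/v1/push is Mimir's remote-write endpoint.

# prometheus.yml (fragment): ship scraped samples to Mimir
remote_write:
  - url: http://mimir:9009/api/v1/push
    # With multi-tenancy enabled, Mimir also expects a tenant ID header:
    # headers:
    #   X-Scope-OrgID: team-a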

 

Supported Object Storage Backends

– Amazon S3

– Google Cloud Storage

– MinIO

– Azure Blob Storage

– OpenStack Swift

Deploying Grafana Mimir: Binary Mode vs. Microservices Mode

Mimir supports two deployment models:

a. Binary Mode – A single-node deployment suitable for small-scale environments:

Binary mode in Grafana Mimir is a deployment model where all core components—such as the distributor, ingester, querier, query frontend, and store-gateway—run as a single process on a single machine. This mode is designed for simplicity and ease of setup, making it ideal for small-scale environments, testing, or development use cases. Since all functionalities are bundled into a single executable, deployment and configuration are straightforward, requiring minimal infrastructure. However, while binary mode is convenient, it comes with limitations—there is no built-in redundancy, meaning that if the single node fails, data ingestion and querying will be interrupted. Additionally, as the volume of metrics grows, a single node may become a bottleneck, limiting scalability and performance. Despite these drawbacks, binary mode is useful for lightweight monitoring solutions, quick evaluations, and environments where high availability is not a strict requirement.

b. Microservices Mode – A distributed deployment optimized for high availability and scalability:

Microservices mode in Grafana Mimir is a distributed deployment model where each core component—Distributor, Ingester, Querier, Query Frontend, Store-Gateway, and others—runs as a separate, independently scalable service. This architecture is designed for high availability, fault tolerance, and scalability, making it suitable for large-scale production environments that require handling massive amounts of time-series data. By decoupling services, microservices mode allows organizations to scale each component independently based on workload demands—ingesters can be increased to handle higher write loads, and queriers can be scaled horizontally to improve query performance. Additionally, redundancy is built into the system through replication and ring-based architecture, ensuring that failures in individual components do not lead to data loss or service downtime. While this model offers significant advantages in terms of resilience and performance, it also introduces operational complexity, requiring container orchestration platforms like Kubernetes and additional infrastructure components like object storage, caching layers (Memcached/Redis), and distributed key-value stores (Consul, Etcd, or memberlist). Despite these challenges, microservices mode is the preferred deployment choice for enterprises and large-scale monitoring solutions where scalability, reliability, and long-term metric retention are critical.

a. Installing Grafana Mimir in Binary Mode

Binary mode is the simplest way to deploy Mimir, where all components run as a single process.

Installation Steps

1. Download and Install Mimir Binary

wget https://github.com/grafana/mimir/releases/latest/download/mimir-linux-amd64
chmod +x mimir-linux-amd64

2. Create a Configuration File (`mimir.yaml`)

   # Minimal single-process configuration: only the long-term block storage
   # backend is set here; everything else falls back to Mimir's defaults.
   blocks_storage:
     backend: s3
     s3:
       bucket_name: mimir-metrics
       endpoint: s3.amazonaws.com
       region: us-east-1
       access_key_id: <AWS_ACCESS_KEY>      # replace with your credentials,
       secret_access_key: <AWS_SECRET_KEY>  # or use an IAM role instead

3. Run Mimir

./mimir-linux-amd64 --config.file=mimir.yaml
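
Before pointing Prometheus at it, you can check that the process came up healthy. A quick sanity check, assuming the default HTTP port (8080; adjust if you set server.http_listen_port in mimir.yaml):

# Mimir answers "ready" on /ready once all internal modules are running
curl http://localhost:8080/ready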

Advantages of Binary Mode

– Quick and easy to set up.

– Suitable for small-scale deployments.

– Fewer moving parts, reducing operational overhead.

Disadvantages

– Single point of failure (not highly available).

– Limited scalability.

– Not optimized for high-traffic environments.

b. Installing Grafana Mimir in Microservices Mode

Microservices mode runs different Mimir components as separate services. This allows horizontal scaling by adding more instances.

Key Components in Microservices Mode

As the name implies, microservices mode breaks Mimir up into individual components. This ensures the scalability and availability of your Mimir setup, unlike binary mode, which has a single point of failure. Let us explore the different components that make up a microservices-mode Mimir deployment.

Distributor:

The Distributor is responsible for receiving incoming metrics from Prometheus or other metric sources and efficiently routing them to the appropriate Ingesters. It acts as a load balancer by hashing the metric labels and using consistent hashing to ensure that the same series always go to the same ingester, reducing unnecessary data movement. The distributor also performs validation, rate limiting, and tenant isolation, ensuring that incoming data is well-formed and compliant with configured limits. If an ingester fails, the distributor automatically reroutes data to healthy instances, maintaining high availability.
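
The validation and rate limiting performed by the distributor are driven by per-tenant limits. A hedged sketch of what this looks like in mimir.yaml (the numbers are purely illustrative):

limits:
  ingestion_rate: 20000         # accepted samples per second, per tenant
  ingestion_burst_size: 200000  # short bursts above the rate are tolerated up to this size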

Ingester:

The Ingester is responsible for temporarily storing incoming time-series data in memory before persisting it to long-term object storage (such as Amazon S3, Google Cloud Storage, or MinIO). It buffers and compresses data in chunks, optimizing it for efficient storage and retrieval. Ingester nodes periodically flush data to object storage based on either time intervals or chunk size thresholds, ensuring durability while minimizing write amplification. During query execution, ingesters serve recent data directly from memory, providing low-latency access, while older data is retrieved from storage via the Store-Gateway. Ingester instances are part of a replicated ring architecture, ensuring data availability and resilience by distributing series across multiple ingesters to prevent data loss in case of failures.
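
Two settings capture the behavior described above: how often in-memory data is cut into blocks for upload, and how many ingesters hold a copy of each series. A minimal sketch using Mimir's defaults as a starting point:

blocks_storage:
  tsdb:
    block_ranges_period: [2h]  # cut a block every two hours, then upload it
ingester:
  ring:
    replication_factor: 3      # each series is written to three ingesters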

Querier:

The Querier is responsible for executing PromQL queries received from Grafana dashboards, Prometheus, or API clients. It retrieves recent data from Ingesters (for low-latency access) and historical data from object storage via the Store-Gateway. To optimize performance, the querier parallelizes query execution by distributing workloads across multiple instances, ensuring efficient data retrieval even in large-scale environments. It also applies aggregation, filtering, and downsampling techniques to enhance query efficiency. In a microservices deployment, the querier works in conjunction with the Query Frontend, which caches and batches requests, further improving response times and reducing computational overhead.
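
Because the querier speaks the Prometheus query API, you can exercise it directly over HTTP. A sketch assuming Mimir listens on localhost:9009; the X-Scope-OrgID header is only required when multi-tenancy is enabled:

# Run an instant PromQL query against Mimir's Prometheus-compatible API
curl -G http://localhost:9009/prometheus/api/v1/query \
  -H 'X-Scope-OrgID: demo' \
  --data-urlencode 'query=up'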

Query Frontend:

The Query Frontend in Grafana Mimir is an optimization layer that improves query performance by caching, batching, and splitting queries before forwarding them to the Querier. It helps reduce query latency by storing frequently accessed results in a cache (such as Memcached or Redis), preventing redundant computations. For complex or long-running queries, the query frontend splits queries into smaller subqueries, distributing them across multiple queriers to enable parallel execution, significantly improving response times for large datasets. Additionally, it queues and rate-limits incoming queries to prevent query bursts from overwhelming the system, ensuring fair resource allocation and consistent performance across multiple tenants.
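
Caching and query splitting are both configured on the query frontend. A hedged sketch, assuming a Memcached instance is reachable at the address shown:

frontend:
  split_queries_by_interval: 24h  # break long-range queries into one-day slices
  cache_results: true
  results_cache:
    backend: memcached
    memcached:
      addresses: memcached.mimir.svc:11211  # hypothetical address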

Store-Gateway:

The Store-Gateway in Grafana Mimir is responsible for efficiently retrieving historical metric data from long-term object storage (such as Amazon S3, Google Cloud Storage, or MinIO) and serving it to the Querier for processing. Instead of loading entire metric blocks into memory, the store-gateway dynamically fetches and caches only the necessary index and chunk data, optimizing both memory usage and query performance. It plays a critical role in reducing query latency for older data by minimizing the amount of data that needs to be scanned. Additionally, the store-gateway supports sharding and replication, allowing multiple instances to work in parallel for better scalability and high availability, ensuring efficient access to long-term stored metrics without overwhelming the system.
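
Sharding and replication for the store-gateway are configured through its own ring, analogous to the ingester's. A minimal sketch:

store_gateway:
  sharding_ring:
    replication_factor: 3  # each storage block is owned by three store-gateways
    kvstore:
      store: memberlist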

Installation Steps:

1. Deploy Mimir Using Docker Compose

Create a docker-compose.yaml file. For brevity, a single Mimir service is shown; in a full microservices deployment you would run one service per component, each started with the appropriate -target flag (e.g., -target=distributor):

version: '3.7'

services:
  mimir:
    image: grafana/mimir:latest
    ports:
      - "9009:9009"
    volumes:
      - ./mimir-config:/etc/mimir
    command:
      - "-config.file=/etc/mimir/mimir.yaml"

2. Configure the Microservices in `mimir.yaml`

   multitenancy_enabled: false  # disable multi-tenancy for a simple single-tenant setup

   blocks_storage:
     backend: s3
     s3:
       bucket_name: mimir-metrics
       endpoint: s3.amazonaws.com
       region: us-east-1
       access_key_id: <AWS_ACCESS_KEY>
       secret_access_key: <AWS_SECRET_KEY>

   distributor:
     ring:
       kvstore:
         store: memberlist

   ingester:
     ring:
       kvstore:
         store: memberlist
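
For the memberlist rings above to actually form, each instance also needs to know how to discover its peers. A hedged sketch to add alongside the blocks above; the service name mimir matches the Docker Compose example, and 7946 is memberlist's default gossip port:

   memberlist:
     join_members:
       - mimir:7946  # seed address for gossip; in Kubernetes, use a headless service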

3. Start Mimir Services via Docker Compose

docker-compose up -d

4. Set Up a Kubernetes Cluster (Optional)

If deploying on Kubernetes, install Helm:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install mimir grafana/mimir-distributed
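
The mimir-distributed chart lets you size each component independently through its values file. A hedged sketch of per-component replica counts (illustrative numbers; check the chart's values reference for the authoritative keys):

# custom-values.yaml
ingester:
  replicas: 3  # scale the write path
querier:
  replicas: 2  # scale the read path

helm install mimir grafana/mimir-distributed -f custom-values.yaml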

 

Advantages of Microservices Mode

– Highly Scalable: Can handle billions of active time series.

– Fault Tolerant: No single point of failure.

– Efficient Resource Utilization: Each component scales independently.

Disadvantages

– Complexity: More challenging to configure and maintain.

– Higher Resource Usage: Requires multiple instances and network communication.

– Additional Infrastructure: Works best in Kubernetes or containerized environments.

For Mimir to scale properly in microservices mode, deploying it on Kubernetes is the best option. This ensures each component of Mimir runs as an independent entity.

Using Grafana Mimir with Grafana Dashboards

Since Mimir uses the same API as Prometheus, it can be directly added as a Prometheus data source in Grafana.

Step 1: Add Mimir as a Data Source

  1. Open Grafana and go to Settings → Data Sources.
  2. Click Add Data Source and select Prometheus.
  3. Set the URL to Mimir's Prometheus-compatible endpoint (e.g., http://mimir:9009/prometheus; Mimir serves the Prometheus API under the /prometheus prefix by default). A file-based provisioning alternative is sketched after these steps.
  4. Click Save & Test to verify the connection.
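
If you manage Grafana declaratively, the same data source can be provisioned from a file instead of through the UI. A sketch in Grafana's data source provisioning format; the URL assumes the Docker Compose setup from earlier:

# /etc/grafana/provisioning/datasources/mimir.yaml
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus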

Step 2: Create a Grafana Dashboard

  • Use PromQL queries to fetch metrics stored in Mimir (an example follows after this list).
  • Apply filters, aggregations, and visualizations to monitor system health, application performance, or infrastructure metrics.
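
For example, a query like the following (assuming node_exporter metrics are being remote-written to Mimir) works unchanged:

# Per-instance CPU usage rate over the last five minutes
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)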

Conclusion

Grafana Mimir extends Prometheus’s capabilities by providing scalable, long-term storage using object storage. It integrates seamlessly into Grafana dashboards, allowing organizations to monitor large-scale environments efficiently. By leveraging Mimir’s distributed architecture, teams can handle massive telemetry data while ensuring high availability, efficient querying, and cost-effective storage.

Learn more about Mimir in their official documentation.

