Securing Kubernetes from Disaster: Introduction to Velero

Velero is an open-source tool that makes it easy to perform backup and disaster recovery operations in Kubernetes. This article explains what you need to know about Velero, from the ground up. This was extracted from here.

Why Velero?

Velero was designed specifically to back up and restore Kubernetes cluster resources. This gives Velero some advantages over etcd backups.

Velero uses the Kubernetes API discovery capabilities to collect backup data. This means that Velero can back up new APIs without updating Velero itself. Velero does not need to take backups of etcd. A discovery approach allows Velero to back up clusters that include aggregated API servers, which otherwise requires creating an etcd backup of each server. Velero can also perform backups in scenarios where there is no direct access to etcd, such as a cluster running on GKE.

Velero lets you select specific resources to back up because it does not create an atomic snapshot by backing up etcd. This approach means that Velero enables you to restore a subset of a backup. Velero also associates snapshots of persistent volumes with each backup. These snapshots allow Velero to restore both what was running in the cluster and the data associated with the cluster.

Backup

Velero supports on-demand backups as well as scheduled periodic backups. You can configure Velero to back up your entire cluster or only specified resources based on namespaces, resources, and labels you include or exclude.
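To make the include and exclude filters concrete, here is a minimal Python sketch that assembles a Backup manifest as a plain dictionary. The field names (includedNamespaces, includedResources, labelSelector, ttl) follow the velero.io/v1 Backup resource as I understand it. The backup name, the "shop" namespace, the "app: checkout" label, and the build_backup helper are all hypothetical and only there for illustration.

```python
import json


def build_backup(name, namespaces=None, resources=None, labels=None, ttl=None):
    """Return a Velero Backup manifest as a plain dict (hypothetical helper)."""
    spec = {}
    if namespaces:
        spec["includedNamespaces"] = list(namespaces)
    if resources:
        spec["includedResources"] = list(resources)
    if labels:
        # Only objects carrying all of these labels are backed up.
        spec["labelSelector"] = {"matchLabels": dict(labels)}
    if ttl:
        spec["ttl"] = ttl
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # Assumes Velero is installed in the "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": spec,
    }


# Illustrative values only: one namespace, filtered by one label.
manifest = build_backup(
    "shop-backup",
    namespaces=["shop"],
    labels={"app": "checkout"},
    ttl="72h0m0s",
)
print(json.dumps(manifest, indent=2))
```

Leaving every filter empty yields an empty spec, which corresponds to backing up everything Velero can see.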

Every cluster has different backup needs. You should carefully consider your own needs when creating a backup plan. Consider the following scenarios as you plan for your environment.

Full Cluster

A full cluster backup is the default behavior if you run velero create backup without any additional flags. This approach is the simplest way to get started with Velero, but might not be adequate for all purposes.

Per-Namespace

A per-namespace backup lets you restore a namespace with just the base velero create restore command. This approach is useful if you have multi-tenant clusters where each tenant has its own namespace in the cluster. You can restore a single namespace without disrupting the other tenants. Depending on your needs, you might also want full cluster backups together with your namespace backups.
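The difference between the two approaches is easiest to see in the commands themselves. The Python sketch below only builds and prints the command lines; it does not run them. It uses the velero backup create and velero restore create forms of the CLI together with the --include-namespaces and --from-backup flags. The backup names and the "tenant-a" namespace are hypothetical.

```python
import shlex


def backup_command(name, namespace=None):
    """Build a backup command; without a namespace it covers the full cluster."""
    cmd = ["velero", "backup", "create", name]
    if namespace:
        cmd += ["--include-namespaces", namespace]
    return cmd


def restore_command(backup_name):
    """Build a restore command that restores everything in the given backup."""
    return ["velero", "restore", "create", "--from-backup", backup_name]


full = backup_command("full-cluster")
tenant = backup_command("tenant-a", namespace="tenant-a")
restore = restore_command("tenant-a")

for cmd in (full, tenant, restore):
    print(shlex.join(cmd))
```

Because the tenant backup already contains a single namespace, the restore command needs no filtering flags.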

Strategy

We recommend that you identify critical functions in your system and create backups that include only the resources needed to restore those critical functions. This makes restoring simpler because you restore the entire backup instead of having to filter it. This is especially important to help reduce errors in case of disaster recovery.

Scheduling Backups

Velero supports scheduled backups with a cron syntax. You can also set a time to live (TTL) on the backups a schedule creates. Together, these two features let you create backup configurations for different recovery point objectives (RPO). A common configuration starts with an hourly backup and adds daily, weekly, and monthly schedules, each with a TTL that removes its backups once they are covered by the next larger chronological archive.
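One way to picture such a tiered configuration is the Python sketch below, which generates four Schedule manifests as dictionaries. The spec.schedule and spec.template.ttl fields follow the velero.io/v1 Schedule resource as I understand it. The cron times and TTL values are assumptions chosen for illustration, not recommendations, and build_schedule is a hypothetical helper.

```python
import json

# Tier name -> (cron expression, TTL). All values are illustrative assumptions.
SCHEDULES = {
    "hourly": ("0 * * * *", "24h0m0s"),      # kept for one day
    "daily": ("0 1 * * *", "168h0m0s"),      # kept for seven days
    "weekly": ("0 2 * * 0", "720h0m0s"),     # kept for thirty days
    "monthly": ("0 3 1 * *", "8760h0m0s"),   # kept for one year
}


def build_schedule(name, cron, ttl):
    """Return a Velero Schedule manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumes Velero is installed in the "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,
            # The template holds the spec of each backup the schedule creates.
            "template": {"ttl": ttl},
        },
    }


manifests = [build_schedule(name, cron, ttl) for name, (cron, ttl) in SCHEDULES.items()]
print(json.dumps(manifests, indent=2))
```

Each tier keeps its backups just long enough for the next tier to take over, so the hourly backups expire after a day while the daily ones remain.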

Backup Hooks

Hooks in Velero let you run a command inside a container before and after a backup. Velero provides both pre-backup and post-backup hooks. Hooks are configured using pod annotations. The Velero documentation provides an example of hooks to call fsfreeze on a file system before and after Velero performs a backup, plus the full details of hook annotations.
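As a rough illustration of the fsfreeze case, the Python sketch below builds the pod annotations for a pre-backup and a post-backup hook. The annotation keys follow the pre.hook.backup.velero.io and post.hook.backup.velero.io naming from the Velero documentation, and each command is written as a JSON array in a string. The container name "fsfreeze", the mount path, and the helper function are hypothetical.

```python
import json


def fsfreeze_hook_annotations(container, mount_path):
    """Return pod annotations that freeze and unfreeze a file system around a backup."""
    return {
        # Runs in the named container before Velero backs up the pod.
        "pre.hook.backup.velero.io/container": container,
        "pre.hook.backup.velero.io/command": json.dumps(
            ["/sbin/fsfreeze", "--freeze", mount_path]
        ),
        # Runs in the named container after the backup of the pod finishes.
        "post.hook.backup.velero.io/container": container,
        "post.hook.backup.velero.io/command": json.dumps(
            ["/sbin/fsfreeze", "--unfreeze", mount_path]
        ),
    }


# Illustrative values: a sidecar container and the path it should freeze.
annotations = fsfreeze_hook_annotations("fsfreeze", "/var/log/nginx")
for key, value in annotations.items():
    print(f"{key}={value}")
```

These key and value pairs would go under metadata.annotations of the pod or pod template.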

Restore

Velero supports restoring backups that are created manually or according to a schedule. It lets you perform full or partial restores of backups. Velero performs a full restore of a backup by default.

Lifecycle of Velero Objects

Understanding the lifecycle of a Velero object can help you understand the details of your Velero jobs.

The New phase shows that the requested backup, restore, or schedule object is created by the API, but the object has not been processed by its respective controller.
The next phase is validation. The New object is validated by the processing controller. If the controller cannot validate the object, it moves to the FailedValidation phase and no further processing is attempted.
After successful validation, if you are creating a schedule, the object moves to the Enabled phase. When a schedule is enabled, it triggers backups in accordance with its schedule spec.
If you are creating a backup or restore, the object moves to the InProgress phase. During the InProgress phase, Velero attempts to perform all the operations codified in the backup or restore object. Relevant errors and warnings are counted during this process and captured in the status. The type of object being created determines the next phase.
Both backup and restore objects have a Completed phase. This phase shows that the requested backup or restore object and all its operations have been performed. The Completed phase does not automatically mean there were no errors or warnings.

Two other phases are possible for backups:

If you delete a backup, it has a Deleting phase.
A Failed phase is also possible for backups. The Failed phase shows that there was a critical error that prevented the backup from completing successfully.
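The phases described above can be read as a small state machine. The Python sketch below models only the phases named in this section, so it is a simplification and not Velero's actual controller logic. Allowing Deleting only from the Completed and Failed phases is my assumption, since the text does not say when a backup may be deleted. The function names are hypothetical.

```python
# Allowed phase changes per object kind, limited to the phases named above.
TRANSITIONS = {
    "schedule": {
        "New": {"Enabled", "FailedValidation"},
    },
    "backup": {
        "New": {"InProgress", "FailedValidation"},
        "InProgress": {"Completed", "Failed"},
        # Assumption: deletion is modeled only for finished backups.
        "Completed": {"Deleting"},
        "Failed": {"Deleting"},
    },
    "restore": {
        "New": {"InProgress", "FailedValidation"},
        "InProgress": {"Completed"},
    },
}


def can_transition(kind, current, nxt):
    """Return True if an object of this kind may move from current to nxt."""
    return nxt in TRANSITIONS[kind].get(current, set())


def is_valid_path(kind, phases):
    """Return True if the phase sequence starts at New and every step is allowed."""
    steps = zip(phases, phases[1:])
    return phases[0] == "New" and all(can_transition(kind, a, b) for a, b in steps)


print(is_valid_path("backup", ["New", "InProgress", "Completed", "Deleting"]))
print(is_valid_path("schedule", ["New", "InProgress"]))
```

The first path is one a backup can follow. The second is rejected because a schedule moves to Enabled, not InProgress.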

Monitoring

Velero provides metrics for backups and restores. You can use these metrics to add recording and alerting rules to your Prometheus configuration.
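As a hedged illustration of an alerting rule, the Python sketch below builds a Prometheus rule group as a dictionary and prints it as JSON, which is also valid YAML. It assumes that your Velero version exposes a counter named velero_backup_failure_total, so check the metrics endpoint of your installation before relying on it. The alert name, severity label, and one-hour window are hypothetical choices.

```python
import json


def backup_failure_alert(window="1h"):
    """Return one Prometheus alerting rule as a dict (hypothetical helper)."""
    return {
        "alert": "VeleroBackupFailure",
        # Assumed metric name; fires if the failure counter grew within the window.
        "expr": f"increase(velero_backup_failure_total[{window}]) > 0",
        "labels": {"severity": "warning"},
        "annotations": {
            "summary": f"A Velero backup failed in the last {window}",
        },
    }


rule_file = {"groups": [{"name": "velero", "rules": [backup_failure_alert()]}]}
print(json.dumps(rule_file, indent=2))
```

The printed structure can be saved as a rule file and loaded through the rule_files setting of a Prometheus configuration.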

Velero Use Cases

Cluster Migration

Velero can be used to migrate Kubernetes resources from one cluster to another. It enables cluster portability for all Kubernetes resources, including pods, deployments, services, volumes, and more.

Disaster Recovery

VMware recommends that you create a periodic backup schedule as part of your disaster recovery plan. Velero can then help with the recovery of your cluster and its resources in case of a disaster. During a recovery, Velero can be configured to run in restoreOnly mode. This mode ensures that Velero does not take any backups during the restore process, so you do not have to worry about cleaning up backups that might contain only a partially restored cluster.

Data Protection

Velero can be used for data protection through its scheduled backup feature, which lets you back up all the data within the cluster to a more secure location.

Security on Velero

By default, Velero runs using a service account with cluster-admin permissions and is not scoped by any role-based access controls (RBAC). These permissions allow Velero to back up and restore all Kubernetes resources. However, users who have permission to create backups and restores with Velero effectively have cluster-admin permissions, so only trusted administrators should have that access. You can configure Velero to run with reduced permissions, but then only the resources that the related service account can access can be backed up and restored. Velero does not currently support multi-tenancy in a single instance. A scenario with multiple, tightly scoped instances is untested.

Summary

This piece gives an overview of what Velero is, what it can do, and some basic use cases. Velero is an open-source tool developed by VMware for backing up Kubernetes resources in almost any kind of Kubernetes setup. Its use cases include data protection, disaster recovery, and cluster migration.
