From Mnesia to Khepri, Part 3: Making the Move to Khepri

In part 2 of this blog series, we covered what to expect from Khepri, should you decide to migrate. This part covers the different ways to get started with Khepri on CloudAMQP.

Now that RabbitMQ 4.0 is available on CloudAMQP, Khepri is ready for use! There are a couple of ways to test Khepri, but before jumping into the Mnesia → Khepri migration steps, it’s helpful to look at Khepri’s roadmap. It shows the key changes on the way, such as when Khepri becomes the default and when Mnesia is phased out, which helps you plan your migration smoothly.

Khepri Roadmap

RabbitMQ 3.13.x:
- Khepri is experimental and not supported.
- Upgrading from 3.13.x to 4.0.x with Khepri enabled does not work.
RabbitMQ 4.0.x:
- Khepri is fully supported but remains optional.
- It is not enabled by default.
RabbitMQ 4.1.x:
- Mnesia will still be supported, but Khepri becomes the default metadata store for all newly created clusters.
- Existing RabbitMQ deployments will continue using Mnesia after upgrading to 4.1.x, unless Khepri is explicitly enabled by the administrator.
RabbitMQ 4.2.x:
- Khepri is required.
- Mnesia support will be removed entirely.
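
If you manage several clusters, it can help to encode the roadmap above as a small lookup. The following is only a sketch: the name `khepri_stage` and the wording of the stage strings are our own, and the table covers nothing beyond the four release series listed above.

```python
import re

# Roadmap stages from the list above, keyed by (major, minor) release series.
KHEPRI_ROADMAP = {
    (3, 13): "experimental, not supported",
    (4, 0): "fully supported, optional, not enabled by default",
    (4, 1): "default for new clusters, Mnesia still supported",
    (4, 2): "required, Mnesia removed",
}


def khepri_stage(version):
    """Return the roadmap stage for a version string such as '4.0.3'.

    Returns None when the roadmap above does not cover the release series.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    series = (int(match.group(1)), int(match.group(2)))
    return KHEPRI_ROADMAP.get(series)
```

For example, `khepri_stage("3.13.7")` returns the "experimental, not supported" entry, which is the cue to keep Khepri out of production on that cluster.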

Key takeaways:

- Never use Khepri in production environments with RabbitMQ 3.13.x.
- It’s okay to consider using Khepri in production with RabbitMQ 4.0.x. However, we strongly recommend testing it thoroughly in staging or development environments first, because once you enable Khepri, it cannot be disabled. Testing also confirms compatibility with your existing workflows and lets you identify and address potential issues early, minimizing risk during the transition.

Before Migrating: Moving from Classic Mirrored Queues to Quorum Queues

Note: Skip this section if you are not working with Classic mirrored queues.

Classic mirrored queues are not compatible with Khepri – if you have classic queue mirroring enabled, consider migrating to Quorum queues first. Unfortunately, you cannot directly change a classic mirrored queue to a quorum queue. Migration from Classic to Quorum queues can be achieved in a number of ways, depending on how much downtime you can tolerate:

- By deleting your Classic queues and redeclaring them as Quorum.
- By using a shovel to move messages from an old Classic queue to the Quorum queue.
- By setting up a new cluster with Quorum queues and moving messages via federation.

Read our blog on moving from Classic mirrored queues to Quorum queues to learn more.

What if I have too many Classic mirrored queues to update manually?

To efficiently migrate many queues, download your definitions, update them by removing HA policies and setting the queue type to quorum, then reimport them into a new vhost. This recreates the queues as quorum queues with the same names and settings, saving time and effort!

Here are the steps to take. Before starting, we recommend that you drain the messages from your Classic queues. If that is not possible, using shovels is an option.

Step 1: Export your instance definitions. You can do this in the RabbitMQ Management UI, in the Overview section under Export definitions.

Step 2: Create a new vhost and add permissions for your users to this vhost. For example:

 "vhosts": [
  {
      "name": "khepri-vhost",
      "description": "",
      "tags": [],
      "default_queue_type": "quorum",
      "metadata": {
          "description": "",
          "tags": [],
          "default_queue_type": "quorum"
      }
  },

Add permissions for your existing user:

  "permissions": [
  {
      "user": "user1",
      "vhost": "khepri-vhost",
      "configure": ".*",
      "write": ".*",
      "read": ".*"
  },

Step 3: Update each queue so that its "x-queue-type" argument is "quorum" and its vhost is the new vhost. For example, change:

 "queues": [
  {
      "name": "classic-queue",
      "vhost": "mnesia-vhost",
      "durable": true,
      "auto_delete": false,
      "arguments": {
          "x-queue-type": "classic"
      }
  },

To:

{
  "name": "classic-queue",
  "vhost": "khepri-vhost",
  "durable": true,
  "auto_delete": false,
  "arguments": {
      "x-queue-type": "quorum"
  }
},
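
With many queues, editing the file by hand gets tedious, so steps 2 and 3 can be scripted. Below is a minimal sketch in Python, assuming the export has the same layout as the excerpts above. The function name `convert_definitions` is ours, and treating every policy key that starts with `ha-` as an HA setting is an assumption. The sketch only touches vhosts, permissions, queues, and policies, so review the result before importing it.

```python
import copy


def convert_definitions(definitions, old_vhost, new_vhost):
    """Return a copy of an exported definitions dict in which the queues of
    old_vhost are declared as quorum queues in new_vhost."""
    result = copy.deepcopy(definitions)

    # Step 2a: declare the new vhost, mirroring the fields shown above.
    vhosts = result.setdefault("vhosts", [])
    if all(vhost.get("name") != new_vhost for vhost in vhosts):
        vhosts.append({
            "name": new_vhost,
            "description": "",
            "tags": [],
            "default_queue_type": "quorum",
            "metadata": {
                "description": "",
                "tags": [],
                "default_queue_type": "quorum",
            },
        })

    # Step 2b: give users of the old vhost the same permissions on the new one.
    permissions = result.setdefault("permissions", [])
    existing = {(p.get("user"), p.get("vhost")) for p in permissions}
    for permission in list(permissions):
        key = (permission.get("user"), new_vhost)
        if permission.get("vhost") == old_vhost and key not in existing:
            permissions.append({**permission, "vhost": new_vhost})
            existing.add(key)

    # Step 3: move each queue to the new vhost and mark it as quorum.
    for queue in result.get("queues", []):
        if queue.get("vhost") == old_vhost:
            queue["vhost"] = new_vhost
            queue.setdefault("arguments", {})["x-queue-type"] = "quorum"

    # Remove HA settings; a policy left with no settings is dropped.
    # Assumption: every key starting with "ha-" is an HA setting.
    kept = []
    for policy in result.get("policies", []):
        settings = {
            key: value
            for key, value in policy.get("definition", {}).items()
            if not key.startswith("ha-")
        }
        if settings:
            kept.append({**policy, "definition": settings})
    result["policies"] = kept

    return result
```

To use it, load the exported file with `json.load`, call `convert_definitions(data, "mnesia-vhost", "khepri-vhost")`, and write the result back with `json.dump`. The input dict is left unchanged, so you can compare the two versions before importing.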

Step 4: Import the definitions back into your instance. You can do this in the RabbitMQ Management UI, in the Overview section under Import definitions.
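
If you prefer scripting to clicking, the export and import can also go through the management HTTP API, which serves definitions at `/api/definitions` (GET to export, POST to import). The sketch below uses only the Python standard library. The helper names, the base URL, and the credentials are placeholders of ours, not values from your instance.

```python
import base64
import json
import urllib.request


def definitions_request(base_url, user, password, definitions=None):
    """Build a request for /api/definitions.

    Without `definitions` this is a GET (export); with it, a POST (import).
    """
    url = base_url.rstrip("/") + "/api/definitions"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if definitions is not None:
        data = json.dumps(definitions).encode()
        headers["Content-Type"] = "application/json"
    method = "GET" if data is None else "POST"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def export_definitions(base_url, user, password):
    """Download the definitions and return them as a dict."""
    request = definitions_request(base_url, user, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def import_definitions(base_url, user, password, definitions):
    """Upload a definitions dict and return the HTTP status code."""
    request = definitions_request(base_url, user, password, definitions)
    with urllib.request.urlopen(request) as response:
        return response.status
```

A typical run would call `export_definitions(...)`, edit the returned dict as described in steps 2 and 3, and pass the result to `import_definitions(...)`.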

Once imported successfully, ensure that your clients are pointing to the correct vhost and queues, and that you have removed your HA-policies. If you require any assistance, please contact our CloudAMQP Support team.

Migrate from Mnesia to Khepri

Take the following steps to make the move to Khepri.

Step 1: Ensure you are on the right RabbitMQ version

To have Khepri available to you, you need to be on RabbitMQ 4.0.3 or later. There are several ways to get to 4.0.3 on CloudAMQP.

Note: When upgrading to version 4.0.x, keep in mind that the ha-mode policy is no longer supported. Ensure your systems and clients work correctly without classic queue mirroring. Once confirmed, remove any HA policies before starting the upgrade.
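
The version requirement above is easy to get wrong with plain string comparison ("4.0.10" sorts before "4.0.3" as text), so here is a small sketch that compares versions numerically. The function names are ours. Where the version string comes from is up to you; one option is the `rabbitmq_version` field returned by the management API's `/api/overview` endpoint.

```python
import re

# Minimum RabbitMQ version for Khepri, as stated above.
MINIMUM_KHEPRI_VERSION = (4, 0, 3)


def parse_version(version):
    """Turn a version string such as '4.0.3' into a tuple of integers."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_khepri_minimum(version):
    """Return True if the version is 4.0.3 or later."""
    return parse_version(version) >= MINIMUM_KHEPRI_VERSION
```

Comparing tuples of integers means "4.0.10" is correctly treated as newer than "4.0.3".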

Upgrading via the “Get me to the latest RabbitMQ and Erlang version” option

This method is the safest and easiest way to upgrade quickly. If you are using a single-node instance, there will be a couple of minutes of downtime when you upgrade. If you are on a cluster of three or more nodes, the upgrade happens in a round-robin fashion, so there is no downtime and your cluster remains online as long as your connections point to the DNS load balancer.

Depending on how far you are from 4.0.3, you may need to upgrade more than once, because each step has to respect the RabbitMQ and Erlang compatibility matrix. Check out the Erlang Version Requirements or ask our CloudAMQP Support team to help map out the number of upgrades required.

Using the Blue-Green deployment

This involves creating a new cluster directly on 4.0.x. This way, you can import your definitions into the new cluster without hassle and without waiting more than 10 minutes. The benefit here is that you can test Khepri on a cluster separate from your customer-facing clusters and avoid any disruptions to your business.

Now that you are on the right RabbitMQ version and your cluster is ready, it’s time to take the next step.

Step 2: Enable Khepri

At the moment, CloudAMQP users will need to contact CloudAMQP Support to enable Khepri.

Enabling Khepri in RabbitMQ triggers a two-phase migration from Mnesia, managed by the khepri_mnesia_migration library.

Phase 1: Cluster Membership Synchronization

First, the cluster membership is synchronized so that Khepri's cluster matches the existing Mnesia setup. This includes aligning nodes and resolving any discrepancies, so that Mnesia and Khepri end up with the same view of the cluster membership.

Phase 2: Schema Records Copy

Once the first phase is completed, data is copied from Mnesia to Khepri via Mnesia’s Backup and Restore API. RabbitMQ remains online for most of the migration but is paused towards the end, so client operations may time out. Once the migration is complete, RabbitMQ switches to Khepri, and Mnesia is cleaned up.
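
Because client operations may time out during that pause, it is worth making sure your clients retry rather than fail outright. The helper below is a generic sketch, not tied to any particular client library. The name `call_with_retry` and its defaults are ours, and Python's built-in `TimeoutError` stands in for whatever exception your client raises on a timeout.

```python
import time


def call_with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation() and retry on timeout with exponential backoff.

    Waits base_delay, then twice that, and so on between attempts.
    Re-raises the timeout if the last attempt also fails.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:  # assumption: replace with your client's timeout error
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

You would wrap a publish or a queue declaration in a function and pass it as `operation`. The `sleep` parameter exists only so the delay can be replaced in tests.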

Reverting to Mnesia (and points of no return)

As mentioned earlier, by design, migrating to Khepri is irreversible, so there isn’t a one-click solution for reverting to Mnesia from Khepri. However, while you’re still in the experimental phase, it’s worth noting how to go back to using Mnesia.

Generally, if an error occurs during the migration process, all changes are undone, and RabbitMQ will continue operating with Mnesia as it did before.
If the migration went through and you just want to revert to Mnesia, we recommend the blue-green deployment, where you:
- Create a new Mnesia-based cluster.
- Import definitions from the Khepri-based cluster into the Mnesia-based cluster.
- Move messages from the Khepri-based cluster to the Mnesia-based cluster. This can be done via queue federation, with the old cluster as the federation upstream and the new cluster as the federation downstream.
- Move consumers to the new cluster. Once the old cluster is drained, move the producers to the new cluster as well.
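
The federation step above can be set up through the management HTTP API on the new (downstream) cluster: one PUT defines the upstream, and a second PUT adds a policy that federates the matching queues. The sketch below only builds the paths and JSON bodies. The function names, the upstream and policy names, and the URI are placeholders of ours, and it assumes the federation plugin is enabled on the new cluster.

```python
from urllib.parse import quote


def federation_upstream_request(vhost, upstream_name, old_cluster_uri):
    """Path and body for the PUT that defines the old cluster as an upstream."""
    path = "/api/parameters/federation-upstream/{}/{}".format(
        quote(vhost, safe=""), quote(upstream_name, safe="")
    )
    body = {"value": {"uri": old_cluster_uri}}
    return path, body


def federation_policy_request(vhost, policy_name, upstream_name, queue_pattern):
    """Path and body for the PUT that federates queues matching queue_pattern."""
    path = "/api/policies/{}/{}".format(
        quote(vhost, safe=""), quote(policy_name, safe="")
    )
    body = {
        "pattern": queue_pattern,
        "apply-to": "queues",
        "definition": {"federation-upstream": upstream_name},
    }
    return path, body
```

Vhost names are percent-encoded in the path, so the default vhost `/` becomes `%2F`.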

Conclusion

Migrating to Khepri on CloudAMQP requires some preparation. Understanding Khepri's roadmap is an essential part of the preparation, as it outlines when Khepri becomes the default and Mnesia is phased out. Testing in a staging environment is critical since enabling Khepri is irreversible. If you use Classic Mirrored Queues, migrating to Quorum Queues is a necessary first step. Upgrading to RabbitMQ 4.0.x or higher is required, and blue-green deployments offer a safe way to test Khepri without disrupting production.

We’d be happy to hear from you! Please leave your suggestions, questions, or feedback in the comment section or reach out to our support team.