Mar 23 2023 11:22 PM
A few weeks ago, I was looking for a way to migrate data from one Cosmos DB NoSQL account to a second Cosmos DB NoSQL account.
On paper, it seems rather simple, but ultimately not so much.
Some might ask: why?
In fact, for one of my critical projects, we initially decided to deploy a Cosmos DB account in Serverless mode, because our users were exclusively in Western Europe.
But a few months later, the scope of the project radically changed. Now the data must be accessible worldwide. OK, no worries.
1. Potential solution: Geo-replication
Good news: the Azure Cosmos DB service offers a geo-replication feature. The problem is that this feature is not available in Serverless mode, only in Provisioned Throughput mode, which ultimately seems consistent.
So I cannot go that way.
2. Potential solution: Data restoration
After a few minutes of thinking, I tell myself that it does not matter: I will just restore the data via the Point In Time Restore (PiTR) option.
But I meet a new disappointment: the new Cosmos DB account created during the restore is of the same type as the initial one, in my case a Serverless account.
Ok, for now, I am not lucky.
3. Potential solution: Well I have to look, but why not a migration?
So I start my research like Sherlock Holmes with my pipe, my magnifying glass and my K-way (sorry I didn't have a raincoat handy).
After a few minutes, I come across the official Microsoft documentation page whose title is
Options to migrate your on-premises or cloud data to Azure Cosmos DB
Hmm, given the title, this might interest me, so I start taking off my K-way because it's really hot.
The documentation is quite well done, as is often the case with Microsoft, to be honest. It covers different scenarios and offers two types of migration, namely "Online" and "Offline".
4. Potential solution: Migration proposed by Microsoft
I find many migration use cases there, with different types of sources such as Azure Cosmos DB of course, but also JSON or CSV files, not to mention Oracle and Apache Cassandra.
After a few moments, I list what seems to work for my use case:
With my magnifying glass, I look at the various proposed solutions available to me...
... and the more I advance, the more I realize that they require a lot of effort, and some of them even require deploying new services.
Hmm, okay!
Before going any further, I go back to my Cosmos DB account to see what it contains.
I count 1 DB with 3 containers, holding relatively little data.
When I weigh the pros and cons of each solution, I quickly see that they are almost all a sledgehammer to crack a nut for such a relatively simple need.
But on the other hand, I have no choice: this migration is mandatory, and as Terence Hill and Bud Spencer said in their movie: Go for It!
Still, there is no urgency, so I'll see if I can find something simpler, and in the worst case, I'll always have a fallback among the solutions seen previously.
5. Considered solution: Migration with Azure Cosmos DB data migration tool
Continuing my research, I come across an announcement from Microsoft dating from April 2015, about the Azure Cosmos DB Data Migration tool.
Well, I admit that 2015 is a long time ago, but I'm going to dig a little, so I trade my pipe for a small shovel.
This open source tool imports data into Azure Cosmos DB from different data sources like:
You saw it like I did: Cosmos DB to Cosmos DB!
My pupils start to dilate, my hair (well, what I have left of it) falls out, and I find myself in my underwear saying:
Once back to my normal appearance (well, to any appearance at all), I start to browse the various links mentioned in the announcement and come across the Git repo of the tool.
I have the impression that luck has finally changed sides, but then I come across the first sentence and read:
The Azure Cosmos DB data migration tool is undergoing a full refactor to restructure the project...
Ahhhhhhhhhhhhhhh this is driving me crazy, someone is playing with me, there's no other way!
But as I'm tenacious, I still decide to visit the archive branch of the project and end up downloading version 1.8.3 which dates from August 2021, which isn't so bad when you think about it.
6. Azure Cosmos DB data migration tool testing
I launch the tool via the executable dtui.exe (yes, I work on Windows, and I'm proud of it), I go through the doc, and the operation seems very simple.
In my example, the source account is aziedb1amo008 and the destination account is aziedb1amo900.
I want to migrate my StarWars DB along with its containers, namely People, Planets and Species.
What? I told you that it was a critical project.
The first thing to do is therefore to define our source account by specifying the connection string, with the name of the database appended at the end of the field, as well as the collection, which is none other than our container.
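To make "the name of the database at the end" a bit more concrete, here is a minimal Python sketch of how such a connection string can be assembled. It assumes the `AccountEndpoint=...;AccountKey=...;Database=...;` shape, which is how I remember it from the tool's documentation, so double-check it against the doc. The `build_connection_string` helper is hypothetical, the endpoint URL is only guessed from the account name, and the key is a placeholder.

```python
def build_connection_string(endpoint: str, key: str, database: str) -> str:
    """Return a connection string with the database name appended at the end.

    Assumed shape: AccountEndpoint=...;AccountKey=...;Database=...;
    """
    parts = [
        f"AccountEndpoint={endpoint}",
        f"AccountKey={key}",
        f"Database={database}",  # the part the migration tool needs in addition
    ]
    return ";".join(parts) + ";"


# Placeholder values: the endpoint is an assumption derived from the account
# name used in this article, and the key is obviously not a real one.
source_connection_string = build_connection_string(
    endpoint="https://aziedb1amo008.documents.azure.com:443/",
    key="<source-account-key>",
    database="StarWars",
)
print(source_connection_string)
```

In practice, you can simply copy the primary connection string from the Azure portal and add the `Database=StarWars` part by hand; the helper only makes the expected layout explicit.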
We click on Verify in order to validate that the connection to the account is established correctly.
Bingo, we can go to the next step.
Next, we will define our destination account.
As in the previous step, we define the connection string, the name of the DB which will be created automatically if it does not exist, the collection and the partition key.
If you want, you can define a log file in CSV format.
And finally, the last step gives you a small summary, and you just have to click on Import.
Well, not quite, because I also wanted to migrate the Planets and Species containers, so I follow the same steps for each of them to achieve my goal.
After a minute or two, you can see that my DB, my containers, and even my data are on the new Cosmos DB account, which is quite nice.
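Repeating the same steps for each container is fine for three of them, but the idea can also be sketched as a small loop. To be clear, this is not what the migration tool does internally: it is a scripted alternative, assuming the `azure-cosmos` Python SDK, whose container clients expose `read_all_items()` and `upsert_item()` and whose database clients expose `get_container_client()`. It also assumes the target containers already exist. The `Fake*` classes and the sample items are made up so that the sketch runs without an Azure account.

```python
from typing import Any, Dict, Iterable

# Properties generated by Cosmos DB itself; they are dropped so that the
# target account writes its own values.
SYSTEM_FIELDS = {"_rid", "_self", "_etag", "_attachments", "_ts"}


def copy_container(source: Any, target: Any) -> int:
    """Copy every item from source to target and return how many were copied."""
    copied = 0
    for item in source.read_all_items():
        clean = {k: v for k, v in item.items() if k not in SYSTEM_FIELDS}
        target.upsert_item(clean)
        copied += 1
    return copied


def copy_database(source_db: Any, target_db: Any, names: Iterable[str]) -> Dict[str, int]:
    """Copy the named containers one by one and report the item count of each."""
    return {
        name: copy_container(
            source_db.get_container_client(name),
            target_db.get_container_client(name),
        )
        for name in names
    }


class FakeContainer:
    """Hypothetical in-memory stand-in for a container client."""

    def __init__(self, items=None):
        self.items = {item["id"]: item for item in (items or [])}

    def read_all_items(self):
        return list(self.items.values())

    def upsert_item(self, body):
        self.items[body["id"]] = body
        return body


class FakeDatabase:
    """Hypothetical in-memory stand-in for a database client."""

    def __init__(self, containers):
        self.containers = containers

    def get_container_client(self, name):
        return self.containers[name]


names = ["People", "Planets", "Species"]
source_db = FakeDatabase({
    "People": FakeContainer([{"id": "1", "name": "Luke Skywalker", "_etag": "abc"}]),
    "Planets": FakeContainer([{"id": "1", "name": "Tatooine"}, {"id": "2", "name": "Hoth"}]),
    "Species": FakeContainer([]),
})
target_db = FakeDatabase({name: FakeContainer() for name in names})

report = copy_database(source_db, target_db, names)
print(report)
```

With real accounts, `source_db` and `target_db` would be database clients obtained from the SDK instead of the fakes. For a small dataset like mine this naive item-by-item copy should be enough, but it is only a sketch, not a tuned migration.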
And of course, it also works with data other than Star Wars, like Pikachu or Marvel!
But you can also try with your own dataset.