First published on MSDN on Mar 13, 2018
Editor's note: The following post was written by Data Platform MVP Johan Åhlén as part of our Technical Tuesday series. Daron Yondem of the Technical Committee served as the Technical Reviewer of this piece.
Azure Cosmos DB is Microsoft’s distributed, multi-model cloud database service for managing data at planet-scale. It started in 2010 as “Project Florence,” to address the pain points faced by developers of large Internet-scale applications inside Microsoft. In 2015, the first API was made available externally in the form of Azure DocumentDB. Finally, Azure Cosmos DB was launched in May 2017 and was recently honored in InfoWorld’s 2018 Technology of the Year Awards.
This article will give you an introduction to Azure Cosmos DB and then take you through a sample scenario of building a globally scalable web application using Azure Cosmos DB and Azure Traffic Manager - called project Planetzine.
But first, let’s delve into Azure Cosmos DB features and pricing
Azure Cosmos DB comes with multiple APIs, including a SQL API and APIs compatible with MongoDB, Cassandra, and Gremlin.
This means that if you have an existing app that already talks to its database through one of these APIs, for example Gremlin, you can just point the app to your Azure Cosmos DB account.
What also makes Azure Cosmos DB stand out from other distributed database services is that it offers five different consistency models: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.
The most popular one, the Session consistency model, comes with a read-your-own-writes guarantee, which means that a client will never get anything older than the last version of information that the client wrote.
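To make the read-your-own-writes guarantee concrete, here is a minimal sketch in C#. It does not show how Azure Cosmos DB is implemented; the Replica and SessionClient types are hypothetical and only model the idea that a client remembers the version of its own last write (a session token) and will not read from a replica that has not caught up to that version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a replica: a value plus the version it has applied so far.
public sealed class Replica
{
    public long Version { get; private set; }
    public string Value { get; private set; }

    // Applies a write only if it is newer than what the replica already holds.
    public void Apply(long version, string value)
    {
        if (version > Version)
        {
            Version = version;
            Value = value;
        }
    }
}

// Hypothetical client that tracks a session token: the version of its own latest write.
public sealed class SessionClient
{
    private long _sessionToken;

    // Writes go to the primary replica; the client remembers the version it produced.
    public void Write(Replica primary, string value)
    {
        long next = primary.Version + 1;
        primary.Apply(next, value);
        _sessionToken = next;
    }

    // Reads are served by the first replica that has caught up to the session token,
    // so the client never sees anything older than its own last write.
    public string Read(IEnumerable<Replica> replicas)
    {
        Replica fresh = replicas.FirstOrDefault(r => r.Version >= _sessionToken);
        if (fresh == null)
        {
            throw new InvalidOperationException("No replica has caught up yet.");
        }
        return fresh.Value;
    }
}
```

In this model, a client that has just written skips a lagging replica and reads from one that already contains its write, while a different client without that session token may still be served the older data.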
Some other great features of Azure Cosmos DB are:
Azure Cosmos DB Architecture
To start using Azure Cosmos DB, you need at least one account. Each account can have multiple databases (and each database can be of any model). Depending on the database model, the objects stored in the database are projected into containers/partitions/items. For instance, in the SQL API, the containers are called “Collections,” and the items are called “Documents.”
Partitioning is optional, but without a partition key, there is a 10 GB limit on the size of the container, and throughput is capped. I strongly recommend that you always use partitioning.
Pricing
The pricing has two components: storage and reserved throughput.
Performance (or “throughput”) is measured in RUs (“Request Units”). Each operation has an RU cost. Reading a 1 KB item costs about 1 RU, while writing and changing data costs more. The RUs represent a weighted measure of the CPU, disk and network cost of your operations. If your requests demand more RUs per second than you have reserved, they will be rate-limited (throttled). Note that, currently, the minimum billing is 400 RUs per second per container (or a minimum of 1000 RUs per second if you want to enable unlimited growth). Since the price is per container, you could keep costs down by letting multiple data types share the same container.
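As a rough illustration of the arithmetic, the following sketch estimates how many RUs per second a workload needs and whether a given reservation would be throttled. The ThroughputEstimator class is hypothetical, the per-operation costs are inputs you should measure for your own documents, and rounding up to a multiple of 100 RUs per second is an assumption about how throughput is reserved.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope throughput planning.
public static class ThroughputEstimator
{
    // Total RUs per second = reads/s * cost per read + writes/s * cost per write.
    public static double RequiredRuPerSecond(
        double readsPerSecond, double readCostRu,
        double writesPerSecond, double writeCostRu)
    {
        return readsPerSecond * readCostRu + writesPerSecond * writeCostRu;
    }

    // Rounds up to the next multiple of 100 RU/s (assumption) and never goes
    // below the 400 RU/s minimum mentioned in the article.
    public static int ReservationFor(double requiredRuPerSecond)
    {
        int rounded = (int)(Math.Ceiling(requiredRuPerSecond / 100.0) * 100);
        return Math.Max(400, rounded);
    }

    // Requests are rate-limited when the demand exceeds the reservation.
    public static bool WouldBeThrottled(double requiredRuPerSecond, int reservedRuPerSecond)
    {
        return requiredRuPerSecond > reservedRuPerSecond;
    }
}
```

For example, 500 reads per second at 1 RU each plus 20 writes per second at an assumed 5 RUs each comes to 600 RUs per second, which would be throttled on a 400 RU container.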
If you distribute your data geographically, you will have to pay storage costs and reserved RUs for all the regions, so the total cost becomes approximately proportional to the number of regions where your data is distributed. The exact price details for your selected regions are available on the Azure pricing page.
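The proportionality can be written down as a small calculation. This is only a sketch: the CostEstimator class is hypothetical, the unit prices are parameters that you need to look up for your own regions, and 730 hours per month is an assumed average.

```csharp
// Hypothetical helper that scales the single-region cost by the number of regions.
public static class CostEstimator
{
    public static decimal MonthlyCost(
        int regions, int reservedRuPerSecond, decimal storageGb,
        decimal pricePer100RuPerHour, decimal pricePerGbPerMonth,
        int hoursPerMonth = 730)
    {
        // Throughput is billed per 100 RU/s per hour, storage per GB per month.
        decimal throughput = reservedRuPerSecond / 100m * pricePer100RuPerHour * hoursPerMonth;
        decimal storage = storageGb * pricePerGbPerMonth;

        // Every region holds a full copy of the data and its own reserved RUs.
        return regions * (throughput + storage);
    }
}
```

With the same inputs, replicating to three regions returns three times the single-region figure.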
For development purposes, there are free options, but they, of course, have limitations. You can also download the Azure Cosmos DB Emulator and test your application locally.
Now let’s get to work with ‘Project Planetzine’
Azure Cosmos DB is useful for a wide range of applications, but in some cases, it is especially beneficial:
Now, imagine that we are starting a global magazine, “Planetzine.” The magazine has to be scalable and elastic, and data consistency is important. We also need to make sure the website is very fast for visitors worldwide, so the content has to be replicated globally.
Planetzine Architecture
To serve website visitors from the nearest region, “Planetzine” uses Azure Cosmos DB, regional Azure Web Apps, and Azure Traffic Manager.
You can run this sample scenario with any number of regional websites. For best performance, and to have read operations stay within the visitor’s region, you should replicate your Azure Cosmos DB database to the same regions as the websites.
Creating the Azure Services
Let’s select the SQL API. Any API could be used, but my code is written for the SQL API.
Note: All the APIs are native to Cosmos DB and equally supported by Microsoft. For your own applications, you should select the API that works best for you.
Azure Cosmos DB
In the Azure Portal, click “New” and choose Cosmos DB:
Consistency will by default be set to “Session”, which is fine.
There will be no costs for your Azure Cosmos DB until you create a collection, and you don’t need to create any collection manually.
Note that you will later need your URI and PRIMARY KEY (or SECONDARY KEY), which can be found by clicking “Keys”:
Azure Web Apps
In the Azure Portal, click “New” and choose Web App:
When completed, click on “Get publish profile” and save the file somewhere safe. You will need to import it into Visual Studio later.
Azure Traffic Manager profile
The Azure Traffic Manager will help automatically route website visitors to the nearest region. It looks at the IP address of visitors to decide which Web App they should be sent to.
In the Azure Portal, click “New” and choose to create a Traffic Manager profile:
Choose Performance as the routing method (this means traffic will automatically go to the nearest location).
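The idea behind the Performance routing method can be sketched as picking the healthy endpoint with the lowest latency for a given visitor. This is an illustration only, not Traffic Manager's actual implementation; the PerformanceRouting class, the endpoint names and the latency numbers are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of latency-based endpoint selection.
public static class PerformanceRouting
{
    // latencyMs maps an endpoint name to the latency (in milliseconds)
    // measured from the visitor's network. Returns null if no endpoint is healthy.
    public static string ChooseEndpoint(
        IReadOnlyDictionary<string, double> latencyMs,
        ISet<string> healthyEndpoints)
    {
        return latencyMs
            .Where(kv => healthyEndpoints.Contains(kv.Key))
            .OrderBy(kv => kv.Value)
            .Select(kv => kv.Key)
            .FirstOrDefault();
    }
}
```

If the closest Web App is unavailable, the same selection falls back to the next closest one, which is the behavior you want from the regional websites in this scenario.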
When completed, click “Endpoints” and add your Azure Web Apps:
The address will be “http://profilename.trafficmanager.net”. If you want to assign a custom domain (“www.yoursite.com”), follow these instructions.
Deploying the Web App
Now I suggest you download the source code for the web app. You will need Visual Studio 2017 (or equivalent) to build and publish the web app to your Azure accounts.
The web application has been developed in Visual Studio using ASP.NET MVC. However, it could easily have been developed using other platforms such as Node.js, Java or Python, since Azure Cosmos DB provides client libraries for multiple platforms.
This is what you need to do:
Web.config
These are the settings in Web.config that you should review:
<add key="ConnectionMode" value="Direct" />
<add key="ConnectionProtocol" value="Tcp" />
<add key="InitialThroughput" value="400" />
<add key="MaxConnectionLimit" value="500" />
<add key="DatabaseId" value="Planetzine" />
<add key="ConsistencyLevel" value="Session" />
<!-- CONFIGURE THESE! -->
<add key="EndpointURL" value="https://planetzine.documents.azure.com:443/" />
<add key="AuthKey" value="EnterYourSecretKeyHere" />
<!-- CONFIGURE THESE! -->
These configuration parameters are read by the DbHelper class in the web app.
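As an illustration of how such settings could be turned into a client, here is a minimal sketch using the Microsoft.Azure.DocumentDB package. It is not the actual DbHelper code from the sample; the CosmosClientFactory class and its method names are hypothetical, and the sketch assumes that the setting values match the SDK's enum names (as “Direct”, “Tcp” and “Session” do).

```csharp
using System;
using System.Collections.Specialized;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical factory; the real DbHelper class may be structured differently.
public static class CosmosClientFactory
{
    // Builds the connection policy from the ConnectionMode, ConnectionProtocol
    // and MaxConnectionLimit settings.
    public static ConnectionPolicy CreatePolicy(NameValueCollection settings)
    {
        return new ConnectionPolicy
        {
            ConnectionMode = (ConnectionMode)Enum.Parse(typeof(ConnectionMode), settings["ConnectionMode"]),
            ConnectionProtocol = (Protocol)Enum.Parse(typeof(Protocol), settings["ConnectionProtocol"]),
            MaxConnectionLimit = int.Parse(settings["MaxConnectionLimit"])
        };
    }

    // Creates the client from the EndpointURL, AuthKey and ConsistencyLevel settings.
    public static DocumentClient CreateClient(NameValueCollection settings)
    {
        var consistency = (ConsistencyLevel)Enum.Parse(typeof(ConsistencyLevel), settings["ConsistencyLevel"]);
        return new DocumentClient(
            new Uri(settings["EndpointURL"]),
            settings["AuthKey"],
            CreatePolicy(settings),
            consistency);
    }
}
```

In an ASP.NET application, the settings collection would typically be System.Configuration.ConfigurationManager.AppSettings.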
The SQL API
*Note: The following is already done if you download my source code.
To use the Azure Cosmos DB SQL API in your ASP.NET web application, or any .NET application, you need to add a reference to the SDK. In Visual Studio, right-click the project, choose “Manage NuGet packages…” and install the Microsoft.Azure.DocumentDB package (note that it is called DocumentDB because the API was previously known as the DocumentDB API).
The SQL API (Microsoft.Azure.DocumentDB) is pretty straightforward to use. Going through the API would be outside of the scope of this article, so I recommend looking at the Azure Cosmos DB: SQL API documentation.
Creation of databases and collections
Databases and collections are created (if they don’t already exist) every time the web app starts. This is done from Global.asax.cs.
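A startup routine of this kind could look roughly like the sketch below, which uses the create-if-not-exists calls of the Microsoft.Azure.DocumentDB package. It is not the code from Global.asax.cs in the sample; the StartupSketch class and the EnsureCreatedAsync method are hypothetical names.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical startup helper; safe to call on every application start.
public static class StartupSketch
{
    public static async Task EnsureCreatedAsync(
        DocumentClient client, string databaseId, string collectionId,
        string partitionKeyPath, int throughput)
    {
        // Does nothing if the database already exists.
        await client.CreateDatabaseIfNotExistsAsync(new Database { Id = databaseId });

        // Define the collection and its partition key path, for example "/partitionId".
        var collection = new DocumentCollection { Id = collectionId };
        collection.PartitionKey.Paths.Add(partitionKeyPath);

        // Does nothing if the collection already exists; otherwise creates it
        // with the requested reserved throughput (RUs per second).
        await client.CreateDocumentCollectionIfNotExistsAsync(
            UriFactory.CreateDatabaseUri(databaseId),
            collection,
            new RequestOptions { OfferThroughput = throughput });
    }
}
```

The database id and throughput would come from the DatabaseId and InitialThroughput settings in Web.config. Because both calls are idempotent, repeating them on each start does not create duplicates.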
If you are using a non-free Azure Cosmos DB account, remember to delete the databases/collections when you are done testing, so you don’t get charged unnecessarily.
The DbHelper class
*Note: this is information about the DbHelper class, which is also part of my source code.
One of the most important classes in my sample web application is the DbHelper class. It handles all communications with the Azure Cosmos DB database.
Note that the DbHelper class is static and that there will never be more than one client. It’s best practice never to use multiple clients to communicate with a single endpoint.
Note also that all methods of the DbHelper class are asynchronous. This is also best practice since the SQL API itself is asynchronous, and we should avoid any blocking.
The DbHelper class supports the basic operations: creating a database, creating a collection, creating a document, upserting a document (which replaces the document if it already exists and otherwise creates it), reading and querying.
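To give an idea of what two of these operations look like with the Microsoft.Azure.DocumentDB package, here is a minimal asynchronous sketch of an upsert and a query. It is not the DbHelper implementation from the sample; the DbOperationsSketch class and its method signatures are hypothetical, and enabling cross-partition queries is an assumption made so that the query works regardless of the partition key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Hypothetical helper; the real DbHelper class may look different.
public static class DbOperationsSketch
{
    // Replaces the document if one with the same id exists, otherwise creates it.
    public static Task UpsertAsync<T>(
        DocumentClient client, string databaseId, string collectionId, T document)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
        return client.UpsertDocumentAsync(collectionUri, document);
    }

    // Runs a LINQ query and reads all result pages without blocking.
    public static async Task<List<T>> QueryAsync<T>(
        DocumentClient client, string databaseId, string collectionId,
        Expression<Func<T, bool>> predicate)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
        var options = new FeedOptions { EnableCrossPartitionQuery = true };

        var query = client.CreateDocumentQuery<T>(collectionUri, options)
            .Where(predicate)
            .AsDocumentQuery();

        var results = new List<T>();
        while (query.HasMoreResults)
        {
            results.AddRange(await query.ExecuteNextAsync<T>());
        }
        return results;
    }
}
```

A call such as QueryAsync<Article>(client, "Planetzine", Article.CollectionId, a => a.Visible) would then return the visible articles.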
The Article class
The structure of the Article class looks like this (in shortened form):
public class Article
{
    public const string CollectionId = "articles";
    public const string PartitionKey = "/partitionId";

    [JsonProperty("id")]
    public Guid ArticleId;

    [JsonProperty("partitionId")]
    public string PartitionId => Author;

    [JsonProperty("heading")]
    public string Heading;

    [JsonProperty("imageUrl")]
    public string ImageUrl;

    [JsonProperty("body")]
    public string Body;

    [JsonProperty("tags")]
    public string[] Tags;

    [JsonProperty("visible")]
    public bool Visible;

    [JsonProperty("author")]
    public string Author;

    [JsonProperty("publishDate")]
    [JsonConverter(typeof(IsoDateTimeConverter))]
    public DateTime PublishDate;

    [JsonProperty("lastUpdate")]
    [JsonConverter(typeof(IsoDateTimeConverter))]
    public DateTime LastUpdate;
}
Conclusion
InfoWorld’s article outlining the “Technology of the Year 2018” writes:
“Do you need a distributed NoSQL database with a choice of APIs and consistency models? That would be Microsoft’s Azure Cosmos DB.”
Azure Cosmos DB is easy to use. It supports multiple APIs (including a SQL API) and is compatible with MongoDB, Cassandra, and Gremlin. It also comes with client libraries in multiple different programming languages (with 5-minute quickstarts available).
Azure Cosmos DB is also useful for pretty much all sizes of applications, from small to planet-scale. You can start small and seamlessly grow data volume, throughput and number of regions. Just be careful about partition keys. The choice of partition key is very important. Typically, good partition keys are customer-ids, user-ids or device-ids. Don’t worry about having too many partitions. As your data grows, it is beneficial for the number of partitions to grow as well.
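Why many distinct key values help can be seen with a small simulation that hashes partition key values into a fixed number of buckets. This is a sketch only: the PartitionSketch class is hypothetical, and the FNV-1a style hash is a stand-in for whatever hash function the service uses internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation of how partition key values spread over partitions.
public static class PartitionSketch
{
    // Stand-in hash in the style of FNV-1a; not the hash used by Azure Cosmos DB.
    private static uint Hash(string key)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in key)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Maps a partition key value to one of partitionCount buckets.
    public static int BucketFor(string partitionKey, int partitionCount)
    {
        return (int)(Hash(partitionKey) % (uint)partitionCount);
    }

    // Counts how many documents land in each bucket for the given key values.
    public static int[] Distribution(IEnumerable<string> partitionKeys, int partitionCount)
    {
        var counts = new int[partitionCount];
        foreach (string key in partitionKeys)
        {
            counts[BucketFor(key, partitionCount)]++;
        }
        return counts;
    }
}
```

Feeding in a thousand documents that share one key value puts all of them in a single bucket, whereas a thousand distinct user-ids spread over several buckets, so storage and throughput can be shared between partitions.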
Azure Cosmos DB is also a stable database service that comes with comprehensive guarantees (SLAs) of high availability and low latencies. It empowers developers to make precise tradeoffs in latency, throughput, availability, and consistency by offering five well-defined consistency levels.
Finally, Microsoft has a feedback website where you can vote for Azure Cosmos DB ideas/features and submit your ideas for voting. Microsoft actively comments on the ideas, and it gives some pretty good hints about what new things to expect in Azure Cosmos DB in the future!
Johan Åhlén is an internationally recognized consultant and Data Platform MVP. He is passionate about innovation and new technologies, and shares this passion through articles, presentations and videos. Johan founded and co-managed PASS SQLRally Nordic, the largest SQLRally conference in the world. He also founded SolidQ in Sweden. Johan built up the Swedish SQL Server Group and served as the president of the user group from 2009 to 2016. In 2017 he founded the Skåne Azure User Group. He has been recognized by TechWorld/Computer Sweden as one of the top developers in Sweden.