One of the big new things that the Hyperscale (Citus) option in the Azure Database for PostgreSQL managed service enables—in addition to scaling out Postgres horizontally—is sharding Postgres on a single Hyperscale (Citus) node. With a new Hyperscale (Citus) feature in preview called “Basic tier”, you can start small and cost-effectively while staying ready to scale out horizontally at any time.
With the new Basic tier feature in Hyperscale (Citus)—currently in preview—you can now shard Postgres on a single node.
With Basic tier, you can use all the features you’ve come to expect in a standard Hyperscale (Citus) server group. In other words, Basic tier is not limited in any way: you don’t have to sacrifice any Hyperscale (Citus) capabilities.
You can read all about the Citus feature that underpins this new Basic tier feature in my recent open source blog post about sharding Postgres on a single Citus node, which we first rolled out in the new and spectacular Citus 10 open source release.
In this post, let’s walk through how to provision Basic tier on Hyperscale (Citus) in the Azure portal. And then, how to scale out your Hyperscale (Citus) node from Basic tier to Standard tier by adding more nodes in the portal.
What’s the Standard tier you ask? Standard tier is the new name for a multi-node Hyperscale (Citus) cluster, to differentiate a multi-node cluster from a Basic tier with just a single Hyperscale (Citus) node.
Also in this post, we’ll show you some of the ways it can be useful to shard Postgres on a single Hyperscale (Citus) node—with the new Basic tier. In particular, I like the cost-effective angle, since at $0.27 USD/hour, you can try out Hyperscale (Citus) for ~8 hours and you’ll only pay $2-3 USD.
To start using Basic tier in Hyperscale (Citus) in the Azure Database for PostgreSQL managed service, as always, a good place to start is docs.microsoft.com:
All the instructions you need to provision are in the Quickstart doc (link above) for provisioning a Basic tier in Hyperscale (Citus). Still, I want to highlight some of the steps in this visual guide, with screenshots from the Azure portal. Starting with:
Once you enable the preview features and get to the Compute + storage screen in the Azure portal’s provisioning workflow, remember to choose Basic tier:
As you follow the Basic tier quickstart documentation to provision Hyperscale (Citus), be sure to pay special attention to:
Provisioning can take a few minutes. After you’ve provisioned your Hyperscale (Citus) server group with Basic tier, you’ll see a “Your deployment is complete” screen. You can click the blue “Go to resource” button to go to the Basic tier instance you just created in the Azure portal. Now you’re ready to follow the steps below to connect to the database.
In this post, I’ll use psql—an interactive terminal front-end for PostgreSQL—to connect to the Hyperscale (Citus) database and start sharding Postgres by creating distributed tables. (Azure Tip: You can also use psql as part of Azure Cloud Shell!) Simply run the psql command in the command line and connect to the database.
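As a rough sketch—the exact host, database, and user names here are placeholders; copy the real values from your server group’s “Connection strings” page in the Azure portal—the psql invocation looks something like:

```shell
# Replace <server-group> and citus with the values shown in the
# Azure portal; sslmode=require is needed for Azure connections
psql "host=<server-group>.postgres.database.azure.com \
      port=5432 dbname=citus user=citus sslmode=require"
```

You’ll be prompted for the administrator password you chose during provisioning.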
-- Create a table with the usual PostgreSQL syntax
CREATE TABLE users_table (user_id bigserial primary key, age int);
-- Convert the table to a distributed table
SELECT create_distributed_table('users_table', 'user_id');
The Hyperscale (Citus) create_distributed_table function divides the Postgres table across 32 shards, but you can still query Postgres as if all the data were in one table. Because the shards are regular Postgres tables, you can still rely on the wealth of Postgres relational database features like transactions, indexes, database constraints, JOINs, and more.
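For instance—assuming Citus 10, which ships the citus_tables view—you can confirm how the table was distributed right from psql:

```sql
-- List distributed tables with their distribution column and shard count
SELECT table_name, citus_table_type, distribution_column, shard_count
FROM citus_tables;
```

For users_table, this should report a distributed table on the user_id column with the default shard count.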
As you can see above, in just a few clicks, we were able to provision a Hyperscale (Citus) database using Basic tier—and then create a distributed table on a single Hyperscale (Citus) node.
Next, let’s explore why you would want to use this new Basic tier anyway. How can sharding Postgres on a single node help you?
Sharding Postgres has long been associated with large-scale data sizes. In fact, when most of you think about how Citus shards Postgres, you probably imagine a Hyperscale (Citus) cluster with 2 or 4 worker nodes, or perhaps 20 or 50 or even 100 worker nodes. But with the introduction of Basic tier on Hyperscale (Citus), we can all think about sharding differently.
Even if the data sizes are not large, sharding Postgres on a single Hyperscale (Citus) node may provide immediate benefits. By using a distributed data model, you can get:
For example, after sharding the users_table above with the create_distributed_table function, the following SQL commands will now run in parallel across the Hyperscale (Citus) shards, which can significantly decrease execution times:
-- load data, ingestion happens in parallel across shards
INSERT INTO users_table (age)
SELECT 20 + (random() * 70)::int
FROM generate_series(0, 100000);
-- this query runs in parallel across all shards
SELECT avg(age) FROM users_table;
-- index created in parallel across all shards
CREATE INDEX user_age ON users_table (age);
When your data no longer fits in memory—or if the database server becomes CPU bound—then with Basic tier in Hyperscale (Citus), your data will already be sharded and you’ll be able to easily add more nodes to your cluster to keep your database performant.
In other words, with Basic tier in Hyperscale (Citus) you are already prepared to scale out, or what I like to call “scale-out-ready.”
When you’re ready to scale, it’s easy to add worker nodes to your Hyperscale (Citus) Basic tier. After all, your data model is already distributed. And you don’t need to migrate, or to upgrade, or to go through any kind of hassle. You simply add more worker nodes in the portal and graduate from the Basic tier to the Standard tier, as we’ll do together below.
When you add worker nodes to a Hyperscale (Citus) cluster, there’s no data on them yet. This is because all the shards are still on the old nodes—in this case, with Basic tier, all the data is on the single Hyperscale (Citus) node. So, initially, these new worker nodes will just sit idle.
This is where the Hyperscale (Citus) shard rebalancing feature comes in. Shard rebalancing ensures that the shards are distributed fairly across all the nodes in the cluster. The Citus shard rebalancer does this by moving shards from one node to another.
While a shard is being moved, all read and write queries can continue. In other words, Hyperscale (Citus) enables online rebalancing—or what some people call zero-downtime rebalancing—of the data.
Below, we’ll add new nodes to the Hyperscale (Citus) database cluster and rebalance shards across the cluster. We’ll just add 2 worker nodes, but you can scale out your Hyperscale (Citus) server group to as many nodes as you need.
Once you’ve added the new Hyperscale (Citus) worker nodes and you’ve graduated your Basic tier to a Standard tier, you are ready to do some shard rebalancing.
You can connect back to the coordinator via psql with the same connection string used earlier.
As I mentioned earlier, during the rebalance operation, all read and write queries can continue on the Hyperscale (Citus) database. In other words, your production workloads are not affected by the shard rebalancing.
To start the shard rebalance, run the rebalance_table_shards function.
-- move shards to new worker node(s)
SELECT rebalance_table_shards();
NOTICE: Moving shard 102008 from c.blogtest.postgres.database.azure.com:5432 to w1.blogtest.postgres.database.azure.com:5432 ...
NOTICE: Moving shard 102009 from c.blogtest.postgres.database.azure.com:5432 to w0.blogtest.postgres.database.azure.com:5432 ...
....
NOTICE: Moving shard 102028 from c.blogtest.postgres.database.azure.com:5432 to w1.blogtest.postgres.database.azure.com:5432 ...
During the shard rebalance operation in Hyperscale (Citus), you can monitor its progress. To do that, follow the steps below:
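One way to watch the rebalance—using the built-in get_rebalance_progress function that ships with recent Citus versions—is to query it from a second psql session while rebalance_table_shards runs in the first:

```sql
-- In a separate session: one row per shard move, with a progress
-- column (0 = not yet started, 1 = moving, 2 = moved)
SELECT * FROM get_rebalance_progress();
```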
Now that the shards have been distributed across the Hyperscale (Citus) cluster, your application can use the resources on the worker node(s) as well as the coordinator node. Aside from the increased compute, memory, and disk now available, from your application’s perspective nothing has changed.
After adding 2 worker nodes to the Hyperscale (Citus) database cluster, and after rebalancing shards across the cluster, your application is still talking to the same Postgres database. Congratulations, you have seamlessly scaled out your Postgres database with Hyperscale (Citus)!
You may be wondering:
“Why use Hyperscale (Citus) if I don’t need to scale out Postgres right now?”
Well, if you think that your database is not going to grow in the future (and that your database will remain less than ~100GB), then one of the other deployment options in our Azure Database for PostgreSQL managed service—such as Single Server or Flexible Server—is likely to handle your workload quite well. However, if you expect your application (and therefore your Postgres database) to grow over time, then this question becomes relevant to you.
If you start small with Hyperscale (Citus)—by sharding Postgres on a single Hyperscale (Citus) node Basic tier—then as your application grows (in terms of users or activity or features or database size), you’ll be able to easily add nodes and use the zero downtime shard rebalancer on Hyperscale (Citus). Adding more nodes to your Hyperscale (Citus) server group will enable you to fit more data into memory, have higher I/O bandwidth, and provide more compute power (CPU) for your Postgres database. In other words, you will have all the means to keep your database performant for your application even as your application grows.
There are a few things you need to take into account to make Hyperscale (Citus) scale-out-ready with the new Basic tier. In scale-out architectures, the data is distributed based on a sharding key (what we call a distribution column in Citus). The sharding key you choose will influence not only how the data gets distributed, but also what kind of query performance you will get. So a bit of up-front thinking about your query patterns and your data model can go a long way. For example, it is often not performant (or even possible) to enforce UNIQUE constraints on Postgres columns that do not include the distribution column.
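To illustrate that last point—with a hypothetical events table distributed on tenant_id—a unique constraint can be enforced when it includes the distribution column:

```sql
CREATE TABLE events (
    tenant_id bigint,
    event_id bigserial,
    payload jsonb,
    -- the primary key includes the distribution column, so Citus
    -- can enforce uniqueness locally within each shard
    PRIMARY KEY (tenant_id, event_id)
);
SELECT create_distributed_table('events', 'tenant_id');
```

A UNIQUE constraint on event_id alone, by contrast, would require cross-shard coordination and is not supported for hash-distributed tables.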
If you follow the data modeling best practices of a distributed database when using the Basic tier in Hyperscale (Citus) in Azure Database for PostgreSQL, you can easily scale out your database. The Citus best practices for scaling out your Postgres database seamlessly include:
If you build your database conforming to the best practices outlined above, the promise of Hyperscale (Citus) is that you’ll be able to scale out your Postgres database to some pretty large database cluster sizes. The key point here is that once you get used to the mindset of scale-out systems, you’ll realize how easy it is to follow the data modeling best practices.
Another interesting scenario for using Basic tier in Hyperscale (Citus) is with your development and test environments. As I mentioned earlier, the Basic tier does not have any limitations compared to Standard tier. So, for many users, Basic tier can be a practical and cost-effective way of provisioning development and testing databases.
As I mentioned earlier in the blog, the Basic tier is currently in preview. In fact, there are several super useful Hyperscale (Citus) features in preview, including:
We are so excited to bring you the new Basic tier feature—now in preview—in Hyperscale (Citus) on Azure Database for PostgreSQL.
And while “opens the door to new possibilities” may sound lofty, it’s true. Basic tier in Hyperscale (Citus) gives you a way to be “scale-out-ready” on day zero in the cloud. If your application is already running on a single-node Postgres, you now have a way to adopt a sharded data model that allows you to scale your Postgres database as much as you need in the future. In essence, as your application grows and you need to scale out, you won’t be faced with any type of database migration challenges.
Some useful links & resources when you are ready to try out Citus:
And if you have any questions about our Hyperscale (Citus) option in our Azure Database for PostgreSQL managed service, my product teammates would love to hear from you. You can always reach out to us over email at AskAzureDBforPostgreSQL. Or follow us on Twitter @AzureDBPostgres. We’d love to know what you think.