Citus is an extension to Postgres that lets you distribute your application’s workload across multiple nodes. Whether you are using Citus open source or using Citus as part of a managed Postgres service in the cloud, one of the first things you do when you start using Citus is to distribute your tables. While distributing your Postgres tables you need to decide on some properties such as distribution column, shard count, colocation. And even before you decide on your distribution column (sometimes called a distribution key, or a sharding key), when you create a Postgres table, your table is created with an access method.
Previously you had to decide on these table properties up front, and then you went with your decision. Or if you really wanted to change your decision, you needed to start over. The good news is that in Citus 10, we introduced 2 new user-defined functions (UDFs) to make it easier for you to make changes to your distributed Postgres tables.
Before Citus 9.5, if you wanted to change any of the properties of the distributed table, you would have to create a new table with the desired properties and move everything to this new table. But in Citus 9.5 we introduced a new function, undistribute_table. With the
undistribute_table function you can convert your distributed Citus tables back to local Postgres tables and then distribute them again with the properties you wish. But undistributing and then distributing again is … 2 steps. In addition to the inconvenience of having to write 2 commands, undistributing and then distributing again has some more problems:
So, in Citus 10, we introduced 2 new functions to reduce the steps you need to make changes to your tables:
In this post you’ll find some tips about how to use the
alter_distributed_table function to change the shard count, distribution column, and the colocation of a distributed Citus table. And we'll show how to use the
alter_table_set_access_method function to change, well, the access method. An important note: you may not ever need to change your Citus table properties. We just want you to know, if you ever do, you have the flexibility. And with these Citus tips, you will know how to make the changes.
When you distribute a Postgres table with the create_distributed_table function, you must pick a distribution column and set the
distribution_column parameter. During the distribution, Citus uses a configuration variable called shard_count for deciding the shard count of the table. You can also provide
colocate_with parameter to pick a table to colocate with (or colocation will be done automatically, if possible).
However, after the distribution if you decide you need to have a different configuration, starting from Citus 10, you can use the alter_distributed_table function.
alter_distributed_table has three parameters you can change:
Citus divides your table into shards based on the distribution column you select while distributing. Picking the right distribution column is crucial for having a good distributed database experience. A good distribution column will help you parallelize your data and workload better by dividing your data evenly and keeping related data points close to each other. However, choosing the distribution column might be a bit tricky when you’re first getting started. Or perhaps later as you make changes in your application, you might need to pick another distribution column.
distribution_column parameter of the new
alter_distributed_table function, Citus 10 gives you an easy way to change the distribution column.
Let’s say you have customers and orders that your customers make. You will create some Postgres tables like these:
CREATE TABLE customers (customer_id BIGINT, name TEXT, address TEXT); CREATE TABLE orders (order_id BIGINT, customer_id BIGINT, products BIGINT);
When first distributing your Postgres tables with Citus, let’s say that you decided to distribute the
customers table on
customer_id and the
orders table on
SELECT create_distributed_table ('customers', 'customer_id'); SELECT create_distributed_table ('orders', 'order_id');
Later you might realize distributing the
orders table by the
order_id column might not be the best idea. Even though
order_id could be a good column to evenly distribute your data, it is not a good choice if you frequently need to join the
orderstable with the
customers table on the
customer_id. When both tables are distributed by
customer_id you can use colocated joins, which are very efficient compared to joins on other columns.
So, if you decide to change the distribution column of
orders table into
customer_id here is how you do it:
SELECT alter_distributed_table ('orders', distribution_column := 'customer_id');
orders table is distributed by
customer_id. So, the customers and the orders of the customers are in the same node and close to each other, and you can have fast joins and foreign keys that include the
You can see the new distribution column on the citus_tables view:
SELECT distribution_column FROM citus_tables WHERE table_name::text = 'orders';
Shard count of a distributed Citus table is the number of pieces the distributed table is divided into. Choosing the shard count is a balance between the flexibility of having more shards, and the overhead for query planning and execution across the shards. Like distribution column, the shard count is also set while distributing the table. If you want to pick a different shard count than the default for a table, during the distribution process you can use the citus.shard_count configuration variable, like this:
CREATE TABLE products (id BIGINT, name TEXT); SET citus.shard_count TO 20; SELECT create_distributed_table ('products', 'id');
After distributing your table, you might decide the shard count you set was not the best option. Or your first decision on the shard count might be good for a while but your application might grow in time, you might add new nodes to your Citus cluster, and you might need more shards. The
alter_distributed_table function has you covered in the cases that you want to change the shard count too.
To change the shard count you just use the
SELECT alter_distributed_table ('products', shard_count := 30);
After the query above, your table will have 30 shards. You can see your table’s shard count on the
SELECT shard_count FROM citus_tables WHERE table_name::text = 'products';
When two Postgres tables are colocated in Citus, the rows of the tables that have the same value in the distribution column will be on the same Citus node. Colocating the right tables will help you with better relational operations. Like the shard count and the distribution column, the colocation is also set while distributing your tables. You can use the
colocate_with parameter to change the colocation.
SELECT alter_distributed_table ('products', colocate_with := 'customers');
Again, like the distribution column and shard count, you can find information about your tables’ colocation groups on the
SELECT colocation_id FROM citus_tables WHERE table_name IN ('products', 'customers');
You can also use
none keywords with
colocate_with parameter to change the colocation group of the table to default, or to break any colocation your table has.
To colocate distributed Citus tables, the distributed tables need to have the same shard counts. But if the tables you want to colocate don’t have the same shard count, worry not, because
alter_distributed_table will automatically understand this. Then your table’s shard count will also be updated to match the new colocation group’s shard count.
Here is a tip! If you want to change multiple properties of your distributed Citus tables at the same time, you can simply use multiple parameters of the
For example, if you want to change both the shard count and the distribution column of a table here's how you do it:
SELECT alter_distributed_table ('products', distribution_column := 'name', shard_count := 35);
If your table is colocated with some other tables and you want to change the shard count of all of the tables to keep the colocation, you might be wondering if you have to alter them one by one... which is multiple steps.
Yes (you can see a pattern here) the Citus tip is that you can use the
alter_distributed_table function to change the properties of all of the colocation group.
If you decide the change you make with the
alter_distributed_table function needs to be done to all the tables that are colocated with the table you are changing, you can use the
SET citus.shard_count TO 10; SELECT create_distributed_table ('customers', 'customer_id'); SELECT create_distributed_table ('orders', 'customer_id', colocate_with := 'customers'); -- when you decide to change the shard count -- of all of the colocation group SELECT alter_distributed_table ('customers', shard_count := 20, cascade_to_colocated := true);
You can see the updated shard count of both tables on the
SELECT shard_count FROM citus_tables WHERE table_name IN ('customers', 'orders');
Another amazing feature introduced in Citus 10 is columnar storage. This Citus 10 columnar blog post walks you through how it works and how to use columnar tables (or partitions) with Citus—complete with a Quickstart. Oh, and Jeff made a short video demo about the new Citus 10 columnar functionality too—it’s worth the 13 minutes to watch IMHO.
With Citus columnar, you can optionally choose to store your tables grouped by columns—which gives you the benefits of compression, too. Of course, you don’t have to use the new columnar access method—the default access method is “heap” and if you don’t specify an access method, then your tables will be row-based tables (with the heap access method.)
It would not be fair to introduce this cool new Citus columnar access method without also giving you a way to convert your tables to columnar. So Citus 10 also introduced a way to change the access method of tables.
SELECT alter_table_set_access_method('orders', 'columnar');
You can use alter_table_set_access_method to convert your table to any other access method too, such as
heap, Postgres’s default access method. Also, your table doesn’t even need to be a distributed Citus table. You can also use
alter_table_set_access_method with Citus reference tables as well as regular Postgres tables. You can even change the access method of a Postgres partition with
If you’ve read the blog post about undistribute_table, the function Citus 9.5 introduced for turning distributed Citus tables back to local Postgres tables, you mostly know how the
alter_table_set_access_method functions work. Because we use the same underlying methodology as the
undistribute_table function. Well, we improved upon it.
Dropping a table for the purpose of re-creating the same table with different properties is not a simple task. Dropping the table will also drop many things that depend on the table.
Just like the
undistribute_table function, the
alter_table_set_access_method functions do a lot to preserve the properties of the table you didn’t want to change. The functions will handle indexes, sequences, views, constraints, table owner, partitions and more—just like undistribute_table.
alter_table_set_access_method will also recreate the foreign keys on your tables whenever possible. For example, if you change the shard count of a table with the
alter_distributed_table function and use
cascade_to_colocated := true to change the shard count of all the colocated tables, then foreign keys within the colocation group and foreign keys from the distributed tables of the colocation group to Citus reference tables will be recreated.
If you want to learn more about our previous work which we build on for
alter_table_set_access_method functions go check out our blog post on undistribute_table.
In Citus 10 we worked to give you more tools and more capabilities for making changes to your distributed database. When you’re just starting to use Citus, the new
alter_table_set_access_method functions—along with the
undistribute_table function—are all here to help you experiment and find the database configuration that works the best for your application. And in the future, if and when your application evolves, these three Citus functions will be ready to help you evolve your Citus database, too.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.