Previously we introduced Getting started with Azure Cosmos Database (A Deep Dive) blog which is an end-to-end introduction of Azure Cosmos DB. In this blog we are going to talk about one of the Azure Cosmos Database API known as Azure Cosmos Database for PostgreSQL.
Imagine a scenario where you are developing a multi-tenant application, as a developer, you have been working with PostgreSQL for years, you have joined a team that is developing an application that needs to handle data for thousands of tenants, each with isolated data. The team decides to use Azure Cosmos DB for PostgreSQL.
In this blog, we shall leverage the azure cosmos DB for PostgreSQL which extends PostgreSQL with more enhanced capabilities. It will be a step-by-step guide on how to leverage this service which we are being provided and managed on Azure.
Azure Cosmos DB for PostgreSQL is a managed service for PostgreSQL extended with Citus superpower of distributed tables. Citus enables you to scale your application as requirements grow. This can be done by distributing them into different nodes of distributed tables.
Azure cosmosDB for postgreSQL empowers you to build a scalable and globally distributed application while leveraging your existing postgreSQL
Citus for PostgreSQL is a powerful extension that transforms Postgres into a distributed database, allowing you to achieve high performance at any scale. Each tenant’s data can reside on a separate shard, enabling parallel processing and efficient queries. As new tenants join, you can dynamically add shards without downtime.
We will have a look for more about Citus later in the blog. To learn more about Citus, visit the following link: citus documentation
Step 1: Create a Cluster.
Step 2: Click on create.
Step 3: select Create on the PostgreSQL tile
Step 4: Fill in the form to provision our cluster
Options you have:
Step 5: Select Next to proceed to networking
For our demo purposes, adding a firewall rule using Add 0.0.0.0-255.255.255.255, It will make your IP address access any host on the internet and it poses a threat. It is advised to use the rule for temporary purposes like testing a feature and later remove the rule.
Congratulations, you have Created an Azure Cosmos DB for PostgreSQL cluster.
Open PSQL shell (a feature on preview) is a terminal based front-end to PostgreSQL that enables you to type in queries interactively, issue them to PostgreSQL, and see the query results. With this feature, you don't need to copy your connection strings to connect. To be authenticated, you will be required to enter your password.
In the image below, click on quick start to open the open PSQL shell window, click on Open PSQL shell icon indicated with number (2) at the top. You will open a terminal which will require you to enter the password you provided earlier.
To check the available databases, write:
As earlier stated, I promised to explain more about citus, let us get a better understanding of citus for PostgreSQL.
Citus for PostgreSQL is an extension that transforms PostgreSQL into a distributed database system. It is designed to horizontally scale out PostgreSQL across multiple servers, enabling it to handle larger datasets and higher query loads than a single PostgreSQL instance could manage alone.
Citus achieves this by partitioning tables and distributing data across a cluster of PostgreSQL servers. It also provides parallel query execution across these servers, allowing queries to be processed more quickly.
Citus is particularly useful for applications that require high scalability, such as large-scale analytics, real-time data processing, and multi-tenant applications.
We shall see how citus will enable us to create distributed tables in PostgreSQL, but first let us create tables for our citus database to see its importance.
CREATE TABLE github_users ( user_id bigint, url text, login text, avatar_url text, gravatar_id text, display_login text );
CREATE TABLE github_events(event_id bigint,event_type text,event_public boolean,repo_id bigint,payload jsonb,repo jsonb,user_id bigint,org jsonb,created_at timestamp);
The indexes created are used to allow fast querying in the JSONB.
CREATE INDEX event_type_index ON github_events (event_type);
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);
You should see the following on executing the commands.
create_distributed_table() is the magic function that Azure Cosmos DB for PostgreSQL provides to distribute tables and use resources across multiple machines. The function decomposes tables into shards, which can be spread across nodes for increased storage and compute performance.
SELECT create_distributed_table('github_users', 'user_id');
SELECT create_distributed_table('github_events', 'user_id');
Note that if you do not distribute tables, the worker nodes cannot help running your queries.
To load a large amount of data, we are using data on azure blob storage. To achieve this, we first must create the extension in our database:
SELECT * FROM create_extension('azure_storage');
-- download users and store in table
COPY github_users FROM 'https://pgquickstart.blob.core.windows.net/github/users.csv.gz';
-- download events and store in table
COPY github_events FROM 'https://pgquickstart.blob.core.windows.net/github/events.csv.gz';
You should see 264308 and 126245 has been successfully loaded within a short period of time.
We can review details of our distributed tables, including their sizes. Run the following commands:
SELECT * FROM citus_tables;
Like any other database, we make use of it by querying to get information. Let us run some queries for our citus database which is powered with distributed tables.
SELECT count(*) FROM github_users;
Azure Cosmos DB for PostgreSQL automatically runs the count on all shards in parallel, and combines the results because data is divided between multiple nodes.
SELECT created_at, event_type, repo->>'name' AS repo_name FROM github_events WHERE user_id = 3861633;
The results came in within a few seconds because of the power of the distributed tables in Azure Cosmos DB for PostgreSQL.
I do hope you have learnt from the blog and now have an idea of the power of Azure Cosmos DB for PostgreSQL. It is now time to practice and develop applications that make use AI (Artificial Intelligence) capabilities for fast processing of data.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.