Previously we introduced Getting started with Azure Cosmos Database (A Deep Dive) blog which is an end-to-end introduction of Azure Cosmos DB. In this blog we are going to talk about one of the Azure Cosmos Database API known as Azure Cosmos Database for PostgreSQL.
Imagine a scenario where you are developing a multi-tenant application, as a developer, you have been working with PostgreSQL for years, you have joined a team that is developing an application that needs to handle data for thousands of tenants, each with isolated data. The team decides to use Azure Cosmos DB for PostgreSQL.
What is covered in this Blog
- Challenges with traditional PostgreSQL
- What is Azure Cosmos DB for PostgreSQL
- Citus for PostgreSQL.
- Provisioning Azure Cosmos DB for PostgreSQL via azure Portal.
- Open PSQL Shell on Azure (new feature on preview)
- How to create & query distribute tables in Azure Cosmos DB for PostgreSQL
In this blog, we shall leverage the azure cosmos DB for PostgreSQL which extends PostgreSQL with more enhanced capabilities. It will be a step-by-step guide on how to leverage this service which we are being provided and managed on Azure.
Challenges with traditional PostgreSQL
- Scaling Horizontally: As the user base grows, you need to scale your database horizontally to accommodate more tenants. Traditional PostgreSQL doesn’t natively support easy horizontal scaling across multiple nodes.
- Global Distribution: You need a database solution that can seamlessly replicate data across different regions for low-latency access, which is normally not the case with PostgreSQL.
- High Availability and Failover: Traditional PostgreSQL requires manual setup for replication and failover.
So, what is Azure Cosmos DB for PostgreSQL?
Azure Cosmos DB for PostgreSQL is a managed service for PostgreSQL extended with Citus superpower of distributed tables. Citus enables you to scale your application as requirements grow. This can be done by distributing them into different nodes of distributed tables.
Azure cosmosDB for postgreSQL empowers you to build a scalable and globally distributed application while leveraging your existing postgreSQL
Features of Azure Cosmos DB for PostgreSQL.
- Automatic High availability - Azure Cosmos DB for PostgreSQL ensures that your database remains available even in the face of failures. It automatically replicates data across multiple nodes and regions, minimizing downtime.
- Backups - Azure Cosmos DB for PostgreSQL automatically backs up your data, allowing you to restore it in case of accidental deletions, corruption, or other issues.
- Read-Replicas - Read-replicas allows you to offload read-heavy workloads from the primary database. This improves read performance and ensures that your application can handle high read traffic.
- Easy Monitoring - Monitoring your database’s health and performance is essential. Azure Cosmos DB for PostgreSQL provides built-in monitoring tools.
- Private Endpoint - By using private endpoints, you can restrict access to authorized networks, preventing exposure to the public internet.
- Encryption - Azure Cosmos DB for PostgreSQL encrypts data both at rest and in transit.
Citus for PostgreSQL.
Citus for PostgreSQL is a powerful extension that transforms Postgres into a distributed database, allowing you to achieve high performance at any scale. Each tenant’s data can reside on a separate shard, enabling parallel processing and efficient queries. As new tenants join, you can dynamically add shards without downtime.
We will have a look for more about Citus later in the blog. To learn more about Citus, visit the following link: citus documentation
Create an Azure Cosmos DB for PostgreSQL cluster in Azure portal
Prerequisites
- An azure account create azure account
- Azure subscription learn more on subscriptions
Part A: Basics
Step 1: Create a Cluster.
- Sign in to your azure account: signin to Azure and you will be taken to the dashboard.
- Create an Azure Cosmsos DB for PostgreSQL cluster using the direct link: Create an Azure Cosmos DB for PostgreSQL cluster or use the search option to find the resource.
Step 2: Click on create.
Step 3: select Create on the PostgreSQL tile
Step 4: Fill in the form to provision our cluster
Options you have:
- Subscription - choose the subscription to use.
- Resource Group - create resource group for your resource
- Cluster name - will determine the DNS name of your application.
- Location – Choose location near you.
- PostgreSQL version - choose major option like 15 or the latest you have.
- Database name - you can leave it as citus or change to your preferred name.
- Admin username - must be citus, do not change it.
- Password – choose a password for your access
- Scale – this option allows you to specify the amount of computing you need for your resource, feel free to click configure to adjust.
Part B: Networking
Step 5: Select Next to proceed to networking
- Select Allow public access from Azure services and resources within Azure to this cluster.
For our demo purposes, adding a firewall rule using Add 0.0.0.0-255.255.255.255, It will make your IP address access any host on the internet and it poses a threat. It is advised to use the rule for temporary purposes like testing a feature and later remove the rule.
- Select Review+ Create to do a validation then select Create to create the cluster. It will take a few minutes to deploy your resource, after it is done, click Go to resource.
Congratulations, you have Created an Azure Cosmos DB for PostgreSQL cluster.
Connect to a cluster via Open PSQL Shell on Azure
Open PSQL shell (a feature on preview) is a terminal based front-end to PostgreSQL that enables you to type in queries interactively, issue them to PostgreSQL, and see the query results. With this feature, you don't need to copy your connection strings to connect. To be authenticated, you will be required to enter your password.
In the image below, click on quick start to open the open PSQL shell window, click on Open PSQL shell icon indicated with number (2) at the top. You will open a terminal which will require you to enter the password you provided earlier.
To check the available databases, write:
Create and distribute tables in Azure Cosmos DB for PostgreSQL
As earlier stated, I promised to explain more about citus, let us get a better understanding of citus for PostgreSQL.
Citus for PostgreSQL is an extension that transforms PostgreSQL into a distributed database system. It is designed to horizontally scale out PostgreSQL across multiple servers, enabling it to handle larger datasets and higher query loads than a single PostgreSQL instance could manage alone.
Citus achieves this by partitioning tables and distributing data across a cluster of PostgreSQL servers. It also provides parallel query execution across these servers, allowing queries to be processed more quickly.
Citus is particularly useful for applications that require high scalability, such as large-scale analytics, real-time data processing, and multi-tenant applications.
We shall see how citus will enable us to create distributed tables in PostgreSQL, but first let us create tables for our citus database to see its importance.
Create Tables
- Run the following queries to create github_users and github_events tables:
CREATE TABLE github_users ( user_id bigint, url text, login text, avatar_url text, gravatar_id text, display_login text );
CREATE TABLE github_events(event_id bigint,event_type text,event_public boolean,repo_id bigint,payload jsonb,repo jsonb,user_id bigint,org jsonb,created_at timestamp);
- Create indexes event_type_index and payload_index on github_users and github_events respectively.
The indexes created are used to allow fast querying in the JSONB.
CREATE INDEX event_type_index ON github_events (event_type);
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);
You should see the following on executing the commands.
Distributed tables
create_distributed_table() is the magic function that Azure Cosmos DB for PostgreSQL provides to distribute tables and use resources across multiple machines. The function decomposes tables into shards, which can be spread across nodes for increased storage and compute performance.
- Run the following commands to create distributed tables.
SELECT create_distributed_table('github_users', 'user_id');
SELECT create_distributed_table('github_events', 'user_id');
Note that if you do not distribute tables, the worker nodes cannot help running your queries.
- Load Data into Distributed Tables
To load a large amount of data, we are using data on azure blob storage. To achieve this, we first must create the extension in our database:
SELECT * FROM create_extension('azure_storage');
- Copy data from Blob Storage: Run the following commands to fetch data from CSV files and load into database tables:
-- download users and store in table
COPY github_users FROM 'https://pgquickstart.blob.core.windows.net/github/users.csv.gz';
-- download events and store in table
COPY github_events FROM 'https://pgquickstart.blob.core.windows.net/github/events.csv.gz';
You should see 264308 and 126245 has been successfully loaded within a short period of time.
- Viewing Distributed Tables
We can review details of our distributed tables, including their sizes. Run the following commands:
SELECT * FROM citus_tables;
Run queries in Azure Cosmos DB for PostgreSQL
Like any other database, we make use of it by querying to get information. Let us run some queries for our citus database which is powered with distributed tables.
- To check count of GitHub users:
- count all rows (across shards)
SELECT count(*) FROM github_users;
Azure Cosmos DB for PostgreSQL automatically runs the count on all shards in parallel, and combines the results because data is divided between multiple nodes.
- To find all events for a single user, run the following command.
SELECT created_at, event_type, repo->>'name' AS repo_name FROM github_events WHERE user_id = 3861633;
The results came in within a few seconds because of the power of the distributed tables in Azure Cosmos DB for PostgreSQL.
I do hope you have learnt from the blog and now have an idea of the power of Azure Cosmos DB for PostgreSQL. It is now time to practice and develop applications that make use AI (Artificial Intelligence) capabilities for fast processing of data.
Read more
- Introduction to Azure Cosmos DB
- Azure Database for PostgreSQL Training
- Clusters in Azure Cosmos DB for PostgreSQL
- Shard data on worker nodes
Available SDKs
- Connect and run SQL commands using C#
- Connect and run SQL commands using Java
- Connect and run SQL commands using Ruby