R + Azure Database for PostgreSQL
Published Aug 14 2019 03:16 PM
Microsoft

Azure Database for PostgreSQL and R can be used together for data analysis: PostgreSQL as the database engine and R as the statistical tool. When a dataset may exceed the memory of your machine, it is better to load the data into the database engine and query it from R in smaller, digestible chunks.
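
One way to read a query result in chunks is DBI's result-set interface, which RPostgres implements. The sketch below is an illustration rather than part of the original walkthrough: the helper name `fetch_in_chunks`, its arguments, and the default chunk size are assumptions, and it expects an open DBI connection such as the `con` object created later in this article. It limits how many rows are converted into an R data frame at once; whether the driver also buffers the full result on the client depends on the driver.

```r
# Hypothetical helper (not from the original article): process a query result
# in fixed-size chunks instead of converting it to one large data frame.
fetch_in_chunks <- function(con, sql, chunk_size = 50, handle_chunk = print) {
  res <- DBI::dbSendQuery(con, sql)
  # Always release the result set, even if handle_chunk() fails
  on.exit(DBI::dbClearResult(res), add = TRUE)
  rows_seen <- 0
  while (!DBI::dbHasCompleted(res)) {
    chunk <- DBI::dbFetch(res, n = chunk_size)  # at most chunk_size rows
    if (nrow(chunk) == 0) break
    handle_chunk(chunk)
    rows_seen <- rows_seen + nrow(chunk)
  }
  invisible(rows_seen)  # total number of rows processed
}
```

With a live connection, a call such as `fetch_in_chunks(con, "SELECT * FROM iris", chunk_size = 50, handle_chunk = function(df) print(nrow(df)))` would print the size of each chunk as it arrives.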

 

In this article we will learn how to use R to perform the following tasks:

 

  • Create an Azure Database for PostgreSQL server using the AzureRMR package
  • Connect to Azure Database for PostgreSQL using the RPostgres package
  • Create databases and tables
  • Load data from a data frame into a table
  • Query data from a table using dplyr grammar
  • Visualize data from a table using ggplot2
  • Delete the table, the database, and the Azure Database for PostgreSQL server
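
The dplyr item in the list above relies on dbplyr translating dplyr verbs into SQL. That translation can be previewed without a server. The following is a minimal sketch, assuming the dplyr and dbplyr packages are installed; the simulated table, its column list, and the variable names are illustrative assumptions, and the default table name that dbplyr prints is only a placeholder.

```r
library(dplyr)
library(dbplyr)

# Simulated table: lazy_frame() needs no live connection, and
# simulate_postgres() asks dbplyr to generate PostgreSQL-flavoured SQL.
iris_lazy <- lazy_frame(Species = character(), Sepal.Length = numeric(),
                        con = simulate_postgres())

# Same grouping and counting pattern as the iris query in this article
query <- iris_lazy %>%
  group_by(Species) %>%
  summarize(count = n())

# Render the SQL that would be sent to the server
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

Against a real connection, the same pipeline built on `tbl(con, "iris")` stays lazy until it is printed or collected, so the grouping and counting run inside PostgreSQL rather than in R.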
# Install and load required packages
ipak <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) 
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
packages <- c("AzureRMR", "RPostgres", "tidyverse", "curl", "fun")  # "fun" provides random_password()
ipak(packages)

# Create Azure Database for PostgreSQL using AzureRMR package
subscriptionId <- "ffffffff-ffff-ffff-ffff-ffffffffffff"
resourceGroup <- "test_group"
location <- "southcentralus"
pgUserName <- "azureuser"
pgPassword <- random_password(length = 12, replace = FALSE, extended = FALSE)
pgServerName <- "testserver"

az <- create_azure_login()
sub <- az$get_subscription(subscriptionId)
rg <- sub$create_resource_group(resourceGroup, location)
parameters <- jsonlite::toJSON(list(
  administratorLogin=list(value=pgUserName),
  administratorLoginPassword=list(value=pgPassword),
  location=list(value=location),
  serverName=list(value=pgServerName),
  skuCapacity=list(value=2),
  skuFamily=list(value="Gen5"),
  skuName=list(value="GP_Gen5_2"),
  skuSizeMB=list(value=5120),
  skuTier=list(value="GeneralPurpose"),
  version=list(value="10"),
  backupRetentionDays=list(value=7),
  geoRedundantBackup=list(value="Disabled")
), auto_unbox=TRUE)
template <- "https://raw.githubusercontent.com/Azure/azure-postgresql/master/arm-templates/ExampleWithFirewallRule/template.json"
pgDeployment <- rg$deploy_template("myNewPostgreSQLServer",
                                   template=template,
                                   parameters=parameters,
                                   wait=TRUE)

# Connect to Azure Database for PostgreSQL
con <- dbConnect(RPostgres::Postgres(),
                 host= paste0(pgServerName, ".postgres.database.azure.com"),
                 dbname="postgres",
                 user=paste0(pgUserName, "@", pgServerName),
                 password=pgPassword)

# create iris database (irisTableName is reused as the database name)
irisTableName <- "iris"
# dbExecute() runs the statement and releases its result set
dbExecute(con, paste("CREATE DATABASE", irisTableName))
dbDisconnect(con)

# connect to iris database
con <- dbConnect(RPostgres::Postgres(),
                 host= paste0(pgServerName, ".postgres.database.azure.com"),
                 dbname=irisTableName,
                 user=paste0(pgUserName, "@", pgServerName),
                 password=pgPassword)

# create table iris and load data from iris dataframe
dbCreateTable(con, irisTableName, iris)
dbAppendTable(con, irisTableName, iris)
dbReadTable(con, irisTableName)
dbListFields(con, irisTableName)

# query iris table using dplyr
iristbl <- tbl(con, irisTableName)
iristbl %>% 
  group_by(Species) %>% 
  summarize(count=n())

# show the query string for dplyr 
iristbl %>% 
  group_by(Species) %>% 
  summarize(count=n()) %>% 
  show_query()

# visualize data using ggplot
irisTableData <- dbReadTable(con, irisTableName)
ggplot(data=irisTableData, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color=Species, shape=Species)) +
  xlab("Sepal Length") + 
  ylab("Sepal Width") +
  ggtitle("Sepal Length vs Width")

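# Cleanup
dbRemoveTable(con, irisTableName)
dbDisconnect(con)
# A database cannot be dropped while connected to it, so reconnect to "postgres" first
con <- dbConnect(RPostgres::Postgres(), dbname="postgres",
                 host=paste0(pgServerName, ".postgres.database.azure.com"),
                 user=paste0(pgUserName, "@", pgServerName), password=pgPassword)
dbExecute(con, paste("DROP DATABASE", irisTableName))
dbDisconnect(con)
rg$delete(confirm=FALSE)
rm(list = ls(all.names = TRUE))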
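
The script above builds the same host name and `user@server` login for each dbConnect() call. As a sketch of one way to avoid that repetition, the helper below collects those values in one place. The function name `pg_connection_args` and the variable `conn_args` are assumptions introduced for illustration; the host suffix and the login format are taken from the script itself.

```r
# Hypothetical helper (not from the original article): build the host, dbname
# and user values that the script passes to dbConnect().
pg_connection_args <- function(server, user, dbname = "postgres") {
  list(
    host   = paste0(server, ".postgres.database.azure.com"),
    dbname = dbname,
    user   = paste0(user, "@", server)  # user@server form used in the script
  )
}

# Example with the placeholder names from the script
conn_args <- pg_connection_args("testserver", "azureuser", dbname = "iris")
str(conn_args)
```

With a running server, the list could be passed on as `do.call(DBI::dbConnect, c(list(RPostgres::Postgres()), conn_args, list(password = pgPassword)))`, which keeps the password out of the helper.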

 

