As enterprise data scientists access SQL Server R Services to perform their in-database analytic work such as data mining and machine learning, the R language's full power cannot be materialized without custom yet secure R package installation. SQL Server has offered R package management functionality since 2016 (SQL R Services 9.0) which enables a team of data scientists to, for example, manage R packages on a per-database basis, keep R packages in sync with the database, share R packages and/or each have his/her own custom installation and automatically resolve R package dependencies during uninstallation. This also eliminates the need for the database administrator to understand R and its package dependencies and gives R packages enterprise-level authentication and security.
Currently the rx function APIs are only for use in a
SQL Server compute context
. They are a convenient wrapper of the suite of package management functions like
install.packages()
in base R.
Custom and secure package management is governed by scopes
and database roles
By default, any user can use the packages in the shared scope. The beauty is that, each rpkgs-private role user can also install his/her own version of this package in his/her private scope. And when this happens, the shared-scope package will be masked, meaning that when this user calls
library(<package name>)
, only his/her installed private-scope version of this package will be loaded.
We will walk through a simple example to mimic a mock yet typical scenario where data scientists collaborate in a database.
First we create a database called RevoTestDB. Then we need to
Step 2 will create the additional database roles rpkgs-private, rpkgs-shared and rpkgs-users which are specific for package management as shown on the left. Then we run the T-SQL script below to create the following database users, each taking a specific database role, with their level of permissions in descending order:
data scientist lead | db_owner |
senior data scientist | rpkgs-shared |
junior data scientist | rpkgs-private |
other users | rpkgs-users |
The script also grants other roles and permissions needed for these users to use the database as required in the
set up SQL Server R Services
step.
To install the package 'scales' in shared scope for anyone to use, the senior data scientist can run the following script, and by default he/she installed the lastest version (0.4.1) of 'scales' from MRAN :
Then any user can use the package by either passing the package name to
transformPackages
in an rx fucntion or loading the package in an
rxExec
call. For example,
Now say the junior data scientist wants to install an older version of scales (0.4.0) for himself/herself's own use, he/she can, for example, download the .zip file to his local working directory and run the following script to install scales 0.4.0 from the downloaded .zip file by specifying
repos = NULL
in
rxInstallPackages
and verify that's actually the version that's loaded. Other users will not be affected by the junior data scientist's installation and will still be using scales 0.4.1.
The data scientist lead can install/uninstall any packages on behalf of any user. And when he/she uninstalls a package, the dependencies are automatically taken care of by specifying
dependencies = TRUE
in
rxRemovePackages
.
The output will be something like this:
To change a SQL Server instance's system package installation manually, refer to Install Additional R Packages on SQL Server .
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.