In this blog post, we describe our work on enabling machine learning (ML) inference (aka scoring) of previously trained ML models using the newly introduced language extensions of SQL Server 2019. By implementing a set of APIs, users can interface SQL Server with an external process (such as an ML runtime in our scenario) in order to move data and results between the main execution engine and the external process (which will perform the model scoring). As we will show in this post, this method is more efficient and more intuitive than performing model scoring using SQL Server's ML Services capability via Python scripts.
To ground the description of such an external language implementation for model scoring with a concrete example, we will discuss scoring for ONNX models, as well as models that can be converted to ONNX (e.g., TensorFlow, PyTorch, scikit-learn). However, the technique described here is generic and can be used to invoke any other ML runtime.
Note that another alternative is SQL Server’s native scoring capability using the PREDICT keyword for scoring ONNX models. Although this capability is available on Azure SQL Edge and Azure SQL Managed Instance (in preview), it is not yet present in all SQL Server flavors, including on-premises versions.
The first step to use ONNX as an external language is to register that language in SQL Server with the following command.
CREATE EXTERNAL LANGUAGE ONNX
FROM (CONTENT=N'onnxextension.zip', FILE_NAME=N'onnxextension.dll')
To perform the language registration, we point to the onnxextension.zip file that contains the onnxextension.dll. This dll includes our implementation of the extensibility framework C++ APIs, as we will describe later.
Next, the user can create a stored procedure that uses the ONNX external language we registered above to perform model scoring of a model over some data using the sp_execute_external_script capability. The procedure takes as input the ID of the model to be used for scoring (the model is stored as a binary object in a SQL Server table and is identified by that ID) and a SQL query that defines the data to be used.
CREATE OR ALTER PROCEDURE dbo.sp_onnx_scoring (@model_id int, @input_query nvarchar(max))
AS
BEGIN
DECLARE @model_binary varbinary(max) = (SELECT Data FROM models WHERE id = @model_id);
EXECUTE sp_execute_external_script
@language = N'ONNX',
@script = N'',
@input_data_1=@input_query,
@parallel = 1,
@params = N'@model varbinary(max)',
@model = @model_binary
WITH RESULT SETS (("length of stay" bigint))
END;
The example stored procedure declares (by setting the @language variable) that it is using the ONNX external language extension (created earlier). The input data is read into the @input_data_1 variable and the model is read into the @model_binary and @model variables.
The ‘EXECUTE sp_execute_external_script’ invokes execution of the language extension. The setting of @parallel to 1 enables parallel (multi-threaded) execution of the language extension.
The ‘WITH RESULT SETS’ directive specifies the return types of the results. In this case, the model returns an integer. In general, the type of the result returned by the model should match or be able to be implicitly converted to the type specified in the ‘RESULT SETS’ clause.
The stored procedure needs to be saved once and can then be invoked later for scoring different models with input data as follows.
EXEC sp_onnx_scoring <model num>, “SELECT <feature list> FROM dbo.patients_table"
The ‘EXEC’ command in the example invokes the stored procedure with a model number and a T-SQL query string that specifies the input data consisting of all rows and a subset of columns (used as features for the model) from a database table (patients_table). The model number identifies a model that is stored as a binary string (along with other models) in another table in the database.
Fig. 1. High-level architecture diagram of ONNX extension for SQL Server
Figure 1 shows a high-level architecture of the implementation of ONNX as external language in SQL Server. The steps/components marked with a numbered circle are briefly described below.
As we mentioned earlier, SQL Server 2019 introduced the extensibility framework that users can exploit to create their own language extension and invoke execution in that language through T-SQL queries. This framework consists of a set of C++ APIs that users can implement to copy data and parameters from SQL Server, perform operations on that data, and return results and performance telemetry to SQL Server. Reference implementations of Java, R, and Python using this extensibility framework have been open-sourced.
We now briefly discuss how some of the APIs of the SQL Server language extensibility framework can be implemented to support ONNX as an external language. The implementation is compiled into a dll that can then be registered with SQL Server for this external language (step 1 in the architecture description shown earlier).
The Windows and SQL Server data types used in the code snippets below for the implementation are the same as those used in the other open-sourced reference implementations for the extensibility framework, e.g., in Java as an external language.
LPCWSTR onnxDllPath = L"onnxruntime.dll";
std::wstring fullPath = wstr.get();
fullPath += L"\\onnxruntime.dll";
auto load_result = LoadLibraryExW(fullPath.c_str(), NULL, 0);
const OrtApi *api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
……
return SQL_SUCCESS;
Note that the ONNX extension dll needs to be built with the delay load dlls option set for linker input properties, with the ORT dll specified as the dll to be delay loaded.
The OrtApi type definition is provided by the ORT C runtime API. The ORT_API_VERSION is set to the value corresponding to the ORT version, e.g., 4 for ORT v1.4.
We will call methods provided by the ORT by dereferencing the ‘api’ pointer. In Init, we call the api->CreateEnv(…) method provided by the ORT to create the runtime environment.
The call to Init and its execution corresponds to steps 4 and 5 of the architecture description shown earlier.
return EXTERNAL_LANGUAGE_EXTENSION_API;
if (!strcmp(reinterpret_cast<const char *>(ParamName), "@model")) {
SQLPOINTER m_modelPtr = new unsigned char[StrLen_or_Ind];
std::memcpy(m_modelPtr, ArgValue, StrLen_or_Ind);
}
……
return SQL_SUCCESS;
*RowsNumber = elemCount;
*Data = reinterpret_cast<SQLPOINTER *>(&(DataBuffer));
……
return SQL_SUCCESS;
We evaluated the performance of the following three different ways of scoring ONNX models with SQL Server. We used a SQL Server 2019 installation.
We used an input dataset ranging in size from 10K rows to approx. 3M rows. We measure end-to-end query runtimes, that include the total prediction time over all the rows in the particular table, with scoring for two ML models, namely a Decision Tree and a Logistic Regression. We average the runtimes over five executions of the same query.
Fig. 2. Query execution times with inference using different input sizes and models.
Figure 2 shows the average query run times as a function of the number of rows in the input data set, which is also equal to the number of predictions done with the models.
We make the following two observations from these experimental results.
Fig. 3. Query slowdowns, relative to in-process PREDICT, for different input sizes and models.
Figure 3 shows the average query run times relative to that with PREDICT with the same number of input rows. At small number of input rows, the computation for the model scoring is small. So, the overheads of setting up the external execution environment dominates. At larger input sizes, the slowdown values become smaller. In all cases, the slowdowns with Ext. C++ are smaller than with Ext. Python.
SQL Server 2019’s C++ extensibility framework offers a powerful capability to users for calling a wide range of ML model scoring engines to score corresponding ML models on tabular data provided by SQL Server. Our implementation enables users to perform scoring of ONNX models faster and more intuitively than with the existing Python execution. This blog post also provided an overview of how users can write their own ML engine interfacing extensions with the new extensibility framework.
We thank Ivan Popivanov, Jasraj Dange, Michelle Wallig, Carlo Curino, and members of the SQL Server extensibility team for their help and feedback on this work.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.