First published on MSDN on Jan 20, 2017
Microsoft R Server supports four cases of R transformations, such as transforms (lists of transform statements), rowSelection (a logical expression) and in-line expressions in formulas. This article focuses on how to use "transforms" and "transformFunc" to do variable transformation. All of the following examples use a SQL Server data source from the test database of Microsoft R Server as the input data of an rx function, and run in the SQL compute context (please note: transforms, transformFunc and their related parameters can be used in all compute contexts, including Teradata, Hadoop and Spark). To understand how data transformation works in RevoScaleR, let's go through some concepts first:
Lexical scoping: when R evaluates an expression, it first looks for an object in the local environment. If the object is not found by name there, R searches the enclosing environment of the local environment, then the enclosing environment of that environment, and so on.
Dynamic scoping: looking up variables in the calling environment rather than in the enclosing environment.
Calling environment: the environment from which the function was called. parent.frame() is used to get the calling environment.
Enclosing environment: the environment in which the function was created; it is the one used for lexical scoping. Every function has one and only one enclosing environment. environment() is used to get the enclosing environment.
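The four concepts above can be seen side by side in plain R. The following is a minimal base R sketch that does not involve RevoScaleR; the names multiplier, lexical_lookup, dynamic_lookup and caller are made up for illustration.

```r
# Environment in which the functions below are created (their enclosing environment).
defining_env <- environment()

multiplier <- 10   # lives in the enclosing environment of both lookup functions

lexical_lookup <- function() {
  # Free variable: R resolves it through the enclosing environment (lexical scoping).
  multiplier
}

dynamic_lookup <- function() {
  # Explicitly look in the calling environment instead (dynamic scoping).
  calling_env <- parent.frame()
  get("multiplier", envir = calling_env)
}

caller <- function() {
  multiplier <- 2   # local to caller(), which is the calling environment of both lookups
  c(lexical = lexical_lookup(), dynamic = dynamic_lookup())
}

result <- caller()

# environment() applied to a function returns its enclosing environment.
enclosing_is_defining <- identical(environment(lexical_lookup), defining_env)
```

Both lookups are called from the same place, but only the one that goes through parent.frame() sees the value set inside caller().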
1. Using transforms
In the rx functions of RevoScaleR, the transforms argument takes an expression of the form list(name = expression, ...) representing the first round of variable transformations, where each expression returns a vector. Within the expression you can change the content or data type of the vector, or remove it.
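To make the shape of such an expression concrete, here is a base R emulation. It is only a sketch: it does not call RevoScaleR, the column names are hypothetical, and treating an element set to NULL as "remove this variable" is an assumption made for illustration rather than documented behaviour.

```r
# Toy data standing in for one chunk of rows; the column names are hypothetical.
chunk <- list(age = c(23L, 35L, 41L), income = c(1000, 2500, 4000))

# Same shape as an expression given to transforms: list(name = expression, ...).
transform_expr <- quote(list(
  income_k = income / 1000,     # new variable derived from an existing one
  age      = as.numeric(age),   # change the data type of an existing variable
  income   = NULL               # assumption: NULL marks a variable for removal
))

# Evaluate each expression against the columns of the chunk.
new_vars <- eval(transform_expr, chunk)

# Merge the results back; modifyList() drops elements whose new value is NULL.
result <- modifyList(chunk, new_vars)
```

Each expression is evaluated with the data columns in scope, which is why income and age can be referenced by name.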
The Original data:
2. Using transformFunc
transformFunc is different from transforms. The transformFunc argument is an R function whose first argument and return value are named R lists with equal-length vector elements. The output list of transformFunc can contain modified or newly named elements. It is the recommended way to do variable transformation.
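The contract described above (named list in, named list out, all elements the same length) can be checked in plain R before the function is handed to an rx function. This is a sketch with a hypothetical function name and hypothetical columns; it calls the function directly rather than through RevoScaleR.

```r
# Hypothetical transformation function: its first argument is a named list of
# equal-length vectors (one element per variable), and it returns such a list.
add_income_k <- function(data_list) {
  data_list$income_k <- data_list$income / 1000   # newly named element
  data_list$age <- as.numeric(data_list$age)      # modified element
  data_list
}

# Toy chunk of data; in RevoScaleR the list would be supplied by the rx function.
chunk <- list(age = c(23L, 35L, 41L), income = c(1000, 2500, 4000))
out <- add_income_k(chunk)

# Every element of the output should have the same length.
element_lengths <- lengths(out)
```

Calling the function on a small list like this is a cheap way to confirm it returns equal-length elements before running it against the real data source.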
3. Using transforms with UDF and "unknown" variable in UDF
transforms can also be defined in the function call using the expression function. When you use a UDF (user-defined function) in transforms, you need to pass the UDF to the remote server through the transformObjects argument, because the transforms expression is evaluated on the server side. If an "unknown" variable is referred to in the UDF, you also need to specify that variable in transformObjects, which passes the object into the calling environment. To access the "unknown" variable in the UDF, you have to use dynamic scoping so that R looks it up in the calling environment; otherwise, R looks it up in the enclosing environment of the UDF, following lexical scoping.
In such a UDF, the R expression that reads the variable from the calling environment is what implements the dynamic scoping.
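The situation can be imitated in base R by evaluating a transforms-style expression in a separate environment that plays the role of the remote side. This is a sketch under stated assumptions: remote, flag_high, threshold and income are hypothetical names, and the assign() calls merely stand in for objects being shipped to the server.

```r
# UDF created on the "client". threshold is not defined where the UDF is created,
# so the UDF reads it from the calling environment (dynamic scoping).
flag_high <- function(x) {
  calling_env <- parent.frame()
  limit <- get("threshold", envir = calling_env)
  x > limit
}

# Stand-in for the remote side: a fresh environment that receives the UDF,
# the "unknown" variable, and one column of data.
remote <- new.env(parent = baseenv())
assign("flag_high", flag_high, envir = remote)
assign("threshold", 2000, envir = remote)
assign("income", c(1000, 2500, 4000), envir = remote)

# Evaluate a transforms-style expression there; remote becomes the calling
# environment of flag_high().
result <- eval(quote(list(high = flag_high(income))), remote)
```

Because flag_high() is called from inside remote, parent.frame() returns that environment and the lookup of threshold succeeds there.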
4. Using transformFunc with UDF and "unknown" variable in UDF
transformFunc is different from transforms: it is evaluated on the client side, so you do not need to pass the UDF name to the server side. In addition, the objects specified by transformObjects are passed to the enclosing environment of the transformation function, so you do not need dynamic scoping when you use transformFunc to do variable transformation.
5. Using transformEnvir with transforms
transformEnvir is a user-defined environment. It is used as the parent environment of the transformation functions and contains the data specified by transformObjects. If the transformation functions reference multiple objects, you can bind those objects to a user-defined environment and pass just that environment to the remote through transformEnvir, instead of listing all the objects in transformObjects. However, when using transforms to do variable transformation, you should set the user-defined environment as the enclosing environment of the transformation function; otherwise R cannot find the "unknown" variable and the function in either the calling or the enclosing environment.
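Setting the enclosing environment of a function is plain R and can be tried without RevoScaleR. The sketch below uses hypothetical names (transform_env, flag_high, threshold, income) and emulates the evaluation with eval() instead of calling an rx function.

```r
# User-defined environment that holds everything the transformation refers to.
transform_env <- new.env()
assign("threshold", 2000, envir = transform_env)

# threshold is a free variable here; it is resolved by lexical scoping.
flag_high <- function(x) x > threshold

# Re-point the enclosing environment of the function at the user-defined
# environment, then bind the function itself there as well.
environment(flag_high) <- transform_env
assign("flag_high", flag_high, envir = transform_env)

# Emulated evaluation: data columns come from the list, everything else is
# looked up starting from transform_env.
result <- eval(
  quote(list(high = flag_high(income))),
  list(income = c(1000, 2500, 4000)),
  transform_env
)
```

With the enclosing environment set this way, the UDF needs no parent.frame() lookup: ordinary lexical scoping finds threshold in transform_env.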
6. Using transformEnvir with transformFunc
When you use transformFunc, the user-defined environment specified in transformEnvir is passed to the remote. All the variables and functions bound to this user-defined environment are available in both the calling and the enclosing environment, so you do not need to set up the enclosing environment for the R transformation function in the R script.
transformFunc is the recommended way to do variable transformation. Even though transforms can be used to do variable transformation as well, the two arguments differ in R scoping/environment and in where they are evaluated.