conducting regression analysis using R via SQL 2017

Copper Contributor

i want perform regression analysis using R code via SQL MS 2017 (it's integrated here)

here native R code working with csv

The main matter of code that we perform regression  separately by groups [CustomerName]+[ItemRelation]+[DocumentNum]+[DocumentYear]

df=read.csv("C:/Users/synthex/Desktop/re.csv", sep=";",dec=",")

#load needed library

#order dataset
df=df[ order(df[,5]),]
df=df[ order(df[,6]),]
#delete signs

#create lm function for separately by group regression

my_lm <- function(df) {
  lm(SaleCount~IsPromo, data = df)

reg=df %>%
  group_by(CustomerName,ItemRelation,DocumentNum,DocumentYear) %>%
  nest() %>%
  mutate(fit = map(data, my_lm),
         tidy = map(fit, tidy)) %>%
  select(-fit, - data) %>%

w=aggregate(df$action, by=list(CustomerName=df$CustomerName,ItemRelation=df$ItemRelation, DocumentNum=df$DocumentNum, DocumentYear=df$DocumentYear), FUN=sum)

# multiply each group by the number of days of the action
EA<-data.frame(reg$CustomerName,reg$ItemRelation,reg$DocumentNum,reg$DocumentYear, reg$estimate*w$x)

#del intercepts
toDelete <- seq(2, nrow(EA), 2)
newdat=EA[ toDelete ,]


The finish result :this code runs in SSMS

So what  i did

EXECUTE sp_execute_external_script
      @language = N'R'
    , @script = N' OutputDataSet <- InputDataSet;'
    , @input_data_1 = N' SELECT     [CustomerName]
 FROM [Action].[dbo].[promo_data];'

    WITH RESULT SETS (([CustomerName] nvarchar(max) NOT NULL, [ItemRelation] int NOT NULL,
     [SaleCount] int NOT NULL,[DocumentNum]  int NOT NULL,
    [DocumentYear] int NOT NULL, [IsPromo] int NOT NULL));

Message 102, level 15, state 1, line 17
Incorrect syntax near the "=" construct.


So, how perform regression analysis in SQL separately by groups?

Note ,all coefficients must  be saved,  because new data come to the sql, should already automatically calculate by  the equation  of  constructed model for each group


The above code simply estimates the impact of the action, the beta coefficients of each group multiplies by the number of days of the action  for each group.


if it is need,here tge data in csv

0 Replies