Apr 24 2018 07:35 AM
i want perform regression analysis using R code via SQL MS 2017 (it's integrated here)
here native R code working with csv
The main matter of code that we perform regression separately by groups [CustomerName]+[ItemRelation]+[DocumentNum]+[DocumentYear]
df=read.csv("C:/Users/synthex/Desktop/re.csv", sep=";",dec=",")
#load needed library
library(tidyverse)
library(broom)
#order dataset
df=df[ order(df[,5]),]
df=df[ order(df[,6]),]
#delete signs
df$Customer<-gsub("\\-","",df$Customer)
#create lm function for separately by group regression
my_lm <- function(df) {
lm(SaleCount~IsPromo, data = df)
}
reg=df %>%
group_by(CustomerName,ItemRelation,DocumentNum,DocumentYear) %>%
nest() %>%
mutate(fit = map(data, my_lm),
tidy = map(fit, tidy)) %>%
select(-fit, - data) %>%
unnest()
w=aggregate(df$action, by=list(CustomerName=df$CustomerName,ItemRelation=df$ItemRelation, DocumentNum=df$DocumentNum, DocumentYear=df$DocumentYear), FUN=sum)
View(w)
# multiply each group by the number of days of the action
EA<-data.frame(reg$CustomerName,reg$ItemRelation,reg$DocumentNum,reg$DocumentYear, reg$estimate*w$x)
#del intercepts
toDelete <- seq(2, nrow(EA), 2)
newdat=EA[ toDelete ,]
View(newdat)
The finish result :this code runs in SSMS
So what i did
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N' OutputDataSet <- InputDataSet;'
, @input_data_1 = N' SELECT [CustomerName]
,[ItemRelation]
,[SaleCount]
,[DocumentNum]
,[DocumentYear]
,[IsPromo]
FROM [Action].[dbo].[promo_data];'
WITH RESULT SETS (([CustomerName] nvarchar(max) NOT NULL, [ItemRelation] int NOT NULL,
[SaleCount] int NOT NULL,[DocumentNum] int NOT NULL,
[DocumentYear] int NOT NULL, [IsPromo] int NOT NULL));
df=as.data.frame(InputDataSet)
Message 102, level 15, state 1, line 17
Incorrect syntax near the "=" construct.
So, how perform regression analysis in SQL separately by groups?
Note ,all coefficients must be saved, because new data come to the sql, should already automatically calculate by the equation of constructed model for each group
The above code simply estimates the impact of the action, the beta coefficients of each group multiplies by the number of days of the action for each group.
if it is need,here tge data in csv