Jun 06 2018
03:12 PM
- last edited on
Apr 07 2022
05:10 PM
by
TechCommunityAP
Jun 06 2018
03:12 PM
- last edited on
Apr 07 2022
05:10 PM
by
TechCommunityAP
Dear community!
Our team is struggling to implement the following data transformation.
Our data source provides information on how long a particular program was used on each computer.
Input data set (dummy)
Code:
datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")];
Preview:
Our goal is to reformat the given input rows such that each calendar day to become an own row. According to the first row PC01 was using the program for about 3 days and 16 hours, which are split up into 5 calendar days: May 17th, 18th, 19th, 20th and 21st. The program ran for 2h, 24h, 24h, 24h and 14h respectively on those days. Once we are done with this transformation, we would like to plot the total sum of usage hours per day (divided by the number of computers).
This is how the data should look after the transformation:
datatable(Computer:string,UsageDate:datetime,UsageDuration:string)
['PC01', datetime("2018-05-17"), "02:02:03.8269999",
'PC01', datetime("2018-05-18"), "1.00:00:00",
'PC01', datetime("2018-05-19"),"1.00:00:00",
'PC01', datetime("2018-05-20"),"1.00:00:00",
'PC01', datetime("2018-05-21"),"14:06:36.835",
'PC01', datetime("2018-05-22"),"20:52:17.3649999",
'PC01', datetime("2018-05-23"),"05:14:02.2000000",
'PC02', datetime("2018-05-26"), "03:56:49.9449999",
'PC02', datetime("2018-05-27"), "20:44:34.9400000"]
Preview
How far we've come:
We already figured out how to transform a single given tuple of (StartTime,EndTime) into an array of dates and an array of durations:
let getDailyUsageDates = (_StartTime:datetime, _EndTime:datetime) {
let Day0 = startofday(_StartTime);
let Day1 = startofday(_StartTime + 1d);
let Day2 = startofday(_EndTime - 1d);
let Day3 = startofday(_EndTime);
let T1 = range StartTime from Day0 to Day0 step 1d;
let T2 = range StartTime from Day3 to Day3 step 1d;
toscalar(range StartTime from Day1 to Day2 step 1d | union T1,T2 | order by StartTime asc | summarize makelist(StartTime) )
};
print getDailyUsageDates(datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"))
let getDailyUsageDuration = (_StartTime:datetime, _EndTime:datetime) {
let Day1 = startofday(_StartTime + 1d);
let Day2 = startofday(_EndTime - 1d);
let Day3 = startofday(_EndTime);
let Duration1 = endofday(_StartTime) - _StartTime;
let Duration2 = _EndTime - Day3;
let T1 = range StartTime from _StartTime to _StartTime step 1d | extend Duration = Duration1;
let T2 = range StartTime from Day3 to Day3 step 1d | extend Duration = Duration2;
toscalar(range StartTime from Day1 to Day2 step 1d | extend Duration = 1d | union T1,T2 | order by StartTime asc | summarize makelist(Duration) )
};
print getDailyUsageDuration(datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"))
That also works formatted as a table (maybe even easier to read):
let getDailyUsageDurationTable = (_Computer:string, _StartTime:datetime, _EndTime:datetime) {
let Day1 = startofday(_StartTime + 1d);
let Day2 = startofday(_EndTime - 1d);
let Day3 = startofday(_EndTime);
let Duration1 = endofday(_StartTime) - _StartTime;
let Duration2 = _EndTime - Day3;
let T1 = range StartTime from _StartTime to _StartTime step 1d | extend Duration = Duration1, Computer = _Computer ;
let T2 = range StartTime from Day3 to Day3 step 1d | extend Duration = Duration2, Computer = _Computer;
range StartTime from Day1 to Day2 step 1d | extend Duration = 1d, Computer = _Computer | union T1,T2 | order by StartTime asc
};
getDailyUsageDurationTable("PC01", datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"));
Preview:
The last table already looks very close to what we want to have, but unfortunately we could not find a way to transform more than just one row. Joins did not work in this scenario.
The following code should work, but it just returns a very generic error instead:
let getDailyUsageDates = (_StartTime:datetime, _EndTime:datetime) {
let Day0 = startofday(_StartTime);
let Day1 = startofday(_StartTime + 1d);
let Day2 = startofday(_EndTime - 1d);
let Day3 = startofday(_EndTime);
let T1 = range StartTime from Day0 to Day0 step 1d;
let T2 = range StartTime from Day3 to Day3 step 1d;
toscalar(range StartTime from Day1 to Day2 step 1d | union T1,T2 | order by StartTime asc | summarize makelist(StartTime) )
};
let getDailyUsageDuration = (_StartTime:datetime, _EndTime:datetime) {
let Day1 = startofday(_StartTime + 1d);
let Day2 = startofday(_EndTime - 1d);
let Day3 = startofday(_EndTime);
let Duration1 = endofday(_StartTime) - _StartTime;
let Duration2 = _EndTime - Day3;
let T1 = range StartTime from _StartTime to _StartTime step 1d | extend Duration = Duration1;
let T2 = range StartTime from Day3 to Day3 step 1d | extend Duration = Duration2;
toscalar(range StartTime from Day1 to Day2 step 1d | extend Duration = 1d | union T1,T2 | order by StartTime asc | summarize makelist(Duration) )
};
datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")]
| extend Daily_Dates = getDailyUsageDates(StartTime, EndTime), Daily_Durations = getDailyUsageDuration(StartTime, EndTime)
| mvexpand Daily_Dates, Daily_Durations
Returns: ERROR RETRIEVING DATA / Support id: 2fe33746-4990-4d2e-b5d9-6fe55ca47cbb
The approach using mvexpand in the last snippet might be heavily overcomplicating things and making the code execution inefficient. Thus, we're open to more efficient suggestions, too.
Thank you very very much in advance!
Best regards,
David
Jun 07 2018 01:06 PM
Hi,
I'm not sure I follow all your steps but is this what you are looking to have:
Jun 12 2018 12:40 AM - edited Jun 12 2018 12:41 AM
Hi Meir,
thank you for taking the time to reply. I'm afraid the code you proposed does not provide the output we are looking for.
In the first row of the input data, the user "PC01" started using our program on May 17th and closed the program on May 21st. Your code calculates how long the program was used in total
usage_duration = EndTime - StartTime
but we would rather like to know how long the program was used separately on each day. For example
After the transformation the data should look like this:
Best regards,
David
Jun 12 2018 02:40 AM
Hi,
I think the following query is solving the problem you are looking for:
Please make sure you test it. The sample you gave below have one calculation error that I have spotted.
Good luck,
Meir
Jun 19 2018 03:26 AM - edited Jun 19 2018 03:33 AM
SolutionDear Meir,
thank you very much for the proposed solution. Now it looks very close to what we would like to have, however I found that one row is missing.
datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835")
The last day of this period is May 21st and the program was used until 14:06, which makes up 14h 06min.
The code you posted skips this row.
I found a solution for this small bug:
datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")]
| extend Day = range(startofday(StartTime), endofday(EndTime), 1d)
| mvexpand Day
| extend Day = startofday(todatetime(Day)) //explicit casting is required after mvexpand
| extend DayUsage = iif (StartTime < Day and EndTime > Day + 1d, 1d, 0h) // if program started before that day and executed after
| extend DayUsage = iif (StartTime < Day and EndTime <= Day + 1d, EndTime - Day, DayUsage) // if program started before that day but ended during that day
| extend DayUsage = iif (StartTime >= Day and EndTime > Day + 1d, Day + 1d - StartTime, DayUsage) // if program started during that day and executed after
| extend DayUsage = iif (StartTime >= Day and EndTime <= Day + 1d, EndTime - StartTime, DayUsage) // if program started during that day and ended during that day
| project-away StartTime, EndTime
Thank you very much for your help!!
Best,
David
Jun 19 2018 03:26 AM - edited Jun 19 2018 03:33 AM
SolutionDear Meir,
thank you very much for the proposed solution. Now it looks very close to what we would like to have, however I found that one row is missing.
datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835")
The last day of this period is May 21st and the program was used until 14:06, which makes up 14h 06min.
The code you posted skips this row.
I found a solution for this small bug:
datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")]
| extend Day = range(startofday(StartTime), endofday(EndTime), 1d)
| mvexpand Day
| extend Day = startofday(todatetime(Day)) //explicit casting is required after mvexpand
| extend DayUsage = iif (StartTime < Day and EndTime > Day + 1d, 1d, 0h) // if program started before that day and executed after
| extend DayUsage = iif (StartTime < Day and EndTime <= Day + 1d, EndTime - Day, DayUsage) // if program started before that day but ended during that day
| extend DayUsage = iif (StartTime >= Day and EndTime > Day + 1d, Day + 1d - StartTime, DayUsage) // if program started during that day and executed after
| extend DayUsage = iif (StartTime >= Day and EndTime <= Day + 1d, EndTime - StartTime, DayUsage) // if program started during that day and ended during that day
| project-away StartTime, EndTime
Thank you very much for your help!!
Best,
David