SOLVED

Transforming/Dividing Timespans into smaller Timespans (Datatable to Datatable)

Deleted
Not applicable

Dear community!

 

Our team is struggling to implement the following data transformation. 

Our data source provides information on how long a particular program was used on each computer.

Input data set (dummy)

Code:

datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")];

Preview:

2018-05-28_09h00_14.png

 

Our goal is to reformat the given input rows such that each calendar day to become an own row. According to the first row PC01 was using the program for about 3 days and 16 hours, which are split up into 5 calendar days: May 17th, 18th, 19th, 20th and 21st. The program ran for 2h, 24h, 24h, 24h and 14h respectively on those days. Once we are done with this transformation, we would like to plot the total sum of usage hours per day (divided by the number of computers).

 

This is how the data should look after the transformation:

 

datatable(Computer:string,UsageDate:datetime,UsageDuration:string)
['PC01', datetime("2018-05-17"), "02:02:03.8269999",
'PC01', datetime("2018-05-18"), "1.00:00:00",
'PC01', datetime("2018-05-19"),"1.00:00:00",
'PC01', datetime("2018-05-20"),"1.00:00:00",
'PC01', datetime("2018-05-21"),"14:06:36.835",
'PC01', datetime("2018-05-22"),"20:52:17.3649999",
'PC01', datetime("2018-05-23"),"05:14:02.2000000",
'PC02', datetime("2018-05-26"), "03:56:49.9449999",
'PC02', datetime("2018-05-27"), "20:44:34.9400000"]

Preview

2018-05-28_08h58_59.png

 

How far we've come:

 

We already figured out how to transform a single given tuple of (StartTime,EndTime) into an array of dates and an array of durations:

 

let getDailyUsageDates = (_StartTime:datetime, _EndTime:datetime) {
    let Day0 = startofday(_StartTime);
    let Day1 = startofday(_StartTime + 1d);
    let Day2 = startofday(_EndTime - 1d);
    let Day3 = startofday(_EndTime);
    let T1 = range StartTime from Day0 to Day0 step 1d;
    let T2 = range StartTime from Day3 to Day3 step 1d;
    toscalar(range StartTime from Day1 to Day2 step 1d | union T1,T2 | order by StartTime asc | summarize makelist(StartTime) )
};
print getDailyUsageDates(datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"))

 

 

let getDailyUsageDuration = (_StartTime:datetime, _EndTime:datetime) {
    let Day1 = startofday(_StartTime + 1d);
    let Day2 = startofday(_EndTime - 1d);
    let Day3 = startofday(_EndTime);
    let Duration1 = endofday(_StartTime) - _StartTime;
    let Duration2 = _EndTime - Day3;
    let T1 = range StartTime from _StartTime to _StartTime step 1d | extend Duration = Duration1;
    let T2 = range StartTime from Day3 to Day3 step 1d | extend Duration = Duration2;
    toscalar(range StartTime from Day1 to Day2 step 1d | extend Duration = 1d  | union T1,T2 | order by StartTime asc | summarize makelist(Duration) )
};
print getDailyUsageDuration(datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"))

 

 That also works formatted as a table (maybe even easier to read):

 

let getDailyUsageDurationTable = (_Computer:string, _StartTime:datetime, _EndTime:datetime) {
    let Day1 = startofday(_StartTime + 1d);
    let Day2 = startofday(_EndTime - 1d);
    let Day3 = startofday(_EndTime);
    let Duration1 = endofday(_StartTime) - _StartTime;
    let Duration2 = _EndTime - Day3;
    let T1 = range StartTime from _StartTime to _StartTime step 1d | extend Duration = Duration1, Computer = _Computer ;
    let T2 = range StartTime from Day3 to Day3 step 1d | extend Duration = Duration2, Computer = _Computer;
    range StartTime from Day1 to Day2 step 1d | extend Duration = 1d, Computer = _Computer  | union T1,T2 | order by StartTime asc
};
getDailyUsageDurationTable("PC01", datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"));

Preview:

2018-05-28_09h13_30.png

 

The last table already looks very close to what we want to have, but unfortunately we could not find a way to transform more than just one row. Joins did not work in this scenario.

 

The following code should work, but it just returns a very generic error instead:

 

let getDailyUsageDates = (_StartTime:datetime, _EndTime:datetime) {
    let Day0 = startofday(_StartTime);
    let Day1 = startofday(_StartTime + 1d);
    let Day2 = startofday(_EndTime - 1d);
    let Day3 = startofday(_EndTime);
    let T1 = range StartTime from Day0 to Day0 step 1d;
    let T2 = range StartTime from Day3 to Day3 step 1d;
    toscalar(range StartTime from Day1 to Day2 step 1d | union T1,T2 | order by StartTime asc | summarize makelist(StartTime) )
};
let getDailyUsageDuration = (_StartTime:datetime, _EndTime:datetime) {
    let Day1 = startofday(_StartTime + 1d);
    let Day2 = startofday(_EndTime - 1d);
    let Day3 = startofday(_EndTime);
    let Duration1 = endofday(_StartTime) - _StartTime;
    let Duration2 = _EndTime - Day3;
    let T1 = range StartTime from _StartTime to _StartTime step 1d | extend Duration = Duration1;
    let T2 = range StartTime from Day3 to Day3 step 1d | extend Duration = Duration2;
    toscalar(range StartTime from Day1 to Day2 step 1d | extend Duration = 1d  | union T1,T2 | order by StartTime asc | summarize makelist(Duration) )
};
datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")]
| extend Daily_Dates = getDailyUsageDates(StartTime, EndTime), Daily_Durations = getDailyUsageDuration(StartTime, EndTime)
| mvexpand Daily_Dates, Daily_Durations

 

Returns: ERROR RETRIEVING DATA / Support id: 2fe33746-4990-4d2e-b5d9-6fe55ca47cbb

 

The approach using mvexpand in the last snippet might be heavily overcomplicating things and making the code execution inefficient. Thus, we're open to more efficient suggestions, too.

 

Thank you very very much in advance!

 Best regards,

David

4 Replies

Hi,

 

I'm not sure I follow all your steps but is this what you are looking to have:

 

datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")]
| extend StartTimeDay = startofday(StartTime)
| extend usage_duration = EndTime - StartTime
| summarize sum(usage_duration) by StartTimeDay, Computer
 
Hope it helps,
Meir 

Hi Meir,

 

thank you for taking the time to reply. I'm afraid the code you proposed does not provide the output we are looking for. 

 

In the first row of the input data, the user "PC01" started using our program on May 17th and closed the program on May 21st. Your code calculates how long the program was used in total

usage_duration = EndTime - StartTime

but we would rather like to know how long the program was used separately on each day. For example

  • May 17th the program was used for 2 hours and 2 minutes
  • May 18th the program was used for 24 hours ... 
  • May 21st the program was used for 14 hours and 6 minutes

After the transformation the data should look like this:

2018-05-28_08h58_59.png

 

Best regards,

David

Hi,

 

I think the following query is solving the problem you are looking for:

 

datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")]
| extend Day = range(StartTime,EndTime, 1d)
| mvexpand Day
| extend Day = startofday(todatetime(Day)) //explicit casting is required after mvexpand
| extend DayUsage = iif (StartTime < Day and EndTime > Day + 1d, 1d, 0h) // if program started before that day and executed after
| extend DayUsage = iif (StartTime < Day and EndTime <= Day + 1d, EndTime - Day, DayUsage) // if program started before that day but ended during that day
| extend DayUsage = iif (StartTime >= Day and EndTime > Day + 1d, Day + 1d - StartTime, DayUsage) // if program started during that day and executed after
| extend DayUsage = iif (StartTime >= Day and EndTime <= Day + 1d, EndTime - StartTime, DayUsage) // if program started during that day and ended during that day 

 

Please make sure you test it. The sample you gave below have one calculation error that I have spotted.

 

Good luck,

Meir 

best response
Solution

Dear Meir,

 

thank you very much for the proposed solution. Now it looks very close to what we would like to have, however I found that one row is missing.

datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835")

The last day of this period is May 21st and the program was used until 14:06, which makes up 14h 06min.

The code you posted skips this row.

Left = Your result, right = the solution we wantLeft = Your result, right = the solution we want

 I found a solution for this small bug: 

datatable(Computer:string,StartTime:datetime,EndTime:datetime)
['PC01', datetime("2018-05-17T21:57:56.173"), datetime("2018-05-21T14:06:36.835"),
'PC01', datetime("2018-05-22T03:07:42.635"), datetime("2018-05-23T05:14:02.200"),
'PC02', datetime("2018-05-26T20:03:10.055"), datetime("2018-05-27T20:44:34.940")]
| extend Day = range(startofday(StartTime), endofday(EndTime), 1d)
| mvexpand Day
| extend Day = startofday(todatetime(Day)) //explicit casting is required after mvexpand
| extend DayUsage = iif (StartTime < Day and EndTime > Day + 1d, 1d, 0h) // if program started before that day and executed after
| extend DayUsage = iif (StartTime < Day and EndTime <= Day + 1d, EndTime - Day, DayUsage) // if program started before that day but ended during that day
| extend DayUsage = iif (StartTime >= Day and EndTime > Day + 1d, Day + 1d - StartTime, DayUsage) // if program started during that day and executed after
| extend DayUsage = iif (StartTime >= Day and EndTime <= Day + 1d, EndTime - StartTime, DayUsage) // if program started during that day and ended during that day
| project-away StartTime, EndTime

 

Thank you very much for your help!!

Best,

David