Oct 13 2020 04:24 PM
Oct 13 2020 04:24 PM
I have eagerly adopted Power Query as a new way of interacting with data as it is so much easier.
However I've run into a challenge and was hoping to get some guidance here.
Referring to the screenshot below I have a table with 3 columns: 'Job No', 'Onsite time' and 'Date_created'. This data essentially represents trips by a service technician to service machines. It's possible that there are multiple trips and the screenshot shows job no '89452' as an example for that. In these cases I need to select the earliest date; i.e. 4/8/2020 11:09 in the example. Given it is the same job, 'Date_created' has the same timestamp. so I need help in identifying (e.g. set a flag in a new column etc.).
With the record identified I then need to look up the value in column 'Response time' which is simply the difference between 'Onsite Time' and 'Date_created'.
There's probably a few ways to address this but given I'm learning Power Query the most elegant way is probably to 'add' a 'conditional column' in the query to simply identify the record with a flag which we then can use to do a VLOOKUP() - 2.4hrs in the case of job no 89452. This logic would have to identify even single records such as job No 89491 (49.3hrs).
Another approach may be to 'add' a 'conditional column' to directly create the 'Response Time' rather than having to do the VLOOKUP() manually.
Oct 14 2020 07:18 AM
@tfmeier You mention to have a table with three columns in your screenshot. But, in fact, at least (parts of) seven columns are visible. Not clear if only the three you mention are loaded by PQ and that the others are just added outside PQ. Would be helpful tif you could upload (part of) your workbook, without confidential information, including the query that you have created. Shouldn't be too difficult to filter out the earliest occurrence of a service call. But, it will be difficult without real data to play with.
Oct 14 2020 07:48 AM
@tfmeier I agree with @Riny_van_Eekelen that a sample sheet and more info on exactly what you want would be helpful. That said and based on you looking for only the earliest case I will put out a thought. Apparently you already have a column "Duplicates" that appears to show a count of how many times that job no appears in the list (something like =countif(a:a,a1) but obviously not column A and maybe you are using Table structure references, which would be good. But let me ask why you want to show the total count? What if you changed it to =countif($A$1:$A1,A1) which would then count the duplicates (i.e. if there are 8 instances the first would be 1, the second 2, etc... Then your query can selectively pull only the 1s from that column. You could also check the above =1 to return Yes or No. If you want to know how many duplicate you could =countif($A$1:$A1,A1) & "of" & countif(A:A,A1) and it would display "1 of 8" but then the query would have to search for "1 of" because if you only look for starting with "1" then "10 of 12" would get pulled also.
Oct 14 2020 06:45 PM
Here's a bit more info and attached sanitised files.
I have 3 source data files:
And pull data via Power Query into 'Breakdown Report'
Outcome: I need to create a histogram showing response time in 24 hr blocks (bin size of 24) from column 'Response Time' in sheet 'Mobile Status Log'. This sheet contains both columns populated by Power Query and manual columns to identify valid records and for (selected columns) become the source for the graph.
Columns of interest are: 'Onsite Time', 'Date_created', 'Response time'. 'Response time' = 'Onsite Time' less 'Date_created'. However not every record is valid.
What's a valid record? It's possible to have a few service trips per job and the system creates a time stamp every time a technician is onsite; 89452 in the original post is a good example where we have multiple onsite times against the same date_created timestamp. In cases like this I need to select the earliest Onsite Time; i.e. 4/8/2020 11:09 in the example.
@mtarler's solution works and all valid records get tagged '0' and I can select these to create the histogram. Thanks for that!
However given I'm in the midst of learning Power Query I was wondering how to do this by adding a conditional column 'Valid Record' with values of 'yes' / 'no' to then allow me to only pull records tagged 'yes'. So in other words all that's left is to pull selected columns to create the presentation (histogram in this case).
Is that possible? Is there a smarter way of doing this?
Oct 14 2020 11:46 PM - edited Oct 14 2020 11:49 PMSolution
@tfmeier Played around with your files a bit and revised parts of the queries to arrive at a table "Merge1". Hopefully, it helps you exploring PQ further. You will notice that merging queries (the correct way) eliminates the need for VLOOKUP outside the PQ tables.
Just point the source for each of the queries to the files on your own system and you should be able to see the end result and follow through all the steps.
Oct 15 2020 04:13 AM
Thank you for that, @Riny_van_Eekelen. So merged queries can combine data from different tables via a common key, Job no (Job id) in this case? Do the common columns always have the be in the first / most left column?
Also what's the purpose of "expand" in power query in layman terms?
Oct 15 2020 04:29 AM
@tfmeier You can merge on the basis of any column as long as you expect them to have common values. So, you can merge the third column from one table called "JobNo" with e.g. the 10th column from another called "JobCode", as long as these columns use the same codes. Otherwise, you'll never find a match. You can also merge on the basis of multiple columns.
Merging can be done in several ways (outer, inner, full and anti). In simple terms, it adds a new column to the first table that in itself is a table of matching records. These can then be expanded so that matching records are "joined" side by side. Similar to what you would do with VLOOKUP.
There are many PQ tutorials on line. This particular one I found very useful myself.
Mar 30 2021 04:26 PM