Forum Discussion
Power Query for Workbook Merge
100% positive that the duplicates are not across files. The report is a list of patient referrals from an EHR from completely separate locations. The duplicates are from within a single location. I have traced each occurrence back to the source file and verified that only 1 row existed in that file. I can't post a screenshot because the website here won't let me.
Not sure the best way to share my M code, but happy to do so if you can provide a little direction.
Thank you for trying to help
- aflintdepmJun 21, 2023Copper ContributorHere you go
let
Source = Folder.Files("C:\Users\svc_apps\xxxxxxxxx\xxxxxxxxxxxxx\Data Downloads\Marketing Report File Download"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}, {"Column10", type text}, {"Column11", type text}, {"Column12", type text}, {"Column13", type text}, {"Column14", type text}, {"Column15", type text}, {"Column16", type text}, {"Column17", type text}, {"Column18", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Source.Name"}),
#"Removed Top Rows" = Table.Skip(#"Removed Columns",2),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Branch", type text}, {"MR", Int64.Type}, {"First Name", type text}, {"Middle Initial", type text}, {"Last Name", type text}, {"Status", type text}, {"Referral Date", type date}, {"Primary Insurance", type text}, {"Physician First", type text}, {"Physician Last", type text}, {"Internal Referral First", type text}, {"Internal Referral Last", type text}, {"External Referral", type text}, {"Case Manager", type text}, {"Start of Care Date", type date}, {"Episode Start", type date}, {"EpisodeEnd", type date}, {"Episode Length", Int64.Type}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type1", each ([Branch] <> "Branch" and [Branch] <> "Generated on 06/20/2023 06:52 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:53 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:54 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:55 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:56 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:57 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:58 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:59 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 07:00 by xxxxxxxxx" and [Branch] <> "Recert by Episode Start Date: 06/20/2023") and ([Status] = "Active"))
in
#"Filtered Rows"- LorenzoJun 22, 2023Silver Contributor
You're not Merging files but Combining files and the process you use works very well. Until your scary #"Filtered Rows" it's a pretty classic combining query where PQ very rarely makes mistakes not to says never. Looking at your code there are a couple of unecessary steps (leads to inefficiency) and if I made no mistake it can be shortened as:
let Source = Folder.Files("C:\Users\svc_apps\xxxxxxxxx\xxxxxxxxxxxxx\Data Downloads\Marketing Report File Download"), #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true), #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])), #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}), #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}), #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")) ), #"Removed Columns" = Table.RemoveColumns(#"Expanded Table Column1",{"Source.Name"}), #"Removed Top Rows" = Table.Skip(#"Removed Columns", 2), #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]), #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", { {"Branch", type text}, {"MR", Int64.Type}, {"First Name", type text}, {"Middle Initial", type text}, {"Last Name", type text}, {"Status", type text}, {"Referral Date", type date}, {"Primary Insurance", type text}, {"Physician First", type text}, {"Physician Last", type text}, {"Internal Referral First", type text}, {"Internal Referral Last", type text}, {"External Referral", type text}, {"Case Manager", type text}, {"Start of Care Date", type date}, {"Episode Start", type date}, {"EpisodeEnd", type date}, {"Episode Length", Int64.Type} } ), #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ( [Branch] <> "Branch" and [Branch] <> "Generated on 06/20/2023 06:52 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:53 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:54 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:55 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:56 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:57 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:58 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 06:59 by xxxxxxxxx" and [Branch] <> "Generated on 06/20/2023 07:00 by xxxxxxxxx" and [Branch] <> "Recert by Episode Start Date: 06/20/2023" ) and ([Status] = "Active") ) in #"Filtered Rows"
Given you can't share anything a suggestion, tempo.:
#1 Delete step #"Removed Columns"
#2 After the #"Changed Type" step Add a new one to rename the 1st column Source.Namethis will help you identify where the duplicates come from
- aflintdepmJun 22, 2023Copper ContributorThank you for the detailed reply. I've never edited M code directly, so I want to make sure I am understanding your suggestion.
In your code sample above, you're referring to removing Line 10, then after line 20, I would add a rename instruction like this:
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
Is that correct?
- JKPieterseJun 22, 2023Silver Contributor
Thanks. I see this is can only be troubleshooted with a couple of the source files. Or a screen-share where we walk through the steps. I'm afraid I don't have the time for this at the moment.
- aflintdepmJun 22, 2023Copper ContributorThank you for trying. I appreciate your time