Aug 20 2024 06:14 AM - edited Aug 20 2024 06:14 AM
Hello there,
I have a problem importing some folders into Power Query. The interesting part is that the different files in said folder have a varying number of columns, and Power Query only loads the columns that are in the example file.
I tried this twice; one set of data is a folder of PDFs. I choose the right page to extract and find out Power Query has only loaded the columns from the example file, but I can load the others with some extra steps (picture):
The other set is a folder with CSVs. I try replicating the exact steps here, but Power Query does not load the additional column unless I choose an example file that already has all the columns. If I want to automate these steps, I can't choose a fitting file every time; my only option is to accept Excel losing a column of data. Is there any way of preventing this? Thanks in advance.
Aug 20 2024 06:46 AM
Solution: In the case of CSV, check the Transform Sample File query; its first step looks like
= Csv.Document(Parameter1,[Delimiter=";", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None])
Change Columns=10 to Columns=20 or similar, i.e. the maximum possible number of columns.
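For example, with a maximum of 20 columns, the edited step would read (keeping the other generated options unchanged):
= Csv.Document(Parameter1,[Delimiter=";", Columns=20, Encoding=65001, QuoteStyle=QuoteStyle.None])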
Then, in the combined file, remove all columns with no data. As a variant, that could be a function which removes all empty rows and columns, such as
(Source as table) as table =>
let
    // keep only columns that contain at least one non-null value
    RemoveEmptyColumns = Table.SelectColumns(
        Source,
        List.Select(
            Table.ColumnNames(Source),
            each List.NonNullCount(Table.Column(Source, _)) <> 0
        )
    ),
    // keep only rows that contain at least one value other than "" or null
    RemoveEmptyRows = Table.SelectRows(
        RemoveEmptyColumns,
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))
    )
in
    RemoveEmptyRows
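A minimal usage sketch, assuming the function above is saved as a query named fnRemoveEmpty (a name chosen here for illustration) and applied as an extra step after the expand step of the combined query:
= fnRemoveEmpty(#"Expanded Table Column1")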
Not sure about PDF; check the connector options for something similar.
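For reference, the auto-generated first step of a PDF Transform Sample File query typically calls Pdf.Tables; a sketch, with the Implementation option shown as an assumption about what the generator emits:
= Pdf.Tables(Parameter1, [Implementation="1.3"])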
Aug 20 2024 06:51 AM
Thank you @SergeiBaklan for the reply, I will need some time before I am able to experiment with it.
I just wanted to clarify that the PDF case is working fine; I only encounter said problem in the CSV case. Maybe I will encounter more problems in the future, so I can specify when and how often it occurs.
Aug 20 2024 07:49 AM
There is one more option with CSV: you may use Columns=null
= Csv.Document(Parameter1,[Delimiter=",", Columns=null, Encoding=65001, QuoteStyle=QuoteStyle.None])
With null, the actual number of columns is taken for each file. However, you will have to adjust how the tables are combined in the resulting query:
let
    // These are automatically generated steps
    Source = Folder.Files("C:\Test temp"),
    #"Removed Other Columns" = Table.SelectColumns(Source, {"Content"}),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Removed Other Columns", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File"}),
    // add this step to generate the list of column names across all files
    names = List.Union(
        List.Transform(
            #"Removed Other Columns1"[Transform File], (q) => Table.ColumnNames(q)
        )
    ),
    // in this step, instead of the column names from the sample file,
    // use the generated list of names as below
    #"Expanded Table Column1" = Table.ExpandTableColumn(
        #"Removed Other Columns1",
        "Transform File",
        //Table.ColumnNames(#"Transform File"(#"Sample File"))
        names
    )
in
    #"Expanded Table Column1"
Aug 21 2024 12:15 AM
Hey @SergeiBaklan, again thank you for the input. My first problem is that I don't know where to put the first part:
= Csv.Document(Parameter1,[Delimiter=",", Columns=null, Encoding=65001, QuoteStyle=QuoteStyle.None])
The rest became quite messy, as I had to partly translate it because my Office runs in German. It seemed to work in the end, excluding the last step. But in the second-to-last step, "names", I already saw that Power Query fetched 12 columns again, meaning it caught one too few.
Maybe I already broke it by neglecting the first part? The error says it is unable to convert a source CSV into a table...
Aug 21 2024 01:12 AM
Please check the queries in the attached file. It combines 3 CSV files located in the local folder c:\Test temp.
Or do you work with a SharePoint folder?
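If it is a SharePoint folder, the Source step would use the SharePoint connector instead of Folder.Files; a minimal sketch with a placeholder site URL:
= SharePoint.Files("https://contoso.sharepoint.com/sites/YourSite", [ApiVersion = 15])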
Aug 21 2024 01:51 AM
I found your test file working perfectly after putting the 3 imported ones in the source folder. After replacing the three test inputs with the data I actually want to use, it somehow worked without giving me any errors back, which was a little confusing, but it also seemed to have squished all the data into a single column, making me lose the other 12.
Oddly, I can find all 13 column headers now, squished together, too. It looks like it does in the import data: some files have 12 columns, some have 13. Maybe something with delimiters...
Aug 21 2024 02:13 AM
Yes, in your case the delimiter shall be a semicolon
[Delimiter=";", Columns=null, Encoding=65001, QuoteStyle=QuoteStyle.None]
perhaps with another encoding. In that part, take the generated script and change only the number of columns to null; don't touch the rest.
If that doesn't work perhaps you could share couple of your csv files keeping only few rows and removing sensitive information.
Aug 21 2024 05:00 AM
I finally found out where the "= Csv.Doc..." part belonged (picture 1). I changed the delimiter (the encoding seems fine, but if not, I can fix that myself, I guess). Power Query then started adding some steps and implementing the example file table (picture 2):
The example file still only has 12 columns, which makes sense, I guess. If you take a look at the final combined table, it has 13 columns now, which seems like the result I wanted to see.
I lack the time to experiment with that right now, but I am very optimistic my problem is solved. Again, my thanks for your incredible support.
Aug 21 2024 06:38 AM
@Max_Mustermann, you are welcome.
Please remove "Change Type" step in Transform sample file query. It works with hardcoded column names which is not good if you work with variable columns.
It could be applied to the final file, and there is trick to automate that somehow, i.e. not to hardcode column names. But bit complex one.
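One common pattern for that (a sketch, not necessarily the exact trick meant here) is to build the type list dynamically from the current column names, e.g. typing every column as text so nothing is hardcoded:
= Table.TransformColumnTypes(
    #"Expanded Table Column1",
    List.Transform(
        Table.ColumnNames(#"Expanded Table Column1"),
        each {_, type text}
    )
)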