SOLVED

Power Query losing Columns

Copper Contributor

Hello there,


I have a problem importing some folders into Power Query. The interesting part is that the different files in said folder seem to have a varying number of columns, and Power Query only loads the columns that are in the example file.


I tried this twice. One set of data is a folder of PDFs. I choose the right page to extract and find that Power Query has only loaded the columns from the example file, but I can load the others with some extra steps (picture):

 

[Image: Power Query finding Columns.png]

 

The other set is a folder of CSVs. I try replicating the exact same steps here, but Power Query does not load the additional column unless I choose an example file that already has all the columns. If I want to automate these steps, I can't pick a fitting example file every time, so my only option is to accept Excel losing a column of data. Is there any way of preventing this? Thanks in advance.

9 Replies
best response confirmed by Max_Mustermann (Copper Contributor)
Solution

@Max_Mustermann 

For CSV, check the Transform Sample File query; its first step looks like

= Csv.Document(Parameter1,[Delimiter=";", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None])

Change Columns=10 to Columns=20 or similar, i.e. the maximum possible number of columns.
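For example, with the other options left exactly as generated, the adjusted step would be:

= Csv.Document(Parameter1,[Delimiter=";", Columns=20, Encoding=65001, QuoteStyle=QuoteStyle.None])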

Then, in the combined file, remove all columns with no data. Alternatively, that could be a function which removes all empty rows and columns, such as:

(Source as table) as table =>
let
    // keep only columns that contain at least one non-null value
    RemoveEmptyColumns = Table.SelectColumns(
        Source,
        List.Select(
            Table.ColumnNames(Source),
            each List.NonNullCount(Table.Column(Source, _)) <> 0
        )
    ),
    // keep only rows that contain at least one value other than "" or null
    RemoveEmptyRows = Table.SelectRows(
        RemoveEmptyColumns,
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))
    )
in
    RemoveEmptyRows
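As a usage sketch, assuming the function is saved as a query named fnRemoveEmpty (the name is just an illustration), applying it to the combined result takes one extra step:

// fnRemoveEmpty = the function above, saved as its own query;
// #"Combined Table" is a placeholder for the step holding the combined data
Cleaned = fnRemoveEmpty(#"Combined Table")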

Not sure about PDF; check whether that connector has similar options.

Thank you @SergeiBaklan for the reply, I will need some time until I am able to experiment with it.

 

I just wanted to clarify that the PDF case is working quite well. I only encounter said problem in the CSV case. Maybe I will encounter more problems in the future, so I can then specify when and how often it occurs.

@Max_Mustermann 

Here is one more option with CSV. You may use Columns=null

= Csv.Document(Parameter1,[Delimiter=",", Columns=null, Encoding=65001, QuoteStyle=QuoteStyle.None])

With null, the actual number of columns will be taken for each file. However, you will have to adjust how the tables are combined in the resulting query:

let

    // These are automatically generated steps 
    Source = Folder.Files("C:\Test temp"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Removed Other Columns", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File"}),

    // add this step to generate list of column names in all files
    names = List.Union(
        List.Transform(
            #"Removed Other Columns1"[Transform File], (q) => Table.ColumnNames(q)
        )
    ),

    // on this step instead of column names for the sample file
    // use generated list of names as below
    #"Expanded Table Column1" = Table.ExpandTableColumn(
        #"Removed Other Columns1",
        "Transform File",
        //Table.ColumnNames(#"Transform File"(#"Sample File"))
        names
        )
in
    #"Expanded Table Column1"

Hey @SergeiBaklan, again thank you for the input. My first problem is that I don't know where to put the first part:

= Csv.Document(Parameter1,[Delimiter=",", Columns=null, Encoding=65001, QuoteStyle=QuoteStyle.None])

 

The rest became quite messy, as I had to partly translate it because my Office runs in German. It kind of seemed to work in the end, excluding the last step. But in the second-to-last step, "names", I already saw that Power Query fetched 12 columns again, meaning it caught one too few.

Maybe I already broke it by neglecting the first part? The error says it is unable to convert a source CSV into a table...

 

[Image: error.png]

@Max_Mustermann 

Please check the queries in the attached file. It combines 3 CSV files located in the local folder c:\Test temp.

Or do you work with a SharePoint folder?

@SergeiBaklan 

I found your test file working perfectly after putting the 3 imported ones in the source folder. After replacing the three test inputs with the data I actually want to use, it somehow worked without giving me any errors back, which was a little confusing, but it also seemed to have squished all the data into a single column, making me now lose the other 12.

 

Oddly, I can find all 13 column headers now, squished too. It looks like it does in the import data: some files have 12 columns, some have 13. Maybe something with the delimiters...

[Images: squished1.png, squished2.png]

@Max_Mustermann 

Yes, in your case the delimiter shall be a semicolon

[Delimiter=";", Columns=null, Encoding=65001,

and perhaps another encoding. In that part, take the generated script and change only the number of columns to null; don't touch the rest.
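Putting that together (with the encoding adjusted if needed), the first step of Transform Sample File would read:

= Csv.Document(Parameter1,[Delimiter=";", Columns=null, Encoding=65001, QuoteStyle=QuoteStyle.None])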

If that doesn't work, perhaps you could share a couple of your CSV files, keeping only a few rows and removing sensitive information.

@SergeiBaklan 

I finally found out where the "= Csv.Doc..." part belonged (picture 1). I changed the delimiter (the encoding seems fine, but if not, I can fix that myself, I guess). Power Query then started adding some steps and implementing the example file table (picture 2):

 

[Images: result1.png, result2.png]

The example file still only has 12 columns, which makes sense, I guess. If you take a look at the final combined table, it has 13 columns now, which is the result I wanted to see.

 

[Image: result3.png]

I am lacking the time to experiment with that right now, but I am very optimistic that my problem is solved. Again, my thanks for your incredible support.

@Max_Mustermann, you are welcome.

 

Please remove the "Changed Type" step in the Transform Sample File query. It works with hardcoded column names, which is not good if you work with variable columns.

It could be applied to the final table instead, and there is a trick to automate that, i.e. not hardcode the column names, but it is a bit complex.
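One possible version of that trick, as a sketch only (not necessarily the approach meant above; it assumes typing every column as text is acceptable, and PreviousStep is a placeholder name), derives the type list from the actual column names instead of hardcoding them:

// type every column as text without hardcoding its name;
// swap 'type text' for another type where appropriate
ChangedTypeDynamic = Table.TransformColumnTypes(
    PreviousStep,
    List.Transform(Table.ColumnNames(PreviousStep), each {_, type text})
)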
