SOLVED

Get Data from Web

Brass Contributor

Hello

 

LInk : http://rajasthanindustries.org/ViewCompanyProfile.aspx?id=All&typet=alpha

 

There is a dataset which I want to get and transform using web, its in table but not able to identify , any trick to know which table has large data set 

 

Please suggest

4 Replies

Hi Sachin,

 

I am not an expert in websites, however, as far as I have heard, if 'Document' is the only item you see in the 'Navigator' menu when connecting to a website (so there are no other tables visible), the website is likely to be using Java Script. This makes it highly unlikely that you will be able to extract any useful information from it. Apparently, this issue is being worked on by Microsoft.

 

Yury

best response confirmed by Sachin Jain (Brass Contributor)
Solution

Yury, in general that's possible, in the code you have you have to use instead of Web.Page(Web.Contents(...)) just only Web.Contents(...) and after that parse the code more or less manually. I did that to extract some information from this Tech Comm site.

 

One of the examples for such approach is http://datachant.com/2017/05/08/web-scraping-power-bi-part-2/, the latest i've seen, but Gil published more on his site.

 

However not sure that's practical for this concrete case.

That's a good point. Thanks Sergei. I can see that with a bit of an effort we can extract the required info using Power Query transformations. We can start with filtering rows starting with "span id=" (from the extracted HTML text we get by following the initial steps outlined in Sergei's the datachant.com link), then locating field labels (e.g. lbl_companyname for Company etc), and further disseminate data from there.

 

Sachin, hope you find it helpful

 

Cheers

Yury

 

Problem Solved !!!

 

Thanks for the hint 

 

Please find the code to solve this problem 

 

let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("http://rajasthanindustries.org/ViewCompanyProfile.aspx?id=All&typet=alpha"), null, null, 65001)}),
#"Removed Top Rows" = Table.Skip(Source,550),
#"Filtered Rows" = Table.SelectRows(#"Removed Top Rows", each Text.Contains([Column1], "span id")),
#"Column From Examples" = Table.AddColumn(#"Filtered Rows", "Part of Column1", each Text.BetweenDelimiters([Column1], "_", "<", 3, 2), type text),
#"Column From Examples1" = Table.AddColumn(#"Column From Examples", "Part of Column1 (2)", each Text.BetweenDelimiters([Column1], "_", "_", 2, 0), type text),
#"Replaced Value1" = Table.ReplaceValue(#"Column From Examples1","ctl","",Replacer.ReplaceText,{"Part of Column1 (2)"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value1",{"Column1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Part of Column1", "RawData"}, {"Part of Column1 (2)", "CoID"}}),
#"Filtered Rows1" = Table.SelectRows(#"Renamed Columns", each Text.Contains([RawData], "lbl_")),
#"Column From Examples2" = Table.AddColumn(#"Filtered Rows1", "Part of RawData", each Text.AfterDelimiter([RawData], ">", 2), type text),
#"Column From Examples3" = Table.AddColumn(#"Column From Examples2", "Part of RawData (2)", each Text.BetweenDelimiters([RawData], ">", "<", 0, 0), type text),
#"Column From Examples4" = Table.AddColumn(#"Column From Examples3", "Part of RawData (3)", each Text.BetweenDelimiters([RawData], ":", " ", 0, 0), type text),
#"Replaced Value" = Table.ReplaceValue(#"Column From Examples4","//","",Replacer.ReplaceText,{"Part of RawData (3)"}),
#"Column From Examples5" = Table.AddColumn(#"Replaced Value", "Part of RawData (4)", each Text.BetweenDelimiters([RawData], "_", """", 0, 0), type text),
#"Reordered Columns" = Table.ReorderColumns(#"Column From Examples5",{"CoID", "Part of RawData (4)", "RawData", "Part of RawData", "Part of RawData (2)", "Part of RawData (3)"}),
#"Inserted Merged Column" = Table.AddColumn(#"Reordered Columns", "Merged", each Text.Combine({[Part of RawData], [#"Part of RawData (2)"], Text.From([#"Part of RawData (3)"], "en-IN")}, ""), type text),
#"Removed Columns1" = Table.RemoveColumns(#"Inserted Merged Column",{"RawData", "Part of RawData", "Part of RawData (2)", "Part of RawData (3)"}),
#"Renamed Columns1" = Table.RenameColumns(#"Removed Columns1",{{"Part of RawData (4)", "Lable"}, {"Merged", "Value"}}),
#"Reordered Columns1" = Table.ReorderColumns(#"Renamed Columns1",{"Lable", "CoID", "Value"}),
#"Added Custom" = Table.AddColumn(#"Reordered Columns1", "Custom", each [CoID]&[Lable]),
#"Reordered Columns2" = Table.ReorderColumns(#"Added Custom",{"Custom", "Lable", "CoID", "Value"})
in
#"Reordered Columns2"

1 best response

Accepted Solutions
best response confirmed by Sachin Jain (Brass Contributor)
Solution

Yury, in general that's possible, in the code you have you have to use instead of Web.Page(Web.Contents(...)) just only Web.Contents(...) and after that parse the code more or less manually. I did that to extract some information from this Tech Comm site.

 

One of the examples for such approach is http://datachant.com/2017/05/08/web-scraping-power-bi-part-2/, the latest i've seen, but Gil published more on his site.

 

However not sure that's practical for this concrete case.

View solution in original post