Sep 14 2019 09:36 PM
Hello. I'm having a devil of a problem trying to import data from the web and getting it into Excel in roughly the same columned format. I'm copying the relevant data (which is information that I've put together using various filter selections on a specific web site) and then pasting it into Excel. When I do this, all of the data is shown in one long column. It looks roughly like this (using the info from just 2 companies):
Loving Care Agency, Inc.
Health Services, Hospital & Health Care
$200 Million - $1 Billion
(201) 403-9300
Hasbrouck Heights, NJ
lovingcareagency.com
1971
Frontline Consulting Services Inc.
Business Services, Information Technology and Services
$100 Million - $200 Million
(704) 510-1998
Charlotte, NC
fcs-inc.com
1996
I need to get this data into 7 columns (Company Name; Industry; Yearly Revenue; Phone #; Location; Web Site; Year Founded). Using "Text to Columns" is no help as there are no delimiters (commas, tabs, spaces, etc.). Is there any way to get this data into the same columns as they are presented on the original web site, or do I have to manually insert a delimiter after each line of data? Any help will be GREATLY appreciated. Thank you.
Sep 14 2019 10:30 PM - edited Sep 14 2019 10:30 PM
You may use Power Query to get the data in the desired format.
To do so, follow these steps...
Query:
let
Source = Excel.CurrentWorkbook(){[Name="WebData"]}[Content],
#"Removed Blank Rows" = Table.SelectRows(Source, each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Blank Rows",{{"Column1", type any}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Inserted Modulo" = Table.AddColumn(#"Added Index", "Modulo", each Number.Mod([Index], 7), type number),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Inserted Modulo", {{"Modulo", type text}}, "en-IN"), List.Distinct(Table.TransformColumnTypes(#"Inserted Modulo", {{"Modulo", type text}}, "en-IN")[Modulo]), "Modulo", "Column1"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"}),
#"Filled Up" = Table.FillUp(#"Removed Columns",{"1", "2", "3", "4", "5", "6"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Up", each ([0] <> null)),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"0", "Company Name"}, {"1", "Industry"}, {"2", "Yearly Revenue"}, {"3", "Phone#"}, {"4", "Location"}, {"5", "Website"}, {"6", "Year Founded"}})
in
#"Renamed Columns"
Also, if you are getting the web data through the From Web under Get & Transformed Data Group, you may tweak the existing query to get the data in the desired format.
Sep 15 2019 06:00 AM
Bit another transformation
let
ColumnNames = {"Company Name", "Industry", "Yearly Revenue", "Phone#", "Location", "Website", "Year Founded"},
ItemsInGroup =7,
Source = Excel.CurrentWorkbook(){[Name="WebData"]}[Content],
RemoveNulls = Table.SelectRows(
Source,
each not List.IsEmpty(
List.RemoveMatchingItems(Record.FieldValues(_), {"", null})
)
),
GroupsToLists = Table.FromList(
List.Split(
RemoveNulls[Column1],
ItemsInGroup
),
Splitter.SplitByNothing()
),
GroupsToRecords = Table.ReplaceValue(
GroupsToLists,
each [Column1] ,
each Record.FromList([Column1],ColumnNames),
Replacer.ReplaceValue,{"Column1"}
),
ExpandRecords = Table.ExpandRecordColumn(
GroupsToRecords,
"Column1", ColumnNames, ColumnNames
)
in
ExpandRecords
Sep 15 2019 09:56 AM
Subodh (and Sergei)...thank you very much for responding. I've tried both of your Power Querys and get the same message:
Expression.Error: We couldn't find an Excel table named 'WebData".
Details:
WebData
What am I doing wrong?
- Jeff
Sep 15 2019 10:33 AM
WebData is the table with raw data which you shall create. Select all raw data in column A, press Ctrl+T and create the table.
On ribbon you will stay on Table Design
Type here table name now
Sep 15 2019 11:54 AM
Good one @Sergei Baklan! :thumbs_up:
Sep 15 2019 11:58 AM - edited Sep 15 2019 12:01 PM
Hi Jeff,
If you look at the points#1 and 2 in my proposed solution, you are supposed to convert your data into an Excel Table and then rename the Excel Table as 'WebData" and then the query will work for you as desired as this query assumes that in your file there is an Excel Table called "WebData" which must be missing at present in your actual file.
To convert your data into an Excel Table, select all your data and then press Ctrl+T and it will convert your data into an Excel Table. Once your data is converted into an Excel Table, a Table Design Tab will appear in the Ribbon and if you click that and look at top right of the ribbon, you will have a TextBox to have Table Name and you can type the WebData in there and hit Enter, that will rename your Table as WebData.
Sep 16 2019 11:17 AM
And much better, if you don't copy/paste from web but query the web, combine transformations suggested here with the initial query. If that's copy/paste perhaps it's worth to try query the web, finally it'll be one click solution - open the file, refresh all.
Sep 16 2019 12:26 PM
Subodh/Sergei,
Your suggestions have been most helpful. I was finally able to get the data into the right format but it is a tremendous amount of work. I'm working off of a prospect list web site where I have to input the specific parameters for the companies I want to discover (for example, location, industry, yearly revenue, etc.). For large listings, I'm able to see only 25 companies at a time. It looks like this:
I have to copy and paste each grouping of 25 into Excel and all of this info. goes into just one column. When I've finally created a large enough list, I then run the Power Query that you've designed. This is a lot of work because if each company doesn't have the exact same number of data points (in this case 7) the report is all jumbled and I have to go back and find where the missing data should go.
Do you think I'm doing this in the most efficient way or is there a more economical way to go about this task. I don't see how I can pull the info. right off the web site as I work because the web address just shows something like: https://app.uplead.com/#. I'm unsure if Excel can pull the exact info. that I've specifically collated on the site.
You gentlemen have been so very kind in your guidance and I sincerely appreciate it.
Regards,
Jeff
;lkjasdjf
Sep 17 2019 03:55 PM
Jeff, that's not a free resource and it's hard to say what and how could be automated without playing with web site. It depends on many things, for example how the pagination is organised, by URL parameter or Java script changes pages without changing the URL. It could be some protection against connecting to their data. Etc., etc.
But in general yes, it could be possible to automate data pick-up from web site using Power Query. The evil is in the detail.