Import Excel and Sort by eMail-Domain

Brass Contributor

Hi all

 

merry XMAS to you all

 

I use the time to get some "long term wish" fixed but no luck at all - excel is not my "world" 🙂

 

my problem ist

I get every month a CSV from one of our "SaaS Services" - I nee now to import this to a Blank (or Template if it would help) and do the following

 

- take all eMail Addresses with ending @XXX.ch and put this to the Tab XXX.ch

- take all eMail Addresses with ending @XXX.de and put this to the Tab XXX.de

- take all eMail Addresses with ending @XXX.com and put this to the Tab XXX.com

 

These are just 3 examples, I have to do this for a lot of domains - every month

 

again, I have no idea to use Excel for more than my Expense report 🙂 

can someone help me here?

 

15 Replies

@PaddyB Connect to the CSV file with PowerQuery (PQ), do some transformations, sort and load back to Excel. Sound easy! And it is, though, you need to learn the basics of PQ first. The link below would be a good starting point.

https://exceloffthegrid.com/power-query-introduction/ 

You might need to use a macro to create the worksheet based on unique domain names. And then another macro to copy the relevant data for the domain names. You might need to separate the domain name from the email addresses first. Something like this will help:
RIGHT(A1,LEN(A1)-FIND("@",A1,1))
Once you separate the names, macro can do the rest.

Side note: If I were to do the work, I would not create separate sheets for each domain. I would lump them in one formatted table and filter them to display only those I need to work on.

Hi Riny

for sure a working idea - but just to learn something completely new take many hours/days for a task that I need to do once a month 🙂

so if I do not find a solution what more or less is working, I will do it by hand - will be quicker
I will give this a try - but not the Macro 🙂

@PaddyB Your choice. But I'd recommend to look into PQ anyway. Guaranteed that you'll find other uses for it that will save you huge amounts of time in other jobs you might be doing.

> find a solution what more or less is working

Hi @PaddyB 

I have combined a few things into something attached that could be suitable for you.

Sheet Input;

Column A holds the email addresses and F2 visualizes the different email domains from A2 and down.

bosinander_1-1640539951619.png

 

Sheet2 and 3 holds the email addresses from two different domains.

Similar to @NowshadAhmed suggestion, each of those sheets shows the mail addresses related to the chosen domain.

The selection list shows domains not already used on another sheet.

Since xxx.com is selectable, there is no sheet for that domain. Copy Sheet2 (right click the tab of Sheet2: Move or copy: [x] Create a copy and (move to end)) and select xxx.com in A1.

bosinander_2-1640540092553.png

Sheet PowerPage has in column A a list of the workbooks sheets. They are also linked to respective sheet.

bosinander_3-1640540758413.png

 

If you'ld like to know more about how the automatic sheet list works, you may read about version 4 macros at https://exceloffthegrid.com/using-excel-4-macro-functions/.

Column B shows the content from cell A1 in each sheet using the function INDIRECT.

F2 lists the unique domains using

=LET(
  data;    OFFSET(Input!A:A;1;0;COUNTA(Input!A:A)-1;1);
  output;  UNIQUE(MID(data;FIND("@";data)+1;4711));
  output
)

H2 lists the domains that are missing an own sheet;

=FILTER(F2#;ISERROR(XMATCH(F2#;B:B)))

Finally, cell A1 at the different domain sheets have a data validation list that suggests any missing domain, listed in H2 with its spill area (H2#)

bosinander_4-1640541526210.png

Instead using F2 would be like NowshadAhmed suggested.

-/-

The file attached contains version 4 macros and has to be saved as xlsM.

To activate them it seems nowadays necessary to include a VBA macro to get the enable question.

bosinander_5-1640541782091.png

To get the sheet list updated, you may have to save, close and repoen the file (with macros enabled without question).

The VBA macro (that trigs the yellow enable question bar) is also a shortcut to activate the index sheet/first sheet.

Sub activateSheet1()
'
' Keyboard Shortcut: Ctrl+Shift+A
'
    ThisWorkbook.Activate
    Sheets(1).Select
End Sub

Finally, try Data: From Text/CSV to connect to the csv file could be worth the while as a first step to using Power Query. You will get the csv data into a new sheet and you just have to cut that new table and paste it at cell input!A2 to have it all quite automatic with Ctrl+Alt+F5.

 

@PaddyB 

 

I would go with what Mr Van_Eekelen suggested because you set up power query once and it will repeat the process just by pressing the update all button, so if you setup a query by folder all you have to do is drop the new file into a folder then open the pq workbook and press ctrl+alt+f5 and it will update the workbood with the new data.

@Yea_So IMHO Power Query (PQ) is a nice alternative when connecting to slow changing data since it demands a specific update event in Excel.

PQ also needs a completly different set of skills compared to worksheets but it may pay back by providing a bunch of nice features like unpivot.
To get the transformed data, in this case email addresses sliced and separated by domain names to different sheets, PQ need to have a separate question for each domain. They can all be based on the same connection to the source file, but still need to be splitted into one separate 'question' for XXX.ch, another for XXX.de and a third for XXX.com since the wish is to have the result load to different sheets.

With PQ alone it is needed to prepare a sheet for each domain and manually manage (append a new question and a new sheet) whenever a new domain appears.

I do not see any smooth way around that except using a macro to loop through the months domains and setup a sheet for each one found. And macros seem to not be an alternative for this user. 

Loading the csv file via the menu Data and command From Text/CSV and then use a worksheet formula like @NowshadAhmeds ought to be a pretty good solution as above Dec 26 2021 10:24.

@bosinander 

 

yes it requires a one time query setup for each domain/sheet but after the initial query setup, the task becomes automatic just by pressing ctrl+alt+F5 to update all queries setup within the workbook so the cost benefit more than justifies the initial setup whereas with VBA the user might not be familiar with it and make maintaining more complicated as you have mentioned:

 

"To get the transformed data, in this case email addresses sliced and separated by domain names to different sheets, PQ need to have a separate question for each domain. They can all be based on the same connection to the source file, but still need to be splitted into one separate 'question' for XXX.ch, another for XXX.de and a third for XXX.com since the wish is to have the result load to different sheets.

With PQ alone it is needed to prepare a sheet for each domain and manually manage (append a new question and a new sheet) whenever a new domain appears."

if the user is not familiar with VBA how would the user maintain to update the code to add the VBA statements for the new domains?

In vba, a statement would generally check if there is a sheet by the domain name, and then create one if missing.
Another statement would lump the email addresses of a domain to the respective sheet.
This way, the user won't have to worry about the code for new domains because the codes are general and not specific to domain names.
Then assign a button to initiate the task and do everything automated.

However, for the current task the user needs, vba and power query is over engineering in my opinion. A simple formatted/structured table is more than enough, assuming it is only list of domain names. Here the formula for spliting the domain names will also help him filter effectively.

@NowshadAhmed 

 

How is creating one power query against a CSV file, then creating a reference query filtering for each domain sheet over engineering?

 

Come now iterate the procedures of updating each domain sheet every time there is a CSV update.

 

With power query by folder, after the

1. initial setup of each domain sheet

2. all the op has to do is drop the new CSV file into the queried folder

3. then press Ctrl+alt+F5 and it updates the main query and all the reference queries.

4. and if there are new domain sheets to setup just create new reference queries for the new domains, filter the appropriate domain within the reference query then close and load to a new sheet and save the workbook so the next time there is another CSV update all the op has to do is steps 1 thru 3.

 

Now iterate the procedure involved using formatted structured table to update each domain sheet below and let the op decide which solution they would prefer to use.

I interpret 'lot of domains' as between 4 and 100 and assume that quite a few of them are new from month to month.
As described 26 Dec 2021 10:24 AM above, the attached file calculates the domains in the input addresses. Column A may be pasted or loaded via Data: (Get & Transform): From Text/CSV.

bosinander_1-1640844399373.png

__/ Sheet3 has one of the domains entered into A1 and uses FILTER to fetch the addresses in that domain.

bosinander_2-1640844663802.png

A3:

=FILTER(Input!A:A;IFERROR(FIND("@"& A1;Input!A:A);0))

__/ The first sheets;

  • Column A shows the sheets in the workbook
  • Column B shows the value in each sheets cell A1 (as chosen in C2)
  • Column H shows the domains that not yet have its own sheet
    It is calculated by comparing column B with hidden column F (press Ctrl+8 to show its [+])

bosinander_3-1640844718992.png

Click the button to generate the missing sheets.

bosinander_4-1640845012769.png

__/ The macro will generate sheets for each missing domain as follows:

Sub domains2sheets()
    While Not IsError(Range("domains_without_sheet").Value)
        Sheets(Sheets.Count).Select
        If Not Range("a1").Value Like "?*.?*" Then
            MsgBox "Last sheet seems to not be a valid template?"
            Exit Sub
        Else
            Sheets(Sheets.Count).Copy After:=Sheets(Sheets.Count)
            Sheets(Sheets.Count).Name = "Sheet" & Sheets.Count
            Range("a1").Value = Range("domains_without_sheet").Value
        End If
    Wend
    Sheets(1).Select
End Sub

The defined name domains_without_sheet refers to =PowerPage!$H$2

Row 9 could just as well be 

Sheets(Sheets.Count).Name = Range("domains_without_sheet").Value

but would be misleading if cell A1 is changed.

 

Since it's just once a month, I think a paste and a click could be automatic enough.

It is also possible to select sheet4 and Shift+click the last sheet and delete them to only have the current months domains.

For this purpose or rather to get an overview of the number of items in each domain, I also added a counter in PowerPage!D:D. Minus one to exclude the sheets title.

bosinander_5-1640846315937.png

 

@bosinander 

 

You might want to look into merge query "merge" using the join type anti-left or anti-right depending on which list is selected in the dialog box to isolate and create a new domains list where the current csv list does not match the current master list.

 

if the new csv get and transform list is selected as the first list, use join type anti-right to get the new domains list

 

if the new csv get and transform list is selected as the second list use anti-left to get the new domains list

 

after getting the new domains list, use the query merge "append" to add the new domains list into the master list

Yes, but let's assume that you get 60 domains whereof 40 are new and 35 not around this month. As long as the demand is to have the addresses gathered by domain on separate sheets, there will be a lot of manual work to load each new domain question to a separate new sheet and probably delete domain sheets not in use.
I think the separation would be best done using more or less familiar worksheet functions and eventually combine them with oa few lines of code as mentioned above.
However I like the for me new approach to consider merge (aka join) anti-left and anti-right. That may become handy when describing the functionality in yet another way 🙂

@bosinander 

 

For me I tend to not separate data set unless there are certain processes that need specific data.

Maybe the op needs to put some context on which process needs separate data set specific to the process.

 

The best way to separate data set without using pq is the filter function for a specific process relative to the data set