Analyzing data to remove some data but retain other data - probably a very simple issue


I have a data set where a "project" is identified by a number and fiscal year (xxxx-yy). Each project, however, may have multiple "departments" involved, or may have two or more people identified within the same department. The data set looks something like this:

 

1234-17 - John Smith - chemistry - $10,000
1234-17 - Ted Jones - engineering - $5,000

2345-17 - Fred Adams - Math - $100,000

2233-17 - Janet Smith - education - $50,000
2233 -17 - Andy Taylor - education - $5,000


The workbook is about 5,000 rows, and what I want to do is identify the total dollars associated with interdisciplinary work. So I would want to count 1234-17 as $15,000, because two people crossing departmental lines are collaborating on the same project. I would NOT want to count 2345-17, because Fred is a loner. AND I would not want to count 2233-17, because Janet and Andy are in the SAME department, so while they are collaborating, it is not interdisciplinary per se.

My Excel skills are kind of weak, and I have tried sorts and subtotals and all kinds of wonky things but can't seem to find an easy solution. Some general guidance on how to figure this out without going line by line and deleting stuff I don't need would be great. This is the 21st century, after all. Thanks much.
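If the data really is stored as "project - person - department - amount" text like the example, rather than in separate columns, it first has to be split into fields (in Excel, Text to Columns or Power Query's Split Column would do this). A minimal Python sketch of that parsing step, assuming exactly four fields separated by " - " and amounts written like $10,000; `Row` and `parse_row` are hypothetical names used only for illustration:

```python
import re
from dataclasses import dataclass


@dataclass
class Row:
    project: str
    person: str
    department: str
    amount: int  # whole dollars


def parse_row(line: str) -> Row:
    """Parse a line such as '1234-17 - John Smith - chemistry - $10,000'.

    Assumes exactly four fields separated by ' - ' (space, hyphen, space).
    """
    project, person, department, amount = (part.strip() for part in line.split(" - "))
    # Drop stray spaces inside the ID, e.g. '2233 -17' -> '2233-17'
    project = re.sub(r"\s+", "", project)
    # Lower-case departments so 'Math' and 'math' count as one department
    department = department.lower()
    # '$10,000' -> 10000
    return Row(project, person, department, int(amount.replace("$", "").replace(",", "")))


lines = [
    "1234-17 - John Smith - chemistry - $10,000",
    "2233 -17 - Andy Taylor - education - $5,000",
]
for row in map(parse_row, lines):
    print(row)
```

Normalizing the ID and the department text up front matters because any grouping step treats "2233 -17" and "2233-17" as different projects.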

 


@taylomm Your example didn't reveal what your data really looks like, but I assume you have it in a tabular format. I'm not a big fan of complicated formulae if I can solve it with Power Query. The attached file demonstrates just that. It's easy to learn and easy to maintain, and it easily handles 5,000 rows of data. No problem.

It takes the data, groups it by project and checks whether a project has more than one department involved. Then it sums the amounts. The result is shown in the green table. Something that could work for you, I believe.
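The group, check and sum steps can be sketched in Python to show the logic. This is an illustration using the example rows from the question, not the attached Power Query file; `interdisciplinary_totals` is a hypothetical name:

```python
from collections import defaultdict

# Example rows from the question: (project, person, department, amount)
ROWS = [
    ("1234-17", "John Smith", "chemistry", 10_000),
    ("1234-17", "Ted Jones", "engineering", 5_000),
    ("2345-17", "Fred Adams", "math", 100_000),
    ("2233-17", "Janet Smith", "education", 50_000),
    ("2233-17", "Andy Taylor", "education", 5_000),
]


def interdisciplinary_totals(rows):
    """Return {project: total} for projects involving more than one department."""
    departments = defaultdict(set)  # project -> distinct departments
    totals = defaultdict(int)       # project -> summed amount
    for project, _person, department, amount in rows:
        departments[project].add(department)
        totals[project] += amount
    # Keep only projects that span two or more departments
    return {p: totals[p] for p in totals if len(departments[p]) > 1}


result = interdisciplinary_totals(ROWS)
print(result)                # {'1234-17': 15000}
print(sum(result.values()))  # 15000
```

Counting distinct departments (a set) rather than rows is what excludes 2233-17: it has two people but only one department.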

 

@taylomm 

Power Query is probably the best way to do this, but if you prefer formulas, see the attached version. The formulas will work in all versions of Excel. In Excel 365 they could be simplified.
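The attached formula version isn't reproduced in the thread, so here is one possible helper-column approach, sketched in Python so the logic can be checked: flag a row when the same project also appears with a different department, then add up the amounts of the flagged rows. In a worksheet this would correspond to something like `=COUNTIFS(A:A,A2,B:B,"<>"&B2)>0` in a helper column followed by a SUMIFS over that column, assuming a hypothetical layout with project in column A and department in column B; it is not necessarily what the attached file does.

```python
# Each row: (project, department, amount), mirroring three worksheet columns.
ROWS = [
    ("1234-17", "chemistry", 10_000),
    ("1234-17", "engineering", 5_000),
    ("2345-17", "math", 100_000),
    ("2233-17", "education", 50_000),
    ("2233-17", "education", 5_000),
]


def helper_flags(rows):
    """Per-row flag: True if the same project appears with a different department.

    Mirrors a COUNTIFS(project = this project, department <> this department) > 0
    helper column.
    """
    flags = []
    for project, department, _amount in rows:
        others = sum(1 for p, d, _ in rows if p == project and d != department)
        flags.append(others > 0)
    return flags


def flagged_total(rows):
    """Mirrors a SUMIFS over the helper column: add amounts of flagged rows."""
    return sum(amount for (_, _, amount), flag in zip(rows, helper_flags(rows)) if flag)


print(helper_flags(ROWS))   # [True, True, False, False, False]
print(flagged_total(ROWS))  # 15000
```

The per-row check compares every row with every other row, which is fine for about 5,000 rows but is the main reason grouping (as in Power Query) scales better.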

Is Power Query part of 365, or is it an Excel add-on? I saw something about using a query to do this when I was squirreling around yesterday, but I have never used it. It sounds fabulous. How do I start?

@taylomm 

In Excel 2010 and 2013, Power Query is a free add-in that has to be downloaded, installed and activated; in Excel 2016, 2019 and 365 it is built in and available directly from the Data tab of the ribbon.

See The Complete Guide to Power Query 

@Riny_van_Eekelen 

Small trick. Let's say we have a table:


If you choose Keep Duplicates from the menu, PQ generates something like:


In the formula bar, change [Count] > 1 to [Count] = 1.


Now we have all rows except the duplicates; there's no need to produce the same result by grouping and counting.
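To see what the trick does, it can be mimicked outside Power Query. As far as we can tell, the Keep Duplicates step groups by the selected columns, counts rows per group, keeps the groups where [Count] > 1 and joins them back to the original rows; flipping the condition to [Count] = 1 keeps only rows whose key occurs exactly once. The Python below is an emulation under that assumption, not Power Query's actual code, and `keep_by_count` is a hypothetical name. With Project as the key, [Count] = 1 isolates the single-person project.

```python
from collections import Counter

ROWS = [
    {"Project": "1234-17", "Person": "John Smith", "Department": "chemistry"},
    {"Project": "1234-17", "Person": "Ted Jones", "Department": "engineering"},
    {"Project": "2345-17", "Person": "Fred Adams", "Department": "math"},
    {"Project": "2233-17", "Person": "Janet Smith", "Department": "education"},
    {"Project": "2233-17", "Person": "Andy Taylor", "Department": "education"},
]


def keep_by_count(rows, key_columns, keep):
    """Count rows per key, then keep the rows whose key count satisfies `keep`.

    Emulates the group / count / filter / join-back pattern behind Keep Duplicates.
    """
    def key(row):
        return tuple(row[c] for c in key_columns)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if keep(counts[key(row)])]


duplicates = keep_by_count(ROWS, ["Project"], lambda n: n > 1)  # like [Count] > 1
singles = keep_by_count(ROWS, ["Project"], lambda n: n == 1)    # like [Count] = 1
print([r["Person"] for r in singles])  # ['Fred Adams']
```

Changing the key columns changes what counts as a duplicate: with Project and Department together, only Janet's and Andy's rows count as duplicates.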

@Sergei Baklan Thx. Will look at this later. Right now, I'm trying to solve problems without (too much) M-coding, although I do recognise that it is necessary to come up with the really nice solutions.

 

Edit: I see now what you mean. Not much M-coding at all.

@Riny_van_Eekelen, it's only changing > to = in the formula bar; really not much coding.

I appreciate the answers that I received. It took a little trial and error. I had to open the samples provided to follow the exact steps, as they are not intuitive to someone new to Power Query, and then I had to study my raw data multiple times before I actually trusted the results. I did upgrade my version of Excel (it was Excel 2016 but is now Microsoft 365 in the cloud) so that things worked a little more like what I was being told to do.

I learned something (I think) about splitting columns. I had some data that belonged to the same project, but the project identifier was followed by a letter to distinguish a subset of info for that project. I thought that if I split the column, the data would be viewed as distinct in the remaining column, but Excel seemed to remember that the field had contained the letter. I removed the letter from the raw data prior to the query and it worked like a dream.

Thanks to everyone who responded. Unfortunately for you, you will likely hear from me again as I learn new Excel tricks to make my data analysis easier.
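Removing the trailing letter before the query runs amounts to cleaning the grouping key first. A minimal Python sketch of that cleanup, assuming suffixed IDs look like 1234-17a (the exact format isn't shown in the thread, so the pattern and the names `base_project_id` and `departments_per_project` are hypothetical): once the suffix is stripped, rows from the same project fall into one group.

```python
import re
from collections import defaultdict


def base_project_id(raw_id: str) -> str:
    """Strip a trailing letter suffix: '1234-17a' or '1234-17 B' -> '1234-17'.

    Assumes the suffix is letters only, at the very end of the ID.
    """
    return re.sub(r"\s*[A-Za-z]+$", "", raw_id.strip())


def departments_per_project(rows):
    """Count distinct departments per cleaned project ID."""
    departments = defaultdict(set)
    for raw_id, department, _amount in rows:
        departments[base_project_id(raw_id)].add(department)
    return {project: len(depts) for project, depts in departments.items()}


rows = [
    ("1234-17a", "chemistry", 10_000),  # hypothetical suffixed IDs
    ("1234-17b", "engineering", 5_000),
    ("2345-17", "math", 100_000),
]
print(departments_per_project(rows))  # {'1234-17': 2, '2345-17': 1}
```

Without the cleanup, 1234-17a and 1234-17b would each count as a one-department project and the collaboration would be missed.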

@taylomm