Excel Text Manipulation

Copper Contributor

Device: PC/Windows 10

Microsoft Office Home and Business 2013

Excel Product ID: 00196-20943-85146-AA714

 

In the enclosed spreadsheet there is a column headed ‘Animals’.
Each row contains the name of a person and, in the Animals’ column, their favorite animals.
Dogs and cats are listed by breed and prefixed with ‘a-‘ or ‘b-‘ respectively.
Is it possible to break out dog breeds and cat breeds into their own columns while still keeping them under ‘Animals’?
Thanks, Joel

7 Replies

@Joel_Bauer  asks: In the enclosed spreadsheet there is a column headed ‘Animals’.
Each row contains the name of a person and, in the Animals’ column, their favorite animals.
Dogs and cats are listed by breed and prefixed with ‘a-‘ or ‘b-‘ respectively.
Is it possible to break out dog breeds and cat breeds into their own columns while still keeping them under ‘Animals’?

 

It no doubt is possible.

 

But may I first answer your question with another question (or maybe two or three)?

  1. What is your ultimate purpose here? How will this data be used?
  2. At what stage are you in the creation of the real database (this is obviously just an example)?
  3. Are you open to a wholly different approach?

 

Why do I ask those questions? Because in general, it's a terrible design idea to lump all kinds of disparate types of data into a single cell. A terrible design idea! As proof of that strong statement, the very fact that you're coming and asking "can I break these out into their own columns?" makes my point for me. 

 

It is a LOT easier to take data from multiple columns, even multiple rows, and produce a report that puts them together, a LOT easier to do that than it is to take data all stuffed into a single cell and break it apart. Are there text manipulation functions that could be assembled to do that breaking apart? Yes. But they're not easy, and the more inconsistent and erratic that data is (e.g., the spelling of "Frnch Poodle" or is it "French Poodle"--in your example), the harder it is to make sense of things.

 

So IF you're open to re-thinking the approach, I have some ideas:

  • Have one row per favorite per member--yes that means a LOT of rows, but stay with me
  • Use data validation to get the spelling of each breed consistent
  • If you have the most recent version of Excel, you can even have two or more levels of "cascading" data validation, such that first you'd select among "Jungle, Dog, Cat....", which selection would lead to a focused secondary list "Elephant, Lion, Tiger, Zebra..." or "French Poodle, German Shepherd, Bulldog...." or "Persian, Siamese, American Wirehair...."

 

Creating a database such as I've alluded to there would enable:

  • lists of all members who've identified "Elephant" as a favorite (or whatever)
  • lists of all of John Smith's favorite dogs
  • lists of all of John Smith's favorites of all types
  • lists of all people who have "Horses" AND "Zebras"
  • of lists of all people who have "Horses" AND "Donkeys"
  • etc. [as noted, having separate data elements makes it a lot easier for Excel to do wonders with data retrieval]

@Joel_Bauer I agree with @mathetes (i.e. John) that collecting data as displayed in your example is sub-optimal. 

 

Attached is a possible solution that combines PowerQuery, PowerPivot and a minimal amount of DAX coding (Windows only, I'm afraid). The output is not exactly as what you asked for, but it's close.

 

Note that this only works because I put a comma between "Lion" and "a-Beagle" on the animal listing for John Smith. If you can't guarantee the consistent use of the comma to delimit animals, it's going to break.

 

Having said that, it's probable better to collect your data along the lines as explained by John.

 

 

Hi Mathetes, Sorry I am not open to a different approach. Data is coming from a legacy system. We are lucky to get what we have, Thanks, Joel
Hi Riny, Yes the data is sub_optimal, but its what we have. Yes all 'animals' will be comma delimited. I have many questions, but your arrived at solution looks very promising. I need to be able to implement it myself on real data. I will be back to you after I have googled a bunch of things. Thank you so much. Joel

Hi again Rimy,

  I cannot follow your suggested solution.  It’s the first time of I’ve heard of PowerQuery, PowerPivot, or DAX coding.

  What I am looking for is a function or series of functions or something that I can plug into Cells E2 and F2 and then copy down the rest of the rows.

  These Functions will look at the text in C2 and extract the Dog Breeds and Cat Breeds.

  I have enclosed a revised spreadsheet. The data is now in a table.  The ‘Before’ tab shows what I have now. The ‘After’ tab shows what I want.

  Any thoughts would be appreciated.

Thanks, Joel

@Joel_Bauer Understood what you wanted, but don't think you can achieve exactly that so easily in any other way.

@Joel_Bauer 

 

So, Joel...can I jump in again, recognizing that what you've got is what you've got, that it comes from a legacy system....can I ask how many records there are in that legacy system and whether or not it wouldn't be worth it to spend time to actually approach a total re-design of the fundamental layout, so that, as I suggested in my first response, there's one record per person per animal. It absolutely will take some time, and it may be possible to automate some of that, but I would venture to say that there's going to be a fair amount of "manual" cleaning up needed regardless of how you do the first pass.

 

I'd be happy to create a sample of a cascading set of data validation tables (although I should warn you that they'd require the most current version of Excel) that could be used.

 

And again, I'd point out the far greater flexibility of output from a radically re-designed table of data. So it also comes back to what your ultimate purpose is, how you intend to use the data in whatever "final" format it achieves.

 

By the way, the suggestions from @Riny_van_Eekelen  are excellent--his are--but if you like me are on a Mac, some of those are features not accessible to Mac users (sigh), no matter how much we might wish they were.