Is Excel capable of this operation?


THIS QUERY IS NOT FOR THE FAINT OF HEART!

Spreadsheet has Column A and Column B. Intermittently, Column A has a "plateau", where data is "stable" (within 3%) for at least 4, and as many as 20, rows. Between plateaus, data is dynamic and may be higher or lower. I want to assess data in Column B to find when Column B data reaches a "plateau" between 2 and 20 rows (always) AFTER Column A. Absolute values and ranges in both columns are different, but the "plateau definition" is identical. A) How can Excel identify when "plateaus" occur, and B) How can Excel identify when Column B has a "plateau" within 2-20 rows, when Column B has a different data value/range than Column A? Is this beyond Excel's capabilities?
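(For anyone who wants to prototype the rule outside Excel before building formulas, here is a minimal sketch of the plateau definition above. It is an illustrative interpretation only; in particular, treating "stable within 3%" as the window's max-min spread relative to its mean is an assumption, not something stated in the question.)

```python
def find_plateaus(values, tol=0.03, min_len=4, max_len=20):
    """Return (start, end) index pairs where consecutive values stay
    within `tol` relative spread for min_len..max_len rows.
    Greedy: each plateau is extended as far as the tolerance allows."""
    plateaus = []
    i, n = 0, len(values)
    while i < n:
        j = i + 1
        # extend the window while the relative spread stays within tol
        while j < n and j - i < max_len:
            window = values[i:j + 1]
            mean = sum(window) / len(window)
            if mean == 0 or (max(window) - min(window)) / abs(mean) > tol:
                break
            j += 1
        if j - i >= min_len:
            plateaus.append((i, j - 1))
            i = j
        else:
            i += 1
    return plateaus
```

With plateau lists computed for both columns, the Column B question reduces to checking whether some plateau in B starts 2-20 rows after a plateau in A.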

22 Replies
It sounds very do-able. Do you happen to have an anonymized sample workbook you can share by attachment or upload to a host like OneDrive?
Yes, working on numeric AND visual example. Expect to post within 24 hrs. Appreciate your patience. Glad to have a response to my query!!

@Patrick2788 

Sorry for the delay, Patrick! I was creating some docs to help explain my problem and my target. Here they are:

Disparate Data Tiers documents

 

I sincerely appreciate your help with this problem, in advance. I'm not an Excel wiz, so I'm not sure that this is even possible. I hope you're able to prove me wrong, and maybe show me an example.

If you have any problems with OneDrive access, just let me know. I have other avenues of access.
This sounds interesting, but I'm not following some of the logic. You mention examples, but all I see is an Excel sheet with no graphics/plots. In that data sheet you label sections in groups, but there are runs of data that seem to be plateaus that aren't labelled. I am also unclear what the labels mean: I would expect step 1 to be one column to note "Plateau" on A and another for "Plateau" on B, and then to try to find the relationship between those two. I don't follow what your labels are indicating. You also mention that a plateau is within a certain percentage; is that percentage known? And what should be done about overlapping plateaus (e.g. slow ramps)? For example, say the tolerance is 10% with a 5-sample minimum and you have 9,10,9,10,10,11,10,11: you would have one plateau 9,10,9,10,10 and another 10,10,11,10,11. The middle 10,10 overlap and are required by both to satisfy the 5-sample minimum; if they weren't required for the minimum, which side would they go on?

@asezas 

Thank you for taking the time to upload the sample workbook and notes.

 

I have some questions. Beginning at Row 50, there are 37 consecutive data points where White is 6 and Orange is 0.  I entered "Label?" in column C because this seems to meet the criteria of 24+ rows (or 6 minutes) of stability. Should these be marked with labels?

Patrick2788_1-1691445052500.png

 

This might be simplifying things, but sometimes with a complex task it's best to start small and scale it to add nuance and complexity as needed.  One way to measure stability is obtaining a rolling Standard Deviation.  For the sake of this example, we'll say Standard Deviation must be 0 for there to be Stability.

 

This function obtains a running (expanding-window) Standard Deviation:

=SCAN(, White, LAMBDA(a, v, LET(r, ROW(v) - 1, STDEV(TAKE(White, r)))))

The same function can be used on the Orange array, too.  
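For readers following along outside Excel, the logic of that SCAN formula can be sketched in Python. Note it uses an expanding window (the Standard Deviation of everything seen so far), not a fixed-width rolling window; this sketch mirrors that behavior:

```python
import statistics

def expanding_stdev(values):
    """Sample standard deviation of values[0..i] at each position i,
    mirroring STDEV(TAKE(White, r)) as r grows with the row number.
    STDEV needs at least 2 points, so the first entry is None."""
    out = []
    for i in range(1, len(values) + 1):
        window = values[:i]
        out.append(statistics.stdev(window) if len(window) > 1 else None)
    return out
```

Under the "Standard Deviation must be 0" rule, a stable stretch shows up as a run of zeros in this output.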

 

This snip illustrates the stability early on in the sample (0 and 0 for those last two columns):

Patrick2788_2-1691445768096.png

The question I have is: when the stretch of 4-0 ends, would you like to discard the previous numbers for White and Orange and start over in the calculations?

 

 

@Patrick2788 Wow! I am reeling from your insight! Amazing! The "Label?" question is best answered as "values for Column A are below the threshold limit", so no analysis (or label) is required; the values don't meet the criteria.  The answer to the 2nd question regarding 4-0 is YES: the deviation should reset to the new plateau values, so a 5% deviation at plateaus around a Column A value of 30 should be re-calc'd when the new plateau is around a Column A value of 50. Rounding up the value of the deviation is also acceptable, because we only use integers.  I hope these answers help to clarify. I am travelling from late in the day on 8 Aug until the evening of 11 Aug; my delayed responses are not due to lack of interest, only to connectivity and eyesight issues. Best regards

 PS: Your skills and knowledge are impressive! Good for you!

I appreciate your response. Hope this reply clears up some issues. 1st, the cyan letters are for ID/landmarking purposes only. 2nd, values below the Column A threshold (23?) are not required to be analysed, because they are below a value that can produce a meaningful value in Column B. "RAMPS" are not "plateaus", so they are not counted, but I do see your question about when a "ramp" becomes a plateau.

A pure square-wave data set in Column A without scratchiness would certainly eliminate that, but I have no hard answer, only that de-bouncing or flattening the values in both columns (averaging, or means, or rounding to multiples of 5?) might be a solution. I am still trying to bend my brain around that solution. Your comments would help. Thanks for your kind efforts and input. PS: Expect delayed replies from late 08 Aug to late 11 Aug; travelling, limited connectivity and vision.
If 6 is below the threshold, then why are rows 2-34 given Label A (those are only 4), and why are rows 98-273 given Label B, also with a White value of 6? And then Label E covers rows 432-479, where White starts at 51 with a max of 56, yet the very next value is also 51 but is not included. So it seems you can have a ramp (a sliding threshold?). Also, 51 to 56 could be 53 +/- 5%, but that range would be 50.35 (round to 50) to 55.65 (round to 56), yet the 50 just before the first 51 isn't included either. Labels F, G, L, Q, R, T and U are each 0 in column 1 (certainly below the threshold). Label J does NOT include a 43 just before it starts but DOES include a 40 at the end, right after a 43. Label K is too short. Label P is also only 23 long, though at least that is really close to 24.
So, back to my points above. You mention an 'attached graphic' but I can't find it, and that may help explain a lot of this. You mention thresholds, minimum deviations, averaging, etc., but I don't see those values anywhere except the 5% deviation you mention. Patrick suggested using StDev instead of a 'flat' deviation, and you seem to like that idea? I suspect some of these labels may be there to point out things like areas not to count, or plateaus for Orange and not White, and hence I suggested the 2 separate columns.
I'd love to sink my teeth into this but feel like I'm grabbing blind from the refrigerator drawer and just as likely to grab an onion as a I am to grab an apple.

@mtarler 

@asezas 

 

The approach I'm taking is to identify the plateaus in the data where White and Orange are stable for at least 24 rows.  Essentially, it's identifying the rows that are part of a 'streak'.  This draft does not allow for any wiggle room. For White/Orange to be stable, there must be no change for 24+ rows.

 

=LET(
    Change, LAMBDA(arr, N(DROP(arr, 1) = DROP(arr, -1))),
    Stable, DELTA(Change(White), Change(Orange)),
    Seq, SEQUENCE(ROWS(Stable)),
    NonStable, 1 + FILTER(Seq, Stable = 0),
    Streak, NonStable - VSTACK(0, DROP(NonStable, -1)),
    Seq_2, SEQUENCE(ROWS(White), , 2),
    Plateau, LAMBDA(e,
        LET(
            x, XLOOKUP(e, NonStable, Streak, , 1),
            IFNA(IF(x >= 24, "Plateau", ""), "")
        )
    ),
    MAP(Seq_2, Plateau)
)
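For clarity, the intent of that formula (flag rows belonging to a run of 24+ consecutive rows where neither White nor Orange changes) can be sketched in Python. This is a re-statement of the streak rule, not a line-for-line translation of the LAMBDA:

```python
def label_plateaus(white, orange, min_streak=24):
    """Label each row 'Plateau' when it belongs to a run of min_streak
    or more consecutive rows in which neither column changes value.
    No wiggle room, matching the draft's exact-equality rule."""
    n = len(white)
    labels = [""] * n
    i = 0
    while i < n:
        j = i
        # extend the run while both columns hold their starting values
        while j + 1 < n and white[j + 1] == white[i] and orange[j + 1] == orange[i]:
            j += 1
        if j - i + 1 >= min_streak:
            for k in range(i, j + 1):
                labels[k] = "Plateau"
        i = j + 1
    return labels
```

A tolerance (e.g. the 3-5% discussed above) could later replace the exact-equality test without changing the run-finding structure.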

@mtarler FIRST, I like your humour! 2nd, you are correct in thinking you are missing MUCH by not having the explanatory graphic!  How can I get it to you? Patrick opened it OK, and it was definitely uploaded as a PNG. I will shortly try to post it as a PDF. Following on from the note that the "Labels" in the graphic are identified using alpha characters: they serve only as landmarks, to identify a particular location on the graphic; their inclusion in the xls was to help connect data values to a location on the graphic. They have no connection to CASE occurrences.  Without the graphic, explanation is difficult. However, I can say that all values below 23 in Column A are exempt from analysis, and I can also say that Label location F (Column B) is a CASE for Label location E (Column A); Label O is a CASE for Label N; Label T is a CASE for Label R; and Label W is a CASE for Label U. Until we can get you a copy of the graphic, I worry that you are "blindly grabbing in a dark fridge", if you'll pardon the expression! Ha! I hope this helps to clarify matters. - Best regards

@mtarler How about trying this PDF version of the graphic that uses a different colour bkgnd, and darker characters for location Labels? ... PLEASE ADVISE JOY/NOJOY

Yes, I can see that. (BTW, where was that posted before? In that SharePoint folder? So weird that I don't see it in that SharePoint folder.)
So let me take a step back here and ask: are you looking for help to do specifically what you described, or is there a bigger picture that you're trying to get to, and you're hoping this approach gets you there? The analogy being: you want to catch lots of fish and you have a fishing pole, so you want a boat that can hold tons of rods, but you didn't realize you could use a net instead. I ask because it appears you are looking for feature detection, not plateau detection. For example, plateaus B, C, D, E are basically all part of one feature that is topped by F. Also, that threshold barely meets that final valley, and I suspect it was selected after the fact specifically to catch that valley. Instead you might be better off with a dynamic threshold that is basically a heavily weighted moving average, but which will 'drift' up during the active region. I would probably also have the 2 traces independently processed and then correlated for common features. But I don't want to jump down either rabbit hole without checking with you; you could have a very good reason why you specifically want to look at plateaus and at that specific threshold. Another question is whether you are really only interested in 'peak' plateaus (i.e. not features C, D, H, which are sub-peak plateaus), and maybe that min length isn't key, as it sure looks like (and you say) F is a CASE for E even though F is only 18 long.

@asezas OK, so I had 'fun' and created 2 LAMBDA functions: wAvg and dThresh.

My wAvg will take an array and apply an N-pt moving average.  It can accept a weighting array for custom weighted windows, or just an N for the # of points, and it will resize the window at the beginning and end so the output array length equals the input array length.

dThresh will take an array and apply a dynamic threshold, meaning the threshold value will drift based on the signal (or the dPath you give it), and it returns FALSE for values < dThresh.
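As a rough illustration of those two ideas, here is how a length-preserving moving average and a drifting threshold might look. This is a sketch of the concepts only, not the actual wAvg/dThresh LAMBDA code, and the exponential drift rate alpha is an assumed parameterization:

```python
def moving_average(values, n):
    """N-point centered moving average; the window shrinks at the
    edges so the output has the same length as the input."""
    half = n // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def dynamic_threshold(values, alpha=0.05, start=0.0):
    """Threshold that drifts toward the signal (exponential update).
    Returns the value where it exceeds the threshold, else False."""
    thresh = start
    out = []
    for v in values:
        out.append(v if v > thresh else False)
        thresh = (1 - alpha) * thresh + alpha * v
    return out
```

The drifting threshold captures the "rises during the active region" behavior mtarler describes: sustained high values pull the threshold up, so only genuinely new excursions register.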

I did this thinking these might be helpful for this problem, but when I came back to look at your data and sample sheet, I don't think so, and I noticed:

a) the graph you sent doesn't match this set of data, so those letters don't line up with the named sections on the spreadsheet (no big deal, I guess), and

b) you have a column with your visual output, which I assume is what you hope the formula will produce.  BUT those sections don't make sense based on your "rules".

c) To me, it seems like you just want to find any local maxima that 'plateau' for at least x points on the 'response' data (and maybe the same on the 'Energy' data too).

 

I would love to help on this, but I am still trying to understand exactly what you need.

OK, so I finished creating a LAMBDA to 'find plateaus' and added my LAMBDAs to your example file. I inserted my formula:
=Plateaus(B2:B1100,25,15,5,5)
which looks at the array in column B (Orange) and uses a threshold of 25 for the average plateau value, a minimum plateau width of 15, and a maximum allowed difference of 5 on the plateau; the final 5 means it applies a 5-pt moving average to the data. There is also a graph showing the White and Orange data and this new column/output. The amplitude is simply progressive for each plateau (1, 2, 3, etc.), so that if 2 'plateaus' are adjacent it is clear they are recognized as 2 and not just 1 long one.
It isn't perfect, but I hope it is a starting point toward your goal and maybe helps us understand your goal better.
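For anyone who wants to experiment with the same idea outside Excel, the pipeline those parameters describe (smooth, threshold, minimum width, maximum on-plateau spread, progressive numbering) might be sketched like this. This is a hypothetical re-implementation for illustration, not the actual Plateaus LAMBDA:

```python
def plateaus(data, threshold, min_width, max_diff, smooth_n):
    """Smooth with an N-pt moving average, then number runs of at
    least min_width points that stay above threshold while keeping a
    max-min spread <= max_diff. Plateaus are numbered 1, 2, 3, ...
    so adjacent plateaus remain visually distinct."""
    half = smooth_n // 2
    sm = [sum(data[max(0, i - half):i + half + 1]) /
          len(data[max(0, i - half):i + half + 1]) for i in range(len(data))]
    out = [0] * len(data)
    count = 0
    i = 0
    while i < len(sm):
        if sm[i] < threshold:
            i += 1
            continue
        j = i
        # extend while above threshold and within the allowed spread
        while (j + 1 < len(sm) and sm[j + 1] >= threshold
               and max(sm[i:j + 2]) - min(sm[i:j + 2]) <= max_diff):
            j += 1
        if j - i + 1 >= min_width:
            count += 1
            for k in range(i, j + 1):
                out[k] = count
        i = j + 1
    return out
```

The output column, plotted alongside the raw data, serves the same purpose as the graph mtarler describes: each detected plateau appears as a distinct numbered step.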
Trying to grok the fishing pole analogy, what I seek thru my query is "all of", or "the big", picture. Please let me take a step back from the granularity and scratchiness, and try to explain it this way: Column A is an "event". Let's call it a sound, like a bang or a clap of thunder, or a snap of fingers. The louder the sound, the higher the value, and the longer the sound duration, the more data points get lit up. Column B is a response to that sound event, by persons. The response (of course) always happens after the event, but sometimes the frequency of the event is too high or too low and is therefore inaudible for a person, and sometimes the response is small because the perceived volume of the sound was small. Sometimes the response is long because the event was long. In the end, one thing I'd like to determine is where the threshold of detection is for the person giving the response ... but right now I'm happy just MANUALLY plugging in a number for that threshold value. Similarly, I'm also happy manually plugging in a number that can be used to determine how many sounds the person heard that were longer than x data points. Visually, it's a relatively simple task to look at the graphic of the plotted data and tell whether there appears to be a correlation between the sound intensity and the response. It's less easy to determine how long it takes to get a response for a sound of a certain volume, or if sounds of higher volume require less duration to get a response. But for now, if I can plug a value into a formula for the Threshold value (nothing below that needs to be analyzed), and/or plug in a value that says there has to be a specific number of correct/positive responses to N number of sounds, I will be a happy camper. I offer this reply to all of you following this problem.
Perhaps this is a better analogy: Consider a "heavy sleeper". We want to know how hard we have to "poke" them (Column A) to get a response (Column B). The response might be anything between a murmur and a loud moan, and the response always comes AFTER the poke. A big poke would get a faster and louder response, but a gentle poke might get a slower and quieter response. We just want to know: if we poke them above a certain level, do they feel it? We also want to be able to make sure that the person being poked isn't trying to "fake" it by just responding with random intensities at random intervals. We want to be able to draw a dependable correlation between the "poke" and the "Mm!". It seems easy to SUBJECTIVELY spot correlations visually, but I hope that somehow Excel can perform this act as an OBJECTIVE analyzer.
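One objective check along those lines, sketched here purely as an illustration (not part of any solution posted in this thread): compute the correlation between the "poke" column and the response column at every candidate lag, and see which lag fits best. A clear winning lag with a strong correlation argues against random "faking"; a flat, weak correlation at all lags suggests the responses are unrelated to the pokes.

```python
def best_lag(a, b, max_lag=20):
    """Pearson correlation of a[i] vs b[i + lag] for lag = 0..max_lag;
    returns (best_lag, best_correlation)."""
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        vx = sum((xi - mx) ** 2 for xi in x)
        vy = sum((yi - my) ** 2 for yi in y)
        return cov / (vx * vy) ** 0.5 if vx and vy else 0.0
    best = (0, -2.0)
    for lag in range(max_lag + 1):
        x, y = a[:len(a) - lag], b[lag:]  # align a with b shifted by lag
        if len(x) > 1:
            r = pearson(x, y)
            if r > best[1]:
                best = (lag, r)
    return best
```

The same calculation is feasible in Excel with CORREL over offset ranges; the max_lag of 20 here simply echoes the 2-20 row delay window from the original question.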
So my fishing pole analogy is a variation on "when you have a hammer, everything looks like a nail", and it questions whether you are seeing the forest for the trees. I totally get the input/output concept and trying to extract thresholds and time delays. On a side note about thresholds: I remember an awesome article about adding chaotic or white noise to increase event detection. In concept it was so backwards that it stuck with me. The events seemed random, but by introducing the additional noise, the periodic stimuli reached threshold more often, and then the frequency of that underlying stimulus was revealed. In any case, I digress.
My main question here is: why are you looking for plateaus and not just peak features/occurrences? Is there something special about a plateau?
On another note, did you look at the sample file and play with the LAMBDA function I created to detect those plateaus?