Excel: Help! Formula needed

Copper Contributor

Hi there,

 

I'm a biological research masters student and I have a large data set. I'm working with seals so it is difficult to identify individuals. I'm trying to create a spreadsheet ready for data collection and have come across two problems. 

 

I have the following spreadsheet:Screenshot 2023-04-25 at 1.03.55 pm.png

Problem A:

The ID (identity) relates to which individual it is. At present they are named A-F. This is so that when they are identified, the ID that the individual is can be added in, as it may be the case that later in the videos the individual is identified. Is there a way for excel to change all related cells to the new number? For example, when the first "a" changes to "105", I want it to change all the "A's" in the rest of the data set to "105" for that date? There will be multiple different days and times featured on the data collection sheet. So if a formula is created that works for this, will it then change it for the rest even over different dates?

 

Problem B:

So far I haven't seen a solution for this yet. Is there a way for Excel to recognise the time patterns? I have tried the control+E method and Excel doesn't recognise the pattern, so I'm having to manually type each time in (which is every 30s over considerable hours). I have days of footage to do this with but limited time so wondering if there was a faster method of doing this? If not, well, long days it is!

 

Any help regarding this would be really appreciated! 

16 Replies

@SMcGivern 

I'm working with seals so it is difficult to identify individuals. I'm trying to create a spreadsheet ready for data collection and have come across two problems. 

 

So at least there are no concerns with violating their privacy here. :) It might be helpful, since that worry doesn't apply, if you could post the actual spreadsheet rather than just an image. Use OneDrive or GoogleDrive and paste a link here that grants access.

 

That said, let's look at each of your questions:

Problem A: The ID (identity) relates to which individual it is. At present they are named A-F. This is so that when they are identified, the ID that the individual is can be added in, as it may be the case that later in the videos the individual is identified. Is there a way for excel to change all related cells to the new number? For example, when the first "a" changes to "105", I want it to change all the "A's" in the rest of the data set to "105" for that date? There will be multiple different days and times featured on the data collection sheet. So if a formula is created that works for this, will it then change it for the rest even over different dates?

 

Answer A: yes. That should be fairly easy.

  • To be on the safe side, I'd suggest adding a "helper column" to the right of the current ID column, header (perhaps) "Name" or some other heading that indicates "we've actually identified this individual."
    • "safer" in that doing it this way doesn't destroy the original ID, just in case there's "re-identification" later in the process. I believe in keeping history, in that sense.
  • Then create a table on a separate sheet in the same workbook where you can define the relationship between the temporary letters (your current ID) and the more verified "Name." (I've attached a sample workbook illustrating this, but as noted, if you could post your real one, then we can be fully on the same page.)
  • Then in the helper column, use a formula like this (see attached spreadsheet for how it works). This uses the LET function, so you'd need a new version of Excel or a subscription to Microsoft 365. I'm on a Mac myself, so that per se is not an issue.

 

 

=LET(known,INDEX(Table2[Name],MATCH([@ID],Table2[ID],0)),IF(known=0,"",known))

 

 

 

Problem B: So far I haven't seen a solution for this yet. Is there a way for Excel to recognise the time patterns? I have tried the control+E method and Excel doesn't recognise the pattern

 

Answer for the moment: What do you mean by time pattern in this case. What kind of pattern? What is the nature of the pattern you're wanting to recognize? Frequency? Intervals? ??? Duration? etc.

 

Finally: what version of Excel do you have? And, while we're at it, what Mac OS?

@mathetes Thank you for responding to me.

 

So for problem A:

 

With my individuals, there are seals moving in and out over time which is another complexity to my data - I have individuals being added and then removed during the data collection (N=40). But I see what you are saying. I have an idea as to what I'll do to solve that problem using your method. So thank you for that!!

 

Problem B:

 

So by time pattern I mean purely the pattern within my data so for example:

 

PoolDateTimeID
116/02/202303:57:05A
116/02/202303:57:05B
116/02/202303:57:05C
116/02/202303:57:05D
116/02/202303:57:05E
116/02/202303:57:35A
116/02/202303:57:35B
116/02/202303:57:35C
116/02/202303:57:35D
116/02/202303:57:35E
116/02/202303:58:05A
116/02/202303:58:05B
116/02/202303:58:05C
116/02/202303:58:05D
116/02/202303:58:05E
116/02/202303:58:35A
116/02/202303:58:35B
116/02/202303:58:35C
116/02/202303:58:35D
116/02/202303:58:35E
116/02/202303:59:05A
116/02/202303:59:05B
116/02/202303:59:05C
116/02/202303:59:05D
116/02/202303:59:05E

 

I'm using video time-lapsed data which is taking a snapshot every 30s from midnight to midnight (throughout the day). So as you can see in the section of table above, when there is 5 individuals (for example), there will be 5 copies of the same time which then going into the next "bunch" of cells will be 30s later and so forth. If I select all of the above in time to go down, it only repeats what is above it, which doesn't quite work as I have differing start and stop times (caused by issues with cameras).

I don't think this a "pattern" that excel will recognise or duplicate. Hence why I'm doing it manually. 

 

I've attached the link (hopefully) so you should be able to see the document. There is no data inputted into it as of yet. If it doesn't work, let me know. 

Data sheet for Cameras.xlsx

 

I also have the latest Microsoft Excel and I'm on a Ventura 13.1

@SMcGivern 

 

by time pattern I mean purely the pattern within my data

 

Of course it's a pattern within the data.

 

So as you can see in the section of table above, when there is 5 individuals (for example), there will be 5 copies of the same time which then going into the next "bunch" of cells will be 30s later and so forth. If I select all of the above in time to go down, it only repeats what is above it, which doesn't quite work as I have differing start and stop times (caused by issues with cameras).

I don't think this a "pattern" that excel will recognise or duplicate. Hence why I'm doing it manually. 

 

I apologize if I'm being dense, but although I get that there are five instances of the same time, and then another bunch, "and so forth," it's still not quite clear how you're defining what a pattern IS. (And maybe more to the point, what it ISN'T, which is a contrast question that sometimes makes definitions clearer). Where (when?) does a pattern begin and end? What makes it a "pattern" in your definition? Why would some combinations NOT be a pattern?

 

If we can get a clear definition (clarity being achieved by defining boundaries) of what constitutes a pattern, it is entirely possible that we could create a formula/function that will recognize same.

 

I was able to open your file on OneDrive. Thank you. 

FWIW, my Ventura is 13.3.1 ... a slight update. I'm glad you have the new Excel versions, since that gives us access to some neat new features which might help identifying/recognizing patterns.

@SMcGivern 

Does this return the intended result?

Formula in cell E2 in the example and filled down:

=INDEX($C$2:$C$10000,IF(MOD(ROW(A1),5)=0,QUOTIENT(ROW(A1),5),QUOTIENT(ROW(A1),5)+1))

Formula in cell C2 in the example and filled down:

=ROW(A1)/2880+1/17280

repeated times.JPG

 

@OliverScheurich @mathetes 

 

So to answer @mathetes questions:

 

My apologies, it's a hard one to define for me as I'm neurodivergent.

 

So I think the "pattern" is dependent on how many individuals are within that pool during that day. So for example the max number of seals in pool 1 for the 16/2/2023 is 5. So there will be 5 replicates of the time signature e.g. 12:23:35, before going into the next time signature where there will be another 5 replicates of the same time signature 12:24:05. The start time varies because of malfunctions and other issues in regards to the cameras. So for example this particular clip for the day starts at 13:13:05 and ends at 15:38:35. There is no consistency with the cameras in regards to the time they start the Timelapse so this is near enough always going to be different. However, the times between the start and end will always be 30s as that was what the interval for the Timelapse was set to. 

 

So. @OliverScheurich has the wanted outcome however, I need it to duplicate the same time for every 5 rows for example. And then for other days, in other pools I have 12 individuals, so there would need to be 12 replicates of the same time. 

 

I guess what I'm trying to do is to find a way to repeat the same times for x rows (dependent on the number of individuals) and then have the time change for 30s later and then repeat for another x rows for the same day to an unspecified time point. I hope that makes sense!

@mathetes 

 

Just going back to problem A, I've realised I've not explained in the best way. So the ID are unknown individuals at that time and then when they are identified I would like to be able to change within the column to what the individual actually is. And then any individual that is not ID-ed could then be left as unknown C for example. But I only want this formula to work for one date at a time if possible as the individuals can change from data to date and from pool to pool. 

@SMcGivern 

=INDEX($C$2:$C$10000,IF(MOD(ROW(A1),12)=0,QUOTIENT(ROW(A1),12),QUOTIENT(ROW(A1),12)+1))

 

This formula duplicates the results in column E for every 12 rows. I've only changed 5 to 12 compared to the already suggested formula.

 

=ROW(A1)/2880+18006/86400

 

With this formula you can change the start time to 05:00:36. The formula is in cell C2. The first part ROW(A1)/2880 represents 30 seconds and 18006/86400 represents 05:00:06 because this is the 18006th second of the day. You can fill this formula down until the actual end time for that day.

duplicate rows.JPG

@SMcGivern 

 

So the ID are unknown individuals at that time and then when they are identified I would like to be able to change within the column to what the individual actually is. And then any individual that is not ID-ed could then be left as unknown C for example. But I only want this formula to work for one date at a time if possible as the individuals can change from data to date and from pool to pool. 

 

I understand leaving unknown individuals as unknown. What I don't understand is the desire for even those who ARE known, who have been identified, to be only known "one day at a time." Why is that? Why can't individual A, once identified--for sake of argument, as "SealJohn Doe"--show up on different days and different pools? That is to say, don't individuals, once identified, retain that known identity?  Wouldn't you want to track them accordingly, across multiple days, multiple pools?

 

This also goes back to my first suggestion, that you retain the initial IDs as part of history, but also associate them with a known name or label, once they have been identified, in a distinct column.

@SMcGivern 

The rightmost 2 columns of your grid could be generate by formula provided they are completely regular.

= LET(
    sequence,    QUOTIENT(SEQUENCE(rows,1,0,1),sealCount),
    time,        startTime + sequence*interval,
    individual,  MOD(SEQUENCE(rows,1,0,1),sealCount),
    HSTACK(time, INDEX({"A","B","C","D","E"}, 1+individual))
  )

image.png

@mathetes 

 

It may help if I explain the context a bit. 

So, there are 4 separate pools within this rehabilitation site and each of the 4 pools has been recorded on the same day. Each individual within the pool changes from recording to recording (Day to day), sometimes due to the individual either being released or moved to another pool as part of the rehab programme. Some of the individuals do not move from one pool to another. For example pool 1 is a release pool so some individuals are not seen in other pools. On the same day of recordings there can be for example 5 individuals in pool 1, 4 individuals in pool 2, 6 individuals in pool 3 and 12 individuals in pool 4. All of these seals are different and are only the same individuals in a handful of cases.

The A, B, C, D are purely just notary of all individuals in the pool at that time until I can successfully ID that individual through focal following (following that individual). 

 

That is to say, don't individuals, once identified, retain that known identity?  Wouldn't you want to track them accordingly, across multiple days, multiple pools?

 

Yes, they do retain their identity, but because of the video footage, it can be incredibly difficult to ID each individual, or even the same individual, across multiple clips. So each of the videos (day) has to be taken as if it was a new piece of footage as for example the individual you were able to identify in the 16/2/2023, you may not be able to identify for certain in the clip from the 20/2/2023 for a number of reasons. The identification process has to be started anew every time you go into a new data clip for an alternate day for this reason. Seals, especially grey seals, can be difficult to identify as I have experienced, for example, 3 seals in one pool all look alike but the only identifiable difference is their flippers which you are unable to see due to these being underwater. 

Additionally the recording days are not consecutive. Due to technical problems, the cameras were only able to record at least every 2 days, which after the 16/2/2023 was increased to 4 days due to the battery life of the cameras failing (the cameras are solar powered and this is the first project to use these cameras). 

Additionally these individuals already have an identity that is corresponded to further data - such as weight. So matching the behavioural data - what this data collection sheet is for - to that individual has to match exactly. 

So, in theory, by using the method suggested, I would have to create multiple tables because of how the footage has turned out. I simply want to be able to change that letter (e.g. A) for that day for the correct identity (e.g. 105), and then for that date have it change all "A" to the correct ID "105". 

 

@SMcGivern 

I had in mind that you would have a separate sheet for each run of observations since any patterns from one set of recordings do not simply read over to the next.  I hard-wired the letter IDs 

= {"A","B","C","D","E"}

into the formula but, if it were instead a reference to a range on the worksheet, then simply overwriting a letter by the ID would change every reference to the specific seal within the relevant pool.

@SMcGivern 

It may help if I explain the context a bit. 

 

It always helps to explain the context. Always. It helps eliminate (by anticipating) questions of defining the process. 

 

I don't know if this would meet your need as I now  understand it. I've changed the reference table to allow for different IDs to Name connections based on date. And the new formula takes account of both ID and Date in that two dimensional matrix.

=INDEX(Tables!$B$2:$D$7,MATCH([@ID],Tables!$A$2:$A$7,0),MATCH([@Date],Tables!$B$1:$D$1,0))

 

 

@mathetes 

 

Would this work for multiple pools? So, as mentioned before, the ID would potentially be different for each pool as well as each day. So it is more of a 3 dimensional mix than a 2 dimensional mix. Would this mean that I would have to create a table for each pool with the dates across the top and ID as the first column or would there be another way to do this? 

 

Thank you for all of your help. It is really appreciated. 

Okay great thanks. I just have a question regarding the 18006/86400. How did you work this part out? How did you work out that this was that second of the day? I'm guessing that the 86400 is the total number of seconds for the day? Also, I noticed that on the first comment you made that you had 1/17280. What is the 17280 related to? Is that the total number of seconds of the day?

@SMcGivern 

86400 seconds is the total numbers of seconds of a day. 60 seconds times 60 minutes times 24 hours is 86400 seconds. In the second example the start time is 05:00:36. Start time less 30 seconds is 05:00:06. Five hours and 6 seconds are 18006 seconds. 60 seconds times 60 minutes times 5 hours plus 6 seconds is 18006 seconds and therefore 18006/86400 of one day. Then 30 seconds are added with ROW(A1)/2880 which makes 05:00:36. If the formula is dragged one row down the result is 05:01:06 and so on.

 

In the first example the start time is 00:00:35. 5 seconds are represented by 5/86400 which is equal to 1/17280. Then 30 seconds are added with ROW(A1)/2880 which makes 00:00:35.

@SMcGivern 

Would this work for multiple pools? So, as mentioned before, the ID would potentially be different for each pool as well as each day. So it is more of a 3 dimensional mix than a 2 dimensional mix.

 

There are ways to do it. Unfortunately, I'm going to have to pass for now...driving to the airport to go on a trip very shortly and still need to pack.