Handling non-detects (<0.123) in a database

Copper Contributor

Hi 

Hoping for a bit of advice and hopefully a simple enough solution...

 

I have a database of water chemistry, roughly 130k rows of data currently, but this will grow month on month. At the moment I run multiple pivot tables off the data, which have worked pretty well so far. What I have struggled with, and to date overlooked, is where some results are reported as a limit of detection (LOD) for a parameter, e.g. <0.06 or similar. Up to now I've told the pivot tables to ignore these values, i.e. when a pivot table pulls them in it treats them as an error, and all errors in the pivot tables are then converted to "<LOD" when shown in the table.

 

My overall goal is to quickly review and present several tables and reports based on the analysis being completed on a monthly basis. This is not fixed and can vary from a few hundred results to a few thousand, depending on the number of samples and/or parameters being tested.

 

My questions based on this are:

1. Is there a better way to handle these <LOD values in pivot tables?

2. I am considering converting the <0.06 to 0.03 (half the LOD) as a semi-conservative way to represent the data. This may be acceptable, as I will not be saying there is zero. More conservatively, I could convert the data from <0.06 to 0.06, but this could present a false impression of the results. Advice on how to run this conversion on 130k lines of data easily would be appreciated.
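For anyone comfortable working outside Excel, the substitution described in option 2 can be sketched in a few lines of Python. This is purely illustrative (the function name and default half-LOD fraction are choices for the sketch, not anything from the thread); in Excel itself the same idea would typically be a helper column.

```python
def lod_to_number(result, fraction=0.5):
    """Convert a reported result to a number.

    A non-detect such as "<0.06" becomes the LOD times `fraction`
    (0.5 by default, i.e. half the LOD); anything else is parsed
    as a plain number.
    """
    text = str(result).strip()
    if text.startswith("<"):
        return float(text[1:]) * fraction  # e.g. "<0.06" -> 0.03
    return float(text)
```

Running this once over the whole results column produces numeric values that sum and average cleanly, with `fraction=1.0` giving the more conservative full-LOD substitution.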

 

Note: in the pivot tables I summarize each result as an average value, which has caused me some issues as well, but it has worked okay for the most part; duplicate entries have caused some problems.

 

I've no experience of any coding or other methods outside of the normal functions of Excel, so I would likely struggle if I needed to introduce macros or VBA.

 

Many thanks in advance

 

 

3 Replies

@ultankcc I don't fully understand the question/problem you are describing or why you are doing what you are doing. My guess is that you are marking those values <LOD for easy visual recognition, but when you start to do sums and averages your result is off, because those small values can add up and make a difference? But that is all a guess. If I am right, I would recommend leaving the 'real' values in place and highlighting (or 'dimming') those cells using conditional formatting. You can do a simple threshold test and set the formatting of those small values to dark grey font on a light grey background, or something similar, to easily indicate they are <LOD. If I'm way off here, maybe you could try explaining again and share a sample sheet.

@mtarler thanks for the response, and sorry for the delay in replying as well as for not making myself clear. My source data contains values with the less-than symbol, e.g. "<0.01", so when I sum or average this data it results in an error on the pivot table. I was hoping there is a way to avoid these calculation errors.

@ultankcc so in your OP you mention a couple of options you were considering (replacing "<xxx" with a value of xxx/2 or xxx), and you mention that you have a pivot table that is replacing them with "<LOD". So I have a number of questions.

Are you looking for advice on which approach is best with respect to data analysis, on how to accomplish the replacement, or for alternative solutions?

With respect to the pivot table, there are ways to create custom data columns in a pivot table, but are you only using the pivot table to calculate your averages? If so, and you are just reporting that average, I could imagine a formula that calculates the average without the pivot table. I'm just trying to understand which direction to go.

You also mention that duplicate entries are a problem. What exactly do you mean by this? Either approach can be plagued by this sort of problem, and it would be helpful to understand it now, rather than work on a solution for you only to have the duplicates need to be addressed later.

As for the data-analysis point of view, maybe do it both ways, or all three, to give a min-max bracket around the average as a rough confidence interval (i.e. for each <LOD value assume min = 0, avg = LOD/2, max = LOD). It's a bit more work, but if it is critical to know the tolerance of your value, that might be best.
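To make the bracketing idea concrete, here is a small Python sketch (illustrative only; the function and key names are made up for the example). It averages a column of reported results three ways, substituting 0, half the LOD, and the full LOD for each non-detect:

```python
def lod_bracket(results):
    """Average a list of reported results three ways:
    non-detects ('<x') counted as 0 (min), x/2 (mid), and x (max)."""
    def as_number(result, substitute):
        # '<0.06' -> 0.06 * substitute; plain numbers pass through
        text = str(result).strip()
        if text.startswith("<"):
            return float(text[1:]) * substitute
        return float(text)

    def avg(values):
        return sum(values) / len(values)

    return {
        "min": avg([as_number(r, 0.0) for r in results]),
        "mid": avg([as_number(r, 0.5) for r in results]),
        "max": avg([as_number(r, 1.0) for r in results]),
    }
```

The spread between "min" and "max" shows how much the non-detects could be influencing the reported average; if the bracket is narrow, the choice of substitution rule barely matters.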