Jul 09 2022 05:45 AM
Hi everyone
I am working with a large amount of current data from a photovoltaic plant, and I need to find the current values that deviate from the average, i.e. the panels that are producing less than the rest. All I know is that production is lower at sunrise and sunset, because at those times the PV panels are shaded and the current they produce decreases.
I have data every 15 minutes from 2021 to 2022.
I would appreciate it if anyone could help me with this.
Thanks in advance
Valentina
Jul 09 2022 09:41 AM
Jul 09 2022 11:57 AM
@mtarler thank you for your answer.
I agree with you that there will be multiple below-average current values, which does not mean that all of those PV panels are shaded. The ones whose values are considerably below average (thank you very much for the suggestion to incorporate the standard deviation) would be the shaded ones.
I am analyzing the values every 15 minutes, i.e.:
7:00
7:15
7:30
7:45
8:00
etc.
In this way I will be able to build a daily, then monthly, and finally annual trend, keeping in mind that the position of the sun and the angle of incidence of solar radiation vary with the seasons, so the modules that are shaded in summer may not be the same ones that are shaded in winter.
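A minimal sketch of how the daily, monthly and seasonal view described above could be built for one panel, assuming the readings are available as (timestamp, current) pairs. The function name `slot_profile_by_month` and the sample values are hypothetical and only there for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def slot_profile_by_month(readings):
    """Average current per (year, month, time-of-day slot) for one panel.

    readings: iterable of (datetime, current) pairs, one per 15-minute sample.
    """
    buckets = defaultdict(list)
    for stamp, current in readings:
        # Group all days of a month by their clock time, e.g. "07:00".
        key = (stamp.year, stamp.month, stamp.strftime("%H:%M"))
        buckets[key].append(current)
    return {key: mean(values) for key, values in buckets.items()}

# Hypothetical readings for the 07:00 slot of one panel.
readings = [
    (datetime(2021, 6, 1, 7, 0), 2.0),
    (datetime(2021, 6, 2, 7, 0), 4.0),
    (datetime(2021, 12, 1, 7, 0), 1.0),
]
profile = slot_profile_by_month(readings)
print(profile[(2021, 6, "07:00")])   # average of the two June readings
print(profile[(2021, 12, "07:00")])  # the single December reading
```

Comparing the same slot across months is what would show whether a module is shaded only in part of the year.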
Jul 09 2022 01:56 PM
Jul 10 2022 01:41 PM
Jul 10 2022 02:32 PM
Jul 11 2022 08:00 AM
Solution: @Valentinaampuero I got the sample files (May-Nov of 2021). I am attaching the Nov data here. In this file I added a tab (DailyAvg) where I calculate the average for every 15-minute slot for each panel across all the days recorded. I then calculate the average (col LED) and standard deviation (col LEE) for each 15-minute slot, and I added conditional formatting to highlight every value that is more than 1 standard deviation below the average (orange) or more than 2 standard deviations below it (red). Finally, at the bottom I count how many times each panel was found to be more than 1 STD and more than 2 STD below the average. I hope this is helpful and can serve as a template for the other months. You could even collate these summary counts across months for a better indicator of performance across the year.
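For anyone who wants to try the same idea outside Excel, here is a rough Python sketch of the approach described above. It is not the workbook's formulas: it assumes the per-slot averages are already computed for each panel, that the average and standard deviation for a slot are taken across panels, and that the sample standard deviation is wanted. The name `count_low_slots` and the data are hypothetical.

```python
from statistics import mean, stdev

def count_low_slots(slot_averages, k=1.0):
    """Count, per panel, the slots where the panel is more than k standard
    deviations below the cross-panel mean of that slot.

    slot_averages: {slot_label: {panel_name: average_current}}
    """
    counts = {}
    for by_panel in slot_averages.values():
        values = list(by_panel.values())
        if len(values) < 2:
            continue  # a standard deviation needs at least two panels
        mu, sigma = mean(values), stdev(values)  # sample std (an assumption)
        if sigma == 0:
            continue  # all panels identical, nothing to flag
        for panel, value in by_panel.items():
            counts.setdefault(panel, 0)
            if value < mu - k * sigma:
                counts[panel] += 1
    return counts

# Hypothetical per-slot averages for four panels.
slot_averages = {
    "07:00": {"A": 2.0, "B": 2.1, "C": 1.9, "D": 0.5},
    "07:15": {"A": 3.0, "B": 3.0, "C": 3.0, "D": 1.0},
}
counts = count_low_slots(slot_averages, k=1.0)
print(counts)  # panel D is low in both slots
```

Calling it with `k=2.0` gives the stricter count that the red formatting represents.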
Jul 11 2022 08:25 AM
@mtarler Wow! You did in one hour what I couldn't do in the whole weekend hahaha
I'm looking at the file right now and trying to understand the formulas and the results.
Thank you so much!
Jul 11 2022 08:27 AM
@mtarler You cannot use the standard deviation method unless the mean, median and mode of your dataset (in this case, your DailyAvg values) are the same. The standard deviation method relies on the basic business statistics assumption that the mean, median and mode are the same; if they are, you can use it to make assumptions about values that are not normal. If they are not, as in this case, the following box plot method based on the median must be used: https://boxplot-outlier-data-analysis-templates.sellfy.store/box-plot-graph-statistics/. This approach uses the more stable median, not the mean (average), as its starting point, and then uses percentile cutoffs (e.g. the 25th, 50th and 75th percentiles and the interquartile range (IQR)) to determine which values are not normal. This particular example dataset could be approached using a template that I developed for multiple columns of data. Here is a link to a similar case study that I developed on Humber River water levels data using the median approach: https://boxplot-outlier-data-analysis-templates.sellfy.store/p/case-study-humber-river-water-levels-.... You can download this free Excel worksheet to see how my algorithm would approach the same problem. Thanks
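To illustrate the percentile idea in general terms, here is a small Python sketch of the standard box plot rule (Tukey fences at 1.5 times the IQR). It is not the algorithm in the linked template, which is not shown in this thread. The function name `boxplot_flags`, the 1.5 multiplier and the sample values are assumptions for illustration.

```python
from statistics import quantiles

def boxplot_flags(values, whisker=1.5):
    """Label each value "Low", "High" or "Normal" using quartile fences."""
    # Quartiles: 25th, 50th (median) and 75th percentiles.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    flags = []
    for value in values:
        if value < low_fence:
            flags.append("Low")
        elif value > high_fence:
            flags.append("High")
        else:
            flags.append("Normal")
    return flags

# Hypothetical currents of eight panels in the same 15-minute slot.
currents = [2.0, 2.1, 1.9, 2.2, 2.0, 0.4, 2.1, 2.0]
flags = boxplot_flags(currents)
print(flags)  # only the 0.4 reading falls below the lower fence
```

Because the fences come from quartiles, one very low panel does not pull the cutoff down the way it pulls down a mean.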
Jul 11 2022 08:29 AM
@Valentinaampuero please see my reply below to @mtarler
Jul 11 2022 08:45 AM
Jul 11 2022 09:08 AM
@Valentinaampuero That is correct. And, if I understand correctly that this data is related to solar power capture, then all of the 0's before 7:00 am in your dataset are also irrelevant, since you will not capture solar energy when it is dark outside. Even without the impact of these 0's, my median approach is more stable and does not rely on normality. Hope this helps.
Jul 11 2022 09:39 AM
Jul 11 2022 09:57 AM
Jul 11 2022 10:03 AM
Jul 11 2022 10:21 AM
@cool2021 @mtarler Thank you guys for the interest and the help. I now have better tools to solve this problem.
But...
I don't think I was clear enough.
The panels have a mounting system that moves following the trajectory of the sun to take better advantage of the radiation. However, this movement is programmed through the "tracker" system. Due to misconfiguration, there are certain panels that shade each other, which occurs at sunrise and sunset.
That said, I need to find those panels that shadow each other (via the current data) so I can fix the tracking configuration.
Panels that have lower (how much lower?) sunrise and sunset current values should be those that are misconfigured. However, we should not overlook the fact that clouds can pass through and also affect the current values.
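One possible way to separate passing clouds from tracker shading, as a sketch only: compare each panel with the plant-wide median at the same timestamp, so that a cloud that dims the whole plant leaves the ratio close to 1, and then look at how often a panel is low in the sunrise and sunset windows. This assumes clouds affect the plant roughly uniformly, which may not hold for a large site. The windows, the 0.7 threshold and all names below are hypothetical and would need tuning.

```python
from datetime import datetime, time
from statistics import median

# Hypothetical sunrise/sunset windows and ratio threshold.
WINDOWS = [(time(6, 0), time(8, 30)), (time(17, 30), time(20, 0))]
RATIO_THRESHOLD = 0.7

def in_window(stamp):
    """True if the timestamp falls in a sunrise or sunset window."""
    return any(start <= stamp.time() <= end for start, end in WINDOWS)

def shading_frequency(rows):
    """rows: iterable of (datetime, {panel: current}).

    Returns {panel: fraction of sunrise/sunset readings in which the panel
    was well below the plant median at that same moment}.
    """
    flagged, seen = {}, {}
    for stamp, by_panel in rows:
        if not in_window(stamp):
            continue
        reference = median(by_panel.values())
        if reference <= 0:
            continue  # dark, so the reading carries no information
        for panel, current in by_panel.items():
            seen[panel] = seen.get(panel, 0) + 1
            if current / reference < RATIO_THRESHOLD:
                flagged[panel] = flagged.get(panel, 0) + 1
    return {panel: flagged.get(panel, 0) / n for panel, n in seen.items()}

# Hypothetical data: a clear morning, a cloudy morning, and a midday reading.
rows = [
    (datetime(2021, 11, 1, 7, 0), {"A": 2.0, "B": 2.1, "C": 1.0}),
    (datetime(2021, 11, 2, 7, 0), {"A": 1.0, "B": 1.0, "C": 0.5}),
    (datetime(2021, 11, 2, 12, 0), {"A": 8.0, "B": 8.0, "C": 8.0}),
]
result = shading_frequency(rows)
print(result)  # panel C is low on both mornings, clear or cloudy
```

A panel that is low on most mornings or evenings, regardless of the weather, would be a candidate for a tracker misconfiguration; a panel that is low only occasionally is more likely explained by clouds.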
Jul 11 2022 03:09 PM
@Valentinaampuero That is clear to me. My process will catch this. Attached is a sample run of the first column of raw data from your original spreadsheet through an early version of the free template example that you downloaded from my site: https://boxplot-outlier-data-analysis-templates.sellfy.store/p/free-findoutlier-boxplot-analysis-tem...
I eliminated the blank cells and 0's in the first column of raw data. Then I clicked a button in my template, which you cannot see here but can if you download the free template. You can see the resulting new column of data (Outlier-Flag) beside your original data. You can now analyze your raw data against this flag to get a sense of the differences between segments and of how the algorithm made the groupings. The flag is based on the median approach to finding outliers, and the High or Low segment values in the Outlier-Flag column might be the values you are looking for, i.e. the records that show the impact of shadows on your solar capture process.
Thanks.
Jul 13 2022 06:22 AM
@mtarler I believe the core issue is the number of columns in the spreadsheet (8,245 columns, one for each panel), and not the granularity of the data. Take a look at the attached report, which I produced by running my Boxplot Analysis process on the first 10 records of both your Daily Average summarized data (DailyAVGOut worksheet tab) and the raw data (RawOUT worksheet tab): you will find that the granular, raw data yields more robust results. The issue now is that the dataset must be organized better, so that only one or two rows of panels (fewer columns) are analyzed at a time and the analysis runs more quickly.
Jul 13 2022 07:32 AM
Jul 13 2022 08:35 AM