Jul 09 2022 05:45 AM
Hi everyone
I am working with large amounts of current data from a photovoltaic plant, and I need to find the current values that deviate from the average, i.e. the panels that are producing less than the rest. The only thing I know is that production drops at sunrise and sunset, because at those times the PV panels are shaded and the current they produce decreases.
I have data every 15 minutes from 2021 to 2022.
I would appreciate it if anyone could help me with this.
Thanks in advance
Valentina
Jul 09 2022 09:41 AM
Jul 09 2022 11:57 AM
@mtarler thank you for your answer.
I agree with you that there will be many below-average current values, which does not mean that all of those PV panels are shaded. The ones whose values are considerably below average (thank you very much for the suggestion to incorporate the standard deviation) would be the shaded ones.
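A minimal sketch of the mean/standard-deviation idea, assuming the currents of all panels for one 15-minute interval are held in a dict keyed by a hypothetical panel ID. A panel is flagged when it sits more than `k` standard deviations below the mean of its peers at that same time (`k` is an assumed threshold):

```python
from statistics import mean, stdev


def flag_low_panels(readings, k=2.0):
    """Return the panel IDs more than k standard deviations below the mean.

    readings: dict mapping a (hypothetical) panel ID to its current, in
    amps, for ONE 15-minute interval. k is an assumed threshold.
    """
    values = list(readings.values())
    if len(values) < 2:
        return []  # stdev() needs at least two panels
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all panels read the same, so nothing deviates
    return [pid for pid, amps in readings.items() if (mu - amps) / sigma > k]


# One interval with five panels, one of them clearly low.
readings = {"P1": 8.1, "P2": 8.0, "P3": 7.9, "P4": 8.2, "P5": 2.0}
print(flag_low_panels(readings, k=1.0))  # ['P5']
print(flag_low_panels(readings, k=2.0))  # []
```

The empty result at `k=2.0` is worth noting: with only five panels, the single low reading inflates the standard deviation enough to partly hide itself (its z-score is about 1.79). With thousands of panels per interval this effect is much smaller, but it is one reason a median-based rule is less sensitive to extreme values.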
I am analyzing the values every 15 minutes, i.e.:
7:00
7:15
7:30
7:45
8:00
etc.
This way I will be able to see a daily, then monthly, and finally annual trend, bearing in mind that the position of the sun and the angle of incidence of solar radiation vary seasonally, so the modules that are shaded in summer may not be the same as in winter.
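Rolling per-interval flags up into daily, monthly and annual views can be sketched as a simple count, assuming each flag is recorded as a (timestamp, panel) pair with timestamps formatted as 'YYYY-MM-DD HH:MM'. Both the format and the panel IDs are hypothetical:

```python
from collections import Counter
from datetime import datetime

# Output label format for each supported grouping period.
PERIOD_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def count_flags_by_period(flag_events, period="month"):
    """Count how often each panel was flagged low, grouped by period.

    flag_events: iterable of (timestamp, panel_id) pairs, with timestamps
    in the assumed 'YYYY-MM-DD HH:MM' format.
    Returns a Counter keyed by (period_label, panel_id).
    """
    out_fmt = PERIOD_FORMATS[period]
    counts = Counter()
    for ts, panel in flag_events:
        label = datetime.strptime(ts, "%Y-%m-%d %H:%M").strftime(out_fmt)
        counts[(label, panel)] += 1
    return counts


events = [
    ("2021-06-01 07:15", "P5"),
    ("2021-06-02 07:15", "P5"),
    ("2021-12-01 18:30", "P7"),
]
print(count_flags_by_period(events, "month"))
```

Comparing the monthly counts of the same panel in summer and in winter is one way to see the seasonal shift in which modules are shaded.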
Jul 09 2022 01:56 PM
Jul 10 2022 02:32 PM
Jul 11 2022 08:00 AM
Solution: @Valentinaampuero So I got the sample files (May-Nov of 2021). I am attaching the Nov data here. In this file I added a tab (DailyAvg) where I calculate the average for every 15-minute slot for each panel across all the days recorded. I then calculate the average (col LED) and std (col LEE) across the panels for each 15-minute slot, and added conditional formatting to highlight all the spots more than 1 std below the average (orange) and more than 2 std below (red). Finally, at the bottom I did a count of how many times each panel was found to be more than 1 STD and more than 2 STD below average. I hope this is helpful and can serve as a template for the other months. Maybe even collate these summary counts across months for an even better indicator of performance across the year.
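For anyone who prefers to script it, the DailyAvg steps above can be sketched in plain Python. This is a re-implementation under assumptions, not the attached workbook: the raw data is assumed to be a list of (date, slot, panel, amps) rows, and the panel IDs are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def daily_avg_flags(rows):
    """Count, per panel, the 15-minute slots where its average current is
    more than 1 std (orange) and more than 2 std (red) below the mean of
    all panels in that slot.

    rows: iterable of (date, slot, panel, amps) tuples, e.g.
          ("2021-11-03", "07:15", "P1", 7.9). This layout is assumed.
    Returns {panel: [slots_below_1_std, slots_below_2_std]}.
    """
    # 1. Collect each panel's readings per slot across all recorded days.
    samples = defaultdict(list)
    for _date, slot, panel, amps in rows:
        samples[(slot, panel)].append(amps)

    # 2. Average them and regroup by slot, so each slot holds one value per panel.
    by_slot = defaultdict(dict)
    for (slot, panel), values in samples.items():
        by_slot[slot][panel] = mean(values)

    # 3. Compare every panel with the mean/std of all panels in that slot.
    counts = {panel: [0, 0] for (_slot, panel) in samples}
    for panel_avgs in by_slot.values():
        values = list(panel_avgs.values())
        if len(values) < 2:
            continue  # a standard deviation needs at least two panels
        mu, sigma = mean(values), stdev(values)
        for panel, avg in panel_avgs.items():
            if avg < mu - sigma:
                counts[panel][0] += 1  # orange: more than 1 std below
            if avg < mu - 2 * sigma:
                counts[panel][1] += 1  # red: more than 2 std below
    return counts


# Two days, six panels; P6 is low at 07:15 on both days.
days = ("2021-11-01", "2021-11-02")
rows = [(d, "07:15", f"P{i}", 8.0) for d in days for i in range(1, 6)]
rows += [("2021-11-01", "07:15", "P6", 1.5), ("2021-11-02", "07:15", "P6", 2.5)]
rows += [(d, "12:00", f"P{i}", 10.0) for d in days for i in range(1, 7)]
print(daily_avg_flags(rows))
```

Summing these per-panel counts month by month would give the year-long indicator suggested above.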
Jul 11 2022 08:25 AM
@mtarler Wow! You did in one hour what I couldn't do all weekend, hahaha
I'm looking at the file right now and trying to understand the formulas and the results.
Thank you so much!
Jul 11 2022 08:29 AM
@Valentinaampuero please see my reply below to @mtarler
Jul 11 2022 08:45 AM
Jul 11 2022 09:08 AM
@Valentinaampuero That is correct. And if I understand correctly that this data is related to solar power capture, then all of the 0's before 7:00 am in your dataset are also irrelevant, since you simply will not capture solar energy when it is dark outside. Even so, without the impact of these 0's, yes, my median approach is more stable and does not rely on normality. Hope this helps.
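The median approach itself is not spelled out here, so as a stand-in this sketch uses a common median/quartile rule, the boxplot lower fence Q1 - 1.5 × IQR. Following the point about night-time zeros, it skips any interval whose median current is zero (the plant is dark) rather than deleting individual zeros, because a single zero in daylight may be exactly the shaded panel being looked for. Panel IDs, the whisker factor and the dark threshold are assumptions:

```python
from statistics import median, quantiles


def low_outliers_iqr(readings, whisker=1.5, dark_amps=0.0):
    """Return panels below the boxplot lower fence Q1 - whisker * IQR.

    readings: dict of (hypothetical) panel ID -> current for ONE interval.
    Intervals whose median current is <= dark_amps are treated as night
    and skipped, rather than deleting individual zero readings.
    """
    values = list(readings.values())
    if len(values) < 4 or median(values) <= dark_amps:
        return []  # too few panels, or the plant is dark
    q1, _q2, q3 = quantiles(values, n=4)  # Python 3.8+, 'exclusive' method
    fence = q1 - whisker * (q3 - q1)
    return sorted(pid for pid, amps in readings.items() if amps < fence)


daylight = {"P1": 7.9, "P2": 8.0, "P3": 8.0, "P4": 8.1, "P5": 8.2, "P6": 7.8,
            "P7": 8.0, "P8": 8.1, "P9": 7.9, "P10": 3.0, "P11": 0.0}
night = {f"P{i}": 0.0 for i in range(1, 12)}
print(low_outliers_iqr(daylight))  # ['P10', 'P11']
print(low_outliers_iqr(night))     # []
```

Because quartiles are barely moved by a few extreme values, the fence stays close to the bulk of the panels, which is the practical advantage over a mean/std rule.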
Jul 11 2022 09:39 AM
Jul 11 2022 09:57 AM
Jul 11 2022 10:03 AM
Jul 11 2022 10:21 AM
@cool2021 @mtarler Thank you guys for the interest and the help. I now have better tools to solve this problem.
But...
I don't think I was clear enough.
The panels have a mounting system that moves to follow the trajectory of the sun and take better advantage of the radiation. However, this movement is programmed through the "tracker" system, and due to a misconfiguration, certain panels shade each other at sunrise and sunset.
That said, I need to find the panels that shade each other (using the current data) so I can fix the tracking configuration.
Panels with lower (how much lower?) current values at sunrise and sunset should be the misconfigured ones. However, we should not overlook the fact that passing clouds can also affect the current values.
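One possible way to separate tracker shading from clouds rests on an assumption: a misconfigured tracker shades the same panel at sunrise or sunset on most days, while cloud dips are irregular. Under that assumption, count on what share of days each panel is flagged in those windows. The input format and the 0.6 cut-off are hypothetical and would need tuning against the real data:

```python
from collections import defaultdict


def recurring_low_panels(daily_flags, n_days, min_share=0.6):
    """Return {panel: share_of_days} for panels flagged on >= min_share of days.

    daily_flags: iterable of (date, panel) pairs, one per time a panel was
    flagged low inside the sunrise or sunset window (any rule can produce
    these flags). n_days: number of days analysed. min_share is an assumed
    cut-off, not a value from this thread.
    """
    days_flagged = defaultdict(set)
    for day, panel in daily_flags:
        days_flagged[panel].add(day)  # several flags on one day count once
    shares = {panel: len(days) / n_days for panel, days in days_flagged.items()}
    return {panel: s for panel, s in shares.items() if s >= min_share}


# P5 is low at sunrise on 4 of 5 days (a tracker suspect); P9 only on one
# day, e.g. when a cloud passed, so it is not reported.
flags = [
    ("2021-11-01", "P5"), ("2021-11-01", "P5"), ("2021-11-02", "P5"),
    ("2021-11-03", "P5"), ("2021-11-05", "P5"), ("2021-11-04", "P9"),
]
print(recurring_low_panels(flags, n_days=5))  # {'P5': 0.8}
```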
Jul 13 2022 06:22 AM
@mtarler I believe the core issue is the number of columns in the spreadsheet (8,245 columns, one per panel), not the granularity of the data. Take a look at the attached report from running my Boxplot Analysis process on the first 10 records of both your daily-average summary (DailyAVGOut worksheet tab) and the raw data (RawOUT worksheet tab): you will find that the granular raw data yields more robust results. The issue now is that the dataset must be organized so that only one or two rows of panels (fewer columns) are analyzed at a time, which makes the analysis run faster.
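Analyzing a few rows of panels at a time can be sketched as splitting the panel IDs into fixed-size groups and running the analysis on each group separately. The chunk size is an assumption; ideally the groups would follow the physical tracker rows, which are not listed here. Note that any statistic computed per group compares a panel only with the other panels in its group. The walrus operator requires Python 3.8+:

```python
from itertools import islice


def chunked(panel_ids, size):
    """Yield successive lists of at most `size` panel IDs."""
    it = iter(panel_ids)
    while chunk := list(islice(it, size)):
        yield chunk


# 8,245 hypothetical panel IDs split into groups of 1,000.
panel_ids = [f"P{i}" for i in range(1, 8246)]
groups = list(chunked(panel_ids, 1000))
print(len(groups), len(groups[-1]))  # 9 245
```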
Jul 13 2022 07:32 AM
Jul 13 2022 08:35 AM
Jul 13 2022 09:21 AM
@mtarler One other point: Microsoft's colour scaling criteria (https://support.microsoft.com/en-us/office/highlight-patterns-and-trends-with-conditional-formatting...) are not based on any test of statistical significance; my process is.
So you may see patterns and trends using the Microsoft scales, but you will not identify the statistically significant (important) events that are occurring. My process will, every single time, regardless of the data's scale or type ($, #, %). That is the key difference: when you see any shading from my process, it is statistically significant, always.
Thanks
Aug 02 2022 04:32 AM
@mtarler, the attached workbook has another run of a sample of the photovoltaic plant data through my box plot outliers time series algorithm. From the original dataset, I took a sample of 45 solar panels and ran their 15-minute interval current data through my algorithm, which identified a number of time series segments.
The final column on the 'Data' tab is the number of panels (maximum of 45) that had a 'Low' reading, as classified by the algorithm, during each 15-minute interval. The 'Pivot' tab has a pivot table summarizing, for each hour of the day, the % of panels with 'Low' readings. For example, between 8 and 9 am there were 4 readings (four 15-minute intervals), and in one of them all 45 panels in this sample were 'Low'. Between 10 am and 4 pm, 100% of the panels were at the optimal level in every 15-minute interval. After 4 pm, you start to see a higher percentage of intervals in which one or more of the 45 panels are 'Low'.
Incidentally, even if you transpose the raw data so that the time intervals are columns and the panels are rows, you will still find a similar result.
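The hourly summary on the 'Pivot' tab can be approximated as follows, assuming the per-interval 'Low' counts have already been produced (the box plot algorithm itself is not reproduced here). Timestamps in 'YYYY-MM-DD HH:MM' format and the sample numbers are hypothetical, and averaging the per-interval percentages within each hour is one reasonable choice among several:

```python
from collections import defaultdict
from datetime import datetime


def hourly_low_share(interval_counts, n_panels):
    """Average % of panels classified 'Low' in each hour of the day.

    interval_counts: iterable of (timestamp, n_low) pairs, one per 15-minute
    interval, with timestamps in the assumed 'YYYY-MM-DD HH:MM' format.
    """
    by_hour = defaultdict(list)
    for ts, n_low in interval_counts:
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
        by_hour[hour].append(100.0 * n_low / n_panels)
    return {hour: sum(s) / len(s) for hour, s in sorted(by_hour.items())}


# Hypothetical counts for a 45-panel sample.
counts = [
    ("2021-11-03 08:00", 45), ("2021-11-03 08:15", 0),
    ("2021-11-03 08:30", 0), ("2021-11-03 08:45", 0),
    ("2021-11-03 10:00", 0), ("2021-11-03 16:15", 9),
]
print(hourly_low_share(counts, n_panels=45))  # {8: 25.0, 10: 0.0, 16: 20.0}
```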
Aug 08 2022 06:11 AM
@Valentinaampuero I actually found a way to efficiently run a report on all 8,245 solar panels data at once. Really works well.