VBA Optimization help for Large string manipulation related to memory management?

Contributor

TLDR;

The issue I need to solve.

How do I optimize the code, so the runtime for each 250 is approximately the same?

i.e, about say 15 secs? consistently from 0 - 35,000 conversations.

and even if I re-run the same macro, it'll take about the same time, and not keep increasing as the code keep running.

 

What is my code about?

As a teacher (I am not but as an example), I have say 100 students, whom I have told to provide screenplay writing, which I have in each cell.

 

So for example in cell A1, the data is

Mary: good morning

John: Hello!, nice to meet you

Mary: The weather is great today, do you want to go to the movies and hang out?

John: Yes, I would like that.

 

and cell a2, the data is another large screenwriting play.

 

Now my macro creates 12 columns, each with unique useful info, like what are all the stuff Mary said, who spoke first, how many time did each person speak, what happened at the end, the last three sentences used, etc etc.

 

With the help online, and by asking questions here, with the help of you guys. I finally did it!!! Yay!.

 

The macro now runs for 25,000 such plays, gathering a frick-ton of information for my work.

 

What I did and observed with my VBA Code,

Now is the runtime problem, and I assume memory management problem.

So this code works great for about 500-1000 conversations.

but it started taking SUPER long.

 

I thus wrote a timer. (sample code)

Dim StartTime As Double
Dim LapTime As Double
Dim PrevTimer As Double

Cells(2, 9).Value2 = Beginrow & " - " & Lastrow & " is Done - " & Format((Timer - StartTime) / 86400, "hh:mm:ss")

 

This helped me know the time difference between each set of conversations.

What I did was make a for loop, and take 250 conversations at a time.

(sample code)

While Beginrow < Total_Conv_Count
If Beginrow + 250 > Total_Conv_Count Then
Lastrow = Total_Conv_Count
Else
Lastrow = Beginrow + 250
End If

Call My_CustomScript(Beginrow, Lastrow)

Cells(2, 9).Value2 = Beginrow & " - " & Lastrow & " is Done - " & Format((Timer - StartTime) / 86400, "hh:mm:ss")

Wend

 

I noticed when I first boot my Laptop.

For 250 conversations, it takes 11 secs

For 2,000 conversations, it takes 1.30 secs

 

Kay. I stopped it (ESC key), and made a change to VBA, etc etc.

Ran the VBA code again.

 

For 250 conversations, it takes 32 secs

For 2,000 conversations, it takes 3.24 secs

 

At around 8,000 conversations on my first run, it started taking 2 mins for every 250 conversations.

At around 9,000 conversations, the same run, the VBA code started taking 5 mins for every 250 conversations.

 

Great, I needed to Optimize.

So I went to this website https://vbacompiler.com/optimize-vba-code/

As well as googled a lot. I did what they all recommended, which is basically the same.

 

Now I use the status bar updating, cause I disabled everything else.

Application.StatusBar = Beginrow & " - " & Lastrow & " is Done - Time Elapsed: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")

 

Cool, it runs faster now (or so I thought)

Cause I had just rebooted and restarted my laptop.

 

I find, that although faster, it still increases in time.

So the first run is faster than the third (re-running the same macro)

and the first 1,000 conversations is faster than the next subsequent 1,000

 

The times I recorded are, in seconds (for each 250)

11 - 13 - 15 - 16 - 17 - ............ - (at 2,500) 1min - 1.30min - 2min .......

 

Sigh, no progress. idky why.

I also tried

Set Search_Range = Nothing

Erase data_array

 

at the end of each private subroutine, for each 250, once done. I will clear the memory. No luck.

 

 

The issue I need to solve.

How do I optimize the code, so the runtime for each 250 is approximately the same?

i.e, about say 15 secs? consistently from 0 - 35,000 conversations.

and even if I re-run the same macro, it'll take about the same time, and not keep increasing as the code keep running.

 

Workaround solution I am currently using.

Split the 30,000 conversations into 5,000 each, thus making 6 sheets in the same workbook.

Run the macro for the first sheet.

Then reboot the Laptop, open the Excel workbook again, and do the second sheet - Repeat.

Finally, once all is done, copy and paste all 6 sheets into a single sheet by manually appending the data.

 

Clearly, I want to avoid this workaround.

8 Replies
If you want us to comment on the performance of your code we need, well, your code. And some nonsense but relevant sample data.

@Sandeeep 

 

I'm adding the code. (but removing almost every sensitive data I can) 

Also, a related question of mine here, solve my Hans (file share was also something he made)

https://techcommunity.microsoft.com/t5/excel/obtain-data-from-a-cell-with-lots-of-paragraphs/m-p/328...

 

Sub SomethingCode()
' Main MasterCode Macro
' Obtains
' Main Macro
'
' Keyboard Shortcut: Ctrl+Shift+E
'

' SomeCode for Data Extraction .... with Colors and everything! Yay!

ActiveSheet.Range("A1").Select
Dim Beginrow As Integer
Dim Lastrow As Integer

Dim StartTime As Double
Dim MinutesElapsed As String
StartTime = Timer

Total_Conv_Count = Cells(Rows.Count, "A").End(xlUp).Row - 1
Beginrow = 1
Columns("I:I").ColumnWidth = 28
Cells(2, 9).Value2 = "First 250 Starting"
Cells(3, 9).Value2 = "100 Conversations Takes 1min"

Call SomethingCode_Headers

While Beginrow < Total_Conv_Count
    If Beginrow + 250 > Total_Conv_Count Then
        Lastrow = Total_Conv_Count
    Else
        Lastrow = Beginrow + 250
    End If
    
    Call SomethingCode_Script(Beginrow, Lastrow) ' SubRoutine for the Script
    
    'ThisWorkbook.Save
    Cells(2, 9).Value2 = Beginrow & " - " & Lastrow & " is Done!"
    Beginrow = Lastrow + 1
    Cells(4, 9) = "Time Elapsed: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
Wend

 Cells(3, 9).Value2 = "Now Coloring and Beautifying"
 Call ColorFormatting ' Subroutine for Making the whole thing Beautiful
 Cells(5, 9) = "Time Completed: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
 ActiveSheet.Range("I2").Select
    
End Sub

 

 

Then I call the Sub-routine Private SomethingCode()

(Ton of info is removed, I have used ?????????? to mask stuff)

 

Sub SomethingCode_Script(Beginrow As Integer, Lastrow As Integer)
'
' Obtains Data Extracted 
' Main 
'
'
'

ActiveSheet.Range("A1").Select
Dim midText() As String
Lastrow = Cells(Rows.Count, "A").End(xlUp).Row - 1
Set Search_Range = Range(Range("H1").Offset(Beginrow, 0), Range("H1").Offset(Lastrow, 0))

'Set idk_Range = Range(Range("AA3"), Range("AA3").End(xlDown)) ' Has to be after the cells are populated

' Code Block for Step Counter of Conversations Count - Column JKL
For Each cell In Search_Range
    cell(1, 3).Value2 = UBound(Split(cell, "||")) + 1 ' - Column J
    cell(1, 4).Value2 = UBound(Split(cell, "||?????????")) + 1 ' - Column K
    cell(1, 5).Value2 = cell(1, 3).Value2 - cell(1, 4).Value2 ' - Column L

' Empty the cells if we re-run the same Macro
    If IsEmpty(cell(1, 3).Value2) = False Then ' If IsEmpty(cell(1, 19).Value2) = False Or IsEmpty(cell(1, 18).Value2) = False Then
        cell(1, 8).Clear
        cell(1, 13).Clear
        cell(1, 15).Clear
        cell(1, 18).Clear
        cell(1, 19).Clear
    End If

' Code Block for ????????????? - Column R
    If InStr(1, cell, "?????????????") <> 0 Then
        cell(1, 11) = "!!!!!!!!!!!!!!!!!!"
    ElseIf InStr(1, cell, "@@@@@@@@@@@@@@@@@") <> 0 Then
        cell(1, 11) = "###########"
    ElseIf InStr(1, cell, "$$$$$$$$$$$$$$$$$$") <> 0 _
        Or InStr(1, cell, "^^^^^^^^^^^^^") <> 0 Then 
            cell(1, 11) = "(((((((((((((((" 
    Else
        cell(1, 11) = "%&%$%^$^%$%^&^*&^(&"
    End If
Next cell

' Code Block for Iterating over every Sentence in a conversation.
For Each cell In Search_Range
    midText = Split(cell, "||")
    Counter = 0
    SnippetStart = 0

' Code Block for First Sentence via delimiter. - Column NYZ
    For i = LBound(midText) To UBound(midText)
        If InStr(1, midText(i), "?????????????") > 0 Then
            cell(1, 19).Value2 = cell(1, 19).Value2 & ":" & midText(i) & vbLf ' ?????????????
        Else
            cell(1, 18).Value2 = cell(1, 18).Value2 & ":" & midText(i) & vbLf ' All ?????????????
            If Counter = 0 Then ' If Block for First sentence
                cell(1, 7) = midText(i)
                Counter = 1
            End If
        End If

' Find ????????????? - Columns TV
        If InStr(1, midText(i), "WHAT!", vbTextCompare) > 0 Then ' what came After sentence to WHAT!
            cell(1, 13).Value2 = cell(1, 13).Value2 & midText(i - 1) & vbLf
        ElseIf InStr(1, midText(I), "????????????? Ton of ElseIFs for different statements removed")
                cell(1, 15).Value2 = cell(1, 15).Value2 & midText(i - 1) & vbLf ' previous sentence
        End If
    Next
    
' Code Block ????????????? - Columns UW
    If UBound(Split(cell(1, 13).Value2, vbLf)) > 0 Then
        cell(1, 14) = UBound(Split(cell(1, 13).Value2, vbLf))
    Else
        cell(1, 14).Clear
    End If
    If UBound(Split(cell(1, 15).Value2, vbLf)) > 0 Then
        cell(1, 16) = UBound(Split(cell(1, 15).Value2, vbLf))
    Else
        cell(1, 16).Clear
    End If
    

' Code Block for Last 3 sentences-ish kinda ????????????? - Column OPQ
    If InStr(1, cell, "?????????????") = 0 Then
        For i = UBound(midText) To LBound(midText) Step -1 ' Reading the Array backwards
            If InStr(1, midText(i), "?????????????") = 0 Then
                If InStr(1, midText(i - 1), "?????????????") > 0 Then ' Checking if the word ?????????????is there, if not, it is ?????????????
                    SnippetStart = i - 1 ' Then this is where the snippet starts
                    Exit For ' So exit the for loop, we have the starting point
                End If
            End If
        Next
    Else
        For i = UBound(midText) To LBound(midText) Step -1 ' Reading the Array backwards
            If InStr(1, midText(i), "?????????????") > 0 Then
                For j = i To LBound(midText) Step -1 ' Reading the Array backwards ?????????????
                    If InStr(1, midText(j), "?????????????") = 0 Then
                        If InStr(1, midText(j - 1), "?????????????") > 0 Then ' ?????????????
                            SnippetStart = j - 1 ' Then this is where the snippet starts
                            Exit For ' So exit the for loop, we have the starting point
                        End If
                    End If
                Next
            End If
        Next
    End If

    For i = SnippetStart To UBound(midText)
        cell(1, 8).Value2 = cell(1, 8).Value2 & midText(i) & vbLf ' Final Snippet
        cell(1, 9).Value2 = midText(SnippetStart) ' ????????????? what caused the ending
        cell(1, 10).Value2 = midText(SnippetStart + 1) ' ????????????? Response
    Next
    
' Code Block for Date & Week Manipulation for student submission
    '(not yet added)
   
' End the Iteration
Next cell

' Memory Management
Set Search_Range = Nothing
Erase midText


End Sub

 

ColorFormatting subroutine is fine, no problems there.

and SomethingCode_Headers() is just a subroutine to add text to row1

Private Sub SomethingCode_Headers()


' Code Block for Header Titles
Range("I1").Value2 = "null - 2 Progress Bar"
' Tons of others in the same way here, till cell AH1

End sub

 

Apologizes for most stuff being removed. Hope this gives an idea-ish?

Kay, kinda sent it.
I have to remove tons of stuff tho. But should give a rough idea.

@Sandeeep 

If you need this speeded up, the best thing to do is process everything in memory first, by using arrays. Then after it has all been worked through, push out the entire array to the worksheet in a single line of code. This may be orders of magnitudes faster that what you currently have.

But a very simple first step would be to turn off screen-updating and calculation prior to your loop and turn those back on afterwards:

'...Code
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'Loop here
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
'Some more code

@Jan Karel Pieterse 

 

Right, I did try that 

Private Sub OptimizedMode(ByVal enable As Boolean)
     Application.EnableEvents = Not enable
     Application.Calculation = IIf(enable, xlCalculationManual, xlCalculationAutomatic)
     Application.ScreenUpdating = Not enable
     Application.EnableAnimations = Not enable
     ' Application.DisplayStatusBar = Not enable
     Application.PrintCommunication = Not enable
End Sub

 

From the website here.

https://vbacompiler.com/optimize-vba-code/

 

The idea you are providing with arrays, is point 6 from this blog post correct?

https://techcommunity.microsoft.com/t5/excel/9-quick-tips-to-improve-your-vba-macro-performance/m-p/...

 

Dim vArray As Variant
Dim iRow As Integer
Dim iCol As Integer
Dim dValue As Double
vArray = Range("A1:C10000").Value2 ‘ read all the values at once from the Excel cells, put into an array 

For iRow = LBound(vArray, 1) To UBound(vArray, 1)
  For iCol = LBound(vArray, 2) To UBound(vArray, 2)
    dValue = vArray (iRow, iCol)
    If dValue > 0 Then 
      dValue=dValue*dValue ‘ Change the values in the array, not the cells 
    vArray(iRow, iCol) = dValue
  End If
Next iCol
Next iRow
Range("A1:C10000").Value2 = vArray ‘ writes all the results back to the range at once

 

So, I haven't tried this yet.

I'm confused with this part from above

For iRow = LBound(vArray, 1) To UBound(vArray, 1)
  For iCol = LBound(vArray, 2) To UBound(vArray, 2)
    dValue = vArray (iRow, iCol)

 

With regards to Lbound(vArray, 1) & again the loop Lbound(vArray, 2)

What is 1 & 2 referring to?

and when we put them back with 

Range("A1:C10000").Value2 = vArray

 

is (vArray, 1) referring to A1? & the second loop is B1?

i.e a1 a2 a3 a4... then b1 b2 b3 b4....

 

So is C column not being looked at? but I'mma assume it is?

 

I understood the examples here

https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/lbound-function

 

and here,

https://www.automateexcel.com/vba/ubound-lbound-function/

 

but didn't understand what and where I need to modify.

as is.

should I make vArray = Search_Range( entire H column) with tons of conversations,

then, cell = split(vArray(iRow,iCol), "||") then run my for loops?

 

if so, how am I putting the values back into the 25 columns the specific data needs to be entered into?

Cause, I am using cell(1, 20) etc, to reference which column I need to put it in.

 

and idk how to loop and assign properly.

1 refers to the rows, 2 refers to the columns. So the example loop runs through all rows and for each row through all columns. I agree your macro is quite a lot of work to convert to one that uses arrays.
I wonder whether this could be pulled off using Data, From Range/Table, as you seem to be splitting up much of the texts into different columns.
Hi, So, I'm not as convinced that it's an issue with the code.
I did change Instr to InstrB, cause I have no need for the positional value.

I ran the code, for small data sets, up to 2000, which is excellent in under 2mins.
I wrote a functional LapTime to update the Application status bar.

Up to 2000, each set of 250 conversations takes about 5 secs.
Running the same macro the fifth time, for no reason whatsoever,
Now causes up to 2000, each set of 250 conversations takes about 15 secs.

This is still fine.
After 2000 or 5000 conversations,
The lap time for each set of 250 conversations, suddenly increases to 31secs, then 54secs for the next set.
Then 1.21 mins for the next sequential set.

The issue I'm trying to solve is WHY?
WHY the consistency till a certain point? Why can't I have the same speed for all 30k conversations?
What is happening at the 2,000 or 5,000 mark that drastically changes the speed and runtime?
And how to mitigate this? or stop it?
One other thing to try is to avoid reading the value of cell many, many times.
At the top of the routine, add:
Dim cellValue as Variant
Then immediately after For Each cell In Search_Range
add:
cellValue = cell.Value2
Now everwhere in the loop, where you're retreiving the value of the cell variable, replace cell with cellValue.
For example, this line:
midText = Split(cell, "||")
should be changed to
midText = Split(cellValue, "||")