I have a VB.NET application that I use to load various files into a RichTextBox and then scan through the document to find specific words. It's similar to the Find function in Word. The app was running fine until a 5,150 line .sql document run through it and it's taking upwards of 10 minutes to run to completion.
Can anyone recommend a better way of coding it than I have below?
If sqlText.Contains("GRANT") Then
Dim searchstring As String = "GRANT"
Dim count As New List(Of Integer)()
For i As Integer = 0 To rtbFile.Text.Length - 1
If rtbFile.Text.IndexOf(searchstring, i) <> -1 Then
For i As Integer = 0 To count.Count - 1
rtbFile.SelectionBackColor = Color.Yellow
rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Bold)
Catch ex As Exception
rtbFile.SelectionBackColor = Color.White
rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Regular)
This can also be done with a While loop and RichTextBox.Find():
Dim searchstring As String = "GRANT" Dim index As Integer = rtbFile.Find(searchstring, 0, RichTextBoxFinds.None) While index <> -1 rtbFile.Select(index, searchstring.Length) rtbFile.SelectionBackColor = Color.Yellow rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Bold) index = rtbFile.Find(searchstring, index + searchstring.Length, RichTextBoxFinds.None) End While
That first loop is killing the performance, you are calling
IndexOf for every character in the string. Also the two loops can be merged in to one. Change it to:
rtbFile.SelectionBackColor = Color.Yellow rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Bold) For Each m As Match in Regex.Matches(sertbFile.Text, searchstring) rtbFile.Select(m.Index, searchstring.Length) Next
You've got a few bad things going on here:
First, the following code:
For i As Integer = 0 To rtbFile.Text.Length - 1 If rtbFile.Text.IndexOf(searchstring, i) <> -1 Then count.Add(rtbFile.Text.IndexOf(searchstring, i)) End If Next
This is looping through every character in your string, and calling
IndexOf on the entire string from that point forward. So your 50,000-character string is running
IndexOf 50,000 times, on large strings.
You only need to call
IndexOf as many times as you find a string. When your string is found, you increment your start index to that point, and keep searching only from that point.
Next thing, this code:
For i As Integer = 0 To count.Count - 1 ... count.RemoveAt(i) Next
RemoveAt line is unnecessary. You're already looping through a list, so you don't need to remove the items as you go along. The way it stands, your loop will skip every other item in your list.
Whoops. I missed a very important point about the IndexOf (and incorrectly assumed it was fed with the end of the last match). See Magnus's answer.
I am not sure where the bottleneck is (and it might very well be from setting the selection itself), but here are my suggestions, roughly in order of priority:
Invoke rtbFile.Text once to avoid any roundtrips to underlying control (perhaps a native Windows control?) and use a variable to store the resulting string. Once the string is obtained in .NET, just keep using it directly unless/until the text may change. If the control is native then a lot of work may be required to simply "get the text".
Use normal item iteration over the count collection (not indexing) and do not remove from the front-of the List when assigning the selections. Removing from the front of a List is "expensive" in that it must shift all items down internally. Also, removing the element is unneeded here and dubious at best: since the collection being modified is also is being iterated which likely leads to incorrect behavior (skipped items), regardless of performance.
Only call IndexOf once per loop and use a variable to avoid a duplicate search. This likely won't have an overall impact, but it does avoid some "extra" work. IndexOf itself is fine and doesn't need to be replaced.