Chris Haas's Blog

June 25, 2009

Creating a simple multi-threaded VB.Net application

Filed under: Uncategorized — Tags: — chrishaas @ 11:41 am

I write lots of desktop tools that help me do my job more efficiently. For instance, I’m normalizing a 98 million row live table and do to the nature of the server, table locks, etc, I’ve decided to pull the data down in batches using a VB.Net app, analyze it and perform a batch update. The high-level code is simple: pull the data, group it and then execute some UPDATE SQL statements. But 98 million rows in a partially non-indexed table takes a looooong time, and maybe its my agency background, but I want some feedback as to what’s going on, and I want more than just a Trace/Console output. At first glance you might just add a progress bar and update it every time you do something, but unfortunately your code loop will block the UI thread from updating until complete (although you can throw some Application.DoEvents in there).

So enter multi-threading. If you think this is overkill then a) you obviously don’t know me and b) I’ve been doing this for years, it only takes a little bit more work to add it. The code below will show you how to create a multi-threaded app with a responsive UI. The code below is all done in Visual Basic 2008 Express Edition.

1) Create a new VB.Net Windows Forms Application.
2) Add a ProgressBar and a Button to the form, leave the names the default names
3) Resize the ProgressBar so you can see more of it
4) Double-click the Button to go into code-behind
5) Add a new class to the project called “Worker” and add the following code to it:


Option Explicit On
Option Strict On

Public Class Worker
    Private _curRun As Integer
    Private _maxRuns As Integer
    Public ReadOnly Property CurRun() As Integer
        Get
            Return Me._curRun
        End Get
    End Property
    Public ReadOnly Property MaxRuns() As Integer
        Get
            Return Me._maxRuns
        End Get
    End Property
    Public Sub New(ByVal maxRuns As Integer)
        Me._maxRuns = maxRuns
    End Sub
    Public Sub Start()
        For Me._curRun = 1 To Me._maxRuns
            System.Threading.Thread.Sleep(250)
        Next
    End Sub
End Class

This code will be doing our primary work. In this case its not actually doing anything except sleeping, you’ll want to replace the System.Threading.Thread.Sleep(250) with your own logic, obviously. We’ll instantiate this code in our primary UI thread later, telling it how many times we want it to loop. I always prefer to pass my required variables in the constructor and make them read-only properties so that I know that they’ll be set. One other thing that you’ll notice, usually in the For loop you’d do something like For I = 1 to Me._maxRuns, creating a local counter variable. Instead, our code creates a class-level variable and gives read-only permission to it so that other code (on another thread) can determine where we’re at. That’s it from the worker class’s point of view. The only bit of thread-related code in here is the Sleep method and that’s for example purposes only.

The next step is to add some logic to the main form’s code-behind to create our worker, create a thread to execute the worker and create a thread to monitor the worker and report back to the form and changes. Before I slap the code up let me explain how it will work. After clicking the button the form’s primary (and only, initially) thread will create an instance of the Worker class. It will then create a new Thread object and tell it that when we execute it it should call the Worker’s Start() method. This alone would give you a responsive UI but unfortunately the Worker thread can’t safely update the UI because they’re on different threads (and VS 2008 will actually throw errors if you try). You can get around this using events or delegates in the worker object but I prefer to make my worker object 100% completely thread un-aware, meaning that it doesn’t report back to anyone, it just does the work and if someone wants to inspect its progress, fine, but it doesn’t really care. So how do we do this? We introduce a third thread whose only job is to check the status of the worker object and tell the main thread to update the progress bar. The code is so simple that I don’t even bother wrapping it in a separate class, I just include a Monitor() method in the form’s code-behind. The code runs a loop until the worker thread signals that its done, sleeping for a bit between each loop so that we don’t flood the UI with update calls (there’s no sense in sending “set the progress bar to 1″ a million times a second). Threads automatically handle adjusting their ThreadState so we don’t have to worry about that.

One of the things that takes a little bit of getting used to is the concept that one thread cannot modify controls on another thread. So our Monitor() method can’t actually update the UI. Instead what it does is ask the UI thread to update the UI on behalf of the monitor thread. I know, sounds weird. To do this, our UpdateUI method first calls Me.InvokeRequired which basically asks the form to check if the code calling this method is on the same thread. If its on the same thread it just adjust the progress bar. If its not on the same thread it ask the form to call the same method with the same parameters on its behalf using the Invoke() method. Invoke takes two parameters although the second is optional. The first is a delegate which is basically a type-safe function pointer. If you were using Javascript this would be the same as passing the function’s name without parentheses or just using the function(){} syntax. The second is an array of objects that represent the parameters to pass to the method. The syntax may look confusing but that’s just VB’s way of declaring an array. I think they’re fixing that in VB10 but I’m not sure. We also use a similar concept when the main thread is done to re-enable the button. And that’s it.


Option Explicit On
Option Strict On

Imports System.Threading

Public Class Form1
    Private MonitorThread As Thread
    Private WorkerThread As Thread
    Private W As Worker
    Private Delegate Sub UpdateUIDelegate(ByVal curIndex As Integer)
    Private Delegate Sub WorkerDoneDelegate()

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Me.Button1.Enabled = False                      'Disable the button
        W = New Worker(50)                              'Create our Worker object
        Me.ProgressBar1.Maximum = W.MaxRuns             'Set the maximum value of the progress bar
        WorkerThread = New Thread(AddressOf W.Start)    'Create our Worker thread and tell it that when we start it it should call our Worker's Start() method
        MonitorThread = New Thread(AddressOf Monitor)   'Create our Monitor thread and tell it that when we start it it should call this class's Monitor() method
        WorkerThread.Start()                            'Start the worker thread
        MonitorThread.Start()                           'Start the monitor thread
    End Sub
    Private Sub Monitor()
        Do While WorkerThread.ThreadState <> ThreadState.Stopped    'Loop until the Worker thread (and thus the Worker object's Start() method) is done
            UpdateUI(W.CurRun)                                      'Update the progress bar with the current value
            Thread.Sleep(250)                                       'Sleep the monitor thread otherwise we'll be wasting CPU cycles updating the progress bar a million times a second
        Loop
        WorkerDone()                                                'If we're here, the worker object is done, call a method to do some cleanup
    End Sub
    Private Sub UpdateUI(ByVal curIndex As Integer)
        If Me.InvokeRequired Then                                                           'See if we need to cross threads
            Me.Invoke(New UpdateUIDelegate(AddressOf UpdateUI), New Object() {curIndex})    'If so, have the UI thread call this method for us
        Else
            Me.ProgressBar1.Value = curIndex                                                'Otherwise just update the progress bar
        End If
    End Sub
    Private Sub WorkerDone()
        If Me.InvokeRequired Then                                                           'See if we need to cross threads
            Me.Invoke(New WorkerDoneDelegate(AddressOf WorkerDone))                         'If so, have the UI thread call this method for us
        Else
            Me.Button1.Enabled = True                                                       'Otherwise just update the button
        End If
    End Sub
End Class

One last thing. Debugging a multi-threaded program can be a PITA so one of the things that I do is include a switch to turn it on or off as needed. I prefer to do this with a Compiler Directive by adding #Const MULTI_THREADED = True to the top of my code-behind. Then there’s just a simple change to the button’s click event:


    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Me.Button1.Enabled = False                      'Disable the button
        W = New Worker(50)                              'Create our Worker object
#If MULTI_THREADED Then
        Me.ProgressBar1.Maximum = W.MaxRuns             'Set the maximum value of the progress bar
        WorkerThread = New Thread(AddressOf W.Start)    'Create our Worker thread and tell it that when we start it it should call our Worker's Start() method
        MonitorThread = New Thread(AddressOf Monitor)   'Create our Monitor thread and tell it that when we start it it should call this class's Monitor() method
        WorkerThread.Start()                            'Start the worker thread
        MonitorThread.Start()                           'Start the monitor thread
#Else
        W.Start()
        Me.Button1.Enabled = True
#End If
    End Sub

When MULTI_THREADED is False we’ll invoke the worker, blocking the UI thread from ever updating, do the work and reset the button. This can save you a lot of time when debugging. This is also why I like to keep my worker object thread un-aware.

Fixing “query processor ran out of stack space”

Filed under: Uncategorized — Tags: , , — chrishaas @ 9:07 am

I’ve been normalizing a table with 98 million rows for a couple of days now and with a recent change I’ve been able to run my batches in groups of 100,000 instead of just 10,000. The program doing the normalization pulls down 100,000 records at a time, analyzes and groups them and performs an UPDATE for each item in the group, passing the primary keys back to the server in an IN () clause. Everything was fine for a while but with the upgrade to 100,000 and with some groupings sending thousands of keys in the IN clause I started to get the following error:

Internal Query Processor Error: The Query Processor Ran Out Of Stack Space During Query Optimization.

Microsoft says to insert the keys into a temp table and join with that but that doesn’t really work for me. Instead I just decided to break up the UPDATE query into ones with more manageable IN clauses.

Originally I had something like:


            Dim mySQL = String.Format("UPDATE Tbl SET Col2=xyz WHERE Col1 IN ({0})", Join(TheList.ToArray(), ","))
            Using Com As New SqlCommand(mySQL, con)
                Com.CommandType = CommandType.Text
                Com.ExecuteNonQuery()
            End Using

The new code breaks things into more manageable sizes:


        Dim AmmountToProcess = 1000
        Do While True
            Dim Y = TheList.Take(AmmountToProcess)
            Dim mySQL = String.Format("UPDATE Tbl SET Col2=xyz WHERE Col1 IN ({0})", Join(Y.ToArray(), ","))
            Using Com As New SqlCommand(mySQL, con)
                Com.CommandType = CommandType.Text
                Com.ExecuteNonQuery()
            End Using
            TheList.RemoveRange(0, Math.Min(TheList.Count, AmmountToProcess))
            If TheList.Count = 0 Then Exit Do
        Loop

The new code uses the wonderful Take extension method to grab 1,000 items. Take luckily doesn’t error if you ask for more than available. The smaller UPDATE query runs and then we remove the first 1,000 items (or however many are left) from the list. Repeat until the list is empty.

June 24, 2009

Make sure your SQL queries are using the proper index

Filed under: Uncategorized — Tags: , — chrishaas @ 11:00 am

I’ve got a live table with 98 million rows that I’m normalizing. Unforuntately all full batch updates tend to lock the system for a very long so I wrote a VB.Net program to perform updates in smaller batches. The program pulls 10,000 records that haven’t been updated by using the standard SELECT TOP 10000… syntax. Unfortunately the WHERE portion of this query takes a really long time to run. I started digging into things and tried just SELECT TOP 1… and that returned results immediately. I then tried SELECT TOP 10… and that returned immediately, too. After playing around I found that SELECT TOP 63… was the most I could go before the query took a long time to execute. I have no idea what’s so special about 63 but it was bugging me. So I ran SELECT TOP 63… and SELECT TOP 64… side-by-side in SQL SMS with the execution plan turned on and found the 63 version was using an Index Seek on the index that was on the column in my WHERE clause. This is a good thing. The 64 version, however, was performing a Clustered Index Scan on the primary key which had nothing to do with my WHERE clause.

The fix for this is to force SQL to use the index of your choice and its so easy to do. In your query, before the WHERE clause just add WITH (INDEX(INDEX_NAME)).

So if this is your original query:

SELECT TOP 10000 Col1, Col2 FROM YourTable WHERE Col3 IS NULL

And you have an index on Col3 called IDX__YourTable__Col3 you’d execute this query instead:

SELECT TOP 10000 Col1, Col2 FROM YourTable WITH (INDEX(IDX__YourTable__Col3)) WHERE Col3 IS NULL

Normally you shouldn’t have to do this but if you’re doing one-off things like I’m doing it comes in handy.

–Update:

The forced index is running so fast that I’ve actually changed my batch size from 10,000 to 100,000. The prior 10,000 was processing about 400 records/sec, most of the time spent trying to get results from the SELECT TOP… but with the forced index and changing to 100,000 I’m now averaging 3,500 records/sec. Nice.

June 10, 2009

RegEx for parsing FTP LIST command

Filed under: Uncategorized — Tags: — chrishaas @ 8:22 am

Need to parse the FTP LIST command output? I did. Here’s what I came up with, its based on another post that didn’t quite work for me. The code below works specifically on the FilZilla FTP Server which I was targetting.


^(?<dir>[\-ld])(?<permission>([\-r][\-w][\-xs]){3})\s+(?<filecode>\d+)\s+(?<owner>\w+)\s+(?<group>\w+)\s+(?<size>\d+)\s+(?<timestamp>((?<month>\w{3})\s+(?<day>\d{2})\s+(?<hour>\d{1,2}):(?<minute>\d{2}))|((?<month>\w{3})\s+(?<day>\d{2})\s+(?<year>\d{4})))\s+(?<name>.+)$

June 9, 2009

Email Marketing Click-Through-Rate (CTR)

Filed under: Uncategorized — Tags: — chrishaas @ 3:23 pm

Apparently instead of comparing the number of unique people that clicked an email campaign to the number of email that didn’t bounce (sometime called the number delivered) people are now comparing them to the number of opens. So in the past we did:

unique clicks ÷ (sent – bounced) = CTR

Instead, now people are doing:

unique clicks ÷ unique opens = CTR

If that formula looks familiar to you its probably because its what we used to call the “Click-to-open-ratio” or CTO. I’m curious when this happened.

If you’re wondering why it matters, imagine an email that went to 100 people with no bounces. 10 people opened the email and 1 person clicked. Under the old formula we’d say that this email had a 1% CTR. Under the new formula we get a 10% CTR. If you moved from one email vendor to another and they each used a different formula you’d be seeing a tenfold increase (or decrease) without any of the base numbers actually changing.

So the moral? Make sure you run the numbers yourself so you know what the ratios mean.

Blog at WordPress.com.