.NET's ThreadPool Class


My next installment of the Application Automation Layer requires a thread pool to manage worker threads. On investigating .NET's ThreadPool class, I discovered it is not quite what I had in mind. To quote from Manisha Mehta's article "Multithreading Part 4: The ThreadPool, Timer classes and Asynchronous Programming Discussed":
As you start creating your own multi-threaded applications, you would realize that for a large part of your time, your threads are sitting idle waiting for something to happen...
This is true only for a certain class of threads, for example threads that sit idle until an I/O completion event occurs and releases them. In many cases I require the ability to create threads that perform some work in the background while still allowing the user to interact with the application. In particular, I'd like the ability to explore what I'll call "competitive threads", that is, adjusting thread priorities based on different factors. Think of it as "quality of service" for threads.
The article also states:
But remember that at any particular point of time there is only one thread pool per process and there is only one working thread per thread pool object...[A thread pool] has a default limit of 25 threads per available processor...
These statements seem contradictory, implying that only one thread can be executing per process but that there are 25 threads available, which is very confusing. After investigating the code behind the ThreadPool, I have found that this statement is "sort of" true, but the investigation also led to some other discoveries.
The rest of this article discusses my observations. For purposes of understanding some of the numbers discussed, keep in mind that I am running these tests on a 1.6GHz P4 single-processor system. Also note that for all of these tests I am using a timer event set up to time out after one second. The timer event sets a flag which the thread monitors; when the thread sees the flag is set, it terminates itself. The code for the timer event is:

static void OnTimerEvent(object src, ElapsedEventArgs e)
{
  done=true;
}
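
For reference, here is a sketch of the shared fields and timer setup these tests assume. The field names match the snippets in this article, but the exact declarations (including marking done as volatile and the one-shot AutoReset setting) are my reconstruction of the test harness, not the original source; only System.Timers.Timer itself is the actual .NET API.

// Fields of the test class (my reconstruction, not the original source).
// Requires "using System.Threading;" and "using System.Timers;".
static volatile bool done;                    // set by OnTimerEvent after one second
static decimal count2;                        // used by the create/destroy test
static int threadDone;                        // number of worker threads that have exited
static decimal[] threadPoolCounters = new decimal[10];
static Thread[] threads = new Thread[10];
static System.Timers.Timer timer = new System.Timers.Timer(1000);  // one-second interval

static void SetupTimer()
{
  timer.AutoReset = false;                    // fire once per test run
  timer.Elapsed += new ElapsedEventHandler(OnTimerEvent);
}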

How High Can I Count?

How high can I count in 1 second? Roughly 10,277,633. Using a System.Timers.Timer object set to trigger after one second, the code happily counts until the timer expires.

static decimal SingleThreadTest()
{
  done=false;
  decimal counter=0;
  timer.Start();
  while (!done)
  {
    ++counter;
  }
  return counter;
}

Why Use Thread Pooling?

Here's the reason to use thread pooling. This time, I'm going to see how high I can count when a thread is created and destroyed for every increment of the counter. Here's the code:

static void CreateAndDestroyTest()
{
  done=false;
  timer.Start();
  while (!done)
  {
    Thread counterThread=new Thread(new ThreadStart(Count1Thread));
    counterThread.Start();
    while (counterThread.IsAlive) {};
  }
}

static void Count1Thread()
{
  ++count2;
}

And the answer is: 11. Yes, ELEVEN. In one second, my machine was able to create and destroy eleven threads. Obviously, if I have an application that needs to process lots of asynchronous, non-I/O-completion types of events, thread creation and destruction is a very expensive way to go. Hence the need for a thread pool.

First, A Benchmark

Before testing the performance of a thread pool, a benchmark is useful. I created one by simply instantiating 10 counting threads. Each thread increments its own counter and, at the end of one second, exits. Here's the code:

// initialize counters
static void InitThreadPoolCounters()
{
  threadDone=0;
  for (int i=0; i<10; i++)
  {
    threadPoolCounters[i]=0;
  }
}

// initialize threads
static void InitThreads()
{
  for (int i=0; i<10; i++)
  {
    threads[i]=new Thread(new ThreadStart(Count2Thread));
    threads[i].Name=i.ToString();
  }
}

// start the threads
static void StartThreads()
{
  done=false;
  timer.Start();
  for (int i=0; i<10; i++)
  {
    threads[i].Start();
  }
}

// the thread itself
static void Count2Thread()
{
  int n=Convert.ToInt32(Thread.CurrentThread.Name);
  while (!done)
  { 
    ++threadPoolCounters[n];
  }
  Interlocked.Increment(ref threadDone);
}

...and the code that actually puts it all together: 

...
InitThreadPoolCounters();
InitThreads();
StartThreads();
while (threadDone != 10) {};
...

The resulting count for each thread is: 
T0 = 957393
T1 = 1003875
T2 = 934912
T3 = 1004638
T4 = 988772
T5 = 962442
T6 = 979893
T7 = 777888
T8 = 923105
T9 = 982427
Total = 9515345 

Within a reasonable margin of error, the 10 separate threads total to the same value as determined earlier by the single application thread counter. This therefore is our benchmark for the performance of a thread pool.

Using The ThreadPool

Now, let's see what happens when I use .NET's ThreadPool object:

static void QueueThreadPoolThreads()
{
  done=false;
  timer.Start();
  for (int i=0; i<10; i++)
  {
    ThreadPool.QueueUserWorkItem(new WaitCallback(Count3Thread), i);
  }
}

static void Count3Thread(object state)
{
  int n=(int)state;
  while (!done)
  {
    ++threadPoolCounters[n];
  }
  Interlocked.Increment(ref threadDone);
}

The test, which is supposed to run for only 1 second, takes something like 30 seconds to complete! And when it does complete, the counts are ridiculously high, indicating that the timer event did not fire when it should have. To understand this, we have to dive into the sscli\clr\src\vm\win32threadpool.cpp code. Let's look first at ThreadPool.QueueUserWorkItem().

This function puts the worker thread delegate into a queue and tests whether a new thread should be created. If a new thread should be created, it calls the CreateWorkerThread function and exits. Conversely, if a thread is not to be created at this point, a different thread, the "CreateThreadGate" thread, is created if it doesn't already exist. The purpose of this gate thread is to periodically check whether the worker thread can be created at a later time.

The ShouldGrowWorkerThread function performs a test of 3 parameters to determine whether or not a new thread should be created.
Note that the very first thing this function tests is whether the number of running threads is less than the number of available CPUs. Obviously, this function will return false when there are one or more running threads on a single-CPU system. When this is the case (as per the flowchart), the thread gate is utilized to create the thread at a later time. I'll get to that shortly.
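
Paraphrased in C#, the decision described above looks roughly like this. This is only a sketch of the logic in the SSCLI's unmanaged code: the names (workQueue, workRequestNotification, numRunningWorkerThreads, and so on) are my own approximations of the C++ originals, and the synchronization has been simplified.

// Simplified C# paraphrase of the enqueue path in win32threadpool.cpp
// (my approximation, not the real code).
using System;
using System.Collections;
using System.Threading;

class ThreadPoolSketch
{
  static Queue workQueue = Queue.Synchronized(new Queue());
  static AutoResetEvent workRequestNotification = new AutoResetEvent(false);
  static int numRunningWorkerThreads = 0;
  static int numberOfProcessors = Environment.ProcessorCount;

  static void QueueUserWorkItemSketch(WaitCallback callback, object state)
  {
    workQueue.Enqueue(new object[] { callback, state });
    workRequestNotification.Set();       // wake an idle worker, if one exists

    if (ShouldGrowWorkerThreadPool())
      CreateWorkerThread();              // grow the pool immediately
    else
      CreateThreadGate();                // otherwise let the gate thread retry later
  }

  static bool ShouldGrowWorkerThreadPool()
  {
    // The very first check: never grow past one running worker per CPU.
    if (numRunningWorkerThreads >= numberOfProcessors)
      return false;

    // ...two further checks (queue depth and thread limits) follow in the real code...
    return true;
  }

  static void CreateWorkerThread() { /* starts a thread running the worker loop below */ }
  static void CreateThreadGate()   { /* starts the gate thread, if not already running */ }
}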
The CreateWorkerThread function is pretty much just a stub that instantiates the actual worker thread.
The worker thread sits in a wait loop awaiting a WorkRequestNotification event, timing out after 40 seconds if the event isn't signalled. Assuming the event is signalled, execution continues by removing the delegate from the queue (which places the event in the unsignalled state), testing whether a valid delegate was actually obtained, and then invoking the delegate. When the delegate returns, the worker thread immediately checks to see if there are additional requests queued. If there are, it processes those requests, and if not, it returns to the wait state.
From this code alone, a paramount fact about ThreadPool-managed threads is revealed: Do Your Work Quickly. Any time spent in your worker thread delays the processing of other items in the queue.
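
Continuing the sketch above, the worker thread loop just described behaves roughly like this (again a paraphrase with my own names, and with the queue synchronization simplified):

// Rough paraphrase of the worker thread loop: wait up to 40 seconds for work,
// then drain the queue, invoking each delegate in turn.
static void WorkerThreadStartSketch()
{
  while (true)
  {
    // Block until work arrives, giving up after roughly 40 seconds of idleness.
    if (!workRequestNotification.WaitOne(40000, false))
      break;                               // idle too long: let this thread die

    while (workQueue.Count > 0)
    {
      object[] request = (object[])workQueue.Dequeue();
      WaitCallback callback = (WaitCallback)request[0];
      callback(request[1]);                // invoke the user's delegate
    }
    // The queue is empty again; go back to waiting on the (now unsignalled) event.
  }
}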
Now let's inspect the gate thread by looking at the GateThreadStart flowchart.
The first thing of note is that this function goes to sleep for half a second. After this delay, it performs a test to determine whether there are any thread requests in the queue. If not, it goes back to sleep. If so, it calls into a function to determine whether a thread should be initiated. This function delays creation of a thread based on the time the last thread was created and the number of currently running threads. Inspecting the delay table in the source, it is interesting to note that it can take 5.4 seconds for the 25th thread to be instantiated. Furthermore, because of the Sleep(500) call, these times end up being quantized to 500ms intervals when threads are created in rapid succession. For example, if two threads are requested in quick succession, the second thread will take 1000ms to appear: at the first 500ms wake-up the requisite 550ms will not yet have transpired, so the function returns to the sleep state.
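
In the same spirit, here is a rough C# rendering of the GateThreadStart loop described above. DelayForThreadCount stands in for the delay table in the SSCLI source and is not a real function; the 500ms sleep and the "time since the last thread was created" test follow the description above.

// Rough paraphrase of the gate thread loop; DelayForThreadCount is a stand-in
// for the SSCLI delay table, not a real API.
static DateTime lastWorkerThreadCreated = DateTime.MinValue;

static void GateThreadStartSketch()
{
  while (true)
  {
    Thread.Sleep(500);                     // wake up twice a second
    if (workQueue.Count == 0)
      continue;                            // nothing is waiting for a thread

    // Only create a worker if enough time has passed since the last one was
    // created; the required delay grows with the number of running threads.
    TimeSpan sinceLast = DateTime.Now - lastWorkerThreadCreated;
    if (sinceLast >= DelayForThreadCount(numRunningWorkerThreads))
    {
      CreateWorkerThread();
      lastWorkerThreadCreated = DateTime.Now;
    }
  }
}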

This section of the ThreadPool emphasizes the need to get in and out of your worker thread as quickly as possible so as to avoid the bottleneck that will occur if there are several concurrent threads running.

Timers And Waitable Objects

Timer callbacks are dispatched using the ThreadPool. Therefore, if you want fairly reliable timers, your timer handlers and your worker threads need to be short. Similarly, any waitable object, such as an I/O completion event, that is managed by the ThreadPool is also affected by the implementation of the other threads in your application.
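
As a quick illustration of my own (not one of the article's tests), the following program queues CPU-bound work items, much like the Count3Thread test, and then asks a System.Timers.Timer for a one-second callback. Because the Elapsed handler is itself dispatched on a ThreadPool thread, it arrives late; on the CLR discussed here the delay can be substantial, while later runtimes grow the pool more aggressively and show a smaller effect.

// My own demonstration of a timer callback being starved by busy pool threads.
using System;
using System.Threading;

class TimerStarvationDemo
{
  static volatile bool done = false;

  static void Main()
  {
    DateTime start = DateTime.Now;

    System.Timers.Timer timer = new System.Timers.Timer(1000);   // ask for 1 second
    timer.AutoReset = false;
    timer.Elapsed += delegate
    {
      Console.WriteLine("Elapsed fired after {0}", DateTime.Now - start);
      done = true;
    };
    timer.Start();

    // Tie up the pool with CPU-bound work items, as in the Count3Thread test.
    for (int i = 0; i < 10; i++)
      ThreadPool.QueueUserWorkItem(delegate { while (!done) { } });

    while (!done)
      Thread.Sleep(100);
  }
}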

What Are The Alternatives?

As I mentioned at the beginning of this article, not all threads are created for the same purpose. While .NET's ThreadPool class is useful for managing threads that are usually in a wait state and that take only a short amount of time to do their work, it is a very poor choice for managing threads that do not meet these criteria.
Fortunately, Stephen Toub at Microsoft has written a ManagedThreadPool class that is designed for this second type of thread requirement. Using it is identical to using .NET's ThreadPool:
static void QueueManagedThreadPoolThreads()
{
  done=false;
  timer.Start();
  for (int i=0; i<10; i++)
  {
    Toub.Threading.ManagedThreadPool.QueueUserWorkItem(
                      new WaitCallback(Count3Thread), i);
  }
}

And as the following test numbers illustrate: 
T0 = 970806
T1 = 996123
T2 = 914349
T3 = 990998
T4 = 977450
T5 = 957585
T6 = 951259
T7 = 770934
T8 = 982279
T9 = 1135806
Total = 9647589 

it performs very well. The advantage of Toub's ManagedThreadPool class is that all of its threads are created up front and work items are assigned to them as needed. There are no complex holdoffs of thread creation, making this thread pool suitable for threads of both types.