Demystifying The Code

Introducing Parallelism into your programs

Overview

A little over a month ago, I volunteered to develop a session for the Windows 7 / Windows Server 2008 R2 launches that are being delivered across the country and beyond.  The session I worked on I affectionately call “An Introduction to Parallel Programming”.  In that session I wanted to illustrate various approaches you can take when introducing parallelism to your applications: 1) fine-grained parallelism, 2) structured parallelism and 3) PLINQ.

The approach I chose was to start with a sequential application and parallelize it using each of the 3 approaches.  The application I wrote simply creates a thumbnail image for every image in a specific directory.  This post provides a link to a screencast I did on this subject, gives a brief overview of each approach and illustrates how my sample program looks with each approach.

 

Screencast

Click here to view the screencast on Channel9.

 

Approaches for introducing parallelism into your applications

Fine-grained parallelism (aka imperative task parallelism)

This approach provides you with the most control.  Accordingly it requires the most work.  The following are some attributes of this approach:

  • Express potential parallelism via expressions and statements that take the form of lightweight tasks
  • Affords fine-grained control
  • The semantics are similar to how threads and the thread pool work today
  • You tend to use it when the other approaches do not suit your problem
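As a minimal sketch of the fine-grained model (using the Task type from System.Threading.Tasks in .NET 4; the computed value here is just a placeholder), a single lightweight task can be started and waited on like this:

```csharp
using System;
using System.Threading.Tasks;

class TaskSketch
{
    static void Main()
    {
        // StartNew queues the work to the thread pool and returns a Task<int>.
        Task<int> t = Task.Factory.StartNew(() => 21 * 2);

        // Accessing Result blocks the calling thread until the task completes.
        Console.WriteLine(t.Result);  // prints 42
    }
}
```
The full sample later in this post builds on exactly this pattern, adding state objects and Task.WaitAll.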

Structured parallelism (imperative data parallelism)

Structured parallelism offers a higher level of abstraction.  With this approach you tend to think in terms of parallelizing ‘blocks of code’.  Daniel Moth explained this well in his PDC 08 talk: one thread goes into the block, many threads may execute the code within the block, and one thread comes out.  To this end, we provide a ‘Parallel’ type that allows you to easily parallelize for loops, foreach loops and arrays of actions.  The following are some attributes of this approach:

  • Mechanisms used to express common imperative data-oriented operations
    • For loops
    • Foreach loops
    • Invoke
  • Think in terms of blocks of code
  • Parallelize loops
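To make the list above concrete, here is a short sketch of the For and Invoke mechanisms on the Parallel type (the work items are placeholders; Parallel.ForEach is shown later in the actual sample):

```csharp
using System;
using System.Threading.Tasks;

class StructuredSketch
{
    static void Main()
    {
        // Parallel.For: the iterations may be spread across cores.
        long[] squares = new long[10];
        Parallel.For(0, squares.Length, i => squares[i] = (long)i * i);

        // Parallel.Invoke: run an array of independent actions in parallel
        // and return only when all of them have finished.
        Parallel.Invoke(
            () => Console.WriteLine("action 1"),
            () => Console.WriteLine("action 2"),
            () => Console.WriteLine("action 3"));

        Console.WriteLine(squares[9]);  // prints 81
    }
}
```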

PLINQ (Declarative Data Parallelism)

As a declarative approach, you simply define what you want done, not how you want it done.  You do this with PLINQ, a parallel implementation of LINQ to Objects.  PLINQ replaces the LINQ to Objects extension methods with new ones that internally use parallelism.  In many cases, it is as simple as adding a call to the AsParallel extension method to your query.  The following are some attributes of this approach:

  • Implementation of LINQ-to-Objects that executes queries in parallel
  • Express what you want to accomplish, rather than how you want to accomplish it
  • Minimal impact to existing queries

    Illustrating the 3 Approaches

    In order to illustrate these 3 approaches, I decided to take a simple console application that was implemented sequentially and convert it to use each approach.  The sample application simply creates a thumbnail image for each image in a specified directory.  The process I will take is to show you the code for the sequential implementation, as well as for each approach to parallelism.  I will then explain the code and lastly show you the timings for each approach.  I am running this on a dual-core laptop using Beta 1 of Visual Studio 2010 and .NET 4.  I will run the code 3 times for each approach.

    * Please note that the results in this post should not be used to evaluate which of the 3 parallel approaches is generally more performant than the others.  That will be scenario driven.  The numbers should simply be used to illustrate the relative performance gains the parallel approaches provide over the sequential approach.

     

      The Sequential Version of the application

      The Code

      The following is the serial version of the application (imageDir, imageExt and TID are defined elsewhere in the sample: the image directory, a search pattern such as “*.jpg”, and a string containing the id of the executing thread):

        static void Main(string[] args)
        {
            Stopwatch sw = Stopwatch.StartNew();
    
            CreateThumbnails();
    
            Console.WriteLine("Elapsed = " + sw.ElapsedMilliseconds.ToString());
    
            Console.ReadLine();
        }
    
        private static void CreateThumbnails()
        {
            string[] pics = Directory.GetFiles(imageDir, imageExt);
    
            foreach (string pic in pics)
            {
                CreateThumbnail(pic);
    
                Console.WriteLine(pic + TID);
            }
    
            Console.WriteLine("Done");
        }

    Code Summary

    As you can see, in CreateThumbnails, we create a string array of image paths.  We then iterate over that array, calling CreateThumbnail for each item, as well as writing the path and the id of the executing thread to the console.  Lastly, we write ‘Done’ to the console when complete.  The implementation of CreateThumbnail is relatively unimportant to the theme of this post.  It uses the Bitmap and Graphics types from the GDI+ libraries to create a thumbnail image (max height/width of 128 px).  If you are interested, you can download the sample code here.
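    For reference, a CreateThumbnail along the lines described above might look like the following.  This is a sketch only, assuming System.Drawing for the GDI+ types; the sample's actual implementation, scaling logic and output path may differ:

```csharp
using System;
using System.Drawing;   // GDI+ types (System.Drawing.dll)
using System.IO;

static class ThumbnailSketch
{
    public static void CreateThumbnail(string path)
    {
        using (var source = new Bitmap(path))
        {
            // Scale so the longer side is at most 128 px.
            double scale = Math.Min(1.0, 128.0 / Math.Max(source.Width, source.Height));
            int w = (int)(source.Width * scale);
            int h = (int)(source.Height * scale);

            using (var thumb = new Bitmap(w, h))
            using (var g = Graphics.FromImage(thumb))
            {
                g.DrawImage(source, 0, 0, w, h);
                thumb.Save(Path.ChangeExtension(path, ".thumb.jpg"));
            }
        }
    }
}
```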

    Performance

    Time (in milliseconds)
    2274
    2289
    2286

     

    Fine Grained Parallelism

    The Code

       1:      private static void CreateThumbnails()
       2:      {
       3:          string[] pics = Directory.GetFiles(imageDir, imageExt);
       4:   
       5:          var tasks = new Task[pics.Length];
       6:          for (int i = 0; i < pics.Length; i++)
       7:          {
       8:              tasks[i] = Task.Factory.StartNew(state =>
       9:              {
      10:                  CreateThumbnail((string)state);
      11:                  Console.WriteLine(state + TID);
      12:              }, pics[i]);
      13:          }
      14:          Task.WaitAll(tasks);
      15:   
      16:   
      17:          Console.WriteLine("Done");
      18:      }

    Code Summary

    The first thing you might notice is that this code is a bit more involved.  If you watch the screencast I did on Channel9 on introducing parallelism into your applications, I describe in more detail how we get to this code from the sequential implementation.

    Again, the first thing we do is create a string array of image paths.  The next thing we do (line 5) is create an array of Tasks (Task is the type we use to implement fine-grained parallelism).  We will need this array later.  Next we loop once for each path in the array of image paths.  Now comes the interesting part.  On line 8, we call Task.Factory.StartNew, assigning the result to the appropriate item in the Task array.  StartNew is a helper method that is analogous to creating a Task and then calling Start.  In code terms, it is similar to doing the following, yet a bit more efficient:

                tasks[i] = new Task(state =>
                {
                    CreateThumbnail((string)state);
                    Console.WriteLine(state + TID);
                }, pics[i]);
                tasks[i].Start();

    You will notice that we are using an overload of StartNew where we explicitly pass the state into the Task via the state object (pics[i] is passed in line 12).  This ensures that the Task is, in fact, working on the image path we intended it to.  If you are new to this, that may sound confusing.  Let’s rewrite the code for a moment to illustrate what may happen if we didn’t use the state object (or a proper closure):

            string[] pics = Directory.GetFiles(imageDir, imageExt);
    
            foreach (string pic in pics)
            {
                Task t = Task.Factory.StartNew(() =>
                    CreateThumbnail(pic));
    
                Console.WriteLine(pic + TID);
            }
    
    
    

    At first glance, this code doesn’t appear to have any issues.  However, to quote Inigo Montoya from The Princess Bride: “I do not think it means what you think it means.”  In this example, the pic variable is not captured correctly in the closure.  All of the tasks capture the same loop variable, so by the time a Task wakes up and begins calling CreateThumbnail on the pic variable, pic may no longer hold the value it had when the Task was created.
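    If you don’t want to pass the value via the state object, the usual fix is to copy the loop variable into a local inside the loop, so that each task captures its own variable.  A sketch, reusing the sample's pics, CreateThumbnail and TID:

```csharp
foreach (string pic in pics)
{
    string current = pic;  // copy: each iteration's task captures its own variable

    Task t = Task.Factory.StartNew(() =>
    {
        CreateThumbnail(current);
        Console.WriteLine(current + TID);
    });
}
```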

    Getting back to the proper implementation, you will notice that in lines 10 and 11 both CreateThumbnail and the Console.WriteLine call are operating on the state object.  The last thing to point out is the call to Task.WaitAll in line 14.  As you can see, we pass the Task array we created in line 5 and assigned to in line 8.  This call will ensure that all of the Tasks will complete prior to the call to write ‘Done’ to the console.

    Performance

    Time (in milliseconds)
    1486
    1485
    1510

     

    As you can clearly see, this is a dramatic performance increase.  It comes from distributing the work across the 2 cores on my machine, instead of just one as in the sequential implementation.  However, the code was a bit involved.  Let’s take a look at a couple of other options that provide a higher level of abstraction.

     

    Structured Parallelism

    The Code

       1:      private static void CreateThumbnails()
       2:      {
       3:          string[] pics = Directory.GetFiles(imageDir, imageExt);
       4:   
       5:          //foreach (string pic in pics)
       6:          Parallel.ForEach(pics, pic =>
       7:          {
       8:              CreateThumbnail(pic);
       9:   
      10:              Console.WriteLine(pic + TID);
      11:          });
      12:   
      13:          Console.WriteLine("Done");
      14:      }

    Code Summary

    The first thing you will notice is that this code is much simpler.  In fact, it is semantically similar to the sequential implementation.  Line 5 contains the original foreach construct that I commented out.  I replaced it with a call to Parallel.ForEach in line 6.  Notice how similar they are.

    The ForEach method takes the IEnumerable you want to iterate over as its first argument and an Action delegate as the second.  The delegate gets run once for each iteration of the loop.  In our case, we pass a lambda that calls CreateThumbnail and Console.WriteLine.
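    If you need to tune the loop, Parallel.ForEach also has overloads that accept a ParallelOptions instance; for example, you can cap the degree of parallelism.  A sketch, not part of the sample:

```csharp
// Cap the loop at two concurrent iterations.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(pics, options, pic =>
{
    CreateThumbnail(pic);

    Console.WriteLine(pic + TID);
});
```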

    Performance

    Time (in milliseconds)
    1596
    1534
    1497

     

    Here we see similar performance increases over the sequential solution.

     

    PLINQ

    LINQ Code

        private static void CreateThumbnails()
        {
            string[] pics = Directory.GetFiles(imageDir, imageExt);
    
            var output = pics.Select(pic =>
            {
                CreateThumbnail(pic);
                return pic + TID;
            });
    
            foreach (var line in output) Console.WriteLine(line);
    
            Console.WriteLine("Done");
        }

     

    PLINQ Code

        private static void CreateThumbnails()
        {
            string[] pics = Directory.GetFiles(imageDir, imageExt);
    
            var output = pics.AsParallel().Select(pic =>
            {
                CreateThumbnail(pic);
                return pic + TID;
            });
    
            foreach (var line in output) Console.WriteLine(line);
    
            Console.WriteLine("Done");
        }
     

    Code Summary

    Think back to when you were a kid.  Remember the game where you were shown two pictures and had to spot the differences between them?  In that spirit, the only difference between the LINQ and PLINQ code is the call to AsParallel.  By simply adding a call to this extension method, you have introduced parallelism into your solution (see the description above).
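    One caveat worth knowing: PLINQ does not guarantee that results come out in source order.  If the order of the output lines matters, you can add a call to AsOrdered, at some cost in performance.  A sketch against the sample's query:

```csharp
var output = pics.AsParallel().AsOrdered().Select(pic =>
{
    CreateThumbnail(pic);
    return pic + TID;
});
```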

    Performance

    LINQ (ms)   PLINQ (ms)
    2181        1683
    2180        1533
    2240        1550

    Sample Code

    You can download the sample application for introducing parallelism into your applications here (you will have to register and login first).

     

    Acknowledgements

    I would like to thank Stephen Toub, Rick Brewster, Huseyin Yildiz, Chris Dern and Aaran Layman for their help answering my questions and offering guidance.  Further, it should be noted that this sample was inspired by and based upon Daniel Moth’s PDC 2008 talk.

     

    Summary

    As you can see, there are various approaches to introducing parallelism into your applications, each with their own pros and cons.  Some provide greater control (Fine-Grained), while others provide a simpler model (Structured, PLINQ).  Hopefully one of them will meet your needs and allow your application to take advantage of the hardware it is running on.
