Generic Thread.VolatileRead and Thread.VolatileWrite

I am thinking about using a generic version of Thread.VolatileRead and Thread.VolatileWrite. The code is similar to the existing implementation (which actually comes with plenty of overloads); the only difference is the generic type parameter:

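[MethodImpl(MethodImplOptions.NoInlining)]
public static T VolatileRead<T>(ref T address)
{
    T result = address;
    // Full fence after the read: later reads and writes can't move before it.
    Thread.MemoryBarrier();
    return result;
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static void VolatileWrite<T>(ref T address, T value)
{
    // Full fence before the write: earlier reads and writes can't move after it.
    Thread.MemoryBarrier();
    address = value;
}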

Why would I use generics if there are plenty of overloads already? Because for reference types there is only a single overload (the one that takes ref object address), so you have to cast to and from object. Here is an example:

Tubo tubo = new Tubo();
object param = tubo;
Tubo read = (Tubo)Thread.VolatileRead(ref param);

Isn’t the following code much better?

Tubo tubo = new Tubo();
Tubo read = MyThread.VolatileRead(ref tubo);

The question is whether my code is correct or not. It certainly looks correct, but one never knows for sure when dealing with threading. Feedback appreciated.
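To make the idea easier to try out, here is a minimal self-contained sketch of the generic helpers together with the call site from the example. It is not a verdict on correctness. The class name MyThread follows the call above; the "where T : class" constraint and the VolatileDemo class are my assumptions. The constraint is there because reference reads and writes are atomic, while an arbitrary struct T could be read half-written.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical helper class; the name follows the MyThread call site in the post.
public static class MyThread
{
    // Assumption: "where T : class" limits T to reference types,
    // whose reads and writes are atomic.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static T VolatileRead<T>(ref T address) where T : class
    {
        T result = address;
        Thread.MemoryBarrier(); // fence after the read
        return result;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void VolatileWrite<T>(ref T address, T value) where T : class
    {
        Thread.MemoryBarrier(); // fence before the write
        address = value;
    }
}

public class Tubo { }

// Illustrative only: writes a reference and reads it back without any casts.
public static class VolatileDemo
{
    private static Tubo shared;

    public static bool RoundTrip()
    {
        Tubo tubo = new Tubo();
        MyThread.VolatileWrite(ref shared, tubo);
        Tubo read = MyThread.VolatileRead(ref shared);
        return ReferenceEquals(tubo, read);
    }
}
```

Later framework versions ship System.Threading.Volatile, whose generic Read<T> and Write<T> carry the same reference-type constraint, which suggests the constraint is a sensible one here too.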

A managed path to DirectCompute

About DirectCompute

After NVidia Cuda and OpenCL we got DirectX’s version of GPGPU parallelism for numerical calculations, named DirectCompute. All three technologies are very similar and have one goal: to provide an API and a shader language for the numerical calculations that a GPGPU is capable of. And if you use it wisely, the performance gains over a normal CPU are huge, really huge. Not only do you offload the CPU, but thanks to the GPGPU’s massive multithreading (and multiple cores) the calculations are much faster than on even the most advanced x64 CPU.

It works like this: you prepare enough data that can be worked on in parallel, send it to the GPGPU, wait until the GPGPU finishes processing the data, and fetch the results back to the CPU where your application can use them. You can achieve up to a TFLOP with a graphics card sold today.

I guess this information is enough to see what it is all about. For more information on the topic check out Cuda and OpenCL. I’d suggest checking out the DirectCompute web site, but the reality is that I couldn’t find any official (or unofficial) page that deals with it (it is the youngest of the three, but still there could be more information online, don’t you think?). However, you might check the DirectCompute PDC 2009 presentation here and a simple tutorial.

Of the three I am most interested in DirectCompute because:

  1. Cuda is tied to NVidia GPGPU while OpenCL and DirectCompute aren’t
  2. I prefer using DirectX over OpenGL for a simple reason – I did work for a client using DirectX so I am more familiar with it

I have a silent NVidia GeForce 9600 series graphics card (cards from GeForce 8 and up are supported), and the good news is that NVidia has been supporting CUDA for a while and recently introduced support for DirectCompute as well. OpenCL is also supported. I don’t know the current state of support for ATI cards.

There is a potential drawback if you go with DirectCompute though. It is a part of DirectX 11 and will run on Windows 7, Windows 2008 and Vista SP2 (somebody correct me). The problem arises if you have an older Windows OS like Windows XP or you want to support older Windows versions. In that case you might consider the alternatives.

Goal

The not-so-good aspect of a young technology such as DirectCompute is that samples, documentation and tutorials are almost nonexistent. What’s even worse for a .net developer like me is the apparent lack of managed support. It turns out to be there to some extent, through the Windows API Code Pack (WACP) managed code wrappers. Note that WACP isn’t as .netish as Managed DirectX was, but it is good enough.

So I tried the managed approach for the BasicCompute11 C++ sample (which comes with the DirectX SDK) through WACP, and here are the steps to make it run. It should give you a good starting point if you are DirectComputing through managed code, and even if you are using native code. Hopefully somebody will benefit from my experience.

The BasicCompute11 sample is a really simple one. It declares a type holding an int and a float, creates two arrays of those with incremental values ({0, 0.0}, {1, 1.1}, {2, 2.2}, … {8191, 8191.0}) and adds them together into a third array.
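For orientation, here is a CPU-only C# sketch of what the sample computes, with no DirectX involved. The names BufType and BasicComputeCpu are illustrative, and I am assuming the float field simply equals the element index (which matches the last pair, {8191, 8191.0}). On the GPGPU, the loop body in Add is what every shader thread would do for its own index.

```csharp
using System;

// Illustrative element type: one int and one float, as in the sample.
public struct BufType
{
    public int I;
    public float F;
}

public static class BasicComputeCpu
{
    public const int NumElements = 8192;

    // Fills a buffer with incremental values.
    public static BufType[] CreateInput()
    {
        BufType[] buffer = new BufType[NumElements];
        for (int i = 0; i < NumElements; i++)
        {
            buffer[i].I = i;
            buffer[i].F = i; // assumption: float value equals the index
        }
        return buffer;
    }

    // Adds two buffers element by element into a third one.
    public static BufType[] Add(BufType[] a, BufType[] b)
    {
        BufType[] result = new BufType[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            result[i].I = a[i].I + b[i].I;
            result[i].F = a[i].F + b[i].F;
        }
        return result;
    }
}
```

A CPU version like this is also handy for checking the values read back from the GPGPU buffer.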

I’ll be working on Windows 7 x64 with NVidia GeForce 9600 graphics card but will create a Win32 executable.

Steps

1. Install the latest NVidia graphics drivers. Current version 195.62 works well enough.

2. Install Windows 7 SDK. It is required because WACP uses its include files and libraries.

3. Make sure that Windows 7 SDK version is currently selected. You’ll have to run [Program Files]\Microsoft SDKs\Windows\v7.0\Setup\SDKSetup.exe. Note that I had to run it from command prompt because it didn’t work from UI for some reason.

4. Install the DirectX SDK (August 2009).

  • If it doesn't exist yet, declare an environment variable DXSDK_DIR that points to the [Program Files (x86)]\Microsoft DirectX SDK (August 2009)\ path.

5. Install Windows API Code Pack. Or better, copy its source to a folder on your computer. Open the sources and do the following:

  • Add the $(DXSDK_DIR)Include path to the include directories
  • And add $(DXSDK_DIR)Lib\x86 to the library directories

After these settings the project should compile. However, there are additional steps, because one of the methods in there is flawed. Let’s correct it.

  • Open file Header Files\D3D11\Core\D3D11DeviceContext.h and find method declaration 
    Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, MapFlag mapFlags, MappedSubresource mappedResource);
    (there are no overloads to Map method). Convert it to
    MappedSubresource^ Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, MapFlag mapFlags);
  • Open file Source Files\D3D11\Core\D3D11DeviceContext.cpp and find the (same) method definition
    DeviceContext::Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, D3D11::MapFlag mapFlags, MappedSubresource mappedResource)
    Replace the entire method with this code:
    MappedSubresource^ DeviceContext::Map(D3DResource^ resource, UInt32 subresource,
        D3D11::Map mapType, D3D11::MapFlag mapFlags)
    {
        D3D11_MAPPED_SUBRESOURCE mappedRes;
        CommonUtils::VerifyResult(GetInterface<ID3D11DeviceContext>()->Map(
            resource->GetInterface<ID3D11Resource>(),
            subresource,
            static_cast<D3D11_MAP>(mapType),
            static_cast<UINT>(mapFlags),
            &mappedRes));
        return gcnew MappedSubresource(mappedRes);
    }

DirectX wrappers we need are now functional. Compile the project and close it.

6. Open the DirectX Sample Browser (you’ll find it in Start\All Programs\Microsoft DirectX SDK (August 2009)) and install the BasicCompute11 sample somewhere on the disk.

7. Open the sample in Visual Studio 2008 and run it. The generated console output should look like this:

[Screenshot: console output of the BasicCompute11 sample]

Pay attention to the output: if DirectX can’t find adequate hardware/driver support for DirectCompute, it will run the code on the reference device, using the CPU instead of the GPGPU (“No hardware Compute Shader capable device found, trying to create ref device.”), or it won’t run at all.

If the sample did run successfully it means that your environment supports DirectCompute. Good for you.

8. Now try running my test project (attached to this article) that is pure managed C# code. It more or less replicates the original BasicCompute11 sample which is a C++ project. The main differences with the original are two:

  • I removed the majority of device capability checking code (you’ll have to have compute shader functional or else…)
  • I compile the compute shader code (HLSL -> FX) at design time rather than at run time. This is required because WACP has no managed wrappers for runtime compilation: those are part of D3DX, which isn’t part of the OS but is redistributed separately. So you have to rely on the FXC compiler (part of the DirectX SDK). A note here: I lost quite a lot of time (again!) figuring out why FXC couldn’t find a suitable entry point in my compute shader code when compiling (and errored out). I finally remembered that I had stumbled upon this problem a long time ago: FXC compiles only ASCII(!!!!!) files. Guys, is this 2009 or 1979? As Ben 10 would say: “Oh man.”
    Anyway, the FX code is included with the project so you won’t have to compile it again. This time the output should look like this:
    [Screenshot: console output of the managed sample]
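If you hit the same ASCII-only problem with FXC, one way around it is to re-save the shader source as plain ASCII before compiling. Below is a small hedged sketch of that idea in C#; ShaderSourceFixer and its methods are hypothetical helper names, not part of the DirectX SDK or WACP. It relies on File.ReadAllText detecting a byte order mark, and any non-ASCII character in the source ends up as '?'.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the DirectX SDK or WACP.
public static class ShaderSourceFixer
{
    // Re-saves a text file as plain ASCII (no byte order mark).
    // Characters outside the ASCII range are replaced with '?'.
    public static void SaveAsAscii(string path)
    {
        string text = File.ReadAllText(path); // detects and strips a BOM if present
        File.WriteAllText(path, text, Encoding.ASCII);
    }

    // Returns true when every byte in the file is in the 7-bit ASCII range.
    public static bool IsAscii(string path)
    {
        foreach (byte b in File.ReadAllBytes(path))
        {
            if (b > 0x7F)
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling ShaderSourceFixer.SaveAsAscii on the .hlsl file once, before running FXC, should be enough; Visual Studio's "Save with Encoding" option achieves the same by hand.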

I didn’t document my rewritten BasicCompute11 project. You can search for comments in the original project.

Conclusion

Compute shaders are incredibly powerful when it comes to numerical calculations. And being able to work with them through managed code is just great. Not a lot of developers will need their power, but if and when they do, the support is there.

As a final word: I miss documentation, tutorials and samples. Hopefully they’ll come eventually, but for now there is a good book on CUDA (remember, the technologies are similar); it comes in PDF format with the NVidia CUDA SDK.

Have fun DirectComputing!

DirectComputeManaged.zip (2.05 mb)

Want to try Parallel Extensions on .net 3.5?

Check out Reactive Extensions for .NET (Rx). Looks like it includes “a back ported (and unsupported) release of Parallel Extensions for the .NET Framework 3.5 in the form of System.Threading.dll”. So, if you don’t have Visual Studio 2010 beta handy you might check it out and let us know how it goes. While you are there, make sure you check out Rx as well, as it looks like an interesting and useful library once you grasp its concepts.

See the related blog post.

Jinxing your application

If you have ever written a multithreaded application you should understand how hard it is to get it right. If you don’t understand it, then your application most probably isn’t written correctly.

You’ve written a multithreaded application, and now what? How can you test whether it is written correctly or not? Unit testing won’t be of great help because there are complex concurrency issues that might manifest as a bug only under certain circumstances. Imagine this piece of code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

class Program
{
    static int x;
    static Random rnd = new Random();

    static void Main(string[] args)
    {
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < 4; i++)
        {
            Thread t = new Thread(Runner) { IsBackground = true };
            t.Start();
            threads.Add(t);
        }
        foreach (Thread t in threads)
        {
            t.Join();
        }
    }

    static void Runner()
    {
        for (int i = 0; i < 1000000; i++)
        {
            Thread.Sleep(rnd.Next(10));
            int orig = x;
            x += 5;
            Debug.Assert(x == orig + 5);
        }
    }
}

I am increasing a shared static variable from multiple threads without any synchronization. Will it work? It might or might not (try it!). It depends on when the different threads access the variable x. One thing is for sure: this code isn’t correct. It most probably won’t work, and it certainly won’t work every time. That is the biggest problem with multithreading: if something works, it doesn’t mean that it is correctly written and that it will always work. More about this later. If you use unit testing to test the Runner method, the test will pass, because unit testing doesn’t test multithreading, at least not easily.
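To make the race concrete: x += 5 is really a read, an add and a write, and another thread can slip in between any two of those steps. For comparison, here is a minimal sketch of a synchronized variant. It is not part of the original test program; SafeCounter and Run are names I made up, and Interlocked.Add is just one of several ways (a lock would do too) to make the update atomic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative counterpart to the racy Runner method.
public static class SafeCounter
{
    private static int x;

    // Starts threadCount threads, each adding 5 to x 'iterations' times,
    // and returns the final value once all threads have finished.
    public static int Run(int threadCount, int iterations)
    {
        x = 0;
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            Thread t = new Thread(() =>
            {
                for (int j = 0; j < iterations; j++)
                {
                    // Atomic read-modify-write: no other thread can interleave.
                    Interlocked.Add(ref x, 5);
                }
            }) { IsBackground = true };
            t.Start();
            threads.Add(t);
        }
        foreach (Thread t in threads)
        {
            t.Join();
        }
        return x;
    }
}
```

With the atomic update, SafeCounter.Run(4, 100000) always ends at 4 * 100000 * 5; with the plain x += 5 some updates can get lost.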

So, how does one test such code and scenarios? One way is to use a static analysis tool. The other way is to put a jinx on your application. Once your application is jinxed it will be much more prone to displaying concurrency and other multithreading errors. And that’s exactly what Jinx does. Behind the scenes it makes your application fail more often than it would in normal circumstances. I mean that it exposes faults in the application (if any) that would otherwise remain hidden and occur only randomly here and there (you know, your user will find it after 2 minutes of running the application). It doesn’t fail your application for no reason; it just emphasizes your bugs.

The most interesting aspect of Jinx is the way it works. Jinx is a sort of hypervisor. You certainly know the Hyper-V hypervisor, whose task is to run guest operating systems. Jinx’s task is to make a clone of your OS, with the debugged application inside, and run multiple versions of it under various conditions. This is done so that any multithreading error is more likely to appear. Just by running more versions of the same application the error is more likely to manifest itself. But Jinx throws all sorts of other jinxes at your application as well. That’s the shallow explanation. You’ll find more on the official overview page and the FAQ page.

So, let’s jinx the code above. Note that you don’t have to modify existing code if you don’t want to: Jinx can be set to do its task on any newly run application. However, some fine control is a better way to go. Here are the simple steps for selective jinxing:

  1. Reference jinxinterface assembly. It contains a single static JinxInteface class with a bunch of static methods and serves as a communication bridge between application and Jinx.
  2. Add jinx.cs file to your project. Again it contains a single static Jinx class that is a wrapper around JinxInteface class mentioned above. Its purpose is mostly to apply Conditional("DEBUG") over methods so they won’t get executed for non-debug version of the application.
  3. Call Jinx.RegisterApplication(); method at start of the application. This way you’ll let Jinx know that your application should use some jinxing.
  4. Replace Debug.Assert with Jinx.Assert. The only difference between the two asserts is that the latter sends statistical information to Jinx.

After the changes the code should look like this:

using System;
using System.Collections.Generic;
using System.Threading;

class Program
{
    static int x;
    static Random rnd = new Random();

    static void Main(string[] args)
    {
        Jinx.RegisterApplication();
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < 4; i++)
        {
            Thread t = new Thread(Runner) { IsBackground = true };
            t.Start();
            threads.Add(t);
        }
        foreach (Thread t in threads)
        {
            t.Join();
        }
    }

    static void Runner()
    {
        for (int i = 0; i < 1000000; i++)
        {
            Thread.Sleep(rnd.Next(10));
            int orig = x;
            x += 5;
            Jinx.Assert(x == orig + 5);
        }
    }
}

Before any analysis takes place, Jinx has to be enabled and configured; this is a system-wide option. There are two ways to open the Jinx console: either through the Tools/Jinx menu item in Visual Studio or directly from All Programs in the Start menu. Either way you need Administrator privileges. Enabling is easy: just check the Enable Jinx checkbox and that’s it. As for which programs are analyzed, I’ll use the “Analyze the most recent program I have registered.” option.

You can adjust some strategy settings on the Strategy tab; I’ll skip this as it is an advanced option. And you can see the statistics on the Status tab. Let’s run the application now. Jinx will kick in, the CPU will come under heavy load and the system might stutter because Jinx runs versions of the application in parallel. But the assert failure pops up almost immediately; the bug was caught for sure. If you run the application without Jinx, the error would manifest much later, if at all, and note that the test code in question is an extreme example. Here is the Status tab page after a couple of errors were caught:

[Screenshot: Jinx Status tab]

The asserts observed counter is increased thanks to the Jinx.Assert method call. The Jinx analyzer isn’t limited to asserts and, according to the PetraVM guys, it might catch other types of errors as well. Jinx is much more than this simple example.

So far I’ve run a few examples like this and Jinx performed well; I think Jinx is a good weapon against multithreading bugs. However, there might be a problem for testing. Since Jinx is a hypervisor in its own right, it won’t get along with other hypervisors such as Hyper-V. In other words, forget about running Jinx on a guest OS. A dedicated machine is required. Perhaps this will change in the future.

Also be careful when experimenting with Jinx as it is in beta right now (you can apply for testing over here). Running a beta hypervisor might result in a BSOD and all its consequences, such as a non-bootable Windows afterwards, which is what happened while I was writing this post. Perhaps this post is jinxed as well :-). Humor aside, I am sure the guys behind Jinx will make it rock solid for the RTM. They obviously know the hypervisor craft very well.

Happy jinxing your applications!

.net reflector pro is awesome

.net reflector

I am sure we all know and love .net reflector, originally developed by Lutz Roeder and taken over by the fine folks at Red Gate. If you don’t know what .net reflector is or what it does, you must take a look. It is an indispensable tool for understanding how a certain assembly (e.g. from the .net framework or a 3rd party) works: .net reflector does that by disassembling assemblies into C#/VB/IL/whatever code you want (the “whatever” is achieved through the right plugin). And based on all this information it can provide you with a ton of useful data, e.g. who uses which type, what types are derived from a type, etc.

.net reflector pro

And now there is a PRO version with a kick-ass feature: it lets you step through source code even for referenced assemblies without sources while debugging an application under Visual Studio 2005/2008/2010. The functionality is similar to using symbols for the same purpose. But it is a lot better because you don’t depend on the assembly vendor; with symbols, the vendor has to provide the symbol files for you to debug with. .net reflector pro does the trick for any assembly, regardless of its origin. True, it is useless with obfuscated assemblies and it doesn’t provide comments, but hey, AFAIK right now only Microsoft provides some, not all, symbol files. With symbols you are out of luck for the other Microsoft assemblies and for 3rd party ones.

Let me give you an example. I’ve been bugging the Developer Express guys for a while to provide symbol files and they probably will, but who knows when, because this task is a low priority one for them. Now, with .net reflector pro, I don’t need those files anymore, nor do I need any other 3rd party vendor’s symbol files.

The question is why would I want this feature at all?

Only a beginner should ask this question. Everybody who has done some serious development knows how important it is to understand how a certain feature in a certain assembly works, especially when you are presented with an exception dialog and have to understand what happened and what went wrong. Normally, if you dig into the call stack to show some code below your methods

[Screenshot: call stack]

(double-click on a method where you don’t have source code; almost the whole call stack is gray, meaning there is no source code available) you are presented with this informative dialog:

[Screenshot: no source code dialog]

Go ahead, if you are an assembler guy, click Show Disassembly. For mere mortals disassembler code is useless.

How to use .net reflector pro

Download the early access program version from the .net reflector pro forum, unzip it somewhere and run reflector.exe. By running the executable at least once you register the Visual Studio add-in that integrates .net reflector pro into Visual Studio. Without the add-in registered nothing will happen in Visual Studio.

Run Visual Studio. Take note that there is a new root menu entry - .NET Reflector. Next create a test application – mine will be a WinForms one hosting a single SimpleButton from DevExpress. I’ll implement its Click event and put a breakpoint there. No additional code.

Run the application. If you have the Just My Code debugging setting turned on, you’ll be prompted by a .net reflector dialog like this:

[Screenshot: turn off Just My Code dialog]

You have to turn off the Just My Code feature. This is the same restriction as with symbol files. Then you’ll be presented with another dialog where you can select which assemblies you are interested in:

[Screenshot: assembly selection dialog]

Select all assemblies. Once you hit the OK button .net reflector will start decompiling the assemblies using all available cores on the computer (note that there are many entries in the “In progress” state running in parallel).

[Screenshot: decompiling in progress]

This is the first application I use that actually uses all of my Core i7’s 4+4 cores. Good job, and very smart, because decompiling so many assemblies takes around 10 minutes on my Core i7 920 computer even in parallel. But don’t fear: the results are cached and on the next run there is almost no performance hit at the start. That’s it for configuration, and the single-window application will finally run.

The application will present a default window with a single button. Click on the button and execution will hit the breakpoint you’ve put in the button’s Click handler. Look at the call stack again; it isn’t gray anymore:

[Screenshot: new call stack]

And double-clicking on the BaseButton.OnClick call stack entry opens the decompiled source code from the DevExpress assembly, with everything except the Edit & Continue feature.

[Screenshot: decompiled source code in the debugger]

Let me repeat. The code above comes from an assembly decompilation and it is almost as useful as normal source code in the debugger. You can inspect the variables and step back and forth but, of course, you can’t change the code. Or rather, you can change it but it won’t be recompiled.

No more black boxes! Now, if this isn’t an awesome feature, a must-have one, I don’t really know what such a feature is.

There is a slight drawback though: the pro version won’t be free, which isn’t a big surprise and not a big issue. The regular version will continue to be free but it won’t have the *feature*. Also, being in beta, it currently has some bugs here and there, which should disappear by RTM I guess.

More (official) information is in Alex Davies’ blog post.

Parallel computing in Visual Studio 2010/.net 4.0 slides

Just finished the presentation about Parallel computing in Visual Studio 2010/.net 4.0 at TŠC Nova Gorica (a part of Microsoft’s event for students). The audience was cool and almost everything went well, except for a VS2010 CTP crash at the very beginning. Not a big problem; one has to expect such problems when running CTP versions. A Visual Studio restart fixed it.

All in all a good day for my presentation. Go get the slides here (slides are in Slovene language).