A managed path to DirectCompute

About DirectCompute

After NVIDIA CUDA and OpenCL we now have DirectX’s version of GPGPU parallelism for numerical calculations, named DirectCompute. All three technologies are very similar and share one goal: to provide an API and a shader language for the numerical calculations a GPU is capable of. If you use it wisely, the performance gains over a normal CPU can be huge, really huge. Not only do you offload the CPU, but thanks to the GPU’s massive multithreading (and many cores) well-parallelized calculations can run much faster than on even the most advanced x64 CPU.

It works like this: you prepare enough data that can be worked on in parallel, send it to the GPU, wait until the GPU has finished processing it, and fetch the results back to the CPU where your application can use them. Graphics cards sold today can reach up to a TFLOP of peak throughput.

I guess this is enough to see what it is all about. For more information on the topic check out CUDA and OpenCL. I’d suggest checking out the DirectCompute web site too, but the reality is that I couldn’t find any official (or unofficial) page that deals with it. It is the youngest of the three, but still, there could be more information online, don’t you think? However, you might check the DirectCompute PDC 2009 presentation here and a simple tutorial.

Of the three I am most interested in DirectCompute because:

  1. CUDA is tied to NVIDIA GPUs while OpenCL and DirectCompute aren’t
  2. I prefer using DirectX over OpenGL for a simple reason – I have done work for a client using DirectX, so I am more familiar with it

I have a silent NVIDIA GeForce 9600 series graphics card (cards from GeForce 8 and up are supported). The good news is that NVIDIA has been supporting CUDA for a while and recently introduced support for DirectCompute as well. OpenCL is also supported. I don’t know the current state of support for ATI cards.

There is a potential drawback if you go with DirectCompute, though. It is part of DirectX 11 and runs on Windows 7 and Windows Server 2008 R2 out of the box, and on Windows Vista SP2 and Windows Server 2008 SP2 once the Platform Update is installed (somebody correct me if I’m wrong). The problem arises if you have an older Windows OS such as Windows XP, or if you want to support older Windows versions. In that case you might consider the alternatives.

Goal

The less pleasant aspect of a young technology such as DirectCompute is that samples, documentation and tutorials are almost nonexistent. What’s even worse for a .NET developer like me is the apparent lack of managed support. It turns out that managed support is there to some extent, through the managed code wrappers of the Windows API Code Pack (WACP). Note that WACP isn’t as .NET-ish as Managed DirectX was, but it is good enough.

So I tried the managed approach with the BasicCompute11 C++ sample (which comes with the DirectX SDK) through WACP, and here are the steps to make it run. They should give you a good starting point if you are DirectComputing through managed code, and even if you are using native code. Hopefully somebody will benefit from my experience.

The BasicCompute11 sample is a really simple one. It declares a type holding an int and a float, creates two arrays of those with incremental values ({0, 0.0}, {1, 1.0}, {2, 2.0}, … {8191, 8191.0}) and adds them together, element by element, into a third array.
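To make the expected result concrete, here is a minimal CPU-only sketch of the same computation in standard C++. It is not the SDK sample’s code: the names BufType, MakeInput and AddBuffers are made up for illustration, the element count is just a parameter, and it assumes the float member simply mirrors the index. On the GPU each element would be handled by its own compute shader thread; here a plain loop stands in for that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type: one int and one float, as described above.
struct BufType {
    int i;
    float f;
};

// Builds the input {0, 0.0}, {1, 1.0}, {2, 2.0}, ... with `count` elements
// (assumption: the float member mirrors the index).
std::vector<BufType> MakeInput(std::size_t count) {
    std::vector<BufType> data(count);
    for (std::size_t n = 0; n < count; ++n) {
        data[n].i = static_cast<int>(n);
        data[n].f = static_cast<float>(n);
    }
    return data;
}

// CPU stand-in for the compute shader: out[n] = a[n] + b[n], member by member.
std::vector<BufType> AddBuffers(const std::vector<BufType>& a,
                                const std::vector<BufType>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<BufType> out(a.size());
    for (std::size_t n = 0; n < a.size(); ++n) {
        out[n].i = a[n].i + b[n].i;
        out[n].f = a[n].f + b[n].f;
    }
    return out;
}
```

Adding MakeInput(8192) to itself yields elements whose members are twice the index, which is the kind of check a comparison against the GPU results boils down to.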

I’ll be working on Windows 7 x64 with an NVIDIA GeForce 9600 graphics card, but I will create a Win32 executable.

Steps

1. Install the latest NVIDIA graphics drivers. The current version, 195.62, works well enough.

2. Install the Windows 7 SDK. It is required because WACP uses its include files and libraries.

3. Make sure that the Windows 7 SDK version is the currently selected one. You’ll have to run [Program Files]\Microsoft SDKs\Windows\v7.0\Setup\SDKSetup.exe. Note that I had to run it from the command prompt because it didn’t work from the UI for some reason.

4. Install the DirectX SDK (August 2009).

  • If it doesn’t exist yet, declare an environment variable DXSDK_DIR that points to the [Program Files (x86)]\Microsoft DirectX SDK (August 2009)\ path.
    image

5. Install Windows API Code Pack. Or better, copy the source to a folder on your computer. Open those sources and do the following:

  • Add the $(DXSDK_DIR)Include path to the include directories
    image
  • Add $(DXSDK_DIR)Lib\x86 to the library directories
    image

After these settings the project should compile. However, there are additional steps, because one of the methods in there is flawed. Let’s correct it.

  • Open the file Header Files\D3D11\Core\D3D11DeviceContext.h and find the method declaration
    Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, MapFlag mapFlags, MappedSubresource mappedResource);
    (there are no overloads of the Map method). Change it to
    MappedSubresource^ Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, MapFlag mapFlags);
  • Open the file Source Files\D3D11\Core\D3D11DeviceContext.cpp and find the (same) method definition
    DeviceContext::Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, D3D11::MapFlag mapFlags, MappedSubresource mappedResource)
    Replace the entire method with this code:
    MappedSubresource^ DeviceContext::Map(D3DResource^ resource, UInt32 subresource,
        D3D11::Map mapType, D3D11::MapFlag mapFlags)
    {
        D3D11_MAPPED_SUBRESOURCE mappedRes;
        CommonUtils::VerifyResult(GetInterface<ID3D11DeviceContext>()->Map(
            resource->GetInterface<ID3D11Resource>(), subresource,
            static_cast<D3D11_MAP>(mapType), static_cast<UINT>(mapFlags), &mappedRes));
        return gcnew MappedSubresource(mappedRes);
    }
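A plausible explanation for the flaw, judging only from the signature (the Code Pack does not document it): the last parameter is passed by value, so whatever the method writes into it never reaches the caller. The following plain C++ sketch shows the same pitfall and the return-value fix without any DirectX dependency. All names in it (Mapped, MapByValue, MapByReturn, g_storage) are invented for this illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a mapped-resource descriptor.
struct Mapped {
    void* data = nullptr;
    unsigned rowPitch = 0;
};

static int g_storage[4] = {1, 2, 3, 4};  // pretend this is the mapped memory

// Flawed shape: `result` is a copy, so the caller's variable stays untouched.
void MapByValue(Mapped result) {
    result.data = g_storage;
    result.rowPitch = static_cast<unsigned>(sizeof(g_storage));
}

// Fixed shape: the filled-in descriptor is handed back as the return value.
Mapped MapByReturn() {
    Mapped result;
    result.data = g_storage;
    result.rowPitch = static_cast<unsigned>(sizeof(g_storage));
    assert(result.data != nullptr);  // the caller can rely on a valid pointer
    return result;
}
```

After calling MapByValue the caller’s descriptor still holds a null pointer, while MapByReturn delivers the filled-in one. That is the same change of shape as in the corrected Map method above.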

The DirectX wrappers we need are now functional. Compile the project and close it.

6. Open the DirectX Sample Browser (you’ll find it in Start\All Programs\Microsoft DirectX SDK (August 2009)) and install the BasicCompute11 sample somewhere on the disk.

image

7. Open the sample in Visual Studio 2008 and run it. The generated console output should look like this:

image

Pay attention to the output: if DirectX can’t find adequate hardware/driver support for DirectCompute, it will run the code on the reference device, using the CPU rather than the GPU (“No hardware Compute Shader capable device found, trying to create ref device.”), or it won’t run at all.

If the sample ran successfully, your environment supports DirectCompute. Good for you.

8. Now try running my test project (attached to this article), which is pure managed C# code. It more or less replicates the original BasicCompute11 sample, which is a C++ project. There are two main differences from the original:

  • I removed the majority of the device capability checking code (you’ll have to have a functional compute shader or else…)
  • I compile the compute shader code (HLSL –> FX) at design time rather than at run time. This is required because WACP has no managed wrappers for runtime compilation: those functions are part of D3DX, which isn’t part of the OS but is redistributed separately. So you have to rely on the FXC compiler (part of the DirectX SDK). A note here: I lost quite a lot of time (again!) figuring out why FXC couldn’t find a suitable entry point in my compute shader code when compiling (and errored out). I finally remembered that I had stumbled upon this problem a long time ago: FXC compiles only ASCII(!) files. Guys, is this 2009 or 1979? As Ben 10 would say: “Oh man.”
    Anyway, the FX code is included with the project so you won’t have to compile it again. This time the output should look like this:
    image
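Regarding the ASCII-only limitation of FXC mentioned above: since the failure mode is so unhelpful, it can pay to check the shader source before handing it to the compiler. Below is a small sketch in standard C++ that reports whether a byte sequence is plain ASCII text and whether it starts with a UTF-8 or UTF-16 byte-order mark, which is what editors typically prepend when saving a file as Unicode. The function names are invented for this example and nothing here is part of the DirectX SDK.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Returns true when every byte is 7-bit ASCII and non-zero
// (NUL bytes usually indicate UTF-16 text saved without a BOM).
bool IsPureAscii(const std::string& bytes) {
    for (unsigned char c : bytes) {
        if (c > 0x7F || c == 0) return false;
    }
    return true;
}

// Returns true when the data starts with a UTF-8 or UTF-16 byte-order mark.
bool StartsWithBom(const std::string& bytes) {
    return bytes.compare(0, 3, "\xEF\xBB\xBF") == 0 ||  // UTF-8
           bytes.compare(0, 2, "\xFF\xFE") == 0 ||      // UTF-16 little endian
           bytes.compare(0, 2, "\xFE\xFF") == 0;        // UTF-16 big endian
}

// Reads a whole file as raw bytes; `path` is whatever shader file you compile.
std::string ReadAllBytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    assert(in.good());  // sketch only: real code should report the error
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

If IsPureAscii(ReadAllBytes(path)) returns false for your shader file, re-saving it without a signature in an ASCII-compatible encoding is the first thing to try.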

I didn’t document my rewritten BasicCompute11 project. You can search for comments in the original project.

Conclusion

Compute shaders are incredibly powerful when it comes to numerical calculations, and being able to work with them through managed code is just great. Not many developers will need their power, but if and when they do, the support is there.

As a final word: I miss documentation, tutorials and samples. Hopefully they’ll come eventually, but for now there is a good book on CUDA (remember, the technologies are similar); it comes in PDF format with the NVIDIA CUDA SDK.

Have fun DirectComputing!

DirectComputeManaged.zip (2.05 MB)

7 thoughts on “A managed path to DirectCompute”

  1. Hi, I tried for hours to make your sample code work.
    First, your project requires the VS 2008 debug DLLs, so I downloaded the latest WindowsApiCodePack, and after I pointed the sample project to the new library I found that I had to remap calls, enums, etc. After getting it to compile I basically get all kinds of run-time errors (e.g. “Value does not fall within the expected range.” when I try to create the D3DBuffer, or the D3DBuffer has some strange option flags, etc.)
    Can you please update your project to work with WindowsApiCodePack v1.1 and VS2010? As you mentioned in your article, it is nearly impossible to find documentation about this thing.
    Best regards,
    Dan

    1. Hi Dan,

      I opened the project in Visual Studio 2010 just fine. Had no problems converting it.
      What problems do you face when converting to VS2010?

  2. Using your project "as is" gives me the following error

    Could not load file or assembly 'Microsoft.WindowsAPICodePack.DirectX, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail. (Exception from HRESULT: 0x800736B1)

    … then looking into the Application event log shows

    Activation context generation failed for "C:\tests\DirectComputeManaged\bin\Debug\Microsoft.WindowsAPICodePack.DirectX.dll". Dependent Assembly Microsoft.VC90.DebugCRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8" could not be found. Please use sxstrace.exe for detailed diagnosis.

    That's probably because I don't have VS 2008 design time libraries.

    However I decided to use the newer Microsoft.WindowsAPICodePack.DirectX v1.1
    After fixing and translating all the compilation errors looking into the old metadata and new one I don't get the program to run because I get an error on the following line

    static D3DBuffer CreateBuffer(D3DDevice device, BuffType[] buffer, bool init) <<< init = false

    >>>> result = device.CreateBuffer(desc); throws the Value does not fall within the expected range. exception.

    I don't know how to fix this. Probably the only way is to read an HLSL book, and I will probably do that soon.

    IT WOULD BE AWESOME IF YOU FIX THE SAMPLE TO WORK WITH v 1.1 🙂
    This might be the only good article on the web to give the reader a taste of this technology!

    Best regards,
    Dan

      1. Windows 7 x64
        VS2010 Premium
        DirectX SDK (June 2010)
        Windows API Code Pack v1.1 (Binaries, Source, Docs)
        Intel Core 2 Quad 9450 8Gb
        Nvidia 560 GTX 1Gb driver 296.10

    1. HERE IS THE UPDATED SOURCE CODE TO MAKE IT COMPATIBLE WITH 1.1

      using System;
      using System.Collections.Generic;
      using System.Text;
      using Microsoft.WindowsAPICodePack.DirectX.Direct3D11;
      using Microsoft.WindowsAPICodePack.DirectX.Direct3D;
      using System.IO;
      using System.Runtime.InteropServices;
      using Microsoft.WindowsAPICodePack.DirectX.Graphics;

      namespace TestSimple
      {
          struct BuffType
          {
              public int I;
              public float F;

              /// <summary>
              /// Initializes a new instance of the BuffType structure.
              /// </summary>
              /// <param name="i"></param>
              /// <param name="f"></param>
              public BuffType(int i, float f)
              {
                  I = i;
                  F = f;
              }
          }

          class Program
          {
              static int ELEMENT_SIZE = Marshal.SizeOf(typeof(BuffType));

              static void Main1(string[] args)
              {
                  const int NUM_ELEMENTS = 1024;

                  FeatureLevel[] level = new FeatureLevel[] { FeatureLevel.Ten };
                  using (D3DDevice device = D3DDevice.CreateDevice(null, DriverType.Hardware, null, CreateDeviceOptions.SingleThreaded | CreateDeviceOptions.Debug, level))
                  {
                      DeviceContext context = device.ImmediateContext;
                      Console.WriteLine(device.DeviceFeatureLevel);
                      FeatureDataD3D10XHardwareOptions options;

                      //if (device.CheckFeatureDataD3D10XHardwareOptions(out options))
                      // Console.WriteLine(options.ComputeShadersPlusRawAndStructuredBuffersViaShader4x);
                      if (!device.IsComputeShaderWithRawAndStructuredBuffersSupported)
                          Console.WriteLine("Failed to check hw options");

                      ComputeShader shader = null;
                      try
                      {
                          using (Stream hlsl = File.OpenRead("unit.fx"))
                              shader = device.CreateComputeShader(hlsl);
                          BuffType[] buffer;

                          buffer = new BuffType[NUM_ELEMENTS];
                          for (int i = 0; i < NUM_ELEMENTS; i++)
                          {
                              buffer[i] = new BuffType { I = i, F = i };
                          }

                          using (D3DBuffer buf0 = CreateBuffer(device, buffer, true))
                          using (D3DBuffer buf1 = CreateBuffer(device, buffer, true))
                          using (D3DBuffer buf2 = CreateBuffer(device, buffer, false))
                          {
                              using (ShaderResourceView view0 = CreateView(device, buf0))
                              using (ShaderResourceView view1 = CreateView(device, buf1))
                              using (UnorderedAccessView view2 = CreateUnorderedAccessView(device, buf2))
                              {
                                  RunComputerShader(context, shader, new ShaderResourceView[] { view0, view1 }, new UnorderedAccessView[] { view2 }, NUM_ELEMENTS, 1, 1);
                                  D3DBuffer debugbuf = CreateAndCopyToDebugBuf(device, context, buf2);
                                  MappedSubresource mappedResource = (MappedSubresource)context.Map(debugbuf, 0, Map.Read, Microsoft.WindowsAPICodePack.DirectX.Direct3D11.MapOptions.None);
                                  IntPtr p = mappedResource.Data;
                                  // Use a 64-bit cursor so the pointer arithmetic also works in an x64 process.
                                  long cursor = p.ToInt64();
                                  for (int i = 0; i < NUM_ELEMENTS; i++)
                                  {
                                      p = new IntPtr(cursor);
                                      BuffType bt = (BuffType)Marshal.PtrToStructure(p, typeof(BuffType));
                                      cursor += ELEMENT_SIZE;
                                      if (bt.I != buffer[i].I * 2 || bt.F != buffer[i].F * 2)
                                      {
                                          Console.WriteLine("Failed");
                                      }
                                  }
                              }
                          }
                      }
                      finally
                      {
                          if (shader != null)
                              shader.Dispose();
                      }
                  }
                  Console.WriteLine("End");
                  Console.ReadLine();
              }

              static void Main(string[] args)
              {
                  const int NUM_ELEMENTS = 1024;

                  FeatureLevel[] level = new FeatureLevel[] { FeatureLevel.Ten };
                  using (D3DDevice device = D3DDevice.CreateDevice(null, DriverType.Hardware, null, CreateDeviceOptions.SingleThreaded | CreateDeviceOptions.Debug, level))
                  {
                      DeviceContext context = device.ImmediateContext;
                      Console.WriteLine(device.DeviceFeatureLevel);
                      FeatureDataD3D10XHardwareOptions options;

                      if (device.IsComputeShaderWithRawAndStructuredBuffersSupported)
                          Console.Write("IsComputeShaderWithRawAndStructuredBuffersSupported");
                      else
                          Console.WriteLine("Failed to check hw options");

                      ComputeShader shader = null;
                      try
                      {
                          // Load the precompiled compute shader (compiled with FXC at design time).
                          using (Stream hlsl = File.OpenRead("unit.fx"))
                              shader = device.CreateComputeShader(hlsl);
                          BuffType[] buffer;

                          buffer = new BuffType[NUM_ELEMENTS];
                          for (int i = 0; i < NUM_ELEMENTS; i++)
                          {
                              buffer[i] = new BuffType { I = i, F = i };
                          }

                          // buf0 and buf1 are the inputs, buf2 receives the sums.
                          using (D3DBuffer buf0 = CreateBuffer(device, buffer, true))
                          using (D3DBuffer buf1 = CreateBuffer(device, buffer, true))
                          using (D3DBuffer buf2 = CreateBuffer(device, buffer, false))
                          {
                              using (ShaderResourceView view0 = CreateView(device, buf0))
                              using (ShaderResourceView view1 = CreateView(device, buf1))
                              using (UnorderedAccessView view2 = CreateUnorderedAccessView(device, buf2))
                              {
                                  RunComputerShader(context, shader, new ShaderResourceView[] { view0, view1 }, new UnorderedAccessView[] { view2 }, NUM_ELEMENTS, 1, 1);
                                  D3DBuffer debugbuf = CreateAndCopyToDebugBuf(device, context, buf2);
                                  MappedSubresource mappedResource = (MappedSubresource)context.Map(debugbuf, 0, Map.Read, Microsoft.WindowsAPICodePack.DirectX.Direct3D11.MapOptions.None);
                                  IntPtr p = mappedResource.Data;
                                  // Use a 64-bit cursor so the pointer arithmetic also works in an x64 process.
                                  long cursor = p.ToInt64();
                                  for (int i = 0; i < NUM_ELEMENTS; i++)
                                  {
                                      p = new IntPtr(cursor);
                                      BuffType bt = (BuffType)Marshal.PtrToStructure(p, typeof(BuffType));
                                      cursor += ELEMENT_SIZE;
                                      // Both inputs hold the same data, so every result must be twice the input.
                                      if (bt.I != buffer[i].I * 2 || bt.F != buffer[i].F * 2)
                                      {
                                          Console.WriteLine("Failed");
                                      }
                                  }
                              }
                          }
                      }
                      finally
                      {
                          if (shader != null)
                              shader.Dispose();
                      }
                  }
                  Console.WriteLine("End");
                  Console.ReadLine();
              }

              static D3DBuffer CreateAndCopyToDebugBuf(D3DDevice device, DeviceContext context, D3DBuffer buffer)
              {
                  BufferDescription desc = buffer.Description;
                  desc.CpuAccessOptions = CpuAccessOptions.Read;
                  desc.Usage = Usage.Staging;
                  desc.BindingOptions = BindingOptions.None;
                  desc.MiscellaneousResourceOptions = MiscellaneousResourceOptions.None;
                  D3DBuffer result = device.CreateBuffer(desc);
                  context.CopyResource(result, buffer);
                  return result;
              }

              static void RunComputerShader(DeviceContext context, ComputeShader shader, ShaderResourceView[] views, UnorderedAccessView[] unordered, uint x, uint y, uint z)
              {
                  ComputeShaderPipelineStage cs = context.CS;
                  cs.Shader = shader;
                  cs.SetShaderResources(0, views);
                  // 4 size of handle?
                  cs.SetUnorderedAccessViews(0, unordered, new uint[] { 0 });
                  context.Dispatch(x, y, z);
                  // cs.SetUnorderedAccessViews(0, null, new uint[] { 0 });
                  // cs.SetShaderResources(0, 1, null);
              }

              static ShaderResourceView CreateView(D3DDevice device, D3DBuffer buffer)
              {
                  ExtendedBufferShaderResourceView exBuffer;
                  Format format;

                  //desc.ExtendedBuffer.FirstElement = 0;
                  if ((buffer.Description.MiscellaneousResourceOptions & MiscellaneousResourceOptions.BufferAllowRawViews) == MiscellaneousResourceOptions.BufferAllowRawViews)
                  {
                      //desc.Format = Format.R32Typeless;
                      //desc.ExtendedBuffer.BindingOptions = ExtendedBufferBindingOptions.Raw;
                      //desc.ExtendedBuffer.ElementCount = buffer.Description.ByteWidth / 4;

                      format = Format.R32Typeless;
                      exBuffer = new ExtendedBufferShaderResourceView
                      {
                          FirstElement = 0,
                          BindingOptions = ExtendedBufferBindingOptions.Raw,
                          ElementCount = buffer.Description.ByteWidth / 4
                      };
                  }
                  else if ((buffer.Description.MiscellaneousResourceOptions & MiscellaneousResourceOptions.BufferStructured) == MiscellaneousResourceOptions.BufferStructured)
                  {
                      //desc.Format = Format.Unknown;
                      //desc.ExtendedBuffer.ElementCount = buffer.Description.ByteWidth / buffer.Description.StructureByteStride;

                      format = Format.Unknown;
                      exBuffer = new ExtendedBufferShaderResourceView
                      {
                          FirstElement = 0,
                          ElementCount = buffer.Description.ByteWidth / buffer.Description.StructureByteStride
                      };
                  }
                  else
                      throw new Exception("Unsupported buffer format");

                  ShaderResourceViewDescription desc = new ShaderResourceViewDescription
                  {
                      ViewDimension = ShaderResourceViewDimension.ExtendedBuffer,
                      Format = format,
                      ExtendedBuffer = exBuffer
                  };

                  return device.CreateShaderResourceView(buffer, desc);
              }

              static UnorderedAccessView CreateUnorderedAccessView(D3DDevice device, D3DBuffer buffer)
              {
                  Format format;
                  BufferUnorderedAccessView buff;

                  //desc.Buffer.FirstElement = 0;

                  if ((buffer.Description.MiscellaneousResourceOptions & MiscellaneousResourceOptions.BufferAllowRawViews) == MiscellaneousResourceOptions.BufferAllowRawViews)
                  {
                      //desc.Format = Format.R32Typeless;
                      //desc.Buffer.BufferOptions = UnorderedAccessViewBufferOptions.Raw;
                      //desc.Buffer.ElementCount = buffer.Description.ByteWidth / 4;
                      format = Format.R32Typeless;
                      buff = new BufferUnorderedAccessView
                      {
                          BufferOptions = UnorderedAccessViewBufferOptions.Raw,
                          ElementCount = buffer.Description.ByteWidth / 4
                      };
                  }
                  else if ((buffer.Description.MiscellaneousResourceOptions & MiscellaneousResourceOptions.BufferStructured) == MiscellaneousResourceOptions.BufferStructured)
                  {
                      //desc.Format = Format.Unknown;
                      //desc.Buffer.ElementCount = buffer.Description.ByteWidth / buffer.Description.StructureByteStride;
                      format = Format.Unknown;
                      buff = new BufferUnorderedAccessView
                      {
                          ElementCount = buffer.Description.ByteWidth / buffer.Description.StructureByteStride
                      };
                  }
                  else
                      throw new Exception("Unsupported buffer format");

                  UnorderedAccessViewDescription desc = new UnorderedAccessViewDescription
                  {
                      ViewDimension = UnorderedAccessViewDimension.Buffer,
                      Format = format,
                      Buffer = buff
                  };

                  return device.CreateUnorderedAccessView(buffer, desc);
              }

              static D3DBuffer CreateBuffer(D3DDevice device, BuffType[] buffer, bool init)
              {
                  BufferDescription desc = new BufferDescription
                  {
                      BindingOptions = BindingOptions.UnorderedAccess | BindingOptions.ShaderResource,
                      ByteWidth = (uint)(ELEMENT_SIZE * buffer.Length),
                      MiscellaneousResourceOptions = MiscellaneousResourceOptions.BufferStructured,
                      StructureByteStride = (uint)ELEMENT_SIZE
                  };

                  // An output buffer needs no initial data, so no unmanaged copy is required.
                  if (!init)
                      return device.CreateBuffer(desc);

                  // Marshal the managed array into unmanaged memory for the initial upload.
                  IntPtr ptr = Marshal.AllocHGlobal((int)desc.ByteWidth);
                  try
                  {
                      // 64-bit cursor so the pointer arithmetic also works in an x64 process.
                      long current = ptr.ToInt64();
                      foreach (BuffType bt in buffer)
                      {
                          Marshal.StructureToPtr(bt, new IntPtr(current), false);
                          current += ELEMENT_SIZE;
                      }
                      SubresourceData initData = new SubresourceData
                      {
                          SystemMemory = ptr
                      };
                      return device.CreateBuffer(desc, initData);
                  }
                  finally
                  {
                      // Direct3D copies the initial data during buffer creation, so the block can be freed here.
                      Marshal.FreeHGlobal(ptr);
                  }
              }

          }
      }
