Preventing Cycles Crash after CUDA timeout

I recently had a problem with Blender freezing and Cycles crashing due to “Unknown” CUDA errors. I just want to put this solution out there for anybody experiencing similar issues.

Crash characteristics

I encountered crashes when rendering on my GPU using Cycles (on Windows), originally only at some point in the rendering process. Side effects were a Windows message saying “Display driver stopped responding and has recovered” and the screen turning black for a little less than a second.

Increasing the tile size caused the crashes to occur immediately with the first samples rendered.

If you have this problem on a device with Nvidia Optimus technology, first check whether Blender is using the correct GPU. Choosing the wrong GPU device may crash the graphics card driver, and in that case forcing Blender to use the correct GPU device is the solution (as described in the BlenderArtists thread).

In my case, this did not help. My console output was the following:

[...] (rendering a few, in some cases thousands of samples without crash)

Fra:1 Mem:354.16M (58.59M, Peak 585.60M) | Mem: 1471.75M, Peak: 1471.75M | Scene | Elapsed: 00:49.05 | Rendering | Path Tracing Tile 1/12, Sample 5/200
CUDA error: Unknown error in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)((uchar*)mem.device_pointer + offset), size)
Fra:1 Mem:354.16M (58.59M, Peak 585.60M) | Mem: 1471.75M, Peak: 1471.75M | Scene | Elapsed: 00:52.74 | Rendering | Path Tracing Tile 1/12, Sample 6/200
CUDA error: Unknown error in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)((uchar*)mem.device_pointer + offset), size)
Fra:1 Mem:354.16M (58.59M, Peak 585.60M) | Mem: 1471.75M, Peak: 1471.75M | Scene, Flugzeug | Elapsed: 00:52.88 | Rendering | Path Tracing Tile 1/12, Sample 7/200
CUDA error: Unknown error in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)((uchar*)mem.device_pointer + offset), size)
CUDA error: Unknown error in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Unknown error in cuMemFree(cuda_device_ptr(mem.device_pointer))
Fra:1 Mem:354.16M (58.59M, Peak 585.60M) | Mem: 1462.75M, Peak: 1471.75M | Scene | Elapsed: 00:53.01 | Rendering | Path Tracing Tile 1/12, Sample 200/200
CUDA error: Unknown error in cuMemAlloc(&device_pointer, size)
CUDA error: Unknown error in cuMemAlloc(&device_pointer, size)

[Blender then tries to render all the other tiles and fails repeatedly, always with the same error messages]

Fra:1 Mem:354.16M (58.59M, Peak 585.60M) | Mem: 1471.75M, Peak: 1471.75M | Scene | Elapsed: 00:53.02 | Rendering | Path Tracing Tile 2/12, Sample 0/200
Fra:1 Mem:354.16M (58.59M, Peak 585.60M) | Mem: 1471.75M, Peak: 1471.75M | Scene | Elapsed: 00:53.02 | Rendering | Path Tracing Tile 2/12, Sample 200/200
CUDA error: Unknown error in cuMemAlloc(&device_pointer, size)
CUDA error: Unknown error in cuMemAlloc(&device_pointer, size)
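If you have saved console output like the above to a text file, a small script can summarize which CUDA driver calls failed. The following is only a sketch in Python: the function name `tally_cuda_errors` and the regular expression are my own assumptions, based purely on the line format shown above, and other Blender versions may print errors differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   CUDA error: Unknown error in cuMemAlloc(&device_pointer, size)
CUDA_ERROR = re.compile(r"^CUDA error: (?P<message>.+?) in (?P<call>\w+)\(")


def tally_cuda_errors(log_text):
    """Count CUDA errors per failing driver call in Blender console output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CUDA_ERROR.match(line.strip())
        if match:
            counts[match.group("call")] += 1
    return counts


# A few lines copied from the console output above.
sample_log = """\
Fra:1 Mem:354.16M (58.59M, Peak 585.60M) | Mem: 1471.75M, Peak: 1471.75M | Scene | Elapsed: 00:49.05 | Rendering | Path Tracing Tile 1/12, Sample 5/200
CUDA error: Unknown error in cuMemcpyDtoH((uchar*)mem.data_pointer + offset, (CUdeviceptr)((uchar*)mem.device_pointer + offset), size)
CUDA error: Unknown error in cuMemFree(cuda_device_ptr(mem.device_pointer))
CUDA error: Unknown error in cuMemAlloc(&device_pointer, size)
CUDA error: Unknown error in cuMemAlloc(&device_pointer, size)
"""

for call, count in tally_cuda_errors(sample_log).most_common():
    print(f"{call}: {count}")
```

Many failing calls of different kinds after the first error, as in my log, suggest that the whole CUDA context was lost rather than that a single allocation failed.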

Cause of the crash

The problem is that Windows has a Timeout Detection and Recovery (TDR) system. It detects when a GPU computation takes longer than a given amount of time (two seconds by default), then “reinitializes” the Windows Display Driver Model (WDDM) driver and resets the GPU. This stops the rendering process. You will also notice that any displays attached to the GPU you use for rendering turn black for a short moment.

Normally, this system is great because it prevents permanent screen freezes caused by malfunctioning drivers or games. But in Cycles, one sample is considered one computation, which means that if a single sample takes longer than two seconds to compute, Cycles (and the Blender UI, if you render with the UI) will crash.
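To get a rough idea of whether samples are running into the two-second limit, you can compare the “Elapsed” timestamps of consecutive status lines. The sketch below is an assumption-laden illustration: the function name `sample_gaps` is made up, it assumes the `MM:SS.ss` elapsed format seen in my console output, and the gap around a crash also includes the time the driver needed to reset, so treat the numbers as an indicator only.

```python
import re

TDR_DEFAULT_SECONDS = 2.0  # the Windows default mentioned above

# Pulls "Elapsed: MM:SS.ss" and "Sample N/M" out of a Cycles status line.
STATUS = re.compile(r"Elapsed: (\d+):(\d+\.\d+) .*Sample (\d+)/\d+")


def sample_gaps(log_text):
    """Return (sample_number, seconds_since_previous_status_line) pairs."""
    gaps = []
    previous = None
    for line in log_text.splitlines():
        match = STATUS.search(line)
        if not match:
            continue  # skip CUDA error lines and anything else
        minutes, seconds, sample = match.groups()
        elapsed = int(minutes) * 60 + float(seconds)
        if previous is not None:
            gaps.append((int(sample), round(elapsed - previous, 2)))
        previous = elapsed
    return gaps


# The tails of three status lines from my console output.
status_lines = """\
Elapsed: 00:49.05 | Rendering | Path Tracing Tile 1/12, Sample 5/200
Elapsed: 00:52.74 | Rendering | Path Tracing Tile 1/12, Sample 6/200
Elapsed: 00:52.88 | Rendering | Path Tracing Tile 1/12, Sample 7/200
"""

for sample, gap in sample_gaps(status_lines):
    flag = "over the TDR default" if gap > TDR_DEFAULT_SECONDS else "ok"
    print(f"sample {sample}: {gap:.2f} s ({flag})")
```

In my log, the step from sample 5 to sample 6 took about 3.7 seconds, which is consistent with a timeout at that point.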

Solutions

Changing the rendering settings

Because the computing time per sample mainly depends on the size of the tile and the complexity of the geometry and shaders inside it, reducing the tile size and simplifying your shaders shortens each sample and can fix the issue. But reducing the tile size can increase rendering times on the GPU, and simple shaders might not be what you want to render. Therefore, the following solution is usually the better one.
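If you want to change the tile size from a script instead of the render panel, something like the following might work. It is a hedged sketch: the helper name `set_tile_size` is hypothetical, and it assumes that older Blender versions expose `tile_x` and `tile_y` on the scene's render settings while newer ones (with the reworked Cycles) expose a single `tile_size` on the Cycles settings. Please check the Python API documentation of your Blender version.

```python
def set_tile_size(scene, size=128):
    """Set a square tile size on a Blender scene, using whichever property exists."""
    render = scene.render
    if hasattr(render, "tile_x") and hasattr(render, "tile_y"):
        # Assumed layout for older Blender versions.
        render.tile_x = size
        render.tile_y = size
        return "render.tile_x/tile_y"
    cycles = getattr(scene, "cycles", None)
    if cycles is not None and hasattr(cycles, "tile_size"):
        # Assumed layout for newer Blender versions.
        cycles.tile_size = size
        return "cycles.tile_size"
    raise AttributeError("no known tile size property on this scene")
```

Inside Blender's Python console, this could be called as `set_tile_size(bpy.context.scene, 128)`. The value 128 is only an example. The right size depends on your scene and GPU.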

Increasing the TDR timeout value

To prevent Cycles from crashing, you can also increase the TDR timeout. Before you do that, keep in mind that you should be careful when making any changes to your registry. Also, if your screen actually freezes, your computer will now wait longer before it restarts the driver, so remember to be patient in such a situation. I am not responsible for any damage caused by this, but it should be safe and it worked for me.

How to increase the TDR timeout (on Windows 7; the Vista path should be very similar):

  1. Open regedit.exe (press the Windows key, type “regedit”, press Enter, and confirm that you want to open it).
  2. Go to: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
  3. a)
    If the key contains a value named “TdrDelay”, right-click it, choose “Modify”, and change the value to something high enough for your scene. The number is interpreted as seconds; something between 8 and 16 should be fine. Be aware of the difference between decimal and hexadecimal input: with “Hexadecimal” selected, entering 10 means 16 seconds.
    b)
    If the key does not contain a value named “TdrDelay”, right-click the empty space below the existing values and create a new DWORD value. Name it “TdrDelay” and change the value as described in 3.a).
  4. Reboot your system for the changes to take effect.
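The same change can also be expressed as a .reg file, which makes the decimal versus hexadecimal pitfall explicit, because .reg files always store DWORD values in hexadecimal. The sketch below only builds the text of such a file. The function name `tdr_delay_reg_file` and the 1 to 60 second range check are my own assumptions for illustration, not anything Windows prescribes.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_reg_file(seconds):
    """Build the text of a .reg file that sets TdrDelay to `seconds`."""
    if not isinstance(seconds, int) or not 1 <= seconds <= 60:
        # The upper bound is an arbitrary safety limit chosen for this sketch.
        raise ValueError("seconds must be an integer between 1 and 60")
    # .reg files write DWORD values as eight hexadecimal digits,
    # so 16 seconds becomes dword:00000010.
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY_PATH}]\r\n"
        f'"TdrDelay"=dword:{seconds:08x}\r\n'
    )


print(tdr_delay_reg_file(16))
```

Saving the printed text as a .reg file and importing it with administrator rights should have the same effect as steps 2 and 3. A reboot is still needed afterwards, and the same caution about registry changes applies.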

Alternatively, creating a DWORD value named “TdrLevel” and setting it to 0 (off) instead of the default 3 (recover on timeout) will also work. However, I do not recommend doing this, because you may have to reboot your system if a future driver or application crash leads to a frozen screen, and data may get lost as a consequence.
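To check which values are currently in effect without opening regedit, a read-only script along these lines may help. It is a sketch under assumptions: it uses Python's standard `winreg` module (Windows only), the function name `read_tdr_settings` is made up, and it simply reports the defaults named in this post (2 seconds, level 3) when a value is not set.

```python
try:
    import winreg  # standard library, but only available on Windows
except ImportError:
    winreg = None

SUBKEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
DEFAULTS = {"TdrDelay": 2, "TdrLevel": 3}  # defaults named in this post


def read_tdr_settings():
    """Return TdrDelay and TdrLevel, falling back to the defaults if unset."""
    settings = dict(DEFAULTS)
    if winreg is None:
        return settings  # not on Windows: nothing to read, report defaults
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SUBKEY) as key:
        for name in DEFAULTS:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
                settings[name] = value
            except FileNotFoundError:
                pass  # value not present, so the default applies
    return settings


print(read_tdr_settings())
```

The script only reads the registry, so it does not need administrator rights and cannot change the TDR behaviour.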

More registry options for modifying the behaviour of the TDR are described in Microsoft's documentation on the TDR registry keys.

I hope this helps whoever might have this problem. Please leave a comment if you find an error.


Written by: Julian Herzog

3 Comments

  1. Thank you Julian. I had the same crashy problem until I reduced bucket / tile size to 128x128px. I also updated the Nvidia Drivers running a 760, 780 & the Titan.

  2. Tunge
    November 15

    Thank you very much, Your tip helped a lot!

  3. February 23

    Great tip!
    Just “discovered” branched path tracing and was getting about 1/10th of render time, but at the expense of frequent crashing. This seems to have solved the problem for now.

    Thanks a lot
