RuntimeError: Unable to find a valid cuDNN algorithm to run convolution when running python

Message boards : Number crunching : RuntimeError: Unable to find a valid cuDNN algorithm to run convolution when running python

kotenok2000 Joined: 18 Jul 13 Posts: 78 Credit: 128,533,046 RAC: 1,484,729	Message 58983 - Posted: 7 Jul 2022 | 20:06:30 UTC Last modified: 7 Jul 2022 | 20:08:01 UTC
	I have an NVIDIA GTX 1650. Maybe it is too old?

kotenok2000 Joined: 18 Jul 13 Posts: 78 Credit: 128,533,046 RAC: 1,484,729	Message 58984 - Posted: 7 Jul 2022 | 20:55:38 UTC - in response to Message 58983. Last modified: 7 Jul 2022 | 20:57:20 UTC
	Another workunit failed with:
	RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
	A third workunit finished and validated in 1645 seconds.
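The hint at the end of that error message refers to an environment variable that PyTorch's CUDA caching allocator reads. A minimal sketch of setting it from Python, assuming you control the script that imports torch (which is not the case for a stock GPUGRID task). The helper name `set_max_split_size` and the value 128 are illustrative assumptions, not settings recommended anywhere in this thread:

```python
import os


def set_max_split_size(mb: int) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF so the allocator avoids splitting blocks larger than `mb` MB.

    Hypothetical helper for illustration. It overwrites any existing value
    of the variable.
    """
    if mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    value = f"max_split_size_mb:{mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# Has to run before the first CUDA allocation; doing it before
# `import torch` is the safest ordering.
set_max_split_size(128)  # 128 is an assumed example value
```

This only helps when reserved memory is much larger than allocated memory, which is the condition the message itself names.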

jjch Joined: 10 Nov 13 Posts: 101 Credit: 15,583,700,388 RAC: 4,166,616	Message 58987 - Posted: 8 Jul 2022 | 22:15:59 UTC - in response to Message 58984.
	First off, I would say that the Python apps seem to have a high error rate. I'm seeing about 40% failures on my Windows systems without finding a good reason why. There could be a cause for this, but it might also be normal.
	The error you noted seems to come from a variation in the memory used on the GPU. I think the GTX 1650 should be adequate to run the Python apps, so it could be a problem with the Python app itself. What might be happening is that you are also using GPU memory for something else at the same time as, or just before, GPUgrid. Don't run any other GPU projects, play games, etc.
	I also noted that some of your tasks failed where it looked like you were running out of system memory. 16GB is on the low side of what will work well with other things running. I would suggest setting things up so you are only running one GPUgrid Python app, and then looking at your system memory usage. I have seen it be around 10GB, but it can be more.
	Also check your available free disk space and the swap space you are using while you are monitoring it. Make sure you are not pushing the limits there and running out too.

Keith Myers Joined: 13 Dec 17 Posts: 1341 Credit: 7,684,871,308 RAC: 13,111,656	Message 58988 - Posted: 9 Jul 2022 | 5:16:36 UTC
	There's a problem with how Windows allocates virtual memory for Python libraries. Linux does not have the issue because it allocates memory differently. See this message of mine. https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908

kotenok2000 Joined: 18 Jul 13 Posts: 78 Credit: 128,533,046 RAC: 1,484,729	Message 58993 - Posted: 9 Jul 2022 | 19:34:27 UTC - in response to Message 58988.
	One also crashed because of:
	CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

jjch Joined: 10 Nov 13 Posts: 101 Credit: 15,583,700,388 RAC: 4,166,616	Message 58994 - Posted: 10 Jul 2022 | 2:48:24 UTC - in response to Message 58993.
	The GTX 1650 is a 4GB card, so it should have plenty of memory for the Python app. There is something else going on there. I'm not a CUDA expert, but there could be a problem with your driver. It looked like the driver you have is the current version. I would suggest running a full uninstall and cleanup with DDU and then reinstalling it. If that still doesn't work, go back to the previous version and see if that helps.
	It could just be a problem with the Python/PyTorch programs and their interaction with CUDA, or an error in the programming. Other than that, I could only guess that you are having a problem with your card. Make sure it isn't overheating etc. Also, if you are overclocking, revert that to normal.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1069 Credit: 40,231,533,983 RAC: 432	Message 58996 - Posted: 10 Jul 2022 | 13:04:18 UTC - in response to Message 58994.
	From what I remember, the Python app was using more than 4GB of VRAM. It's definitely possible that 4GB isn't enough.

jjch Joined: 10 Nov 13 Posts: 101 Credit: 15,583,700,388 RAC: 4,166,616	Message 58998 - Posted: 11 Jul 2022 | 4:06:44 UTC - in response to Message 58996.
	That would be an interesting development. From what I have been gathering, the Python app is not putting much of a load on the GPU. I'm not quite sure about the actual memory usage.
	I tried to find a reference in the Forum on how much GPU memory is needed, but I only found one post, which mentioned a GTX980Ti:
	".... gpu memory usage is almost constant at 2.679MB"
	If you find something that indicates they need 4GB or more, I would like to see it. I don't know of a good way to check the GPU memory usage because you have to catch it while it's actually being used.
	The error mentioned in this thread only refers to an allocation of 28.00 MiB on top of the 1.36 GiB already reserved, with 1011.70 MiB free:
	CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch)
	That actually seems more like a memory error related to CUDA or the driver etc., not the memory capacity of the card.
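The arithmetic behind that reading of the error can be checked directly. The sketch below pulls the sizes out of the quoted message with regular expressions and converts them to MiB. The function name `parse_oom` and the patterns are assumptions written for this one message format, not part of PyTorch:

```python
import re

# The message exactly as quoted in this thread.
ERROR = (
    "CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; "
    "1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# Hypothetical patterns matching the wording of the message above.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"([\d.]+) (MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (MiB|GiB) already allocated",
    "free": r"([\d.]+) (MiB|GiB) free",
    "reserved": r"([\d.]+) (MiB|GiB) reserved",
}


def parse_oom(message: str) -> dict:
    """Extract each size from the OOM message and convert it to MiB."""
    sizes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom(ERROR)
# Memory PyTorch holds in its cache but has not handed out to tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]
print(f"requested: {sizes['requested']:.2f} MiB, reported free: {sizes['free']:.2f} MiB")
print(f"reserved but unallocated inside PyTorch: {cached_unused:.2f} MiB")
```

With these numbers the request (28 MiB) is far below the reported free memory, and reserved exceeds allocated by only about 41 MiB, so the "reserved >> allocated" fragmentation case named in the message does not obviously apply. The figures are a single snapshot, so this supports the suspicion above rather than proving it.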

Keith Myers Joined: 13 Dec 17 Posts: 1341 Credit: 7,684,871,308 RAC: 13,111,656	Message 58999 - Posted: 11 Jul 2022 | 5:32:19 UTC - in response to Message 58998.
	The memory utilization seems to be constant on my GPUs when they are running a Python task. Currently it is using 3349MB out of the 8GB on the card. You can see that with nvidia-smi in a Terminal. Or, if you want to watch it in real time, you can use this:
	watch -n 1 nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
	which, besides showing the amount of memory being used, also shows the memory bus and GPU utilization, clocks, watts, and link width and speed.
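The same query can be scripted, which is handy on Windows where `watch` is not available. A minimal sketch, assuming `nvidia-smi` is on the PATH and reports plain integers for the memory fields. The function names are hypothetical, and the sample row reuses the 3349MB figure above with an assumed 8192 MiB total:

```python
import subprocess

# A shorter field list than the watch command above; the "nounits"
# option makes nvidia-smi print bare numbers (MiB for memory fields).
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> list:
    """Turn CSV rows of (name, used MiB, total MiB) into dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name is harmless.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def read_gpu_memory() -> list:
    """Run nvidia-smi once and return one dictionary per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


# Illustrative row only, not captured from a real run.
sample = "Example GPU, 3349, 8192\n"
for gpu in parse_memory_csv(sample):
    share = 100 * gpu["used_mib"] / gpu["total_mib"]
    print(f"{gpu['name']}: {gpu['used_mib']} / {gpu['total_mib']} MiB ({share:.0f}%)")
```

On a machine with the NVIDIA driver installed, calling `read_gpu_memory()` in a loop with a short sleep gives roughly the same view as the `watch` command.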

jjch Joined: 10 Nov 13 Posts: 101 Credit: 15,583,700,388 RAC: 4,166,616	Message 59000 - Posted: 12 Jul 2022 | 0:40:35 UTC - in response to Message 58999.
	I found a few tasks running on my Windows servers and checked them with GPU-Z. The GPU memory used was between 2518 and 3287 MB. I think with that usage these should run OK on a 4GB card.
