Message boards : Number crunching : Multiple Teslas in one box. Is there a limit per machine for tasks?
Author | Message |
---|---|
One of my teammates has a very impressive rig and is having trouble getting more than 8 work units at a time on his Teslas. He has 8 dual-GPU cards in his rig, and the only way he could feed all 16 GPUs was to add a second project. Is this a limitation on the server side? | |
ID: 37067 | |
All Teslas: K10s on the left, M2090s on the right. | |
ID: 37068 | |
It gets more fun... I have seen GPUGrid try to use GPU 9 once in a while, but it hits an immediate computation error... oO. | |
ID: 37069 | |
Do the admins have any input on this issue? We were wondering whether there is a server-side limit preventing more than 8 GPU work units on a single machine, or whether running more than 8 at a time is a known issue. It surprises me that nobody has chimed in, even to ask for additional information. | |
ID: 37239 | |
Don't know - I've some 8-GPU (K40) machines that run ok. What goes wrong exactly? | |
ID: 37241 | |
"Don't know - I've some 8-GPU (K40) machines that run ok. What goes wrong exactly?" It is the 9th GPU that pushes it over: GPUs 0-8 make 9. All work units fail instantly on GPU 9 (the 10th GPU core) or higher. Work units are rarely attempted on GPU numbers past 9, because the GPUs are given work units sequentially. The cards reside in HP SL270s Gen8 servers. The K10 is a dual-GPU card; the K40 is a single-GPU card, albeit a significantly stronger one. I have tried 2 installs of SL 6.5. I removed the 5th and 6th cards to make sure it was not the cards, and the problem migrated to the next ones in line. I updated the drivers to 337.19 and the problem persisted. I dropped both servers down to 9 GPU cores and the problem goes away. I would prefer to have all the K10s in one box. As is, I have 3 empty slots in the left SL node and require 2 more servers to use all of them. http://i.imgur.com/pEDLqoM.png | |
ID: 37243 | |
4 possibilities come to mind: | |
ID: 37245 | |
skgiven, if BOINC has a limitation on GPUs, it must be server side. As he stated above, he can fill the GPUs with Einstein work, just not with GPUGrid work. At least that is what I took from our conversation... | |
ID: 37247 | |
"It is the 9th GPU that pushes it over. 0-8 makes 9" Yes. Our app has a limit of 8 GPUs/host. I'll see about getting that fixed, but it won't happen for a wee while. Matt | |
ID: 37320 | |
"It is the 9th GPU that pushes it over. 0-8 makes 9" Thank you for the confirmation. This allows us to move on and not waste more time testing and tweaking. | |
ID: 37341 | |
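
Given the 8-GPUs-per-host app limit confirmed above, one possible stopgap is to tell the BOINC client not to hand the higher-numbered GPUs to GPUGrid at all, leaving them free for a second project. The sketch below generates a `cc_config.xml` that uses BOINC's `<exclude_gpu>` option for that purpose. It is untested against the rig in this thread and rests on several assumptions: the project URL must match the one the client is attached with, BOINC's device numbering is assumed to start at 0, and the function name `build_cc_config` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: replace with the exact project URL shown by your BOINC client.
GPUGRID_URL = "http://www.gpugrid.net/"
# Per-host GPU limit of the app, as stated by the project admin in the thread.
APP_GPU_LIMIT = 8


def build_cc_config(total_gpus, project_url=GPUGRID_URL, limit=APP_GPU_LIMIT):
    """Return cc_config.xml text that excludes devices limit..total_gpus-1
    from the given project, one <exclude_gpu> entry per device."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_num in range(limit, total_gpus):
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = project_url
        ET.SubElement(exclude, "device_num").text = str(device_num)
        ET.SubElement(exclude, "type").text = "NVIDIA"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 8 dual-GPU cards = 16 devices; devices 8-15 are excluded for GPUGrid.
    print(build_cc_config(total_gpus=16))
```

The output would need to be merged into any existing `cc_config.xml` in the BOINC data directory, and the client has to re-read its config files (or be restarted) before the exclusions take effect. Whether this avoids the instant failures depends on how the app counts devices, which the thread does not say.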