
Message boards : Number crunching : Error: ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

Greg _BE
Joined: 30 Jun 14
Posts: 135
Credit: 121,356,939
RAC: 23,416
Message 61544 - Posted: 20 Jun 2024 | 16:39:13 UTC
Last modified: 20 Jun 2024 | 16:48:11 UTC

For 4 ACEMD 3 tasks, I got this error.
What's that all about?
They were new tasks to me, started from scratch.

I checked the other systems that got these tasks: a 1080 like mine, an NVIDIA Quadro M4000, a 1060 and a GT730 also crashed, but with a different error.

Yet a 980 Ti was able to complete the task that I looked at.

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,678,756,915
RAC: 13,455,284
Message 61545 - Posted: 20 Jun 2024 | 21:41:42 UTC - in response to Message 61544.

That's the standard and expected error on acemd tasks when a calculation is restarted on a different device than the one the task was initially started on.

You didn't post the host, so I assume it has multiple cards of different types. Unless all your cards are the same type (all 1080s, for example), it is best not to pause or stop acemd tasks while they are calculating.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 432
Message 61547 - Posted: 20 Jun 2024 | 21:47:56 UTC - in response to Message 61544.

you (or something) interrupted the task and it restarted on the other GPU.
you can see this in your stderr output:

21:51:33 (21892): wrapper (7.9.26016): starting
21:51:33 (21892): wrapper: running bin/acemd.exe (--boinc --device 1)

11:16:03 (25672): wrapper (7.9.26016): starting
11:16:03 (25672): wrapper: running bin/acemd.exe (--boinc --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!


it started on device 1 at 21:51, then something happened (either you shut off the computer, rebooted it, or BOINC interrupted the task to do something else), and the task restarted on device 0 at 11:16.
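If you want to check your own failed tasks for this, a small script can pull the `--device N` argument out of every wrapper launch line in the stderr text and flag a task whose later launches used a different device than the first. This is a minimal sketch, assuming the launch lines look exactly like the ones quoted above; the function names are hypothetical and not part of BOINC or acemd.

```python
import re

# Matches wrapper launch lines such as:
#   21:51:33 (21892): wrapper: running bin/acemd.exe (--boinc --device 1)
# Assumes this exact log format; other wrapper versions may differ.
LAUNCH_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) \(\d+\): wrapper: running .*--device (\d+)\)"
)


def device_launches(stderr_text):
    """Return (timestamp, device_index) for every time the wrapper started acemd."""
    launches = []
    for line in stderr_text.splitlines():
        match = LAUNCH_RE.match(line.strip())
        if match:
            launches.append((match.group(1), int(match.group(2))))
    return launches


def restarted_on_other_device(stderr_text):
    """True if any later launch used a different --device than the first launch."""
    devices = [device for _, device in device_launches(stderr_text)]
    return any(device != devices[0] for device in devices[1:])


# The stderr excerpt quoted above (raw string so the Windows backslashes stay literal).
sample = r"""
21:51:33 (21892): wrapper (7.9.26016): starting
21:51:33 (21892): wrapper: running bin/acemd.exe (--boinc --device 1)
11:16:03 (25672): wrapper (7.9.26016): starting
11:16:03 (25672): wrapper: running bin/acemd.exe (--boinc --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!
"""

print(device_launches(sample))            # [('21:51:33', 1), ('11:16:03', 0)]
print(restarted_on_other_device(sample))  # True
```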

i think this has always been the case with acemd3. the tasks don't tolerate restarting on a different device.

your host identifies the best GPU as the GTX 1080, and you have two GPUs in the host. but are they both the same? if not, then that's the problem. sometimes it will restart OK on a different device if the GPUs are identical, but i've seen cases where even restarting on a different identical device causes this error.
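To answer the "are they both the same?" question on a given host, one option is to ask `nvidia-smi` for each card's index and name and compare the names. This is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is on the PATH; the helper names and the example card list are hypothetical (the example is not Greg's actual second GPU). Also note that the `nvidia-smi` index is not guaranteed to match the `--device N` number BOINC passes to acemd, since CUDA and `nvidia-smi` can order devices differently.

```python
import subprocess


def query_gpu_names():
    """Run nvidia-smi and return its CSV output, one 'index, name' line per GPU."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout


def parse_gpu_names(csv_text):
    """Turn 'index, name' lines into a {index: name} dict."""
    names = {}
    for line in csv_text.strip().splitlines():
        index, name = line.split(",", 1)
        names[int(index)] = name.strip()
    return names


def all_gpus_identical(names):
    """True when every GPU reports the same model name (or there is at most one GPU)."""
    return len(set(names.values())) <= 1


# Hypothetical two-card host with mismatched GPUs.
example = "0, NVIDIA GeForce GTX 1080\n1, NVIDIA GeForce GTX 1060\n"
names = parse_gpu_names(example)
print(names)                      # {0: 'NVIDIA GeForce GTX 1080', 1: 'NVIDIA GeForce GTX 1060'}
print(all_gpus_identical(names))  # False
```

On a real host you would call `parse_gpu_names(query_gpu_names())` instead of using the example string. Even a `True` result is no guarantee, since identical cards can occasionally trigger the same error.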

the best bet is to not interrupt these at all.

are you turning your computer off at night? don't do that.
are you running other projects? maybe suspend them while GPUGRID is running, or set BOINC not to switch tasks.
do you have BOINC set to suspend GPU tasks when the computer is in use? turn off that setting.
did the computer crash due to some instability or overheating? investigate and remedy that problem.
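The two BOINC settings mentioned in the list above can also be set locally through a `global_prefs_override.xml` file in the BOINC data directory. This is a minimal sketch, assuming the standard preference tags `run_gpu_if_user_active` (keep GPU work running while the computer is in use) and `cpu_scheduling_period_minutes` (how often BOINC switches between tasks); check them against the BOINC documentation for your client version. The helper names and the 600-minute value are hypothetical choices, and a longer switch period reduces task switching rather than eliminating it.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def build_prefs_override(switch_minutes=600):
    """Build a minimal global_prefs_override.xml document as a string."""
    root = ET.Element("global_preferences")
    # 1 = do not suspend GPU tasks while the computer is in use.
    ET.SubElement(root, "run_gpu_if_user_active").text = "1"
    # Long task-switch interval so BOINC rarely preempts a running task.
    ET.SubElement(root, "cpu_scheduling_period_minutes").text = str(switch_minutes)
    return ET.tostring(root, encoding="unicode")


def write_prefs_override(data_dir, switch_minutes=600):
    """Write the override file into the given BOINC data directory.

    Warning: this replaces any existing global_prefs_override.xml,
    discarding other local overrides it may contain.
    """
    path = Path(data_dir) / "global_prefs_override.xml"
    path.write_text(build_prefs_override(switch_minutes))
    return path


# Example (path is a placeholder for your BOINC data directory):
# write_prefs_override("/path/to/BOINC/data")
```

After writing the file, `boinccmd --read_global_prefs_override` (or the equivalent "read local prefs file" action in the BOINC Manager) tells the running client to pick up the new values.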


