Message boards : Number crunching : Error While Computing
Author | Message |
---|---|
The vast majority of the units my computer completes have been reported as 'Error While Computing'. This has been going on for a few months. For a while a few weeks ago, the units seemed to be much smaller and took only a few hours to complete. Those seemed to validate much more often than the large units that take a couple of days of crunching. | |
ID: 58656 | |
The acemd4 and python tasks are still being debugged by the admin developers. | |
ID: 58657 | |
I think the only tasks I've gotten have been ACEMD3. Some validate, many show an error while computing. What could cause this on my end?? | |
ID: 58658 | |
"What could cause this on my end??" Do you overclock your GPU? What's the temperature of the GPU? | |
ID: 58659 | |
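For anyone who wants to answer those two questions with numbers, here is a minimal sketch. It assumes an NVIDIA card with `nvidia-smi` on the PATH (the thread doesn't say which GPU is involved), and the helper names `parse_gpu_rows` and `read_gpus` are made up for illustration. It only reports temperature and SM clock; what counts as "too hot" or "overclocked" depends on the card.

```python
import subprocess

# nvidia-smi query for GPU index, core temperature (deg C) and current SM clock (MHz).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,clocks.sm",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Parse the CSV text printed by the query above into a list of dicts.

    Assumes every field is a plain integer; some cards report "[N/A]" for a
    field, in which case int() raises ValueError.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        index, temp_c, sm_mhz = (field.strip() for field in line.split(","))
        rows.append({"index": int(index), "temp_c": int(temp_c), "sm_mhz": int(sm_mhz)})
    return rows


def read_gpus():
    """Run nvidia-smi once and return the parsed rows (hypothetical helper)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for gpu in read_gpus():
        print(f"GPU {gpu['index']}: {gpu['temp_c']} C, SM clock {gpu['sm_mhz']} MHz")
```

Running it a few times while a task is crunching gives a rough picture of whether the card is running hot or clocked above its stock speed.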
The acemd3 tasks have been stable for over a year. And one of mine has just crashed on a normally stable computer. Result 32884789: exit code 0, "Incorrect function", after 5 seconds. The acemd3 application normally has a usage lifetime of around a year before it needs a software licence renewal. Are we reaching that time again? Shouldn't be - it was last refreshed on 10 Nov 2021. | |
ID: 58660 | |
Just to piggyback on this thread with something else,,,, | |
ID: 58661 | |
"Just to piggyback on this thread with something else,,,," I would say no, that's not normal. I'm going to guess that you're also running the CPU at 100% utilization on some CPU project? That's probably the reason: you're starving the GPU of CPU resources. | |
ID: 58662 | |
"I think the only tasks I've gotten have been ACEMD3. Some validate, many show an error while computing. What could cause this on my end??" Looking at your error: "08:26:39 (15796): wrapper: running bin/acemd3.exe (--boinc --device 0)", "Detected memory leaks!" You are having issues with either a hot GPU, a hot CPU, or flaky memory. These are the typical issues that cause memory errors. | |
ID: 58663 | |
"I think the only tasks I've gotten have been ACEMD3. Some validate, many show an error while computing. What could cause this on my end??" You quoted the wrong issue. "Detected memory leaks" is ubiquitous in the Windows ACEMD3 app. Even successful runs show that message. It’s benign and not indicative of any problem. His real issue is here: | |
ID: 58664 | |
Thanks for the correction. I wasn't aware that memory leaks are a common problem on Windows hosts. | |
ID: 58665 | |
Ok, since Keith Myers quoted me, are you saying I have a different problem on my end or there is no problem on my end? | |
ID: 58667 | |
You had a problem with the task configuration. Server issue. | |
ID: 58668 | |
Thanks. Incidentally, I'm getting 'error while computing' issues on Rosetta@Home units, also. . . | |
ID: 58670 | |
Then something is wrong with your Python environment, I guess. Rosetta is also running Python tasks, I believe. | |
ID: 58671 | |
ERROR: C:\Users\admin\miniconda3\conda-bld\acemd3_1632736748005\work\src\mdsim\context.cpp line 318: Cannot use a restart file on a different device! | |
ID: 58697 | |
"ERROR: C:\Users\admin\miniconda3\conda-bld\acemd3_1632736748005\work\src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!" This is a well-known issue. You can’t restart the task on a different GPU. Basically, you can’t interrupt a running task at all. | |
ID: 58698 | |
"ERROR: C:\Users\admin\miniconda3\conda-bld\acemd3_1632736748005\work\src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!" I have had the same error. At the end of my computing day I suspend the task, shut down the client, and exit via the menu. The next morning I start up again and the task resumes on the same GPU, but half a day or a full day later it crashes. | |
ID: 58702 | |
"ERROR: C:\Users\admin\miniconda3\conda-bld\acemd3_1632736748005\work\src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!" You can see in your task log that it actually restarted on a different GPU; that's why it failed. From the log: "08:01:06 (9168): wrapper (7.9.26016): starting". It started on device 1, then the final restart happened on device 0. I would recommend not restarting your computer until the GPUGRID task finishes. I've even seen this issue happen after restarting on the same GPU following something like a driver update. Just don't interrupt the task at all. | |
ID: 58703 | |
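The check described in the last reply (did the task restart on a different device?) can be done mechanically on a task's stderr output. This is a rough sketch, not anything shipped with BOINC or GPUGRID: the function names are made up, the regular expression is modeled on the one wrapper line quoted earlier in this thread ("wrapper: running bin/acemd3.exe (--boinc --device 0)"), and the sample log is invented for illustration.

```python
import re

# Matches wrapper launch lines such as:
#   08:26:39 (15796): wrapper: running bin/acemd3.exe (--boinc --device 0)
# The pattern is an assumption based on that single quoted line.
DEVICE_RE = re.compile(r"wrapper: running \S+ \(.*?--device (\d+)\)")


def devices_used(stderr_text):
    """Return the device index from each wrapper launch line, in log order."""
    return [int(match.group(1)) for match in DEVICE_RE.finditer(stderr_text)]


def restarted_on_other_device(stderr_text):
    """True if the task was launched on more than one distinct device."""
    return len(set(devices_used(stderr_text))) > 1


# Invented sample: timestamps and process IDs are placeholders, not real output.
SAMPLE_LOG = """\
08:01:06 (9168): wrapper (7.9.26016): starting
08:01:06 (9168): wrapper: running bin/acemd3.exe (--boinc --device 1)
09:15:42 (4120): wrapper (7.9.26016): starting
09:15:42 (4120): wrapper: running bin/acemd3.exe (--boinc --device 0)
"""

if __name__ == "__main__":
    print("devices in launch order:", devices_used(SAMPLE_LOG))
    print("restarted on a different device:", restarted_on_other_device(SAMPLE_LOG))
```

Paste the stderr text from the task's result page in place of `SAMPLE_LOG`. As noted above, a restart on the same device can still fail, so a clean result here doesn't rule out an interrupted task.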