Author |
Message |
|
I sure wish BOINC allowed for inclusion of screen shots!
8/16/2022 11:27:13 PM | GPUGRID | Starting task 2-ADRIA_FS_RNAfmnrb_EFL6_2-1-2-RND3133_1
from the event log (I have saved screen shots to document this)
date time progress elapsed remaining
8/18 10:01 A 57.799% 1d 10:34:05 1d 01:14:17
8/18 10:56 A 57.799% 1d 11:29:30 1d 01:54:45
8/18 11:00 A 57.799% 1d 11:32:26 1d 01:56:54
...At this point, I suspended the WU for 1 hr 20 min, then restarted it
...the BOINC 'elapsed' timer mysteriously 'shortened' the actual run time by 12 hr, 31 min:
8/18 12:19 P 57.799% 23:01:10 1d 14:28:10
LUCKILY, it does seem this tactic of pausing and restarting will save the WU,
https://gpugrid.net/result.php?resultid=33001055
and it is slated to finish after 1d 15:58 of run time on a GTX 1660-Ti
YET this faulty WU wasted 12 1/2 hours of GPU time
PLEASE fix the problem(s) with ADRIA tasks
LLP, PhD, Prof Engr
____________
|
|
|
|
what does "Detected memory leaks!" mean??
https://gpugrid.net/result.php?resultid=33001055
Stderr output
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<stderr_txt>
23:27:19 (7328): wrapper (7.9.26016): starting
23:27:19 (7328): wrapper: running bin/acemd3.exe (--boinc --device 0)
Detected memory leaks!
Dumping objects ->
|
|
|
|
Read the FAQs
https://gpugrid.net/forum_thread.php?id=5272
____________
|
|
|
|
Thanks for your reply, but "Please ignore the message. ..." is a ludicrous answer by GPUGrid.
"Such messages are always present in Windows" I'm not sure what "Such messages" is supposed to mean, but with over 40 years of working with computers, a Masters and a PhD and being a registered PE (a licensed Professional Engineer) ... I have never seen any other "such messages"
"It's completely harmless.... not related to successful" well, there was no other error message in the task output, yet the task 'stalled' and wasted over 12 hours of GPU time on a not-that-bad GPU (NVidia GTX 1660 Ti)
If the GPUGrid project is willing to ask for and accept the in-kind donations of people's GPU time, then GPUGrid has an obligation to do what they can to resolve problematic tasks and code
____________
|
|
|
|
I'm not sure what you're on about. you've only completed this one single task. and it was completed successfully. and you received credit for it.
what do you mean by "wasted over 12 hours of GPU time"? these tasks are VERY long running. 12hrs seems normal for that relatively weak GPU. and the ACEMD3 tasks can vary in length depending on what it's doing. there was a time when they only took 20mins, and a time where it took 24hrs. just depends on the work
so what's the problem exactly? it looks like you're complaining about a valid/successful task. i see nothing wrong with this task.
____________
|
|
|
|
please read
Message 59133 - Posted: 19 Aug 2022 | 6:18:01 UTC
Quite clearly, you have not.
you've only completed this one single task
Wow.
So, I have a total "Credit: 11,845,453" by having completed one single task.
Amazing.
Indeed, I used to give high preference to GPUGrid among GPU projects because I felt its scientific merits deserved it, despite this project giving far, far less credit per GPU hour than a number of other projects (e.g., PrimeGrid, SRBase)
what do you mean by "wasted over 12 hours of GPU time"?
please read
Message 59133 - Posted: 19 Aug 2022 | 6:18:01 UTC
the first post in this thread.
LLP, PhD, Prof. Engr.
____________
|
|
|
|
date time progress elapsed remaining
8/18 10:01 A 57.799% 1d 10:34:05 1d 01:14:17
8/18 10:56 A 57.799% 1d 11:29:30 1d 01:54:45
8/18 11:00 A 57.799% 1d 11:32:26 1d 01:56:54
...At this point, I suspended the WU for 1 hr 20 min, then restarted it
...the BOINC 'elapsed' timer mysteriously 'shortened' the actual run time by 12 hr, 31 min:
8/18 12:19 P 57.799% 23:01:10 1d 14:28:10
The above is from the event log (I have saved screen shots to document this)
Thus, this WU was 'hung up' for who knows how long.
At the very minimum, this WU wasted over 12 1/2 hours of GPU time
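The 12 hr 31 min figure can be checked from the two 'elapsed' readings on either side of the suspend. A minimal sketch in Python; parse_elapsed is a hypothetical helper written for this post (not part of BOINC), and it assumes the "[Nd ]HH:MM:SS" format shown in the table above:

```python
from datetime import timedelta


def parse_elapsed(text: str) -> timedelta:
    """Parse a duration such as '1d 11:32:26' or '23:01:10' (assumed format)."""
    parts = text.split()
    days = 0
    if len(parts) == 2:
        # Leading '1d' style day count, as displayed in the readings above.
        days = int(parts[0].rstrip("d"))
    hours, minutes, seconds = (int(p) for p in parts[-1].split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


before = parse_elapsed("1d 11:32:26")  # 11:00 A reading, just before the suspend
after = parse_elapsed("23:01:10")      # 12:19 P reading, after the restart
lost = before - after                  # how far the 'elapsed' timer moved back
print(lost)  # 12:31:16
```

This only measures how far the displayed timer moved back; it says nothing by itself about why.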
GPUGrid admins:
1 PLEASE fix all ongoing problem(s) with GPUGrid tasks
2 PLEASE use the Notices tab in BOINC Manager to communicate info (or direct link to such info) regarding needed patches, mods, etc for GPUGrid WUs to run properly
Thank you
____________
|
|
|
|
It's clear that the spamming of your credentials hasn't aided in your critical thinking ability. It should be very clear that I was referencing RECENTLY completed tasks. Or rather I should say the singular "task". Not sure how you can extrapolate ONE issue on ONE system to be an endemic problem with all ADRIA tasks. Wow.
since this has not come up as a widespread issue, it's much more likely to be an issue with your system and nothing wrong with the tasks. I have completed thousands of these tasks with this application and earned billions of credits from them. and not once has this happened.
ACEMD3 tasks (of various campaigns, ADRIA or otherwise), are particularly stressful for the GPU as compared to many other projects, and stress areas of the GPU that other projects might not. and it's not uncommon to have driver crashes in Windows as a result of that stress. when the driver crashes in windows and tries to recover, I could see that hanging up a GPU task in BOINC. whenever a task is suspended and resumed in BOINC, that triggers it to restart from the last checkpoint (which is why the timer reset).
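if the worry is a task that hangs like this, one way to catch it early is to compare successive progress readings. a rough sketch in Python, assuming the (time, progress) pairs are noted down by hand or by a script of your own; find_stall and the 30 minute threshold are hypothetical, and nothing here is a BOINC API:

```python
from datetime import datetime, timedelta


def find_stall(samples, threshold=timedelta(minutes=30)):
    """Return (start, end) of the first span in which progress did not change
    for at least `threshold`, or None if no such span exists."""
    start_time, start_progress = samples[0]
    for when, progress in samples[1:]:
        if progress != start_progress:
            # Progress moved, so restart the window from this sample.
            start_time, start_progress = when, progress
        elif when - start_time >= threshold:
            return start_time, when
    return None


# Readings taken from the first post in this thread.
samples = [
    (datetime(2022, 8, 18, 10, 1), 57.799),
    (datetime(2022, 8, 18, 10, 56), 57.799),
    (datetime(2022, 8, 18, 11, 0), 57.799),
]
stall = find_stall(samples)  # flags the 10:01 to 10:56 span
```

a flagged span only says the progress figure did not move; it does not tell you whether the cause was a driver reset, a thermal problem, or the task itself.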
you should update your drivers (looks like they aren't recent), update Windows, and verify your system is clean of dust or other issues that might cause thermal issues like bad airflow to the GPU. and finally, it could be a faulty GPU or power problem or some other hardware issue with the system.
try less condescension and outrage, and more big picture thinking and problem solving that Engineers are known for.
NASA Engineer.
____________
|
|
|