tzpmrz
Joined: 8 May 10 Posts: 5 Credit: 140,025,313 RAC: 0
I've got several of these in a row now. They usually say "long runs (8-12 hours on fastest card)", but they run for 55+ hours or so, longer than the report deadline. These jobs make my PC so slow that I can barely get BOINC Manager up to stop the processing. If I restart processing they run fine for about 30 minutes, and then the PC is in the same state again. It seems that when the PC is very sluggish, the GPUGRID job makes no progress toward completion. How can I cancel one of these jobs myself?
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
If you have to terminate a task: in BOINC Manager (Advanced View), go to the Tasks tab, select the task and click Abort.
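When the desktop is too sluggish to bring up BOINC Manager at all, the `boinccmd` command-line tool that ships with the BOINC client can abort a task instead. The sketch below only builds (and optionally runs) that command. The `GPUGRID_URL` constant and the helper names are hypothetical; use the exact project URL the client shows, and note that `boinccmd` may need to be run from the BOINC data directory (or be given `--passwd`) to authenticate.

```python
import subprocess

# Hypothetical constant: use the exact project URL shown in BOINC Manager's
# Projects tab (or by `boinccmd --get_project_status`).
GPUGRID_URL = "http://www.gpugrid.net/"


def abort_command(task_name: str, project_url: str = GPUGRID_URL) -> list[str]:
    """Build the boinccmd invocation that aborts a single task."""
    return ["boinccmd", "--task", project_url, task_name, "abort"]


def abort_task(task_name: str, project_url: str = GPUGRID_URL,
               dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it; returns the argument list."""
    cmd = abort_command(task_name, project_url)
    if dry_run:
        print(" ".join(cmd))
    else:
        # check=True raises CalledProcessError if boinccmd exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Task names are listed by `boinccmd --get_tasks`; this one is only an
    # example name taken from elsewhere in this thread.
    abort_task("p5-IBUCH_1_wtEGFR_110325-1-10-RND6118_2")
```

Defaulting to a dry run keeps the script from aborting anything until the printed command has been checked.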
I'm guessing this task was OK; it finished on a GTX 470 in about 10 h.
Your task ran for about 2.4 days, so your system was off for more than 2.6 days during the 5-day limit:
Sent 14 Mar 2011 11:32:13 UTC
Received 20 Mar 2011 0:20:53 UTC
Report deadline 19 Mar 2011 11:32:13 UTC
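The arithmetic behind that estimate can be checked directly from the three timestamps. This is a minimal sketch; the 2.4-day run time is the approximate figure quoted above, not something derived from the timestamps.

```python
from datetime import datetime, timedelta, timezone

FMT = "%d %b %Y %H:%M:%S"  # matches the task page format, e.g. "14 Mar 2011 11:32:13"


def utc(stamp: str) -> datetime:
    """Parse a task-page timestamp as a UTC datetime."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


sent = utc("14 Mar 2011 11:32:13")
deadline = utc("19 Mar 2011 11:32:13")
received = utc("20 Mar 2011 0:20:53")

window = deadline - sent          # time allowed to return the result
late_by = received - deadline     # how far past the deadline it was reported
run_days = 2.4                    # approximate run time quoted above

window_days = window.total_seconds() / 86400
# The task had not finished inside the window, so at most run_days of it was
# spent crunching; the rest of the window was spent stopped or switched off.
off_days = window_days - run_days

print(f"window: {window}, late by: {late_by}, off for at least {off_days:.1f} days")
```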
This task was started and stopped 18 times. That's far too many, and it no doubt interfered with checkpointing (after a restart, the task goes back to the last checkpoint). The first checkpoint (at about 5%, I think) might not have been reached before the task was stopped.
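A toy model makes the checkpoint point concrete. It assumes, hypothetically, that checkpoints are written at fixed intervals of crunching time (every 5% of a 10 h task here), that each session is measured in hours of actual crunching, and that every stop discards the work done since the last checkpoint. Real ACEMD checkpoint behaviour may differ.

```python
import math


def simulate(task_hours: float, checkpoint_hours: float,
             sessions: list[float]) -> tuple[float, float, bool]:
    """Return (hours of saved progress, hours of work lost, finished?).

    Each session resumes from the last checkpoint; when it stops, progress
    falls back to the most recent checkpoint it reached.
    """
    saved = 0.0
    lost = 0.0
    for run in sessions:
        reached = saved + run
        if reached >= task_hours:
            return task_hours, lost, True
        # Latest checkpoint reached by the end of this session
        # (equal to `saved` if no new checkpoint was written).
        checkpoint = math.floor(reached / checkpoint_hours) * checkpoint_hours
        lost += reached - checkpoint
        saved = checkpoint
    return saved, lost, False


# A 10 h task with a checkpoint every 5% (0.5 h), stopped 18 times after 0.4 h:
print(simulate(10.0, 0.5, [0.4] * 18))
# The same task run in five 2 h sessions finishes with nothing lost:
print(simulate(10.0, 0.5, [2.0] * 5))
```

In the first case every session ends before the first checkpoint, so the task never advances no matter how many hours it accumulates.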
Do you have Leave Applications In Memory (LAIM) checked? All GPUGrid crunchers should have it checked.
If that's not it, I guess you are restarting, or shutting down and starting up, 3 or 4 times a day.
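LAIM is a checkbox in BOINC Manager's computing preferences. For reference, this sketch builds an equivalent local override, assuming the client reads `<leave_apps_in_memory>` from `global_prefs_override.xml` in its data directory. The file is written wholesale here, so any other local overrides it already holds would be lost; the helper name is hypothetical. Depending on the client version, GPU tasks may still be removed from memory when suspended.

```python
import xml.etree.ElementTree as ET


def laim_override(enabled: bool = True) -> str:
    """Return a global_prefs_override.xml body that sets LAIM on or off."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save this as global_prefs_override.xml in the BOINC data directory
    # (the path varies by install), then apply it with:
    #   boinccmd --read_global_prefs_override
    print(laim_override())
```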
tzpmrz
This is true; I do stop the jobs, since the PC is not dedicated solely to the project. It's my home PC. I won't stop processing for, say, web browsing, but I have to stop it to play video games. It is what it is.
The problem with the jobs I'm complaining about is that I can't even browse the web. They cripple the machine, so they must be leaking video memory or something. There are GPUGRID jobs running now and I can still browse; 99% of the jobs don't cripple the machine.
tzpmrz
The change in my system is that I have enabled SLI.
It turns out that if I only run one GPUGRID job at a time, I don't get the crippling effect. I can suspend either one of the jobs and the PC becomes usable again. I think the system is thrashing video memory when just the right mix of GPUGRID jobs runs together.
skgiven (Volunteer moderator, Volunteer tester)
It's recommended that crunchers disable SLI when crunching.
skgiven (Volunteer moderator, Volunteer tester)
A while back I had two GT240s in one i7-860 system running Windows Server 2003 x64. I noticed a few tasks that threw an error message and then restarted from the beginning. It only happened now and again, but over time there were quite a lot of such failures. Sometimes the tasks ran for many hours before resetting (5 h, 12 h, 15 h). Each time, a Windows error message popped up. In the past we could restart the system and these tasks would resume from the last checkpoint; this no longer works. When I used only one GPU, the problem seemed to go away.
I have since replaced the two GT240s with one GTX260-216 and moved to the 6.13 application to crunch the long tasks. Until now this had worked without any problem, but this morning the problem reappeared with an IBUCH task.
27/3/11
Faulting application acemdlong_6.13_windows_intelx86__cuda31.exe, version 0.0.0.0, faulting module acemdlong_6.13_windows_intelx86__cuda31.exe, version 0.0.0.0, fault address 0x000025e4.
19/3/11
Faulting application acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, faulting module acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, fault address 0x000026e4.
17/3/11 (twice)
Faulting application acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, faulting module acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, fault address 0x000026e4.
16/3/11
Faulting application acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, faulting module acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, fault address 0x000026e4.
7/3/11
Faulting application acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, faulting module acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, fault address 0x000026e4.
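With more failures piling up, tallying the Application Error lines by application and fault address shows at a glance whether each app version keeps failing at the same spot. This is a minimal sketch over the six failures listed above (17/3/11 counted twice); the regular expression is written for this exact message format and is an assumption about how other entries look.

```python
import re
from collections import Counter

# Matches the Event Log "Application Error" (event 1000) description lines above.
FAULT_RE = re.compile(
    r"Faulting application (?P<app>[^,]+),.*?fault address (?P<addr>0x[0-9a-fA-F]+)"
)


def tally_faults(lines):
    """Count (application, fault address) pairs in event-log text lines."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match["app"], match["addr"].lower())] += 1
    return counts


LONG = ("Faulting application acemdlong_6.13_windows_intelx86__cuda31.exe, version 0.0.0.0, "
        "faulting module acemdlong_6.13_windows_intelx86__cuda31.exe, version 0.0.0.0, "
        "fault address 0x000025e4.")
STD = ("Faulting application acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, "
       "faulting module acemd2_6.12_windows_intelx86__cuda, version 0.0.0.0, "
       "fault address 0x000026e4.")

# One entry per failure listed above: 27/3, 19/3, 17/3 (x2), 16/3, 7/3.
events = [LONG, STD, STD, STD, STD, STD]

for (app, addr), n in tally_faults(events).most_common():
    print(f"{n} x {app} @ {addr}")
```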
skgiven (Volunteer moderator, Volunteer tester)
Got another IBUCH failure: p5-IBUCH_1_wtEGFR_110325-1-10-RND6118_2. This one did not restart, so perhaps it is a different problem.
skgiven (Volunteer moderator, Volunteer tester)
Got another error message just after a reboot. This seems to be the common theme.
I opened Task Manager and killed the process that was trying to report the error, and this prevented the task from failing. This is clearly an MS security-handling issue and, again, nothing to do with the CUDA app.
skgiven (Volunteer moderator, Volunteer tester)
... and again, this time on restarting after tasks were suspended. There was nothing I could close in Task Manager to stop the task error, and closing the error message with [X] killed the task! This is down to the way Windows interprets the Application Popup.
Event Type: Information
Event Source: Application Popup
Event Category: None
Event ID: 26
Date: 08/04/2011
Time: 19:00:09
User: N/A
Computer: S
Description:
Application popup: acemdlong_6.13_windows_intelx86__cuda31.exe - Application Error : The exception unknown software exception (0x40000015) occurred in the application at location 0x004025e4.
Click on OK to terminate the program
Click on CANCEL to debug the program
Event Type: Error
Event Source: Application Error
Event Category: (100)
Event ID: 1000
Date: 08/04/2011
Time: 19:00:09
User: N/A
Computer: S
Description:
Faulting application acemdlong_6.13_windows_intelx86__cuda31.exe, version 0.0.0.0, faulting module acemdlong_6.13_windows_intelx86__cuda31.exe, version 0.0.0.0, fault address 0x000025e4.
It failed after 5 h for no reason other than that the operating system decided to terminate it.
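The popup above reports location 0x004025e4, while the Application Error record reports fault address 0x000025e4. A quick check, assuming the event-1000 "fault address" is an offset within the module and that the executable was loaded at 0x00400000 (the usual default image base for 32-bit Windows executables), shows the two point at the same place. The helper name is hypothetical.

```python
# Assumption: the executable was loaded at the usual default image base
# for 32-bit Windows executables.
DEFAULT_IMAGE_BASE = 0x00400000


def module_offset(absolute_address: int, image_base: int = DEFAULT_IMAGE_BASE) -> int:
    """Convert an absolute address into an offset within the module."""
    if absolute_address < image_base:
        raise ValueError("address lies below the assumed image base")
    return absolute_address - image_base


popup_address = 0x004025e4      # from the Application Popup (event 26)
event_log_offset = 0x000025e4   # from the Application Error (event 1000)

print(hex(module_offset(popup_address)), module_offset(popup_address) == event_log_offset)
```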
skgiven (Volunteer moderator, Volunteer tester)
Windows Server 2003 x64 with a GTX260. A long IBUCH task had been running for 8 h 50 min, with about 4 h to go, but the percentage complete is zero and I get this acemdlong_6.13_windows_intelx86__cuda21.exe error:
The exception unknown software exception (0x40000005) occurred in the application at location 0x004025e4.
Click on OK to terminate the program
Click on CANCEL to debug the program