Message boards : Number crunching : Stalled WUs?

lohphat
Joined: 21 Jan 10
Posts: 44
Credit: 1,310,436,425
RAC: 7,232,862
Message 48295 - Posted: 7 Dec 2017 | 20:26:58 UTC

I have a WU which has been running for several days and seems to get stuck at a percentage complete.

Then I shut down BOINC and relaunch it, and the accumulated work disappears and the WU restarts from a much lower percentage.

Rinse, repeat.

The WU is crunching but with no percentage progress.
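
For anyone who wants to catch this kind of stall without watching the BOINC Manager, here is a minimal sketch that compares the "fraction done" of each task between two polls and reports tasks that have not advanced. It assumes `boinccmd --get_tasks` prints `name:` and `fraction done:` lines per task; the function names, the 30-minute interval, and the `min_gain` threshold are illustrative choices, not anything from the project.

```python
import subprocess
import time


def parse_fractions(get_tasks_output):
    """Map task name -> fraction done, from text shaped like `boinccmd --get_tasks` output.

    Assumes each task block has a line starting with "name: " followed later
    by a line starting with "fraction done: " (an assumption about the format).
    """
    fractions = {}
    current = None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.startswith("name: "):
            current = line[len("name: "):]
        elif line.startswith("fraction done: ") and current is not None:
            fractions[current] = float(line[len("fraction done: "):])
    return fractions


def stalled_tasks(previous, current, min_gain=0.0005):
    """Names present in both snapshots whose fraction done grew by less than min_gain."""
    return sorted(
        name
        for name, frac in current.items()
        if name in previous and frac - previous[name] < min_gain
    )


def poll(interval_s=1800):
    """Poll the local BOINC client forever and print tasks that made no progress."""
    previous = {}
    while True:
        out = subprocess.run(
            ["boinccmd", "--get_tasks"],
            capture_output=True, text=True, check=True,
        ).stdout
        current = parse_fractions(out)
        for name in stalled_tasks(previous, current):
            print(f"no progress in {interval_s} s: {name}")
        previous = current
        time.sleep(interval_s)
```

Calling `poll()` starts the loop. Suspended or queued tasks will also show no progress, so a report from this sketch is only a hint to go and look, not proof of a stall.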

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,672,242,755
RAC: 0
Message 48296 - Posted: 7 Dec 2017 | 20:31:45 UTC - in response to Message 48295.

Just abort it

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,384,993,979
RAC: 1,279,527
Message 48306 - Posted: 8 Dec 2017 | 16:52:13 UTC

I had transfers stalled for weeks on a system that had "no more work" on GPUGRID (board too slow). Aborting worked only until the next reboot; I only got rid of it by detaching and reattaching. It may have been stuck for months, as I rarely check that feature. Maybe this was the "1" task the server status shows as ready to be sent. I finally got rid of it a few minutes ago.

lohphat
Joined: 21 Jan 10
Posts: 44
Credit: 1,310,436,425
RAC: 7,232,862
Message 48309 - Posted: 8 Dec 2017 | 19:12:16 UTC

Two more work units stalled which I had to abort.

Methinks there's a systemic problem managing WUs.

lohphat
Joined: 21 Jan 10
Posts: 44
Credit: 1,310,436,425
RAC: 7,232,862
Message 48718 - Posted: 22 Jan 2018 | 18:41:37 UTC

It seems related to running Firefox (knowing it has h/w acceleration options) -- I'm still playing with the settings, but I can get GPUGRID work units to stall simply by opening up YouTube and playing a video.

The WU percentage stops increasing but it still shows active.

After 10 hours I exit BOINC and restart, and the hours worked drop back down to the point where it stalled.

So it's still happening, and I can recreate the failure consistently.

I've restarted the project to refresh resources.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,220,165,968
RAC: 1,511,951
Message 48723 - Posted: 22 Jan 2018 | 23:37:15 UTC - in response to Message 48718.

I suppose that this card is your GTX 980Ti. If it's overclocked, then you should reduce its clock speed by 100MHz to see if that makes it more stable. Your card reaches 78°C (172°F), which could be too much while using it for crunching and other purposes simultaneously. It is also recommended to dust off its fins with compressed air.
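
To check temperature and clocks while a WU and a video are running at the same time, a small sketch along these lines may help. It reads one CSV row per GPU from `nvidia-smi` using the `temperature.gpu`, `clocks.sm`, and `clocks.mem` query fields. The 75°C default limit and all function names are assumptions for illustration only; pick whatever limit suits your card.

```python
import subprocess

# One CSV row per GPU: temperature (°C), SM clock (MHz), memory clock (MHz).
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,clocks.sm,clocks.mem",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Turn 'temp, sm_clock, mem_clock' lines into a list of dicts of ints."""
    rows = []
    for line in csv_text.strip().splitlines():
        temp_c, sm_mhz, mem_mhz = (int(field) for field in line.split(","))
        rows.append({"temp_c": temp_c, "sm_mhz": sm_mhz, "mem_mhz": mem_mhz})
    return rows


def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def too_hot(rows, limit_c=75):
    """Indices of GPUs at or above limit_c (the default is an arbitrary example)."""
    return [i for i, row in enumerate(rows) if row["temp_c"] >= limit_c]


def read_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_rows(out)
```

For example, `too_hot(read_gpus())` lists the GPUs over the limit at that moment. The parser expects plain integers in every field, so a card that reports a value as not available would need extra handling.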

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 48725 - Posted: 23 Jan 2018 | 11:40:22 UTC - in response to Message 48718.

I had to roll back the driver to 385.41, which is the latest driver that doesn't have issues with the Firefox browser. The problem is discussed on the Nvidia forums; I was getting the driver "stopped responding and recovered" error while browsing with Firefox.

lohphat
Joined: 21 Jan 10
Posts: 44
Credit: 1,310,436,425
RAC: 7,232,862
Message 48732 - Posted: 24 Jan 2018 | 7:45:57 UTC - in response to Message 48725.

That did it.

However, I have in my notes that 385.41 caused WU errors with Einstein@Home -- I'm waiting for the project to issue me new WUs to verify.

But as for GPUGRID, it fixed the problem.

FWIW, it never crashed the driver or FFox -- it just caused GPUGRID WUs to stall but not error out.

Erico
Joined: 19 Apr 18
Posts: 1
Credit: 149,850
RAC: 0
Message 49355 - Posted: 24 Apr 2018 | 20:51:28 UTC

Just wanted to say I had this problem too. The project stalled three times from 0 to 50%, then I reduced my GTX 970's memory clock from 3800 MHz (which never had any issues running another project, Milkyway@Home) to the default of 3500 MHz and the last 50% didn't stall. I don't know if it's just a coincidence or not.

I stopped running GPUGRID because of this, so if the admins want to know what project it was, just look at the last one I turned over.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 49356 - Posted: 24 Apr 2018 | 21:23:26 UTC - in response to Message 49355.
Last modified: 24 Apr 2018 | 21:40:14 UTC

A lot of the work on this project is far more demanding of video RAM and GPUs than the projects you mention. Overclocking your VRAM was your problem, not this project's.

When you see "The simulation has become unstable. Terminating to avoid lock-up" it's almost always due to overclocking.

lukeu
Joined: 14 Oct 11
Posts: 31
Credit: 81,420,504
RAC: 4,750
Message 51356 - Posted: 18 Jan 2019 | 13:49:29 UTC - in response to Message 49356.
Last modified: 18 Jan 2019 | 13:49:54 UTC

I have seen similar stalls. Just now:

- Before suspending: 10 hr done, 10 min to go, yet 21% progress.
- After resuming: 2 hr done, 8.5 hr to go (also 21% progress)

Firefox was definitely playing YouTube, although I didn't investigate whether it was a cause.

I think my driver (391.35) just came via a Windows 10 update, so it could be quite old now.

Can anyone confirm if later drivers resolve this? I'd prefer to go forwards rather than roll back, if possible.

Dylan
Joined: 16 Jul 12
Posts: 98
Credit: 386,043,752
RAC: 0
Message 51362 - Posted: 21 Jan 2019 | 14:20:47 UTC - in response to Message 51356.
Last modified: 21 Jan 2019 | 14:21:16 UTC

I have a 1060 6GB with the 416.34 driver and don't have this issue, but I use Chrome, not Firefox, for YouTube. Another thing: I use an extension that makes YouTube stream in H.264 instead of VP9. I don't know how this compares to Firefox though, sorry.

Maybe test it with Chrome. Here is the same extension for Firefox; you should try it as well. Could help, could do nothing.

https://addons.mozilla.org/en-US/firefox/addon/h264ify/
