Message boards : Number crunching : Three Computation Errors in 40 mins

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9388 - Posted: 6 May 2009 | 19:27:52 UTC
Last modified: 6 May 2009 | 20:00:07 UTC

Just had three WUs go bang in 40 mins, which makes four in the last 24 hrs. Something strange is going on. I am alive to the fact that it could be hardware; however, it's a new(ish) card, only 2 months old, and it has produced flawless WUs for three weeks or more since the last failure. I was not at the PC this time; I was having dinner, came back, and bang, three WUs totalled.

There is a CUDA error shown in the results of tonight's 3 failures: the same one for two of them, and a different CUDA error for the third. Interestingly, one of the wingmen who also had hassles with these also had a 9800GTX. These are not single failures; others have totalled them as well, which lends weight to the possibility that something within the WU is causing it. All a bit strange .....

I'd be grateful if someone could take a look and see if there is anything obvious in the results file before I download another and try again. I am wary now of downloading more and crashing them until I can nail this.

My task page
http://www.gpugrid.net/results.php?userid=15789

I'm going to nip off for ten minutes to reboot, have a look, etc. Back soon.

[Edit] Link was wrong, sorry. It's the correct link now. I have tested out the card and all seems ok; hardware tests elsewhere seem ok, temps normal, etc. I'll try to download another one and see what happens.[/edit]

Regards
Zy

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9390 - Posted: 6 May 2009 | 19:53:49 UTC - in response to Message 9388.

Started up this one - see what happens

http://www.gpugrid.net/result.php?resultid=636960

Regards
Zy

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9526 - Posted: 9 May 2009 | 11:35:19 UTC - in response to Message 9390.
Last modified: 9 May 2009 | 12:24:48 UTC

EDIT: forget what I said.. saw the separate thread on this afterwards.

Overall you have quite a few errors. A strange coincidence: recently my 9800GTX+ also errored out on 3 WUs, which it otherwise doesn't do. I think you and I should lower our clocks a bit or increase the fan speed and see what we get. I know that here the temperatures have increased quite a bit compared to a few weeks ago.

Regarding the possible G92 issue:
- almost all of the tasks have been finished by G200-class cards
- all of them have errored out on G92
- BUT: there were also 2 errors on G200 cards

And it's remarkable that not many G92 or older chips were involved at all, so I suppose the numbers are too small to allow statistically relevant statements (i.e. "all of these tasks fail on G92").
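
To put a rough number on "too small", one option is Fisher's exact test on a 2x2 table of chip class against outcome. The sketch below is only an illustration: the function name is made up for this post, and the counts in the example call are hypothetical placeholders, not the real tallies for these WUs.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are chip classes (e.g. G92, G200); columns are (errored, finished).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x errors in row 1, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Add up every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# HYPOTHETICAL counts, for illustration only:
# G92: 2 errored, 0 finished; G200: 2 errored, 10 finished.
p = fisher_exact_two_sided(2, 0, 2, 10)
print(f"p = {p:.3f}")
```

With those made-up counts the p-value comes out at about 0.066, so even "every G92 task failed" would not clear the usual 0.05 threshold when only two G92 results are in the sample.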

MrS
____________
Scanning for our furry friends since Jan 2002
