
Message boards : Number crunching : GPUGrid causes blue screen.

Redirect Left
Joined: 8 Dec 12
Posts: 23
Credit: 181,940,893
RAC: 10
Message 40983 - Posted: 29 Apr 2015 | 18:12:42 UTC
Last modified: 29 Apr 2015 | 18:13:06 UTC

Hello. Today GPUGrid has just started to crash my system. Basically the system loads fine until BOINC begins and the tasks start; then the computer rapidly descends into blue screen city, with the error attributed to the file nvlddmkm.sys.

I went into safe mode and disabled BOINC on start up, then ran all tasks other than the ones from GPUGrid, and they all completed fine. Is it possible the current tasks (or one of them) are a little buggy? I've managed to max out both of my cards on games, and the cards themselves respond and work fine when either or both are maxed out, except when they're being used by GPUGrid.

I was doing tasks 14147186 & 14146656, which are now aborted (for perhaps obvious reasons). I assume someone in the know can track down those tasks and identify any potential bum ups?

I've been running GPUGrid (and BOINC and lots of other projects) for years now, this is the first instance of this.

Redirect Left
Message 40985 - Posted: 29 Apr 2015 | 20:10:27 UTC - in response to Message 40983.


I was doing tasks 14147186 & 14146656

This is incorrect. Those are the two queued tasks I cancelled; the ones in progress were 14146005 & 14143996.

MichaelMac
Joined: 2 Sep 12
Posts: 16
Credit: 609,890,687
RAC: 0
Message 40986 - Posted: 29 Apr 2015 | 21:32:42 UTC - in response to Message 40985.

Typically this kind of problem is related to temp or power. These tasks can really push a GPU. When I first started, I had power and temperature problems. It's hard to tell absent any details on your system.

If your temp is fine, and you're feeding it sufficient power...then you might completely uninstall and reinstall your driver. I had to do that myself after a recent update.

-MichaelMac

Redirect Left
Message 40987 - Posted: 29 Apr 2015 | 23:27:59 UTC

Well, as part of my debugging, I got some games to run on both cards and max them out at 100%. Then I loaded two games and forced them onto the two separate cards, giving both of them an 80 to 100% average load, and nothing bad happened. So I assume power wise and card wise, everything is going fine.

As for the system:

Windows 7
Intel i5-4690 (4x 3.5GHz)
550W PSU
2X GFX Cards, nVidia GeForce GTX650Ti & GT 640 (both 2GB variants)
8GB system RAM

As mentioned before, I've been running GPUGrid on this system for a while with no prior problems. The error occurs at run time, as soon as the BOINC manager loads, so the temperatures never really get above 35°C anyway.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Message 40988 - Posted: 30 Apr 2015 | 1:11:40 UTC - in response to Message 40987.

Your problem could be caused by insufficient power, so I would check all power connectors (MB+CPU+GPU) and then take one of the GPUs out, and test the system with GPUGrid. I would do a file system check and then I would try these steps.
BTW the GPUGrid app uses different parts of the GPU than a game does, so you couldn't validate your system for the GPUGrid app by testing it with games.
What is the exact type (and manufacturer) of your PSU? What is its efficiency rating (80+, Bronze, Silver, Gold, Platinum)?

Redirect Left
Message 40989 - Posted: 30 Apr 2015 | 1:38:40 UTC - in response to Message 40988.

PSU rating is 80% efficiency.
All stress tests I have done find no problem, the PSU provided the power needed, but when GPUGrid starts, it all goes to pot.

Anyone got any idea what exactly the problem is? I'd love to keep supporting the cause.

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 40992 - Posted: 30 Apr 2015 | 9:39:06 UTC - in response to Message 40989.

Could be several issues. It can be the driver (I never use the latest driver), or it can be the WU. I see it is SDOERs where you have the problems, so perhaps that WU and your specific set-up don't play nice together. Or it could be another program in the background that causes this after running well for a long time. I had that once and it took months before I found it out. PCAngel was causing me the blue screens as soon as BOINC started running.

You can try to take one card out and let it run with only one card. Let a WU finish. Then test the other card.
If it is the SDOERs, wait to get another WU and see what happens.
If it runs fine with one card, put the second one back in and test again.
If no joy, then revert to another stable driver, e.g. 347.88.

If still no luck, then post back here and we can have a look again.
Hope this helps a bit.
____________
Greetings from TJ

MichaelMac
Message 40994 - Posted: 30 Apr 2015 | 11:44:22 UTC

I notice that you're running version 350.12 of the NVidia driver. I just finished debugging really strange errors using that version of the driver. I couldn't finish a wu, but the errors (when I got them) were unknown. Granted, I was running on Win XP, but it may be related to your problem.

Get DDU at http://www.guru3d.com/files-details/display-driver-uninstaller-download.html, this is a "complete" uninstaller.

I uninstalled the driver using Windows.

I rebooted in Safe mode.

I ran DDU and did an uninstall.

I then installed version 344.75 of the driver.

Everything worked great again after that.

-MichaelMac



Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 41015 - Posted: 3 May 2015 | 3:45:38 UTC
Last modified: 3 May 2015 | 3:49:22 UTC

.. hmm... Did you have a power outage recently?
Because there is a bug where ... if GPUGrid tasks are stopped abruptly from a power outage, then when they restart, they TDR infinitely until they BSOD.

Maybe you could try restarting with GPU computing suspended, then aborting problematic tasks, then seeing if new ones produce the same problem or not?

Or, if you already did this, can you tell us if the problems happened just on those 2 tasks?
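
For anyone who would rather script the suspend-then-abort sequence suggested above, here is a rough sketch. It assumes the BOINC client is running and reachable by `boinccmd`, that the project URL matches what `boinccmd --get_project_status` reports, and that task names are taken from `boinccmd --get_tasks`. The task name below is a made-up placeholder; as far as I know the numeric IDs quoted in this thread are website IDs, not the names `boinccmd` expects. By default it only prints the commands.

```python
import subprocess

# Assumption: the project URL as registered in your BOINC client.
PROJECT_URL = "https://www.gpugrid.net/"


def recovery_commands(task_names, project_url=PROJECT_URL):
    """Build boinccmd calls: pause GPU work, abort the given tasks, resume GPU work."""
    cmds = [["boinccmd", "--set_gpu_mode", "never"]]
    for name in task_names:
        cmds.append(["boinccmd", "--task", project_url, name, "abort"])
    cmds.append(["boinccmd", "--set_gpu_mode", "auto"])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical task name, for illustration only.
    run(recovery_commands(["HYPOTHETICAL_TASK_NAME_0"]))
```

Setting GPU mode to "never" before aborting is the point of the exercise: the bad tasks never get a chance to touch the GPU while you clear them out.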

Redirect Left
Message 41037 - Posted: 5 May 2015 | 20:07:14 UTC - in response to Message 41015.
Last modified: 5 May 2015 | 20:07:45 UTC


Because there is a bug where ... if GPUGrid tasks are stopped abruptly from a power outage, then when they restart, they TDR infinitely until they BSOD.

Hm. That might be it then! My dog knocked the PC's power cable out before this occurred. I thought nothing of it at the time.

Perhaps that is the cause? It's all working as normal now, on all 3 of my machines.

If that is the cause, then nasty little glitch there. Hope it can be resolved.

Wdethomas
Joined: 6 Feb 10
Posts: 38
Credit: 274,204,838
RAC: 0
Message 41072 - Posted: 13 May 2015 | 1:43:39 UTC - in response to Message 41037.

I had the same problem. Was running on Windows 7. I installed Windows 8.1 Pro and it's running OK so far. I think Windows needed a refresh. So far so good.

Wdethomas
Message 41073 - Posted: 13 May 2015 | 13:51:38 UTC - in response to Message 41072.
Last modified: 13 May 2015 | 14:02:44 UTC

Nope. Problem is back, but without blue screen. Just freezes and UPS starts beeping because of overload. Connected directly to the wall and same problem, it freezes. Always when the work units are reaching the 80% or above mark. Only happens with GPUGRID work units running.

Using driver 350.12

Any ideas?

TJ
Message 41074 - Posted: 13 May 2015 | 14:50:36 UTC - in response to Message 41073.

Temperature of the room and temperature of the GPU(s)?
____________
Greetings from TJ

MichaelMac
Message 41075 - Posted: 13 May 2015 | 16:45:46 UTC

yeah...I had similar problems with driver 350.12. I went back a revision.

I describe in my post, above, what I did.

-MichaelMac

Wdethomas
Message 41077 - Posted: 13 May 2015 | 18:43:03 UTC - in response to Message 41075.

Okay. I am doing that now. Thanks

Don't think heat is an issue. Using GPU-Z and all okay. Remember, it was doing it at 80% and above on the work unit progress.

Will check and report back.

Jacob Klein
Message 41078 - Posted: 13 May 2015 | 19:22:20 UTC

I believe the problems the original poster, "Redirect Left", had, were caused by GPUGrid's inability to properly start tasks after a power interruption.

I believe that other problems that are unrelated to a power interruption, should probably be in their own thread, maybe.

TJ
Message 41079 - Posted: 13 May 2015 | 23:53:44 UTC - in response to Message 41078.

I believe the problems the original poster, "Redirect Left", had, were caused by GPUGrid's inability to properly start tasks after a power interruption.

I believe that other problems that are unrelated to a power interruption, should probably be in their own thread, maybe.


If I read the first post then I do not read anything about a power interruption.
____________
Greetings from TJ

Wdethomas
Message 41080 - Posted: 14 May 2015 | 2:25:18 UTC - in response to Message 41079.

I did have some power interruptions or spikes and the computer does shut down even with the UPS. But I don't think it's power related to the house because all electric clocks are still working okay, not blinking. It's like if the computer spiked and then the UPS beeps as if something had happened and then you get the blue screen. That was with windows 7.

With Windows 8.1 Pro, I heard the UPS beeping, it had an overload emblem on it, and the screen had frozen. I rebooted and started up BOINC again, and as soon as the work units started, the freeze and the UPS beeping came back. So I uninstalled BOINC and changed the driver to the one I mentioned. I am waiting to see whether it does it again when the work unit reaches 80% or more.

I am using an MSI X87 gamer MB and six GPUs with X1 USB powered risers. I wasn't having any trouble until yesterday. If I use milkyway@home or seti@home, this never happens, only with GPUGrid.

Will let you know later on what happens.

Thanks for the responses!!

Redirect Left
Message 41081 - Posted: 14 May 2015 | 4:16:54 UTC - in response to Message 41078.

I believe the problems the original poster, "Redirect Left", had, were caused by GPUGrid's inability to properly start tasks after a power interruption

This is correct. Reinstalling or downgrading drivers didn't resolve the issue. The only resolution I found was to start in safe mode and disable the bad tasks, because on a normal start BOINC and its tasks loaded before I had the chance to terminate them and prevent the bluescreen.

I'd suggest the problems I have seen after my reply are unrelated.

Jacob Klein
Message 41082 - Posted: 14 May 2015 | 4:56:59 UTC - in response to Message 41079.

I believe the problems the original poster, "Redirect Left", had, were caused by GPUGrid's inability to properly start tasks after a power interruption.

I believe that other problems that are unrelated to a power interruption, should probably be in their own thread, maybe.


If I read the first post then I do not read anything about a power interruption.


You'll have to read more than the first post, I'm afraid.
https://www.gpugrid.net/forum_thread.php?id=4082&nowrap=true#41037

Jacob Klein
Message 41083 - Posted: 14 May 2015 | 4:57:41 UTC
Last modified: 14 May 2015 | 5:09:43 UTC

MJH:
Any chance you guys could actually investigate and fix this problem?

First reported 2 months ago.
https://www.gpugrid.net/forum_thread.php?id=4038&nowrap=true#40492

TJ
Message 41084 - Posted: 14 May 2015 | 8:35:47 UTC - in response to Message 41082.

I believe the problems the original poster, "Redirect Left", had, were caused by GPUGrid's inability to properly start tasks after a power interruption.

I believe that other problems that are unrelated to a power interruption, should probably be in their own thread, maybe.


If I read the first post then I do not read anything about a power interruption.


You'll have to read more than the first post, I'm afraid.
https://www.gpugrid.net/forum_thread.php?id=4082&nowrap=true#41037

You are right Jacob, my bad.
____________
Greetings from TJ

Wdethomas
Message 41085 - Posted: 14 May 2015 | 12:14:25 UTC - in response to Message 41084.

Okay, got several over 80%. One completed at 100%. Could really be the power spikes on the line and the UPS not reacting to them the way it should. I'll see and report.

I know for a fact that before having the UPS, if the lights went out and you were using BOINC with GPUGRID, you would lose all the work units. After the computer rebooted, the screen would flicker, I would get a video driver recovery message, and all work units would error out. That started MONTHS ago and I also reported that. Nothing has been done to fix that, or maybe it just can't be fixed. "Leave applications in memory" does not help to resolve this.

Thanks

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 41086 - Posted: 14 May 2015 | 13:58:41 UTC - in response to Message 41085.
Last modified: 14 May 2015 | 13:59:55 UTC

Okay, got several over 80%. One completed at 100%. Could really be the power spikes on the line and the UPS not reacting to them the way it should. I'll see and report.

Some UPSes do not handle the high-efficiency (e.g., 80 Plus Gold or Platinum) power supplies well. That is because of the Power Factor Correction (PFC) that these supplies use. Most UPSes output a stepped sinewave, which is only an approximation to a true sinewave. That can cause problems with these power supplies, and as a result you have to de-rate the maximum power to about 2/3 of the rated figure. So if it is rated at 450 watts, you should not try to get more than about 300 watts from it, or you run into trouble.
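
To put numbers on that rule of thumb, here is a small sketch of the arithmetic. The 2/3 factor is just the approximate figure quoted in this post, not a manufacturer specification, and the function names and the sample load values are made up for illustration.

```python
def derated_watts(rated_watts, factor=2 / 3):
    """Usable power after de-rating to the given fraction of the rated figure."""
    return rated_watts * factor


def overloaded(load_watts, rated_watts, factor=2 / 3):
    """True if the load exceeds the de-rated limit."""
    return load_watts > derated_watts(rated_watts, factor)


if __name__ == "__main__":
    limit = derated_watts(450)
    print(f"De-rated limit for a 450 W rating: about {round(limit)} W")
    # Hypothetical loads, for illustration only.
    for load in (250, 350):
        status = "over" if overloaded(load, 450) else "within"
        print(f"{load} W load is {status} the de-rated limit")
```

Plug in your own rated figure and measured load; anything that comes out over the limit is a candidate for the kind of overload beeping described earlier in the thread.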

FWIW, I now use the CyberPower "Pure sine wave" UPS, which avoids the problem.

Wdethomas
Message 41087 - Posted: 14 May 2015 | 17:16:30 UTC - in response to Message 41086.

Okay. I have the Xfinity 1400VA. It uses modified sine wave. Got a Corsair AX1200i and a 750W power supply on it. That's one of the problems.

Thanks
