Message boards : Number crunching : Lots of errors
Author | Message |
---|---|
I'm starting to get a high number of short tasks that error out. Can someone explain why this is happening and how I can fix it? I have changed no settings. | |
ID: 41152 | Rating: 0 | rate: / Reply Quote | |
I am getting errors too, but mine are with GERARD_EQUI WUs. Three had errors, two finished OK. It seems to be a bad batch. | |
ID: 41154 | Rating: 0 | rate: / Reply Quote | |
I'm getting some nasty errors too, with the GERARD_EQUI_26Apr_CXCL tasks. They're causing major TDRs, which in turn then make the computer have hardware acceleration problems in other tasks (like web browsing, or gaming), and also cause driver problems where the clocks never go back to 3d-mode clocks. | |
ID: 41155 | Rating: 0 | rate: / Reply Quote | |
All my tasks are now erroring out. Suspending this project for now until this issue is resolved. | |
ID: 41156 | Rating: 0 | rate: / Reply Quote | |
I've actually been having issues with the graphics drivers themselves crashing and Windows having to recover. | |
ID: 41157 | Rating: 0 | rate: / Reply Quote | |
Same here. Now have five GERARD_EQUI_26Apr_CXCL tasks crashed. | |
ID: 41158 | Rating: 0 | rate: / Reply Quote | |
Could you please post your errors in this thread? I will cancel the batch if they persist. Thanks for your patience... | |
ID: 41160 | Rating: 0 | rate: / Reply Quote | |
https://www.gpugrid.net/result.php?resultid=14210324 | |
ID: 41161 | Rating: 0 | rate: / Reply Quote | |
Could you please post your errors in this thread? I will cancel the batch if they persist. Thanks for your patience... https://www.gpugrid.net/result.php?resultid=14210504 | |
ID: 41162 | Rating: 0 | rate: / Reply Quote | |
We detected an unexpected parameterization error in some of the simulations and we just cancelled them. Sorry for any inconvenience caused and thank you for reporting it to us! If you find any other errors please do not hesitate to tell us (hopefully this particular issue is already resolved). | |
ID: 41164 | Rating: 0 | rate: / Reply Quote | |
Excellent. Thank you!! | |
ID: 41166 | Rating: 0 | rate: / Reply Quote | |
All short run tasks still failing here. Links to last 4 | |
ID: 41168 | Rating: 0 | rate: / Reply Quote | |
nanoprobe: | |
ID: 41169 | Rating: 0 | rate: / Reply Quote | |
Excellent. Thank you!! +1 :) ____________ [CSF] Thomas H.V. Dupont Founder of the team CRUNCHERS SANS FRONTIERES 2.0 www.crunchersansfrontieres | |
ID: 41170 | Rating: 0 | rate: / Reply Quote | |
nanoprobe: Cards are PNY 750Ti. No factory O/C. No six pin PCI-E power plugs. 60Watt load @99%. They've been running stock out of the box since I bought them and I've been running the short tasks on these cards since I got them and have never had the failure rate I've been experiencing lately. If it was one card producing all/most of the errors then I would suspect the card but the tasks are failing on both cards. | |
ID: 41171 | Rating: 0 | rate: / Reply Quote | |
1232906x8-GERARD_EQUI_26Apr_CXCL-0-1-RND1418_4 causes a lot of GPU driver crashes. Stopped! | |
ID: 41172 | Rating: 0 | rate: / Reply Quote | |
nanoprobe: | |
ID: 41173 | Rating: 0 | rate: / Reply Quote | |
nanoprobe: WOW! Let me offer you some advice. If it doesn't concern life, death or health then it surely isn't worth getting frustrated over. FWIW the issue seems to have cleared up. The faulty WUs have been taken care of. Thanks for your help. | |
ID: 41190 | Rating: 0 | rate: / Reply Quote | |
I have had over 20 WUs fail...on my GTX 660 Ti devices. I will stop getting tasks and now go to bed.... | |
ID: 41191 | Rating: 0 | rate: / Reply Quote | |
nanoprobe: There were some faulty WUs, but they have nothing to do with tasks erroring out with "Simulation has become unstable." messages and no other error messages. Errors like yours are usually a result of overclocking too much. Please keep my advice (lower clocks to reference clocks) in mind, the next time you try to troubleshoot those errors. Good luck, Jacob | |
ID: 41192 | Rating: 0 | rate: / Reply Quote | |
There were some faulty WUs, but they have nothing to do with tasks erroring out with "Simulation has become unstable." messages and no other error messages. There is no way you could know this for sure. Errors like yours are usually a result of overclocking too much. As I stated before these cards are not overclocked. The problem came and left without me changing anything on my setup. Therefore my conclusion is that the WUs were the problem. Moving on. | |
ID: 41194 | Rating: 0 | rate: / Reply Quote | |
The problem is still ongoing, for you. | |
ID: 41197 | Rating: 0 | rate: / Reply Quote | |
Jacob: | |
ID: 41198 | Rating: 0 | rate: / Reply Quote | |
I have definitely had certain GPU things, such as games and GPUGrid tasks, crash or error ("Simulation has become unstable"), as a direct result of a factory-overclock that was too aggressive. If the GPU is overclocked at all, and you are trying to resolve any GPU problem, you should see if lowering the clocks resolves the problem. | |
ID: 41200 | Rating: 0 | rate: / Reply Quote | |
Nanoprobe, It may be the case that these Noelia WUs tax the card more than other WUs and it does appear (from what I've seen) to affect the smaller/older cards more; the same WUs that fail on your GTX750Ti complete on other systems but some also fail on the older/smaller cards. | |
ID: 41201 | Rating: 0 | rate: / Reply Quote | |
Jacob: Sometimes this is the only (and the fastest) way to fix a malfunctioning system. I had some GPUGrid app crashes in the past on one of my dual GPU systems which caused the other GPU to fail tasks too. In my opinion it's a good practice to restart (by a scheduled task) a Windows based system once a week - regardless of whether it's running error free - to maintain its stability (especially when running GPU and CPU tasks simultaneously). Let's put it that way: If GPU stuff is not running properly on a majority of systems and several users have the same experience and we (the users) do that for free - we shut down an overall properly running system for that? This is more like a rhetorical question, but - as you probably know - there's no warranty for any software (free or commercial) to work on every existing hardware. Besides, your question takes a set of other software as the reference that qualifies a system as properly running, but it follows from the "no warranty" point that no such set of software exists. To put it another way: I wouldn't call a system properly running if GPUGrid tasks produce "The simulation has become unstable. Terminating to avoid lock-up" messages on that particular system only, while these tasks run fine on the next host they were assigned to. I really don't think so. If someone asks for help, it is because they can't figure out the reason for the error, so it might be useful to try things which don't make sense at first sight. I have a GTX780Ti on which I had to reduce the GDDR5 clock to 2900MHz (from 3500MHz) to make it work with GPUGrid (it was brand new). GPUs (and other components) age, so they might not perform as well as before, and different tasks tax the GPU differently. You can't step in the same river twice. If GPU will not work properly, I simply switch to something else - you know, MY PC, MY time, MY decision. If all else fails, or you're tired of trying different workarounds, you can do it. 
Still, fixing the errors on a given system is not the project's responsibility. | |
ID: 41202 | Rating: 0 | rate: / Reply Quote | |
The problem is still ongoing, for you. I think 1 error out of 20 tasks is about the same I was experiencing before the problem WUs arrived. And just for the record the task you linked to completed and validated. The last failed one was more than 2 days ago. https://www.gpugrid.net/result.php?resultid=14213349 | |
ID: 41203 | Rating: 0 | rate: / Reply Quote | |
Please keep my advice (use lower clocks) in mind, the next time you try to troubleshoot any GPU errors. | |
ID: 41204 | Rating: 0 | rate: / Reply Quote | |
Please keep my advice (use lower clocks) in mind, the next time you try to troubleshoot any GPU errors. Jacob, please keep this point in mind: your tips are always appreciated and fortunately we have you! :) I will also make adjustments on my GTX 760 via Precision X because I also get errors with LONG RUNS (Gerard). I will publish the settings and the results in this thread. Of course, I am also thinking of Retvari Zoltan and skgiven, whose advice is also very valuable :) Thanks guys! ____________ [CSF] Thomas H.V. Dupont Founder of the team CRUNCHERS SANS FRONTIERES 2.0 www.crunchersansfrontieres | |
ID: 41211 | Rating: 0 | rate: / Reply Quote | |