Message boards : Number crunching : strange behaviour...
Author | Message |
---|---|
Hi there, | |
ID: 33072 | Rating: 0 | rate: / Reply Quote | |
Did you reboot the machine? Power off, remove the power cord, wait 10+ mins and power back on? Driver reinstall, maybe just straight the new 326.80? Is BOINC actually saying "running" in the manager? | |
ID: 33076 | Rating: 0 | rate: / Reply Quote | |
Hi, | |
ID: 33080 | Rating: 0 | rate: / Reply Quote | |
Hmmm...the 326.80 isn't stable but beta. Since this is not a boinc-only machine, I'd prefer to stay with the stable drivers. Hi Rene, just for info I have 8 machines running here on 326.80 with no noticeable problems. In fact they all have both NVidia and AMD GPUs installed. | |
ID: 33081 | Rating: 0 | rate: / Reply Quote | |
Hi, | |
ID: 33085 | Rating: 0 | rate: / Reply Quote | |
No, not even with the new drivers does it work. The application still does nothing... I cancelled both WUs. | |
ID: 33086 | Rating: 0 | rate: / Reply Quote | |
pnitrox122-NOELIA_INS1P-1-12-RND5810_0 | |
ID: 33089 | Rating: 0 | rate: / Reply Quote | |
This run keeps increasing its remaining time with no end in sight. | |
ID: 33091 | Rating: 0 | rate: / Reply Quote | |
Was that on the GTX 650 Ti BOOST? I think you also have a GTX 660 as I recall. I want to give mine a try again on the just-released 327.23 drivers, but the 660s seem to have been somewhat problematic recently. | |
ID: 33094 | Rating: 0 | rate: / Reply Quote | |
@skgiven | |
ID: 33095 | Rating: 0 | rate: / Reply Quote | |
The WU which did run for some hours has lots of "# BOINC suspending at user request (thread suspend)" lines in the log. If it's a new installation: did you already check "Use GPU while computer is in use" ("Nutze die GPU wenn der Computer benutzt wird") in the local BOINC settings, CPU tab? And "While processor usage is less than x%" ("Wenn CPU-Auslastung geringer als x%") with x set to 0? | |
ID: 33097 | Rating: 0 | rate: / Reply Quote | |
Sure, see screenshot. Those message lines are more than interesting, but I can't explain what causes them. | |
ID: 33098 | Rating: 0 | rate: / Reply Quote | |
The thing is that all other GPU tasks (SETI, Einstein, PrimeGrid, POEM) are running fine on this machine. | |
ID: 33099 | Rating: 0 | rate: / Reply Quote | |
I have had a similar problem for months. I have already tried all the tutorials on this forum and a clean reinstall of Windows 8 64-bit, and I even looked up the problem on my hardware manufacturer's website. | |
ID: 33100 | Rating: 0 | rate: / Reply Quote | |
When I installed the Nvidia drivers, the driver crashed again at intervals of a few seconds, and a pop-up notification about the driver failure kept flashing. It's crazy. After the next reboot it works well for a while, but one task takes 9-10 hours, so something is really wrong again. | |
ID: 33102 | Rating: 0 | rate: / Reply Quote | |
Just so.. | |
ID: 33103 | Rating: 0 | rate: / Reply Quote | |
Ok, did a couple of debug sessions and took a look into the app_control code. | |
ID: 33104 | Rating: 0 | rate: / Reply Quote | |
Ok, this was a quick solution ;) I think the CPU throttling in BOINC 7.0.64 is non-optimal. When you take a look at my screenshot of the BOINC options, you'll notice that I only allow BOINC to use 75% of my CPUs. I'm not running any CPU-only WUs on this machine, so there is always just 1 active WU, since I've only got one GPU. | |
ID: 33105 | Rating: 0 | rate: / Reply Quote | |
@Josef | |
ID: 33106 | Rating: 0 | rate: / Reply Quote | |
Ok, this was a quick solution ;) I think the CPU throttling in BOINC 7.0.64 is non-optimal. Correct. That was a brief (and fortunately now abandoned) aberration in BOINC. Later developmental versions (and BOINC v7.2 when it's released "real soon now") will go back to the old behaviour - CPU throttling not applied to GPU apps. I've written up details of the exact versions affected on some project's message board - I'll try and work out which project it was, and copy them back here later. | |
ID: 33107 | Rating: 0 | rate: / Reply Quote | |
Richard, | |
ID: 33109 | Rating: 0 | rate: / Reply Quote | |
Richard, Yes, in the client. It's meant for thermal control of CPUs, and it dates back to the early days of BOINC. If you look at the Computing preferences on your account here, the bottom item under Processor usage is: Use at most The implementation is crude: they wanted it to use the same source code on every platform, and there isn't a fine control like that. It operates on a granularity of 1 second, so capeITLabs' 75% would have been 3 seconds on and 1 second off. That, of course, means three eternities on and one eternity off at the speeds GPUs operate. David Anderson made a gut reaction to a single user's request on the mailing list back in January: http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2013-January/019305.html - I'm sure you can think of such a reason. That emerged in version 7.0.45. It was removed with v7.2.1. You might like to look at the note "client: don't apply CPU throttling to apps that use < .5 CPUs (like GPU, NCI)." and http://boinc.berkeley.edu/trac/changeset/4cb34a123aacfaccc28b5f1f76717864b0b63a57/boinc-v2 with respect to the requested CPU reservation for Keplers and above (and make the same suggestion to any OpenCL developers you know). Links to the earlier changesets are contained in my email at http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2013-July/020131.html . Any casual reader here who wishes to apply thermal control to their CPU or GPU under Windows (only) would be better advised to consider TThrottle. | |
ID: 33111 | Rating: 0 | rate: / Reply Quote | |
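To make the arithmetic in the post above concrete, here is a minimal sketch of duty-cycle throttling with a one-second granularity. It is not BOINC's actual code: the function names `throttle_schedule` and `longest_pause` and the 4-second window are assumptions chosen purely for illustration.

```python
def throttle_schedule(cpu_time_pct, window_seconds=4):
    """Return one boolean per second: True = running, False = suspended.

    Hypothetical helper, not taken from the BOINC client. It spreads the
    allowed run time over a window using whole-second steps only.
    """
    if not 0 <= cpu_time_pct <= 100:
        raise ValueError("cpu_time_pct must be between 0 and 100")
    on_seconds = round(window_seconds * cpu_time_pct / 100)
    return [second < on_seconds for second in range(window_seconds)]


def longest_pause(schedule):
    """Length in seconds of the longest suspended stretch in the schedule."""
    longest = current = 0
    for running in schedule:
        current = 0 if running else current + 1
        longest = max(longest, current)
    return longest


schedule = throttle_schedule(75)
print(schedule)                 # three seconds running, one second suspended
print(longest_pause(schedule))  # seconds the app sits idle in each window
```

With whole-second steps, a 75% limit means the task is stopped for a full second in every four, which is a very long stall at the speeds GPUs operate.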
Thanks Richard, | |
ID: 33112 | Rating: 0 | rate: / Reply Quote | |
660 ... have aborted run ... just installed latest driver. | |
ID: 33117 | Rating: 0 | rate: / Reply Quote | |
660 ... have aborted run ... just installed latest driver. I have updated the drivers on my two GTX 660s to 327.23 and completed my first Noelia with no problems (4-NOELIA_INS1P-9-15-RND4205_0 14:09:09). Each card is running another Noelia with no problems thus far, so I will let them run and see what happens. | |
ID: 33120 | Rating: 0 | rate: / Reply Quote | |
Thanks Richard, I don't know whether this is coincidence, or whether you've been in communication behind the scenes, but David Anderson has just started work on a better throttling solution. "client: preliminary implementation (commented out) of sub-second throttling" http://boinc.berkeley.edu/trac/changeset/ebde7809ceaca8cc35d75c2a2b5adc32c19694e5/boinc-v2 http://boinc.berkeley.edu/trac/changeset/35f489d36f4c7734d13f76af5844ec42d244be59/boinc-v2 | |
ID: 33122 | Rating: 0 | rate: / Reply Quote | |
I'm against coarse-grained throttling for thermal control as it's inefficient for any hardware using adaptive power states (like boosting nVidias and Intels + AMDs with Turbo). The reason: during activity the hardware boosts into the maximum power state supported, which implies a high voltage and lower power efficiency, whereas during idle periods it obviously does nothing. | |
ID: 33124 | Rating: 0 | rate: / Reply Quote | |
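A rough way to see the efficiency argument above is the usual first-order approximation that dynamic power scales with V² · f. That model is an assumption brought in here, not something stated in the post, and the voltages and clocks below are made-up illustrative values, not measurements from any real card. The function names are hypothetical as well.

```python
def dynamic_power(voltage, frequency, capacitance=1.0):
    """First-order CMOS dynamic power estimate: P = C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * frequency


def energy_per_work(voltage, frequency, duty_cycle=1.0):
    """Energy spent per unit of work; work rate is taken as frequency * duty_cycle.

    Idle power during the off periods is ignored, which flatters throttling.
    """
    average_power = dynamic_power(voltage, frequency) * duty_cycle
    throughput = frequency * duty_cycle
    return average_power / throughput


# Illustrative numbers only (assumed, not measured):
# case 1: full boost clock and voltage, but suspended 25% of the time
boost = energy_per_work(voltage=1.20, frequency=1.00, duty_cycle=0.75)
# case 2: running all the time at 75% clock with a lower voltage
steady = energy_per_work(voltage=1.00, frequency=0.75, duty_cycle=1.0)

print(f"throttled at boost clocks: {boost:.2f} energy per unit of work")
print(f"steady at lower clocks:    {steady:.2f} energy per unit of work")
```

Both cases deliver the same throughput in this model, but under these assumed voltages the throttled case spends more energy per unit of work because all of its work is done at the higher voltage.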
I91R9-NATHAN_KIDc22_glu-7-10-RND1126_1 | |
ID: 33163 | Rating: 0 | rate: / Reply Quote | |
Hi, GPUGrid Folks: | |
ID: 33165 | Rating: 0 | rate: / Reply Quote | |
Paul and John, | |
ID: 33166 | Rating: 0 | rate: / Reply Quote | |
If the GPU is cooler than normal, I suggest shutting the system down, turning the PSU off for a minute and then turning it back on and starting the system up again - doing this allowed me to finish a WU that had run for 5 days (but had really stopped after about 6h). That fixed it for me with I18R10-NATHAN_KIDc22_glu-8-10-RND4986_1, which was taking 30 hours to complete on a GTX 660 (327.23 drivers). It had previously completed three others in the NATHAN_KIDc22 series with no problems in about 12 hours. That is unfortunately not a practical solution for me, since I lost 10 hours of CEP2 work running on the CPU. It seems to be more of a problem with the mid-range cards (GTX 660, 660 Ti). Are the 700 series cards immune? | |
ID: 33167 | Rating: 0 | rate: / Reply Quote | |
Many thanks, skgiven. Problem fixed. I had hoped to run these tasks in a 'set and forget' mode, but that may not be possible. Being unable to sleep last night, I took a peek at my machine at around 05:00h to see if all is well and that's when I discovered the long run. | |
ID: 33168 | Rating: 0 | rate: / Reply Quote | |
It seems to be more of a problem with the mid-range cards (GTX 660, 660 Ti). Are the 700 series cards immune? Depends on whether the same thing that is causing your system to just stop processing is the same thing that causes 780s/Titans to have constant "Access violations" and app restarts. Could be the same thing causing different symptoms using different GPUs. I think it's all down to 8.14 myself. Operator. ____________ | |
ID: 33171 | Rating: 0 | rate: / Reply Quote | |
Depends on whether the same thing that is causing your system to just stop processing is the same thing that causes 780s/Titans to have constant "Access violations" and app restarts. The Memory Controller Load apparently runs at a constant 14% rate when it is running slowly, so I doubt that it is the start/stop condition. (It should run about 30% normally on these work units.) I know they had a similar problem with the older apps (before the 8 series), particularly with the GTX 660s, and thought it might have been solved. Otherwise, the 8.14 app works very nicely that I can see, except for one Noelia that errored out, but no crashes or other bad behavior. I hope they can get the last wrinkles ironed out for the mid-range cards, and also for the 700 cards or else there is not much incentive to upgrade to those. | |
ID: 33172 | Rating: 0 | rate: / Reply Quote | |
Shut the machine down while I went to work, 12 hrs later turned it back on. | |
ID: 33173 | Rating: 0 | rate: / Reply Quote | |
This is the second run I have aborted. My GPUGRID credits are decreasing because I am running programs that don't work and I have to abort. | |
ID: 33174 | Rating: 0 | rate: / Reply Quote | |
All this started happening just recently ... | |
ID: 33175 | Rating: 0 | rate: / Reply Quote | |
Depends on whether the same thing that is causing your system to just stop processing is the same thing that causes 780s/Titans to have constant "Access violations" and app restarts. Have you checked if the GPU clock runs still at full load (to load you want or have set to)? I have had a lot of troubles with my 660's, even bought a new motherboard. They do fine now with the beta's and long runs and 8.14. Short runs give (still) the most problems. My 770 from Asus is almost error free with all types of WU, and moreover most WUs don't even stop en route, they run in one go. We can now see that with the new stderr Matt has made. So in my new builds only 770, 780 and Titan. ____________ Greetings from TJ | |
ID: 33176 | Rating: 0 | rate: / Reply Quote | |
If nothing will fix this, I will delete GPUGRID and run another BOINC program. | |
ID: 33178 | Rating: 0 | rate: / Reply Quote | |
Have you checked if the GPU clock runs still at full load (to load you want or have set to)? I have had a lot of troubles with my 660's, even bought a new motherboard. They do fine now with the beta's and long runs and 8.14. Short runs give (still) the most problems. Yes, the GPU clock shows running at full speed in GPU-Z. It is normally 993 MHz as set by the card, but I had reduced it to 980 MHz (hardly a difference) and also bumped up the core voltage slightly (by 12.5 mV) with Nvidia Inspector. But there was no obvious down-clocking, as was a problem for some Nvidia cards a few years ago. But maybe not all the relevant clocks are shown by GPU-Z? It is nothing I can fix at any rate, and I have seen no reports of such problems for these current drivers. It is on a Z77 motherboard with an Ivy Bridge i7-3770, with each GPU supported by a virtual CPU core, so that should not be a limitation. And the fact that a reboot fixes it would indicate that it is a software, not a hardware problem (to me at any rate). There was some speculation earlier on various reasons that some cards were affected and others weren't, such as cache size, memory bandwidth, etc., but I don't think any definitive answer has been found. It is apparently something only GPUGrid can fix. | |
ID: 33179 | Rating: 0 | rate: / Reply Quote | |
Jim, I think that is a fair assessment. | |
ID: 33180 | Rating: 0 | rate: / Reply Quote | |
If nothing will fix this, I will delete GPUGRID and run another BOINC program. Paul, looking at your tasks' stderr.out file, they are full of: # BOINC suspending at user request (thread suspend) # BOINC resuming at user request (thread suspend) which means that your BOINC manager keeps on suspending and resuming the GPUGrid application. This could be the result of improper settings of the BOINC manager and/or Windows. For example: 1. The CPU project you are running uses too much CPU time. Resolution: limit the CPU usage of those projects to give the GPU projects a single core per GPU by setting "on multiprocessor systems use at most" to 50% of the processor cores (you have a dual core CPU, so 1 CPU core is 50% on your system) in Boinc Manager (Advanced View) / Tools / Computing preferences / processor usage tab. 2. The BOINC manager "throttles" its applications. This setting is used to limit the heat generated by the CPU, but it also throttles the GPU applications by mistake. Resolution: go to Boinc Manager (Advanced View) / Tools / Computing preferences / processor usage tab / use at most 100% CPU time. 3. The BOINC manager is not using the GPU while you are using your computer. Resolution: go to Boinc Manager (Advanced View) / Tools / Computing preferences / processor usage tab / check the "While the computer is in use" and the "Use GPU while the computer is in use" checkboxes. If you play games which need the GPU, you should put them on the list in the "Exclusive Applications" tab. 4. Windows power management limits the time while your computer is "awake". Resolution: go to Start / Control Panel (large icon view) / Power options / change current scheme settings / Put the computer to sleep: Never. | |
ID: 33182 | Rating: 0 | rate: / Reply Quote | |
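A quick way to check a task for the symptom described above is to count the suspend and resume lines in the stderr text copied from the task page. This is only a sketch: the function name `count_suspend_events` is hypothetical and the sample text is made up. The two message strings are the ones quoted in the post.

```python
SUSPEND = "# BOINC suspending at user request (thread suspend)"
RESUME = "# BOINC resuming at user request (thread suspend)"


def count_suspend_events(stderr_text):
    """Count suspend and resume messages in a task's stderr output."""
    suspends = resumes = 0
    for line in stderr_text.splitlines():
        line = line.strip()
        if line == SUSPEND:
            suspends += 1
        elif line == RESUME:
            resumes += 1
    return suspends, resumes


# Made-up sample in the shape quoted above, not a real task log.
sample = "\n".join([SUSPEND, RESUME, SUSPEND, RESUME, SUSPEND])
print(count_suspend_events(sample))
```

A large count on a task that was never paused by hand points at one of the four settings listed above rather than at the GPU itself.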
On my 660 the GPU clock was downclocked twice today. The WUs were beta and Santi. My 660 has the most problems with Santi and almost none with Noelias. | |
ID: 33187 | Rating: 0 | rate: / Reply Quote | |
I have deleted gpugrid from my computer. | |
ID: 33191 | Rating: 0 | rate: / Reply Quote | |
Will try 50% processor and 100% CPU. | |
ID: 33213 | Rating: 0 | rate: / Reply Quote | |
Had my first GPU completion in quite a while after the CPU & processor usage change ... Thanks! | |
ID: 33226 | Rating: 0 | rate: / Reply Quote | |
So what's the official solution for this? The problem still seems to remain on the short queue here on a 560 Ti 448-core edition. Luckily I can switch this card to long runs only, but I want to set it to both queues again later so that I don't lose any publication badges ;) Restarting the machine after every short unit on an unattended machine is no solution ;) | |
ID: 33638 | Rating: 0 | rate: / Reply Quote | |
What drivers are you using? It looks like your computer is using 301.42 from May 2012. | |
ID: 33639 | Rating: 0 | rate: / Reply Quote | |
Hmm, OK. I was only wondering because TJ has the same problem and updated the drivers, but it still fails. | |
ID: 33640 | Rating: 0 | rate: / Reply Quote | |
Hmm, OK. I was only wondering because TJ has the same problem and updated the drivers, but it still fails. Hello dskagcommunity, I need to tell you that I didn't run any short WU in the last 30 days, I don't dare to. Later this week when I am at the computer with the 660 I will run a few short ones and let you know how they do. ____________ Greetings from TJ | |
ID: 33657 | Rating: 0 | rate: / Reply Quote | |
Oh, I read it like that because you answered me O.o OK then, I will upgrade the driver ^^ | |
ID: 33667 | Rating: 0 | rate: / Reply Quote | |
Hello dskagcommunity, as promised I would let you know about the SR's. | |
ID: 33681 | Rating: 0 | rate: / Reply Quote | |
For a few weeks I have had a problem. When I start BOINC Manager and GPUGrid starts to work, and I then run the Chrome browser, a movie player, GPU-Z or YouTube, the whole computer slows down as if in slow motion. If I manage, in this slow mode, to get the mouse onto BOINC Manager and pause only GPUGrid, everything starts working normally ... | |
ID: 33773 | Rating: 0 | rate: / Reply Quote | |
For a few weeks I have had a problem. When I start BOINC Manager and GPUGrid starts to work, and I then run the Chrome browser, a movie player, GPU-Z or YouTube, the whole computer slows down as if in slow motion. If I manage, in this slow mode, to get the mouse onto BOINC Manager and pause only GPUGrid, everything starts working normally ... You are running Noelia Ins1p units on your 275 card, which doesn't have enough memory; this will cause what you describe. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 33775 | Rating: 0 | rate: / Reply Quote | |