strange behaviour...

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33072 - Posted: 18 Sep 2013 \| 19:36:05 UTC
	Hi there, one of my boinc machines is a Win7 Pro 64Bit with an ASUS GTX570 card. The NVidia driver is the latest 320.49. This machine shows a strange behaviour: each of the WUs (http://www.gpugrid.net/results.php?hostid=158339) will be started without any failure, seems to run for hours, but nothing happens...no CPU usage, no GPU usage, no progress... What's wrong here ? best regards, Rene

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 33076 - Posted: 18 Sep 2013 \| 20:10:39 UTC - in response to Message 33072.
	Did you reboot the machine? Power off, remove the power cord, wait 10+ mins and power back on? Have you tried reinstalling the driver, maybe going straight to the new 326.80? Is BOINC actually saying "running" in the manager? MrS ____________ Scanning for our furry friends since Jan 2002

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33080 - Posted: 18 Sep 2013 \| 20:38:37 UTC - in response to Message 33076.
	Hi, yes I did. The BOINC manager says that it is running and the messages file shows it too. I've got another machine for GPUGRID with the same OS and drivers, but with a GTX480 and a GTX560Ti. This machine doesn't show any unusual behaviour. Hmmm...the 326.80 isn't stable but beta. Since this is not a boinc-only machine, I'd prefer to stay with the stable drivers. Rene

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 33081 - Posted: 18 Sep 2013 \| 21:12:52 UTC - in response to Message 33080.
	capeITLabs wrote: "Hmmm...the 326.80 isn't stable but beta. Since this is not a boinc-only machine, I'd prefer to stay with the stable drivers." Hi Rene, just for info I have 8 machines running here on 326.80 with no noticeable problems. In fact they all have both NVidia and AMD GPUs installed.

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33085 - Posted: 19 Sep 2013 \| 5:28:25 UTC - in response to Message 33081.
	Hi, thanks for the info. Maybe I should give it a try...

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33086 - Posted: 19 Sep 2013 \| 5:50:20 UTC - in response to Message 33085.
	No, it does not work even with the new drivers. The application still does nothing... I cancelled both WUs.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 33089 - Posted: 19 Sep 2013 \| 8:40:51 UTC - in response to Message 33086. Last modified: 19 Sep 2013 \| 8:42:23 UTC
	pnitrox122-NOELIA_INS1P-1-12-RND5810_0 2Mgx191-NOELIA_INS1P-6-12-RND2605_0 I99R1-NATHAN_KIDc22_glu-3-10-RND8774_1 Yesterday I reported similar behavior while running a NOELIA_INS1P WU (even on Linux), http://www.gpugrid.net/forum_thread.php?id=3466&nowrap=true#33057 ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33091 - Posted: 19 Sep 2013 \| 9:10:39 UTC
	This run keeps increasing its remaining time with no end in sight. Should I abort it?

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 33094 - Posted: 19 Sep 2013 \| 14:34:19 UTC - in response to Message 33089.
	Was that on the GTX 650 Ti BOOST? I think you also have a GTX 660 as I recall. I want to give mine a try again on the just-released 327.23 drivers, but the 660s seem to have been somewhat problematic recently.

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33095 - Posted: 19 Sep 2013 \| 16:37:30 UTC - in response to Message 33089.
	@skgiven the WU that is running (more or less) at the moment is a SANTI_RAP74. This one also does nothing... :(

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 33097 - Posted: 19 Sep 2013 \| 18:52:56 UTC
	The WU which did run for some hours has lots of "# BOINC suspending at user request (thread suspend)" lines in the log. If it's a new installation: did you already check "Use GPU while the computer is in use" (German: "Nutze die GPU wenn der Computer benutzt wird") in the local BOINC settings, CPU tab? And "When CPU usage is less than x%" (German: "Wenn CPU-Auslastung geringer als x%") with x set to 0? MrS ____________ Scanning for our furry friends since Jan 2002

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33098 - Posted: 19 Sep 2013 \| 19:19:45 UTC - in response to Message 33097.
	Sure, see screenshot. Those message lines are more than interesting, but I can't explain what causes them.

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33099 - Posted: 19 Sep 2013 \| 19:21:36 UTC - in response to Message 33098.
	The thing is that all other GPU tasks (SETI, Einstein, PrimeGrid, POEM) are running fine on this machine.

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 384,933 Level Scientific publications	Message 33100 - Posted: 19 Sep 2013 \| 20:00:18 UTC
	I have had a similar problem for months. I have already tried all the tutorials on this forum and a clean reinstall of Win 8 64-bit, and I even looked the problem up on my hardware manufacturer's website. The problem is in the communication between GPUGrid tasks and the NVIDIA drivers: CUDA and programming errors. GPUGrid works normally for just two weeks, then bad tasks arrive and all the work is **. I see a lot of people who do not have problems, but they probably use their computers only for GPUGrid, or they run Linux. But these problems discourage many people from crunching for GPUGRID. For example, on Collatz Conjecture I reached an average RAC of 650,000 in about a week or two, and likewise on GPUGrid for a few months, but then the problems started that the forum is full of. Two days ago I finished one task in about 8-9 hours; two run at a time because I have two cards in SLI. Then today the NVIDIA driver crashed, followed by a BSOD and forced restarts, which obviously wasted tasks and credit. The manager already shows one task that has been running for about 9-10 hours. Weeks ago, before the clean installation of Win 8 64-bit on my SSD, one task took me 14-16 hours. Back then I had an older BIOS on the board, and voilà, one task now takes me 8 hours, until this morning, when the old problem came back: crashes of the NVIDIA driver, the Chrome browser, and others. I have never installed beta NVIDIA drivers, only WHQL, because with the beta drivers it is worse. I am just going to install the NVIDIA 327.23 driver... ((

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 384,933 Level Scientific publications	Message 33102 - Posted: 19 Sep 2013 \| 20:25:33 UTC - in response to Message 33100.
	When I installed the NVIDIA drivers, the driver crashed again at intervals of a few seconds, and the pop-up notification about the driver crash kept flashing... it's crazy. After the next reboot it works well for a while, but one task will take 9-10 hours, so something is really wrong again. These are probably never-ending problems in this project :-)

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 384,933 Level Scientific publications	Message 33103 - Posted: 19 Sep 2013 \| 20:26:14 UTC - in response to Message 33099.
	Just so..

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33104 - Posted: 19 Sep 2013 \| 20:40:06 UTC - in response to Message 33099.
	Ok, did a couple of debug sessions and took a look into the app_control code. It seems that the task gets suspended due to CPU throttling. I'll have a deeper look now to find out why this is happening. Will keep you posted...

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33105 - Posted: 19 Sep 2013 \| 20:52:26 UTC - in response to Message 33104.
	Ok, this was a quick solution ;) I think the CPU throttling in BOINC 7.0.64 is non-optimal. When you take a look at my screenshot of the BOINC options, you'll notice that I only allow to use 75% of my CPUs. I'm not running any CPU-only WUs on this machine, so there is always just 1 active WU, since I've only got one GPU. After analyzing the debug output and the source code, I've just changed the option from 75% to 100%...BINGO!!! That worked :) Now the WU is running fine. But I think the CPU throttle handling in BOINC needs a bit of tweaking, since the GPUGrid task never ever used 75% of one CPU...

capeITLabs Send message Joined: 17 Nov 12 Posts: 30 Credit: 111,887,025 RAC: 0 Level Scientific publications	Message 33106 - Posted: 19 Sep 2013 \| 21:05:28 UTC - in response to Message 33100.
	@Jozef maybe you should lower the GPU and memory clock speeds a bit. If the GPUs are running at nearly 100% for a long period of time, the electronics might not be able to support the factory clock speeds any longer. In the past I've had the same problems (see http://www.gpugrid.net/forum_thread.php?id=3421#31554). Since I lowered the clocks a bit, everything has been running smoothly. cheers Rene

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1620 Credit: 8,832,166,430 RAC: 19,849,425 Level Scientific publications	Message 33107 - Posted: 19 Sep 2013 \| 21:42:29 UTC - in response to Message 33105.
	capeITLabs wrote: "Ok, this was a quick solution ;) I think the CPU throttling in BOINC 7.0.64 is non-optimal." Correct. That was a brief (and fortunately now abandoned) aberration in BOINC. Later developmental versions (and BOINC v7.2 when it's released "real soon now") will go back to the old behaviour - CPU throttling not applied to GPU apps. I've written up details of the exact versions affected on some project's message board - I'll try and work out which project it was, and copy them back here later.

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 33109 - Posted: 19 Sep 2013 \| 22:14:28 UTC - in response to Message 33107.
	Richard, What's this CPU throttling thing? Do you know how it works? There's nothing germane in the library code so presumably it's all in the client. Matt

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1620 Credit: 8,832,166,430 RAC: 19,849,425 Level Scientific publications	Message 33111 - Posted: 19 Sep 2013 \| 23:26:00 UTC - in response to Message 33109.
	MJH wrote: "Richard, What's this CPU throttling thing? Do you know how it works? There's nothing germane in the library code so presumably it's all in the client. Matt" Yes, in the client. It's meant for thermal control of CPUs, and it dates back to the early days of BOINC. If you look at the Computing preferences on your account here, the bottom item under Processor usage is: "Use at most 100% of CPU time (Can be used to reduce CPU heat)". The implementation is crude: they wanted it to use the same source code on every platform, and there isn't a fine control like that. So it operates on a granularity of 1 second, so capeITLabs' 75% would have been 3 seconds on and 1 second off. That, of course, means three eternities on and one eternity off at the speeds GPUs operate. David Anderson made a gut reaction to a single user's request on the mailing list back in January: http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2013-January/019305.html - I'm sure you can think of such a reason. That change emerged in version 7.0.45 and was removed with v7.2.1. You might like to look at the note "client: don't apply CPU throttling to apps that use < .5 CPUs (like GPU, NCI)." and http://boinc.berkeley.edu/trac/changeset/4cb34a123aacfaccc28b5f1f76717864b0b63a57/boinc-v2 with respect to the requested CPU reservation for Keplers and above (and make the same suggestion to any OpenCL developers you know). Links to the earlier changesets are contained in my email at http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2013-July/020131.html Any casual reader here who wishes to apply thermal control to their CPU or GPU under Windows (only) would be better advised to consider TThrottle.
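	For anyone who wants to see the "3 seconds on and 1 second off" arithmetic spelled out, here is a minimal Python sketch. It is not BOINC's actual code: the function name `duty_cycle` and the idea of reducing the percentage to the shortest whole-second pattern are assumptions made purely for illustration, and the real client may schedule its on and off seconds differently.

```python
from fractions import Fraction


def duty_cycle(cpu_time_percent: int, tick_seconds: int = 1) -> tuple[int, int]:
    """Return a hypothetical (seconds_on, seconds_off) pattern.

    The pattern is the shortest one, built from whole ticks, whose average
    matches the requested "use at most N% of CPU time" value.
    This is an illustration only, not the BOINC client's algorithm.
    """
    if not 0 < cpu_time_percent <= 100:
        raise ValueError("percentage must be in the range 1..100")
    share = Fraction(cpu_time_percent, 100)  # e.g. 75% reduces to 3/4
    ticks_on = share.numerator
    ticks_off = share.denominator - share.numerator
    return ticks_on * tick_seconds, ticks_off * tick_seconds


on, off = duty_cycle(75)
print(f"75% of CPU time: {on} s on, {off} s off")

on, off = duty_cycle(100)
print(f"100% of CPU time: {on} s on, {off} s off")
```

	With a 1-second tick, 75% comes out as 3 s running and 1 s suspended, while 100% never suspends, which matches why changing the setting from 75% to 100% stopped the suspend/resume cycle for capeITLabs.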

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 33112 - Posted: 19 Sep 2013 \| 23:29:57 UTC - in response to Message 33111.
	Thanks Richard, I guess I'd better take a look and see exactly how this third suspend-resume mechanism works under the hood.. MJH

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33117 - Posted: 20 Sep 2013 \| 20:40:06 UTC - in response to Message 33094.
	660 ... have aborted run ... just installed latest driver.

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 33120 - Posted: 20 Sep 2013 \| 23:16:27 UTC - in response to Message 33117.
	Paul wrote: "660 ... have aborted run ... just installed latest driver." I have updated the drivers on my two GTX 660s to 327.23 and completed my first Noelia with no problems (4-NOELIA_INS1P-9-15-RND4205_0 14:09:09). Each card is running another Noelia with no problems thus far, so I will let them run and see what happens.

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1620 Credit: 8,832,166,430 RAC: 19,849,425 Level Scientific publications	Message 33122 - Posted: 21 Sep 2013 \| 7:21:32 UTC - in response to Message 33112.
	MJH wrote: "Thanks Richard, I guess I'd better take a look and see exactly how this third suspend-resume mechanism works under the hood.. MJH" I don't know whether this is coincidence, or whether you've been in communication behind the scenes, but David Anderson has just started work on a better throttling solution: "client: preliminary implementation (commented out) of sub-second throttling" http://boinc.berkeley.edu/trac/changeset/ebde7809ceaca8cc35d75c2a2b5adc32c19694e5/boinc-v2 http://boinc.berkeley.edu/trac/changeset/35f489d36f4c7734d13f76af5844ec42d244be59/boinc-v2

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 33124 - Posted: 21 Sep 2013 \| 11:41:09 UTC
	I'm against coarse-grained throttling for thermal control as it's inefficient for any hardware using adaptive power states (like boosting nVidias, and Intels and AMDs with Turbo). The reason: during activity the hardware boosts into the maximum power state supported, which implies a high voltage and lower power efficiency, whereas during idle periods it obviously does nothing. If the throttling were fine-grained, the hardware could adjust to the requested performance level, sustain a lower power state (lower voltage, higher power efficiency) and achieve the same throughput. Starting and stopping this often is inefficient from a software perspective, though. At least for GPU-Grid there's a far better solution: simply lower the GPU's power target and leave it at 100% time. It will take care of adjusting clocks and voltages down by itself. The downside of this: it requires the user to use tuning software, since this is not even available in nVidia's control panel under Win (just checked mine). Let alone Linux or Mac OS, which generally don't have working hardware control software. Adjusting the power target down for CPUs is also not as easy as it should be, given that Intel's mobile chips already support cTDP in principle. And with AMD GPUs boosting is not yet as widespread, efficient and controllable as for the green team :/ MrS ____________ Scanning for our furry friends since Jan 2002
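	The efficiency argument above can be made concrete with the textbook approximation for dynamic power, P = C * V^2 * f. The Python sketch below is only a back-of-the-envelope comparison under stated assumptions: every number in it (capacitance, clocks, voltages) is hypothetical, leakage and idle power are ignored, and real GPUs and CPUs will behave differently.

```python
def average_dynamic_power(capacitance, voltage, frequency_hz, active_fraction=1.0):
    """Average dynamic power under the textbook model P = C * V^2 * f.

    active_fraction is the share of time the chip is actually switching;
    idle time is assumed (optimistically) to cost nothing.
    """
    return active_fraction * capacitance * voltage ** 2 * frequency_hz


# All values below are hypothetical, chosen only to make the comparison concrete.
C = 1.0e-9        # effective switched capacitance (arbitrary)
F_BOOST = 1.0e9   # boost clock in Hz
V_BOOST = 1.20    # voltage needed at the boost clock
V_LOWER = 1.05    # assumed voltage sufficient for 75% of the boost clock

# Coarse throttling: run at the full boost state 75% of the time, idle the rest.
coarse = average_dynamic_power(C, V_BOOST, F_BOOST, active_fraction=0.75)

# Lower power state: run all the time at 75% of the boost clock and a lower voltage.
fine = average_dynamic_power(C, V_LOWER, 0.75 * F_BOOST)

print(f"coarse throttling: {coarse:.3f} W")
print(f"lower power state: {fine:.3f} W")
print(f"ratio: {fine / coarse:.3f}")
```

	Both cases deliver the same number of clock cycles per second, so in this simplified model the power ratio reduces to (V_LOWER / V_BOOST)^2, roughly 0.77 for these made-up voltages.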

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33163 - Posted: 23 Sep 2013 \| 9:26:42 UTC - in response to Message 33117.
	I91R9-NATHAN_KIDc22_glu-7-10-RND1126_1 Has been running for over 49 hours ... elapsed time increases, remaining time barely moves, but increases.

John C MacAlister Send message Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 Level Scientific publications	Message 33165 - Posted: 23 Sep 2013 \| 9:54:28 UTC - in response to Message 33163. Last modified: 23 Sep 2013 \| 10:47:50 UTC
	Hi, GPUGrid Folks: Short run task has been grinding away for over 14 h...... 251-NOELIA_CRYST1-9-12-RND5111_0 (60% complete) :( John

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 33166 - Posted: 23 Sep 2013 \| 11:21:25 UTC - in response to Message 33165. Last modified: 23 Sep 2013 \| 11:22:51 UTC
	Paul and John, If the GPU is cooler than normal, I suggest shutting the system down, turning the PSU off for a minute and then turning it back on and starting the system up again - doing this allowed me to finish a WU that had run for 5 days (but had really stopped after about 6h). Keep an eye on your runtime before and after you restart. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 33167 - Posted: 23 Sep 2013 \| 12:35:59 UTC - in response to Message 33166.
	skgiven wrote: "If the GPU is cooler than normal, I suggest shutting the system down, turning the PSU off for a minute and then turning it back on and starting the system up again - doing this allowed me to finish a WU that had run for 5 days (but had really stopped after about 6h)." That fixed it for me with I18R10-NATHAN_KIDc22_glu-8-10-RND4986_1, which was taking 30 hours to complete on a GTX 660 (327.23 drivers). It had previously completed three others in the NATHAN_KIDc22 series with no problems in about 12 hours. That is unfortunately not a practical solution for me, since I lost 10 hours of CEP2 work running on the CPU. It seems to be more of a problem with the mid-range cards (GTX 660, 660 Ti). Are the 700 series cards immune?

John C MacAlister Send message Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 Level Scientific publications	Message 33168 - Posted: 23 Sep 2013 \| 12:57:43 UTC - in response to Message 33166. Last modified: 23 Sep 2013 \| 12:58:18 UTC
	Many thanks, skgiven. Problem fixed. I had hoped to run these tasks in a 'set and forget' mode, but that may not be possible. Being unable to sleep last night, I took a peek at my machine at around 05:00h to see if all is well and that's when I discovered the long run. I will try again and if the problem recurs I will make the suggested fix. Thanks again, John

Operator Send message Joined: 15 May 11 Posts: 108 Credit: 297,176,099 RAC: 0 Level Scientific publications	Message 33171 - Posted: 23 Sep 2013 \| 16:04:33 UTC - in response to Message 33167.
	Jim1348 wrote: "It seems to be more of a problem with the mid-range cards (GTX 660, 660 Ti). Are the 700 series cards immune?" Depends on whether the thing that is causing your system to just stop processing is the same thing that causes 780s/Titans to have constant "Access violations" and app restarts. Could be the same thing causing different symptoms on different GPUs. I think it's all down to 8.14 myself. Operator. ____________

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 33172 - Posted: 23 Sep 2013 \| 18:17:05 UTC - in response to Message 33171.
	Operator wrote: "Depends on whether the same thing that is causing your system to just stop processing is the same thing that causes 780s/Titans to have constant 'Access violations' and app restarts." The Memory Controller Load apparently runs at a constant 14% rate when it is running slowly, so I doubt that it is the start/stop condition. (It should run about 30% normally on these work units.) I know they had a similar problem with the older apps (before the 8 series), particularly with the GTX 660s, and thought it might have been solved. Otherwise, the 8.14 app works very nicely that I can see, except for one Noelia that errored out, but no crashes or other bad behavior. I hope they can get the last wrinkles ironed out for the mid-range cards, and also for the 700 cards or else there is not much incentive to upgrade to those.

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33173 - Posted: 23 Sep 2013 \| 19:45:52 UTC - in response to Message 33166.
	I shut the machine down while I went to work and turned it back on 12 hrs later. The elapsed time increases, but the remaining time is stagnant?

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33174 - Posted: 23 Sep 2013 \| 20:06:18 UTC - in response to Message 33173.
	This is the second run I have aborted. My GPUGRID credits are decreasing because I am running programs that don't work and I have to abort.

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33175 - Posted: 23 Sep 2013 \| 20:11:55 UTC - in response to Message 33174.
	All this started happening just recently ...

TJ Send message Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level Scientific publications	Message 33176 - Posted: 23 Sep 2013 \| 21:26:38 UTC - in response to Message 33172.
	Jim1348 wrote: "Operator wrote: 'Depends on whether the same thing that is causing your system to just stop processing is the same thing that causes 780s/Titans to have constant "Access violations" and app restarts.' The Memory Controller Load apparently runs at a constant 14% rate when it is running slowly, so I doubt that it is the start/stop condition. (It should run about 30% normally on these work units.) I know they had a similar problem with the older apps (before the 8 series), particularly with the GTX 660s, and thought it might have been solved. Otherwise, the 8.14 app works very nicely that I can see, except for one Noelia that errored out, but no crashes or other bad behavior. I hope they can get the last wrinkles ironed out for the mid-range cards, and also for the 700 cards or else there is not much incentive to upgrade to those." Have you checked whether the GPU clock still runs at full speed (or at the speed you want or have set)? I have had a lot of trouble with my 660s, and even bought a new motherboard. They do fine now with the betas, long runs and 8.14. Short runs (still) give the most problems. My 770 from Asus is almost error-free with all types of WU, and moreover most WUs don't even stop en route; they run in one go. We can now see that with the new stderr Matt has made. So in my new builds, only 770, 780 and Titan. ____________ Greetings from TJ

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33178 - Posted: 23 Sep 2013 \| 22:51:42 UTC - in response to Message 33175.
	If nothing will fix this, I will delete GPUGRID and run another BOINC program.

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 33179 - Posted: 24 Sep 2013 \| 0:59:01 UTC - in response to Message 33176. Last modified: 24 Sep 2013 \| 1:04:20 UTC
	TJ wrote: "Have you checked whether the GPU clock still runs at full speed (or at the speed you want or have set)? I have had a lot of trouble with my 660s, and even bought a new motherboard. They do fine now with the betas, long runs and 8.14. Short runs (still) give the most problems." Yes, GPU-Z shows the GPU clock running at full speed. It is normally 993 MHz as set by the card, but I had reduced it to 980 MHz (hardly a difference) and also bumped up the core voltage slightly (by 12.5 mV) with Nvidia Inspector. But there was no obvious down-clocking, as was a problem for some Nvidia cards a few years ago. But maybe not all the relevant clocks are shown by GPU-Z? It is nothing I can fix at any rate, and I have seen no reports of such problems for these current drivers. It is on a Z77 motherboard with an Ivy Bridge i7-3770, with each GPU supported by a virtual CPU core, so that should not be a limitation. And the fact that a reboot fixes it would indicate that it is a software, not a hardware problem (to me at any rate). There was some speculation earlier on various reasons that some cards were affected and others weren't, such as cache size, memory bandwidth, etc., but I don't think any definitive answer has been found. It is apparently something only GPUGrid can fix.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 33180 - Posted: 24 Sep 2013 \| 9:11:15 UTC - in response to Message 33179. Last modified: 24 Sep 2013 \| 9:13:21 UTC
	Jim, I think that is a fair assessment. This issue is most likely caused by the WU running on the app in a slightly different way than the other WUs. It's been around for some time, but difficult to spot due to other errors (especially in the summer months). I have mainly just been running the Beta WUs (for half the normal Long WU credits), and recently only came across this issue the once on a Linux system. If it's just being caused by Noelia WUs then we need a mechanism to allow crunchers to choose not to run these WUs; either put the Noelias in the Beta queue or create another queue. It doesn't appear to be affecting some systems, and the Noelia WUs pay the best, so others will want to run these WUs. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,206,655,749 RAC: 261,147 Level Scientific publications	Message 33182 - Posted: 24 Sep 2013 \| 13:41:03 UTC - in response to Message 33178.
	Paul wrote: "If nothing will fix this, I will delete GPUGRID and run another BOINC program."
	Paul, looking at your tasks' stderr.out file, they are full of:
	# BOINC suspending at user request (thread suspend)
	# BOINC resuming at user request (thread suspend)
	which means that your BOINC manager keeps on suspending and resuming the GPUGrid application. This could be the result of improper settings of the BOINC manager and/or Windows. For example:
	1. The CPU project you are running uses too much CPU time. Resolution: limit the CPU usage of those projects so that the GPU projects get a single core per GPU, by setting "On multiprocessor systems, use at most" to 50% of the processor cores (you have a dual-core CPU, so 1 CPU core is 50% on your system) under Boinc Manager (Advanced View) / Tools / Computing preferences / processor usage tab.
	2. The BOINC manager "throttles" its applications. This setting is used to limit the heat generated by the CPU, but it also throttles the GPU applications by mistake. Resolution: go to Boinc Manager (Advanced View) / Tools / Computing preferences / processor usage tab / use at most 100% CPU time.
	3. The BOINC manager is not using the GPU while you are using your computer. Resolution: go to Boinc Manager (Advanced View) / Tools / Computing preferences / processor usage tab / check the "While the computer is in use" and the "Use GPU while the computer is in use" checkboxes. If you play games which need the GPU, you should put them on the list in the "Exclusive Applications" tab.
	4. Windows power management limits the time while your computer is "awake". Resolution: go to Start / Control Panel (large icon view) / Power options / change current scheme settings / Put the computer to sleep: Never.
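	A quick way to check whether a task is affected is to count those two marker lines in a saved copy of its stderr output. The Python sketch below assumes the text has already been copied from the task's result page into a string; the helper name `count_suspend_cycles` and the sample text are made up for illustration, and only the two marker strings come from the log lines quoted above.

```python
SUSPEND = "# BOINC suspending at user request (thread suspend)"
RESUME = "# BOINC resuming at user request (thread suspend)"


def count_suspend_cycles(stderr_text: str) -> dict[str, int]:
    """Count suspend and resume markers in a task's stderr output."""
    lines = [line.strip() for line in stderr_text.splitlines()]
    return {
        "suspends": lines.count(SUSPEND),
        "resumes": lines.count(RESUME),
    }


# Hypothetical sample: a real stderr.out contains many other lines as well.
sample = """\
# BOINC suspending at user request (thread suspend)
# BOINC resuming at user request (thread suspend)
# BOINC suspending at user request (thread suspend)
# BOINC resuming at user request (thread suspend)
# BOINC suspending at user request (thread suspend)
"""

counts = count_suspend_cycles(sample)
print(counts)
```

	A handful of these lines is normal if you suspend BOINC by hand now and then; a long, regular run of them is a hint that one of the settings listed above is pausing the task automatically.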

TJ Send message Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level Scientific publications	Message 33187 - Posted: 24 Sep 2013 \| 22:12:56 UTC - in response to Message 33180.
	On my 660 the GPU clock was down-clocked twice today. The WUs were beta and Santi. My 660 has the most problems with Santi and almost none with Noelias. I regret that I bought two 660s during the summer, as my 770 is running fine with all WUs and very few errors. The card was more expensive but absolutely no frustration to run, while the 660 is frustrating me several times a day. They will be replaced by 7xx cards before the year is over. ____________ Greetings from TJ

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33191 - Posted: 24 Sep 2013 \| 22:36:04 UTC
	I have deleted gpugrid from my computer. I suspended all runs except for the GPU. It runs, but the remaining time keeps increasing. Never more, never more.

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33213 - Posted: 26 Sep 2013 \| 9:19:07 UTC - in response to Message 33182.
	Will try 50% processor and 100% CPU. Will let you know how this works.

Paul Send message Joined: 25 Apr 13 Posts: 26 Credit: 219,745,553 RAC: 315,418 Level Scientific publications	Message 33226 - Posted: 27 Sep 2013 \| 19:18:22 UTC - in response to Message 33182.
	Had my first GPU completion in quite awhile after the CPU & processor usage change ... Thanks !

dskagcommunity Send message Joined: 28 Apr 11 Posts: 456 Credit: 817,865,789 RAC: 0 Level Scientific publications	Message 33638 - Posted: 27 Oct 2013 \| 11:47:45 UTC Last modified: 27 Oct 2013 \| 11:48:44 UTC
	Sooo, what's the official solution for this? The problem still seems to remain on the short queue here on a 560 Ti 448-core edition. Luckily I can switch this card to long runs only, but I want to set it to both queues again later so I don't lose any publication badges ;) Restarting the machine after every short unit on an unattended machine is no solution ;) ____________ DSKAG Austria Research Team: http://www.research.dskag.at
	ID: 33638 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 33639 - Posted: 27 Oct 2013 \| 12:02:18 UTC - in response to Message 33638.
	What drivers are you using? It looks like your computer is using 301.42 from May 2012. I'd recommend trying the most recent 331.58 drivers, and if those still don't work, possibly try 314.22 (which was released before they created the 320 branch and broke some stuff). Believe it or not, new drivers can actually fix things!
	ID: 33639 \| Rating: 0 \| rate: / Reply Quote

dskagcommunity Send message Joined: 28 Apr 11 Posts: 456 Credit: 817,865,789 RAC: 0 Level Scientific publications	Message 33640 - Posted: 27 Oct 2013 \| 12:47:23 UTC
	Hmm, OK. I was only wondering because TJ has the same problem and updated the drivers, but it still fails. ____________ DSKAG Austria Research Team: http://www.research.dskag.at
	ID: 33640 \| Rating: 0 \| rate: / Reply Quote

TJ Send message Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level Scientific publications	Message 33657 - Posted: 28 Oct 2013 \| 23:15:33 UTC - in response to Message 33640.
	Hello dskagcommunity, I need to tell you that I haven't run any short WUs in the last 30 days; I don't dare to. Later this week, when I am at the computer with the 660, I will run a few short ones and let you know how they do. ____________ Greetings from TJ
	ID: 33657 \| Rating: 0 \| rate: / Reply Quote

dskagcommunity Send message Joined: 28 Apr 11 Posts: 456 Credit: 817,865,789 RAC: 0 Level Scientific publications	Message 33667 - Posted: 29 Oct 2013 \| 19:23:43 UTC
	Oh, I read it like that because you answered me O.o OK then, I will upgrade the driver ^^ ____________ DSKAG Austria Research Team: http://www.research.dskag.at
	ID: 33667 \| Rating: 0 \| rate: / Reply Quote

TJ Send message Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level Scientific publications	Message 33681 - Posted: 30 Oct 2013 \| 16:49:38 UTC - in response to Message 33667.
	Hello dskagcommunity, as promised, here is my report on the SRs. The first one finished okay, but with this "result" in the stderr:
	# GPU 0 : 65C
	SWAN : FATAL : Cuda driver error 716 in file 'swanlibnv2.cpp' in line 1963.
	# SWAN swan_assert 0
	# GPU [GeForce GTX 660] Platform [Windows] Rev [3203] VERSION [55]
	# SWAN Device 0 :
	# Name : GeForce GTX 660
	That means the clock has downclocked to half, so processing is very slow now. The PC had run for ten and a half days with LRs and no such "error". To me the SRs are a pain on the 660. However, 331.40 is no longer the latest beta driver. I do hope your SRs do well. If you have the time and are willing, you could try the latest drivers. Good luck. ____________ Greetings from TJ
	ID: 33681 \| Rating: 0 \| rate: / Reply Quote
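
A minimal sketch of how the stderr signature quoted in the post above could be spotted automatically, for example on an unattended machine. The regular expression is modelled only on the single line quoted in that post; the function name `find_swan_fatal_errors` is hypothetical, and other SWAN messages may be formatted differently.

```python
import re

# Pattern modelled on the one stderr line quoted in the thread (assumption:
# other SWAN fatal messages follow the same layout).
SWAN_FATAL = re.compile(
    r"SWAN\s*:\s*FATAL\s*:\s*Cuda driver error (\d+) in file '([^']+)' in line (\d+)"
)


def find_swan_fatal_errors(stderr_text):
    """Return (error_code, source_file, line_number) for each fatal SWAN line."""
    return [
        (int(code), src, int(line))
        for code, src, line in SWAN_FATAL.findall(stderr_text)
    ]


# Sample text copied from the post above.
sample = (
    "# GPU 0 : 65C SWAN : FATAL : Cuda driver error 716 in file "
    "'swanlibnv2.cpp' in line 1963. # SWAN swan_assert 0"
)
print(find_swan_fatal_errors(sample))
```

If the returned list is non-empty for a finished task, that task hit the error TJ describes, and the card may be worth checking for a reduced clock.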

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 384,933 Level Scientific publications	Message 33773 - Posted: 4 Nov 2013 \| 16:34:38 UTC
	I have had a problem for a few weeks. When I start BOINC Manager, GPUGRID starts to work. If I then run the Chrome browser, a movie player, GPU-Z or YouTube, the whole computer slows down as if in slow motion. If I manage, with the slowed-down mouse, to click on BOINC Manager and pause only GPUGRID, everything starts working normally again. I have been having this "fun" for a few days now. Sometimes it runs without going into slow mode; once it does, only a reboot or reset fixes it. Right after a reboot, Windows, BOINC Manager and GPUGRID all start normally. This is a clean install of Windows 8.1 with the NVIDIA GeForce R331 driver. After starting certain applications (Chrome, PotPlayer ..) the slowdown starts again. But it has also happened without launching any program. I have checked everything, and since this is almost a clean install, I think the problem lies with GPUGRID, the NVIDIA drivers and some acceleration in programs that run at the same time as GPUGRID. It would be quicker to explain if I shot a video of it. For a long time GPUGRID tasks ran perfectly and this problem only started recently, so the solution will be hard to find. I swapped the HDD and SSD to check whether it is really a hardware problem; it is not. It is strange: for months I did not get a single BSOD from faulty GPUGRID work or the other problems the forum is full of :) but now I get this slow-motion problem. I think it is some conflict between GPUGRID, the NVIDIA driver and programs that use graphics acceleration, because another project that runs on the CPU at the same time causes no problem.
	ID: 33773 \| Rating: 0 \| rate: / Reply Quote

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 33775 - Posted: 4 Nov 2013 \| 18:06:48 UTC - in response to Message 33773.
	You are running Noelia Ins1p units on your 275 card, which doesn't have enough memory; this will cause what you describe. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline
	ID: 33775 \| Rating: 0 \| rate: / Reply Quote
