Message boards : Number crunching : NOELIAs are back!
Just got one. Glad I have a 1GB GPU 'cause it's using 874MB of its memory!
ID: 29705
Have been getting many. None completed as yet, but so far no problems. Are these the same type as before?
ID: 29706
These new units are using really low CPU power. The GPUs are much cooler as well, and the processing times seem to be very long. They're not using all the hardware.
ID: 29708
After over 4 hours of execution, with 12 hours still estimated to completion, I aborted one. The second one is showing the same issue: the estimated time to completion rises with almost every second of execution.
ID: 29709
I suspended my Nathan WU's to have a look at these NOELIA_klebe WU's:
Temperature 67°C, GPU usage 94% (with CPU tasks suspended), fan speed 77%, core clock 1202MHz. CPU usage is fairly low; it looks like ~44% of the GPU's runtime.
ID: 29711
Just finished my first NOELIA_klebe workunit on my GTX 670 (W7)! Nothing unusual: runtime 34,480.037 s, credit 127,800.00, which is by the way similar to NOELIA_PEPTGPRC (29,748.680 s, 113,250.00), the last/oldest NOELIA WU I was able to find in my records (after that I only had NATHANs). About the same credit/time ratio within the NOELIAs, and hey, it's not all about credits, so there's no reason to abort just because NATHANs do better on credits; within GPUGRID it will level out. So I am quite happy with these new NOELIAs, as long as the WUs don't start to crash down the road.
ID: 29712
I have a short-run NOELIA klebe in progress on a 4-core AMD with 2 boincsimap tasks and wuprop running. CPU use is low in comparison to NATHAN, varying between 15% and 100% depending on the core, but the CPU is never saturated.
ID: 29714
After 9 hours running, only 18% complete. Remaining time is still climbing.
ID: 29715
It might be better, or even necessary, to go for straight usage of one full core, as the Nathans do. Freeing half a core at such a performance cost (on fully loaded systems) doesn't seem worth it.
ID: 29717
It might be better, or even necessary, to go for straight usage of one full core, as the Nathans do. Freeing half a core at such a performance cost (on fully loaded systems) doesn't seem worth it.

Agreed.

148px47x4-NOELIA_klebe_run-0-3-RND5398_0 4431950 6 May 2013 | 20:26:40 UTC 7 May 2013 | 9:18:47 UTC Completed and validated 36,452.79 16,142.69 127,800.00
I32R6-NATHAN_dhfr36_5-21-32-RND2187_1 4426158 5 May 2013 | 23:09:30 UTC 6 May 2013 | 8:22:25 UTC Completed and validated 18,944.86 18,944.86 70,800.00

I went into app_config and set the cpu_usage for GPUGrid to 1.0 to see what impact this had on a running NOELIA_klebe WU, with CPU usage set in Boinc Manager to 75% (the most I typically use with 2 GPU's in the system): GPU power 88%, temperature 65C, GPU usage ~90% (still with some variation), fan speed 74%, core clock 1189MHz (this dropped yesterday from 1202MHz and hasn't risen since). There is still a 4% loss, going by GPU usage, and probably more if I ran an entire WU with these settings. I also suspended and restarted Noelia's task, and closed and reopened Boinc (using snooze GPU) without problems. My WU returned in just over 10h, as expected, but it might have dipped below 10h if I hadn't been suspending and restarting.

With SWAN_SYNC set to 0 (and still using app_config) it didn't make much difference: GPU power 88% (no change), temperature 65C (no change), GPU usage ~90%, still with some variation but very occasionally rising to 94% and staying there for a few seconds, fan speed 74%, core clock 1189MHz (no change). Actual progress was about the same: 1% after just over 6 min. I got a driver error when I exited Boinc, which suggests to me that the WU isn't closing down gracefully. Maybe Boinc is to blame there.

I did a restart and removed the app_config file, but left SWAN_SYNC in place. No change. I opened the init_data.xml file in the slot the WU was running in and it's still set to <ncpus>1.000000</ncpus>. So there's that theory confirmed: once <ncpus> is set, it's fixed for the duration of the run. When I suspended the WU I got another driver restart.

Before I previously restarted I had set <max_concurrent>2</max_concurrent> in app_config (thinking this would limit the GPUGrid WU cache to 2). Then I removed the app_config file from the project directory, aborted 2 WU's and restarted. I now have 4 WU's in the queue, but only 1 NVidia GPU. Either the number of WU's has been fixed in relation to time by app_config (1 day - unlikely, given that the WU's are 10h each and that is what Boinc is reporting) and you have to keep using app_config (or do a project reset to properly flush it out of the system), or it's in some way related to the GPU count and ignores the fact that the other GPU is an ATI. Either way, having to do a project reset just to make configuration changes is far from ideal.

Anyway, if anyone is planning to do a project reset because they were using app_config, I suggest you set the project to 'No new tasks', finish any running WU's and abort any queued WU's, then do the project reset. The project doesn't resend tasks after a project reset, so by aborting the unstarted WU's they won't be in limbo until they time out, and they can be resent early (better for the project). If you don't, they will appear as In Progress online for two weeks.
____________
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
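(For reference, a minimal app_config.xml for the settings described above might look like the sketch below. The app name is an assumption - check the <name> fields in client_state.xml for the real one - and note that SWAN_SYNC is a system environment variable set separately, not an entry in this file.)

  <app_config>
    <app>
      <name>acemdlong</name>             <!-- assumed app name; verify in client_state.xml -->
      <max_concurrent>2</max_concurrent> <!-- the cache-limiting experiment described above -->
      <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>1.0</cpu_usage>       <!-- reserve a full CPU core per GPU task -->
      </gpu_versions>
    </app>
  </app_config>

(Place it in the GPUGrid project directory and restart Boinc, or have the Manager re-read config files. As noted above, a task that has already started keeps the <ncpus> value in its slot's init_data.xml for the duration of the run.)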
ID: 29719
Hi, everyone:
ID: 29720
By running fewer than 3 CPU tasks on your triple-core CPU you should see some improvement in GPU usage (at least for these NOELIA WU's). If you don't run any CPU projects you will see the most improvement, but what you run is a personal choice. I would probably run one CPU WU at a time with your setup, or just not use the CPU; your NVidia is vastly more powerful than your AMD CPU.
ID: 29721
Still computing the first one, but it seems to run normally: 12h on my 560Ti (448 cores) @ 98% GPU load and 2-6% Pentium 4 load :)
ID: 29723
Just got one. Glad I have a 1GB GPU 'cause it's using 874MB of its memory!

Maybe that's why they're locking up my 4 GTX 460 768MB cards. Can Nathan and/or Toni perhaps help with designing these WUs? Please? Please?? Please??? Though they walk through the valley of the shadow of death, my GTX 460s fear no evil EXCEPT these crappy NOELIA WUs. Guess the 4 of them are off to other projects :-(
ID: 29725
If WU's are going to use more than 750MB GDDR, some crunchers need an easy way to select which tasks they run, or better still the server should allocate WU's based on the amount of GDDR the cards have (which might be problematic for people with multiple mixed-series cards). There's no point sending out work that will never complete.
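(Server-side, BOINC's plan-class mechanism can in principle express exactly this. A sketch of what a plan_class_spec.xml entry might look like - element names as per the BOINC server documentation, while the class name and values are made up for illustration, and whether GPUGrid uses this file at all is an assumption:)

  <plan_classes>
    <plan_class>
      <name>cuda42</name>
      <gpu_type>nvidia</gpu_type>
      <cuda/>
      <min_gpu_ram_mb>1024</min_gpu_ram_mb>  <!-- don't send work to cards with less than 1 GB -->
      <gpu_ram_used_mb>900</gpu_ram_used_mb> <!-- what the app is expected to allocate -->
    </plan_class>
  </plan_classes>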
ID: 29729
They've started to crash on all my machines. I've lost about 50 hours of processing since yesterday. Uncool, to say the least.
ID: 29730
They've started to crash on all my machines. I've lost about 50 hours of processing since yesterday. Uncool, to say the least.

There's quite a lot of doom & gloom over this latest batch of NOELIAs. My own experience, on an OC'd GTX 460 1GB, is positive: one completed in under 18 hours with 127,800 credit, and one now running, almost six hours elapsed and 12+ hours remaining.
ID: 29731
They've started to crash on all my machines. I've lost about 50 hours of processing since yesterday. Uncool, to say the least.

An experience of 1 is not much experience. Firehawk is having crashing problems on his very fast GPUs; sure, a few completed there too. They don't seem to run at all on GPUs with under 1GB of RAM. Another user posted above that they will not run on Linux on his 660 Ti. Other people are also having crashes. Why are we inflicted with these WUs without testing or warning? It's ridiculous IMO. It's a waste of our resources and our money.
ID: 29732
They've started to crash on all my machines. I've lost about 50 hours of processing since yesterday. Uncool, to say the least.

I too have had a less-than-ideal experience with these new Noelia workunits. After weeks of successfully completing Nathans, my stable GTX 680 cruncher failed 17 Noelias in a row before I detected the issue and rebooted the box. I am still awaiting expiration of the WU limit timer to confirm whether this cruncher will even be able to complete one of these WUs. I concur with Beyond that this is a very unfortunate waste of resources that might have been avoided with some advance notification.
ID: 29733
I have had two NOELIA tasks complete successfully, with two more halfway through... fingers crossed.
ID: 29734
Up to now, 4 of them on Linux on a 660Ti without problems; the fourth one is about to finish, taking between 11 and 12 hours. Lower PPD than the Nathans, but better for the summer as they stress the GPUs less :)
ID: 29735
Have processed 40 Noelias so far without error. I've had 15 Nathans fail in the last 7 days!
ID: 29736
I've completed 21 NOELIA's so far and have had 0 errors.
ID: 29737
An experience of 1 is not much experience.

65 recently-reported successes is an experience!!
ID: 29738
An experience of 1 is not much experience.

So because they run for maybe half the people (mainly XP and Linux by the looks of it, although I did have 1 finish on Win7/64 so far), all is OK with you? I remember VERY well how loudly you screamed when YOU were having problems with some WUs.
ID: 29739
Guys, let's be fair here: the issues with this run seem to be fewer than with previous Noelias. Whether the job was fit for the production queue is up for debate, but don't assume they didn't do any internal testing just because errors happen.

swanMakeTexture2D failed -- array create failed
Assertion failed: a, file swanlibnv2.cpp, line 59

That looks like the creation/allocation of some array, which might indeed be due to running out of memory. BOINC should prevent this, but the amount of memory needed has to be reported properly by the WU, otherwise BOINC can't do anything about it. I'll forward this to the devs.

@John: your CPU has 2 cores disguised as 4. As SK said, I'd first try reducing the number of CPU tasks and see how GPU performance improves. Once you've got those numbers you can still decide whether this trade-off is worth it for you.

MrS
____________
Scanning for our furry friends since Jan 2002
ID: 29740
Hey guys, I wasn't trying to diminish the problems you are having, and I apologize for coming off wrong.
ID: 29741
Well remembered, mate. I had to roll back to this driver because the newer ones had much worse temperature control, but since these new units run cooler, I may give it a try before changing projects. Thanks for the heads-up.

Update: this seems to have done the trick on the AMD 6x690 machine, which was suffering from poor performance on half of the GPUs. They are warming up right now, which is always a good sign. Will see about stability in some hours. Thanks again, ETApes.
ID: 29742
I'm presently using 311.06 on W7 x64. I've had a few driver resets and app crashes, but so far these appear close to the start of runs (up to 5%), and if I just close and reopen Boinc, Noelia's WU's restart and crunch away reasonably well.
ID: 29743
I had two failures, and many successful 'NOELIA_klebe_run-0-3's.
ID: 29744
These new units are using really low CPU power. The GPUs are much cooler as well, and the processing times seem to be very long. They're not using all the hardware.

+1
ID: 29745
I had two failures, and many successful 'NOELIA_klebe_run-0-3's.

file swanlibnv2.cpp, line 59 - might be a CUDA bug/app issue. The Cellresize error might be due to OC, or it might be something else/new; I don't recall seeing it before. Probably best to report such errors, just in case.

These new units are using really low CPU power. The GPUs are much cooler as well, and the processing times seem to be very long. They're not using all the hardware.

Yeah, we think these tasks could perhaps do with using more CPU resources (or possibly running at higher priority), but conversely, they are using too much GPU memory for some cards. So: not using enough resources for some, and using too much for others. Making everybody happy is a cinch :))

The reported downclocking may depend on setup and OS/drivers. When I reduce the CPU usage I'm getting reasonably high GPU performance (W7 x64). I'm running 4 climate models (for stability ;p) and two GPUGrid WU's. The GTX660Ti is presently using 88% power, 89% GPU utilization and 848MB GDDR5 (for the bigger equations, perhaps), with the shaders up at 1215MHz. The GTX470 is using 93% GPU and 751MB GDDR (a smaller equation, maybe). Are there larger and smaller equations in use, and are they assigned based on GDDR capacity or Compute Capability (CC2.0 small, CC2.1 or above large)? Perhaps there is simply some variation in the memory usage of these tasks (different WU's use different amounts of memory); I'm seeing 751MB, 801MB and 848MB on different GPU's. Only the 751MB WU is running on a card not used for a display. Again, might be a coincidence or might be something in it? Those with 2 GPU's could check.
____________
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
ID: 29746
MrS, many thanks for the suggestion! I normally avoid changing my crunching rig's config as much as possible once it is stable, but I see that my other rig, which has a mix of 650Ti & 560Ti cards running driver version 314.22, has now completed 4 Noelia WUs. I'll try upgrading the 680 rig as soon as I get home. Do you tend to keep your crunching platforms on the latest drivers, or do you wait for systematic issues to arise before upgrading?
ID: 29747
I like to play it half-safe: wait a few weeks to see if others discover any issues with newer drivers, and if not, I'll upgrade when I feel like it. And I won't use beta drivers for crunching unless there's a very good reason to do so.
ID: 29749
Going straight for the leading edge can be painful in the BOINC world; it's more like the bleeding edge :p Unfortunately, staying still for the sake of 'stability' seems to be just as bloody... I've upgraded to the 314.22 WHQL driver, and my GTX680 platform (ID 87170) is still failing the long-run WUs within 6 seconds, with the following error in the BOINC log:

5/8/2013 10:57:22 | GPUGRID | [sched_op] Reason: Unrecoverable error for task 290px29x1-NOELIA_klebe_run-1-3-RND8276_0 ( - exit code 98 (0x62))

No hardware or software config changes have been made to this platform since the new long-run WUs were queued. I have no clue why my other rig (ID 137898), in which I am constantly changing GPU setups and which runs hotter than the GTX680 platform, is completing long runs successfully, while 87170 can't even initialize the new long runs. Any further insights would be greatly appreciated. Thank you in advance for the help and the patience!
ID: 29757
I've just seen that you're using an anonymous platform on that rig. There's been an app update in the not too distant past, introducing features which hadn't been used by the previous Nathans, but are being used now.
ID: 29760
When Tomba first started this thread, I jumped up and checked my computers and discovered that I had 3 running. I use Precision X to adjust my EVGA cards; the 3 NOELIA's that were running had caused those cards' boost clocks to jump up considerably, along with more frequent voltage spikes. Now that all my cards are running these new NOELIA's, I've had to adjust the voltage and core clock on all the cards back down to where I know they are in a safe range.
ID: 29761
I tested NOELIAs on Ubuntu 13.04 with NVidia 319.17 on a GTX660Ti. Progress through 9 hours of execution of a SHORT RUN showed less than 20% complete. Further tests with long runs indicated a runtime of over 5000 minutes, minimum.
ID: 29762
If you're still running an older app there we'd have an easy explanation.

Yes - I believe you've found the root cause. I adapted an app_info.xml from the forum to run on rig 87170 in August last year, to permit execution of two low-utilization Paola 3EKO WUs at a time. The cudart32_42_9.dll, cufft32_42_9.dll and acemd.2562.cuda42.dll are dated 6/17/2012, and the tcl85.dll is 11/23/2010. I see from my other rig that app updates occurred on 11/4/2012 and 2/25/2013. What's the fastest/simplest resolution - delete the current app_info.xml? Thanks!
ID: 29767
Yes, delete the app_info file and restart Boinc.
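(For context: with an anonymous platform, the client only ever runs the exact app versions listed in app_info.xml, which is why those 2012-dated files kept being used regardless of the project's updates. A stripped-down illustration of such a file - the app name and version number here are guesses, and the file name is taken from the post above:)

  <app_info>
    <app>
      <name>acemdlong</name>                       <!-- assumed app name -->
    </app>
    <file_info>
      <name>acemd.2562.cuda42.dll</name>           <!-- the stale 2012 binary -->
      <executable/>
    </file_info>
    <app_version>
      <app_name>acemdlong</app_name>
      <version_num>616</version_num>               <!-- assumed; whatever was current in mid-2012 -->
      <plan_class>cuda42</plan_class>
      <file_ref>
        <file_name>acemd.2562.cuda42.dll</file_name>
        <main_program/>
      </file_ref>
    </app_version>
  </app_info>

(Deleting the file and restarting returns the host to the project-supplied applications, at the cost of re-downloading them.)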
ID: 29774
Yes, delete the app_info file and restart Boinc.

Thank you skgiven & MrS - I'm crunching long runs again!

At present I doubt that you will get a significant improvement running two tasks...

I agree completely. I only ran 2 WUs at once with the Paola 3EKO long runs. IMHO, every GPUGRID long-run WU since has had high enough GPU utilization to justify dedicating the GPU (even a GTX 680) to it alone. And as you can probably tell, I'd really much rather leave my cruncher config alone and let it do its thing :)
ID: 29780
I have one Noelia finished in 147,962.58 seconds on a GTX550Ti with driver 314.7.
ID: 29783
I have recently completed seven NOELIA tasks in an average of 69,589 seconds (19.3h) on my GTX 650 Ti GPUs, driver 314.22. CPU is AMD A10 5800K. Two more tasks are running with about 15h to go.
ID: 29784
Going by that, and I know you both use W7, John's GTX650Ti is 2.12 times as fast as TJ's GTX550Ti. I don't know the CPU usage on both systems though, and that's important.
ID: 29787
Going by that, and I know you both use W7, John's GTX650Ti is 2.12 times as fast as TJ's GTX550Ti. I don't know the CPU usage on both systems though, and that's important.

I use only x% of the CPU, so that 1 core is always free. The GTX560Ti WU is using 0.49 CPU and the GTX285 WU is using 0.571 CPU. By the way, both are Vista-driven. My W7 machine has ATI cards, which I will replace with an nVidia 680 or 690 as soon as I have some money left.
____________
Greetings from TJ
ID: 29789
Already computed 9 units, no problems so far :) - 42,500 secs on my (new 24/7) 570 and 45,000 on the 560Ti (448 cores).
ID: 29791
Going by that, and I know you both use W7, John's GTX650Ti is 2.12 times as fast as TJ's GTX550Ti. I don't know the CPU usage on both systems though, and that's important.

My 3 650 Ti GPUs are running 59,400 - 64,900 seconds/WU, OCed. The 4 GTX 460s are unfortunately off to other projects until the NOELIAs go away, so I'm 3 for 7...
ID: 29792
We are looking into it.
ID: 29811
We are looking into it.

Since May 8th I have done 3 Noelia's, which all finished without error. Only my GPU's are slow, so it took almost 2 days to complete one WU. And I have seen other PC's that finish these Noelia's, so it could be something in the driver, hardware, BOINC or the OS? So perhaps the algorithm Noelia is using is okay. I don't mind running these Noelia's, especially if they are important for the project.
____________
Greetings from TJ
ID: 29815
Ran a few Noelias recently on my 660 Ti; no problems finishing them. I did have the issue with drivers crashing after suspending the WU, but just now I suspended the GPU to do some gaming and it didn't crash. Odd.
ID: 29816
Just finished my first Noelia on a newly acquired 660Ti. 10 hours to complete, no problems on Linux Mint, 127,800 credits. Will be rebooting to XP soon to see how they run there.
ID: 29817
I have had one Noelia WU failure and 17 WU's complete successfully. For me that's a slightly higher success rate than with Nate's less challenging WU's. Though I should add that I stopped using another CPU core, to improve their performance.
ID: 29818
Problem for me: much too long; it won't finish in time on my 470GTX. :(
ID: 29849
No problems at all for me with the Noelia's - 17 have gone through without a hitch - then again, I am not overclocking my systems to their limit.
ID: 29850
Not overclocking at all, but this is obviously not the point. :(
ID: 29852
Problem for me: much too long; it won't finish in time on my 470GTX. :(

Your GTX 470 should be fast enough to complete a NOELIA_klebe workunit in time (about 12 hours); it takes 10h45m ~ 11h45m to process them on my (slightly overclocked) GTX 480. So the problem is at your end: try restarting your host, or try not crunching any CPU tasks on your P4.
ID: 29854
Thanks, but I'm afraid that will not help; this is apparently not a performance issue.
ID: 29855
Now the very same WU that was in progress a couple of minutes ago has also ended in error.
ID: 29856
There is a problem on some Linux operating systems; they seem to take forever to complete. I think it's the more recent versions of Linux that are affected, but not all. It's possible something is missing in the drivers, or in Linux, that prevents the correct use of the drivers; missing libraries.
ID: 29858
Finished my first Noelia without errors on XP running a 660Ti. It took about 35 minutes longer than the same card on Linux Mint.
ID: 29859
There is a problem on some Linux operating systems; they seem to take forever to complete. I think it's the more recent versions of Linux that are affected, but not all. It's possible something is missing in the drivers, or in Linux, that prevents the correct use of the drivers; missing libraries.

I'm running Debian Wheezy, so yes, a very recent version of the kernel, drivers and libraries. And if I try to install « glibc-2.13-1 » (containing the libpthread.so.0 mentioned in the error message), apt tells me that « libc6 » is already installed instead. So yes indeed, it might be that the choice of library used to compile the Linux application is not compatible with the latest versions (but I'm no developer, so that is just an assumption).
ID: 29861
Finished my first Noelia without errors on XP running a 660Ti. It took about 35 minutes longer than the same card on Linux Mint.

While you have only run one of each task type on Linux and XP, it looks like Linux Mint (3.5.0-17-generic) is ~5% faster (4.5% for the Nathans and 5.8% for the Noelias):

Linux
306px37x2-NOELIA_klebe_run-1-3-RND7661_1 4440201 10 May 2013 | 20:53:39 UTC 11 May 2013 | 7:16:08 UTC Completed and validated 36,653.31 16,521.36 127,800.00
I40R14-NATHAN_dhfr36_6-10-32-RND5144_0 4440304 10 May 2013 | 20:53:39 UTC 11 May 2013 | 12:08:30 UTC Completed and validated 17,866.73 17,627.17 70,800.00

XP
306px2x1-NOELIA_klebe_run-1-3-RND0127_0 4442470 11 May 2013 | 19:28:14 UTC 12 May 2013 | 6:21:56 UTC Completed and validated 38,796.59 17,414.91 127,800.00
I12R11-NATHAN_dhfr36_6-13-32-RND4528_0 4442251 11 May 2013 | 19:29:37 UTC 12 May 2013 | 11:33:18 UTC Completed and validated 18,676.30 18,565.16 70,800.00

All 'Long runs (8-12 hours on fastest card) v6.18 (cuda42)'.

That's more than I thought it would be (I expected 1%, possibly 3%). There might be some task variation, but running on the same system makes this a very solid comparison.
____________
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
ID: 29880
I2HDQ_17R4-SDOERR_2HDQc-1-4-RND1951_0
ID: 30024
Is there any conspiracy behind it or just incompetence?

Neither. Noelia is not to blame; she's just the first to use new features which the project needs in the future.

MrS
____________
Scanning for our furry friends since Jan 2002
ID: 30026
Every time my RAC gets to about 620k, some bad NOELIA jobs come in... and it's not the first time I've complained about this problem when I'm at about 600-620k RAC. This is exactly the third time I've had a problem with NOELIAs just as I reached 620k RAC. Amazing, Mr. Scientist. ---------- And that's your answer, Mr. Moderator?
ID: 30031
It's obviously an international conspiracy to keep your RAC low. We're all involved and participate in LJRAC (Lowering Jozef's Recent Average Credit). BTW, the checks are in the mail...
ID: 30038
Well, I have two systems with nVidia cards, slow ones though. However, they do the Noelia's without problems so far, taking between 2 and 3 days. The systems are stable, nothing is overclocked, and I don't run the latest BOINC or drivers. If it works, then I leave it as is; if not, I'll try updating the video drivers. One CPU core is always kept free; that seems to be important.
ID: 30039
If it works, then I leave it as is.

Always the best advice. Unfortunately I test stupid problems and get errors for my efforts. Today, while testing something and looking into another issue/fix, I had to suspend WU's. This caused the driver to restart two or three times, and then I got a blue screen. On reboot, lots of C++ errors, and all my running WU's crashed and burned. Not an issue if I had been running FightMalaria@home, but I was running 5 climate models and lost several hundred hours - scunnered! Possible fix here - works for me.
____________
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
ID: 30047
I was running 5 climate models and lost several hundred hours - scunnered!

Ouch.
ID: 30049
I have read here in the forum many times that suspending a GPUGRID WU will cause errors and blue screens. That is why I have never tried it. For Albert and Einstein at home it can be done without harm (in my case).
ID: 30054
Hi all,
ID: 30058
I think it's pretty safe to say that with the current Noelias, suspending a WU almost certainly triggers a driver reset. For me this has taken down a few hours of Einstein work, twice. Now I do my testing only when I have no other WUs running. Not ideal, but better than the alternative.
ID: 30060
The suspend-restart blue screen has never happened to me, and I suspend quite often (Windows XP Pro x64). Maybe it's an OS-specific issue. I also have my checkpoints set to 900 seconds (15 minutes); I did this mainly for the climate models I run. I do have problems when finishing an SDOERR and starting a NOELIA on the same GPU: no crashes, just the card running wild on the GPU clock.
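(If anyone wants to copy that checkpoint spacing: it's BOINC's disk-write interval, settable in the Manager's computing preferences or via a global_prefs_override.xml in the BOINC data directory. A minimal sketch - the 900-second value is from this post, the rest is just the standard skeleton of that file:)

  <global_preferences>
    <!-- ask running tasks to write checkpoints to disk at most every 900 seconds -->
    <disk_interval>900.000000</disk_interval>
  </global_preferences>

(Restart Boinc, or have the Manager re-read the local preferences, for it to take effect; apps still only checkpoint at the boundaries their science code allows.)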
ID: 30075
I've only seen the 'suspend & crash' problem on W7. Seeing as different OS's handle the drivers differently, it's bound to be OS-related.
ID: 30088
I've only seen the 'suspend & crash' problem on W7.

Add W8 to that!

MrS
____________
Scanning for our furry friends since Jan 2002
ID: 30090
This caused the driver to restart two or three times, and then I got a blue screen. On reboot, lots of C++ errors, and all my running WU's crashed and burned. Not an issue if I had been running FightMalaria@home, but I was running 5 climate models and lost several hundred hours - scunnered!

Sometimes I think it is not necessarily the GPUGRID WUs which cause the bluescreen; it might also be the CLIMATEPREDICTION.NET WUs. I just had a bluescreen around the same time as you, and afterwards one of the CLIMATEPREDICTION.NET WUs did not work anymore while the GPUGRID one continued. However, it is mostly on a system with a GTX 570 card.
ID: 30098
The climate models (CPDN) are very, very sensitive to any kind of interruption. When I set my checkpoints to every 15 minutes, my computation error rate dropped by 70%, and if I do 3 or more suspend/restarts within 10 minutes, I'll get at least 1 error.
ID: 30100
The climate models (CPDN) are very, very sensitive to any kind of interruption. When I set my checkpoints to every 15 minutes, my computation error rate dropped by 70%, and if I do 3 or more suspend/restarts within 10 minutes, I'll get at least 1 error.

I've been wondering about CPDN, because the people reporting crashes often mention that they lose CPDN work. I'm not running that project and have also never had any hard crashes, nothing but some ACEMD errors on certain WU types. Nothing else running on the machine is ever affected.
ID: 30115
Hi all,
ID: 30286
My GTX 650Ti is currently working on a NOELIA, but the GPU utilization looks pretty low: elapsed 10h, remaining 17h. That will be a total of 27 hours! A previous SDOERR took 18h.

On my 650 Ti GPUs (and others) the GPU utilization runs 5-6% lower on NOELIA and NATHAN_KID WUs than on SDOERR WUs. (Win7-64)
ID: 30293
Is Prefer Maximum Performance 'presently' selected (as in, did you set it since you last rebooted)?
ID: 30294
PS. Finger pointed at CPDN (no system or WU failures since I stopped crunching climate models)!

I suspected as much. Maybe a conflict between the apps?
ID: 30298
Possibly, or with Boinc, but on at least 2 occasions it/something has caused a blue screen/restart, which would have prevented Boinc and the running apps from closing down properly. Most likely it was the CPDN app/WU's that caused Windows to fail, and the GPUGrid WU failures were just coincidental. I had presumed it was the GPUGrid app that had failed, triggering everything else to fail, because of the startup error messages and logs. I didn't think it was the CPDN apps, as several models had been running for several days. Since I stopped running the CPDN WU's, I've had no problems...
ID: 30311
I run CPDN too and had BSoD's a while back, and I traced my problem to USB 3.0. I uninstalled the drivers, rebooted, went into the BIOS and turned off USB 3.0. That was 4 to 5 months ago and it seems to have fixed it; maybe CPDN and the new USB don't get along. I figured the drivers weren't mature enough. I don't need it, so it's no big deal for me (knocking on wood).
ID: 30312
Is Prefer Maximum Performance 'presently' selected (as in, did you set it since you last rebooted)?

I don't use the Nvidia tool (nvidia-settings in Linux), as this is a headless machine, so effectively everything is 'stock'. I aborted the NOELIA, as it was going to take waaaay too long (something like 4 days!) and I didn't want to risk a midway - or worse - failure. I am crunching a NATHAN_KID right now at full speed, the GPU at 60-something degrees and a whole CPU core consumed. Total estimated runtime ~22h. Bottom line: it has to be something with the NOELIAs, at least some of them. Here is the WU discussed.
ID: 30314
There's a discussion about 319.17 performing poorly for someone else running Ubuntu Server 12.04 x86_64. Going back to 310.44 fixed it for him.
ID: 30319
I also saw a driver restart with Einstein (a couple of weeks ago) and with a POEM WU yesterday (Vista rig). So it's common to many NVidia apps and WDDM OS's. The reg fix I posted has thus far prevented the driver restarts, but not the BSOD/restarts. So, two different issues.
ID: 30326
My last task on the GTX285 was a NATHAN and took 196,443.98 seconds. That is 10,000 seconds more than a NOELIA on the same card.
ID: 30328
My GTX650TiBoost finished a NOELIA_klebe in 13h 51min (49,849 sec). Ubuntu 13.04, NVidia 304.88, Boinc 7.0.27.
ID: 30337
TJ, you really should sell that GTX285 heater and get something new (cheap to buy, much faster, less expensive to run). A GTX650Ti would more than triple the performance, a GTX650TiBoost would almost quadruple it and a GTX660 would be around 4.5 times as fast.

I did. That is why I wrote 'the last task': it's out of the rig and the new GTX660 is in. Running MilkyWay now for testing, but it seems slower to me, as a WU takes around 8 minutes to finish where it was 6 minutes on the old GTX285, though the WU's are not the same. But everything seems slow, even browsing; I posted about this under the cards thread. Indeed a heater - the GTX660 is cooler, now 54°C.
____________
Greetings from TJ
ID: 30344
MilkyWay requires FP64 (double precision). It's a bad project for GK104 cards.
ID: 30348
Just a quick question: I run an 8600 GT and a 9800 GT and some 8400s (never dump old gear) - not on GPUGRID but on PRIMEGRID - and I am still thinking of buying something from a GTX 660 up to a GTX 770 in the near future, but looking at my electric bill from last month (roughly USD 350.00) I was discouraged from investing in an additional card. So, as you were discussing the GTX 285 at this very moment: do you think it would pay off to buy a GTX 650Ti and dump the 9800 GT and the 8600 GT? The latter is causing trouble, at about 10,000 credits each day. The 8400s had just been lying around, so I thought why not put them in some computers.
ID: 30357
After a quick look I think that PG's credits are roughly comparable to here, and a 650Ti is a decent enough GPU for here.
ID: 30360
I second SK's suggestion. And you wouldn't have to throw those Core 2 Duos away: they're still decent surf stations / office boxes, especially if equipped with 4 GB RAM and an SSD.
ID: 30385
If the $350/month only came down by ~$50, a GTX650Ti would pay for itself in a couple of months, and a 660 would probably pay for itself within 3 or 4 months, just by getting rid of the old cards' running cost.

That's what I was thinking: dismiss all the old cards, replace them with a GTX 650 Ti, and save enough on the electric bill to justify the investment.

An 8400GS has a TDP of between 25W and 38W, so there isn't a lot of electricity being used per GPU.

Does about 3,371 credits a day in PG, which is roughly 88 credits per watt.

The 8600GT only has a TDP of 43W, so again there isn't much of a saving from one, but neither of these cards can do much crunching anyway.

Around 10,000 credits a day in PG, roughly 232 credits per watt.

The 9800GT however varies between 59W and 125W, depending on the model (?)

Around 25,000 credits a day in PG, roughly 200 credits per watt.

I see you have two 8400GS GPU's running at PG, along with an 8600GT and a 9800GT. These are bringing you a RAC of <35K for ~(40 or 65 + 35 + 60 to 110)W = 125W to 210W (depending on the models). Obviously it's your choice what you crunch with and who you crunch for, but if you pulled those GPU's and ran a GTX650Ti here, you could get a RAC of ~190,000, and for ~100W. So you would be saving some electricity (between 35W and 120W) and earning more than five times the credits. Presuming your 9800GT has either a 105W or a 125W TDP (it's not a GE version), the electric saving would be at least 50W.

That got me thinking: my GTX 570 (factory overclocked) has a TDP of 218W and gives around 250,000 credits a day... OK, you can't throw it away, because of the grey energy after 2 years of operation... so it should be a GTX 650 Ti 2GB, and not a Boost, because of the TDP?

If you want to save further on the electric front, get rid of your E6750 system, and possibly your E8500. Even if you just stopped using those CPU's to crunch, you would save as much as you would from getting rid of an old GPU.

I second SK's suggestion. And you wouldn't have to throw those Core 2 Duos away: they're still decent surf stations / office boxes, especially if equipped with 4 GB RAM and an SSD.

PG, Einstein and Collatz are just a side show; my real interest is in climateprediction.net, GPUGRID (skgiven, there we have the same interests) and, to some extent, MalariaControl. So I thought I would use all the gear I have for BOINC, but the electric bill made me think... The two Core 2 Duos will come in when there is climateprediction work around in the future. MrS, you are right, the two Core 2 Duos are just reserve computers; if I have some practicants or other helpers who need a computer, they work well for them. Finally, on the bluescreens topic: I still think climateprediction.net goes along very well with GPUGRID.
ID: 30437