all WUs downloaded recently produce "computation error" right away
Author | Message |
---|---|
this happens since about 20 minutes ago: | |
ID: 46856 | Rating: 0 | rate: / Reply Quote | |
Same here, all units are failing with this error message: | |
ID: 46857 | Rating: 0 | rate: / Reply Quote | |
Same here: http://www.gpugrid.net/workunit.php?wuid=12499116 Five WUs failed on one computer; it has probably hit its daily quota and is disabled now. | |
ID: 46858 | Rating: 0 | rate: / Reply Quote | |
thanks, folks, for the quick replies. | |
ID: 46859 | Rating: 0 | rate: / Reply Quote | |
Hi, | |
ID: 46860 | Rating: 0 | rate: / Reply Quote | |
Same thing here, quota maxed. | |
ID: 46861 | Rating: 0 | rate: / Reply Quote | |
Hi! | |
ID: 46862 | Rating: 0 | rate: / Reply Quote | |
I too am getting "Computation Errors" on all my GPUGrid tasks, currently.
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -44 (0xffffffffffffffd4) Unknown error number
Stderr output:
<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)
</message>
]]> | |
ID: 46863 | Rating: 0 | rate: / Reply Quote | |
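The two hexadecimal values in the report above are the same exit status, -44, printed once as an unsigned 64-bit number and once as an unsigned 32-bit number. A minimal Python sketch of that conversion follows; the function names `as_unsigned_hex` and `as_signed` are made up for illustration and are not part of BOINC.

```python
def as_unsigned_hex(code: int, bits: int = 32) -> str:
    """Render a (possibly negative) exit code as unsigned two's-complement hex."""
    mask = (1 << bits) - 1
    return hex(code & mask)


def as_signed(value: int, bits: int = 32) -> int:
    """Interpret an unsigned value of the given width as a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


if __name__ == "__main__":
    print(as_unsigned_hex(-44))      # 32-bit view of the exit status
    print(as_unsigned_hex(-44, 64))  # 64-bit view of the same status
    print(as_signed(0xFFFFFFD4))     # back to the signed exit code
```

Reading the value back with `as_signed` is handy when a log only shows the hexadecimal form and the FAQ lists the signed code.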
I too am getting "Computation Errors" on all my GPUGrid tasks, currently.
Yes, this is the remarkable thing - the error occurs on ANY type of task. No idea why :-( We can only hope that someone from GPUGRID notices this problem ASAP. Whether it can be solved quickly is another question. | |
ID: 46864 | Rating: 0 | rate: / Reply Quote | |
Same issues as listed above, regardless of long/short WUs. Hope we can hear back from the team soon! | |
ID: 46867 | Rating: 0 | rate: / Reply Quote | |
Matt's FAQ: What do the application error codes signify? lists "-44 The computer's date is wrong." Mine still looks OK, but I'm getting the same errors as everybody else. | |
ID: 46868 | Rating: 0 | rate: / Reply Quote | |
I experience the same behavior. | |
ID: 46869 | Rating: 0 | rate: / Reply Quote | |
Same here :-) | |
ID: 46872 | Rating: 0 | rate: / Reply Quote | |
Same here | |
ID: 46873 | Rating: 0 | rate: / Reply Quote | |
50 failed tasks since 14:46 UTC on the 14th of April. They continue to come and fail. | |
ID: 46875 | Rating: 0 | rate: / Reply Quote | |
Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously. | |
ID: 46876 | Rating: 0 | rate: / Reply Quote | |
Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously.
It's because all active hosts have used up their daily quota. | |
ID: 46877 | Rating: 0 | rate: / Reply Quote | |
Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously.
It's because all active hosts have used up their daily quota.
So I guess these are the tasks that are being generated automatically, right? | |
ID: 46878 | Rating: 0 | rate: / Reply Quote | |
Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously.
It's because all active hosts have used up their daily quota.
so I guess these are the tasks that are being generated automatically, right?
In this case, no. These are simply the failed workunits waiting to be resent to another host, but there are none to send to, because all have used up their daily quota. | |
ID: 46879 | Rating: 0 | rate: / Reply Quote | |
50 failed tasks since 14:46 UTC on the 14th of April. They continue to come and fail. Keep calm, and set up a backup project (that is a project which has 0 resource share set on the project's webpage). I suggest Einstein@home or SETI@home. | |
ID: 46880 | Rating: 0 | rate: / Reply Quote | |
These are simply the failed workunits waiting to be resent to another host, but there's none to send to, because all have used up their daily quota.
Which means that all the WUs that were faulty to begin with will be "recycled", so to speak; and at some point, there will be several thousand faulty WUs in the queue :-( So I am curious how this pile of junk will be successfully cleaned up :-) | |
ID: 46881 | Rating: 0 | rate: / Reply Quote | |
My first failure was at | |
ID: 46882 | Rating: 0 | rate: / Reply Quote | |
Here's an interesting one: WU 12499196. | |
ID: 46883 | Rating: 0 | rate: / Reply Quote | |
Richard, I have updated the drivers on my 980 Ti, but it won't pick up the new app. I have reset the project and still get 8.48 (cuda 6.5). | |
ID: 46884 | Rating: 0 | rate: / Reply Quote | |
I experience the same behavior. I'd like to further clarify that. If you suspend in-progress tasks, then resume them, they will fail. I just lost tons of work that way :) I smile, because it's all I can do. It happens. Just wanted to add that suspending and restarting the task itself, is also a problem. Backup projects (attached with 0 resource share) are starting to kick in for me. | |
ID: 46885 | Rating: 0 | rate: / Reply Quote | |
I experience the same behavior.
I'm aware of that problem, but that gives a different error message in stderr.txt. EDIT: maybe I don't remember it right, and the error code / message is the same, but my tasks did not error out after a restart earlier. | |
ID: 46886 | Rating: 0 | rate: / Reply Quote | |
I have updated drivers Richard on my 980ti but it won't pick up new app. I have reset project and still 8.48 cuda 6.5 Sampling through a few of the highest-RAC users on my way to bed, it looks as if all their 970/980 cards are erroring tasks, but all their 1070/1080 cards are working normally. There's a debug clue in there somewhere. Edit - including Retvari's single active 1080, host 23631 | |
ID: 46887 | Rating: 0 | rate: / Reply Quote | |
Here's an interesting one: WU 12499196. My GTX 1080 is working fine with the 9.15 app under Windows 10. There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning.
I don't think so. It's more likely that some dll stopped working after a given date, that is 04.14.2017. It could be a licensing limitation, or some other time limit which has expired. | |
ID: 46888 | Rating: 0 | rate: / Reply Quote | |
There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning. I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app. | |
ID: 46889 | Rating: 0 | rate: / Reply Quote | |
Great find, Retvari! That should help the devs to solve it as quickly as they can! | |
ID: 46890 | Rating: 0 | rate: / Reply Quote | |
I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app.
I've tried to do this; however, I got stuck with "the computer has finished the daily quota of 1 task" - HOW NICE :-( Slowly but surely I am getting fed up with GPUGRID. I'm getting more and more the impression (like one of the posters above) that they don't take their work seriously enough :-( | |
ID: 46892 | Rating: 0 | rate: / Reply Quote | |
Great find, Retvari! That should help the devs to solve it as quickly as they can! Peace and love, thanks great Retvari, good Easter to all ... be patient ! K. ____________ Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing. (Martin Luther King) | |
ID: 46895 | Rating: 0 | rate: / Reply Quote | |
... be patient !
I am afraid that my patience is overstretched by now - every month a major problem which makes GPUGRID crunching impossible for several days :-((( | |
ID: 46896 | Rating: 0 | rate: / Reply Quote | |
So there's a date limit somewhere in the 8.48 app.
My suspicion (unverified) is that the problem might lie with tcl84.dll. That's been replaced with tcl86.dll in v9.14/5, and https://www.activestate.com/activetcl seem to have a rather curious licencing regime:
Business and Enterprise Editions provide access to older Tcl versions:
I'll play around with some options later. | |
ID: 46897 | Rating: 0 | rate: / Reply Quote | |
So there's a date limit somewhere in the 8.48 app. I've tried to replace tcl84.dll with tcl86.dll by renaming the latter (and setting don't check file sizes in cc_config.xml), but then I got a different error: There are no child processes to wait for.
(0x80) - exit code 128 (0x80) See this task. | |
ID: 46898 | Rating: 0 | rate: / Reply Quote | |
Zoltan wrote: I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching!
For me, this worked on the two Windows 10 PCs. On my main crunching PC with two GTX980Ti and XP, I unfortunately had the "limit of daily tasks" problem (as mentioned earlier here); on the other one with the GTX750Ti and XP, after changing the date backwards (to 04.13.2017), none of the buttons on the left side of the BOINC manager reacted any more. So I could not do what I had intended. Only after changing the date back to the real one did the BOINC manager work again. So no chance to apply this "date trick" on XP, at least not on mine :-( | |
ID: 46899 | Rating: 0 | rate: / Reply Quote | |
So there's a date limit somewhere in the 8.48 app.
I had the same idea, but tried a different route: I wrapped up the existing files in an app_info.xml, and then changed the tcl file reference to supply a copy of tcl86.dll. No dice: instead, I got error 0xC000007B STATUS_INVALID_IMAGE_FORMAT (task 16233669 - confirmed that this related to the tcl change with some offline tests).
This machine is Windows 7 with a GTX 970 and (currently) a maximum cuda 7.0 driver. Next steps will be to try a cuda 8.0 driver and see what the project sends me: if it's still v8.48, I'll try putting v9.15 into an app_info. | |
ID: 46900 | Rating: 0 | rate: / Reply Quote | |
Sad to report that both approaches failed. A normal work fetch got me v8.48 even with a cuda 8.0 driver, and it failed with the clock error as before.
15/04/2017 12:31:21 | GPUGRID | [cpu_sched] Starting task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 using acemdlong version 915 (cuda80) in slot 1
15/04/2017 12:31:24 | GPUGRID | Task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 exited with zero status but no 'finished' file
15/04/2017 12:31:24 | GPUGRID | If this happens repeatedly you may need to reset the project.
- the app quits silently with no error number, and doesn't even have time to start writing a stderr.txt file or to write anything to the _0_0 result file (aka 'progress.log'). The only evidence that the app has even tried to run is a 'canary' file in the slot directory. The only diagnostics output I can get is from a command prompt:
D:\BOINCdata\slots\1>acemd.915-80.exe
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 0
SWAN : FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 448.
# SWAN swan_assert 0
Card data is
15/04/2017 12:28:16 | | CUDA: NVIDIA GPU 0: GeForce GTX 970 (driver version 368.81, CUDA version 8.0, compute capability 5.2, 4096MB, 3066MB available, 4087 GFLOPS peak)
I think I'm stuck until the staff are back in the lab. | |
ID: 46902 | Rating: 0 | rate: / Reply Quote | |
I talked with Matt. He says that it's probably the license that time-expired. Updating the drivers will get the cuda 8 app which should fix it. | |
ID: 46903 | Rating: 0 | rate: / Reply Quote | |
For a more correct solution we will have to wait for Matt to update the old app next week. In the meantime, as I said, updating drivers should do it. | |
ID: 46904 | Rating: 0 | rate: / Reply Quote | |
Which driver version is necessary, and which driver version is safe? | |
ID: 46905 | Rating: 0 | rate: / Reply Quote | |
... updating drivers should do it which might be impossible, or at least risky in case of Windows XP; Zoltan, what's your opinion on this? | |
ID: 46908 | Rating: 0 | rate: / Reply Quote | |
My drivers are locked to the versions that came with the devices: changing drivers causes failures. | |
ID: 46909 | Rating: 0 | rate: / Reply Quote | |
For a more correct solution we will have to wait for Matt to update the old app next week. In the meanwhile as I said updating drivers should do it What the crap, Stefan? :) I'm already using the latest drivers! My failures are on Windows 10, using 381.65 and 381.78. Please provide more details on what drivers you think should work, and also why failures still happen on 381.65 and 381.78. Edit: I'm not 100% sure that I've been able to attempt a task using 381.78 yet. | |
ID: 46910 | Rating: 0 | rate: / Reply Quote | |
... My failures are on Windows 10, using 381.65 and 381.78. Please provide more details on what drivers you think should work, and also why failures still happen on 381.65 and 381.78.
I was just going to ask here whether someone has already tried the latest drivers - your posting answers my question, although in the negative sense. So Matt's assumption that the latest drivers should solve the current problem unfortunately seems to be wrong :-( | |
ID: 46911 | Rating: 0 | rate: / Reply Quote | |
The problem should now be fixed for anyone with a CUDA 8-capable driver. | |
ID: 46912 | Rating: 0 | rate: / Reply Quote | |
I see you've deprecated v8.48 completely, but left v9.15 (superficially - as far as we can see) unchanged. I couldn't get it to work earlier, but I'll try again within the hour - test machine is busy with another project just at the moment. | |
ID: 46914 | Rating: 0 | rate: / Reply Quote | |
The problem should now be fixed for anyone with a CUDA 8-capable driver. which means that for Windows XP users, the problem is NOT solved yet, right? When will this be the case? | |
ID: 46915 | Rating: 0 | rate: / Reply Quote | |
I've changed the rules for issuing the 915 version. Any Windows machine that is 64 bit and reports CUDA 8.0 capability will get it now. | |
ID: 46916 | Rating: 0 | rate: / Reply Quote | |
I've changed the rules for issuing the 915 version. Any Windows machine that is 64 bit and reports CUDA 8.0 capability will get it now. So which steps will be taken next to enable older drivers for XP to work? My XP with driver 368.81 did download version 915, the task did start, but was broken off after a few minutes with "too many exit(0)s" | |
ID: 46917 | Rating: 0 | rate: / Reply Quote | |
I was expecting it to work on 64 bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately. We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade... Matt | |
ID: 46919 | Rating: 0 | rate: / Reply Quote | |
OK, let's put XP to bed - I think it's a red herring in this case. Task e4s7_e2s3p0f357-ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin_4-1-2-RND7142_0 exited with zero status but no 'finished' file until BOINC kills the task with the 'Too many exits' after 100 tries - exactly the message Erich got under XP. No difference between the OS versions - this difference applies to the hardware (different generations of GPU). It seems to have changed with this new batch of tasks, since the initial test release a week ago. Running standalone in a terminal window, I get D:\BOINCdata\slots\0>acemd.915-80
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 0
SWAN : FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 448.
# SWAN swan_assert 0 - that's the only diagnostic I've been able to capture. Nothing is written to the output or stderr files. Test task is 16240262 - I'll let it run through its 100 exits and report it as soon as I've posted this, so you can compare my Windows 7 output with Erich's XP. | |
ID: 46920 | Rating: 0 | rate: / Reply Quote | |
Matt, It's a bit off-topic, but let me explain: These Windows XP x64 hosts are dedicated crunching boxes (therefore it does not matter if their OS is not supported anymore). A lot of effort has been put into them to make the GTX 980Ti work under Windows XP: selecting the right MB, "hacking" the NV driver to recognize the top-end cards, etc. The reason *not* to upgrade them from Windows XP is to maximize their throughput (avoiding WDDM). The other path to achieve this is to use Linux, but you haven't put the SWAN_SYNC option into the latest Linux client (as far as my test showed, but please correct me if I'm wrong), which hinders the performance of the top-end cards under Linux too. So you could motivate us to use Linux instead of the deprecated Windows XP if you put that option in the Linux client; it could also increase the performance of the top-end cards by 10~15% under Linux. But for now, if you could make a fresh CUDA 6.5 client, that would be great (and it would save us a lot of work). Thank you in advance! | |
ID: 46921 | Rating: 0 | rate: / Reply Quote | |
Richard, | |
ID: 46922 | Rating: 0 | rate: / Reply Quote | |
I was running the same cuda 7.0 driver version on all machines until this morning - I upgraded this morning for testing only. | |
ID: 46924 | Rating: 0 | rate: / Reply Quote | |
Jacob was testing before I'd changed the issuing rules for 915 - he never even got the app to test, let alone see any failures. | |
ID: 46925 | Rating: 0 | rate: / Reply Quote | |
Sure, anything I can do to help. Supper has just beeped in the microwave, but I'll download while I eat, and install later. | |
ID: 46926 | Rating: 0 | rate: / Reply Quote | |
MJH: | |
ID: 46927 | Rating: 0 | rate: / Reply Quote | |
Specify *which* problem, please. | |
ID: 46928 | Rating: 0 | rate: / Reply Quote | |
| |
ID: 46929 | Rating: 0 | rate: / Reply Quote | |
MJH:
<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.0
# PCI ID : 0000:07:00.0
# Device clock : 1045MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r381_64 : 38178
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 300
</stderr_txt>
]]> | |
ID: 46930 | Rating: 0 | rate: / Reply Quote | |
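When comparing reports like the one above across hosts, the fields that matter in this thread are the card name, its compute capability and the driver version. The following is a rough Python sketch for pulling those out of a pasted stderr dump. The helper name `parse_devices` is hypothetical, and turning `38178` into `381.78` is an assumption based on the driver numbers posters quote elsewhere in this thread, not a documented format.

```python
import re

# One "SWAN Device" block. \s* and re.S let the pattern match both the
# original multi-line stderr and a copy flattened onto a single line.
DEVICE_RE = re.compile(
    r"#\s*Name\s*:\s*(?P<name>.+?)\s*#\s*ECC\b.*?"
    r"#\s*Capability\s*:\s*(?P<cap>\d+\.\d+).*?"
    r"#\s*Driver version\s*:\s*r\d+_\d+\s*:\s*(?P<drv>\d+)",
    re.S,
)


def parse_devices(stderr_text):
    """Return a list of (card name, compute capability, driver) tuples."""
    devices = []
    for match in DEVICE_RE.finditer(stderr_text):
        raw = match.group("drv")
        # Assumption: the first three digits are the major driver version.
        driver = f"{raw[:3]}.{raw[3:]}"
        devices.append((match.group("name"), match.group("cap"), driver))
    return devices


if __name__ == "__main__":
    sample = (
        "# SWAN Device 1 : # Name : GeForce GTX 660 Ti # ECC : Disabled "
        "# Global mem : 3072MB # Capability : 3.0 # PCI ID : 0000:07:00.0 "
        "# Device clock : 1045MHz # Memory clock : 3004MHz "
        "# Memory width : 192bit # Driver version : r381_64 : 38178"
    )
    for name, capability, driver in parse_devices(sample):
        print(name, capability, driver)
```

A dump that contains several device blocks yields one tuple per block, in the order they appear.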
Interesting. The machines I'm having problems with also have dual GPUs of different vintages - mine have a secondary GTX 750Ti in both cases. But GPUGrid is excluded from the 750s, and only runs on the 970s. | |
ID: 46931 | Rating: 0 | rate: / Reply Quote | |
For some reason the sm 3.0 support (and only that sm version) is broken. | |
ID: 46932 | Rating: 0 | rate: / Reply Quote | |
Richard the problem with your machines is (at least) the driver version. | |
ID: 46933 | Rating: 0 | rate: / Reply Quote | |
Thanks MJH. | |
ID: 46934 | Rating: 0 | rate: / Reply Quote | |
9.16 should be along in about 15 mins. | |
ID: 46935 | Rating: 0 | rate: / Reply Quote | |
Failing on all machines. | |
ID: 46936 | Rating: 0 | rate: / Reply Quote | |
Richard the problem with your machines is (at least) the driver version. I was just coming to that conclusion myself. Clean install of 381.65 completed, machine rebooted, and task e67s40_e47s2p0f68-PABLO_P04637_0_IDP-0-1-RND0199_4 is running normally. But that's an old task from 13 April, with three previous v9.15 failures. Too late to investigate whether they might have lower drivers too, or some other problem. I'd like to verify that tasks like yesterday's ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin batch run OK before we completely sign this one off, but that can wait until tomorrow (or later next week). Apologies for interrupting your weekend - hope you can have a good break in what remains of it. Edit - OK, I peeked :-) Task 16239400 has the ERR_TOO_MANY_EXITS problem, and it's described as GeForce GTX 1060 6GB (4095MB) driver: 368.81 We're going to have to work out where the break-point occurs in the 360+ driver sequence, and put out an APB to upgrade - or a min_version in the plan_class. Next week. G'night. | |
ID: 46937 | Rating: 0 | rate: / Reply Quote | |
916 is out now: this should work with sm 300 GPUs | |
ID: 46938 | Rating: 0 | rate: / Reply Quote | |
OK, when I start upgrading my other two tomorrow morning I'll start with 372.54, and if that works, probably stick at 373.06 (first and last of the 372 series respectively). If that doesn't work, rinse and repeat with 375.63 / 376.33 and so on. | |
ID: 46939 | Rating: 0 | rate: / Reply Quote | |
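The plan above steps through the driver series one at a time. If one assumes that every driver newer than the first working one also works, the break-point can be found with fewer installs by bisecting the list. This is a swapped-in technique, not what the poster describes, and the `works` callback in the sketch below is a hypothetical stand-in for "install this driver and run a task".

```python
from typing import Callable, Optional, Sequence


def first_working(
    versions: Sequence[str], works: Callable[[str], bool]
) -> Optional[str]:
    """Binary search for the oldest working driver.

    Assumes `versions` is sorted oldest to newest and that once a driver
    works, all newer ones work too. Returns None if none of them work.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            hi = mid  # the break-point is at mid or earlier
        else:
            lo = mid + 1  # the break-point is after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    candidates = ["368.81", "372.54", "373.06", "375.63", "376.33"]
    # Hypothetical outcome used only to exercise the search.
    print(first_working(candidates, lambda v: float(v) >= 372.0))
```

With five candidate drivers this needs at most three test installs instead of up to five.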
MJH:
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -52 (0xffffffffffffffcc) Unknown error number
Computer ID: 153764
Report deadline: 20 Apr 2017 | 23:01:28 UTC
Run time: 2.88
CPU time: 0.00
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v9.16 (cuda80)
Stderr output:
<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.0
# PCI ID : 0000:07:00.0
# Device clock : 1045MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r381_64 : 38178
SWAN : FATAL Unable to load module .nonbonded.cu. (300)
</stderr_txt>
]]> | |
ID: 46940 | Rating: 0 | rate: / Reply Quote | |
I updated my windows 10 computer to nvidia driver 381.65 from 359.06. Everything is running fine so far, but this driver is slightly slower. | |
ID: 46941 | Rating: 0 | rate: / Reply Quote | |
Now, what is going to happen to windows xp? I would like to see it supported a little bit longer, and please don't tell me to upgrade. +1 | |
ID: 46943 | Rating: 0 | rate: / Reply Quote | |
My GTX750Ti - driver 376.53 & GTX1050Ti - driver 378.92 are working fine now. | |
ID: 46944 | Rating: 0 | rate: / Reply Quote | |
9.18 is also not working for my GTX 660 Ti GPUs.
Stderr output:
<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)
</message>
<stderr_txt>
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4096MB
# Capability : 5.2
# PCI ID : 0000:09:00.0
# Device clock : 1367MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r381_64 : 38178
# GPU 0 : 68C
# GPU 1 : 67C
# GPU 2 : 54C
# GPU 2 : 57C
# GPU 2 : 58C
# GPU 2 : 59C
# GPU 1 : 68C
# GPU 2 : 60C
# GPU 1 : 69C
# GPU 1 : 71C
# GPU 1 : 72C
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4096MB
# Capability : 5.2
# PCI ID : 0000:09:00.0
# Device clock : 1367MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r381_64 : 38178
# GPU 0 : 67C
# GPU 1 : 66C
# GPU 2 : 58C
# GPU 1 : 69C
# GPU 2 : 59C
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.0
# PCI ID : 0000:07:00.0
# Device clock : 1045MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r381_64 : 38178
SWAN : FATAL Unable to load module .nonbonded.cu. (300)
</stderr_txt>
]]> | |
ID: 46947 | Rating: 0 | rate: / Reply Quote | |
I was expecting it to work on 64 bit XP, actually. Given that it doesn't there's not a tremendous amount I can do to fix it immediately. I can only fully underline what Zoltan is saying, and hope that all the many crunchers using XP for good reason will be able to continue for a while. | |
ID: 46950 | Rating: 0 | rate: / Reply Quote | |
I now tried task | |
ID: 46951 | Rating: 0 | rate: / Reply Quote | |
on my Windows 10 64-bit, driver 376.53, acemd 918.80. I now updated the driver to 381.65 and downloaded e5s6_e3s4p0f494-ADRIA_FOLDGREED50_crystal_ss_contacts_100_ubiquitin_3-0-2-RND2192_0 It's been running well for 10 minutes ... so let's keep our fingers crossed. The card is a GTX970. Still I hope that a solution can be found for XP. | |
ID: 46952 | Rating: 0 | rate: / Reply Quote | |
Windows 10/64-bit | |
ID: 46956 | Rating: 0 | rate: / Reply Quote | |
It's been running well for 10 minutes ... so let's keep our fingers crossed.
And, unfortunately, even with the new (latest) driver I am experiencing the same problem that I have been having for some time, and which I described in this thread: http://www.gpugrid.net/forum_thread.php?id=4511#46686 After a while (today: after about 2 hours) the GPU clock automatically drops to the "default clock" value of 1152 MHz (with power consumption dropping to about 58%). And this can only be changed back to a higher value (via NVIDIA Inspector) after a restart of the PC. BTW, the same thing happens with the GTX750Ti in the other Windows10 PC. This has never ever happened with my two Windows XP PCs. So one more reason NOT to give up XP by switching to Windows10 !!! | |
ID: 46957 | Rating: 0 | rate: / Reply Quote | |
OK, when I start upgrading my other two tomorrow morning I'll start with 372.54, and if that works, probably stick at 373.06 (first and last of the 372 series respectively). I can confirm that both these drivers allow v9.18 (cuda80) tasks to download and run on my GTX 970s under Windows 7. I'll settle on 373.06 - last bugfix for the series. Technically speaking, these are major version 370 drivers, according to the release notes. | |
ID: 46960 | Rating: 0 | rate: / Reply Quote | |
But Richard, | |
ID: 46961 | Rating: 0 | rate: / Reply Quote | |
I'm happy just bumping along the bottom - I'll leave the stratosphere to you :) | |
ID: 46962 | Rating: 0 | rate: / Reply Quote | |
I'm happy just bumping along the bottom - I'll leave the stratosphere to you :)
Indeed. Just look at all these PRETTY numbers! There's not a non-alpha version to be found! Just the way I like it! :)
4/15/2017 3:21:41 PM | | Starting BOINC client version 7.7.2 for windows_x86_64
4/15/2017 3:21:41 PM | | This a development version of BOINC and may not function properly
4/15/2017 3:21:41 PM | | log flags: file_xfer, sched_ops, task, scrsave_debug, unparsed_xml
4/15/2017 3:21:41 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
4/15/2017 3:21:41 PM | | Data directory: E:\BOINC Data
4/15/2017 3:21:41 PM | | Running under account jacob
4/15/2017 3:21:42 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 980 Ti driver version (381.78, CUDA version 8.0, compute capability 5.2, 4096MB, 3962MB available, 7271 GFLOPS peak)
4/15/2017 3:21:42 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 980 Ti driver version (381.78, CUDA version 8.0, compute capability 5.2, 4096MB, 3962MB available, 6060 GFLOPS peak)
4/15/2017 3:21:42 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 980 Ti driver version (381.78, device version OpenCL 1.2 CUDA, 6144MB, 3962MB available, 7271 GFLOPS peak)
4/15/2017 3:21:42 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 980 Ti driver version (381.78, device version OpenCL 1.2 CUDA, 6144MB, 3962MB available, 6060 GFLOPS peak)
4/15/2017 3:21:42 PM | | Host name: Speed
4/15/2017 3:21:42 PM | | Processor: 16 GenuineIntel Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz [Family 6 Model 63 Stepping 2]
4/15/2017 3:21:42 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2
4/15/2017 3:21:42 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.16176.00)
4/15/2017 3:21:42 PM | | Memory: 63.90 GB physical, 73.40 GB virtual
4/15/2017 3:21:42 PM | | Disk: 300.00 GB total, 222.52 GB free
4/15/2017 3:21:42 PM | | Local time is UTC -4 hours
4/15/2017 3:21:42 PM | | VirtualBox version: 5.0.37 | |
ID: 46963 | Rating: 0 | rate: / Reply Quote | |
I was expecting it to work on 64 bit XP, actually. Given that it doesn't there's not a tremendous amount I can do to fix it immediately. + 1 | |
ID: 46964 | Rating: 0 | rate: / Reply Quote | |
Damn, guess this means I will have to move off the 359.06 drivers eh? | |
ID: 46965 | Rating: 0 | rate: / Reply Quote | |
CC 3.0 ... still isn't working on 9.18 ... | |
ID: 46966 | Rating: 0 | rate: / Reply Quote | |
I have the same situation. I was told that the cause is an obsolete .dll file and bad WUs on the server. The problem should be solved within a week or so. | |
ID: 46978 | Rating: 0 | rate: / Reply Quote | |
I can confirm that driver version 373.06 solves the problem for my GTX 970 cards, but not for the GTX 670: that computer does not download any new WUs. | |
ID: 46979 | Rating: 0 | rate: / Reply Quote | |
After two days of useless requests my computer finally got a new job in GPUGrid. But the new `long run` task lasts three times longer than any previous `long run`! With the same amount of calculations (5 000 000 GFLOPs) on the same machine, it is running for 27-29 hours instead of 9-10 before. | |
ID: 46984 | Rating: 0 | rate: / Reply Quote | |
Unfortunately, the "same amount of calculations (5 000 000 GFLOPs)" applies to all tasks assigned to the long queue, and isn't adjusted to reflect the complexity (duration) of the task - even though tasks are assessed before the run starts, so that a proportionate amount of credit can be issued. | |
ID: 46986 | Rating: 0 | rate: / Reply Quote | |
After two days of useless requests my computer finally got a new job in GPUGrid. But new `long run` task lasts three times longer, than any previous `long run`! With the same amount of calculations (5 000 000 GFLOPs) on the same machine it is running for 27-29 hours instead of 9-10 before.
The workunits won't take longer than before; only the estimate of the remaining time (what you see) is miscalculated. This estimate will normalize after a couple of workunits are done. As this is a new app, the BOINC manager has to learn the duration correction ratio for this version. Alternatively, you can make an app_config.xml to instruct the BOINC manager to calculate the remaining time based on the fraction done and the time elapsed. Copy the following to the clipboard:
notepad c:\ProgramData\BOINC\projects\www.gpugrid.net\app_config.xml
Press Windows key + R, then paste and press [enter]. If you see an empty file then copy & paste the following text:
<app_config>
<app>
<name>acemdlong</name>
<fraction_done_exact/>
</app>
<app>
<name>acemdshort</name>
<fraction_done_exact/>
</app>
<app>
<name>acemdbeta</name>
<fraction_done_exact/>
</app>
</app_config>
If you already have an app_config.xml, then you should only insert the line <fraction_done_exact/> after each line containing the name of the application.
Click file -> save and click [save]. Open the BOINC manager, click Options -> read config files. | |
ID: 46987 | Rating: 0 | rate: / Reply Quote | |
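For an existing app_config.xml with several `<app>` blocks, the manual edit described above can also be scripted. The following Python sketch uses only the standard library's `xml.etree.ElementTree`; the function name `add_fraction_done_exact` is made up for illustration, and the script is an assumption about how one might automate the step, not something the project provides. Back up the file before trying anything like this.

```python
import xml.etree.ElementTree as ET


def add_fraction_done_exact(xml_text: str) -> str:
    """Add <fraction_done_exact/> to every <app> block that lacks it."""
    root = ET.fromstring(xml_text)
    for app in root.findall("app"):
        if app.find("fraction_done_exact") is not None:
            continue  # this block already has the setting
        name = app.find("name")
        # Place the new element right after <name>, as the post suggests.
        position = list(app).index(name) + 1 if name is not None else 0
        app.insert(position, ET.Element("fraction_done_exact"))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    existing = (
        "<app_config>"
        "<app><name>acemdlong</name></app>"
        "<app><name>acemdshort</name><fraction_done_exact/></app>"
        "</app_config>"
    )
    print(add_fraction_done_exact(existing))
```

Blocks that already contain the element are left alone, so running the function twice gives the same result as running it once.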
Zoltan, thanks for the "fraction_done_exact" app_config.xml, seems very useful. | |
ID: 47000 | Rating: 0 | rate: / Reply Quote | |
Erich56: | |
ID: 47002 | Rating: 0 | rate: / Reply Quote | |
So, if your XML file declares multiple app blocks (say x blocks), and you want all of them to use this setting, then you'll need to add the line x times, in the right places. okay, all clear, thx, seems to work :-) | |
ID: 47004 | Rating: 0 | rate: / Reply Quote | |
Anyone having some weird GPU usage lately? it's stable at 90% for like 3 mins, then 0% (while still showing "Running" in BOINC) for a long time. | |
ID: 47008 | Rating: 0 | rate: / Reply Quote | |
Anyone having some weird GPU usage lately? Yes, this afternoon I happened to notice this behaviour on one of my Windows 10 machines - latest software, latest driver. Since I was out for a while, I cannot tell for how long the CPU usage (as seen in the Windows Task Manager) was at zero. The task (still running) is a Pablo_contact_goal_KIX. But if you can observe this with an ADRIA task as well, then it seems that the fault may rather be with the new software. Anyway, what I noticed already is that with the new software, in Windows 10 crunching is a bit slower, and overclocking even less possible than before. | |
ID: 47017 | Rating: 0 | rate: / Reply Quote | |
this evening, again I noticed this strange behaviour as described above. | |
ID: 47037 | Rating: 0 | rate: / Reply Quote | |
I'm noticing similar behavior on my machine. Running a GTX 1070. I look at GPU core load directly in HWinfo and it will just sit at 0% load for long periods, but from looking at the task history they seem to be completing eventually. Anyone know what's going on with this? | |
ID: 47059 | Rating: 0 | rate: / Reply Quote | |
back at it again | |
ID: 47120 | Rating: 0 | rate: / Reply Quote | |
back at it again No - the WUs seem to be fine at the moment, and your failures since 26 April come from a range of different WU types. The output of your most recent successful task shows Driver version : r376_38 : 37653 but your computer now shows NVIDIA GeForce GTX 970 (4095MB) driver: 381.89 Since you're running Windows 10, I suspect you've suffered from the common 'automatic driver update by Microsoft'. Try updating your driver again, this time direct from the NVidia site. | |
ID: 47121 | Rating: 0 | rate: / Reply Quote | |
Thanks for the detailed answer. Updates have been triggered voluntarily by me, for both Win10 Creator and Nvidia. I'll try to reinstall these drivers now and see if it makes a difference tomorrow. | |
ID: 47123 | Rating: 0 | rate: / Reply Quote | |
Just to clarify .... | |
ID: 47125 | Rating: 0 | rate: / Reply Quote | |
Is it the workunits (some types? all types?) which fail on your GTX 660 Ti, or the new application? | |
ID: 47126 | Rating: 0 | rate: / Reply Quote | |
Just to clarify .... I've got my 660ti still running on the 359.06 driver with cuda 6.5 app and works fine. | |
ID: 47127 | Rating: 0 | rate: / Reply Quote | |
Is it the workunits (some types? all types?) which fail on your GTX 660 Ti, or the new application? Just to clarify .... The 9.18 (cuda80) app crashes on my GTX 660 Ti GPUs that are in the same PC as my GTX 970. To my knowledge, this machine is intentionally and correctly given 9.18 (cuda80) tasks, but there's a problem with the app. MJH said: 15 Apr 2017 | 21:43:26 UTC http://www.gpugrid.net/forum_thread.php?id=4545&nowrap=true#46932 For some reason the sm 3.0 support (and only that sm version) is broken. 17 Apr 2017 | 19:49:15 UTC http://www.gpugrid.net/forum_thread.php?id=4551&nowrap=true#46981 The peculiar exception for sm 3.0 devices is due to a compiler problem with CUDA 80 that affects only that hardware version. When that's fixed, hosts with a non-XP Windows will get 918. ..... But I don't know what that means! Is it a problem that GPUGrid must fix, or is it a problem that NVIDIA must fix? I feel like nobody is trying to fix it. | |
ID: 47128 | Rating: 0 | rate: / Reply Quote | |
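To make the "sm 3.0 only" statement quoted above concrete, here is a small sketch that checks which of the cards named in this thread fall under it. The lookup table is hand-written for illustration (a real tool would query the driver), the function name is hypothetical, and nothing here reflects how the GPUGRID scheduler actually assigns app versions.

```python
# Compute capability ("sm" version) of cards mentioned in this thread.
# Hand-written lookup for illustration only.
COMPUTE_CAPABILITY = {
    "GTX 660 Ti": (3, 0),  # Kepler
    "GTX 750 Ti": (5, 0),  # Maxwell
    "GTX 970": (5, 2),     # Maxwell
    "GTX 1070": (6, 1),    # Pascal
}


def hit_by_sm30_issue(card: str) -> bool:
    """True if the card is sm 3.0, the only version reported as broken."""
    return COMPUTE_CAPABILITY[card] == (3, 0)


if __name__ == "__main__":
    for card in COMPUTE_CAPABILITY:
        status = "affected" if hit_by_sm30_issue(card) else "not affected"
        print(f"{card}: {status}")
```

This matches the report above: in a mixed host, the GTX 660 Ti is the card on which the 9.18 (cuda80) app crashes, while the GTX 970 in the same machine is not covered by the sm 3.0 problem.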
17 Apr 2017 | 19:49:15 UTC A Compiler is an integral part of the development software used by computer programmers to create useful applications. In this case, the CUDA 8.0 compiler is maintained and distributed by NVidia to facilitate sales of their hardware products (GPUs). It would be difficult-to-impossible to do anything with a GPU without NVidia's compiler. The CUDA compiler comprises two parts: the first part, which resides in the 'CUDA toolkit' on Matt's machine, produces intermediate code. The second part, which resides in the drivers on all our machines, converts the universal intermediate code into machine code instructions tailored to the specific hardware found in the target computer. Matt hasn't identified (in public, at least) which of the two components he believes to be at fault. Since it's hardware-specific, my personal opinion is that it's likely to be the driver-level component - but I've been wrong before. Either way, both components are the responsibility of NVidia. Any change would have to be implemented and distributed by them. But you've encountered an age-old problem, previously described in terms of putting new wine into old bottles, or teaching old dogs new tricks. When a complex system relies on two symbiotic components (hardware and software, in this case), to what extent is it realistic to expect that every new pairing will work together ad infinitum? Personally, I feel it's advantageous to keep computer systems 'balanced' - with hardware and software of a comparable vintage. My trusty and long-serving 9800 GTs have joined my Windows 3 computers in the museum - I haven't tried to convert them to run Cuda 8 or Windows 10. I suggest that, if you feel GTX 660 Ti cards are still energy-efficient enough to be useful, you put them into a chassis with a similar vintage of operating system and a Cuda 6.5 driver. | |
ID: 47130 | Rating: 0 | rate: / Reply Quote | |
Thanks Richard, but ... | |
ID: 47131 | Rating: 0 | rate: / Reply Quote | |
The fix I would prefer is for MJH to limit the relevant GPU application to the newer cards; i.e., Maxwell and later. Whatever "fix" he might come up with may limit the performance of the newer cards, or at least require a lot of his time and effort that might be spent in better ways on new apps. | |
ID: 47132 | Rating: 0 | rate: / Reply Quote | |
The fix I would prefer is for MJH to limit the relevant GPU application to the newer cards; i.e., Maxwell and later. My two cents... I would agree for the long runs, as it doesn't make sense to run them on an old gtx660 anyway. But not as a general measure for long and short runs. Do we have any statistic about how many Kepler cards are still in use at GPUGRID? I reckon that there are a great many... and therefore we shouldn't jump the gun excluding them. There will be the usual moaning and groaning, and people will leave. But there are plenty of volunteers anyway, and even more problems. So reduce both. Well, if there are as many as I suspect (650ti, 660, 660ti, 670, 680), it would be very difficult to compensate that loss of crunching power. I have my doubts. ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. | |
ID: 47133 | Rating: 0 | rate: / Reply Quote | |
OK, that makes sense. I forgot about the short runs, but the Keplers would be quite nice for that. | |
ID: 47134 | Rating: 0 | rate: / Reply Quote | |
Kepler is not nearly old enough to drop support, nor is it inefficient enough, as it's still on 28nm like Maxwell. I'm glad they dropped Fermi because of the higher lithography and inefficient architecture. | |
ID: 47135 | Rating: 0 | rate: / Reply Quote | |
Despite re-installing nvidia drivers, i'm still facing immediate computation errors since win10 creator's update - since this is not going to change, do you have any recommendations for me to try to start crunching again? | |
ID: 47138 | Rating: 0 | rate: / Reply Quote | |
OK, that makes sense. I forgot about the short runs, but the Keplers would be quite nice for that. Except that the availability of short runs has dropped quite a bit lately :-( Myself, I have already considered switching to short runs with my two GTX750Ti, since after implementing the latest crunching software (acemd_918.80), the crunching times have increased considerably, up to almost 60 hours (as noticed also by other members). | |
ID: 47139 | Rating: 0 | rate: / Reply Quote | |
Despite re-installing nvidia drivers, i'm still facing immediate computation errors since win10 creator's update - since this is not going to change, do you have any recommendations for me to try to start crunching again? It's beginning to look as if there might be a problem with that 381.89 driver, isn't it? It was only released on 25 April, and I haven't heard about anybody else trying to use it yet. Maybe other users could post their observations, either way - and while we're waiting, you could try reverting to an older driver to see if that helps. Go to http://www.nvidia.com/Download/Find.aspx, fill in your card and operating system details, and choose from the search result list - anything between 372.54 and 381.65 should be fine. When you run the installer, choose 'custom' installation and check the 'clean install' box just to be on the safe side. | |
ID: 47141 | Rating: 0 | rate: / Reply Quote | |
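The driver range suggested above (anything between 372.54 and 381.65) can be checked mechanically by comparing versions as number pairs rather than as text. This is a minimal sketch: the function names are hypothetical, the bounds are just the forum suggestion from the post above, and a version outside the range is not necessarily bad.

```python
from typing import Tuple

# Bounds taken from the suggestion in the post above, not from NVIDIA.
SUGGESTED_MIN = (372, 54)
SUGGESTED_MAX = (381, 65)


def parse_driver_version(text: str) -> Tuple[int, int]:
    """Split a version such as '381.89' into (381, 89)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def in_suggested_range(version_text: str) -> bool:
    """True if the driver version lies within the suggested bounds."""
    version = parse_driver_version(version_text)
    return SUGGESTED_MIN <= version <= SUGGESTED_MAX


if __name__ == "__main__":
    for candidate in ("359.06", "372.54", "381.65", "381.89", "382.05"):
        print(candidate, in_suggested_range(candidate))
```

Comparing integer pairs avoids the pitfalls of comparing version strings or floats directly.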
Yeah, maybe I'll try that, but it's hard since after 2 faulty WUs, I have to wait another 24hr to get the next ones. | |
ID: 47142 | Rating: 0 | rate: / Reply Quote | |
I went back and saw that successful WU were performed with the latest Nvidia drivers (also my current one now), so it's fair to assume that win 10 creators update is the culprit... Since nothing else changed. Does that basically mean that I'm not gonna be able to do any work until gpugrid makes win 10 creators update compatible? I fear this might take a long time... | |
ID: 47143 | Rating: 0 | rate: / Reply Quote | |
Thanks Richard, but ... Request for users affected by "9.18 (cuda80)" app instantly failing: My NVIDIA contact has a request: Please fill out the Driver Feedback survey below, if you are affected by the GPUGrid "9.18 (cuda80)" app immediately failing with "Computation Error" on your GPU. This helps them assign priority when fixing issues. Be thorough when filling it out, please. http://surveys.nvidia.com/index.jsp?pi=6e7ea6bb4a02641fa8f07694a40f8ac6 Thanks, Jacob | |
ID: 47153 | Rating: 0 | rate: / Reply Quote | |
I have read somewhere else that scientists have a major problem communicating with ordinary people (I mean thick) and all the problems with this project seem to bear this out. | |
ID: 47154 | Rating: 0 | rate: / Reply Quote | |
Guess who's going to download the 1.2 GB Cuda 8.0 toolkit, and install the 8 GB Visual Studio 2015 Community Edition IDE, in an attempt to repro the SM3/CC3 compiler issues using the Cuda Toolkit samples? | |
ID: 47164 | Rating: 0 | rate: / Reply Quote | |
MJH (et al.): | |
ID: 47166 | Rating: 0 | rate: / Reply Quote | |
It's beginning to look as if there might be a problem with that 381.89 driver, isn't it? A user on the BOINC message boards says that BSOD problems with driver 381.89 stopped after updating to 382.05 (he also says he's upgraded from BOINC v7.6.33 to v7.7.2, but I'd caution against that - v7.7.2 was a highly experimental test build. v7.6.33 has been around for a long time, and is very unlikely to be implicated in recent changes to GPU behaviour) | |
ID: 47176 | Rating: 0 | rate: / Reply Quote | |
That same guy posts a ton in the NVIDIA Driver Feedback threads on their forums, about GPU apps crashing whenever he closes BOINC. I've tried to help him a few times before, but it seems he doesn't know how to isolate problems and troubleshoot them very well. I don't think he tries very hard to reliably reproduce the problems that he has. | |
ID: 47177 | Rating: 0 | rate: / Reply Quote | |