Message boards : Number crunching : New batch KKi4
Dear all, this is the continuation of an experiment we'd like to publish soon. WUs are twice as long as the old "CAPBIND*" series.
ID: 18862
Dear Toni,
ID: 18868
There should be nothing new with these WUs (except their length). By "cancelled", do you mean that they failed?
ID: 18870
I had one this morning which has failed on two different machines so far: http://www.gpugrid.net/workunit.php?wuid=1966290
ID: 18871
And a TONI_KK WU broke as well. stderr out:

    <core_client_version>6.10.17</core_client_version>
    <![CDATA[
    <message>
    process exited with code 98 (0x62, -158)
    </message>
    <stderr_txt>
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GT 240"
    # Clock rate: 1.34 GHz
    # Total amount of global memory: 536150016 bytes
    # Number of multiprocessors: 12
    # Number of cores: 96
    MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms
    ERROR: file mdioload.cpp line 80: Unable to read bincoordfile
    11:16:36 (3686): called boinc_finish
    </stderr_txt>
    ]]>

And this one, from the other host (Windows). stderr out:

    <core_client_version>6.10.58</core_client_version>
    <![CDATA[
    <message>
    - exit code 98 (0x62)
    </message>
    <stderr_txt>
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1.35 GHz
    # Total amount of global memory: 919994368 bytes
    # Number of multiprocessors: 27
    # Number of cores: 216
    MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms
    ERROR: file mdioload.cpp line 80: Unable to read bincoordfile
    called boinc_finish
    </stderr_txt>
    ]]>

____________
Greetings from Saenger
For questions about BOINC, look in the BOINC-Wiki
ID: 18872
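Both logs fail at the same point: the application could not read the atom count at byte 0 of `input.coor`, which suggests the file had no readable data at its start (for example, an empty file). As a minimal sketch of that check, assuming `input.coor` uses a NAMD-style binary coordinate layout (a little-endian 32-bit atom count followed by x, y, z as 64-bit doubles for each atom; this layout is an assumption, not something stated in this thread), a header validation could look like this. `check_bincoor` is a hypothetical helper, not part of the GPUGRID application or BOINC.

```python
import struct
from pathlib import Path

# Assumed layout (NAMD-style binary coordinates, little-endian):
#   int32 natoms, followed by natoms * 3 float64 values (x, y, z per atom).
HEADER = struct.Struct("<i")
COORD_BYTES = 3 * 8


def check_bincoor(path):
    """Return the atom count in a binary coordinate file, or raise ValueError."""
    data = Path(path).read_bytes()
    if len(data) < HEADER.size:
        # The failure mode in the logs above: nothing to read at byte 0.
        raise ValueError(
            f"{path}: expected to read number of atoms at byte 0, "
            f"but the file is only {len(data)} bytes long"
        )
    (natoms,) = HEADER.unpack_from(data, 0)
    expected = HEADER.size + natoms * COORD_BYTES
    if natoms <= 0 or len(data) != expected:
        raise ValueError(
            f"{path}: header claims {natoms} atoms ({expected} bytes expected), "
            f"found {len(data)} bytes"
        )
    return natoms


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # A well-formed two-atom file: 4-byte header + 6 doubles.
        good = Path(tmp) / "good.coor"
        good.write_bytes(HEADER.pack(2) + struct.pack("<6d", *([0.0] * 6)))
        print(check_bincoor(good))
        # An empty file, as a badly generated WU might ship.
        empty = Path(tmp) / "input.coor"
        empty.write_bytes(b"")
        try:
            check_bincoor(empty)
        except ValueError as err:
            print(err)
```

A check like this only detects a missing or truncated header; it cannot tell whether well-formed coordinates are physically sensible.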
Dear Toni,
ID: 18873
I have finished four, and my systems are currently running at least one more. Performance is reasonable compared to the other tasks. However, I also got one immediate failure:
ID: 18874
Hi, for those getting "byte number 0: expected to read number of atoms": it must have been a glitch in the mass WU creation, so just let them die. Richard, I think your other failure was on a mobile card.
ID: 18877
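Toni's diagnosis amounts to a simple rule: a task whose stderr shows the "byte number 0: expected to read number of atoms" error came from the WU-creation glitch and can be left to die. A minimal sketch of that rule as a hypothetical `triage` function follows; it also separates out the SWAN kernel failure reported later in this thread, and it matches only on message text quoted here, so any other error falls through to "unknown".

```python
import re

# Hypothetical triage rules built only from stderr messages quoted in this thread.
CREATION_GLITCH = re.compile(
    r'read error for file "input\.coor", byte number 0: expected to read number of atoms'
)
KERNEL_FAILURE = re.compile(r"SWAN : FATAL : Failure executing kernel")


def triage(stderr_text: str) -> str:
    """Classify a task's stderr into a rough failure category."""
    if CREATION_GLITCH.search(stderr_text):
        return "bad WU (creation glitch): let it die"
    if KERNEL_FAILURE.search(stderr_text):
        return "kernel failure on this host"
    return "unknown"


log = ('MDIO ERROR: read error for file "input.coor", byte number 0: '
       'expected to read number of atoms')
print(triage(log))
```

The creation-glitch pattern is checked first because, per Toni, those WUs are broken regardless of which host receives them.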
Thank you to everyone who reported this problem, and thank you, Toni, for letting us know it is just a WU creation glitch.
ID: 18878
One of my 9800GTs had a go at h230r2-TONI_KKi4-0-200-RND9586, but unfortunately crashed with an assertion failure at the bitter end, after more than 24 hours of work. C'est la vie.
ID: 18890
Richard, you took that blow well.
ID: 18893
Toni, perhaps Fermi-only long tasks would go down better; a failure after a few hours is no big deal, but after a day it really bites, and not everyone is so understanding. My GTX 260/216 runs the TONI_KKi4 WUs well; in fact, it runs everything well. The problem is with my three GT 240 cards: they won't run the TONI_KKi4 WUs, and they don't like the TONI_HERGMETAXDOFE WUs either. They do run KASHIF_HIVPR, TONI_CAPBIND and IBUCH very well, though.
ID: 18909
I have had four finish on a GT 240, and just one that failed after 2.46 sec. Vista x64, all 512 MB GDDR5 cards.
ID: 18910
One faulty WU, probably, as all hosts have failed on this one?!
ID: 18914
Found two WUs that have been computed by four hosts, all of which failed; two more still have to report.
ID: 18918
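The reasoning in the two posts above (a WU that fails on every host that tries it is probably bad, rather than the hosts) can be written as a small heuristic. This is a sketch over a hypothetical list of `(wu_name, host_id, succeeded)` records; the record shape, the `suspect_wus` name and the `min_hosts` threshold are all assumptions for illustration, and the WU names and host IDs in the example are made up.

```python
from collections import defaultdict


def suspect_wus(results, min_hosts=2):
    """Return WU names that failed on every reporting host (hypothetical heuristic).

    results: iterable of (wu_name, host_id, succeeded) tuples for reported
    tasks only; tasks still in progress should be left out.
    """
    runs_by_wu = defaultdict(list)
    for wu_name, host_id, succeeded in results:
        runs_by_wu[wu_name].append((host_id, succeeded))

    suspects = []
    for wu_name, runs in runs_by_wu.items():
        hosts = {host_id for host_id, _ in runs}
        all_failed = not any(succeeded for _, succeeded in runs)
        # One failing host says little; failures on several distinct hosts point at the WU.
        if all_failed and len(hosts) >= min_hosts:
            suspects.append(wu_name)
    return sorted(suspects)


# Example with made-up WU names and host IDs.
reported = [
    ("wu_a", 101, False), ("wu_a", 102, False), ("wu_a", 103, False),
    ("wu_b", 101, False), ("wu_b", 104, True),
    ("wu_c", 105, False),
]
print(suspect_wus(reported))  # ['wu_a']
```

Requiring at least two distinct hosts keeps a single unreliable card (such as the 9800GT further down this thread) from condemning every WU it touches.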
"One of my 9800GTs had a go at h230r2-TONI_KKi4-0-200-RND9586, but unfortunately crashed with an assertion failure at the bitter end, after more than 24 hours of work. C'est la vie."

This 9800GT host really doesn't like KKi4: it has now failed g105r2-TONI_KKi4-6-200-RND6062 with the same error message:

    SWAN : FATAL : Failure executing kernel sync [frc_sum_kernel] [700]
    Assertion failed: 0, file swanlib_nv.cpp, line 121

At least it only wasted 22 ksec this time.
ID: 18925
This WU might be bad,
ID: 19161