Author |
Message |
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
A couple examples:
https://www.gpugrid.net/workunit.php?wuid=11674070
https://www.gpugrid.net/workunit.php?wuid=11674120
The error is always: ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
However I do have a 0-GERARD_MO_MOR WU that's still running after 5 hours.
|
|
|
|
>> A couple examples:
>> https://www.gpugrid.net/workunit.php?wuid=11674070
>> https://www.gpugrid.net/workunit.php?wuid=11674120
>> The error is always: ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
>> However I do have a 0-GERARD_MO_MOR WU that's still running after 5 hours.
Same thing here!
Name e1s34_1-GERARD_MO_MOR_1-0-1-RND2358_2
Workunit 11674055
Created 18 Jul 2016 | 14:57:53 UTC
Sent 18 Jul 2016 | 14:57:56 UTC
Received 18 Jul 2016 | 19:18:31 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -98 (0xffffffffffffff9e) Unknown error number
Computer ID 263612
Report deadline 23 Jul 2016 | 14:57:56 UTC
Run time 3.25
CPU time 1.22
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980 Ti
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1266MHz
# Memory clock : 3505MHz
# Memory width : 384bit
# Driver version : r358_00 : 35906
ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
15:16:11 (6520): called boinc_finish
</stderr_txt>
]]>
Name e1s41_1-GERARD_MO_MOR_2-0-1-RND9697_0
Workunit 11674112
Created 18 Jul 2016 | 12:13:48 UTC
Sent 18 Jul 2016 | 12:40:09 UTC
Received 18 Jul 2016 | 19:18:31 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -98 (0xffffffffffffff9e) Unknown error number
Computer ID 263612
Report deadline 23 Jul 2016 | 12:40:09 UTC
Run time 3.45
CPU time 1.17
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980 Ti
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1266MHz
# Memory clock : 3505MHz
# Memory width : 384bit
# Driver version : r358_00 : 35906
ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
15:16:07 (2340): called boinc_finish
</stderr_txt>
]]>
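Side note on the exit status: the task page prints -98 as 0xffffffffffffff9e, while the stderr message prints it as 0xffffff9e. As far as I can tell these are the same value shown at 64-bit and 32-bit widths in two's complement, not two different errors. A minimal Python sketch to check that (the helper names are my own, nothing here comes from BOINC):

```python
def to_unsigned_hex(code: int, bits: int) -> str:
    """Return the two's-complement bit pattern of `code` at the given width."""
    mask = (1 << bits) - 1
    return hex(code & mask)


def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a signed integer of the given width."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


print(to_unsigned_hex(-98, 32))   # 0xffffff9e
print(to_unsigned_hex(-98, 64))   # 0xffffffffffffff9e
print(to_signed(0xFFFFFF9E, 32))  # -98
```

So the hex value carries no extra information beyond "exit code -98"; the useful part is the ERROR line in stderr.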
|
|
|
|
Same here :
Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GTX 980] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:0F:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r352_00 : 35362
ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
23:14:00 (5980): called boinc_finish
</stderr_txt>
]]> |
|
|
|
I had 2 more of these units error out:
Name e1s39_1-GERARD_MO_MOR_1-0-1-RND2698_6
Workunit 11674060
Created 19 Jul 2016 | 21:42:11 UTC
Sent 19 Jul 2016 | 21:42:18 UTC
Received 19 Jul 2016 | 21:44:52 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -98 (0xffffffffffffff9e) Unknown error number
Computer ID 263612
Report deadline 24 Jul 2016 | 21:42:18 UTC
Run time 3.08
CPU time 1.17
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 1 :
# Name : GeForce GTX 980 Ti
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:02:00.0
# Device clock : 1190MHz
# Memory clock : 3505MHz
# Memory width : 384bit
# Driver version : r358_00 : 35906
ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
17:47:08 (6440): called boinc_finish
</stderr_txt>
]]>
Name e1s50_1-GERARD_MO_MOR_1-0-1-RND9002_1
Workunit 11674071
Created 19 Jul 2016 | 17:21:34 UTC
Sent 19 Jul 2016 | 17:21:49 UTC
Received 19 Jul 2016 | 21:37:22 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -98 (0xffffffffffffff9e) Unknown error number
Computer ID 30790
Report deadline 24 Jul 2016 | 17:21:49 UTC
Run time 6.00
CPU time 2.83
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980 Ti
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:02:00.0
# Device clock : 1190MHz
# Memory clock : 3505MHz
# Memory width : 384bit
# Driver version : r355_00 : 35582
ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
17:39:53 (3556): called boinc_finish
</stderr_txt>
]]>
This looks like a bad batch.
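If anyone wants to confirm that every failure carries the same signature, here is a rough Python sketch that pulls the ERROR line out of a saved stderr dump. The pattern is only inferred from the lines quoted in this thread, so other error formats may not match, and the function name is hypothetical:

```python
import re

# Assumption: error lines look like the ones quoted in this thread, i.e.
# "ERROR: file <name> line <number>: <detail>". Other formats will be missed.
ERROR_RE = re.compile(
    r"^ERROR: file (?P<file>\S+) line (?P<line>\d+): (?P<detail>.*)$",
    re.MULTILINE,
)


def find_errors(stderr_text):
    """Return (file, line, detail) for every matching ERROR line."""
    return [
        (m["file"], int(m["line"]), m["detail"].strip())
        for m in ERROR_RE.finditer(stderr_text)
    ]


# Sample copied from one of the stderr outputs above.
sample = """# Driver version : r358_00 : 35906
ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
17:47:08 (6440): called boinc_finish"""

print(find_errors(sample))
# [('force.cpp', 513, 'TCL evaluation of [calcforces]')]
```

Collecting these tuples across hosts makes it easy to see whether it really is the same failure everywhere, independent of card, clock, or driver version.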
|
|
|
Greger
Joined: 6 Jan 15 Posts: 76 Credit: 24,195,148,333 RAC: 10,296,438
|
Same here: 90.45% end up failing. What can we do about it, and how do the other 9.55% manage to succeed on those WUs? They fail after 3-4 seconds this time, so little is lost, but I would like to see a lower error rate.
http://www.gpugrid.net/workunit.php?wuid=11674051
http://www.gpugrid.net/workunit.php?wuid=11674098
http://www.gpugrid.net/workunit.php?wuid=11674051
I got the same WU again, but on another host. |
|
|
|
I had e1s27_1-GERARD_MO_MOR_1-0-1-RND0098 - reached maximum number of errors with no successful returns at all. |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
90% fail rate says there is a problem with the model.
No tasks available confirms there is a problem.
Why are we not using a beta queue to test such tasks?
|
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
>> Same here: 90.45% end up failing. What can we do about it, and how do the other 9.55% manage to succeed on those WUs? They fail after 3-4 seconds this time, so little is lost, but I would like to see a lower error rate.
Because the 0-GERARD_MO_MOR WUs seem OK while the 1-GERARD_MO_MOR WUs all fail.
All the admins are apparently either on vacation or asleep.
>> Why are we not using a beta queue to test such tasks?
We're open to theories... |
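For anyone who wants to test the 0- versus 1- theory against their own task list, here is a rough Python sketch that groups results by the digit in front of "-GERARD_MO_MOR". That this digit is what separates the good WUs from the bad ones is only my reading of the names in this thread, and the function names are hypothetical:

```python
import re
from collections import Counter

# Assumption: the digit right before "-GERARD_MO_MOR" in a task name is the
# "0-" / "1-" part discussed above.
PREFIX_RE = re.compile(r"_(\d+)-GERARD_MO_MOR")


def batch_prefix(task_name):
    """Return the digit before '-GERARD_MO_MOR', or None if the name differs."""
    m = PREFIX_RE.search(task_name)
    return m.group(1) if m else None


def failure_rate_by_prefix(results):
    """results: iterable of (task_name, succeeded) pairs."""
    total, failed = Counter(), Counter()
    for name, ok in results:
        prefix = batch_prefix(name)
        total[prefix] += 1
        if not ok:
            failed[prefix] += 1
    return {p: failed[p] / total[p] for p in total}


# Failed task names copied from the posts above.
failed_here = [
    "e1s34_1-GERARD_MO_MOR_1-0-1-RND2358_2",
    "e1s41_1-GERARD_MO_MOR_2-0-1-RND9697_0",
    "e1s39_1-GERARD_MO_MOR_1-0-1-RND2698_6",
    "e1s50_1-GERARD_MO_MOR_1-0-1-RND9002_1",
]
print(failure_rate_by_prefix((name, False) for name in failed_here))
# {'1': 1.0}
```

Feed it your full task list, successes included. If the theory holds, prefix '0' should show a rate near 0 and prefix '1' a rate near 1.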
|
|