Energies have become nan

Message boards : Number crunching : Energies have become nan

Author	Message
skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 20538 - Posted: 28 Feb 2011 \| 8:50:11 UTC Last modified: 28 Feb 2011 \| 8:52:46 UTC
	Skip Da Shu is getting "Energies have become nan" task errors on this system. All tasks error out with: ERROR: file deven.cpp line 879: # Energies have become nan. The system runs Linux with Boinc version 6.12.15. The failed tasks are TONI_KKAL2 and GIANNI_DHFR1000 tasks; no other tasks ran. The tasks have mostly failed on the GTS250, but also on the GT340. Some tasks started running on one card and later ran on the other (after a Boinc or system restart). I would suggest you remove the GTS250 and reinstall the drivers if need be.
	ID: 20538 \| Rating: 0

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 20841 - Posted: 2 Apr 2011 \| 4:55:51 UTC Last modified: 2 Apr 2011 \| 4:58:21 UTC
	I got one of these just now. Machine has a GTX570 in it. Running 266.58 drivers under Win 7 x64. It was a KASHIF_HIVPR work unit this time. It ran for 3 hours 46 mins before it died. Link to wu here ____________ BOINC blog
	ID: 20841 \| Rating: 0

Stoneageman Send message Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 227 Level Scientific publications	Message 20842 - Posted: 2 Apr 2011 \| 5:57:24 UTC - in response to Message 20841.
	Sorted my old nan problem by underclocking the card memory by 10% :-)
	ID: 20842 \| Rating: 0

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 20861 - Posted: 6 Apr 2011 \| 14:32:52 UTC Last modified: 6 Apr 2011 \| 14:34:44 UTC
	And another one tonight here. I don't have the memory overclocked, but I do have the processor clock cranked up a little (1675MHz for this run). For the previous WU I had it at 1700MHz. I might try dropping it some more. ____________ BOINC blog
	ID: 20861 \| Rating: 0

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,206,655,749 RAC: 261,147 Level Scientific publications	Message 20867 - Posted: 6 Apr 2011 \| 21:28:04 UTC - in response to Message 20861.
	MarkJ wrote: "And another one tonight here. I don't have the memory overclocked, but I do have the processor clock cranked up a little (1675MHz for this run). For the previous WU I had it at 1700MHz. I might try dropping it some more."
	I have observed that tasks with high GPU utilization (above 93%), typically GIANNI_DHFRs, KASHIF_HIVPRs and TONI_KKALs, are more prone to erroring out this way than others. The solution is either to raise the GPU voltage by 0.025V (and the fan speed too), or to lower the shader clock (or the memory clock of the GPU) until these errors stop appearing.
	ID: 20867 \| Rating: 0

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 20875 - Posted: 7 Apr 2011 \| 13:43:01 UTC - in response to Message 20867.
	The GIANNI tasks sometimes failed when I was overclocking my GTX470's. Just reducing the clocks back to normal was enough, but some people had to up the voltage and some had to reduce the memory freq. Every GPU is different. The temperatures rose a bit with these tasks as well, but I'm now using MSI Afterburner, which is configured to adjust fan speed automatically in response to temperature changes.
	ID: 20875 \| Rating: 0

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 20904 - Posted: 12 Apr 2011 \| 15:55:47 UTC - in response to Message 20875.
	Ton (ftpd) is seeing "Energies have become nan" failure messages and also "SWAN : FATAL : Cuda driver error 2bc in file 'swanlib_nv.c' in line 244" failures on his GTX295. They occur mostly on Toni's tasks, but seem to affect both long and short tasks, and there are similar errors for Ignasi's work. (XP x86, driver: 27051, Boinc 6.10.60) Is this a known issue that is being looked at?
	ID: 20904 \| Rating: 0

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 20905 - Posted: 12 Apr 2011 \| 16:57:58 UTC - in response to Message 20904.
	I've got one of the SWAN Cuda errors as well. http://www.gpugrid.net/result.php?resultid=3882394 ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline
	ID: 20905 \| Rating: 0

ftpd Send message Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level Scientific publications	Message 20915 - Posted: 13 Apr 2011 \| 6:42:07 UTC - in response to Message 20904. Last modified: 13 Apr 2011 \| 6:42:21 UTC
	And again, after more than 7 hours of processing, 2 (two) WUs were cancelled on the GTX295. And again I have to wait a long time for a new download. A waste of valuable time/money/power etc.! Please look at it! ____________ Ton (ftpd) Netherlands
	ID: 20915 \| Rating: 0

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 20917 - Posted: 13 Apr 2011 \| 9:32:04 UTC - in response to Message 20915.
	I had 4 "ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b) called boinc_finish" errors, but they all errored out within 10 sec. Using driver 26724 on both systems, one with a GTX260 and the other with GTX470's. The one task I recently had fail after some time (5h) was triggered by the system, again. Ton, your problems might be exacerbated by the more recent Beta driver, but others have not reported problems with it, and the problems are there with the earlier driver too. So I think the problem is more likely to do with the tasks themselves - basically down to Toni, Ignasi and the rest of the team to sort out. Good luck guys,
	ID: 20917 \| Rating: 0

Ross* Send message Joined: 6 May 09 Posts: 34 Credit: 443,507,669 RAC: 0 Level Scientific publications	Message 21036 - Posted: 22 Apr 2011 \| 4:22:38 UTC Last modified: 22 Apr 2011 \| 4:29:08 UTC
	The following happened to several WUs with this driver on 2 boxes:
	A198-TONI_AGG1-8-100-RND4916_0
	Workunit 2448115
	Created 21 Apr 2011 19:47:36 UTC
	Sent 21 Apr 2011 19:52:50 UTC
	Received 21 Apr 2011 23:20:56 UTC
	Server state Over
	Outcome Client error
	Client state Compute error
	Exit status 98 (0x62)
	Computer ID 95964
	Report deadline 26 Apr 2011 19:52:50 UTC
	Run time 8796.666317
	CPU time 1981.4
	stderr out
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<message>
	- exit code 98 (0x62)
	</message>
	<stderr_txt>
	# Using device 0
	# There are 2 devices supporting CUDA
	# Device 0: "GeForce GTX 570"
	# Clock rate: 1.56 GHz
	# Total amount of global memory: 1275658240 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	# Device 1: "GeForce GTX 570"
	# Clock rate: 1.56 GHz
	# Total amount of global memory: 1275789312 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	MDIO ERROR: cannot open file "restart.coor"
	ERROR: file deven.cpp line 879: # Energies have become nan
	called boinc_finish
	</stderr_txt>
	]]>
	Validate state Invalid
	Claimed credit 35140.0810185185
	Granted credit 0
	application version Long runs (8-12 hours on fastest card) v6.13 (cuda31)
	Also these:
	A560r5-TONI_AB1-21-100-RND5976_1
	Workunit 2447770
	Created 21 Apr 2011 15:12:33 UTC
	Sent 21 Apr 2011 22:50:47 UTC
	Received 22 Apr 2011 0:06:48 UTC
	Server state Over
	Outcome Client error
	Client state Compute error
	Exit status 98 (0x62)
	Computer ID 96625
	Report deadline 26 Apr 2011 22:50:47 UTC
	Run time 1046.854
	CPU time 272.0657
	stderr out
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<message>
	- exit code 98 (0x62)
	</message>
	<stderr_txt>
	# Using device 0
	# There are 2 devices supporting CUDA
	# Device 0: "GeForce GTX 580"
	# Clock rate: 1.59 GHz
	# Total amount of global memory: 1543045120 bytes
	# Number of multiprocessors: 16
	# Number of cores: 128
	# Device 1: "GeForce GTX 580"
	# Clock rate: 1.59 GHz
	# Total amount of global memory: 1543176192 bytes
	# Number of multiprocessors: 16
	# Number of cores: 128
	MDIO ERROR: warning: redefined atom parameters for ht
	MDIO ERROR: warning: redefined atom parameters for ot
	MDIO ERROR: warning: redefined atom parameters for cph1
	MDIO ERROR: warning: redefined atom parameters for cph2
	MDIO ERROR: warning: redefined atom parameters for nr1
	MDIO ERROR: warning: redefined atom parameters for nr2
	MDIO ERROR: warning: redefined atom parameters for hr3
	MDIO ERROR: warning: redefined atom parameters for hr1
	MDIO ERROR: warning: redefined bond parameters for ht ht
	MDIO ERROR: warning: redefined bond parameters for ht ot
	MDIO ERROR: warning: redefined bond parameters for nr1 cph1
	MDIO ERROR: warning: redefined bond parameters for nr1 cph2
	MDIO ERROR: warning: redefined bond parameters for nr2 cph1
	MDIO ERROR: warning: redefined bond parameters for nr2 cph2
	MDIO ERROR: warning: redefined bond parameters for cph1 cph1
	MDIO ERROR: warning: redefined bond parameters for hr1 cph2
	MDIO ERROR: warning: redefined bond parameters for hr3 cph1
	MDIO ERROR: warning: redefined angle parameters for cph2 nr1 cph1
	MDIO ERROR: warning: redefined angle parameters for cph2 nr2 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph1 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph2 nr2
	MDIO ERROR: warning: redefined angle parameters for nr2 cph1 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph2 hr1
	MDIO ERROR: warning: redefined angle parameters for nr2 cph2 hr1
	MDIO ERROR: warning: redefined angle parameters for hr3 cph1 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph1 hr3
	MDIO ERROR: warning: redefined angle parameters for nr2 cph1 hr3
	MDIO ERROR: warning: redefined angle parameters for ht ot ht
	MDIO ERROR: warning: redefined dihedral parameters for d cph2 nr1 cph1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d cph2 nr2 cph1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d nr1 cph1 cph1 hr3
	MDIO ERROR: warning: redefined dihedral parameters for d nr1 cph2 nr2 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d nr2 cph1 cph1 nr1
	MDIO ERROR: warning: redefined dihedral parameters for d nr2 cph2 nr1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d hr1 cph2 nr1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d hr1 cph2 nr2 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d hr3 cph1 cph1 hr3
	MDIO ERROR: warning: redefined dihedral parameters for d hr3 cph1 nr1 cph2
	MDIO ERROR: warning: redefined dihedral parameters for d hr3 cph1 nr2 cph2
	MDIO ERROR: warning: redefined dihedral parameters for d nr2 cph1 cph1 hr3
	MDIO ERROR: warning: redefined improper parameters for i hr1 nr1 nr2 cph2
	MDIO ERROR: warning: redefined improper parameters for i hr1 nr2 nr1 cph2
	MDIO ERROR: warning: redefined improper parameters for i hr3 cph1 nr1 cph1
	MDIO ERROR: warning: redefined improper parameters for i hr3 cph1 nr2 cph1
	MDIO ERROR: warning: redefined improper parameters for i hr3 nr1 cph1 cph1
	MDIO ERROR: warning: redefined improper parameters for i hr3 nr2 cph1 cph1
	MDIO ERROR: cannot open file "restart.coor"
	# Using device 0
	# There are 2 devices supporting CUDA
	# Device 0: "GeForce GTX 580"
	# Clock rate: 1.59 GHz
	# Total amount of global memory: 1543045120 bytes
	# Number of multiprocessors: 16
	# Number of cores: 128
	# Device 1: "GeForce GTX 580"
	# Clock rate: 1.59 GHz
	# Total amount of global memory: 1543176192 bytes
	# Number of multiprocessors: 16
	# Number of cores: 128
	MDIO ERROR: warning: redefined atom parameters for ht
	MDIO ERROR: warning: redefined atom parameters for ot
	MDIO ERROR: warning: redefined atom parameters for cph1
	MDIO ERROR: warning: redefined atom parameters for cph2
	MDIO ERROR: warning: redefined atom parameters for nr1
	MDIO ERROR: warning: redefined atom parameters for nr2
	MDIO ERROR: warning: redefined atom parameters for hr3
	MDIO ERROR: warning: redefined atom parameters for hr1
	MDIO ERROR: warning: redefined bond parameters for ht ht
	MDIO ERROR: warning: redefined bond parameters for ht ot
	MDIO ERROR: warning: redefined bond parameters for nr1 cph1
	MDIO ERROR: warning: redefined bond parameters for nr1 cph2
	MDIO ERROR: warning: redefined bond parameters for nr2 cph1
	MDIO ERROR: warning: redefined bond parameters for nr2 cph2
	MDIO ERROR: warning: redefined bond parameters for cph1 cph1
	MDIO ERROR: warning: redefined bond parameters for hr1 cph2
	MDIO ERROR: warning: redefined bond parameters for hr3 cph1
	MDIO ERROR: warning: redefined angle parameters for cph2 nr1 cph1
	MDIO ERROR: warning: redefined angle parameters for cph2 nr2 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph1 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph2 nr2
	MDIO ERROR: warning: redefined angle parameters for nr2 cph1 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph2 hr1
	MDIO ERROR: warning: redefined angle parameters for nr2 cph2 hr1
	MDIO ERROR: warning: redefined angle parameters for hr3 cph1 cph1
	MDIO ERROR: warning: redefined angle parameters for nr1 cph1 hr3
	MDIO ERROR: warning: redefined angle parameters for nr2 cph1 hr3
	MDIO ERROR: warning: redefined angle parameters for ht ot ht
	MDIO ERROR: warning: redefined dihedral parameters for d cph2 nr1 cph1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d cph2 nr2 cph1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d nr1 cph1 cph1 hr3
	MDIO ERROR: warning: redefined dihedral parameters for d nr1 cph2 nr2 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d nr2 cph1 cph1 nr1
	MDIO ERROR: warning: redefined dihedral parameters for d nr2 cph2 nr1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d hr1 cph2 nr1 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d hr1 cph2 nr2 cph1
	MDIO ERROR: warning: redefined dihedral parameters for d hr3 cph1 cph1 hr3
	MDIO ERROR: warning: redefined dihedral parameters for d hr3 cph1 nr1 cph2
	MDIO ERROR: warning: redefined dihedral parameters for d hr3 cph1 nr2 cph2
	MDIO ERROR: warning: redefined dihedral parameters for d nr2 cph1 cph1 hr3
	MDIO ERROR: warning: redefined improper parameters for i hr1 nr1 nr2 cph2
	MDIO ERROR: warning: redefined improper parameters for i hr1 nr2 nr1 cph2
	MDIO ERROR: warning: redefined improper parameters for i hr3 cph1 nr1 cph1
	MDIO ERROR: warning: redefined improper parameters for i hr3 cph1 nr2 cph1
	MDIO ERROR: warning: redefined improper parameters for i hr3 nr1 cph1 cph1
	MDIO ERROR: warning: redefined improper parameters for i hr3 nr2 cph1 cph1
	MDIO ERROR: cannot open file "restart.coor"
	ERROR: file deven.cpp line 879: # Energies have become nan
	called boinc_finish
	</stderr_txt>
	]]>
	Validate state Invalid
	Claimed credit 38584.7222222222
	Granted credit 0
	application version Long runs (8-12 hours on fastest card) v6.13 (cuda31)
	I have since gone back to 266.58 and had no issues. Cheers Ross* ____________
	ID: 21036 \| Rating: 0

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21037 - Posted: 22 Apr 2011 \| 8:03:49 UTC - in response to Message 21036.
	The error is "Energies have become nan". Not sure I would attribute this to the driver though; this error has been seen many times before under many drivers.
	ID: 21037 \| Rating: 0

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 21055 - Posted: 25 Apr 2011 \| 14:49:10 UTC Last modified: 25 Apr 2011 \| 14:51:49 UTC
	I would add that it's typically caused by overclocking. I had similar issues on my reference design GTX570, which seemed to go away after dropping back to stock. See the Energies have become nan thread ____________ BOINC blog
	ID: 21055 \| Rating: 0

Ken Florian Send message Joined: 4 May 12 Posts: 56 Credit: 1,832,989,878 RAC: 0 Level Scientific publications	Message 25835 - Posted: 22 Jun 2012 \| 22:26:07 UTC - in response to Message 21055.
	I have work units failing, sporadically, with "energies have become nan". The programmer in me loves the mysterious nature of the message. The cruncher in me wonders "what am I doing wrong?"
	- EVGA GTX-690 Signature, not OC'ed and no manual adjustments made to any settings
	- 1 GPU at about 82C, utilization about 87%
	- 1 GPU at about 60C, utilization about 60%
	- Intel E8200, not OC'ed
	- 8G RAM
	- Antec Earth Watts 650
	- Single 7200RPM SATA drive
	- Nothing else in the machine
	- Boinc 7.0.25
	- Win7, x64
	Since I've never made manual adjustments to a GPU, if changes are necessary, please give me specific recommendations. I will be using EVGA's PrecisionX tool. Thanks, Ken
	ID: 25835 \| Rating: 0

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 25837 - Posted: 23 Jun 2012 \| 7:53:57 UTC - in response to Message 25835. Last modified: 23 Jun 2012 \| 21:14:28 UTC
	nan means "not a number". I think this error occurs when the GPU has an unrecoverable failure, and as described in this thread it is often a result of overclocking, temps being too high, or voltage being insufficient. Not all tasks are the same here; some utilize the GPU to a greater extent, demanding more from it. These tasks are more prone to failures. The lack of thread-safe code may also be an issue, but to some extent that would just hide a bad setup. 82°C is pushing it, slightly. Try the generic recommendations for this situation:
	- Increase the GPU fan speed so that it keeps the GPU below 70°C
	- Reduce the GDDR5 frequency by 10% or 20% should that fail
	- Increase the voltage (if you can on that card), but only by the least amount (typically ~0.025V)
	Also consider better cooling - adding a case fan or two, or just leaving the door off, slightly. Removing back plates might also be useful, as might a better CPU fan; one that radiates or blows less heat onto the GPU. If you don't get anywhere with the above, try downclocking the CPU in W7 Power management (not ideal but substantially reduces heat from the CPU). ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 25837 \| Rating: 0
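
For anyone who prefers a script to a GUI tool, the 70°C target mentioned above could be checked with something like the following. This is only a sketch under assumptions: it presumes the `nvidia-smi` utility is installed and that the driver reports temperatures for the card, which may not hold for every GeForce model or driver version. The function names and the `TARGET_C` constant are made up for illustration.

```python
import subprocess

TARGET_C = 70  # temperature ceiling suggested in the post above


def parse_temps(csv_text):
    """Turn 'index, temperature' CSV lines into a {gpu_index: temp_c} dict."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def read_temps():
    """Query nvidia-smi once (assumes it is on PATH and supports the card)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def too_hot(temps, limit_c=TARGET_C):
    """Return the indices of GPUs running above the limit, in ascending order."""
    return sorted(index for index, temp in temps.items() if temp > limit_c)


# Readings like the two GPUs described earlier in the thread (82C and 60C):
example = parse_temps("0, 82\n1, 60\n")
print(too_hot(example))  # [0]
```

On a live system, `too_hot(read_temps())` would list the GPUs that need more fan speed or lower clocks; what to do about them is still a manual decision.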

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,206,655,749 RAC: 261,147 Level Scientific publications	Message 25841 - Posted: 23 Jun 2012 \| 18:35:35 UTC
	The definition of NaN from Wikipedia explains a lot.
	ID: 25841 \| Rating: 0
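
To make the NaN behaviour concrete, here is a small illustration of why a single bad value is enough to ruin a whole energy sum. It is not the project's actual code: `total_energy` and `check_energy` are hypothetical names, and the check only mimics the kind of test that could produce the "Energies have become nan" message.

```python
import math


def total_energy(terms):
    """Sum per-term energies; one NaN term makes the whole sum NaN."""
    return sum(terms)


def check_energy(energy):
    """Abort as soon as the energy is no longer a number (hypothetical check)."""
    if math.isnan(energy):
        raise FloatingPointError("Energies have become nan")
    return energy


good = total_energy([-120.5, 33.25, 4.0])
bad = total_energy([-120.5, float("nan"), 4.0])

print(good)             # -83.25
print(math.isnan(bad))  # True
print(bad == bad)       # False: NaN never compares equal, even to itself

# NaN can also appear from otherwise valid-looking arithmetic:
print(math.isnan(float("inf") - float("inf")))  # True
```

Because NaN propagates through every later operation, a single corrupted value (for example from an unstable GPU) cannot be repaired afterwards, which is consistent with the task being aborted rather than continued.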

Raptures Riot Send message Joined: 30 Apr 11 Posts: 6 Credit: 220,588,795 RAC: 0 Level Scientific publications	Message 26639 - Posted: 17 Aug 2012 \| 21:45:36 UTC - in response to Message 20538.
	My candid feeling is that the 6.16 (Cuda 42) is an aggressive routine. My 470's are running at stock voltages and frequencies. Some potential adjustments (especially for the inexperienced among us) could shorten the life of these cards and reduce our contributions to this project. The previous coding ran nearly flawlessly for me. I take these suggestions to heart, but also with a grain of hesitation. I, for one, am not so experienced as to achieve high success on this Grid while compromising future potential. It is, as it always is, to each their own. Caution is always best. I may be wrong, but I am always willing to learn.
	ID: 26639 \| Rating: 0

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,206,655,749 RAC: 261,147 Level Scientific publications	Message 26641 - Posted: 17 Aug 2012 \| 23:23:13 UTC - in response to Message 26639. Last modified: 17 Aug 2012 \| 23:25:21 UTC
	Raptures Riot wrote: "My candid feeling is that the 6.16 (Cuda 42) is an aggressive routine."
	I came to the same conclusion when I had to set my GTX 590 to 625MHz for the CUDA 4.2 client. It was running fine at 725MHz with the CUDA 3.1 client. However, the CUDA 4.2 client is 40% faster than the CUDA 3.1 client, so it can do 20% more work at the lower frequency.
	Raptures Riot wrote: "My 470's are running at stock voltages and frequencies."
	These cards were made for gaming, not for crunching. When a card fails once in a 4 hour gaming session, the player hardly notices the glitch caused by that failure at all. So these cards may be undervolted by the factory setting, especially the GTX 470 and the GTX 480. When a 4 hour workunit experiences the same failure, it will run into an error message. It is debatable whether the client should try to go on from the last checkpoint in this case, instead of aborting the task immediately. But if a card is unreliable, it's safer to discard the entire workunit.
	Raptures Riot wrote: "Some potential adjustments (especially for the inexperienced of us) could shorten the life of these cards and reduce our contributions to this project."
	Errors reduce the contributions to this project. Errors cost the cruncher a lot of electricity in vain. If you lower your GPU frequency, it won't shorten your card's lifespan, but it will make it more reliable at the same voltage.
	Raptures Riot wrote: "The previous coding ran nearly flawlessly for me."
	I have a Gigabyte GTX 480 with 1000mV GPU voltage by default. It was nearly flawless, while my ASUS GTX 480 at 1025mV was really flawless. I raised the Gigabyte's voltage to 1025mV, and it became really flawless too. That was more than 2 years ago, and these cards are still crunching 24/7 at an even higher voltage and frequency (equipped with better than factory cooling).
	ID: 26641 \| Rating: 0
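
The "20% more work at the lower frequency" figure can be checked with a quick calculation. The sketch below assumes throughput scales linearly with the GPU clock and that the 40% speedup applies at equal clocks; neither assumption is stated in the post, and `relative_throughput` is a name invented for this example.

```python
def relative_throughput(old_clock_mhz, new_clock_mhz, speedup_at_equal_clock):
    """Work rate of the new client at the new clock, relative to the old setup.

    Assumes work rate is proportional to clock frequency.
    """
    return (new_clock_mhz / old_clock_mhz) * speedup_at_equal_clock


# CUDA 3.1 client at 725MHz versus CUDA 4.2 client (40% faster) at 625MHz
ratio = relative_throughput(725, 625, 1.40)
print(f"{ratio:.3f}")  # 1.207
```

Under those assumptions the downclocked card still does about 21% more work than before, in line with the roughly 20% quoted above.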

Raptures Riot Send message Joined: 30 Apr 11 Posts: 6 Credit: 220,588,795 RAC: 0 Level Scientific publications	Message 26643 - Posted: 18 Aug 2012 \| 3:20:55 UTC - in response to Message 26641.
	Thank you Retvari for your reply. I'm making small changes to try and improve my reliability. The card's <less than ideal> gaming configuration is a valuable point and I have learned something.
	ID: 26643 \| Rating: 0

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 28318 - Posted: 31 Jan 2013 \| 18:32:30 UTC - in response to Message 26643. Last modified: 31 Jan 2013 \| 18:39:43 UTC
	We could do with some feedback regarding these recent "Energies have become nan" errors:
	Name 48_14-NOELIA_hfXA_long_30-0-2-RND7978_2
	Workunit 4080964
	Created 30 Jan 2013 \| 10:26:34 UTC
	Sent 30 Jan 2013 \| 14:18:32 UTC
	Received 30 Jan 2013 \| 22:24:51 UTC
	Server state Over
	Outcome Computation error
	Client state Compute error
	Exit status 98 (0x62)
	Computer ID 139265
	Report deadline 4 Feb 2013 \| 14:18:32 UTC
	Run time 28,221.08
	CPU time 28,208.29
	Validate state Invalid
	Credit 0.00
	Application version Long runs (8-12 hours on fastest card) v6.17 (cuda42)
	Stderr output
	<core_client_version>7.0.44</core_client_version>
	<![CDATA[
	<message>
	- exit code 98 (0x62)
	</message>
	<stderr_txt>
	MDIO: cannot open file "restart.coor"
	SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
	Assertion failed: a, file swanlibnv2.cpp, line 59
	This application has requested the Runtime to terminate it in an unusual way.
	Please contact the application's support team for more information.
	MDIO: cannot open file "restart.coor"
	SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
	Assertion failed: a, file swanlibnv2.cpp, line 59
	This application has requested the Runtime to terminate it in an unusual way.
	Please contact the application's support team for more information.
	MDIO: cannot open file "restart.coor"
	MDIO: cannot open file "restart.coor"
	ERROR: file deven.cpp line 1106: # Energies have become nan
	called boinc_finish
	</stderr_txt>
	]]>
	Thanks, ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 28318 \| Rating: 0
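
When comparing several failed results like the one above, it can help to count which failure messages appear in each stderr dump. The following is a rough sketch: the patterns are copied from messages quoted in this thread, but the labels and the `count_signatures` function are hypothetical and not part of any BOINC or project tool.

```python
import re

# Hypothetical labels; the patterns come from error text quoted in this thread.
SIGNATURES = {
    "nan_energy": re.compile(r"Energies have become nan"),
    "cuda_driver_error": re.compile(r"SWAN : FATAL : Cuda driver error \w+"),
    "missing_restart_file": re.compile(r'cannot open file "restart\.coor"'),
}


def count_signatures(stderr_text):
    """Count how often each known failure message occurs in a stderr dump."""
    return {
        label: len(pattern.findall(stderr_text))
        for label, pattern in SIGNATURES.items()
    }


# Shortened sample in the style of the stderr output quoted above.
sample = (
    'MDIO: cannot open file "restart.coor"\n'
    "SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.\n"
    'MDIO: cannot open file "restart.coor"\n'
    "ERROR: file deven.cpp line 1106: # Energies have become nan\n"
)
print(count_signatures(sample))
# {'nan_energy': 1, 'cuda_driver_error': 1, 'missing_restart_file': 2}
```

A result that shows driver errors before the final NaN message, as in the task above, may point to a different trigger than one that shows the NaN message alone, though the counts by themselves do not identify the cause.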
