
Message boards : Number crunching : The simulation has become unstable. Terminating to avoid lock-up

John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level: Cys
Message 41844 - Posted: 18 Sep 2015 | 21:34:26 UTC

Hi, GPUGrid Folks:

Does this error mean my 660Ti is running too hot at 66 - 67 degrees C?

Thanks in advance for any help.



Name e1s19_8-GERARD_VACXCL12_LIG_27303741-0-1-RND6928_0
Workunit 11203848
Created 17 Sep 2015 | 10:57:56 UTC
Sent 18 Sep 2015 | 12:16:58 UTC
Received 18 Sep 2015 | 20:59:56 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -97 (0xffffffffffffff9f) Unknown error number
Computer ID 214484
Report deadline 23 Sep 2015 | 12:16:58 UTC
Run time 31,225.45
CPU time 5,675.66
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.47 (cuda65)
Stderr output
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:03:00.0
# Device clock : 1110MHz
# Memory clock : 3304MHz
# Memory width : 192bit
# Driver version : r355_00 : 35582
# GPU 0 : 50C
# GPU 1 : 56C
# GPU 0 : 57C
# GPU 1 : 61C
# GPU 0 : 60C
# GPU 1 : 64C
# GPU 0 : 65C
# GPU 1 : 66C
# GPU 0 : 66C
# GPU 1 : 67C
# GPU 0 : 68C
# GPU 1 : 68C
# GPU 0 : 70C
# GPU 1 : 69C
# GPU 0 : 71C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 1 : 70C
# GPU 0 : 74C
# GPU 0 : 75C
# GPU 1 : 71C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 1 : 72C
# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 8220000)
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:03:00.0
# Device clock : 1110MHz
# Memory clock : 3304MHz
# Memory width : 192bit
# Driver version : r355_00 : 35582
# The simulation has become unstable. Terminating to avoid lock-up (1)

</stderr_txt>
]]>
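The temperature and error lines in a log like this can be summarized mechanically. Below is a minimal Python sketch; it is not part of BOINC or the GPUGrid application, the function names summarize_stderr and to_signed are hypothetical, and the line format is assumed from this one log only. It takes an excerpt of the stderr_txt above, reports the peak temperature for each GPU index, and counts the "simulation has become unstable" messages. It also shows that the two hex values on the task page, 0xffffffffffffff9f and 0xffffff9f, are the same exit status -97 read as a 64-bit and a 32-bit two's-complement number.

```python
import re

# Excerpt of the stderr_txt above (format assumed from this single log).
STDERR = """\
# GPU 0 : 50C
# GPU 1 : 56C
# GPU 0 : 77C
# GPU 1 : 72C
# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 8220000)
# The simulation has become unstable. Terminating to avoid lock-up (1)
"""

# Matches lines like "# GPU 0 : 78C" but not the "# GPU [GeForce ...]" header.
TEMP_RE = re.compile(r"^# GPU (\d+) : (\d+)C$")
UNSTABLE_MSG = "The simulation has become unstable"


def summarize_stderr(text):
    """Return (peak temperature per GPU index, number of instability messages)."""
    peaks = {}
    unstable = 0
    for line in text.splitlines():
        m = TEMP_RE.match(line.strip())
        if m:
            gpu, temp = int(m.group(1)), int(m.group(2))
            peaks[gpu] = max(temp, peaks.get(gpu, temp))
        elif UNSTABLE_MSG in line:
            unstable += 1
    return peaks, unstable


def to_signed(code, bits):
    """Interpret an unsigned exit code of the given width as two's complement."""
    code &= (1 << bits) - 1
    return code - (1 << bits) if code & (1 << (bits - 1)) else code


peaks, unstable = summarize_stderr(STDERR)
print(peaks)     # {0: 78, 1: 72}
print(unstable)  # 2
print(to_signed(0xffffff9f, 32), to_signed(0xffffffffffffff9f, 64))  # -97 -97
```

On this excerpt it reports a peak of 78C for GPU 0 and 72C for GPU 1 with two instability messages, the same picture as the full log.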

John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level: Cys
Message 41845 - Posted: 18 Sep 2015 | 22:06:40 UTC

I meant the temperature of the GPU was 76 - 77 deg C.....

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level: Trp
Message 41846 - Posted: 18 Sep 2015 | 23:08:04 UTC - in response to Message 41844.
Last modified: 18 Sep 2015 | 23:12:54 UTC

Does this error mean my 660Ti is running too hot at 76 - 77 degrees C?

Thanks in advance for any help.


Application version Long runs (8-12 hours on fastest card) v8.47 (cuda65)
Stderr output
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 8220000)
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:03:00.0
# Device clock : 1110MHz (default clock is 915~980MHz)
# Memory clock : 3304MHz (default clock is 3000MHz)
# Memory width : 192bit
# Driver version : r355_00 : 35582
# The simulation has become unstable. Terminating to avoid lock-up (1)

</stderr_txt>
]]>

Your GPU is probably a factory-overclocked card that can't handle that much overclocking.
There are two possible ways to fix it:
The safer one is to reduce its clocks in 20 MHz decrements until it becomes stable. This will also reduce the card's temperature.
The other method (which maintains its speed) is to increase the GPU voltage by 12-25 mV. This will increase the card's temperature.
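The first method is a simple downward search. Below is a minimal Python sketch of it, assuming a hypothetical is_stable(clock) check that stands in for the manual step (set the clock with an overclocking tool, then run tasks and see whether they still fail). The 1110 MHz starting point is the device clock from the log and the 915 MHz floor is the default clock quoted above; the 1030 MHz threshold in the usage example is invented purely for illustration.

```python
def find_stable_clock(start_mhz, floor_mhz, is_stable, step_mhz=20):
    """Step the core clock down from start_mhz until is_stable(clock) passes.

    is_stable is a hypothetical callback standing in for a manual test run.
    Returns the first (highest) stable clock found, or None if no clock
    down to floor_mhz passed.
    """
    clock = start_mhz
    while clock >= floor_mhz:
        if is_stable(clock):
            return clock
        clock -= step_mhz  # 20 MHz decrements, as suggested in the post
    return None


# Hypothetical stand-in: pretend this card is stable at 1030 MHz or below.
def pretend_stable(clock_mhz):
    return clock_mhz <= 1030


print(find_stable_clock(1110, 915, pretend_stable))  # 1030
```

Starting from the current clock keeps as much of the overclock as possible; since each real "test" means hours of crunching, the step size trades precision against time.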

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level: Phe
Message 41849 - Posted: 19 Sep 2015 | 8:22:12 UTC - in response to Message 41845.

I meant the temperature of the GPU was 76 - 77 deg C.....


The temperature is fine: I have a GTX 660 Ti which often goes into the 80s °C with no problem.

Your problem, as Retvari says, is overclocking (OC'ing), whether it was done by the factory or by yourself.

John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level: Cys
Message 41850 - Posted: 19 Sep 2015 | 10:13:46 UTC

Thank you, Retvari Zoltan and Betting Slip.
