Author |
Message |
|
<core_client_version>6.6.38</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.65 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.65 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 14
# Number of cores: 112
Cuda error: Kernel [pme_fill_charges_accumulate] failed in file 'fillcharges.cu' in line 73 : unknown error.
</stderr_txt>
?
|
|
|
|
Another failure:
<core_client_version>6.6.38</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.65 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.65 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 14
# Number of cores: 112
Cuda error: Kernel [pme_fill_charges_accumulate] failed in file 'fillcharges.cu' in line 73 : unknown error.
</stderr_txt>
]]>
|
|
|
|
I keep having this error on every unit.
What should I do?
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Función incorrecta. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 260"
# Clock rate: 1.41 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Device 1: "GeForce 9800 GT"
# Clock rate: 1.37 GHz
# Total amount of global memory: 1073545216 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Función incorrecta. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 260"
# Clock rate: 1.41 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Device 1: "GeForce 9800 GT"
# Clock rate: 1.37 GHz
# Total amount of global memory: 1073545216 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' in line 79 : unspecified launch failure.
</stderr_txt>
]]>
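For anyone comparing several failed units, the repeating lines in these logs can be pulled out with a few lines of Python. This is only a rough sketch, assuming the stderr format is exactly what is pasted above; `summarize_stderr` is a made-up helper name, not part of BOINC or the GPUGrid application.

```python
import re

# Matches lines such as:
# Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' in line 79 : unspecified launch failure.
CUDA_ERROR = re.compile(
    r"Cuda error: Kernel \[(?P<kernel>[^\]]+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+) : (?P<reason>.+?)\.?\s*$"
)


def summarize_stderr(stderr_txt):
    """Return (cuda_errors, missing_restart) for one task's stderr text.

    cuda_errors is a list of (kernel, source_file, line_number, reason) tuples.
    missing_restart is True if the 'cannot open file "restart.coor"' line appears.
    """
    cuda_errors = []
    missing_restart = False
    for raw in stderr_txt.splitlines():
        line = raw.strip()
        if line.startswith("MDIO ERROR") and "restart.coor" in line:
            missing_restart = True
        match = CUDA_ERROR.search(line)
        if match:
            cuda_errors.append(
                (match["kernel"], match["file"], int(match["line"]), match["reason"])
            )
    return cuda_errors, missing_restart
```

Run over each unit's `<stderr_txt>` block, this shows at a glance whether the same kernel fails every time or whether the failures move around.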
|
|
|
|
More and more...
task 1629422
task 1629408
task 1629383
task 1629337
task 1629322
Tonight I am suspending the project until further news. If the admins want me to report any further information, I'll gladly help. |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
You should do the following:
Install and run GPU-Z. It is freeware and will let you see the temperatures of the GPUs. If they are over 70 degrees when running GPUGrid tasks, you may have a heating / ventilation problem.
If you do, you could test this by leaving the case door off for a while, running more tasks, and checking the temperatures. If heat is definitely the problem, either get a couple of extra system fans or manually turn the fan speed up on the card(s).
I would highly recommend that you uninstall the 9800GT. These cards use a G92 core and do not handle most of today’s tasks too well, especially hERG tasks!
This card is most likely causing ALL your failures.
If you do not, the card will eventually give so many failures that you will get no new work. Your GTX260 will be doing at least 3 times the work of the 9800GT.
What are the GPU temperatures like with and without the 9800GT installed, both when running tasks and when idle?
What is your PSU? |
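GPU-Z is a Windows GUI tool. For a scripted check, something along these lines may work. It is a sketch, assuming a driver recent enough to ship `nvidia-smi` with the `--query-gpu` option (older cards and drivers may not report a temperature at all). The helper names and the 70-degree default are illustrative only; the threshold is simply the figure mentioned in this post.

```python
import subprocess


def parse_temperatures(csv_text):
    """Parse 'index, name, temperature' rows into (index, name, temp_c) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, temp = rest.rsplit(",", 1)
        temp = temp.strip()
        if not temp.isdigit():
            # Skip cards that do not report a numeric temperature.
            continue
        rows.append((int(index), name.strip(), int(temp)))
    return rows


def hot_gpus(rows, limit_c=70):
    """Return the rows whose temperature is above limit_c (an assumed threshold)."""
    return [row for row in rows if row[2] > limit_c]


def read_temperatures():
    """Ask nvidia-smi for the current temperature of every GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,name,temperature.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_temperatures(out)
```

Calling `hot_gpus(read_temperatures())` once with tasks running and once with the machine idle gives the two readings asked for here.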
|
|
|
More and more...
task 1629422
task 1629408
task 1629383
task 1629337
task 1629322
Tonight I am suspending the project until further news. If the admins want me to report any further information, I'll gladly help.
When your card fails task after task in 10 seconds or less, it's likely that its internal state has become corrupted. Do a complete power cycle - power down the host and restart: that should allow most of those tasks (the GIANNI-BIND and the KASHIF-HIVPR, at least) to run properly - even on the 9800GT that SKGiven despises so much :-) |
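The "task after task in 10 seconds or less" pattern is easy to check for mechanically. A minimal sketch, assuming you have already collected each task's run time and outcome yourself. The function name and the streak length of 3 are assumptions for illustration; only the 10-second figure comes from this post.

```python
def needs_power_cycle(results, max_seconds=10, streak=3):
    """Return True if `streak` consecutive tasks failed within max_seconds each.

    results: list of (run_seconds, succeeded) tuples in chronological order.
    """
    run = 0
    for run_seconds, succeeded in results:
        if not succeeded and run_seconds <= max_seconds:
            run += 1
            if run >= streak:
                return True
        else:
            # A success or a slow failure breaks the streak.
            run = 0
    return False
```

A failure after many hours of running resets the count, since that is a different situation from the instant failures described here.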
|
|
|
Install and run GPU-Z. It is freeware and will let you see the temperatures of the GPUs. If they are over 70 degrees when running GPUGrid tasks, you may have a heating / ventilation problem.
That's good advice for the CPU.
The GPU is a different beast altogether.
Newer GPUs appear to be designed to run hotter. A *LOT* hotter. The latest generation of Nvidia GPUs (260, 280, 285, 295) is actually designed to run safely at up to -- get this -- just over 100 degrees Celsius. In fact, when the fan is left on automatic, it won't even start to ramp up the fan speed until the temperature exceeds 70. Even then, it only increases the fan speed slightly and is clearly not trying to keep temps in the 70s.
Normal operating temperature under a heavy load seems to be around 80 to 85 degrees -- and that's with the fan still below 60% on automatic control.
So don't panic if you see GPU temps in the 70s or 80s. That's normal.
On the other hand, if your *CPU* is running that hot, you're possibly just on the brink of having it fail, depending on the model.
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
|
Install and run GPU-Z. It is freeware and will let you see the temperatures of the GPUs. If they are over 70 degrees when running GPUGrid tasks, you may have a heating / ventilation problem.
That's good advice for the CPU.
The GPU is a different beast altogether.
So don't panic if you see GPU temps in the 70s or 80s. That's normal.
OK, GPU-Z measures GPU temperatures. It is not CPU-Z!
Although the cards can run hot, it is NOT good for them to stay at that temperature for extended periods of time, and it is certainly NOT normal!
My GTX260-216sp sits at 66 degrees C when crunching GPUGrid and the fan is at 40%. That is my normal and the card works 100% for ALL tasks. I would not like to hear the fan at 60% or run the GPU at 80 degrees C.
Pull the 9800GT.
You are testing to find the problem, so trying things is important! |
|
|
|
I'm sure it wasn't the heat: since I suspected that, I had the machine in a very cold room with the mainboard out of its case.
I had the GT on the same board as the GTX card. I moved the GTX to a brand new board and everything has been fine since then... the wonderful world of computers...
Now I'm having problems with the Linux drivers for my GTX, but that's another story I'll handle with time and patience... |
|
|
skgiven (Volunteer moderator, Volunteer tester)
|
A pain I’m sure.
It ran for some time before it failed!
Not sure a 9500 GT is up to the task any more. Well, certainly not that task.
Was there a system restart or an update, or did you clear the cache, wipe free space...? Just on the off chance it is something other than BOINC/GPUGrid!
I tried my 8800GTS again, but all tasks failed. 4 failed within 3 seconds (no real problem) but the fifth failed after 13 hours or so. It is running another task. If it fails I will take it back off GPUGrid, but I might get another GT240 instead. The one I have uses less electricity and has successfully finished all 11 tasks so far. |
|
|