Hi,
I am new to GPUGrid, and I started crunching some WUs a few days ago.
1. This WU ended in error.
http://www.gpugrid.net/result.php?resultid=4259296
Do you know why?
Do I have to change anything in my configuration?
2. What is the normal load for the GPU? I only get 48% for short WUs and 65% for long ones.
Best regards,
Alejandro
|
|
|
Another one:
http://www.gpugrid.net/result.php?resultid=4261811
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 580"
# Clock rate: 1.56 GHz
# Total amount of global memory: 1576468480 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
ERROR: get_Dvec() element 0 (b)
called boinc_finish
</stderr_txt>
]]>
I received the error at the end, after 11 hours of running...
The long runs that I completed were without swan_sync = 0
SWAN: Using synchronization method 0
Best regards,
Alejandro
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
I don't know the cause of this error: ERROR: get_Dvec() element 0 (b)
It has been seen before, this month and in July, on other high-end Fermi cards (480 and 590).
Perhaps the scientists know, or perhaps it really is just a client error (system or card related problem).
I would suggest you do a system restart and check your system and GPU temperatures. Also check that your system is not down-clocking your GPU (NVIDIA Control Panel: Prefer Maximum Performance rather than Adaptive).
Are you using any GPU intensive applications other than GPUGrid?
I guess it's possible that this is related to you running other projects, but I know nothing about what you ran before these two tasks.
In both your linked tasks you did use SWAN_SYNC:
"SWAN: Using synchronization method 0"
Are you also freeing up a CPU core? If not, there is no point using SWAN_SYNC whatsoever.
48% GPU utilization seems too low, but then I am not presently running a W7 system. You should be able to get it to within 15% of XP, on which I see up to 98% for Fermi tasks. That said, the GIANNI tasks are only around 81% with SWAN_SYNC and free CPU cores. But you should still be able to get around 65 to 70% utilization.
One last thing, you could try to update your driver, but I would check everything else first.
Good luck,
Kev
|
|
|
Hi Kev,
I would suggest you do a system restart, and check your system and GPU temperatures.
Done. GPU temperature: 65 °C
Also check that your system is not down-clocking your GPU (NVidia control Panel, prefer Maximum Performance, rather than Adaptive).
I changed NVIDIA Control Panel -> Manage 3D settings -> Power Management Mode from Adaptive to Prefer Maximum Performance.
Are you using any GPU intensive applications other than GPUGrid?
I guess it's possible that this is related to you running other projects, but I know nothing about what you ran before these two tasks.
No. I am only using the GPU for GPUGrid, and the CPU for GPUGrid, Einstein and Test4Theory.
In both your linked tasks you did use SWAN_SYNC:
"SWAN: Using synchronization method 0"
Are you also freeing up a CPU core? If not, there is no point using SWAN_SYNC whatsoever.
How do I free up a CPU core? With swan_sync, I can see in the Task Manager that the process acemdlong_6.15_windows_intelx86__cuda31 takes 25% (one full core).
Without it, the process only takes 5 to 9%.
48% GPU utilization seems too low, but then I am not presently running a W7 system. You should be able to get it to within 15% as good as XP, on which I see up to 98% for Fermi tasks. That said the GIANNI tasks are only around 81% with SWAN_SYNC and Free CPU cores. But you should still be able to get around 65 to 70% utilization.
GIANNI tasks?
One last thing, you could try to update your driver, but I would check everything else first.
I have already updated it. This happened with driver 280.26.
On the weekend I disabled swan_sync in the middle of the task
http://www.gpugrid.net/result.php?resultid=4263691
and restarted the system.
The task finished without trouble.
But yesterday I got the following error for task http://www.gpugrid.net/result.php?resultid=4266224
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 580"
# Clock rate: 1.56 GHz
# Total amount of global memory: 1576468480 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO: cannot open file "restart.coor"
SWAN: FATAL : swanMemcpyDtoH failed
Assertion failed: 0, file swanlib_nv.c, line 390
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
</stderr_txt>
]]>
So I am getting errors of all kinds of flavors :)
Best regards,
Alejandro
|
|
|
Dagorath
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
I changed in NVidia Control Panel -> Manage 3D settings -> Power Management Mode. From Adaptive to prefer maximum Performance
If you select Prefer Maximum Performance in Linux, it will revert to Adaptive after rebooting the OS. Maybe it does that in Windows too?
How do I free up a CPU core?
Go into your BOINC preferences and adjust the setting "On multiprocessor systems, use at most __% of the processors". If you have 8 virtual cores (4 cores with HT turned on), set it to 87.5% to leave 1 virtual core free. If you have 4 cores (no HT), set it to 75% to use 3 cores for BOINC and leave 1 free.
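The two percentages above follow from simple arithmetic: the share of cores BOINC may use is (total cores minus cores to leave free) divided by total cores. Here is a minimal sketch of that calculation in Python. The function name `boinc_cpu_percent` is made up for illustration and is not part of BOINC; how BOINC itself rounds the resulting core count is not covered here.

```python
def boinc_cpu_percent(logical_cores: int, cores_to_free: int = 1) -> float:
    """Value for "use at most __% of the processors" that leaves
    `cores_to_free` logical cores idle (hypothetical helper)."""
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    if not 0 <= cores_to_free < logical_cores:
        raise ValueError("cores_to_free must be between 0 and logical_cores - 1")
    return 100.0 * (logical_cores - cores_to_free) / logical_cores


# The two cases from the post: 4 cores without HT, and 8 virtual cores with HT.
for cores in (4, 8):
    print(f"{cores} cores -> {boinc_cpu_percent(cores):g}%")
```

For a Q9550-style quad core without HT this gives 75%, and for 8 virtual cores it gives 87.5%, matching the values suggested above.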
|
|
|
|
Hi Dagorath, Kev,
Now it is working fine. Thanks!
No more errors. I activated swan_sync again, freed up a CPU core and restarted Windows.
I still have the issue of 48% GPU load for ACEMD2: GPU molecular dynamics v6.15 tasks.
I found out that for very long tasks (4000000 steps) the GPU load is 70%,
and that the GPU load drops with the number of steps (45-48% for 500000 steps). So the time per step increases for smaller WUs.
I checked other users' tasks and saw the same behaviour (the time per step increases for smaller WUs).
So that brings me to the CPU.
It seems that the Intel(R) Core(TM)2 Quad CPU Q9550 is not fast enough for the GTX 580. I can see in the Task Manager that the task acemd2_6.15_windows_intelx86__cuda31 always uses 25% (4 cores without HT).
Which is better for GPUGrid: that I only accept long tasks (application: ACEMD for long runs), so that the GPU is used at 70%, or that I accept all applications?
Best regards,
Alejandro
|
|
|
bigtuna (Volunteer moderator)
Joined: 6 May 10 Posts: 80 Credit: 98,784,188 RAC: 0
|
If you want more points, you should only accept long-running work units, as they give more points per time invested. If you just want to help GPUGrid and don't care about points, I should think that accepting all tasks would be more helpful.
Also you might want to look into running Linux for better GPUGrid performance. W7 does not produce the best results. Windows XP and Linux both currently outperform W7 on GPUGrid tasks AFAIK.
You can try Linux for free without even installing it to your hard drive. All it would cost is a single blank CD or DVD and some time.
FatDog-64 is a super small, super cool Linux distro that can be downloaded in just a few minutes. It will run straight from CD/DVD or from a USB stick, or you can install it to your hard drive but this is not required.
Instructions are here:
http://www.gpugrid.net/forum_thread.php?id=2203#17646
|
|
|
Hi Bigtuna,
I am not interested in collecting points. I have activated all the applications.
My plan is to move to Linux next year. Right now I am short of time, and moving to Linux will take me 2 days. I have 2 PCs: this one with Windows 7, and a Linux box, but I cannot put a better power supply in the Linux box and install the graphics card there. I made the mistake of buying it from DELL.
So I have to install Linux here and Windows 7 on the DELL.
Best regards,
Alejandro
|
|
bigtuna (Volunteer moderator)
Joined: 6 May 10 Posts: 80 Credit: 98,784,188 RAC: 0
|
I understand that you are busy and don't have the time to spare right now. When you are ready let me know and I'll be happy to help with Linux as much as I can.
If you get time, check out FatDog-64. FatDog-64 is so small it will run great from a USB stick without the need to install or write anything to your hard drive at all. You can keep Windows 7 on your hard drive and still run Linux by choosing to boot from USB when you want Linux and by booting from your hard drive when you want Windows.
|
|
nenym
Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0
|
What was wrong with that task?
10-KASHIF_HIVPR_cl_ba1-0-100-RND4144_0
Workunit 3028713
Created 10 Jan 2012 | 14:11:21 UTC
Sent 10 Jan 2012 | 16:32:37 UTC
Received 10 Jan 2012 | 20:48:01 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 100808
Report deadline 15 Jan 2012 | 16:32:37 UTC
Run time 11,822.64
CPU time 11,822.64
Validate state Valid
Credit 0.00
Application version ACEMD2: GPU molecular dynamics v6.14 (cuda31)
Stderr output
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 2
# There are 7 devices supporting CUDA
.
# Device 2: "GeForce GTX 580"
# Clock rate: 1.56 GHz
# Total amount of global memory: 1610285056 bytes
# Number of multiprocessors: 16
# Number of cores: 128
.
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
# Time per step (avg over 1250000 steps): 9.484 ms
# Approximate elapsed time for entire WU: 11855.196 s
07:47:38 (24769): called boinc_finish
</stderr_txt>
]]>
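As a side note, the last two figures in that log are consistent with each other: the average time per step multiplied by the number of steps gives roughly the reported elapsed time, which is also close to the listed run time of 11,822.64 s. A minimal check in Python, using only the numbers from the stderr output above (the variable names are just for illustration):

```python
# Values copied from the stderr output above.
steps = 1_250_000          # "avg over 1250000 steps"
ms_per_step = 9.484        # "Time per step ... 9.484 ms"
reported_s = 11855.196     # "Approximate elapsed time for entire WU"

# Steps times per-step time, converted from milliseconds to seconds.
estimated_s = steps * ms_per_step / 1000.0

print(f"estimated: {estimated_s:.1f} s, reported: {reported_s:.1f} s")
```

The small remaining difference is presumably down to the per-step time being printed with only three decimals.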
|
|
|
Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
|
Hi Nenym, thanks. Bug fixed.
T
|
|
nenym
Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0
|
Windows Beta runDIG10-TONI_BADO9GG-4-5-RND0865_5 task errored out on startup. Debug info can be seen via the link.
Win XP, GTX560Ti, factory OC lowered to the level at which NATHAN_CB tasks run successfully (900 -> 890).
|
|