
Message boards : Number crunching : Errored WU

Gandalf_the_Grey
Joined: 4 Apr 13
Posts: 27
Credit: 32,882,125
RAC: 0
Level
Val
Message 30932 - Posted: 23 Jun 2013 | 7:43:58 UTC

If changing/modifying a GPU causes a WU to error, is it possible to have the specific WU resent for re-crunching?

Thanks


Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 30935 - Posted: 23 Jun 2013 | 8:07:19 UTC - in response to Message 30932.

In a word, No.

A WU is never sent to the same host twice. Just make sure you exit BOINC before changing or modifying a graphics card.
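The "never sent to the same host twice" rule can be pictured with a toy model. This is a minimal sketch, not BOINC's actual scheduler code: the `WorkUnit` class and its `pick_host` method are hypothetical names, and the model assumes only that the server remembers which hosts already received a copy of a WU and skips them when sending a replacement.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    """Toy model of a work unit (hypothetical, not BOINC's real schema)."""
    wu_id: int
    # Host IDs that have already been sent a copy of this WU.
    hosts_sent: set = field(default_factory=set)

    def pick_host(self, candidates):
        """Return the first requesting host that never had this WU, or None."""
        for host in candidates:
            if host not in self.hosts_sent:
                self.hosts_sent.add(host)
                return host
        return None


wu = WorkUnit(wu_id=1)
first = wu.pick_host(["host_A"])             # original copy goes to host_A
# host_A errors out; the replacement copy must go to a different host.
resend = wu.pick_host(["host_A", "host_B"])
print(first, resend)
```

In this model the errored host simply drops out of the candidate list for that WU, which is why re-crunching the same WU on the same machine is not possible.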


____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Gandalf_the_Grey
Joined: 4 Apr 13
Posts: 27
Credit: 32,882,125
RAC: 0
Level
Val
Message 30938 - Posted: 23 Jun 2013 | 9:52:24 UTC - in response to Message 30935.

Since the computer was shut down and powered off, BOINC had been exited, yet your WUs still crashed. Fortunately, I don't change my GPU configuration every day. I'll be running GPUGRID WUs until WCG comes up with new GPU projects. At the rate WCG is moving, that probably won't be for another six months.


John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31054 - Posted: 27 Jun 2013 | 12:15:48 UTC

Hi, GPUGrid Folks:

I have recently brought online a new PC with an AMD FX-8350 8-core CPU and two ASUS GTX 650 Ti GPUs on board. I have successfully processed three tasks, two long and one short. However, two other short-run tasks have failed, as shown below. Is anyone else having similar problems? It's very frustrating to have failures so close to the end of the run time.

Thank you,

John



Task | Work unit | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
6990596 | 4547226 | 26 Jun 2013 9:55:03 UTC | 26 Jun 2013 15:32:02 UTC | Completed and validated | 19,860.15 | 19,429.04 | 20,550.00 | Short runs (2-3 hours on fastest card) v6.52 (cuda42)
6987452 | 4544848 | 25 Jun 2013 0:38:24 UTC | 25 Jun 2013 23:21:30 UTC | Completed and validated | 66,621.91 | 65,430.47 | 133,950.00 | Long runs (8-12 hours on fastest card) v6.18 (cuda42)
6987227 | 4544701 | 25 Jun 2013 0:38:24 UTC | 25 Jun 2013 19:03:43 UTC | Completed and validated | 65,295.42 | 64,314.58 | 133,950.00 | Long runs (8-12 hours on fastest card) v6.18 (cuda42)

6991878 | 4548183 | 27 Jun 2013 1:41:49 UTC | 27 Jun 2013 8:47:04 UTC | Error while computing | 25,124.96 | 24,758.14 | --- | Short runs (2-3 hours on fastest card) v6.52 (cuda42)
6990599 | 4547228 | 26 Jun 2013 9:55:03 UTC | 26 Jun 2013 14:48:37 UTC | Error while computing | 17,187.14 | 16,752.82 | --- | Short runs (2-3 hours on fastest card) v6.52 (cuda42)

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 31055 - Posted: 27 Jun 2013 | 13:15:15 UTC

Is your CPU heavily loaded?
____________
DSKAG Austria Research Team: http://www.research.dskag.at



GoodFodder
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Message 31056 - Posted: 27 Jun 2013 | 13:26:11 UTC

Hi John,

I see you are having quite a few failures; here is my tuppence:

The first glaringly obvious thing would be to upgrade your drivers to the latest WHQL release directly from NVIDIA; 320.18 were awful. Note: my personal experience is NOT to uninstall and use driver cleaners etc., as many on the web recommend, but to let NVIDIA do their job and just upgrade the drivers directly via their installer.

Secondly, I would ask: are you overclocking? What are your temps? What are the specs and age of your PSU, and what other apps/tasks are running?

If you have not already done so, with BOINC disabled I would suggest you test your system's stability with MemTestG80 (http://folding.stanford.edu/English/DownloadUtils) for at least 12 hrs and OCCT (http://www.ocbase.com/index.php/download) for about 10 mins (OCCT is particularly punishing, so set the max temp to a safe 85C).

I have noticed GPUGrid to be particularly CPU sensitive, not so much in performance as in CPU availability to feed the GPU, so I would recommend you leave one core free for the OS.
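One way to leave a core free is BOINC's computing preference that caps the percentage of processors BOINC may use. A minimal sketch of the arithmetic, assuming you want exactly one core left idle; `percent_to_leave_one_core_free` is a hypothetical helper name:

```python
import os


def percent_to_leave_one_core_free(n_cores: int) -> float:
    """Percentage of processors BOINC may use so that one core stays idle."""
    if n_cores < 2:
        raise ValueError("need at least two cores to leave one free")
    return 100.0 * (n_cores - 1) / n_cores


# The FX-8350 in this thread has 8 cores: 7 of 8 cores is 87.5%.
print(percent_to_leave_one_core_free(8))

# Or use the core count reported by this machine (os.cpu_count may return None).
cores = os.cpu_count()
if cores and cores >= 2:
    print(percent_to_leave_one_core_free(cores))
```

If the preference field only accepts whole numbers, rounding up (88% rather than 87%) is the safer choice, assuming the client truncates a fractional core count (87% of 8 is 6.96, which would leave two cores idle).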

As you are running Windows 7, I would disable CPU parking and would also recommend you prioritise the acemd2865P.exe process using something like eFMer Priority (http://www.efmer.eu/boinc/download.html) or, as you have eight cores, Process Lasso (http://bitsum.com/processlasso/).

If the above are successful, as a test I would pause any other apps/tasks and just run GPUGrid tasks to get an idea of throughput, then enable any other BOINC tasks one by one.

Hope this helps, have fun



Gandalf_the_Grey
Joined: 4 Apr 13
Posts: 27
Credit: 32,882,125
RAC: 0
Level
Val
Message 31057 - Posted: 27 Jun 2013 | 13:32:37 UTC - in response to Message 31055.

Are you using the app_config.xml file and if so, what is its contents?
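For readers who have not seen one, app_config.xml is a small XML file placed in a project's folder under the BOINC data directory to override per-application settings. A minimal sketch that builds one with Python's standard library, assuming the GPUGRID short-run app is named `acemdshort` (an assumption; check the project's applications page) and that you want one task per GPU with a full CPU core budgeted for each. `build_app_config` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Build a minimal app_config.xml body for one GPU application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Fraction of a GPU each task claims (1.0 = one task per GPU).
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # CPU share BOINC budgets per GPU task (1.0 = one full core).
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# "acemdshort" is an assumed app name, used here only for illustration.
print(build_app_config("acemdshort", 1.0, 1.0))
```

With `cpu_usage` at 1.0, BOINC should schedule one fewer CPU task per running GPU task, which is one way to keep a core available to feed each GPU; the file is typically picked up after restarting BOINC or re-reading the config files from BOINC Manager.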


John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31058 - Posted: 27 Jun 2013 | 14:10:14 UTC - in response to Message 31057.

Hi, Gandalf:

I am not using an app_config.xml file, to try and keep things understandable.

Thanks,

John

John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31059 - Posted: 27 Jun 2013 | 14:17:30 UTC - in response to Message 31056.

Hi, GoodFodder:

Thanks for the lengthy response.

1. I am using the 320.18 drivers from NVIDIA. I will try to update these without causing a crash. I am very leery about touching drivers!

2. My system runs at factory clock speeds - in the past I have experienced extreme instability with overclocking.

3. I will run GPUGrid tasks only as a test: the sharing of the GPUs with Folding at home Alzheimer's tasks may be an issue on the short runs. I did not share on the long runs.

It may take a while, but I will get around to using your suggestions.

Thanks, again,

John


John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31060 - Posted: 27 Jun 2013 | 14:19:12 UTC - in response to Message 31055.
Last modified: 27 Jun 2013 | 14:19:33 UTC

Hi dskagcommunity:

Thanks for taking the time to respond. Please see my responses below.

John

Gandalf_the_Grey
Joined: 4 Apr 13
Posts: 27
Credit: 32,882,125
RAC: 0
Level
Val
Message 31061 - Posted: 27 Jun 2013 | 14:23:46 UTC - in response to Message 31059.

Can you post an image of your Device Manager?


John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31063 - Posted: 27 Jun 2013 | 15:05:02 UTC - in response to Message 31061.

Which part? The entire thing or a section?

Gandalf_the_Grey
Send message
Joined: 4 Apr 13
Posts: 27
Credit: 32,882,125
RAC: 0
Level
Val
Message 31064 - Posted: 27 Jun 2013 | 15:10:04 UTC - in response to Message 31063.

It should look like this... But if a category is open, leave it open.

flashawk
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Message 31067 - Posted: 27 Jun 2013 | 17:13:49 UTC

I'm actually running the same CPU that you are, on a 990FX chipset, and had issues in the beginning. The main one was that the BIOS on some of those boards sets the memory voltage too low (1.46 V); I upped mine to 1.575 V and things smoothed out, and the memory can take 1.6 V, no problem. Make sure all power-saving settings are turned off along with Turbo Core, all set to disabled. Turn on HPC (High Performance Computing) mode. Those CPUs can run really hot! Use a program called AMD OverDrive to get accurate readings from the chipset, such as temperatures, voltages and memory speed.

http://sites.amd.com/us/game/downloads/amd-overdrive/Pages/overview.aspx

GoodFodder
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Message 31068 - Posted: 27 Jun 2013 | 17:23:00 UTC

John:

re your comment " the sharing of the GPUs with Folding at home Alzheimer's tasks may be an issue on the short runs. I did not share on the long runs."

Running F@H and GPUGrid on the same machine is asking for trouble, as they will be heavily competing for resources. You need to choose one or the other, and there's no need to guess which one people on this forum would recommend.

Gandalf_the_Grey
Joined: 4 Apr 13
Posts: 27
Credit: 32,882,125
RAC: 0
Level
Val
Message 31069 - Posted: 27 Jun 2013 | 17:47:41 UTC - in response to Message 31068.
Last modified: 27 Jun 2013 | 17:52:29 UTC

I have two Intel i7 computers, both running GPUGRID and Collatz without any problems: http://boinc.thesonntags.com/collatz/index.php
One computer has two NVIDIA GPUs and the other has three NVIDIA GPUs.

Edit: I'm running both the long and short CUDA42 WUs.

GoodFodder
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Message 31072 - Posted: 27 Jun 2013 | 19:27:34 UTC


Note: GPUGrid and Collatz are both managed by BOINC; F@H is not.


Gandalf_the_Grey
Send message
Joined: 4 Apr 13
Posts: 27
Credit: 32,882,125
RAC: 0
Level
Val
Message 31074 - Posted: 27 Jun 2013 | 20:27:18 UTC - in response to Message 31072.

Thank you. That explains a lot.

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31075 - Posted: 27 Jun 2013 | 20:40:33 UTC

Thanks to all:

My system builder has diagnosed a Win7 problem which he will fix ASAP. In the meantime, I will continue processing F@H Alzheimer's tasks.

John

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 31109 - Posted: 28 Jun 2013 | 19:11:59 UTC - in response to Message 31075.

In the meantime, I will continue processing F@H Alzheimer's tasks.

Use coconut oil and take curcumin.
