Message boards : Number crunching : WU failing after ~5 sec

Terrible T
Joined: 5 Jan 17
Posts: 4
Credit: 56,950,647
RAC: 0
Message 54517 - Posted: 1 May 2020 | 16:46:13 UTC

Had about a dozen tasks crashing.
Does anyone have an idea what the error codes mean?

<core_client_version>7.16.5</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
18:29:15 (2764): wrapper (7.9.26016): starting
18:29:15 (2764): wrapper: running acemd3.exe (--boinc input --device 0)
18:29:17 (2764): acemd3.exe exited; CPU time 0.000000
18:29:17 (2764): app exit status: 0xc0000005
18:29:17 (2764): called boinc_finish(195)

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Message 54520 - Posted: 1 May 2020 | 20:01:58 UTC - in response to Message 54517.

https://boinc.mundayweb.com/wiki/index.php?title=Error_code_-191_to_-200_explained

ERR_NO_APP_VERSION -195 - BOINC couldn't find the application's version number.

Are the acemd3 and wrapper executables in the GPUGrid directory?
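
The log in the first post carries two different codes, and the sign matters when looking them up. A minimal sketch follows, assuming the lookup values below: the wrapper's positive exit code 195 (0xc3) is, to my understanding, EXIT_CHILD_FAILED in BOINC's error_numbers.h, which is distinct from the negative error number -195 (ERR_NO_APP_VERSION), and 0xc0000005 is the Windows NTSTATUS value for an access violation. The tables and the function name describe_codes are illustrative only; verify the values against your own client version.

```python
import re

# Assumed lookup tables (illustrative, deliberately incomplete).
# Positive values are exit codes; negative values are BOINC error numbers.
BOINC_EXIT_CODES = {195: "EXIT_CHILD_FAILED"}
BOINC_ERROR_CODES = {-195: "ERR_NO_APP_VERSION"}
NTSTATUS_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def describe_codes(stderr_text):
    """Return (label, value, meaning) tuples for codes found in wrapper stderr."""
    findings = []

    # Exit status of the science app itself (acemd3.exe), printed in hex.
    match = re.search(r"app exit status: (0x[0-9a-fA-F]+)", stderr_text)
    if match:
        status = int(match.group(1), 16)
        findings.append(
            ("app exit status", status, NTSTATUS_CODES.get(status, "unknown"))
        )

    # Code the wrapper handed to boinc_finish(); the sign selects the table.
    match = re.search(r"called boinc_finish\((-?\d+)\)", stderr_text)
    if match:
        code = int(match.group(1))
        table = BOINC_EXIT_CODES if code >= 0 else BOINC_ERROR_CODES
        findings.append(("boinc_finish", code, table.get(code, "unknown")))

    return findings


# Lines copied from the stderr output quoted in the first post.
LOG = """\
18:29:17 (2764): acemd3.exe exited; CPU time 0.000000
18:29:17 (2764): app exit status: 0xc0000005
18:29:17 (2764): called boinc_finish(195)
"""

for label, value, meaning in describe_codes(LOG):
    print(f"{label}: {value} ({value:#x}) -> {meaning}")
```

Read this way, the 195 only reports that the child process (acemd3.exe) failed; the more specific clue is the child's own exit status.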

Terrible T
Message 54521 - Posted: 1 May 2020 | 21:31:06 UTC - in response to Message 54520.
Last modified: 1 May 2020 | 21:33:41 UTC

That's a good wiki, thanks!

Yes, both acemd3 and wrapper are in the project directory.
I only noticed that several files, including acemd3, have a hash or something similar appended after the file extension; it's the first time I've seen that.

acemd3.exe.13c31bf624c1a4246bef60d9b5ab664d

(A Google search term with 0 results)

I have removed and rejoined the project; still the same (60% erroring out).
Maybe Rosetta and GPUGrid together cause I/O conflicts.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Message 54523 - Posted: 1 May 2020 | 21:57:21 UTC - in response to Message 54521.

are the files set with execute permissions?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Message 54524 - Posted: 1 May 2020 | 22:58:01 UTC - in response to Message 54521.
Last modified: 1 May 2020 | 23:03:20 UTC

> Yes, both acemd3 and wrapper are in the project directory.
> I only noticed that several files, including acemd3, have a hash or something similar appended after the file extension; it's the first time I've seen that.
>
> acemd3.exe.13c31bf624c1a4246bef60d9b5ab664d

You're right, it's the md5 hash of the executable.
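
If the suffix really is the MD5 digest of the file's contents, as stated above, a damaged or modified executable can be spotted by recomputing the digest and comparing it with the file name. A minimal sketch using only Python's standard library; the function names md5_of_file and suffix_matches_md5 are hypothetical helpers, not part of BOINC.

```python
import hashlib
import sys
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def suffix_matches_md5(path):
    """True if the text after the last dot equals the file's MD5 digest.

    Assumption: the file is named like acemd3.exe.<32 hex characters>.
    """
    suffix = Path(path).suffix.lstrip(".").lower()
    return suffix == md5_of_file(path)


if __name__ == "__main__":
    # Pass one or more file paths from the project directory as arguments.
    for name in sys.argv[1:]:
        verdict = "matches" if suffix_matches_md5(name) else "does NOT match"
        print(f"{name}: content {verdict} the hash in the file name")
```

A mismatch would point to a corrupted download or to something altering the file on disk; a match rules that out without explaining the crash.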

> (A Google search term with 0 results)

Now you've made one. I've made another.

> I have removed and rejoined the project; still the same (60% erroring out).
> Maybe Rosetta and GPUGrid together cause I/O conflicts.

I don't think so, unless you run too many Rosetta tasks simultaneously.
Try to limit the number of usable CPUs in BOINC manager to 50% (options -> computing preferences -> use at most 50% of the processors).
Is your i7-6800K overclocked? If it is, try to reduce its BCLK core ratio and BCLK uncore ratio.
You can try to exclude the C:\ProgramData\BOINC folder in your antivirus software.

Retvari Zoltan
Message 54525 - Posted: 1 May 2020 | 23:00:11 UTC - in response to Message 54523.

> are the files set with execute permissions?

This is Windows, there is no such file attribute.

Keith Myers
Message 54526 - Posted: 1 May 2020 | 23:16:45 UTC - in response to Message 54525.

>> are the files set with execute permissions?
> This is Windows, there is no such file attribute.

I also suspect an overly aggressive AV program.
Whitelist the BOINC directory.

Terrible T
Message 54527 - Posted: 1 May 2020 | 23:30:51 UTC - in response to Message 54524.

> Is your i7-6800K overclocked? If it is, try to reduce its BCLK core ratio and BCLK uncore ratio.
> You can try to exclude the C:\ProgramData\BOINC folder in your antivirus software.

The BOINC directory has already been excluded for a long time.
Only 6 cores for Rosetta now (app_config).
The PC is not overclocked; it just finished one month of Rosetta @ 12 cores without hiccups.
Maybe the SSD is due for some garbage collection; I will check later.

Thanks for thinking along.

Ian&Steve C.
Message 54528 - Posted: 1 May 2020 | 23:39:42 UTC - in response to Message 54525.

>> are the files set with execute permissions?
> This is Windows, there is no such file attribute.

while it may not be the issue at hand, Windows definitely does have execution permissions.

Retvari Zoltan
Message 54529 - Posted: 2 May 2020 | 1:08:25 UTC - in response to Message 54528.

>>> are the files set with execute permissions?
>> This is Windows, there is no such file attribute.
> while it may not be the issue at hand, Windows definitely does have execution permissions.

Sorry, I misunderstood you before.
Those permissions are set by the BOINC installer.
If they were set incorrectly (later, by the user), none of the tasks could run.

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Message 54530 - Posted: 2 May 2020 | 11:22:17 UTC - in response to Message 54517.
Last modified: 2 May 2020 | 11:23:30 UTC

> Had about a dozen tasks crashing.

I see that your system #403376 has a mix of WUs that succeed and WUs that fail after a few seconds.
From a hardware point of view, this might be consistent with:

1) A failing memory chip on the graphics card. Depending on which memory chip(s) the initial data for a given WU are assigned to, the WU will be affected if its data land on the failing chip.
I'm afraid that if this is the problem, there is no solution other than replacing the graphics card (or, less drastically, trying first to underclock the memory chips).

2) One of the problems described in this post, related to poor electrical contacts.

Symptom: A GPU that normally works fine starts to fail tasks intermittently for no apparent reason.
Remedy: Extract the GPU and memory DIMMs, clean the contacts and reinsert.

Terrible T
Message 54532 - Posted: 2 May 2020 | 14:56:31 UTC - in response to Message 54530.

ServicEnginIC, you came close! ;-)

After opening the case I found 'a bit more' dust bunnies inside than expected
(I cleaned out the PC in late March).

If I had also checked the MB temps, instead of only the GPU temp, perhaps I would have noticed.
Now running happily without crashes (fingers crossed).

Thanks again for all the input.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 755,370,933
RAC: 212,472
Message 54553 - Posted: 3 May 2020 | 13:12:57 UTC

Is there a program for checking MB temperatures that will run under Windows 10 on most brands of computers?

Lazydude
Joined: 25 Sep 08
Posts: 12
Credit: 161,238,437
RAC: 0
Message 54554 - Posted: 3 May 2020 | 13:23:40 UTC - in response to Message 54553.

> Is there a program for checking MB temperatures that will run under Windows 10 on most brands of computers?

I'm using https://www.hwinfo.com/
in sensor mode: if your MB has a sensor, it shows the value.
It also shows values from the GPU.

robertmiles
Message 54557 - Posted: 3 May 2020 | 14:15:04 UTC - in response to Message 54554.

>> Is there a program for checking MB temperatures that will run under Windows 10 on most brands of computers?
>
> I'm using https://www.hwinfo.com/
> in sensor mode: if your MB has a sensor, it shows the value.
> It also shows values from the GPU.

Thank you - I just installed it.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Message 54563 - Posted: 3 May 2020 | 21:52:37 UTC - in response to Message 54521.

> Maybe Rosetta and GPUGrid together cause I/O conflicts.

I've never seen a problem running GG & Rosetta together. Rosetta is a memory hog but the worst I've seen is some WUs may be "Suspended waiting for memory."

Aurum
Message 54564 - Posted: 3 May 2020 | 21:59:31 UTC - in response to Message 54524.

> Try to limit the number of usable CPUs in BOINC manager to 50% (options -> computing preferences -> use at most 50% of the processors).

Zoltan, you lost me on this one. Why 50%? I would think 92% (11/12).
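
For reference, the two percentages translate into CPU counts as follows. This is only an arithmetic sketch: it assumes 12 logical CPUs (as mentioned in this thread) and assumes the client rounds a fractional result down with a minimum of one CPU; usable_cpus is a hypothetical helper, not a BOINC function.

```python
import math


def usable_cpus(logical_cpus, percent):
    """CPUs available under a 'use at most N% of the processors' setting.

    Assumption: fractional results are rounded down, never below 1.
    """
    return max(1, math.floor(logical_cpus * percent / 100))


# 12 logical CPUs assumed, per the "12 cores" and "11/12" figures above.
for percent in (50, 92, 100):
    print(f"{percent}% of 12 CPUs -> {usable_cpus(12, percent)} usable")
```

Under these assumptions, 50% leaves six of the twelve CPUs idle, while 92% frees a single CPU, presumably to feed the GPU task.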

robertmiles
Message 54565 - Posted: 3 May 2020 | 22:23:20 UTC - in response to Message 54564.

>> Try to limit the number of usable CPUs in BOINC manager to 50% (options -> computing preferences -> use at most 50% of the processors).
>
> Zoltan, you lost me on this one. Why 50%? I would think 92% (11/12).

Something to try to help diagnose what resource is running short. Not necessarily permanent after the diagnosis is finished.

Retvari Zoltan
Message 54585 - Posted: 5 May 2020 | 14:44:21 UTC - in response to Message 54564.

>> Try to limit the number of usable CPUs in BOINC manager to 50% (options -> computing preferences -> use at most 50% of the processors).
>
> Zoltan, you lost me on this one. Why 50%? I would think 92% (11/12).

My answer would hijack this thread, so I've made a new thread for it.
