Author |
Message |
Erich56
Joined: 1 Jan 15 Posts: 1132 Credit: 10,205,482,676 RAC: 29,855,510
|
Any idea why this task failed with "computation error" about 1 hour after start:
https://www.gpugrid.net/result.php?resultid=32654746 |
|
|
|
Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
____________
|
|
|
Erich56
|
Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
That is definitely wrong. At least if "client" refers to me personally, I certainly did NOT abort the WU.
Further, both the result and the client status say "Berechnungsfehler", i.e. "computation error".
|
|
|
|
Don't get too hung up on the verbiage used by BOINC.
ANY kind of error, be it pre-computation issues, during-computation issues, manual aborts, automatic aborts, or even things like upload errors (after computation has completed), will be classified as "Computation Error". This is the same for all projects. It's just the generic wording BOINC uses when there's an error it can't resolve; more detailed info is usually in the logs or stderr output.
Since it failed with "aborted by client", I can only assume some kind of issue between BOINC and the app, and that the BOINC client itself killed the task.
(unknown error) - exit code 194 (0xc2)
Since all you have is "unknown error", I don't think there's much to run down here.
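For reference, a minimal lookup of the exit codes that come up in this thread might look like the sketch below. The numeric values are as I recall them from BOINC's `error_numbers.h`, so treat them as an assumption and check them against your client's source. `boinc_exit_name` is a hypothetical helper, not part of BOINC.

```sh
# Hypothetical helper: map a BOINC exit status to its symbolic name.
# Only a few codes are listed; values assumed from BOINC's error_numbers.h.
boinc_exit_name() {
    # Arithmetic expansion accepts both decimal (194) and hex (0xc2) input.
    code=$(( $1 ))
    case "$code" in
        194) echo "EXIT_ABORTED_BY_CLIENT" ;;  # the client itself killed the task
        195) echo "EXIT_CHILD_FAILED" ;;       # the wrapper's child app failed
        203) echo "EXIT_ABORTED_VIA_GUI" ;;    # the user aborted it by hand
        *)   echo "unlisted ($code)" ;;
    esac
}

boinc_exit_name 0xc2   # prints EXIT_ABORTED_BY_CLIENT
```

If those values are right, a manual abort would normally show up as 203 rather than 194, which fits the report that the WU was not aborted by hand.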
____________
|
|
|
Erich56
|
...
Since it failed with "aborted by client", I can only assume some kind of issue between BOINC and the app, and that the BOINC client itself killed the task.
(unknown error) - exit code 194 (0xc2)
Since all you have is "unknown error", I don't think there's much to run down here.
In a way, I was lucky that this happened after about 1 hour and not, say, after 15 hours or so :-)
|
|
|
Erich56
|
Now a task has failed after about 16 hours, a few minutes before finishing:
https://www.gpugrid.net/result.php?resultid=32658707
very annoying, of course.
Can anyone tell me what was going wrong with this task? |
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1340 Credit: 7,653,273,724 RAC: 13,371,735
|
Detected memory leaks!
Error invoking kernel: CUDA_ERROR_UNKNOWN (999)
Probably an error in the VRAM on the card. Try reducing the card temp by moving the fan speed up or reducing any overclocking. |
|
|
|
Detected memory leaks!
All Windows users have that report from the app, on perfectly good tasks. I wouldn't worry about that.
Error invoking kernel: CUDA_ERROR_UNKNOWN (999)
Isn't that what happens after a reboot, particularly after the NVidia driver has been updated by Microsoft / Windows 10?
Probably an error in the VRAM on the card. Try reducing the card temp by moving the fan speed up or reducing any overclocking.
I think we'd need more evidence before making a leap of interpretation like that.
Is the video card in question driving a monitor? If so, are there any problems with the visible display? Colour blocks, bad pixels, that sort of thing? Have you changed any operating parameters - overclocked? undervolted?
|
|
|
mmonnin
Joined: 2 Jul 16 Posts: 337 Credit: 7,617,757,013 RAC: 10,860,147
|
Detected memory leaks!
All Windows users have that report from the app, on perfectly good tasks. I wouldn't worry about that.
Error invoking kernel: CUDA_ERROR_UNKNOWN (999)
Isn't that what happens after a reboot, particularly after the NVidia driver has been updated by Microsoft / Windows 10?
Probably an error in the VRAM on the card. Try reducing the card temp by moving the fan speed up or reducing any overclocking.
I think we'd need more evidence before making a leap of interpretation like that.
Is the video card in question driving a monitor? If so, are there any problems with the visible display? Colour blocks, bad pixels, that sort of thing? Have you changed any operating parameters - overclocked? undervolted?
Windows updates don't include OpenCL, so that is not an issue here at GPUGrid. |
|
|
jiipee
Joined: 4 Jun 15 Posts: 19 Credit: 8,485,329,251 RAC: 11,127,081
|
Is acemd3 for Windows broken? All tasks seem to be failing:
Stderr output
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
03:57:10 (9116): wrapper (7.9.26016): starting
03:57:10 (9116): wrapper: running bin/acemd3.exe (--boinc --device 0)
03:57:12 (9116): bin/acemd3.exe exited; CPU time 0.000000
03:57:12 (9116): app exit status: 0xc0000135
03:57:12 (9116): called boinc_finish(195)
0 bytes in 0 Free Blocks.
456 bytes in 4 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 120166 bytes.
Dumping objects ->
...
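One hint in the log above: the wrapper line `app exit status: 0xc0000135` carries a Windows NTSTATUS value, and 0xC0000135 is STATUS_DLL_NOT_FOUND, i.e. acemd3.exe could not load a DLL it depends on. Below is a small sketch for pulling that status out of a saved copy of the stderr. The file name `stderr.txt` and both helper names are made up for illustration, and the lookup covers only the two statuses that appear in this thread.

```sh
# Hypothetical helper: print the "app exit status" value from a saved stderr.
app_exit_status() {
    # sed prints only the hex value that follows "app exit status: ".
    sed -n 's/.*app exit status: \(0x[0-9a-fA-F]*\).*/\1/p' "$1"
}

# Describe the statuses seen in this thread; anything else is left unnamed.
describe_status() {
    case "$1" in
        0xc0000135) echo "STATUS_DLL_NOT_FOUND (a required DLL is missing)" ;;
        0x1)        echo "generic failure (exit code 1)" ;;
        *)          echo "not in this sketch" ;;
    esac
}

# Example: describe_status "$(app_exit_status stderr.txt)"
```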
|
|
|
Erich56
|
Is acemd3 for Windows broken? All tasks seem to be failing:
How come you are receiving tasks at all? There have not been any new ones available for several days. |
|
|
Keith Myers
|
Check your preferences. I have been getting work every day, as have others. |
|
|
zooxit
Joined: 4 Jul 21 Posts: 23 Credit: 10,166,191,567 RAC: 58,460,243
|
Hi,
So what is the answer to this post's title question (well, actually it is a statement :) )?
Why are python apps failing after 2-4 seconds?
Should I install something on my machine (running Debian 11)? |
|
|
Keith Myers
|
The tasks are beta and the scientists are still debugging the configuration parameters. Errors are to be expected.
If you have a task error, look up the task in your Tasks list and see whether it has been sent to many other hosts that also errored out. If so, everything is normal.
However, if a wingman has completed the task successfully, look at the stderr.txt output of your task in the list, read to the end, and see what kind of error was generated. If the error is local, you might be able to do something about it by restarting the host or updating the video drivers.
You can't, and don't need to, do anything else such as downloading libraries, because each task is bundled with exactly the resources it needs to complete successfully. At least in theory. Again, these are beta tasks and are still being debugged. |
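A rough sketch of the "read to the end" step, assuming the task's stderr has been saved to a local text file. `stderr_key_lines` and the file name `stderr.txt` are hypothetical, and the pattern list is only a guess at the useful lines, based on the logs quoted in this thread.

```sh
# Hypothetical helper: show the lines of a saved stderr that usually matter.
# The patterns come from the logs quoted in this thread, nothing more.
stderr_key_lines() {
    grep -E 'ERROR|app exit status|called boinc_finish|exit code|exited with code' "$1"
}

# Example: stderr_key_lines stderr.txt | tail -n 5
```

Comparing that short summary with a wingman's successful result is usually enough to tell a local problem from a broken batch.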
|
|
zooxit
|
Thanks @Keith Myers for the tips.
I checked the last 20 tasks I got and all of them failed (each after 3 seconds). They were all 'solved' by another host shortly thereafter, so...
Everything is updated on my host. It is Debian bullseye, though, and the computers that finished the tasks mostly seemed to be running Ubuntu 20.04 LTS. But that is probably not the cause.
My STDERR says:
INTERNAL ERROR: cannot create temporary directory!
Might that be a permissions problem?
----------------------------------------------
The full STDERR:
<core_client_version>7.16.16</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:26:46 (1648902): wrapper (7.7.26016): starting
14:26:46 (1648902): wrapper (7.7.26016): starting
14:26:46 (1648902): wrapper: running /usr/bin/flock (/var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock -c "/bin/bash ./miniconda-installer.sh -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda &&
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda install -m -y -p gpugridpy --file requirements.txt ")
[1648927] INTERNAL ERROR: cannot create temporary directory!
[1648931] INTERNAL ERROR: cannot create temporary directory!
14:26:47 (1648902): /usr/bin/flock exited; CPU time 0.139614
14:26:47 (1648902): app exit status: 0x1
14:26:47 (1648902): called boinc_finish(195) |
|
|
|
My STDERR says:
INTERNAL ERROR: cannot create temporary directory!
Might that be a permissions problem?
The same problem was discussed in Message #55986.
A workaround was detailed there; you may want to try it. |
|
|
zooxit
|
Thanks!
So, I tried:
sudo systemctl edit boinc-client.service
and added:
[Service]
PrivateTmp=true
then rebooted
Waiting for tasks now to see if it works... |
|
|
|
All right.
If that was it, you're done.
Now it's time to patiently wait for new Python WUs... |
|
|
Keith Myers
|
The bug where all tasks always ran on device #0 was fixed this morning.
Should be smooth sailing from now on for python tasks. |
|
|
|
If tasks are still failing, please double-check that your boinc-client.service is similar to this:
After adding the stated lines, you need to save the changes with Ctrl + O, confirm with Enter, exit with Ctrl + X, and then reboot.
(Excuse that the menus are shown in Spanish :-) |
|
|
zooxit
|
Thanks for the help.
I don't know why (and I don't know whether boinc-client.service is case-sensitive), but I had mistyped:
PrivateTMP=true
So I corrected it to PrivateTmp=true and am now waiting for new tasks.
-----------------------------------------------
My complete boinc-client.service is like this (should I uncomment or add something else?):
### Editing /etc/systemd/system/boinc-client.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file
[Service]
PrivateTmp=true
### Lines below this comment will be discarded
### /lib/systemd/system/boinc-client.service
# [Unit]
# Description=Berkeley Open Infrastructure Network Computing Client
# Documentation=man:boinc(1)
# After=network-online.target
#
# [Service]
# Type=simple
# ProtectHome=true
# ProtectSystem=strict
# ProtectControlGroups=true
# ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
# Nice=10
# User=boinc
# WorkingDirectory=/var/lib/boinc
# ExecStart=/usr/bin/boinc
# ExecStop=/usr/bin/boinccmd --quit
# ExecReload=/usr/bin/boinccmd --read_cc_config
# ExecStopPost=/bin/rm -f lockfile
# IOSchedulingClass=idle
# # The following options prevent setuid root as they imply NoNewPrivileges=true
# # Since Atlas requires setuid root, they break Atlas
# # In order to improve security, if you're not using Atlas,
# # Add these options to the [Service] section of an override file using
# # sudo systemctl edit boinc-client.service
# #NoNewPrivileges=true
# #ProtectKernelModules=true
# #ProtectKernelTunables=true
# #RestrictRealtime=true
# #RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# #RestrictNamespaces=true
# #PrivateUsers=true
# #CapabilityBoundingSet=
# #MemoryDenyWriteExecute=true
# #PrivateTmp=true #Block X11 idle detection
#
# [Install]
# WantedBy=multi-user.target
|
|
|
|
As a general rule of thumb, everything in Linux is case-sensitive.
It should be right as it is now.
Upcoming tasks will eventually confirm it. |
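Since systemd directive names are case-sensitive, a misspelling such as PrivateTMP is treated as an unknown key and ignored rather than applied. Below is a quick sanity check, assuming the override lives at the path shown in the editor header above. `check_private_tmp` is a hypothetical helper, and it only accepts the literal line `PrivateTmp=true`, although systemd itself also accepts other boolean spellings such as `yes`.

```sh
# Hypothetical helper: check an override file for the exact PrivateTmp line.
check_private_tmp() {
    if grep -qx 'PrivateTmp=true' "$1"; then
        echo "ok"
    elif grep -qi '^[[:space:]]*privatetmp[[:space:]]*=' "$1"; then
        # Key found, but its case or value differs from the expected line.
        echo "check spelling"
    else
        # Commented-out lines (starting with #) are not counted.
        echo "missing"
    fi
}

# Example:
# check_private_tmp /etc/systemd/system/boinc-client.service.d/override.conf
```

To see what systemd actually applied, `systemctl show boinc-client.service --property=PrivateTmp` should print `PrivateTmp=yes` once the override is active.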
|
|