Message boards : Number crunching : Cause of quantum chemistry task failures: md5sum errors
Author | Message |
---|---|
I have a single Ubuntu Linux machine participating in GPUGRID using its CPU. Apart from a few correctly completed QC tasks, this machine has so far produced 28 "compute errors", each after just a few seconds of run time (0 secs CPU time). The error logs show the following message for all 28 tasks: WARNING: md5sum mismatch of tar archive expected: 75a9f0faa822a01dfe0e0e5c43400ed0 got: dfc9f09eb6b6771c69d6cf10b91bc6c9 - bunzip2: Data integrity error when decompressing. I noticed that WU download (and communication in general) is extremely slow. Could this be causing byte hiccups during transfer, so that damaged WU archives fail with checksum errors upon extraction? As a result, this machine is barred from downloading additional tasks for 24 hours, which makes it rather pointless to continue participating in the current GPUGRID team challenge and in GPUGRID QC task computation in general. Maybe an upgrade of the GPUGRID server infrastructure would help improve the situation? Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 51129 | Rating: 0 | |
I seem to remember seeing something about these errors in the past, but I can't recall the details. | |
ID: 51130 | Rating: 0 | |
Here is an example stderr log:
Name: m0000040872_65a1af79_n00050-SDOERR_QMML50_4-0-1-RND5714_1
Workunit: 15707811
Created: 27 Dec 2018 | 17:23:04 UTC
Sent: 27 Dec 2018 | 18:30:32 UTC
Received: 27 Dec 2018 | 18:32:06 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 195 (0xc3) EXIT_CHILD_FAILED
Computer ID: 428878
Report deadline: 1 Jan 2019 | 18:30:32 UTC
Run time: 2.72
CPU time: 0.00
Validate state: Invalid
Credit: 0.00
Application version: Quantum Chemistry v3.31 (mt)
Stderr output:
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
19:30:37 (6677): wrapper (7.7.26016): starting
19:30:37 (6677): wrapper (7.7.26016): starting
19:30:37 (6677): wrapper: running /usr/bin/flock (/var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock -c "/bin/bash ./miniconda-installer.sh -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda && /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda install -m -y -p qmml3 --override-channels -c defaults -c gpugrid --file requirements.txt ")
WARNING: md5sum mismatch of tar archive
expected: 75a9f0faa822a01dfe0e0e5c43400ed0
got: dfc9f09eb6b6771c69d6cf10b91bc6c9 -
bunzip2: Data integrity error when decompressing.
Input file = /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/preconda.tar.bz2, output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover data from undamaged sections of corrupted files.
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
19:30:38 (6677): /usr/bin/flock exited; CPU time 0.185019
19:30:38 (6677): app exit status: 0x1
19:30:38 (6677): called boinc_finish(195)
</stderr_txt>
]]>
Michael. 
____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 51132 | Rating: 0 | |
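For anyone who wants to check a downloaded archive by hand before resetting anything, the two failures in the log above (an MD5 mismatch and a bzip2 integrity error) can be reproduced locally. The following is only a minimal sketch: the function names are hypothetical, and the log does not say exactly which file the expected hash was computed over, so the path and hash you pass in are your own assumptions.

```python
import bz2
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bz2_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file and report whether that succeeds.

    A truncated stream raises EOFError and corrupt data raises OSError,
    so either one is treated as "not intact".
    """
    try:
        with bz2.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, ValueError):
        return False
    return True


def check_archive(path, expected_md5):
    """Hypothetical helper: compare the MD5 and test bzip2 integrity."""
    actual = md5_of_file(path)
    return {
        "md5_ok": actual == expected_md5.lower(),
        "actual_md5": actual,
        "bz2_ok": bz2_is_intact(path),
    }
```

Calling `check_archive(path, expected_md5)` on a local copy of the archive returns a small dictionary. If `bz2_ok` is false, the file on disk is damaged or incomplete, which would be consistent with an interrupted or corrupted download rather than a problem with the task itself.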
44 tasks are now affected... | |
ID: 51135 | Rating: 0 | |
Is this issue resolved? | |
ID: 51368 | Rating: 0 | |
Not something we can fix from here. Try resetting the project, which should clear local files. | |
ID: 51417 | Rating: 0 | |