Author |
Message |
|
This morning I have had a string of WUs error out. The screen goes blank and the GPU fans run at max RPM. Here is the output:
Stderr output
<core_client_version>7.16.7</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
10:32:33 (9976): wrapper (7.9.26016): starting
10:32:33 (9976): wrapper: running acemd3.exe (--boinc input --device 0)
# Engine failed: Error initializing CUDA: CUDA_ERROR_UNKNOWN (999) at C:\Miniconda37-x64\conda-bld\openmm_1562766554928\work\platforms\cuda\src\CudaContext.cpp:148
10:32:36 (9976): acemd3.exe exited; CPU time 0.000000
10:32:36 (9976): app exit status: 0x1
10:32:36 (9976): called boinc_finish(195)
0 bytes in 0 Free Blocks.
440 bytes in 8 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 139016 bytes.
Dumping objects ->
{1614} normal block at 0x0000018150068220, 48 bytes long.
Data: <ACEMD_PLUGIN_DIR> 41 43 45 4D 44 5F 50 4C 55 47 49 4E 5F 44 49 52
{1603} normal block at 0x000001815006A1D0, 32 bytes long.
Data: <HOME=D:\ProgramD> 48 4F 4D 45 3D 44 3A 5C 50 72 6F 67 72 61 6D 44
{1592} normal block at 0x000001815006A830, 32 bytes long.
Data: <TMP=D:\ProgramDa> 54 4D 50 3D 44 3A 5C 50 72 6F 67 72 61 6D 44 61
{1581} normal block at 0x000001815006A710, 32 bytes long.
Data: <TEMP=D:\ProgramD> 54 45 4D 50 3D 44 3A 5C 50 72 6F 67 72 61 6D 44
{1570} normal block at 0x000001815006ABF0, 32 bytes long.
Data: <TMPDIR=D:\Progra> 54 4D 50 44 49 52 3D 44 3A 5C 50 72 6F 67 72 61
{1559} normal block at 0x00000181500597A0, 140 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
..\api\boinc_api.cpp(309) : {1556} normal block at 0x000001815006FEF0, 8 bytes long.
Data: < P > 00 00 03 50 81 01 00 00
{891} normal block at 0x0000018150052C50, 140 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
{202} normal block at 0x00000181500701C0, 8 bytes long.
Data: < P > 20 13 07 50 81 01 00 00
{196} normal block at 0x0000018150068290, 48 bytes long.
Data: <--boinc input --> 2D 2D 62 6F 69 6E 63 20 69 6E 70 75 74 20 2D 2D
{195} normal block at 0x0000018150070300, 16 bytes long.
Data: <x P > 78 F6 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{194} normal block at 0x000001815006F950, 16 bytes long.
Data: <P P > 50 F6 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{193} normal block at 0x000001815006FEA0, 16 bytes long.
Data: <( P > 28 F6 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{192} normal block at 0x000001815006FE50, 16 bytes long.
Data: < P > 00 F6 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{191} normal block at 0x0000018150066C50, 16 bytes long.
Data: < P > D8 F5 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{190} normal block at 0x0000018150066A70, 16 bytes long.
Data: < P > B0 F5 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{189} normal block at 0x00000181500685A0, 48 bytes long.
Data: <ComSpec=C:\Windo> 43 6F 6D 53 70 65 63 3D 43 3A 5C 57 69 6E 64 6F
{188} normal block at 0x0000018150066520, 16 bytes long.
Data: < G P > A0 47 05 50 81 01 00 00 00 00 00 00 00 00 00 00
{187} normal block at 0x000001815006A4D0, 32 bytes long.
Data: <SystemRoot=C:\Wi> 53 79 73 74 65 6D 52 6F 6F 74 3D 43 3A 5C 57 69
{186} normal block at 0x0000018150066A20, 16 bytes long.
Data: <xG P > 78 47 05 50 81 01 00 00 00 00 00 00 00 00 00 00
{184} normal block at 0x0000018150066200, 16 bytes long.
Data: <PG P > 50 47 05 50 81 01 00 00 00 00 00 00 00 00 00 00
{183} normal block at 0x00000181500669D0, 16 bytes long.
Data: <(G P > 28 47 05 50 81 01 00 00 00 00 00 00 00 00 00 00
{182} normal block at 0x0000018150066CA0, 16 bytes long.
Data: < G P > 00 47 05 50 81 01 00 00 00 00 00 00 00 00 00 00
{181} normal block at 0x0000018150066750, 16 bytes long.
Data: < F P > D8 46 05 50 81 01 00 00 00 00 00 00 00 00 00 00
{180} normal block at 0x0000018150066110, 16 bytes long.
Data: < F P > B0 46 05 50 81 01 00 00 00 00 00 00 00 00 00 00
{179} normal block at 0x00000181500546B0, 280 bytes long.
Data: < a P P > 10 61 06 50 81 01 00 00 F0 AB 06 50 81 01 00 00
{178} normal block at 0x0000018150066CF0, 16 bytes long.
Data: < P > 90 F5 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{177} normal block at 0x00000181500662A0, 16 bytes long.
Data: <h P > 68 F5 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{176} normal block at 0x0000018150066070, 16 bytes long.
Data: <@ P > 40 F5 06 50 81 01 00 00 00 00 00 00 00 00 00 00
{175} normal block at 0x000001815006F540, 496 bytes long.
Data: <p` P acemd3.e> 70 60 06 50 81 01 00 00 61 63 65 6D 64 33 2E 65
{64} normal block at 0x0000018150066930, 16 bytes long.
Data: < > 80 EA 05 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{63} normal block at 0x00000181500664D0, 16 bytes long.
Data: <@ > 40 E9 05 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{62} normal block at 0x00000181500666B0, 16 bytes long.
Data: < W > F8 57 02 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{61} normal block at 0x00000181500660C0, 16 bytes long.
Data: < W > D8 57 02 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{60} normal block at 0x0000018150066B10, 16 bytes long.
Data: <P > 50 04 02 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{59} normal block at 0x0000018150066390, 16 bytes long.
Data: <0 > 30 04 02 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{58} normal block at 0x00000181500665C0, 16 bytes long.
Data: < > E0 02 02 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{57} normal block at 0x0000018150066D90, 16 bytes long.
Data: < > 10 04 02 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{56} normal block at 0x0000018150066700, 16 bytes long.
Data: <p > 70 04 02 0A F6 7F 00 00 00 00 00 00 00 00 00 00
{55} normal block at 0x0000018150066160, 16 bytes long.
Data: < > 18 C0 00 0A F6 7F 00 00 00 00 00 00 00 00 00 00
Object dump complete.
</stderr_txt>
]]>
Any suggestions? |
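For anyone skimming a log like the one above, the useful lines are the CUDA error, the app exit status, and the code passed to boinc_finish; the block dump after them is heap-debug output. Below is a minimal sketch, not part of any BOINC tool, that pulls those fields out with regular expressions. The helper name summarize_stderr and the shortened SAMPLE excerpt are assumptions made for illustration.

```python
import re

# Excerpt of the stderr output quoted above (the CUDA line is shortened
# to drop the source-file path; nothing else is altered).
SAMPLE = "\n".join([
    "10:32:33 (9976): wrapper: running acemd3.exe (--boinc input --device 0)",
    "# Engine failed: Error initializing CUDA: CUDA_ERROR_UNKNOWN (999)",
    "10:32:36 (9976): app exit status: 0x1",
    "10:32:36 (9976): called boinc_finish(195)",
])


def summarize_stderr(text):
    """Hypothetical helper: extract the key failure fields from a wrapper log."""
    summary = {}

    # CUDA error name and numeric code, e.g. "CUDA_ERROR_UNKNOWN (999)"
    m = re.search(r"(CUDA_ERROR_\w+) \((\d+)\)", text)
    if m:
        summary["cuda_error"] = m.group(1)
        summary["cuda_code"] = int(m.group(2))

    # Exit status of the science app, reported in hex by the wrapper
    m = re.search(r"app exit status: (0x[0-9a-fA-F]+)", text)
    if m:
        summary["app_exit_status"] = int(m.group(1), 16)

    # Code the wrapper handed to boinc_finish()
    m = re.search(r"boinc_finish\((\d+)\)", text)
    if m:
        summary["boinc_exit_code"] = int(m.group(1))

    # GPU index the task was started on
    m = re.search(r"--device (\d+)", text)
    if m:
        summary["device"] = int(m.group(1))

    return summary


if __name__ == "__main__":
    for key, value in summarize_stderr(SAMPLE).items():
        print(f"{key}: {value}")
```

Read this way, the log says the app never got past CUDA initialization on device 0 (CPU time 0.000000), so the failure happens before any work unit data is processed.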
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1341 Credit: 7,684,871,308 RAC: 13,111,656
|
Card has gone wonky in the system. Reboot the system. |
|
|
|
Reboot did not solve the issue. It is crashing on Milkyway@Home also. I guess the GPU (RTX 2080 Ti) has issues. The driver version is 27.21.14.5206 and it ran fine on that for several weeks.
This GPU is fairly new; it is disappointing if it is failing already. |
|
|
|
CI, did your Windows 10 recently update? (Mine did.)
That seems like it could be a driver/OS glitch. Maybe the next driver update will fix it if others are getting it while gaming.
Meanwhile, try testing the card with GPUPI or another math processing tester.
I would also experiment with re-installing BOINC, just in case the ACEMD file became corrupted. |
|
|
|
I believe there was an OS update a couple of days before the GPU checked out. I did a restore to roll Win10 back to see if it would solve the issue - it did not.
I tested it with PrimeGrid and Milkyway, then stressed it with OCCT - the GPU failed all three. I replaced the GPU with a Radeon and with another Nvidia card, and both worked fine in the failed GPU's slot.
Unfortunately, I think I have an RTX 2080 Ti paperweight now.
|
|
|
Keith Myers
|
I think M$ updated your driver to their own version, which does not have OpenCL support and possibly lacks normal CUDA support. Your reported driver version is NOT a normal Nvidia driver version.
Remove the M$ driver and download the correct latest driver directly from Nvidia. |
|
|
Erich56
Joined: 1 Jan 15 Posts: 1132 Credit: 10,277,932,676 RAC: 29,168,538
|
I ran into the same thing some time ago when I was crunching Folding@Home (during a lengthy period with no GPUGRID tasks available).
After I replaced the MS driver with the original driver directly from NVIDIA, everything worked fine.
However: OpenCL is not needed for GPUGRID tasks, is it? |
|
|
|
I downloaded this driver from Nvidia a few weeks ago. The GPU ran for a week or two without issue on it, and I have another RTX GPU that seems to be OK with the same driver. I changed drivers to check, but the GPU still failed.
I am fairly certain the GPU has failed. |
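One way to cross-check a Windows-style driver string such as 27.21.14.5206 against Nvidia's own release numbering is sketched below. It assumes the commonly cited convention that the last five digits of the four-part Windows version encode the Nvidia release as XXX.YY; that convention and the function name nvidia_release_from_windows_version are assumptions for illustration, not something stated in this thread.

```python
def nvidia_release_from_windows_version(win_version):
    """Hypothetical helper: map a Windows driver version to an Nvidia release.

    Assumes the last five digits of the dotted version (taken from the
    third and fourth fields) encode the release as XXX.YY.
    """
    parts = win_version.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four numeric fields, got {win_version!r}")

    # Join the last two fields, padding the final field to four digits,
    # then keep the trailing five digits.
    digits = (parts[2] + parts[3].zfill(4))[-5:]
    if len(digits) < 5:
        raise ValueError(f"not enough digits in {win_version!r}")

    return f"{digits[:3]}.{digits[3:]}"


if __name__ == "__main__":
    # Under the stated assumption this prints 452.06
    print(nvidia_release_from_windows_version("27.21.14.5206"))
```

If the result matches a release listed on Nvidia's download page, the installed driver is more likely a stock Nvidia build than a substituted one, which can be confirmed in Device Manager or the Nvidia control panel.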
|
|
|
I had a similar problem once and it turned out the PSU had a couple of burnt pins. You can check that along with the power cable(s), but if you used the same connection for the other cards you tested, it can probably be ruled out.
|
|