Message boards : Number crunching : Long runs stopping for no apparent reason
Author | Message |
---|---|
I'm running long runs on this machine and they just cease all activity after 15% or so. Not a computation error, not paused for another higher-priority task, it just stops, but the run time keeps on ticking and it is still listed as running. There is no progress bar change, no GPU activity, no CPU activity. It seems suspending and restarting the task helps temporarily, but that's obviously not a good solution if I have to micro-manage each run like a hundred times. EDIT: Okay, it resumes by itself after 5-10 minutes, only to stop again shortly after. <core_client_version>7.6.33</core_client_version> <![CDATA[ <message> aborted by user </message> <stderr_txt> # GPU [GeForce GTX 1060 6GB] Platform [Windows] Rev [3212] VERSION [80] # SWAN Device 0 : # Name : GeForce GTX 1060 6GB # ECC : Disabled # Global mem : 6144MB # Capability : 6.1 # PCI ID : 0000:01:00.0 # Device clock : 1708MHz # Memory clock : 4004MHz # Memory width : 192bit # Driver version : r384_00 : 38494 # GPU 0 : 48C # GPU 0 : 53C # GPU 0 : 54C # GPU 0 : 56C # GPU 0 : 59C # GPU 0 : 61C # GPU 0 : 63C # GPU 0 : 64C # GPU 0 : 65C # GPU 0 : 66C </stderr_txt> ]]> The event log has this: 20/09/2017 20:12:24 | GPUGRID | Task e33s10_e19s1p0f163-ADRIA_FOLDA3D_crystal_ss_contacts_50_a3D_1-0-1-RND9573_0 exited with zero status but no 'finished' file 20/09/2017 20:12:24 | GPUGRID | If this happens repeatedly you may need to reset the project. I'm pretty sure I already tried resetting it the last time I had this issue (and just switched to other projects for this computer, because it didn't work). Maybe related or identical to this? However, in my case progress does not drop after resuming, and it does not get stuck at a fixed value. It is going forward, just with weird pauses. Is this normal? I don't think I've ever seen it with my other Pascal card. | |
ID: 47900 | Rating: 0 | |
This may be related to the thread "Problem with Pablo Tasks". Specifically, the post linked at the end of that thread may provide a solution for you. | |
ID: 47902 | Rating: 0 | |
Thanks. I did try this earlier, though, in that I checked whether the task would freeze after being suspended due to CPU use. But it continued normally after everything else resumed. | |
ID: 47903 | Rating: 0 | |
mkay, it hasn't happened since, and the last task with this issue successfully ran to completion. I now have two tasks running on the same GPU and have an alert set up that tells me whenever the GPU load drops to near zero for longer periods of time. Nothing so far. Perhaps telling BOINC never to suspend tasks automatically really was the solution. Or maybe it was running multiple WUs at once. | |
ID: 47908 | Rating: 0 | |
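
For anyone who wants a similar alert: the last post does not say which tool was used, so the following is only a rough sketch of one way to do it, not the poster's actual setup. It assumes an NVIDIA driver with `nvidia-smi` on the PATH and polls GPU utilization at a fixed interval. The class name `IdleDetector`, the 5% threshold, and the 10 s / 120 s timings are illustrative assumptions, not values from the thread.

```python
import subprocess
import time


class IdleDetector:
    """Hypothetical helper: flags when GPU load stays low for N polls in a row."""

    def __init__(self, threshold=5, polls_needed=12):
        self.threshold = threshold        # load (%) at or below this counts as idle
        self.polls_needed = polls_needed  # consecutive idle polls before alerting
        self.idle_polls = 0

    def update(self, load):
        """Feed one utilization sample; return True once when the idle run hits the limit."""
        if load <= self.threshold:
            self.idle_polls += 1
        else:
            self.idle_polls = 0
        return self.idle_polls == self.polls_needed


def read_gpu_load(gpu_index=0):
    """Return the current GPU utilization in percent, as reported by nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip())


def watch(poll_seconds=10, idle_seconds=120, threshold=5):
    """Poll forever and print a warning whenever the GPU has been idle too long."""
    detector = IdleDetector(threshold, max(1, idle_seconds // poll_seconds))
    while True:
        if detector.update(read_gpu_load()):
            print(f"GPU load at or below {threshold}% for {idle_seconds} s")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    watch()
```

The alert fires once per idle stretch rather than on every poll, so a stall of several minutes like the one described in the first post produces a single warning. Replace the `print` call with whatever notification mechanism you prefer.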