Message boards : Number crunching : What are these "SWAN" errors on my 8-12hr tasks?

Author Message
Profile Tuna Ertemalp
Joined: 28 Mar 15
Posts: 46
Credit: 1,547,496,701
RAC: 0
Message 46764 - Posted: 24 Mar 2017 | 5:55:33 UTC

Given the recent flood of erroring tasks, I hadn't been paying attention to my own, but looking at them now, a good number of the failures from two dual TITAN X hosts don't fit the pattern of the recent breakage. So, I wonder what they are.

The errors are subsets of these (all 8-12hr tasks, too):

https://www.gpugrid.net/results.php?hostid=417426&offset=0&show_names=0&state=5&appid=23
https://www.gpugrid.net/results.php?hostid=304059&offset=0&show_names=0&state=5&appid=23

Mostly the errors with 0.00 runtime.

Error lines are (I noticed three suspect ones):

o SWAN : FATAL Unable to load module .fillcharges.cu. (719)
o Error in swanMemcpyDtoH# SWAN swan_assert 0
o SWAN : FATAL Unable to load module .ntnbrlist.cu. (719)
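
For anyone wanting to tally these across many failed tasks, here is a minimal sketch in Python that counts the SWAN error lines quoted above in a task's stderr text. The function name `summarize_swan_errors` and the `sample` text are illustrative assumptions, not part of BOINC or GPUGRID; the patterns are taken only from the lines in this post, and the number in parentheses is captured as-is without interpreting it.

```python
import re
from collections import Counter

# Patterns copied from the error lines quoted in this thread.
MODULE_LOAD = re.compile(
    r"SWAN : FATAL Unable to load module \.(\w+)\.cu\. \((\d+)\)"
)
ASSERT = re.compile(r"SWAN swan_assert (\d+)")


def summarize_swan_errors(stderr_text):
    """Count SWAN failures by kind in one task's stderr text (hypothetical helper)."""
    counts = Counter()
    for line in stderr_text.splitlines():
        match = MODULE_LOAD.search(line)
        if match:
            # Key on module name and the reported code, e.g. "module-load:fillcharges:719".
            counts[f"module-load:{match.group(1)}:{match.group(2)}"] += 1
            continue
        if ASSERT.search(line):
            counts["swan_assert"] += 1
    return counts


# Sample input built from the three lines quoted in this post.
sample = """\
SWAN : FATAL Unable to load module .fillcharges.cu. (719)
Error in swanMemcpyDtoH# SWAN swan_assert 0
SWAN : FATAL Unable to load module .ntnbrlist.cu. (719)
"""

for kind, n in summarize_swan_errors(sample).items():
    print(kind, n)
```

Pasting the stderr output from each failed task page into `summarize_swan_errors` would show whether one module or one error code dominates on a given host.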

These hosts had produced valid results recently, so I am wondering if bad tasks left something behind on the hosts, or some other coincidence happened with a driver/setting change...

Good tasks:

https://www.gpugrid.net/results.php?hostid=417426&offset=0&show_names=0&state=3&appid=23
https://www.gpugrid.net/results.php?hostid=304059&offset=0&show_names=0&state=3&appid=23

Thanks for any pointers.
Tuna

____________

Profile Tuna Ertemalp
Message 46768 - Posted: 24 Mar 2017 | 19:25:58 UTC

Further similar errors since last night:

o swanBindToConstant failed -- copy failed# SWAN swan_assert 0
o SWAN : FATAL Unable to load module .fastfill.cu. (719)

Nobody has an idea?

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 46769 - Posted: 24 Mar 2017 | 20:38:38 UTC - in response to Message 46768.

I would try:
1. open an elevated command prompt (right click on Start)
2. type chkdsk /f /x <enter>
3. don't touch the keyboard until your PC finishes checking C:
If that does not help:
1. download Display Driver Uninstaller (DDU)
2. download the latest NVIDIA driver
3. disconnect (disable) the network
4. exit BOINC, choosing to stop the running scientific applications
5. remove the display drivers with DDU
6. restart the PC
7. install the latest NVIDIA driver
8. restart the PC
9. try GPUGrid again
If that does not help either, you should try resetting the GPUGrid project within BOINC Manager.

Profile Tuna Ertemalp
Message 46772 - Posted: 24 Mar 2017 | 21:33:25 UTC - in response to Message 46769.

Excellent list! Thanks! I hope this is on some pinned FAQ post that I missed. ;)


Just over the last 10 minutes, I also noticed that all Asteroids@Home GPU tasks were failing on three hosts (these two, plus another that was actually completing long GPUGRID tasks successfully). The commonality? I had turned on OptimizeForCompute on them, and only on them out of all my GPU hosts, just before the recent error storm here started, so my errors overlapped with the errors here. So, for now I will blame that. I changed all three PCs back to OFF, rebooted, and asked for new tasks; we'll see what happens in about 12hrs...

Profile Tuna Ertemalp
Message 46776 - Posted: 25 Mar 2017 | 15:36:51 UTC
Last modified: 25 Mar 2017 | 15:37:42 UTC

Yup. All my GPUGRID, Asteroids@Home and PrimeGrid AP27 tasks succeeded overnight after I set "Optimize for Compute Performance" back to OFF.

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 751,770,933
RAC: 296,495
Message 46945 - Posted: 16 Apr 2017 | 2:27:59 UTC

It may be relevant that there is a program called swan that converts CUDA source to OpenCL source.
