Author |
Message |
|
Hi there
Maybe somebody has an answer,
I get a lot of failures with this message
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [999]
Assertion failed: 0, file swanlib_nv.cpp, line 121
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Does anybody have an answer?
I am using the latest graphics card driver.
Hints are welcome.
Peter |
|
|
MarkJ Volunteer moderator Volunteer tester
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
|
Hi there
Maybe somebody has an answer,
I get a lot of failures with this message
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [999]
Assertion failed: 0, file swanlib_nv.cpp, line 121
I see most of them are TONI_KKi4 work units. There is a separate message thread about them. There were some issues with them and a lot fail immediately when they start; there is nothing you can do about that.
One thing you need to check is that you have disabled multi-GPU mode via the Nvidia control panel. You could also run the CUDA memory checker to see if your card has issues; you'll need to test both GPUs of the GTX295.
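For anyone sorting through many failed tasks, a short script can pull the failing kernel name and error code out of the stderr text. This is only a minimal sketch: the pattern is modelled on the single SWAN line quoted above, and other SWAN messages may well be formatted differently. The function name `find_swan_failures` is made up for illustration.

```python
import re

# Pattern modelled on the one error line quoted in this thread (an assumption);
# it expects "SWAN : FATAL : <message> [<kernel>] [<code>]".
SWAN_FATAL = re.compile(
    r"SWAN\s*:\s*FATAL\s*:\s*(?P<message>.*?)\s*"
    r"\[(?P<kernel>[^\]]+)\]\s*\[(?P<code>\d+)\]"
)

def find_swan_failures(stderr_text):
    """Return a list of (message, kernel, code) tuples, one per SWAN FATAL line."""
    failures = []
    for line in stderr_text.splitlines():
        match = SWAN_FATAL.search(line)
        if match:
            failures.append(
                (match.group("message"), match.group("kernel"), int(match.group("code")))
            )
    return failures

# The three lines below are copied from the first post.
sample = '''MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [999]
Assertion failed: 0, file swanlib_nv.cpp, line 121'''

print(find_swan_failures(sample))
```

Grouping the extracted kernel names by task type would show whether the failures cluster on one kind of work unit.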
____________
BOINC blog |
|
|
|
Hi there
That's a good hint. I have SLI enabled. I will switch it off and see what happens.
Thx for the quick reply.
Will post the result.
Regards
Peter |
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,875,294 RAC: 2
|
I thought I might have tracked down one of the problem sources specific to GPUGrid computation errors.
I set up a new installation:
Windows 7 - 64 bit
Latest Nvidia driver
9800GT video card.
Latest version of the BOINC client
NO other GPU projects.
First 3 units processed fine. 4th unit failed after 8 or 9 hours with a computation error.
I figure I controlled for processing contention with other GPU projects, I controlled for latest driver install, I controlled for most recent released BOINC client. Yet still I encountered a computation error on a GPU Grid work unit -- worse, the error was 8 hours plus in.
I had previously encountered errors running 9800GT cards in Windows XP, with various versions of the Nvidia drivers (including the current driver) and various versions of the BOINC client from 6.10.18 up to the current 6.10.56. In those cases other GPU-supporting projects were installed. So I thought perhaps the problem was not the BOINC client version, the Nvidia driver version, or the OS (I saw this on XP, Vista and Win7), since I had controlled for those. I'd note that on none of these multi-project GPU configurations did I see long-run computation errors on other projects (Collatz, SETI, Dnetc). I'd see some errors with other projects, but they were 'efficient' errors (i.e. within a few minutes of processing).
Based on my sampling, I really suspect that at least a piece of the computation error problem comes from the source of the work units. Either it is because they, by design, run long enough for computational errors to surface (other GPU projects don't run more than a few hours, often less than an hour, on the same GPU, while GPUGrid work units run 20 hours or so), or that there is some other work unit 'weakness' which makes them significantly more subject to 'long run' computation errors than ANY of the other GPU projects currently available in the BOINC world.
I know that the GPUGrid project isn't *intentionally* doing this, and I would like to support the project with some of my available 9800GT processing power, but at this point, it seems rather an inefficient use of GPU processing power for me.
Given the traffic here, I must assume that very few other current users encounter computation errors, as I don't see a LOT of traffic regarding it. Perhaps the existing active base for GPUGrid is working with different (and faster) GPUs -- I don't know.
|
|
|
|
First 3 units processed fine. 4th unit failed after 8 or 9 hours with a computation error.
I figure I controlled for processing contention with other GPU projects, I controlled for latest driver install, I controlled for most recent released BOINC client. Yet still I encountered a computation error on a GPU Grid work unit -- worse, the error was 8 hours plus in.
Are you looking at this task list? It seems to match that description, and is taken from one of your hosts running Windows 7 x64 (host ID: 83792).
You seem to have controlled for everything else, so how about the GPUGrid task sub-type?
I see one IBUCH_pYEEI, and two TONI_KKi4 (all successful), and one KASHIF_HIVPR_n1 - the failure, after several hours.
That exactly matches my experiences with two 9800GT and one 9800GTX+: I can run everything GPUGrid throws at me (from the current mix), except HIVPR_n1. I've reported it, several times, but I don't go banging on about it once the point has been made.
I'm afraid to say the 'abort' button is your friend, but only for that particular task subtype. Let somebody with a Fermi crunch them (they're fine on my GTX470). |
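The manual check described here (spot the troublesome sub-type, then abort it) can be sketched as a small filter over task names. Everything in it is an assumption for illustration: the sample task names are hypothetical, only the sub-type fragments come from this thread, and the list of problem sub-types reflects one poster's experience with 9800-class cards. It only flags tasks for review and does not abort anything.

```python
# Sub-types reported in this thread as failing on 9800-class cards.
# Assumption: this list reflects one poster's experience, not project guidance.
PROBLEM_SUBTYPES = ("HIVPR_n1",)

def tasks_to_review(task_names, problem_subtypes=PROBLEM_SUBTYPES):
    """Return the task names that contain any of the flagged sub-type strings."""
    return [
        name for name in task_names
        if any(subtype in name for subtype in problem_subtypes)
    ]

# Hypothetical task names; only the sub-type fragments appear in the thread.
downloaded = [
    "example1-IBUCH_pYEEI-0",
    "example2-TONI_KKi4-0",
    "example3-KASHIF_HIVPR_n1-0",
]

print(tasks_to_review(downloaded))
```

A substring match is used because the exact layout of real task names is not given in the thread.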
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,875,294 RAC: 2
|
Richard, thanks for the explanation -- I suppose it would be a nice feature at the project configuration level to just say no to specific task subtypes.
The alternative I suppose is to look closely at any new downloaded work units, and abort the instant one of the 'GPUGrid doesn't like 9800's' specific work units drops on a given system.
That's a bit of a bother for me. In any event, for the workstation in my specific example, I've switched from the 9800GT to an HD 4850 and, since GPUGrid doesn't support ATI GPUs, switched that workstation to Collatz and MW.
I've posted about this sort of issue here over the months, but what you've suggested is the most responsive reply I've received. I think for me, rather than 'fight the good fight' and try to run (and manually filter) GPUGrid, I'll wait for the day when the project itself is positioned to handle these sorts of things a bit better.
|
|
|
|
Richard, my 9800GT (btw it's oc-ed 10%) works fine on KASHIF_HIVPR_n1, with new app 6.11. See:
Task 3219845
Task 3214029
Do you have Swan_Sync=0 in your system variables?
____________
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?" |
|
|
|
Richard, my 9800GT (btw it's oc-ed 10%) works fine on KASHIF_HIVPR_n1, with new app 6.11. See:
Task 3219845
Task 3214029
Well, one of mine has just blown away task 3239533, but since it only thought about it for one second, I won't lose much sleep over it.
Do you have Swan_Sync=0 in your system variables?
No, I prefer to keep my CPUs free for other work. |
|
|
skgiven Volunteer moderator Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
The optional, Windows-only variable Swan_Sync=0 is for Fermi cards. It does not have to be used along with one free CPU core, but doing so usually helps a lot. It will make little or no difference to the performance of a 9800GT. There is little need to leave a CPU core free unless you have 3 or 4 such cards in the same system, at which point your CPU performance for CPU-only tasks will degrade to the point that you might as well free up a CPU core. On a high-end Fermi it is still optional, but it is generally recommended both to use Swan_Sync=0 and to leave a core/thread free; the performance difference is quite noticeable. |
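To confirm whether the variable is actually set on a machine, a quick check of the environment is enough. This is a minimal sketch, assuming the variable is visible in the environment of the process running the script; the helper name `swan_sync_setting` is made up for illustration, and the name is matched case-insensitively because the thread writes it as Swan_Sync.

```python
import os

def swan_sync_setting(environ=None):
    """Return the value of the Swan_Sync variable, or None if it is not set.

    The name is compared case-insensitively (an assumption made so that
    Swan_Sync and SWAN_SYNC are treated the same).
    """
    env = os.environ if environ is None else environ
    for name, value in env.items():
        if name.upper() == "SWAN_SYNC":
            return value
    return None

value = swan_sync_setting()
if value is None:
    print("Swan_Sync is not set")
else:
    print("Swan_Sync =", value)
```

A system variable added while BOINC is already running will normally not be visible to it until the client is restarted.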
|
|
|
Thanks for the explanation, skgiven. I wondered why GPUGrid uses so much CPU time... Now I understand!
____________
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?" |
|
|
|
I have a host that has dual 9800GTXs and I can't get one valid result:
http://www.gpugrid.net/results.php?hostid=84907&offset=0&show_names=0&state=0
What is the suggested best fix for this machine?
____________
|
|
|
MarkJ Volunteer moderator Volunteer tester
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
|
Downgrade your drivers to 197.45; this will force GPUGrid to give you the 6.12 app (assuming you run Windows). I'm not sure whether your cards support SLI, but if they do, make sure it's disabled in the Nvidia control panel.
____________
BOINC blog |
|
|
TJ
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
Well, for a few weeks I have been getting no new tasks from your project and was advised to update my drivers. I did, and now I read that I have to downgrade them.
That's a lot of work.
____________
Greetings from TJ |
|
|
skgiven Volunteer moderator Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
What driver did you update to? |
|
|