
Message boards : Number crunching : The hardware enthusiast's corner

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52925 - Posted: 2 Nov 2019 | 21:44:13 UTC

I think that a place to share computer hardware experiences and issues might be useful on a number-crunching platform like this.
Please feel free to share here any experiences you consider interesting or useful for other colleagues.

Let's classify crunchers generally into two large groups.
One day something fails at hardware level in one of your rigs, and:
-1) You take it to a computer shop or workshop to have it serviced by specialized personnel
-2) You open your rig and try to find and solve the problem by yourself

I classify myself into group number 2 (unless the failing rig is under warranty).

I'll start by sharing a simple tip.
This will "sound" familiar to many of you:
For whatever reason, you stop your usually 24/7 crunching rig for a while.
You return, switch it on again, and a loud (awful) noise starts.
You've got it: probably one (or more) of the fans needs to be replaced.

Sometimes it is clear which fan is failing. Other times you have to stop the fans one by one until the noise stops (you have found it!), and other times the fan warms up and the noise stops by itself after a few minutes...
In that situation, I'm not happy until I catch the failing fan and replace it.
A noisy fan has lost cooling performance, and some valuable component(s) may overheat.
It also breaks something even more valuable: quietness at home :-)

Fans can be classified into three large groups, depending on how the rotor shaft is mounted:
-1) Sleeve bearing
-2) Ball bearing
-3) Magnetic/Flux bearing
They are listed in ascending order of lifespan... and cost.
I recommend not trying to save money on this component.
If you have to choose between type 1 (sleeve bearing) and type 2 (ball bearing), I'd recommend type 2. Or better, type 3 if available.
While type 1 fans are cheaper, types 2 and 3 usually offer greater reliability and a longer life.

Sometimes the fan's part number gives a clue to its type. If the type is not stated directly, the part number may contain an "S" for sleeve or a "B" for ball bearing. It can be seen in this photo.


Replacing a fan is usually a simple task, requiring only a screwdriver, cable ties, and cutting pliers. It is well worth the effort.
For this operation, always shut down the computer and disconnect it from power.
Once it is disconnected, briefly press the computer's "Power" button to discharge any voltage remaining in the PSU.
Also, touch some metallic part of the chassis before touching any internal component. This will prevent damage from electrostatic discharge.
Then take a note or photo of the airflow direction of the damaged fan, unscrew it, and disconnect its cable from the socket. Fit the new fan, taking care to keep the original airflow direction, and connect it to the fan socket.
Finally, move the fan cables out of the air path and tidy them with cable ties.

For hardware enthusiasts, I'll finish by recommending this thread by PappaLitto:
https://www.gpugrid.net/forum_thread.php?id=5006

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52926 - Posted: 2 Nov 2019 | 21:45:58 UTC

Another tip regarding fans and the heatsinks frequently associated with them:

It is advisable to clean them from time to time, to remove accumulated dust, fibers, and pet hair.
Not doing this will cause a progressive worsening of heat dissipation, and thus a gradual temperature rise in the whole system.

For this task I have three different tools:
-1) A common 20 mm wide paintbrush
-2) A hair dye brush. My wife gave it to me, and I quickly found a use for it...
-3) A mini vacuum cleaner


For cleaning heatsink fins and the gaps in between, the hair dye brush is the most useful tool I've ever used. I can recommend it.
The paintbrush's bristles are not rigid enough to enter the gaps between heatsink fins easily. The hair dye brush's are.
You will understand what I mean by taking a look at this photo:


However, the paintbrush is a very convenient tool for cleaning fan blades, flexible and smooth enough to do the job properly:


If after cleaning the system you start to hear a loud (awful) noise... Oh, oh, please refer to the first post in this thread...

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52937 - Posted: 4 Nov 2019 | 20:51:14 UTC

Regarding heat dissipation, laptops should be considered a separate subgroup.

Normally laptops are manufactured with priorities other than ease of maintenance.
Here is a picture of a common laptop hardware layout:


For space reasons, a laptop's heatsink is usually very compact, and it has to dissipate the heat coming through the heatpipe from the CPU and GPU, with the help of air forced through it by the system fan.
Working laptops get warm, becoming irresistible for any pet nearby...
And here is what can be found if you manage to remove the fan:


In this situation, heat is not dissipated efficiently, and the whole laptop may overheat.
The GPU and CPU, usually the most power-hungry components, become hot spots inside, and in the worst case may fail due to overheating.

Some advice if crunching on laptops:
- Try to monitor temperatures in some way, permanently or at least when the laptop's fan becomes louder than usual.
- Set the BOINC Manager preference "Use at most 50 % of the CPUs".
- Please, never lay a working laptop on soft surfaces like blankets, towels, cushions... This will restrict air circulation and cause overheating.
- If the laptop is lucky enough to outlive the manufacturer's warranty, consider cleaning the heatsink from time to time. But I recognize that in most cases it is an arduous task!
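
One possible way to follow the monitoring advice on a Linux laptop is sketched below. It assumes the kernel exposes its sensors under /sys/class/thermal, where each temp file holds thousandths of a degree Celsius. The function names and the 85 C warning limit are placeholders of mine, not a manufacturer figure, so please check the specifications of your own CPU/GPU for a sensible limit.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_name: temperature_in_celsius} for every readable zone."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # this zone cannot be read on this machine; skip it
        readings[zone.name] = millidegrees / 1000.0
    return readings


def too_hot(readings, limit_c=85.0):
    """Return the names of zones at or above the (hypothetical) limit."""
    return [name for name, temp in readings.items() if temp >= limit_c]


if __name__ == "__main__":
    temps = read_thermal_zones()
    for name, temp in temps.items():
        print(f"{name}: {temp:.1f} C")
    hot = too_hot(temps)
    if hot:
        print("Warning, hot zones:", ", ".join(hot))
```

Run from a terminal (or periodically from cron), it prints one line per sensor. Which zone corresponds to the CPU or GPU differs from laptop to laptop.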

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 52938 - Posted: 6 Nov 2019 | 15:42:39 UTC

I use XPOWER to blow the dust off my computers:
https://www.amazon.com/gp/product/B01BI4UQK0/ref=ppx_yo_dt_b_asin_title_o06_s00?ie=UTF8&psc=1

Nick Name
Joined: 3 Sep 13
Posts: 53
Credit: 1,533,531,731
RAC: 0
Level
His
Message 52939 - Posted: 6 Nov 2019 | 18:09:47 UTC

Electric leaf blower for the win!

https://www.youtube.com/watch?v=l0ohF6zthOQ
____________
Team USA forum | Team USA page
Join us and #crunchforcures. We are now also folding:join team ID 236370!

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 52940 - Posted: 6 Nov 2019 | 19:35:07 UTC - in response to Message 52939.

Electric leaf blower for the win!

https://www.youtube.com/watch?v=l0ohF6zthOQ


+1 Ha ha LOL.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52941 - Posted: 6 Nov 2019 | 21:16:22 UTC

I use XPOWER to blow the dust off my computers:

Nice tool!

Blowing that way is, no doubt, more efficient at blasting dust away than a vacuum cleaner!
But some considerations should be kept in mind:
- It is advisable to immobilize the fan blades in some way before blowing. If not, fans may be damaged by overspinning. I broke more than one fan (but less than three) until I realized this...
- Dust that is inside the computer before blowing is all around outside afterwards... Doing it indoors is not recommended.
- Be careful when blowing near flat cables, since they are prone to act as boat sails and get damaged.
- Keep these tools away from children when not in use.
- And please, be careful when using this method near asthmatic persons and armed wives, for security reasons ;-)

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 52942 - Posted: 7 Nov 2019 | 0:04:33 UTC - in response to Message 52941.

A point to consider when using "blower" devices is electrostatic discharge (ESD).
Ensure the device is NOT made of PVC/PET-based plastics. Otherwise you risk destroying your electrical equipment through electrostatic discharge.
There are a number of ESD-safe blowers available, but they tend to be more expensive due to the plastics used.
One example here: https://www.amazon.com/Canless-System-Hurricane-ESD-Safe-Replacement/dp/B00CJHGLFK

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52945 - Posted: 7 Nov 2019 | 12:15:08 UTC

A point to consider when using"Blower" devices is Electro Static Discharge.
Ensure the device is NOT made of PVC/PET based plastics. Otherwise you risk the destruction of your electrical equipment through Electro-Static Discharge.

Thank you very much for this remark.
I take note of it.

Another point to consider, especially in high-humidity environments, when using compressed air cans or "zero residue" contact cleaners:
Pressurized containers cause a chilling effect when their contents expand.
If a component's temperature drops below the dew point, moisture will condense on it.
Please allow some extra waiting time after the treatment for this moisture to evaporate completely.
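
To get a feel for when condensation becomes a risk, here is a rough sketch that estimates the dew point with the Magnus approximation. The formula and its coefficients (17.62 and 243.12 °C) are a common textbook approximation that I am bringing in for illustration, not something from the posts above, and the function names are hypothetical.

```python
import math


def dew_point_c(air_temp_c, relative_humidity_pct):
    """Approximate dew point in Celsius (Magnus formula, assumed coefficients)."""
    a, b = 17.62, 243.12  # Magnus coefficients for water vapour over liquid water
    gamma = math.log(relative_humidity_pct / 100.0) + a * air_temp_c / (b + air_temp_c)
    return b * gamma / (a - gamma)


def condensation_risk(surface_temp_c, air_temp_c, relative_humidity_pct):
    """True if a chilled surface is at or below the dew point of the room air."""
    return surface_temp_c <= dew_point_c(air_temp_c, relative_humidity_pct)


if __name__ == "__main__":
    room_c, humidity = 25.0, 60.0
    print(f"Dew point: {dew_point_c(room_c, humidity):.1f} C")
    print("Risk at 10 C surface:", condensation_risk(10.0, room_c, humidity))
```

In a 25 °C room at 60% relative humidity the dew point comes out between 16 and 17 °C, so a component chilled to 10 °C by an air can would be expected to collect moisture.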

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52951 - Posted: 8 Nov 2019 | 21:41:17 UTC

Another relatively common situation, mainly in veteran rigs (like mine):

You stop your usually 24/7 crunching rig for preventive hardware maintenance, or it suddenly stops due to a power outage.
When you start the system again, it halts with a message indicating some data corruption (memory checksum failed).

What's happened?
When the system is connected to the mains, the PSU permanently delivers a +5VSB voltage, even if the system is switched off.
This voltage feeds the volatile memory chip storing the BIOS setup, the real-time clock, and the system start-up electronics.
When the system is not connected to the mains, a small part of the mainboard is still kept powered by a battery, usually a 3V CR2032 button cell.
This battery keeps the real-time clock and the BIOS setup memory chip energized.
If the battery becomes exhausted, the real-time clock will stop, and the stored data will probably get corrupted.
The solution to this problem is very easy: replace the exhausted battery with a new one.

To do this I use the following tools:
-1) A mini flat-head screwdriver
-2) A fine-point CD marking pen


Disconnect the system from power and briefly press the Start button. This will discharge any voltage remaining in the PSU.
Locate the backup battery on the mainboard and take a note or photo of its original +/- polarity in the socket.
Unpack the new battery and write the current month/year on its "+" side with the CD marking pen. Doing this, you will avoid confusing old and new batteries, and it will serve as a "maintenance logbook".
Carefully press the old battery's retention latch with the screwdriver while keeping the thumb of your other hand over the battery. The battery may sometimes jump out, as the bottom terminal acts as a spring.


When the battery lifts, grasp it and take it away.
Place the new battery in the socket, respecting the original polarity, and press it until the retention latch clicks into place.


Reconnect power, start the system, enter BIOS setup, reconfigure the parameters and date/time, and exit saving changes.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52985 - Posted: 15 Nov 2019 | 19:09:36 UTC
Last modified: 15 Nov 2019 | 19:14:00 UTC

One more true case to share:

Yesterday I left a laptop processing a MilkyWay@home CPU task as usual. OS: Ubuntu Linux.
Today, the WU was finished but had not been reported by BOINC Manager.
Entering the system configuration, I got a message saying that no WLAN adapter was installed.
But yes, it is installed! (Unless it went to a party overnight...)
It definitely is installed, and I can prove it.



It is the one marked in red in the image above...
This is a typical case of bad electrical contact between a component and its socket.
Now my laptop is working normally again, the stalled task was reported, and a new one was downloaded and is in progress.

What was the solution?
I extracted the card from its socket, treated the contacts, reinserted it, and it returned to work.
For this, I used a technical drawing pencil rubber and a small brush.
I got a pencil rubber, not too soft and not too hard, at a technical drawing shop.
This rubber has an ink side too, but it is not advisable to use that on delicate contacts, as they might get scratched.
Taking a careful look at the following image, you should be able to distinguish between the narrower side of the contacts (already treated) and the broader side (not treated yet):
http://www.servicenginic.com/Boinc/GPUGrid/Forum/WLAN_02.JPG

Other typical problems caused by bad electrical contacts:

Symptom: Component not being recognized (as in the present case)
Remedy: Extract component, clean contacts and reinsert.

Symptom: A GPU that normally works fine starts to fail tasks intermittently for no apparent reason.
Remedy: Extract GPU and memory DIMMs, clean contacts and reinsert.

Symptom: There is no image at system start, or start-up even stops with some combination of loud beeps.
Remedy: Extract GPU and memory DIMMs, clean contacts and reinsert.

Symptom: System intermittently stops with blue screens, or restarts for no apparent reason. It may be caused by intermittent RAM corruption.
Remedy: Extract memory DIMMs, clean contacts and reinsert.

PLEASE, PAY ATTENTION

- All those components are electrostatic-sensitive. Lay them on an antistatic surface. The shiny silver bags that mainboards or graphics cards come in are suitable for this.
- Always switch off the computer, disconnect it from the mains, and ensure no residual voltage is present (e.g. some LED still lit on the mainboard). Usually, you can discharge it by briefly pressing the computer's start button once it is disconnected (all LEDs will then go out immediately).
- Touch some metallic part of the computer's case to discharge any static charge from yourself before touching any internal component.

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 52987 - Posted: 16 Nov 2019 | 2:23:22 UTC - in response to Message 52985.

Fantastic, detailed description of troubleshooting and repairing. I like the idea of the pencil rubber to clean the contacts.

- Always switch off computer, disconnect from mains, and ensure no residual voltages are present

Touch some metallic part of computer's case to discharge you from static charges before touching any inside component.

The only thing different I would do (assuming building power and cabling integrity is in good order):
Turn off power at the mains and leave the power cord plugged into the wall. The earth connector on the power lead will allow static to discharge to the building earth when you touch the case. Touching the case to discharge static electricity will not work as well without the plug in the wall.
I have received an electric shock from mains power when working on computer internals. The computer's power supply was faulty and passing 240V out of the 12V connectors (I was investigating why the machine did not start). The building's RCD saved me that day. Never make assumptions when working with power, and always err on the side of safety.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 52994 - Posted: 17 Nov 2019 | 16:55:57 UTC - in response to Message 52987.

The only thing different I would do (assuming building power and cabling integrity is in good order):
Turn off power at mains and leave power cord plugged in the wall. The earth connector on the power lead will allow for static discharge to be earthed to the building earth when you touch the case. Touching the case to discharge static electricity will not work as well without the plug in the wall.

You're right again in your approach, rod4x4.
Touching an isolated chassis will equalize the potential between operator and computer ground, but not necessarily with the surrounding environment.
Many PSUs have a power switch.
This switch (if working properly) disconnects the Live and Neutral terminals (the ones carrying power), but the continuity of the Earth (protective) terminal is maintained.
Leaving the power cord connected and the PSU switched off increases the safety of electronic components against electrostatic discharge.
On the other hand, it increases the risk for operators... as you have experienced yourself. Life is a balance.

A lower-risk (*) home-made approach could consist of using a specially constructed power cable with only the Earth terminal connected. I've marked mine with EO! (Earth Only!)
It would look as follows:





(*) Note:
Never believe in "zero risk" solutions. They don't exist.
Some reasons that could cause this last approach to fail:
- Defective Earth terminal connection in the power socket
- Defective Earth installation in the building itself
- Confusing the modified power cord with a regular one (Mark it clearly, please!)
- Thunderstorm passing by
- Any combination of Murphy's Laws taken N at a time...

jjch
Joined: 10 Nov 13
Posts: 101
Credit: 15,569,300,388
RAC: 3,786,488
Level
Trp
Message 53003 - Posted: 20 Nov 2019 | 1:23:37 UTC
Last modified: 20 Nov 2019 | 1:25:34 UTC

I would recommend using an Anti-Static ESD grounding mat kit with a wrist strap and the proper grounding connection to the wall outlet.

These are much better and safer for your equipment and yourself. It will also protect your work surface from damage.

Here is a link to a fairly inexpensive one for the UK or EU. It doesn't have a full metal wrist strap; however, it would be adequate for most uses.

https://pcvalet.co.uk/Buy/Anti_Static-ESD-Grounding-Earth-Mat-Kit%2C-500-x-400mm-%2F-UK-or-EU

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 53007 - Posted: 20 Nov 2019 | 19:20:46 UTC - in response to Message 53003.

I would recommend using an Anti-Static ESD grounding mat kit with a wrist strap and the proper grounding connection to the wall outlet.

Thank you for your kind advice.
I find the proposed solution excellent for working in a workshop.
I would also recommend it.
It combines maximum safety for both electronics and operators.
But I personally find it somewhat inconvenient for working in the field.
So one ends up developing some (not so advisable) alternative strategies...

jjch
Joined: 10 Nov 13
Posts: 101
Credit: 15,569,300,388
RAC: 3,786,488
Level
Trp
Message 53008 - Posted: 21 Nov 2019 | 0:43:32 UTC

I have a portable ESD field service kit. The mat folds and fits into a pouch along with the other items. The kit also fits into my laptop bag.

It is similar to this one.
https://www.tequipment.net/Prostat/PPK-646/General-Accessories/

This one is a bit more expensive so you may need to shop around for a better price.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 53169 - Posted: 27 Nov 2019 | 22:08:50 UTC

Choosing the right Power Supply Unit (PSU).
Or how a few minutes of thinking can save a lot of money and wasted energy.

Hypothetically: I am designing a self-built computer, and I choose the most powerful PSU I can afford. Right?... Not necessarily!
I should choose the most efficient PSU I can afford, suited to the computer's overall characteristics.
This will save money on my electricity bill and reduce the heat emitted to the environment. (Please reverse the order, if you prefer.)

Let's imagine a practical example:
A system is consuming 800 effective Watts, but it is drawing 1000 Watts from the AC power outlet.
The missing 20% is wasted power, and therefore this particular PSU is 80% efficient.
Whatever we can do to increase this efficiency is worth doing.
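
The arithmetic of this example can be sketched in a few lines of Python. The function names are mine, and the 24 hours per day figure simply assumes a rig crunching around the clock.

```python
def psu_efficiency(dc_output_w, ac_input_w):
    """Fraction of the power drawn from the outlet that reaches the components."""
    return dc_output_w / ac_input_w


def wasted_power_w(dc_output_w, ac_input_w):
    """Power lost as heat inside the PSU, in Watts."""
    return ac_input_w - dc_output_w


def yearly_waste_kwh(dc_output_w, ac_input_w, hours_per_day=24):
    """Energy wasted over one year of crunching, in kWh."""
    return wasted_power_w(dc_output_w, ac_input_w) * hours_per_day * 365 / 1000


if __name__ == "__main__":
    print(f"Efficiency: {psu_efficiency(800, 1000):.0%}")
    print(f"Wasted power: {wasted_power_w(800, 1000)} W")
    print(f"Wasted per year: {yearly_waste_kwh(800, 1000):.0f} kWh")
```

For the 800 W / 1000 W example, 200 W are lost as heat, which over a year of 24/7 crunching adds up to 1752 kWh.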

Two main considerations:
-1) Try to choose a PSU certified for 80% efficiency or better. The extra investment will soon pay for itself through your power bill.
-2) Try to match the PSU's maximum rated power to about double the power demanded by your particular system.

Explanation:

-1) Please take a look at the following table:

It is taken from the webpage of PSU manufacturer Cooler Master.
The picture shows a PSU classification in order of increasing efficiency.
The more efficient the PSU, the more finely tuned and higher quality its electronics must be: larger copper cross-sections, optimized Power Factor Correction (PFC) circuitry... and usually this means a more expensive final product.
But as already said, it is going to pay for itself over time.

-2) And taken from the same manufacturer's webpage (*), a typical efficiency curve.

Typically, a PSU reaches its maximum efficiency when it is delivering about half its maximum rated power.
This can be seen in the Efficiency-Load curve.
But yes, you have guessed right: efficiency drops drastically at low loads.
In a few words: if I select a 1500 Watts PSU to feed a low-power 150 Watts system, I'll be partly wasting my investment!
The following link offers a form to calculate the maximum power demanded by a system.
https://www.coolermaster.com/power-supply-calculator/
This is calculated for all components consuming their maximum power concurrently. This is not a usual situation (unless you are processing ACEMD3 :-)
For example: for a calculated maximum power of 400 Watts, I would choose a 750 Watts PSU. This would keep the PSU working near its maximum efficiency, and would even leave some margin for future hardware upgrades.
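
As a quick check of the two sizing examples, the load fraction (system demand divided by PSU rated power) can be computed like this. The function name is only illustrative.

```python
def load_fraction(system_w, psu_rated_w):
    """Share of the PSU's rated power that the system actually demands."""
    return system_w / psu_rated_w


if __name__ == "__main__":
    for system_w, psu_w in [(400, 750), (150, 1500)]:
        share = load_fraction(system_w, psu_w)
        print(f"{system_w} W system on a {psu_w} W PSU: {share:.0%} load")
```

The 400 W system on a 750 W PSU sits at about 53% load, close to the half-load sweet spot, while the 150 W system on a 1500 W PSU sits at only 10%, down in the inefficient part of the curve.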

(*) https://www.coolermaster.com/catalog/power-supplies/v-series/v850-gold/#overview

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 53354 - Posted: 15 Dec 2019 | 22:57:31 UTC

A totally or partially defective PSU may cause a wide variety of problems in the affected computer:
- System dead and not responding at all to the 'Power on' button
- System starting and switching off within a few seconds, in an endless loop
- System intermittently switching off or rebooting itself
- Applications failing intermittently, especially high power demanding ones
- System overheating if the PSU fan fails
- Motherboard or peripheral component damage in severe cases

The PSU delivers several voltages to power the computer's motherboard and peripherals.

Taken from the manual of a Gigabyte motherboard, the voltages and signals coming from the PSU are as follows:


Taken from an Aerocool modular PSU, the connector functions and shapes are as follows:




For a given PSU, every voltage has its associated maximum rated current and power.
Taken from the same Aerocool PSU, a typical specifications label is as follows:


Replacing a PSU is more laborious than difficult.
The steps:
-1) Switch the computer off, and disconnect the PSU from the power outlet.
-2) Wait some minutes for the PSU capacitors to discharge, or force it by briefly pressing the computer's 'Power on' button.
-3) Take a note or photo of all the old PSU's connections: motherboard, CPU, hard disk(s), optical drive(s), graphics card(s), peripherals, cooling devices...
-4) Disconnect all those connectors. For some of them it will be necessary to press a locking latch while pulling.
-5) Check that all cables coming from the PSU are free. Carefully cut any cable ties holding them.
-6) Remove the PSU's fixing screws, normally four of them, located on the same face of the PSU as the AC power connector.
-7) Extract the old PSU from its position, being careful not to hit other computer components. For this, it will sometimes be necessary to remove other components such as the CPU cooler, optical drive, graphics card... if they block the extraction of the old PSU and/or the insertion of the new one.
-8) Insert the new PSU in place, and fix it with its screws. For this, see the final tip (*)
-9) Reconnect all the connectors noted in step (3). Be careful to double-check that every connector is fully inserted. Some of them must be pushed until the locking latch clicks into its final position. The motherboard's 20+4 pin connector especially may be hard to insert. If possible, try to support the back of the motherboard while pushing, so as not to overstress it.
-10) Arrange all cables with cable ties to convert the original cable mess into a more airflow-friendly arrangement.
-11) Test that the system starts up correctly. This first time, it is advisable to enter BIOS setup and check in the motherboard's System Monitor that all voltages are within specifications.
-12) Replace the covers (if any), reinstall the system in its usual place, and that's all.

(*) Tip:
The GND rail carries the return current, which is the sum of the currents of all the voltage rails.
The PSU's GND rail is directly connected to its metallic chassis and to the electrical protective earth terminal.
The motherboard's GND is also connected to all its fixing holes, and through them to the computer's chassis.
The better the electrical contact between the PSU and the computer's chassis, the lower the voltage drop caused by the circulating current.
For example, a contact resistance as low as 0.01Ω causes a voltage drop of 0.4V at a circulating current of 40A.
40A is about the current that two GTX 1080 Ti or RTX 2080 Ti cards demand from the +12V rail.
If the contact resistance increases, so does the chance of problems: overheating, burnt contacts, intermittent computer errors, intermittent system rebooting...
Something as simple as selecting the best fixing screws for the PSU may help to decrease the electrical resistance between the PSU and the computer chassis, and help a bit to conduct the return current.
The best ones are the stepped backhead screws shown at the far left of the following picture. Varnished screws (far right) are probably more beautiful, but the varnish may worsen conductivity.
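
The voltage drop in the tip above is plain Ohm's law (V = I × R), and the heat generated at the contact follows from P = I² × R. A small sketch, with function names of my own choosing:

```python
def voltage_drop_v(resistance_ohm, current_a):
    """Voltage lost across a contact resistance (Ohm's law)."""
    return resistance_ohm * current_a


def dissipated_power_w(resistance_ohm, current_a):
    """Heat generated at the contact, in Watts (P = I^2 * R)."""
    return resistance_ohm * current_a ** 2


if __name__ == "__main__":
    r, i = 0.01, 40.0  # values from the example in the tip
    print(f"Voltage drop: {voltage_drop_v(r, i):.2f} V")
    print(f"Heat at the contact: {dissipated_power_w(r, i):.1f} W")
```

With the example values, the 0.4 V drop goes together with about 16 W of heat concentrated at the contact, which helps explain why a poor contact can end up burnt.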

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 53359 - Posted: 15 Dec 2019 | 23:54:35 UTC - in response to Message 53354.

Great post!
Nice tip about the fixing screws, now I have to go and check all my fixing screws...

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 53360 - Posted: 16 Dec 2019 | 10:31:50 UTC

When replacing modular PSUs:
Keep in mind that the PSU end of the modular connectors is *NOT* standardized!
You have to carefully compare the PSU-end pin-out of the old and the new PSU (especially when they come from different manufacturers), and if there's any difference between them then you must remove the old cables as well (not just the PSU). It is advisable to remove the old cables anyway, as their contact resistance grows over time due to corrosion.
The brand of a PSU and the actual manufacturer of the product are not the same thing. So you can buy the same brand and the PSU-end pin-out could still be different.
A good source of info about PSUs:
https://www.jonnyguru.com/

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 53365 - Posted: 16 Dec 2019 | 14:47:01 UTC - in response to Message 53360.

When replacing modular PSUs:
Keep in mind that the PSU end of the modular connectors are *NOT* standardized!

Good point. Thank you very much again!
If anybody is thinking of replacing a modular PSU and keeping the old cables: please, forget it.
Always retire the old cables and install the cable set that comes with the new PSU.

A good source of info about PSUs:
https://www.jonnyguru.com/

I also take note of this interesting link.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 53401 - Posted: 27 Dec 2019 | 22:13:02 UTC

There are specific tools that make diagnosing PSUs easier, such as the one shown below.
PSU Tester
For a bench diagnosis, it is enough to connect the 24-pin connector, one 4-, 6- or 8-pin connector, and one SATA connector.
When the PSU is switched on, the tester will automatically start it and show the different voltages.
The delay between +12V coming on and the 'Power Good' (PG) signal is also shown.
Abnormal voltages are those outside -5%...+5% for +12V, +5V, +3.3V and +5VSB.
The -12V rail is more tolerant, and is considered normal in the range -10%...+10%.
Normal values for the PG delay are in the range 200...500 ms.

- If the PSU doesn't start when switched on and correctly connected, it is probably dead and needs to be replaced.
- If any voltage is missing or outside tolerance, the PSU needs to be replaced.
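
The pass/fail rules above can be written down as a small check, useful if you note the tester readings by hand. This is only a sketch: the tolerances and the PG delay range are the ones quoted in this post, while the dictionary layout and function names are my own assumptions.

```python
# rail name: (nominal voltage, allowed relative deviation), as quoted in the post
TOLERANCES = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "+5VSB": (5.0, 0.05),
    "-12V": (-12.0, 0.10),
}
PG_DELAY_MS = (200, 500)  # normal range for the 'Power Good' delay


def rail_faults(readings):
    """Return a list of rails that are missing or outside tolerance."""
    faults = []
    for rail, (nominal, tol) in TOLERANCES.items():
        measured = readings.get(rail)
        if measured is None:
            faults.append(f"{rail} missing")
            continue
        # sorted() keeps low <= high for the negative rail as well
        low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
        if not low <= measured <= high:
            faults.append(f"{rail} out of tolerance ({measured} V)")
    return faults


def pg_delay_ok(delay_ms):
    """True if the 'Power Good' delay is inside the normal range."""
    return PG_DELAY_MS[0] <= delay_ms <= PG_DELAY_MS[1]


if __name__ == "__main__":
    sample = {"+12V": 11.3, "+5V": 5.02, "+3.3V": 3.31, "+5VSB": 5.0, "-12V": -11.5}
    print(rail_faults(sample) or "All rails within tolerance")
    print("PG delay OK:", pg_delay_ok(310))
```

In this sample, the 11.3 V reading on the +12V rail falls below the 11.4 V lower limit, so that rail is reported as faulty.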

VERY IMPORTANT
If the PSU is tested while installed in the system, it will be necessary to disconnect ALL PSU connectors from the motherboard and peripherals. Please double-check this!
Since this kind of test brings power to every connector, damage to the system may occur if any of them is left connected.

Although a faulty parameter indicates the need to replace the PSU, a PSU that passes may still be defective.
The reason: this kind of test is made at very low currents compared to a system running at maximum performance.
Running at higher currents may cause a parameter to fall outside tolerance.
Also, some problems are caused by components that fail when they heat up.
And some intermittent problems might not be caught in such a brief test...

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 53412 - Posted: 31 Dec 2019 | 15:15:55 UTC
Last modified: 31 Dec 2019 | 15:31:17 UTC

I have recently read about a special thermal "grease".
It is actually a special metal compound which is liquid at room temperature like mercury (but it's not made of mercury).
Its thermal conductivity is 6-10(!) times higher than of a usual thermal grease (73W/mK vs 8.5W/mK).
See the manufacturer's page for reference:
Thermal Grizzly Conductonaut
I haven't tried it earlier, as it has a serious negative side: this metal compound (like most of the metals) conducts electricity as well as heat.
So it must not get out of the chip's surface as it will result in a short circuit which breaks the GPU or the CPU (or the MB) for good.
It should not be applied to aluminum heatsinks.
But if you are experienced, have the time and patience to thoroughly clean the surface of the heatsink and the chip (or the IHS) with alcohol (I even had to use polishing paper on one of my heatsinks), and carefully apply as little of this thermal compound as possible (to keep the excess metal from going where it shouldn't), then you can give this product a try.
The result is worth the time and effort:
On my single GPU systems I experience
· 10-15°C (!) lower temperatures at the same fan speed.
· 30-50% (!) lower fan speed at the same temperatures.
The better the heatsink, the better the result.
You can choose how to balance between these by your fan curve.
The factory settings will result in slightly lower (3-5°C) temperatures at a much lower fan speed.
Of course, it can be used to achieve better overclocking as well.
Most of the Intel CPUs have normal thermal grease between the CPU chip and the IHS, so it must be replaced with this product to achieve better results. As this grease is under the IHS (the metal covering the chip to protect it), first you have to remove the IHS, which is glued to the PCB (the green area) which carries the chip. There are professional tools to do this without damaging the CPU. Hopefully I'll have mine next week, and I'll report the results then.
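
To get a feeling for what the two conductivity figures quoted above mean, here is a rough sketch of the conduction resistance of a flat layer, R = t / (k·A). The layer thickness, die area and power are hypothetical values picked only for illustration; only the two conductivities come from the post.

```python
def layer_resistance(thickness_m, conductivity_w_mk, area_m2):
    """Conduction resistance of a flat layer in K/W: R = t / (k * A)."""
    return thickness_m / (conductivity_w_mk * area_m2)


# Conductivities quoted in the post.
K_GREASE = 8.5          # W/mK, usual thermal grease
K_LIQUID_METAL = 73.0   # W/mK, Conductonaut

# Hypothetical values, for illustration only.
THICKNESS = 50e-6       # 50 micrometre layer
DIE_AREA = 300e-6       # 300 mm^2 die
POWER = 150.0           # W flowing through the layer

for name, k in (("grease", K_GREASE), ("liquid metal", K_LIQUID_METAL)):
    r = layer_resistance(THICKNESS, k, DIE_AREA)
    print(f"{name}: {r:.4f} K/W, {POWER * r:.2f} K across the layer")
```

For equal thickness and area, the ratio between the two results is simply 73 / 8.5, about 8.6. This models only the bulk of the layer; layer thickness and contact quality, which depend on how the compound is applied, are not captured.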

Zalster
Avatar
Send message
Joined: 26 Feb 14
Posts: 211
Credit: 4,496,324,562
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwat
Message 53414 - Posted: 31 Dec 2019 | 16:43:13 UTC - in response to Message 53412.

Been using it for about a year now Zoltan for some of my bigger machines with higher thread counts. Was recommended by one of my teammates and never thought about it again. Good information.

Profile ServicEnginIC
Message 53428 - Posted: 1 Jan 2020 | 18:02:01 UTC - in response to Message 53412.

See the manufacturer's page for reference:
Thermal Grizzly Conductonaut

Thank you for your post, Retvari Zoltan.
Thermal conductivity specifications for Conductonaut are really impressive.
On the other hand, as you remark, it is contraindicated in live circuits due to its electrical conductivity.
The following image comes from a real thermal paste replacement in one of my graphics cards:

As seen in the image, the GPU chip core is usually surrounded by capacitors (marked here with red ellipses). The white compound between many of them is the remaining non-conductive factory thermal paste.
Those capacitors would be short-circuited if in contact with an electrically conductive compound...

Been using it for about a year now

But used with due precautions where indicated, it's worth it. Thank you for your feedback, Zalster.

I've browsed the Thermal Grizzly products, and they have specific solutions for every use.
I was very impressed by Kryonaut for general purpose applications.

Profile Retvari Zoltan
Message 53429 - Posted: 1 Jan 2020 | 18:58:21 UTC - in response to Message 53428.
Last modified: 1 Jan 2020 | 18:59:12 UTC

Thermal conductivity specifications for Conductonaut are really impressive.
On the other hand, as you remark, it is contraindicated in live circuits due to its electrical conductivity.
The following image comes from a real thermal paste replacement in one of my graphics cards:

As seen in the image, the GPU chip core is usually surrounded by capacitors (marked here with red ellipses). The white compound between many of them is the remaining non-conductive factory thermal paste.
Those capacitors would be short-circuited if in contact with an electrically conductive compound...
I know. I was afraid of it too. But:
The chip is much thicker than the capacitors are.
This metal compound acts like a fluid, while thermal grease acts like a grease. While it sounds more dangerous, you can apply it more precisely than a grease (you have to do it more carefully though). Conductonaut acts exactly like tin-lead alloy solder on a copper surface. If you ever soldered something to a large copper area of a PCB, you know how it works: the solder bonds to the copper surface, so if you don't use too much of it, it will stay there even if you try to shake it off.
Liquid metal has very high surface tension, which holds it together (think of mercury).
Because it can be (and it should be) applied in a very thin layer to the whole area of the chip, and to the chip's area on the heatsink, there won't be much excess material pushed out on the sides.
The other reason for applying as little as possible is that it's quite expensive.
But if you want to be extra safe you can cover the capacitors with nail polish (or similar non conductive material) to protect them.

Profile Retvari Zoltan
Message 53430 - Posted: 1 Jan 2020 | 20:46:28 UTC - in response to Message 53429.

Another bit of advice for applying Conductonaut:
It is better to apply it on the heatsink to a slightly larger area than the chip, because in this way the excess material will stick to the heatsink.
This compound won't wet the copper (silicon, nickel) surface by itself; you have to carefully rub it onto both surfaces. This way it can be made really thin. You have to start with a very small (pinhead sized) droplet, and rub it on the desired area. If it turns out to be insufficient, you can add a little more (it won't make a droplet if you add it directly to the existing layer, it will add to the thickness of the layer instead).

Profile ServicEnginIC
Message 53431 - Posted: 2 Jan 2020 | 7:15:03 UTC - in response to Message 53430.

Now I understand the mechanics for this product.
I appreciate your masterclass very much.
It is always a good time to learn something more!

Profile Retvari Zoltan
Message 53443 - Posted: 10 Jan 2020 | 0:04:35 UTC - in response to Message 53412.
Last modified: 10 Jan 2020 | 0:13:44 UTC

On my single GPU systems I experience
· 10-15°C (!) lower temperatures at the same fan speed.
· 30-50% (!) lower fan speed at the same temperatures.
The better the heatsink, the better the result.
It's a little more complex than that:
If you have a decent heatsink on your GPU (the GPU temperature is around 70°C), and the thermal paste is thin and in good condition, then the temperature decrease will be "only" around 5°C.
Smaller chips with decent heatsink (for example GTX 1060 6G) will have only around 5°C decrease.
Regardless of its material, good thermal paste is a thin thermal paste, so this way it can make a very thin layer between the chip and the cooler.
You can check if your GPU needs a better thermal paste by watching its temperature when the GPUGrid client starts (on a cool GPU). If there's a sudden increase in the GPU temperature, then the thermal paste/grease is too thick, and/or it has become solid. The larger this sudden increase, the larger the benefit of changing the thermal interface material.
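
The "sudden increase" check described above can be turned into a number with a few logged readings. The sketch below is only an illustration: the function name, the 10 second window and the sample format are assumptions, and the readings can come from whatever monitoring tool you already use.

```python
def startup_jump(samples, window_s=10.0):
    """Temperature rise (deg C) within the first `window_s` seconds of load.

    `samples` is a list of (seconds_since_task_start, temperature_c) pairs,
    sorted by time, with the first sample taken just as the task starts.
    """
    if not samples:
        raise ValueError("need at least one sample")
    start_temp = samples[0][1]
    in_window = [temp for t, temp in samples if t <= window_s]
    return max(in_window) - start_temp
```

Comparing this value before and after changing the thermal interface material, on the same card and with the same fan curve, gives a simple like-for-like measure.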
Another experience I had after I changed the thermal paste (it's better to call it thermal concrete) to Conductonaut on my Gigabyte Aorus GTX 1080Ti 11G is that now it makes sense to raise the RPM of the cooling fans even to 100%:
· 55% 1583rpm: 69°C (original fan curve)
· 60% 1728rpm: 65°C
· 70% 2016rpm: 60°C
· 80% 2304rpm: 56°C
· 90% 2592rpm: 53°C
· 100% 2880rpm: 50°C
The noise of the fans is tolerable at 70%.
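
A quick way to look at these fan curve readings is to compute how many degrees each extra percent of fan speed buys. A minimal sketch, using only the figures listed above (the function name is made up):

```python
# (fan %, rpm, GPU temperature in deg C) as reported above.
FAN_CURVE = [
    (55, 1583, 69), (60, 1728, 65), (70, 2016, 60),
    (80, 2304, 56), (90, 2592, 53), (100, 2880, 50),
]


def cooling_per_step(curve):
    """Degrees C gained per additional 1 % of fan speed between neighbouring points."""
    gains = []
    for (p0, _, t0), (p1, _, t1) in zip(curve, curve[1:]):
        gains.append((p0, p1, (t0 - t1) / (p1 - p0)))
    return gains


for p0, p1, gain in cooling_per_step(FAN_CURVE):
    print(f"{p0:3d}% -> {p1:3d}%: {gain:.2f} degC per fan %")
```

The gain per percent shrinks as the fan speed rises, so the choice of operating point is mostly a trade between noise and the last few degrees.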

Another idea on how to clean copper heatsinks: It's better to use scouring powder (with a little piece of wet paper towel) than polishing paper, because this way the tiny copper grains will stay on the heatsink (also it's easy to clean them off). The surface will be smoother (depending on the grit size of the polishing paper and the scouring powder).

Profile ServicEnginIC
Message 53444 - Posted: 10 Jan 2020 | 16:16:35 UTC - in response to Message 53443.

I've ordered one 1g Conductonaut syringe, and I'm expecting to receive it in less than one month.
I'm curious to test it and report results.

Profile ServicEnginIC
Message 53508 - Posted: 26 Jan 2020 | 20:10:22 UTC

Well, I've received my Conductonaut thermal compound kit this week, and I've already tested it.
First of all, my special thanks to Retvari Zoltan, for bringing this interesting topic here.
I'm sharing my very first experiences with it, as I promised.

For this, I've selected this system.
Currently it has two graphics cards installed, one GTX 1660 Ti, running preferably GPUGrid among several other GPU projects, and one GTX 750 excluded from GPUGrid, for delivering video and running all other projects.
This is a BOINC Manager screenshot at the moment of starting preliminary tests.
In that situation, this is a Psensor screenshot for system state.
GTX 750 temperature was 53ºC while running a MilkyWay task, and GTX 1650 Ti temperature was 73ºC while running a GPUGrid task.
Then, I suspended activity in BOINC Manager, and GTX 1650 Ti temperature dropped 47ºC, from 73ºC to 26ºC, in about 8 minutes. Too big and fast a temperature fall for me to feel comfortable.
I restarted activity, and a subsequent temperature rise from 26ºC to 71ºC can be seen.
Such sudden temperature changes put mechanical stress on the GPU due to contraction and expansion. This might reduce its life expectancy!
So I chose GTX 1650 Ti to test Conductonaut.
This is the same card that was the subject of this other thread.

I started by dismounting cooler's fans and cleaning them.
Then, I dismounted GPU cooler by unscrewing its 4 spring-loaded mounting screws, and I cleaned it also.
The next step was thoroughly cleaning the GPU chip, as can be seen in the "before" and "after" images.
I managed to carefully remove all thermal paste remains between the capacitors with wooden toothpicks, and finally cleaned all chip surfaces with isopropanol-dampened cotton swabs.

Now, it's time for Conductonaut release...
Starting with a small drop at the center of cooler's copper core, then gradually spreading it until covering all copper surface.
I paid special attention not to reach the aluminium zones, as Conductonaut is expressly contraindicated for them.
For spreading, synthetic-tissue headed swabs are supplied in Conductonaut kit.
And they work for this purpose better than I expected.
Time now for silicon surface of GPU chip, following the same procedure.

Once the surfaces were treated with thermal compound, the graphics card was reassembled in the reverse order of disassembly.
Starting with the GPU cooler, paying special attention to closely aligning the Printed Circuit Board holes with the cooler's ones.
The best way for me: laying the cooler on the table with its threaded mounting holes facing up, and then slowly lowering the PCB while looking through its holes until they match perfectly.
Then, I remounted the 4 original spring-loaded screws by gradually tightening them, following a cross-pattern iterative sequence.
It is hard to photograph with a domestic camera, but if the proper amount of thermal fluid is used, the GPU capacitors stay safely clear of it.

GTX 1650 Ti remounted in place, and... was it worth the job?
Comparing the new temperatures with the original ones while running a GPUGrid task, a reduction from 71ºC to 60ºC can be seen.
Definitely, it was worth the job.
And I've enjoyed the moments in between!-)

Profile ServicEnginIC
Message 53511 - Posted: 27 Jan 2020 | 7:03:40 UTC - in response to Message 53508.

Let's make a small correction to my own previous post: every time a "GTX 1650 Ti" graphics card is mentioned, it should be GTX 1660 Ti.
"GTX 1650 Ti" is a chimeric mix between the GTX 1660 Ti and GTX 1650 models.
I have a couple of GTX 1650 cards also, but I have no such temperature problems with them due to their lower power consumption (75 Watts) compared to the GTX 1660 Ti (120 Watts).

Profile Retvari Zoltan
Message 53514 - Posted: 27 Jan 2020 | 11:22:07 UTC - in response to Message 53508.
Last modified: 27 Jan 2020 | 11:23:09 UTC

I've received my CPU IHS remover kit. It's a Rockit-88 kit.
It makes the IHS removal the fun part of the whole process.
I've changed the original thermal paste to TGC so far in these CPUs:
i5-8500
i5-7500
i3-7300
i7-4790k
i5-750

I've also changed the thermal paste between the IHS and the cooler to TGC.
The i5-8500 and the i5-7500 run 11°C lower than before.

It was very hard to clean the i7-4790k chip, the TGC didn't spread on its surface very well. I think I have to do it again.

The i5-750 is a 10 year old CPU, it runs at 78-79°C with the standard Intel (copper core) cooler, while all 4 cores are fully loaded, and the PC is in a micro ATX case (side panel is on). I'll change this cooler to a Noctua D9L, as larger coolers won't fit in this case.

Profile ServicEnginIC
Message 53634 - Posted: 9 Feb 2020 | 21:56:48 UTC
Last modified: 9 Feb 2020 | 21:59:22 UTC

Symptom: When the PSU's main switch is turned from Off to On, a sparking sound is heard, and the overcurrent protection at the electrical panel trips (leaving 4 computers without power, by the way).
Cause: Short circuit in a component of the PSU's driver circuit, at the HVDC to DC converting stage.
Solution: Replace the defective PSU. (I've gained something by exchanging the old 80+ efficiency PSU for a new modular 85+ one)
Guilty component: It can be seen in this forensics image, marked as IC2 at the center of the picture.
I like this kind of problem: clear diagnosis and simple solution.

Related tip: Most PSUs have, at the AC rectifying stage, an NTC (Negative Temperature Coefficient) resistor (thermistor) for limiting the switch-on current (it can be seen, out of focus, at the left of the above image: the lentil-shaped green component).
When this NTC is cold, its base resistance is in series with the rectifying circuitry, thus limiting the otherwise high initial charging current for the HV capacitor(s).
As the current circulates, the NTC heats up and its resistance decreases to a near-zero value.
If the PSU is switched off and then on in a rapid sequence, there is no time for this NTC to cool down, and its intended current limiting effect is decreased, thus causing a potentially harmful high current peak.
For this reason, it is advisable to wait at least (let's say) 10 seconds between switching the PSU off and switching it on again, so that this component has enough time to cool down.
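
The effect of NTC temperature on the switch-on current can be sketched with the common Beta approximation for thermistors, R(T) = R25 · exp(B · (1/T - 1/T25)). The part values (5 ohm at 25°C, B = 3000 K), the 230 V mains and the 0.5 ohm for the rest of the circuit are hypothetical, chosen only for illustration, and the capacitors are assumed to be fully discharged.

```python
import math


def ntc_resistance(temp_c, r25_ohm, beta_k):
    """Beta-model NTC resistance: R(T) = R25 * exp(B * (1/T - 1/T25)), T in kelvin."""
    t = temp_c + 273.15
    t25 = 25.0 + 273.15
    return r25_ohm * math.exp(beta_k * (1.0 / t - 1.0 / t25))


def peak_inrush(v_peak, r_ntc_ohm, r_other_ohm=0.5):
    """Worst-case charging current into fully discharged capacitors (I = V / R)."""
    return v_peak / (r_ntc_ohm + r_other_ohm)


# Hypothetical part and mains values, for illustration only.
R25, BETA = 5.0, 3000.0           # 5 ohm at 25 degC, B = 3000 K
V_PEAK = 230.0 * math.sqrt(2)     # peak of a 230 V RMS mains

for temp in (25, 55, 85):
    r = ntc_resistance(temp, R25, BETA)
    print(f"{temp} degC: {r:.2f} ohm, inrush up to {peak_inrush(V_PEAK, r):.0f} A")
```

The hotter the thermistor at the moment of switching on, the lower its resistance and the higher the possible current peak, which is the reasoning behind waiting before switching on again.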

Profile Retvari Zoltan
Message 53636 - Posted: 10 Feb 2020 | 19:26:40 UTC - in response to Message 53634.
Last modified: 10 Feb 2020 | 19:32:31 UTC

Related tip: Most PSUs have at AC rectifying stage an NTC (Negative Temperature Coefficient) resistor (thermistor) for limiting switch on current ...
When this NTC is cold, its base resistance is in series with rectifying circuitry, thus limiting the otherwise high initial charging current for HV capacitor(s).
As the current circulates, the NTC is heating and decreasing its resistance to a near-zero value.
If the PSU is switched off and then on in a rapid sequence, there is no time for this NTC to cool down, and its intended current limiting effect is decreased, thus causing a potentially harmful high current peak.
If the PSU is switched off and then on in a rapid sequence, the capacitors in the primary circuit don't have time to discharge, so the inrush current will be low (=no need for the NTC to cool down).
Unless those capacitors are broken (= lost most of their capacitance; the visual sign of it is a bump on their top and/or a brownish grunge on the PCB around them / on their top), but in this case it's better to replace the PSU.
BTW LED bulbs, other LED lighting, or other switching mode PSUs (flat TVs, set top boxes, gaming consoles, laptops, chargers, printers, etc) also could have larger inrush current (altogether), especially when you arrive at the site after an extended power outage, and all of the capacitors in their primary circuits have been discharged. It's recommended to physically switch all of them off (including PCs), or unplug those without a physical power switch before you switch on the power breaker. After the power breaker is successfully switched on (I have to do it twice in rapid succession as there is some equipment in our home which has fixed connections to the mains), the PSUs can be switched on one by one, and then the other equipment one by one.
For this reason, it is advisable to wait at least (let's say) 10 seconds from switching off to switching on again the PSU, so that this component has enough time to cool down.
I don't think that 10 seconds is enough for the NTC to cool down. It depends on the position of the PSU: If it's at the top, a significant part of the heat from the PC will get there, therefore 10 seconds is way too short a time for the whole PC to cool down (10-20 minutes are more likely adequate). If the PSU is at the bottom, less time could be sufficient, especially if the fan of the PSU keeps on spinning for a minute after the PC is turned off.
But if the primary capacitors are in good condition, it is unnecessary to wait to reduce the inrush current.

Profile ServicEnginIC
Message 53637 - Posted: 10 Feb 2020 | 22:19:07 UTC
Last modified: 10 Feb 2020 | 22:55:13 UTC

An interesting article to learn more about thermistors.
There is a specific section about Inrush Current Limiting Thermistor
Moreover, on most motherboards, the temperatures indicated by the hardware monitor are based on thermistor sensors.

Profile Retvari Zoltan
Message 53639 - Posted: 11 Feb 2020 | 17:23:49 UTC - in response to Message 53637.

An interesting article to learn more about thermistors.
There is a specific section about Inrush Current Limiting Thermistor
Moreover, on most motherboards, the temperatures indicated by the hardware monitor are based on thermistor sensors.
Nice reading. I didn't know the operating temperature of these thermistors. If they operate at 80-90°C, then 10 seconds is probably enough for them to cool down to 50-60°C, so they will limit the inrush current (not that much if they start from room temperature though).

Profile ServicEnginIC
Message 53640 - Posted: 11 Feb 2020 | 20:55:43 UTC - in response to Message 53639.
Last modified: 11 Feb 2020 | 21:05:39 UTC

If they operate at 80-90°C...

They do.

...then 10 seconds is probably enough for them to cool down to 50-60°C, so they will limit the inrush current (not that much if they start from room temperature though).

We agree. Twenty seconds is better than 10, but I wanted to state a realistic interval for us commonly eager crunchers...

Profile ServicEnginIC
Message 53649 - Posted: 15 Feb 2020 | 1:58:44 UTC

Reading this post, I thought... This can be a challenge for a hardware enthusiast!
When the server has plenty of WUs, everything is wonderful for our hosts with minimal intervention.
But in periods of scarcity, I've spent many moments clicking "Update" in BOINC Manager to request work...
So I asked myself: Is there a way to better employ this time in other tasks?
I recalled some basic engineering concepts, and I spent a fun weekend developing it.
Now, after testing that it works, I'm pleased to share it with those of you interested in trying it.

I had an old souvenir mouse stored in a drawer. I dismounted it and took it as a starting point.
Then, I designed the electrical circuit, I gathered the necessary material, and I got to work.
This is the resulting circuit as seen from top.
And this is as seen from bottom.
It is made with a technique I used to employ when I was a student, soldering point to point by means of wire-wrapping cable.
Now it's time to integrate it into the mouse circuitry and assemble.
And here is the final result.

Does it work as intended?
Yes, it does!
This combination sends an automatic request for tasks about every two minutes.
I've tested it on my fastest crunching-only host, so that downloaded WUs (if any) are returned within the day.

Finally:
As soon as I showed it to my son, he asked me: You know that there are software applications that do the same, don't you?
Yes, I do, but... This is the hardware enthusiast's corner!
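
For completeness, the software route could look roughly like the sketch below, which sends the same periodic "Update" through `boinccmd`, the command line tool that ships with BOINC. It assumes `boinccmd` is on the PATH and allowed to talk to the local client; the function names, the two minute interval and the project URL in the usage note are illustrative, and the URL must match the one your client is attached to.

```python
import subprocess
import time


def update_command(project_url, boinccmd="boinccmd"):
    """Command line asking the local BOINC client to contact one project."""
    return [boinccmd, "--project", project_url, "update"]


def request_work(project_url, cycles, interval_s=120,
                 run=subprocess.run, sleep=time.sleep):
    """Send `cycles` update requests, waiting `interval_s` seconds between them."""
    for i in range(cycles):
        run(update_command(project_url), check=False)
        if i < cycles - 1:
            sleep(interval_s)
```

For example, `request_work("https://www.gpugrid.net/", cycles=30)` would ask for work about every two minutes for an hour. Much less fun than a soldering iron, of course.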

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 53650 - Posted: 15 Feb 2020 | 18:03:17 UTC - in response to Message 53649.

Thanks for an enjoyable project tour.

There is much to be said and gained from cobbling a solution together, inelegant though it might be of bits and pieces scrabbled from the bit bucket of cast off parts. At least exercise of the old grey matter was performed.

Profile ServicEnginIC
Message 53757 - Posted: 24 Feb 2020 | 15:10:34 UTC

The problem:
One 24/7 processing computer was intermittently losing its network connection, thus not being able to report processed tasks or ask for new work.
The cause:
Its wireless network card was not fully inserted into the PCIE x1 socket, resulting in an intermittent bad electrical contact.
The solution:
After checking, it was a mere mechanical problem.
It was corrected by dismounting the card from its mounting frame, and bending the fixing tabs in the proper (CW) direction.
As a result, the whole card was tilted in the direction that fully inserts it into the PCIE x1 socket.

Profile ServicEnginIC
Message 53771 - Posted: 25 Feb 2020 | 20:51:44 UTC

In line with my last post:

A graphics card without extra power connector(s) receives all its power from the PCIE socket.
For example, this GTX 1650 has a rated TDP of 75 Watts, and it has no power connectors.
If all of it came from the +12 Volts supply, this would require a current of 6.25 Amperes. (12V x 6.25A = 75W)
For this reason, the best possible electrical contact with the PCIE socket is particularly important for this kind of card.
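
Why contact quality matters at this current can be sketched with two formulas, I = P / V and P = I²·R. The contact resistance values below are hypothetical totals chosen only for illustration; only the 75 W and 12 V figures come from the text above.

```python
def supply_current(power_w, voltage_v):
    """Current needed to deliver `power_w` at `voltage_v` (I = P / V)."""
    return power_w / voltage_v


def contact_loss(current_a, contact_resistance_ohm):
    """Heat dissipated in a resistive contact (P = I^2 * R)."""
    return current_a ** 2 * contact_resistance_ohm


current = supply_current(75.0, 12.0)   # figures from the post: 6.25 A
# Hypothetical total contact resistances: well seated vs. poorly seated card.
for r_mohm in (5, 50):
    r_ohm = r_mohm / 1000.0
    print(f"{r_mohm} mOhm: {contact_loss(current, r_ohm):.2f} W lost in the contact, "
          f"{current * r_ohm:.3f} V drop")
```

Because the loss grows with the square of the current, a contact that would go unnoticed on a low power card can become a source of heat and voltage drop on a card drawing everything from the slot.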
Usually there is enough mechanical play in a graphics card's mounting frame to physically reseat its PCB so that it is inserted deeper into the PCIE socket.
In my experience, this mechanical play can vary from about 0.5 to 1.5 millimeters (0.02 to 0.06 inches).
It is usually very easy, and it takes only a few minutes to reseat PCB this way.
Taking the same above mentioned graphics card as an example:
This is what its mounting frame looks like.
I'll loosen all frame's fixing nuts/screws.
Starting with the two hexagonal female-threaded nuts, marked as 1 and 2 in previous image, then finishing with all screws, here marked as 3 and 4.
Depending on the kind of card, there may be a lower or greater number of fixings, but usually they are easy to locate.
Once all fixings are loose, the mounting frame will show its mechanical play.
Holding the PCB at its deepest position relative to the mounting frame, all fixings are now retightened, starting again with the threaded nuts (here 1 - 2) and finishing with all screws (here 3 - 4).
The final result: the graphics card PCB has come down nearly 1 millimeter.
It can be seen when looking at the Before and After images.

In a computer whose graphics card is intermittently not recognized, this is a possible cause worth ruling out.

Profile ServicEnginIC
Message 53803 - Posted: 1 Mar 2020 | 0:00:41 UTC

If you are keen on both bricolage and computing, how about mixing them?
Here is a good example of this.

The only fan on this GTX 1650 graphics card started to fail, and I took it out of service for the moment to keep it from being damaged by overheating.
I asked myself: Should I claim the warranty, and wait perhaps a couple of weeks for the new card to arrive... and lose the fun of solving it by myself?
I hesitated for about 10 seconds. This is self answered in this post.

I looked for something to help among my retired cards, and I found this Gigabyte GT640 GV-N640OC-2GI that I probably would not use any more.
I like Gigabyte cards because of their usually good design, build quality, and well dimensioned heatsinks and fans.

Comparing the heatsink mounting spacings on both cards, I found them to be nearly identical. And the Gigabyte heatsink's surface and fan were bigger than the original PNY ones (Ok!).
But comparing the component layout below the heatsinks, some problems arose.
The Gigabyte heatsink was hitting several components on the PNY card: one quartz crystal (Y1), one solid capacitor (C204), and one ferrite core choke (L15)
Here is where the bricolage part comes into play...
- Marquetry saw for metal cutting, to remove some problematic fins.
- Minidrill with ceramic milling bit, to make space in the aluminum where needed.
And the mechanical problems are solved.

Now it's time for applying thermal paste (http://www.servicenginic.com/Boinc/GPUGrid/Forum/HE/GpuCoolerReplacement/06_Thermal paste.JPG) and heatsink assembly.

One more adaptation was needed, because the fan connectors were not compatible. But with a bit of soldering and heat shrink sleeve, that is solved too.

After this, we can compare between Before and After.

Now this peculiar hybrid PNY-Gigabyte graphics card is working again!

A final question for users that may have experienced a similar situation: Is the fan usually covered by the card's warranty?
If so, is the whole card replaced by the distributor, or only the fan?
Your experiences in this respect would be much appreciated.

Profile Retvari Zoltan
Message 53806 - Posted: 1 Mar 2020 | 9:32:19 UTC - in response to Message 53803.

Does this card run at a lower temperature than before?

One more adaptation was needed, because the fan connectors were not compatible.
The original card doesn't have a 3rd pin (tachometer), so the card can't sense if the fan is not rotating. This is not a good setup for crunching.

A final question for users that may have experienced a similar situation: Is fan usually covered by card's warranty?
These cards are made for light gaming, not hardcore (24/7) crunching, so crunching (mining) isn't covered by warranty. But GPUs don't have an operating hours counter, so if you don't explicitly state on the RMA form that you used it for crunching, they will replace it. But the replacement will be the same quality, so I usually replace the fans (or the complete heatsink assembly) with better ones.

If so, is the whole card replaced by distributor, or the fan only?
It depends, but usually the whole card is replaced, then the broken card is sent to the manufacturer for refurbishing (replacing the fan in this case).

Profile ServicEnginIC
Message 53807 - Posted: 1 Mar 2020 | 11:35:32 UTC - in response to Message 53806.

Does this card run at a lower temperature than before?

Yes and no. Peak temperatures are about two degrees lower now, as new heatsink and fan are bigger than originals.
Explanation continues below.

The original card doesn't have a 3rd pin (tachometer), so the card can't sense if the fan is not rotating.

Right. This is by this card's design.
However, Fan % is temperature controlled.
And also by design, at full load the card seems to "feel comfortable" at 78ºC. If the temperature tends to drop below this, Fan % is also lowered and the temperature settles at 78ºC again. But now Fan % at stability is about 10 % lower than with the original heatsink/fan (60 % instead of the previously observed 70%).

...they will replace it. But the replacement will be the same quality, so I usually replace the fans (or the complete heatsink assembly) for a better one.

I thought the same when evaluating solution.

This card is not installed in an easy environment: it is directly above a GTX 1660 Ti, in this dual graphics card computer.

Profile ServicEnginIC
Message 53954 - Posted: 20 Mar 2020 | 14:49:23 UTC

Because of the current restrictions in many countries due to the COVID-19 impact:
It becomes important to solve our hardware problems by ourselves.
Please, feel free to share here your problems in a Symptom - Cause - Solution scheme, or your favorite self-learnt tricks.
It may be of great help to other colleagues.
Thank you in advance!

Profile ServicEnginIC
Message 53961 - Posted: 21 Mar 2020 | 17:20:02 UTC
Last modified: 21 Mar 2020 | 17:21:39 UTC

- Symptom: A computer controlling an important process suddenly switched off by itself. Repeated attempts to switch it on again resulted in it switching off after a few seconds.

- Cause: Two of the processor heatsink's fixings had broken, causing it to tilt and lose tight contact with the processor surface. As a self-protective measure, the system switches off to prevent processor damage due to overheating.

- Solution:

* Plan A:
The first attempt consisted of repairing the broken fixings with fast-curing cyanoacrylate glue.
After two hours of curing, it was time to renew the processor's thermal paste and reassemble the heatsink.
Result: After about three minutes of waiting, the fixing springs overcame the glued parts and they broke again.

* Plan B:
Studying the heatsink mounting hardware carefully, I found a pass-through hole at every corner, very suitably placed to solve the problem by means of strategically arranged cable ties.
Result: The cable ties are strong enough to maintain the necessary tension. Problem solved, and everything is working again!

Particular conditions for this case:
This case comes from a real intervention on the PC controlling a laboratory diagnostic instrument for celiac and autoimmune diseases.
I had to carry out this intervention dressed in all the necessary PPE (Personal Protective Equipment), and thus was not fully free to come and go for spare parts.
Solving the problem meant that diagnostic results for many patients, otherwise lost, were successfully retrieved.
I took it as a NOW or NEVER situation, and happily it was NOW.

Finally: Let this be my modest tribute to all those worldwide medical staff and field service colleagues, currently working in hard conditions due to Coronavirus crisis.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54175 - Posted: 31 Mar 2020 | 17:37:01 UTC

Finally, my adventure with Conductonaut thermal compound ended in an unexpected way.

For background, please refer to my previous post dated January 26th 2020.
On March 29th, during a regular round of temperature checks, I found that the GTX 1660 Ti card in question was at 83ºC. (!)
Yes, it was running an ACEMD3 WU, but when I first tested Conductonaut this temperature was 60ºC...
I removed the GPU's heatsink and found that the originally liquid-metal Conductonaut had turned into a soft-solid metal state.
In this new state, I observed some cracks and irregularities, which explain the poor thermal coupling and the resulting abnormal temperature rise.
It was hard to remove the altered compound, first using a plastic spatula and then a fine polishing cotton.
I can recommend this kind of silver cleaner, made of cotton impregnated with a fine polishing compound.
In the end, the heatsink's copper surface recovered its original appearance.

I decided to replace Conductonaut with my regular non-conductive thermal paste, Arctic MX-2.
The manufacturer promises eight years of durability for it.
From my own experience, it lasts at least 4 years; I can't vouch for more, because I usually prefer to replace it preventively after about that period.
It is easy to apply, due to its self-spreading ability.

After this, the GTX 1660 Ti returned to work, with its temperature reduced from the previous 83ºC to 77ºC.

In a rig working 24/7, it is advisable to check temperatures regularly, to prevent overheating of the different components.
This will surely increase the life expectancy of the whole system.
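
A regular temperature round like the one described above could be partly automated. The following is only a minimal sketch, assuming an NVIDIA card with the `nvidia-smi` tool installed; the function names and the 80ºC alarm limit are illustrative choices, not something taken from this thread.

```python
import subprocess


def parse_gpu_temps(csv_text):
    """Parse 'index, temperature' lines as printed by nvidia-smi in CSV mode."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def hot_gpus(temps, limit_c=80):
    """Return the GPU indexes whose temperature exceeds limit_c (assumed limit)."""
    return sorted(i for i, t in temps.items() if t > limit_c)


def read_gpu_temps():
    """Query the NVIDIA driver; requires nvidia-smi on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_temps(out)


# Made-up sample text (not real tool output) using the two readings
# mentioned in this post: 83 C and 60 C.
sample = "0, 83\n1, 60\n"
print(hot_gpus(parse_gpu_temps(sample)))
```

On a real host, `hot_gpus(read_gpu_temps())` could be run from a scheduled job, so that a drift like the one from 60ºC to 83ºC is noticed early.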

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 54176 - Posted: 31 Mar 2020 | 17:50:32 UTC

Thermalright TF8 Thermal Compound Paste is the best I've used. It has the highest thermal conductivity at 13.8 W/mK. The best thing about it is that when you remove the CPU cooler after months of use it's still gooey and hasn't solidified like most others. It's the most expensive, until competition comes along. One wants the thinnest continuous layer you can get so use as little as possible and use the spatula to spread it out. I expect it can last for years.

https://www.amazon.com/gp/product/B07K442WXV/ref=ppx_yo_dt_b_asin_title_o08_s00?ie=UTF8&psc=1

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 54184 - Posted: 1 Apr 2020 | 0:48:13 UTC - in response to Message 54175.

Finally, my adventure with Conductonaut thermal compound ended in an unexpected way.

For background, please, refer to my previous post dated on January 26th 2020.
On past March 29th, while a regular round of temperature checks, I found that the concerned GTX 1660 Ti card's temperature was 83ºC. (!)
Yes, it was running an ACEMD3 WU, but when I first tested Conductonaut this temperature was 60ºC...
I dismounted the GPU's heatsink and found that the original liquid-metal Conductonaut's state was converted in a soft-solid metal state.
This is very strange. I haven't experienced such a change in the liquidity of the Conductonaut, or in the temperatures of the CPUs / GPUs on which I've changed the thermal grease.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54204 - Posted: 2 Apr 2020 | 14:30:55 UTC - in response to Message 54184.

This is very strange. I didn't experienced such change in the liquidity of the Conductonaut...

I guess that the tested heatsink's core is not made of pure copper, but of some kind of alloy that is not compatible with Conductonaut.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54315 - Posted: 12 Apr 2020 | 14:45:56 UTC

As a result of the current COVID-19 regulations in Spain, which require home confinement, a challenge arose:
Will I be able to build a new crunching rig from my stored spare/scrapped pieces?

I started by rescuing an ancient Pentium 4 system "stored" on top of a wardrobe.
I removed the motherboard, PSU and peripherals, and kept that old minitower ATX chassis as a starting point.

PSU: The old PSU did not have the proper connectors for current mainboards.
I rescued two PSUs from my scrap drawer, one with failed electronics and the other with a failed fan...
I replaced the defective fan with the working one, and the PSU problem was solved.

Motherboard, CPU, RAM: In my spares drawer I had the ones left over from my last hardware upgrade.
There was a new problem: the available chassis is an old model, with the PSU hanging directly above the CPU location. But I found an original Intel low-profile CPU heatsink, and that problem was solved too.

From spares, I rescued my last remaining 120 GB SSD and a GIGABYTE GTX750 factory overclocked graphics card...
With all these and a bit of (free ;-) self-workmanship, the new rig is a fact without leaving home: Test passed!

New system Host ID: 540272

New system look:

One more detail: Due to the low power consumption (38 W TDP) of the graphics card and the reduced CPU heatsink, this is the only one of my rigs with the CPU running hotter (59 ºC) than the GPU (53 ºC) at full load.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54367 - Posted: 18 Apr 2020 | 18:47:40 UTC

Let's call "severe" a problem that prevents a computer from starting to work.
Let's call "ridiculous" a trivial circumstance that causes a severe problem.

This is one of the most severe-ridiculous problems I've ever found, and more than once.
It happened today in one of my rigs.
I'm documenting it this afternoon, and I'll publish the solution tomorrow afternoon.

- Symptom: On starting the system, it runs for some seconds, then it stops, and nothing happens on the following attempts to restart.
I opened this system, made a quick check of the contacts, started it again, and this time the start attempt succeeded (fans turning, beep heard...) for a few seconds only.

- Cause: I started to think: PSU failure, CPU heatsink disengaged... and, what if it was...? And that was it!

- Solution: ???

You have 24 hours to guess your favorite cause-solution.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 54369 - Posted: 18 Apr 2020 | 19:16:08 UTC - in response to Message 54367.

A stuck power button can cause this: first it turns on the system, but if it stays in the "pressed" state it will turn off the system after 4-5 seconds (hard power off).

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 54370 - Posted: 18 Apr 2020 | 22:28:50 UTC

Bad PSU
Bad motherboard
Bad memory

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54373 - Posted: 19 Apr 2020 | 1:48:55 UTC - in response to Message 54370.
Last modified: 19 Apr 2020 | 2:15:43 UTC

Bad PSU
Bad motherboard
Bad memory


From my experience, the PSU is most likely to be problematic. Just sayin'.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54379 - Posted: 19 Apr 2020 | 18:48:43 UTC - in response to Message 54367.

- Symptom: Starting the system, it runs for some seconds, then it stops and nothing happens on following attempts to restart.
I opened it, I made a quick contacts check, started again, and this time the start attempt succeeded (Fans turning, beep heard...) for a few seconds only.

- Cause: The Power On button got temporarily stuck, causing the PSU's hard stop feature to cut the supply after a few seconds.
While tilting and handling the system to check the contacts, the Power On button came free, and then it got stuck again the next time it was pressed.

- Solution: It is usually possible to access the Power On button switch, most of the time by removing the chassis front panel.
Here is an image of the affected switch at its mounting position, and once it is dismounted.
Nowadays, it is a normally-open push-button. A click must be heard when pushing it, and another click when releasing it.
The problem was solved by applying a few drops of ethanol and pushing the button repeatedly until it came free and moved freely.
Pretty trivial and ridiculous, but I'm sure that maaany computers have gone to the workshop for a problem like this...
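
The cause described above can be sketched as a tiny model. This is only an illustration under stated assumptions: the function name is hypothetical, and the 4-second override time is an assumed value within the "4-5 seconds" range mentioned in this thread, not a figure for any specific board.

```python
def press_events(hold_seconds, override_seconds=4.0):
    """List the (time, event) pairs caused by one press on a powered-off system.

    The press itself starts the system. If the button stays closed for
    override_seconds or longer, a forced (hard) power-off follows.
    """
    events = [(0.0, "power on")]
    if hold_seconds >= override_seconds:
        events.append((override_seconds, "forced power off"))
    return events


print(press_events(0.2))    # healthy button, released almost immediately
print(press_events(60.0))   # button that stays stuck after being pressed
```

In this model a stuck button reproduces the symptom: the system starts, runs for a few seconds, and is then forced off. While the contact stays closed, pressing again produces no new press event, which would match the "nothing happens" on the following attempts.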

On Apr 18th 2020 | 19:16:08 UTC Retvari Zoltan wrote:
A stuck power button can cause this: first it turns on the system, but if it stays in the "pressed" state it will turn off the system after 4-5 seconds (hard power off).

Congratulations!
You have won an image of my special Gold - Medal to Outstanding Analyst.
(Well... Excuse me, it is not exactly gold, it is really high quality bronze ;-)

And my special thanks to Ian&Steve C. and Pop Piasa for participating.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 54388 - Posted: 20 Apr 2020 | 19:48:27 UTC

finally I was able to finish up my newest GPUGRID system. It's one of my old SETI systems, but I needed to convert it from USB risers to ribbon risers (and motherboard swap) for the increased PCIe bandwidth requirements here.

CPU: Intel Xeon E5-2630Lv2 (6c/12t,2.6GHz)
MB: ASUS P9X79 E-WS
RAM: 32GB (4x8) DDR3L-1600MHz ECC UDIMM
GPUs: [7] EVGA RTX 2070
PSUs: 1200w PCP&C + 1200W HP server PSU






went with a 2U supermicro active CPU cooler so I had enough room for the ribbon risers on the 2 GPUs above it. replaced the 60mm fan on it with a Noctua one since even at 20% speed the stock fan was very noisy. the Noctua fan doesnt cool as well as the stock server fan that came with it, but it's enough for this 60W chip (temps in the 50's @65% load) and it's a lot quieter.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54391 - Posted: 20 Apr 2020 | 22:16:42 UTC - in response to Message 54388.
Last modified: 20 Apr 2020 | 22:30:04 UTC

🙌

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54392 - Posted: 20 Apr 2020 | 22:20:21 UTC - in response to Message 54388.

I'm really impressed looking at your systems.
Thank you very much for your Masterclass.
That's what I would describe as high-level computer hardware engineering.
And your newborn system is returning processed tasks like a charm...🙌
Congratulations!

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 54393 - Posted: 20 Apr 2020 | 23:32:26 UTC - in response to Message 54388.

finally I was able to finish up my newest GPUGRID system. It's one of my old SETI systems, but I needed to convert it from USB risers to ribbon risers (and motherboard swap) for the increased PCIe bandwidth requirements here.

CPU: Intel Xeon E5-2630Lv2 (6c/12t,2.6GHz)
MB: ASUS P9X79 E-WS
RAM: 32GB (4x8) DDR3L-1600MHz ECC UDIMM
GPUs: [7] EVGA RTX 2070
PSUs: 1200w PCP&C + 1200W HP server PSU






went with a 2U supermicro active CPU cooler so I had enough room for the ribbon risers on the 2 GPUs above it. replaced the 60mm fan on it with a Noctua one since even at 20% speed the stock fan was very noisy. the Noctua fan doesnt cool as well as the stock server fan that came with it, but it's enough for this 60W chip (temps in the 50's @65% load) and it's a lot quieter.
Nice!

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 54395 - Posted: 21 Apr 2020 | 5:03:11 UTC

Impressive. Thanks for the photos and description.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54406 - Posted: 21 Apr 2020 | 19:06:23 UTC - in response to Message 54393.
Last modified: 21 Apr 2020 | 19:11:06 UTC

Great Googly-Moogly! You rule, Retvari!
As Ray Wylie Hubbard says: "Some things here under Heaven are just cooler 'n Hell".

https://www.youtube.com/watch?v=o6C579hWdsI

Maybe name your creation something like Chico Grosso?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 54407 - Posted: 21 Apr 2020 | 20:07:10 UTC - in response to Message 54406.

he was quoting my post but fixed the hyperlinks for the images. I forgot that this site breaks urls that already include http in the link in BBcode.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54408 - Posted: 21 Apr 2020 | 20:07:58 UTC

On April 20th 2020 | 19:48:27 UTC Ian&Steve C. wrote:

finally I was able to finish up my newest GPUGRID system...

On April 20th 2020 | 23:32:26 UTC, Retvari Zoltan kindly "revealed" the images for this system, which could not be seen in the original post. (Thank you!)

I can't let this pass without two comments:

- I can't imagine a cleaner way to build a system like this. It's not only a "processing bomb", it is also elegantly built.

- In 24 hours of processing, from its first valid result on April 20th 2020 at 19:11 UTC to the same hour today, it returned 270 valid WUs and 0 (zero) errored WUs: 100% success.
Well done! It has passed its first working day with the maximum score.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54423 - Posted: 22 Apr 2020 | 16:33:31 UTC - in response to Message 54407.

😳 Oops... sorry guys. I was so busy drooling over the rig that I forgot to read the header.
Anyway...
That has to be the best design yet for a crunching machine. You've changed my thinking about what my next opus should look like. Thank you for sharing your expertise with us.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 54424 - Posted: 22 Apr 2020 | 17:42:57 UTC - in response to Message 54423.
Last modified: 22 Apr 2020 | 17:53:57 UTC

😳 Oops... sorry guys. I was so busy drooling over the rig that I forgot to read the header.
Anyway...
That has to be the best design yet for a crunching machine. You've changed my thinking about what my next opus should look like. Thank you for sharing your expertise with us.

I took cues from my experience with cryptocurrency mining. this is a pretty common type of mining setup, and the frame was cheap ($35 on Amazon), though most people doing that will use USB risers instead of these ribbon risers, both for cost and power delivery reasons. I actually had most of this hardware already, I converted it from a USB riser setup to a ribbon riser setup for the transition to GPUGRID.

Some things to keep in mind if you want to do something like this:

    1. Be mindful of the specs of your system, in particular the link width and generation of your PCIe slots. Some slots might be x16 size and fit a GPU, but only have electrical connections for a x8 or x4 link. some older boards might have a mix of PCIe 3.0 slots and PCIe 2.0 (half speed) slots. Some motherboards may disable certain slots when others are in use. Pay attention to where the lanes are coming from and where the bottlenecks are. One common thing I see people overlook is the lanes coming from the chipset. It may be able to supply many lanes, but the chipset itself then only has a PCIe 3.0 x4 link back to the CPU. Read your motherboard manual thoroughly and look up the specs of your components to understand how resources are allocated for your board.

    2. GPUGRID requires a lot of PCIe bandwidth, and that likely scales with GPU speed. I've measured up to 50% PCIe utilization on a PCIe 3.0 x8 link, or up to 25% on a PCIe 3.0 x16 link, with my RTX 2070 and 2080 cards. If you have a fast GPU, I would not put it on anything slower than PCIe 3.0 x4 (not common anyway) or PCIe 2.0 x8. slower GPUs might get by on slower links.

    3. Be mindful of how much power you are pulling from the motherboard. When using USB risers you do not have to worry about this since power is supplied from external connections. But a setup like mine is pulling some of the GPU power from the motherboard slots. My motherboard has a 6-pin VGA power connection to supply extra power to the motherboard PCIe slots. PCIe spec for a x16 slot is up to 75W each! but most GPUs won't pull that much (except 75W GPUs that do not have external power!). If you plug GPUs directly to the motherboard, or use ribbon risers like I have, I wouldn't recommend using more than 3, maybe 4 GPUs (pushing it) unless you are supplying extra power to the board somehow.

    4. if on a PCIe 3.0 link, you'll want to get higher quality shielded risers. PCIe 3.0 is a lot more susceptible to interference and crosstalk in the data lines than PCIe 2.0 or 1.0. The shielded risers are a lot more expensive though. I bought what I consider to be "good enough" knockoffs and they work perfectly fine, but were still $25 each, and that's kind of the low end of the pricing for 20cm long risers. out of 14 of these brand risers that I've purchased, 2 were defective (bad PCIe signal quality causing low GPU utilization and GPU dropouts) and needed to be replaced, so test them!
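
The link speeds named in point 2 above can be compared with a short calculation. This is a sketch, not a measurement: it uses the nominal per-lane transfer rates and line encodings of PCIe 1.0 to 3.0 and ignores protocol overhead, and the function name is made up for illustration.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


for generation, lanes in [(3, 16), (3, 8), (3, 4), (2, 8)]:
    print(f"PCIe {generation}.0 x{lanes}: "
          f"{link_bandwidth_gb_s(generation, lanes):.2f} GB/s")

# Half of a PCIe 3.0 x8 link, the utilization figure quoted in point 2.
print(f"50% of PCIe 3.0 x8: {0.5 * link_bandwidth_gb_s(3, 8):.2f} GB/s")
```

If the utilization percentage translates roughly linearly into throughput (an assumption), 50% of a PCIe 3.0 x8 link is about 3.9 GB/s, which is close to the full theoretical capacity of a PCIe 3.0 x4 link (about 3.9 GB/s) or a PCIe 2.0 x8 link (4.0 GB/s). That is consistent with those two being named as the lower bound for a fast GPU.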




I took all of these things into account to end up with what you see here :)

I have power limited all GPUs on this system to 165W (175W stock), and at full tilt the system pulls 1360W from the wall.
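
As a rough cross-check of the figures above, the power budget can be worked out in a few lines. This is only a sketch: it assumes every GPU draws exactly its configured limit, and the function and key names are hypothetical.

```python
def gpu_power_budget(gpu_count, limit_w, stock_w, wall_w):
    """Split a measured wall draw into GPU share and everything else."""
    capped = gpu_count * limit_w
    stock = gpu_count * stock_w
    return {
        "gpu_total_capped_w": capped,
        "gpu_total_stock_w": stock,
        "saved_by_limit_w": stock - capped,
        # CPU, board, fans and PSU conversion losses, if GPUs sit at the limit
        "rest_of_system_w": wall_w - capped,
    }


# 7 GPUs limited to 165 W (175 W stock), 1360 W measured at the wall.
print(gpu_power_budget(7, 165, 175, 1360))
```

With these numbers the GPUs account for at most 1155 W, the limit saves up to 70 W compared with stock settings, and roughly 205 W is left for the rest of the system including PSU losses.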

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54426 - Posted: 23 Apr 2020 | 0:02:38 UTC - in response to Message 54424.

Wow thanks a million! Info I will definitely use.

_heinz
Joined: 20 Sep 13
Posts: 16
Credit: 3,433,447
RAC: 0
Level
Ala
Message 54437 - Posted: 25 Apr 2020 | 14:02:35 UTC
Last modified: 25 Apr 2020 | 14:28:47 UTC

I'm running V8-XEON, built by myself in May 2013.
Board: Intel Skulltrail D5400XS (2 PCI, 4 PCI-E x16, 4 FB-DIMM, Audio, Gigabit LAN)
RAM: 16GB FB-DIMM, quad channel, Kingston
Processors: 2 x E5405 Xeon, LGA771
Graphics: 3x EVGA GeForce GTX Titan (before: 2 GTX 470 + 1 GTX 570)
PSU: LEPA 1600W continuous power, Gold certified

When idle, V8-Xeon pulls 285 W from the wall.
When crunching on all GPUs, it pulls 860 - 890 W.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I wrecked 2 Super Flower 1000W PSUs after one and a half years.
The machine is absolutely stable. The Xeons have run for years at 100% CPU usage.
Old but fine..

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54445 - Posted: 25 Apr 2020 | 22:45:59 UTC - in response to Message 54437.

Im Running V8-XEON built in May 2013 by myself.

Thank you very much for sharing your setup.
Coincidentally, my oldest self-made system currently in production is this one, built on March 12th 2013, and since then it has gone through successive upgrades.
It has caught my attention that your system was built the same year, and for that time it was quite an advanced configuration, based on dual Xeon E5405 processors.
One particular trick: When I'm interested in the specifications of any Intel processor, I enter "ark E5405" (for example) in Google web search, and the first match leads to something like this...
Best regards,

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 54506 - Posted: 1 May 2020 | 1:14:34 UTC

One of my best headless computers was giving me a lot of trouble. Intermittently it would just stop crunching but still be powered up. Sometimes rebooting would get it going again, for a while anyway. So I put it on my desk with a monitor and started swapping power cables and RAM. Then it failed while I was watching and the GPU lights came back on even though I had --assign GPULogoBrightness=0 active in the NVIDIA X Server Settings startup program. At the same time the fans went to max even though it wasn't hot. So I pulled the GPU card to try another and the locking clip on the back of the PCIe socket popped off. I got a flashlight and was trying to figure out how to reinstall the clip when I noticed dirt inside the slot on the contacts. EVGA cards are notorious for having this clear fluid ooze out and sometimes drip down on the motherboard. I assume it's a thermal compound but don't know. It seems to be nonconductive and I've had it on the card contacts before without stopping it from working. I always wipe the card clean when I take them out. But this time I looked in the female PCIe slot with my magnifying glass and saw the contacts were coated with a dusty grime that was mixed in this mystery fluid. I took a toothbrush and cleaned the slot out and then blew it out. Put the same card back and so far so good. One more thing to add to the troubleshooting list.

Another computer would randomly turn off. Sometimes rebooting got it going for a while. When taking it apart the 8-pin CPU power connector to the motherboard had one corner pin disintegrate when I pulled the plug out. It had been running trouble free for years but the plastic of the connector got brittle. Cleaned the connector on the MB out with an X-ACTO knife, turned it upside down and blew it out. Installed a new CPU power cable and it's good as new again.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 54509 - Posted: 1 May 2020 | 1:58:49 UTC - in response to Message 54506.

The "mystery" fluid that oozes out of graphics cards is the silicone oil separating from the thermal pads on the VRM and memory chips, or from the thermal paste.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 54513 - Posted: 1 May 2020 | 10:53:07 UTC - in response to Message 54506.

I noticed dirt inside the slot on the contacts. EVGA cards are notorious for having this clear fluid ooze out and sometimes drip down on the motherboard. ...
It seems to be nonconductive and I've had it on the card contacts before without stopping it from working. I always wipe the card clean when I take them out. But this time I looked in the female PCIe slot with my magnifying glass and saw the contacts were coated with a dusty grime that was mixed in this mystery fluid.
1. Cards made by any manufacturer leak this silicone oil if they are used long and hot enough. The oil's viscosity is much lower at higher temperatures, so the thermal pads / grease leak a noticeable quantity of it over time.
2. Conductivity is a tricky property. It varies greatly depending on the frequency of the electromagnetic wave. Think of vacuum, which is the best insulator: light and radio waves can still travel through it, as their frequency is high enough. State-of-the-art computers operate in the microwave frequency (gigahertz) range, so the grime, which is non-conductive at DC, acts as the dielectric of a capacitor, which "turns" into a conductor at high frequencies. As grime builds up over time, its capacitance increases, thus its conductivity at high frequencies increases, and when that is enough to push the PCIe bus out of specification, the GPU won't work anymore (or it will run at PCIe 2.0 instead of 3.0).
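
The frequency dependence described in point 2 can be illustrated with the textbook formula for the reactance of an ideal capacitor, |X_C| = 1 / (2*pi*f*C). This is a sketch under loud assumptions: the 1 pF stray capacitance is a purely illustrative value, not a measurement of grime in a PCIe slot, and 4 GHz is used as the fundamental frequency of an 8 GT/s signal.

```python
import math


def capacitive_reactance_ohm(frequency_hz, capacitance_f):
    """|X_C| = 1 / (2*pi*f*C) for an ideal capacitor; infinite at DC."""
    if frequency_hz == 0:
        return math.inf
    return 1.0 / (2 * math.pi * frequency_hz * capacitance_f)


STRAY_CAPACITANCE_F = 1e-12  # 1 pF, illustrative assumption only

for f_hz in (0, 1e6, 100e6, 4e9):
    x_c = capacitive_reactance_ohm(f_hz, STRAY_CAPACITANCE_F)
    print(f"{f_hz:>13,.0f} Hz -> {x_c:,.1f} ohm")
```

With these assumed values the same capacitance blocks DC completely, presents about 159 kilohms at 1 MHz, and only about 40 ohms at 4 GHz, which is the kind of effect the post attributes to a growing layer of grime.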

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54514 - Posted: 1 May 2020 | 11:17:22 UTC

On May 1st 2020 | 1:14:34 UTC Aurum wrote:

One more thing to add to the troubleshooting list.

Thanks to all of you for helping, with this topic, to complete this somehow never-ending list.

Moreover, "non-conductive" fluids are usually very "efficient" at retaining dust particles, which are sometimes conductive themselves, or become so when dampened by ambient humidity... And then (mysterious) problem(s) may arise.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 54566 - Posted: 3 May 2020 | 22:32:20 UTC
Last modified: 3 May 2020 | 23:02:59 UTC

1. Silicone oil, that makes perfect sense. I was tempted to turn the MB upside down and spray the slots with either isopropyl alcohol or brake cleaner (methanol, toluene, acetone & CO2). Thought better of it since the solvents might dissolve the phenolic or epoxy resin and my board would disintegrate in my hands :-)

The brake cleaner can has a warning: Electrical Shock Hazard: this spray will conduct electricity, keep away from all electrical sources. Imagine doing your brakes one evening with the drop lamp hanging in the wheel well and it's the last thing you ever see.

But seriously, my toothbrush was not a very good way to clean goo out of a little slot. Suggestions welcome.

2. I'm running full steam on Rosetta CPU WUs waiting for OpenPandemics to kick off. Rosetta needs about 1 GB RAM per WU. I've been frustrated with MBs that won't run multiple sticks of RAM. I've been trying to get 64 GB on my 40t & 44t CPUs but only one MB lets me run 4 x 16 GB even when they're the same. I've tried every combination I could. E.g. MSI X99A Gaming 9 ACK, MSI X99A Raider & MSI X99A SLI will only acknowledge 3 x 16 GB. But an MSI X99 SLI Plus will run 4 x 16 GB.
At first I thought it was a bad slot, but then I could move the third stick to the other array and it would work: DIMM slots 1, 5 & 3 or 1, 5 & 7.
I bought these cheap on fleaBay so maybe gamers overclocked and overtaxed them and I'm dealing with cripples.
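
The RAM figures in point 2 can be checked with a quick calculation. This is a sketch under assumptions: about 1 GB per WU as stated above, plus a hypothetical 4 GB reserved for the operating system and BOINC itself; the function name is made up for illustration.

```python
def max_concurrent_wus(installed_gb, gb_per_wu=1.0, reserved_gb=4.0):
    """Work units that fit in RAM after reserving some for the OS (assumed)."""
    return max(0, int((installed_gb - reserved_gb) // gb_per_wu))


for threads in (40, 44):
    for installed_gb in (48, 64):  # 3 x 16 GB versus 4 x 16 GB
        fits = max_concurrent_wus(installed_gb)
        print(f"{threads} threads, {installed_gb} GB: room for {fits} WUs, "
              f"headroom {fits - threads}")
```

Under these assumptions 3 x 16 GB leaves no headroom at all on a 44-thread CPU and only 4 WUs' worth on a 40-thread one, while 4 x 16 GB leaves a comfortable margin in both cases.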

3. The best MB I've got is the cheapest: Huananzhi X99-8M Gaming.

4. Some MBs just won't run the full range of CPUs their specs claim. E.g., my MSI X99A SLI Plus has chronic intermittent stoppages with a Xeon E5-2673 v4 SR2KE but runs flawlessly when replaced with an i7-5930k SR20R. Maybe it's too old and just can't lift the weight any more, like me :-)

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 54569 - Posted: 3 May 2020 | 23:59:37 UTC

I've always found that when not all the sticks of installed memory are read on Intel LGA socket motherboards, it is due to the CPU not being inserted correctly in the socket.

Or to LGA pins in the socket that are bent out of place or have overheated. The CPU needs to be correctly located in the socket, and the locking clamp and cooler also need to be installed to the correct torque specifications.

The pins in the LGA socket undergo both lateral and vertical displacement when a CPU is installed. The positional tolerance in the LGA socket is actually very tight. The pins and the pads on the CPU need to maintain a 40 micron absolute positional location to be within spec.

The notches in the CPU substrate that locate the CPU in the socket allow for a lot of slop. When I don't read all the channels in the installed memory, I always undo the socket clamp and wiggle the CPU in the socket to allow the LGA socket pins to orbit around and hopefully mate with the corresponding pads on the substrate. Then I reclamp and test whether all the memory is picked up again.

If you look at the more recent Intel cpu pin mappings, you will find the outside perimeter of pins often contain the memory channel assignments. And those pins undergo some of the greatest positional translation when under compression.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 54673 - Posted: 12 May 2020 | 17:16:50 UTC
Last modified: 12 May 2020 | 17:17:55 UTC

I'm seeing X99 motherboards (e.g. for the i7-6950X or Xeon E5-2673 v4) that use DDR3 RAM, or that have slots for both DDR3 & DDR4 to be used in an either/or way. My first reaction is what a nice way to get some more mileage from my old DDR3.

Is there a technical reason that combining DDR3 memory with the X99 generation of CPUs will slow them down or disable some of their functionality???

This company has both: http://www.huananzhi.com/html/1/184/185/index.html

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 54676 - Posted: 12 May 2020 | 20:34:08 UTC - in response to Message 54673.

Is there a technical reason that combining DDR3 memory with the X99 generation of CPUs will slow them down or disable some of their functionality???

I've found the motherboard you probably are referring to: http://www.huananzhi.com/html/1//184/185/362.html

My experience with combo mainboards:
I asked myself the same question several years ago... but back then I was trying to squeeze a bit more life out of some DDR2 memory modules.
I bought this MSI G41M-P33 COMBO.
It is still working in my system #540272.
But I've found several drawbacks that have made me decide not to repeat this approach.
Now this motherboard is running with 8 GB of DDR3 1333 MHz, for better performance than DDR2 800 MHz on CPU tasks.
But I've had to set the DDR3 1333 MHz to run at 1066 MHz for system stability reasons. (1333 MHz is specified as an overclock for this G41+ICH7 chipset.)
And recently I've found that the new Nvidia Turing-based graphics cards are not compatible with this motherboard. The system doesn't even start.
I'm running a GTX 950 on it for this reason...

On the other hand, your suggested motherboard has attractive specifications, and it has made me doubt my previous decision 🤔️

Some other opinions or experiences would be welcome...

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 54677 - Posted: 12 May 2020 | 21:21:45 UTC

you need to use specific CPUs that support both DDR3 and DDR4, and none of them have official intel ark pages. there appear to only be a handful of them:

E5-2678v3
E5-2696v3
E5-2629v3
E5-2649v3
E5-2669v3
E5-2672v3
E5-2673v3

if you use an "official" chip like a retail i7 chip or other retail Xeon chip, you will probably only be able to use the DDR4 slots, since those chips don't have DDR3 controllers. or maybe they won't work at all in this board, I'm not sure.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 54678 - Posted: 13 May 2020 | 2:17:40 UTC

Ok that explains the ad I saw with that list of CPUs but no explanation:
https://www.amazon.com/gp/product/B07XLH1WSF/ref=ox_sc_saved_title_3?smid=A2M1V9OGLU9XW&psc=1
This idea sounds too risky. I'm sticking with DDR4 MBs. Thx folks.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 55052 - Posted: 13 Jun 2020 | 22:18:36 UTC - in response to Message 54204.
Last modified: 13 Jun 2020 | 22:23:37 UTC

This is very strange. I didn't experience such a change in the liquidity of the Conductonaut...

I guess that the tested heatsink's core is not made of pure copper, but of some kind of alloy not compatible with Conductonaut.
You are probably right.
My Gigabyte AORUS GTX 1080 Ti showed the same symptoms (its GPU temperature rose to 90°C). First I cleaned its fins, but there was no change in GPU temperature, so I reduced its power target to 150W until I could remove the card again for disassembly. After I did, I noticed that the TGC had solidified and was completely gone from the silicon of the GPU chip. So I re-applied some TGC on both surfaces, and assembled the card. Now it's running fine again (71°C). I regularly check the temperatures of my GPUs, so I'm sure that this change in the physical state of the TGC was quite sudden.
However, I have an RTX 2080 Ti with a copper heatsink, and it's running fine. Other cards with nickel(?) heatsinks and TGC are running fine too. I keep an eye on them; if another one shows higher temperatures, I'll disassemble that card too.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55054 - Posted: 15 Jun 2020 | 20:22:15 UTC

I'm pleased to return to posting, after a forced silent period: my main computer crashed (the same computer I'm writing this from), and I had to recover it first.
It's a good opportunity to share some curiosities about it...
It is a permanently working/crunching computer with both Ubuntu Linux and Windows 10 installed. I usually stop it only for some preventive maintenance from time to time.

- Symptoms: On the first day, I found this computer frozen in the morning, not responding to keyboard or mouse. I restarted it, and it apparently returned to normal.
On the second day, I found it frozen again, but when I restarted it and Ubuntu was booting, a black screen with multiple hard disk access errors appeared.
I did a hard stop by pressing the power switch, then I checked all HDD SATA and power cables.
After rebooting, the Ubuntu HDD was not recognized, and the system tried to boot from the Windows 10 HDD, but it failed.
I checked the connections again, but on the subsequent reboot no SATA devices were recognized: neither the Ubuntu HDD, nor the Windows 10 HDD, nor the SATA optical drive (CD/DVD writer).

- Cause: Motherboard's SATA controller section failure.

- Solution: Motherboard replacement.

Starting from a veteran system with a motherboard for Socket 775 processors and DDR3 memory, replacing the motherboard with a current one also means replacing the CPU and RAM.

Old system.

New motherboard installed.

Installing CPU.

Applying self-spreading thermal paste.

CPU heatsink and DDR4 memory modules installed.

Next, I decided to replace the previous Pascal GTX 1050 Ti graphics card with a Turing GTX 1650 one. This card was not recognized on the old motherboard... Every cloud has a silver lining!
But I found a mechanical problem: some chassis slot separators protruded inwards, preventing the new card from seating properly.
I cut them with a sharp wire cutter, as seen in red (before cutting) and green (after cutting).
Once the graphics card was properly seated, a new problem arose: again, an inopportune chassis flap was preventing the HDMI connector from entering its socket.
Cutting some space in the flap and bending it back and forth with pliers did the trick.

This is the inside of the "new" system, and this is the outside. In some way, a retro look that I like.

These are the characteristics of the new system on its Ubuntu side:

And this is the same system on its Windows 10 side:
At this point, a... let's say... new problem:
When starting Windows 10, a message was shown indicating that too many changes had been detected in the computer's hardware. And thus, the old OEM Windows 10 license was no longer valid on this "new" system.
I had to buy a new license to reactivate my Windows 10 copy... And now everybody is happy: Microsoft and me ;-)

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55083 - Posted: 27 Jun 2020 | 22:21:50 UTC
Last modified: 27 Jun 2020 | 22:24:00 UTC

A very specific problem, with a simple preventive action:

- Specific conditions: 24/7 working computer, with Linux (Ubuntu) operating system, and communications based on a WiFi network interface.
- Symptom: From time to time, the WiFi connection is lost, preventing BOINC Manager from reporting completed tasks and asking for new ones.
- Specific cause: WiFi network interface entering power saving mode.
- Specific solution: Deactivate the WiFi network interface's power management, so that it is always active.

The way I achieve this:

-1) Enter a Terminal window
-2) Enter command

sudo iwconfig

-3) Something like this is obtained:

wlx031415926536 IEEE 802.11 ESSID:"WLAN_PI"
Mode:Managed Frequency:2.452 GHz Access Point: A0:B1:C2:D3:E4:F5
Bit Rate=65 Mb/s Tx-Power=20 dBm
Retry short long limit:2 RTS thr:off Fragment thr:off
Encryption key:off
Power Management:on
Link Quality=59/70 Signal level=-51 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:44 Missed beacon:0

-4) Look at the line "Power Management". If it shows off, the problem is probably due to other reasons. If it shows on:
-5) Type the command
sudo gedit /etc/rc.local
to edit the rc.local file.
-6) Add the line
iwconfig wlx031415926536 power off
immediately before the last line. Note that the string "wlx031415926536" must match the interface name at the start of the output obtained in step 3.
-7) The resulting file would look as follows:

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

iwconfig wlx031415926536 power off
exit 0

-8) Save changes to rc.local file and reboot.

After following these steps, entering the same command as in step 2 should return something like this:

wlx031415926536 IEEE 802.11 ESSID:"WLAN_PI"
Mode:Managed Frequency:2.452 GHz Access Point: A0:B1:C2:D3:E4:F5
Bit Rate=39 Mb/s Tx-Power=20 dBm
Retry short long limit:2 RTS thr:off Fragment thr:off
Encryption key:off
Power Management:off
Link Quality=59/70 Signal level=-51 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:6 Invalid misc:102 Missed beacon:0

Note that Power Management now is off.
And if so, WiFi network interface will remain always active.
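
As a quick way to check the result without reading the whole listing, the state can be pulled out of the iwconfig output with a small filter. This is only a sketch: the helper name pm_state is made up for illustration, and the interface name is the example one from step 3.

```shell
#!/bin/sh
# pm_state (hypothetical helper): reads iwconfig-style text on stdin
# and prints the value of the "Power Management" field (on or off).
pm_state() {
    sed -n 's/.*Power Management:\([a-z]*\).*/\1/p'
}

# Demo with two lines taken from the listing in step 3:
printf '%s\n' 'Encryption key:off' 'Power Management:on' | pm_state

# On a real system, replace the interface name with your own:
#   iwconfig wlx031415926536 | pm_state
```

If the filter prints on after a reboot, the line added to rc.local has probably not been executed.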

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55117 - Posted: 24 Jul 2020 | 17:28:38 UTC

Another recurrent hardware problem worth mentioning:
It may cause intermittent read/write trouble on SATA drives, and even random system freezes.
The reason is a progressive deformation of the SATA power connectors, which eventually causes bad electrical contact and subsequent power transients on the SATA drive.
It can be seen in the following image:



A good connector has its two long sides perfectly parallel.
A bad connector is usually widened in its central zone, making it look like a pair of parentheses ( ).
Please, don't trust a connector like this if you want to avoid mysterious problems...

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55129 - Posted: 26 Jul 2020 | 21:21:37 UTC

Now, a very disturbing temperature problem, demonstrating the importance of proper cooling:
It was observed in this system.
It was mounted in an old ultracompact minitower case, as seen in this picture.
The CPU cooler was a low-profile Intel stock one.
And the PSU was mounted directly over it.
Room temperature: (Canary Islands summer) around 28 ºC
In this situation, Psensor readings were as follows:
As can be seen in the Psensor image
- CPU temp peaked at 77 ºC
The CPU's maximum case temperature is rated at 76.3 ºC, as stated in the Q9550S CPU datasheet.
- temp2 (Chipset Temp) peaked at 79 ºC... Too hot for my peace of mind!
The Intel stock CPU cooler radiates the CPU heat all around, including towards the nearby chipset heatsink...
- The CPU cooler's fan was turning at 3355 RPM: it sounded like a drone ready to take off.

Then, I had the chance to get a scrapped extra-wide PC tower case.
And I rescued an old Gigabyte G-Power II Pro CPU cooler from my spares drawer.
It is a good heat-pipe based CPU cooler. It was in the drawer because... it only fits in extra-wide PC cases :-)

And now, the same system, after reinstalling it in a wider, better ventilated case with the new CPU cooler, reaches the following temperatures:
As can be seen in the new Psensor image
- CPU temperature peaks at 50 ºC (27 ºC less than before, at the same room temperature)
- Chipset temp peaks at 63 ºC (16 ºC less than before)
- New CPU cooler's bigger fan is turning at 1675 RPM... Half the previous speed, and much quieter

Now, I can go happily to drink a fresh lemonade...
🤗️

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55132 - Posted: 31 Jul 2020 | 0:06:57 UTC

Following this kind suggestion from rod4x4:

A Gigabyte graphics card based on a factory overclocked GTX 750 Ti GPU.
After many years of intensive processing, both of its original fans became defective.
Heading to retirement?
Not if I can avoid it...

-1) Let's start by removing the fan frame. Now the heatsink is in sight.
-2) By any chance, I had one 60x60x25 12 VDC fan, and another 70x70x25, the latter with an integrated speed sensor. They seem to fit the heatsink very well.
All mounting holes but one coincide with pass-through fins. So let's arrange a cable tie for each of them (thank you, rod4x4), and a suitable screw for the exception.
-3) Ok, now all fixings are in position. Time for the electrical connections of both fans.
-4) We start by joining the negative (black) and positive (red) wires, and slipping a heat-shrink sleeve over every cable.
If both fans have a speed sensor wire (usually yellow), please don't join them together. Use only one of them and insulate the other.
And if the fans are of different sizes, use the sensor wire of the bigger (lower RPM) fan. Joining the speed sensor wires of several fans makes their signals interfere with each other.
-5) Now we have soldered every wire to its counterpart on the card's fan connector.
Beforehand, I pulled the heat-shrink sleeves back along the cables. Otherwise, they would shrink with the heat from soldering.
And if you forgot to slip the sleeves on beforehand... now it is too late (you'll remember this when it happens :-)
-6) This is the final look after shrinking the insulating sleeves with the heat of a gas burner.
-7) And finally, we arrange all the cables neatly and connect them to the card's fan socket.
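
Why only one speed sensor wire should reach the card can be sketched with a little arithmetic. The sketch assumes the common convention of two tachometer pulses per revolution (an assumption; check the fan's datasheet), and the RPM values are made-up examples, not measurements from this card. The helper name tach_hz is hypothetical.

```shell
#!/bin/sh
# tach_hz (hypothetical helper): pulse frequency in Hz produced by a fan
# spinning at the given RPM, assuming 2 pulses per revolution.
tach_hz() {
    awk -v rpm="$1" 'BEGIN { printf "%.1f\n", rpm * 2 / 60 }'
}

# Example speeds (illustrative only): a small fast fan and a bigger slow one.
echo "60 mm fan at 3000 RPM: $(tach_hz 3000) Hz"
echo "70 mm fan at 2100 RPM: $(tach_hz 2100) Hz"
```

The card counts pulses on a single input, so two pulse trains at different frequencies sharing one wire no longer correspond to the speed of either fan.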

Now we can compare this graphics card

Before and After

To test and make good use of this rescued card, I assembled this system.

That's this:

And it seems to be working fine, as can be seen in the Psensor readings and finished tasks.
👍️

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55151 - Posted: 3 Aug 2020 | 11:10:03 UTC - in response to Message 55132.
Last modified: 3 Aug 2020 | 11:26:23 UTC

Some corrections to my previous post:

By any chance, I had one 60x60x25 12 VDC fan, and another 70x70x25, this last with integral speed sensor.

To be more precise:
The true dimensions of the two fans were 60x60x12 mm and 70x70x15 mm.
And I didn't have those fans "by any chance". I like to keep assorted spare fans, because this component fails relatively often...
I'll also add close-up details of the screw fixing and the cable tie fixing.
And two more curiosities:
- I found the mentioned graphics card's invoice. I purchased it on 01/16/2015. Cost (taxes included): 159,00 €
- After about one week of working in its "new life", this card has successfully processed 27 GPUGrid tasks, with no errors so far.

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 55154 - Posted: 3 Aug 2020 | 23:29:36 UTC - in response to Message 55132.

-1) Let's start by removing the fan frame. Now the heatsink is in sight.
-2) By any chance, I had one 60x60x25 12 VDC fan, and another 70x70x25, the latter with an integrated speed sensor. They seem to fit the heatsink very well.
All mounting holes but one coincide with pass-through fins. So let's arrange a cable tie for each of them (thank you, rod4x4), and a suitable screw for the exception.
-3) Ok, now all fixings are in position. Time for the electrical connections of both fans.
-4) We start by joining the negative (black) and positive (red) wires, and slipping a heat-shrink sleeve over every cable.
If both fans have a speed sensor wire (usually yellow), please don't join them together. Use only one of them and insulate the other.
And if the fans are of different sizes, use the sensor wire of the bigger (lower RPM) fan. Joining the speed sensor wires of several fans makes their signals interfere with each other.
-5) Now we have soldered every wire to its counterpart on the card's fan connector.
Beforehand, I pulled the heat-shrink sleeves back along the cables. Otherwise, they would shrink with the heat from soldering.
And if you forgot to slip the sleeves on beforehand... now it is too late (you'll remember this when it happens :-)
-6) This is the final look after shrinking the insulating sleeves with the heat of a gas burner.
-7) And finally, we arrange all the cables neatly and connect them to the card's fan socket.

Great work and attention to detail.
My fans are cable tied very roughly, nowhere near as good as your setup.
Another option for powering the fans that bypasses the soldering stage (I am very lazy) is to use a free chassis fan header on the motherboard and set it to around 50% (depending on environment and fan size), or just use a Molex to 3-pin fan adapter with a voltage-reducing cable.
Currently running GTX750 and GTX750ti with case fans.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 55182 - Posted: 23 Aug 2020 | 20:59:07 UTC

The TGC has solidified on another GPU (an RTX 2080 Ti this time) in one of my hosts.
It was completely gone from the silicon itself; the markings of the chip left their mirrored print on the heatsink.
I suspect that I put too little TGC on these GPUs, for fear of spilling it on the PCB around the GPU chip (full of SMD capacitors).
It's much easier to apply the TGC the second time, as it makes the solidified part liquid again, or at least the fresh TGC spreads on it very well without cleaning the surface. If the TGC reacts with the copper of the heatsink (as I suspect), leaving the "used" TGC on its surface may prevent further reaction between the two materials. I'll see, and report.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55185 - Posted: 24 Aug 2020 | 19:20:11 UTC - in response to Message 55182.

Thank you very much for your feedback

Stacie
Joined: 29 Mar 20
Posts: 22
Credit: 742,472,371
RAC: 548,722
Level
Lys
Message 55186 - Posted: 25 Aug 2020 | 10:18:47 UTC - in response to Message 55185.

I read this thread and decided to shut down my computer and blow my heat sinks out with Office Depot Cleaning Duster. *COUGH COUGH COUGH* Large dust cloud!! I guess 2 years is a bit long to wait.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55187 - Posted: 25 Aug 2020 | 11:00:09 UTC - in response to Message 55186.

I read this thread and decided to shut down my computer and blow my heat sinks out...

I applaud your decision.
Computers are very avid dust-eating animals.
And this can lead to indigestion if not treated in time...

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55188 - Posted: 25 Aug 2020 | 14:43:16 UTC - in response to Message 55182.

Guys, I highly recommend Hydronaut by Thermal Grizzly. It beats both the Noctua and Arctic Silver thermal grease I was using by several degrees C.

Stacie
Joined: 29 Mar 20
Posts: 22
Credit: 742,472,371
RAC: 548,722
Level
Lys
Message 55209 - Posted: 31 Aug 2020 | 10:38:41 UTC - in response to Message 55187.

It filled the room, lol. I had to leave for a few minutes.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Message 55324 - Posted: 21 Sep 2020 | 10:59:21 UTC

May I ask you hardware enthusiasts to double-check my thoughts on running BOINC on the new RTX 3070/3080/3090 range?

I've been studying
NVIDIA A100 Tensor Core GPU Architecture and
NVIDIA Ampere GA102 GPU Architecture

BOINC uses the number of CUDA cores per SM, and a flops multiplier, to estimate the GPU's peak speed. I'm getting that the GA102 (and above, but not the A100) benefit from both an increase from 64 to 128 cores per SM, and the ability to process two FP32 streams concurrently.

So I think that the current v7.16.11 BOINC client will rate the new cards at one-quarter of the flops reported by other tools.

Can anybody confirm that? If it's true, I'll code a patch for the next release of BOINC.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 55330 - Posted: 21 Sep 2020 | 16:38:45 UTC - in response to Message 55324.
Last modified: 21 Sep 2020 | 16:40:05 UTC

May I ask you hardware enthusiasts to double-check my thoughts on running BOINC on the new RTX 3070/3080/3090 range?

I've been studying
NVIDIA A100 Tensor Core GPU Architecture and
NVIDIA Ampere GA102 GPU Architecture

BOINC uses the number of CUDA cores per SM, and a flops multiplier, to estimate the GPU's peak speed. I'm getting that the GA102 (and above, but not the A100) benefit from both an increase from 64 to 128 cores per SM, and the ability to process two FP32 streams concurrently.

So I think that the current v7.16.11 BOINC client will rate the new cards at one-quarter of the flops reported by other tools.

Can anybody confirm that? If it's true, I'll code a patch for the next release of BOINC.

I think you are correct Richard. Basically you will have to duplicate your fix for the Pascal to Turing transition for CUDA cores per SM in reverse for the Ampere cards. 128 cores per SM for Ampere and the two concurrent FP32 pipelines.

It would be best for someone with an actual card and BOINC 7.16.11 running to report what BOINC shows for computed GFLOPS.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Message 55331 - Posted: 21 Sep 2020 | 17:16:26 UTC - in response to Message 55330.

It would be best for someone with an actual card and BOINC 7.16.11 running to report what BOINC shows for computed GFLOPS.

Absolutely - yes, please.

Ray Hinchliffe has shown me a SIV report of 29,768 GFlops for an RTX 3080, and of 37,461 GFlops for an RTX 3090 - but those are still calculated values, albeit using a different method. He has a card on delivery, but my local supplier is still awaiting stock - and rationing orders to one per customer in the early days.
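
For what it's worth, the SIV figure for the RTX 3080 can be reproduced with simple arithmetic. This is a sketch under assumptions, not the BOINC client's actual code: it assumes peak = SMs x cores per SM x 2 operations per clock (one fused multiply-add) x clock, with 68 SMs and a 1.710 GHz boost clock taken from Nvidia's published RTX 3080 specifications. The helper name peak_gflops is made up.

```shell
#!/bin/sh
# peak_gflops (hypothetical helper): sm_count cores_per_sm clock_ghz
# Assumes 2 floating point operations per core per clock (one FMA).
peak_gflops() {
    awk -v sm="$1" -v cores="$2" -v ghz="$3" \
        'BEGIN { printf "%.0f\n", sm * cores * 2 * ghz }'
}

# RTX 3080 with the Ampere GA102 figure of 128 cores per SM:
echo "128 cores/SM: $(peak_gflops 68 128 1.710) GFLOPS"
# Same card if a client still assumed the Turing figure of 64 cores per SM:
echo " 64 cores/SM: $(peak_gflops 68 64 1.710) GFLOPS"
```

Under these assumptions the first line rounds to the 29,768 GFlops quoted above, which would suggest that the 128 cores per SM already account for both FP32 datapaths. What the client really reports still needs confirming on a real card.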

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 55332 - Posted: 21 Sep 2020 | 21:04:41 UTC - in response to Message 55331.

I saw a note that EVGA is expecting "thousands" of 3080 chips into inventory in the future. I am still awaiting the hybrid version to appear on their website.

No bites yet on the OCN forums from anyone running BOINC and reporting the calculated GFLOPS shown in the BOINC startup.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55365 - Posted: 27 Sep 2020 | 14:26:13 UTC

Let's begin this post with a thermal experiment:

- We're going to start by putting two lemonade/beer jars, filled 3/4 with tap water, into the refrigerator.
- We wait for them to reach 5 ºC
- Then we take one of the lemonade/beer jars, put it into the microwave oven, and run it at full power until the water boils (100 ºC)
- So far, everything is ok. (Now, for example, we can prepare an instant soup with that boiling water...)
- WARNING! Please, DON'T follow the subsequent steps in this experiment if you value your lemonade/beer jar.
- Now we take the second lemonade/beer jar from the refrigerator, empty the hot jar, and fill it with water from the cold jar... Maybe nothing happens, but there is a high chance that the hot jar will crack!
(Yes, I confess... I once cracked a jar after preparing an instant soup :-)
The thermal trip from 5 ºC to 100 ºC is the same as from 100 ºC to 5 ºC... The difference is in the suddenness of the thermal change in the second case.

Now, the relationship of this experiment with high power electronics, such as the GPU chips of graphics cards.
To go straight to the point, I'm inserting an image with Psensor graphs.

You can ctrl + click over it to have a full size image opened in a new browser tab.
It is a real capture taken from this dual GPU system.
Let me explain this mess of curves and data:
The graph starts with about 4 minutes of the system running with the CPU and both GPUs at 100% usage.
It is followed by about 8 minutes with BOINC activity suspended, so the whole system is working at a residual low % usage.
And finally, BOINC activity is resumed, and the system goes from idle to 100% usage again.
The red and blue curves correspond to GPU 0 temperature and fan level, respectively.
While GPU 0 temperature is 76 ºC at full load, the fan level is 84%.
When GPU 0 activity is suspended, the GPU temperature starts decreasing (red curve), and the fan level follows it down (blue curve). It is factory configured this way, to achieve a more gradual cooling of the GPU chip (remember the second jar in the preceding example).
On this particular GTX 1650 graphics card, when the GPU chip temperature drops below 45 ºC, the fans stop completely (0% level) for the cooling to be even more gradual.
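
The behaviour described above can be pictured as a simple temperature-to-fan-level curve. The following is only an illustrative sketch, not the card's real firmware curve: the 45 ºC stop threshold and the 84% at 76 ºC point come from the readings above, while the 30% starting level and the straight-line ramp between the two points are assumptions. The helper name fan_level is made up.

```shell
#!/bin/sh
# fan_level (hypothetical helper): fan level in % for a GPU temperature in ºC.
# Below 45 ºC the fans stop; above, a linear ramp from 30% (assumed) at 45 ºC
# to 84% at 76 ºC, capped at 100%.
fan_level() {
    awk -v t="$1" 'BEGIN {
        if (t < 45) { print 0; exit }
        level = 30 + (t - 45) * (84 - 30) / (76 - 45)
        if (level > 100) level = 100
        printf "%.0f\n", level
    }'
}

for temp in 40 45 60 76; do
    echo "$temp ºC -> $(fan_level "$temp")%"
done
```

The point is the shape: the fan level follows the temperature down gradually instead of keeping the fans at full speed on an idle chip.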

A long time ago, I decided that playing with overclocking was a very mind-energy consuming activity. Currently I leave the job in the manufacturer's hands, and I directly purchase factory overclocked graphics cards.
And my particular self-acquired habit: I prefer leaving the factory programmed fan curves untouched.
So far, this is empirically supported by ten years of intense GPU processing without any of my squeezed GPUs failing electrically.
I still keep in working order my first crunching graphics card, based on a GTS 450 GPU, and my second one, based on a GTX 650 Ti Boost, both since replaced by newer models.

As always, experiences from other users are welcome.
For example, I have no personal experience with water cooling. Theoretically, it should be the best way of avoiding thermal stress issues, right?

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 55368 - Posted: 27 Sep 2020 | 16:25:48 UTC

My cards are all still alive and working after ten years or more. Going all the way back to a GTX460.

The cards never had much thermal stress in their lifetime. They got installed and then immediately ran 24/7 at 100% fan speed until they were replaced with the next generation. Then they were pulled and put on the shelf.

I also always ran a very mild core overclock of 0-40 MHz depending on the card and thermal environment.

I also always ran a significant memory overclock of 400-2000 MHz to compensate for the Nvidia compute penalty on consumer cards, depending on the card generation.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55369 - Posted: 27 Sep 2020 | 18:51:54 UTC - in response to Message 55368.

My cards are all still alive and working after ten years or more. Going all the way back to a GTX460.

That speaks very well about your ability to take care of your hardware...

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 55377 - Posted: 30 Sep 2020 | 10:39:21 UTC - in response to Message 55368.

I don't want to steer the direction this thread is taking off-topic here, but was just curious when reading this:

... significant memory overclock of 400-2000 MHz to compensate for the Nvidia compute penalty on consumer cards ...


Is this really true? I have read this many times now and am starting to wonder, if instead of overclocking (if at all) the core and memory clock at roughly the same rate ~100 MHz (GTX 750 Ti), it would be wiser, to decrease/suspend core clock OC and consider after testing a more substantial memory OC setting. At least I could test this out.

Do you have any personal experience with this issue across various gens that you could share with me?

Thanks

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 55379 - Posted: 30 Sep 2020 | 12:45:28 UTC - in response to Message 55377.
Last modified: 30 Sep 2020 | 12:46:58 UTC

I don't want to steer the direction this thread is taking off-topic here, but was just curious when reading this:

... significant memory overclock of 400-2000 MHz to compensate for the Nvidia compute penalty on consumer cards ...


Is this really true? I have read this many times now and am starting to wonder, if instead of overclocking (if at all) the core and memory clock at roughly the same rate ~100 MHz (GTX 750 Ti), it would be wiser, to decrease/suspend core clock OC and consider after testing a more substantial memory OC setting. At least I could test this out.

Do you have any personal experience with this issue across various gens that you could share with me?

Thanks

This is true for Pascal and Turing cards that are limited to Performance level P2 when compute functions are detected. The above quoted memory clocking takes the Performance level back to P0 levels.
Maxwell cards (gtx 750ti) can benefit from memory overclocking but are not Performance limited like the Pascal and Turing cousins when computing. Maxwell cards do go to Performance level P0 without overclocking the memory.
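
On Linux, the performance level a card is actually running at can be read with nvidia-smi. The sketch below filters its CSV output and lists the GPUs that are not in P0. The helper name not_p0 and the sample lines are made up for illustration, while name and pstate are standard nvidia-smi query fields.

```shell
#!/bin/sh
# not_p0 (hypothetical helper): reads "name, pstate" CSV lines on stdin
# and prints those whose performance state is not P0.
not_p0() {
    awk -F', ' '$2 != "P0" { print $1 " is running in " $2 }'
}

# Demo with made-up sample lines:
printf '%s\n' 'GeForce GTX 1080 Ti, P2' 'GeForce GTX 750 Ti, P0' | not_p0

# On a real system with the Nvidia driver installed:
#   nvidia-smi --query-gpu=name,pstate --format=csv,noheader | not_p0
```

Running the real command while a GPUGrid task is crunching shows whether a given card is held at P2.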

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 55380 - Posted: 30 Sep 2020 | 12:50:13 UTC - in response to Message 55379.
Last modified: 30 Sep 2020 | 13:01:53 UTC

...

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 55381 - Posted: 30 Sep 2020 | 13:48:25 UTC - in response to Message 55379.
Last modified: 30 Sep 2020 | 13:49:30 UTC

I don't want to steer the direction this thread is taking off-topic here, but was just curious when reading this:

... significant memory overclock of 400-2000 MHz to compensate for the Nvidia compute penalty on consumer cards ...


Is this really true? I have read this many times now and am starting to wonder, if instead of overclocking (if at all) the core and memory clock at roughly the same rate ~100 MHz (GTX 750 Ti), it would be wiser, to decrease/suspend core clock OC and consider after testing a more substantial memory OC setting. At least I could test this out.

Do you have any personal experience with this issue across various gens that you could share with me?

Thanks

This is true for Pascal and Turing cards, which are limited to performance level P2 when compute functions are detected. The memory overclock quoted above brings performance back to P0 levels.
Maxwell cards (GTX 750 Ti) can benefit from memory overclocking but are not performance-limited like their Pascal and Turing cousins when computing. Maxwell cards do go to performance level P0 without overclocking the memory.


Yup, but it's only the memory clock that is affected. Some cards even in the Pascal and Turing generations aren't affected, but only the low-end models like the 1050 Ti and the 1650; these cards seem to have no penalty.

overclocking the memory is easy in the P2 state, and it brings performance right back to where it would be otherwise.
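
For anyone who wants to check which performance state their card sits in while crunching, here is a minimal Python sketch. It assumes the NVIDIA driver's nvidia-smi tool is on the PATH and uses its pstate, clocks.mem and clocks.sm query fields; the function names are only illustrative.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in output order.
QUERY_FIELDS = ["index", "name", "pstate", "clocks.mem", "clocks.sm"]


def parse_gpu_states(csv_text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(QUERY_FIELDS, (field.strip() for field in record))))
    return rows


def query_gpu_states():
    """Ask nvidia-smi for the current performance state and clocks of every GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_states(output)


def report():
    """Print one line per GPU, flagging cards that are sitting in P2."""
    for gpu in query_gpu_states():
        note = " (P2: compute-limited memory clock)" if gpu["pstate"] == "P2" else ""
        print(gpu["index"], gpu["name"], gpu["pstate"], gpu["clocks.mem"] + note)
```

Calling report() while a GPUGrid task is running should show P2 on affected Pascal/Turing cards and P0 on a Maxwell card, if the behavior described above holds on your system.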

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55385 - Posted: 30 Sep 2020 | 15:33:38 UTC

Interesting topic.

This was also discussed in this Keith Myers post and the related ones that followed.
Since then, I've verified that my most recently purchased GTX 1650 SUPER GPU is also affected by this policy: it is downgraded to the P2 performance level while processing GPUGrid tasks.

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 55398 - Posted: 30 Sep 2020 | 21:17:49 UTC

Thanks for your insightful answer! Very much appreciated.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55615 - Posted: 17 Oct 2020 | 22:05:11 UTC

0_6-GERARD_pocket_discovery_f0a0d98e_6ca4_446d_b600_a00239226478-2-3-RND5078

initial replication 2


This is a great chance to see 2 GPUs compared running identical tasks. Be sure to check the other GPU's time and get a better perspective of how yours compares.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55630 - Posted: 25 Oct 2020 | 21:51:55 UTC
Last modified: 25 Oct 2020 | 22:01:33 UTC

Hardware Microcosmos

I've got this 1.3 megapixel USB microscope, and I thought that it would be fun to look at computer hardware from another perspective...
Let's take a look, and judge for yourself.

All images are taken at this microscope's native resolution: 1280x1024 pixels.
Illumination is provided by the microscope itself, through four high intensity white LEDs around its perimeter.

*1) The first image is a close look at the drivers CD included with this microscope.
The CD's inner border can be seen, with a fragment of identifying text, the outer surface with microscopic scratches and dust, and the internal data surface.
It is curious how the tracks in the reflective material act like a diffraction grating, splitting incident white light into its component wavelengths: red, green, blue...

*2) Now we are seeing a black text cursor on a white background, on an LED monitor.
It is a magnified view of the Red, Green and Blue (RGB) sub-pixel matrix, where each pixel is composed of a set of these three primary colours.
It is just the opposite of image number one: white is achieved by lighting all three RGB sub-pixels at full intensity (on a typical LED-backlit LCD these are colour-filtered cells in front of the LED backlight rather than individual LEDs).
All other colours in the palette are achieved by blending different intensities of each primary colour at each pixel.

*3) This is the magnified image of a read/write electromagnetic head on an old mechanical Hard Disk Drive (and its specular reflection on the polished media surface)

*4) If you ask me for one element ever present inside a 24/7 working home computer, surely my answer would be: "Dust". This image is a magnified look at the rotor and dusty blade of a computer's fan.

*5) This is what the shiny polished surface of a GPU chip looks like. It was taken from a retired NVIDIA GeForce GT 640 graphics card.
Grooves left by the polishing bur can be seen, and... yes, ubiquitous adhered dust.

*6) Here we're watching an old Intel Pentium 4 CPU "gold plated pins forest" and its matching PGA478 socket.

*7) And following on from the previous images, an evolution in CPU layout: this Intel Core 2 Quad processor with its array of round gold plated contacts, and the contact pins of its matching LGA775 socket.

*8) Continuing with the subject of contacts, here is a close view of the gold plated contacts on a graphics card's PCIE bus, and its matching PCIE x16 (dusty!!!) socket.
In the first image, base copper can be seen extending beyond the gold plating at the end of each bus contact.
Gold plating over copper is used to effectively lower contact resistance and extend lifespan, thanks to a much lower oxidation rate.

*9) Now, in connection with graphics cards, an image of a male PCIE power connector, and its matching female connector.
In this kind of connector, tin plating can be seen on the contacts.

*10) And finally, here is the magnified image of a burnt driver chip in a computer's PSU.
It literally exploded at the moment the PSU was switched on. Not recommended for the faint of heart ;-)


My favorite image? Probably This
And yours?

sph
Joined: 22 Oct 20
Posts: 4
Credit: 34,434,982
RAC: 0
Level
Val
Message 55631 - Posted: 25 Oct 2020 | 23:18:54 UTC - in response to Message 55630.

Hardware Microcosmos


A different perspective of a PC and its parts. Nice journey.

My favourite picture was the "cursor on white background". Not what most people would expect.

Stacie
Joined: 29 Mar 20
Posts: 22
Credit: 742,472,371
RAC: 548,722
Level
Lys
Message 55633 - Posted: 26 Oct 2020 | 5:47:56 UTC

Maybe someone here can help me? I have 2 GTX 1070 GPUs in my Core i7 desktop; device 0 is running, device 1 is inactive. There is plenty of work in my BOINC queue but no applications running on the second GPU; lights lit but fans not running. Does anyone know how I might wake it up or figure out what is wrong? It has done this before and then eventually started running again without any help from me. When it runs it only runs one application, the other one runs several. Thanks-

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55634 - Posted: 26 Oct 2020 | 6:40:42 UTC - in response to Message 55633.

no applications running on the second GPU, lights lit but fans not running

Depending on the graphics card model, this behavior might be normal: some models only activate their fans when the GPU gets hotter than a certain temperature, which is not reached if your GPU is idle.
Try editing your cc_config.xml file, usually located in the C:\ProgramData\BOINC directory, so that its <options> section includes the <use_all_gpus> line shown below:

<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<use_all_gpus>1</use_all_gpus>
<http_1_0>1</http_1_0>
</options>
</cc_config>

Save the changes, reboot computer, and see if this helps...
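
If you'd rather not edit the file by hand, the same change can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the function name enable_all_gpus is made up for this example, the path is the usual Windows location mentioned above, and writing there may need administrator rights. Note that rewriting the file this way drops any XML comments it contained.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_all_gpus(config_path):
    """Set <use_all_gpus>1</use_all_gpus> in cc_config.xml, creating the file if needed."""
    path = Path(config_path)
    if path.exists():
        tree = ET.parse(str(path))
        root = tree.getroot()
    else:
        # No config yet: start from an empty <cc_config> document.
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    flag = options.find("use_all_gpus")
    if flag is None:
        flag = ET.SubElement(options, "use_all_gpus")
    flag.text = "1"

    tree.write(str(path))
    return path
```

Usage would be something like enable_all_gpus(r"C:\ProgramData\BOINC\cc_config.xml"); other options already present in the file are left untouched. The client still has to re-read its configuration afterwards, for example by restarting as suggested above.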

Stacie
Joined: 29 Mar 20
Posts: 22
Credit: 742,472,371
RAC: 548,722
Level
Lys
Message 55636 - Posted: 26 Oct 2020 | 9:47:29 UTC - in response to Message 55634.

Will do, thanks! I played a few rounds of World of Warships and now it is running again. I don't know if it had anything to do with it but it seems like I can eliminate the possibility of a hardware problem if it runs part of the time.

Stacie
Joined: 29 Mar 20
Posts: 22
Credit: 742,472,371
RAC: 548,722
Level
Lys
Message 55637 - Posted: 26 Oct 2020 | 10:09:40 UTC - in response to Message 55634.

System is Windows 10. I don't see a Program Data folder in C, I see Program Files. Inside Program Files is a BOINC folder. Inside it is boinc, boinccmd, boincscr, boincmgr, boincsvcctrl, and boinctray. When I double-click boinc it opens a DOS window that says cc_config.xml not found - using defaults. I don't know where to enter the command.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55638 - Posted: 26 Oct 2020 | 10:54:01 UTC - in response to Message 55637.

System is Windows 10. I don't see a Program Data folder in C

You're right.
ProgramData folder is hidden by default in Windows 10.
Try searching in Search option for "folder", select "Folder Options", then "View" tab, and check "Show hidden files and folders" option.
Then ProgramData folder will become visible.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 55639 - Posted: 26 Oct 2020 | 13:28:31 UTC - in response to Message 55638.
Last modified: 26 Oct 2020 | 13:30:51 UTC

System is Windows 10. I don't see a Program Data folder in C

You're right.
ProgramData folder is hidden by default in Windows 10.
Try searching in Search option for "folder", select "Folder Options", then "View" tab, and check "Show hidden files and folders" option.
Then ProgramData folder will become visible.
There's no need for that. Hidden directories can be accessed if you know the exact path of that given folder.
So
1. mark this: C:\ProgramData\BOINC and press <CTRL+C> (it copies the marked text to the clipboard)
2. press <windows key + E> (the Windows explorer is opened)
3. click on the address bar field and press <CTRL+V> (the C:\ProgramData\BOINC text should appear there)
4. press <ENTER>
5. now you should see the cc_config.xml file, right click on it, and select "edit" from the context menu.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55640 - Posted: 26 Oct 2020 | 16:53:43 UTC - in response to Message 55639.

1. mark this: C:\ProgramData\BOINC and press <CTRL+C> (it copies the marked text to the clipboard)
2. press <windows key + E> (the Windows explorer is opened)
3. click on the address bar field and press <CTRL+V> (the C:\ProgramData\BOINC text should appear there)
4. press <ENTER>
5. now you should see the cc_config.xml file, right click on it, and select "edit" from the context menu.

Nice, elegant way to do the job 👍️

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Message 55641 - Posted: 26 Oct 2020 | 18:32:18 UTC - in response to Message 55640.

Nice, elegant way to do the job 👍️

Once.

If you think you might ever want to do this again, it's better to go the folder properties route - then, you don't have to find an example to copy. It'll always be visible.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55719 - Posted: 10 Nov 2020 | 22:03:57 UTC

Suddenly, a few days ago, on November 5th 2020, nine (9) twin hosts like this one, #566744, sprang up, owned by an anonymous user.
I discovered them today while looking at the Hosts ranking, where these systems have quickly reached the first page.
I was completely astonished.
Starting with the processor: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
This is an Intel Xeon processor with 24 cores / 48 threads. As 96 processors are shown in the host characteristics, it must be based on a dual-processor mainboard.
Continuing with memory: 361762.6 MB are shown. Rounding up to the next standard value, and given that this processor supports a 6-channel memory layout, I'd bet on 6 x 64 GB or 12 x 32 GB DDR4 memory modules = 384 GB RAM
And arriving at the 10 x NVIDIA Quadro RTX 6000 graphics cards... Specifications.
Incredible.
Processor TDP: 205 W x 2 = 410 W; GPU TDP: 295 W x 10 = 2950 W. Adding peripherals, let's say 3500 W at full load for each of these 9 hosts (31.5 kW in total???)
That far exceeds my imagination.
And let's not talk about the economic cost of building such awesome systems (!!!)
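
The back-of-the-envelope figures above can be reproduced with a few lines of Python. The TDP values and the 3500 W per-host round figure are the ones quoted in this post (the latter is an estimate, not a measurement); the variable names are only illustrative.

```python
# Figures quoted in the post above.
CPU_TDP_W = 205
CPUS_PER_HOST = 2
GPU_TDP_W = 295
GPUS_PER_HOST = 10
HOSTS = 9
ASSUMED_HOST_TOTAL_W = 3500  # round estimate including peripherals

cpu_w = CPU_TDP_W * CPUS_PER_HOST            # CPU power per host
gpu_w = GPU_TDP_W * GPUS_PER_HOST            # GPU power per host
component_w = cpu_w + gpu_w                  # CPUs + GPUs only
peripherals_w = ASSUMED_HOST_TOTAL_W - component_w
fleet_kw = ASSUMED_HOST_TOTAL_W * HOSTS / 1000

# Both guessed memory layouts give the same total capacity.
ram_gb_option_a = 6 * 64
ram_gb_option_b = 12 * 32

print(f"CPUs: {cpu_w} W, GPUs: {gpu_w} W, subtotal: {component_w} W")
print(f"Headroom left for peripherals in the estimate: {peripherals_w} W")
print(f"All {HOSTS} hosts at full load: {fleet_kw} kW")
print(f"RAM: {ram_gb_option_a} GB or {ram_gb_option_b} GB")
```

So the 3500 W estimate leaves 140 W per host for everything that is not a CPU or a GPU.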

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 55721 - Posted: 10 Nov 2020 | 22:40:21 UTC - in response to Message 55719.

Suddenly, a few days ago, on November 5th 2020, nine (9) twin hosts like this one, #566744, sprang up, owned by an anonymous user.
I discovered them today while looking at the Hosts ranking, where these systems have quickly reached the first page.
I was completely astonished.
Starting with the processor: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
This is an Intel Xeon processor with 24 cores / 48 threads. As 96 processors are shown in the host characteristics, it must be based on a dual-processor mainboard.
Continuing with memory: 361762.6 MB are shown. Rounding up to the next standard value, and given that this processor supports a 6-channel memory layout, I'd bet on 6 x 64 GB or 12 x 32 GB DDR4 memory modules = 384 GB RAM
And arriving at the 10 x NVIDIA Quadro RTX 6000 graphics cards... Specifications.
Incredible.
Processor TDP: 205 W x 2 = 410 W; GPU TDP: 295 W x 10 = 2950 W. Adding peripherals, let's say 3500 W at full load for each of these 9 hosts (31.5 kW in total???)
That far exceeds my imagination.
And let's not talk about the economic cost of building such awesome systems (!!!)


I usually keep a close eye on my systems, as well as others', just out of curiosity and to keep an eye on the competition :P So I noticed when they showed up. They belong to Syracuse University; it wasn't hard to figure out which systems belong to them even if they have it hidden. Here's one of them: https://stats.free-dc.org/host/ps3/566770

They showed up with 9 systems, each containing 10x Quadro RTX 6000 GPUs. These are a little slower than a 2080 Ti. I was curious how they got so many GPUs into a single system, as I was sure a university wouldn't be making custom builds like this to house in mining racks like I do, but I also wasn't aware of any servers that supported 10x GPUs (most stop at 8). Then I came across this: https://www.servethehome.com/dell-emc-dss-8440-10x-gpu-4u-server-launched/

And with 9 or 10 of these, that's upwards of 100x GPUs in a whole server rack. Just imagine that cost LOL. I'm sure they don't plan to use these just for DC projects; they probably run them on BOINC only for load testing and/or in their downtime. They're probably doing some cool AI/ML or other compute-heavy research over there.

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 55728 - Posted: 12 Nov 2020 | 13:11:32 UTC

Very impressive indeed. Couldn't have imagined that they belong to a private individual... Lol. Just imagining 100 GPUs in a server rack... Besides the incredible electricity consumption, just trying to fathom the noise level, the requirement for a superior cooling solution etc. Needless to say: that's a beast!
But even if it were just for load testing, I am glad they don't waste their computational resources on only running stupid benchmarks but are supporting science instead in the meantime.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55786 - Posted: 19 Nov 2020 | 17:24:10 UTC - in response to Message 55721.

Fascinating, I just ran a dual GERARD task with the anonymous donor and the computer was this:

Owner Anonymous
Created 5 Nov 2020 | 2:27:17 UTC
Total credit 70,972,050
Average credit 3,777,006.94
CPU type GenuineIntel
Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz [Family 6 Model 85 Stepping 7]
Number of processors 96
Coprocessors [10] NVIDIA Quadro RTX 6000 (4095MB) driver: 450.80
Operating System Linux Ubuntu
Ubuntu 20.04.1 LTS [5.4.0-52-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]
BOINC version 7.16.6
Memory 361794.6 MB
Cache 36608 KB
Measured floating point speed 1000 million ops/sec
Measured integer speed 1000 million ops/sec


It completed in 5,383.87 seconds compared to my GTX 1650 needing 24,258.18 seconds to complete.

I'd love to know what motherboard it is and if that was actually a Quadro RTX 6000, or a lesser GPU among the ten. Here is where the BOINC Manager needs a tweak to make it individualize multiple GPUs and their stats.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 55790 - Posted: 19 Nov 2020 | 19:14:54 UTC - in response to Message 55786.

Fascinating, I just ran a dual GERARD task with the anonymous donor and the computer was this:


It completed in 5,383.87 seconds compared to my GTX 1650 needing 24,258.18 seconds to complete.

I'd love to know what motherboard it is and if that was actually a Quadro RTX 6000, or a lesser GPU among the ten. Here is where the BOINC Manager needs a tweak to make it individualize multiple GPUs and their stats.


Check my reply 2 posts up. It's likely that Dell system I linked to, using risers or daughterboards to connect the GPUs (fore) to the system (aft).

It's possible that there are different GPUs in the system, since we really can't know for sure due to the way BOINC reports coprocessors. But I'm 99.9999% sure that system belongs to Syracuse University, and given the level of hardware, and the customer, it's likely they bought this solution complete from Dell with matching hardware.

the RTX 6000 performs closely to a 2080ti so it's no surprise that it's almost 5x faster than a 1650

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55801 - Posted: 19 Nov 2020 | 21:53:25 UTC

Here's what I wish for this Christmas but only the Pope can afford...

https://www.supermicro.com/en/products/system/10U/9029/SYS-9029GP-TNVRT.cfm

An entire team worth of crunching in one machine.
Comes with its own utility substation (sarc).

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55803 - Posted: 20 Nov 2020 | 1:32:25 UTC - in response to Message 55801.
Last modified: 20 Nov 2020 | 2:08:00 UTC

Here's what I wish for this Christmas but only the Pope can afford...

https://www.supermicro.com/en/products/system/10U/9029/SYS-9029GP-TNVRT.cfm

An entire team worth of crunching in one machine.
Comes with its own utility substation (sarc).


With 16 Tesla V100 SXM3s, I figure it could knock down close to 20 million a day.

At 350W apiece, I'd probably need an extra ton of A/C in the summer and ducting to the rack. In the winter it would provide ample heat to lower my Propane use.
----------------
the RTX 6000 performs closely to a 2080ti so it's no surprise that it's almost 5x faster than a 1650


I think that would put the Dell server in the neighborhood of 11 million a day in credit. Awesome.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55804 - Posted: 21 Nov 2020 | 15:28:18 UTC - in response to Message 55721.
Last modified: 21 Nov 2020 | 15:39:36 UTC

They showed up with 9 systems, each containing 10x Quadro RTX 6000 GPUs. These are a little slower than a 2080 Ti. I was curious how they got so many GPUs into a single system, as I was sure a university wouldn't be making custom builds like this to house in mining racks like I do, but I also wasn't aware of any servers that supported 10x GPUs (most stop at 8). Then I came across this: https://www.servethehome.com/dell-emc-dss-8440-10x-gpu-4u-server-launched/

What I have always found admirable is how smoothly Linux seems to manage this kind of massive multi-CPU/GPU system.

A curious detail: Navigating GPUGrid hosts list I found this host #566140.
Comparing it to this other host #566749:

* Host #566140
- Owner: Anonymous
- CPU: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz [Family 6 Model 85 Stepping 7]
- Number of processors: 24
- Coprocessors: [5] NVIDIA Quadro RTX 6000 (4095MB) driver: 452.57
- RAM: 31999.55 MB
- OS: Microsoft Windows 10 Enterprise x64 Edition, (10.00.18362.00)
- Current RAC at GPUGrid (2020/11/21 14:55 UTC): 169,764.77
- Current host position by RAC at GPUGrid: 925

* Host #566749
- Owner: Anonymous
- CPU: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz [Family 6 Model 85 Stepping 7]
- Number of processors: 96
- Coprocessors: [10] NVIDIA Quadro RTX 6000 (4095MB) driver: 450.80
- RAM: 361856.6 MB
- OS: Linux Ubuntu 20.04.1 LTS [5.4.0-52-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]
- Current RAC at GPUGrid (2020/11/21 14:55 UTC): 4,004,328.46
- Current host position by RAC at GPUGrid: 4

I leave it as homework for everyone to draw their own conclusions...

Finally, as a hardware enthusiast, I don't want to miss the opportunity to recommend this interesting Ian&Steve C. thread: 8 GPUs on a motherboard with 7 PCIe slots: Bifurcation

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 55805 - Posted: 21 Nov 2020 | 16:09:56 UTC - in response to Message 55804.

I imagine it’s the same physical system. They probably are running virtualization or doing some other form of resource partitioning and allocation. And probably found out what a lot of us know already, that the apps are just faster under Linux. About 15-20% faster.

If you pick the right hardware setup, Linux and multi-GPU is very stable.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55806 - Posted: 22 Nov 2020 | 4:04:48 UTC - in response to Message 55805.

...apps are just faster under Linux. About 15-20% faster.


That's for sure. Windows is all about itself anymore. Users are all assimilated into the Borg. They insist on all users running umpteen million largely useless processes which sap the machine progressively until you break down and buy a faster one.

Here's my own experience on WU 0_1-GERARD_pocket_discovery_aca700f5_8d26_46c9_bce3_baf63237f164-0-2-RND7116_1

Operating System Linux Ubuntu
Ubuntu 20.04.1 LTS [5.4.0-52-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]
Run time 5,383.87
CPU time 5,378.54
vs.
Operating System Microsoft Windows 10
Professional x64 Edition, (10.00.19041.00)
Run time 24,258.18
CPU time 24,095.42

Those stats appear to support your statement very well, Ian. My i5-10400 runs at only 70% usage (6 threads of WCG along with ACEMD using 2 GPUs) so bandwidth is not an issue in my comparison. Unless I missed something, the Ubuntu OS only wasted around 5 seconds in a 90 minute run and my Win10 OS wasted 163 seconds during a run lasting 6 hrs and 45 mins.
That's around 24 seconds lost per hour for Windows and only 3 or 4 seconds per hour lost under Linux.
It also looks pretty consistent as I peruse the stats, so I don't think this example is an outlier.
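
For anyone who wants to repeat this arithmetic on their own tasks, here is a small Python sketch using the run and CPU times listed above. The helper name overhead_per_hour is just illustrative. Keep in mind that these two results come from different hosts and different GPUs (a Quadro RTX 6000 versus a GTX 1650), so the run time ratio mostly reflects the hardware; only the gap between run time and CPU time is being compared here.

```python
def overhead_per_hour(run_time_s, cpu_time_s):
    """Seconds of run time not covered by CPU time, per hour of run time."""
    return (run_time_s - cpu_time_s) * 3600.0 / run_time_s


# Values taken from the two task results quoted above.
linux_run, linux_cpu = 5383.87, 5378.54
windows_run, windows_cpu = 24258.18, 24095.42

linux_overhead = overhead_per_hour(linux_run, linux_cpu)
windows_overhead = overhead_per_hour(windows_run, windows_cpu)
run_time_ratio = windows_run / linux_run

print(f"Linux host:   {linux_run - linux_cpu:.2f} s gap, {linux_overhead:.1f} s per hour")
print(f"Windows host: {windows_run - windows_cpu:.2f} s gap, {windows_overhead:.1f} s per hour")
print(f"Run time ratio (Windows host / Linux host): {run_time_ratio:.2f}x")
```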

A better OS that's free is well worth the effort of learning to use it.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55807 - Posted: 23 Nov 2020 | 0:17:36 UTC

Viewing the volunteers page I noticed that Anonymous has verified Ian's discovery that they are really Syracuse U. I hope we can engage them in conversation. I'd love to know more about their endeavors.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55808 - Posted: 24 Nov 2020 | 21:19:10 UTC
Last modified: 24 Nov 2020 | 21:20:57 UTC

PWM fan fireworks

During a routine temperature check, I noticed an unusual rise in temperature and fan% on the graphics card in this host.
I removed the chassis side panel, and I found what can be seen in this video.
One of the graphics card fans had completely stopped, and the other fan was running at high speed.
Both fans are the same model: 95 mm diameter, 13 blades, PWM (Pulse Width Modulation) speed controlled.
I thought it would be difficult to find a replacement for this kind of special fan, but I searched a well-known supplier for the reference on its back label, "FDC10M12S9-C", and several alternatives were shown.
Finally, I purchased two units of this model, and I received the ones pictured in this image.

First question: How can a completely stopped fan go (almost) unnoticed?
It can be better understood by looking at the fan layout in this particular graphics card.
The previous image is a close-up of the intermediate fan connectors at the back of the fan mounting frame. Both are 4-pin male connectors, but the third pin is missing in one of them.
As can be seen in the four-pin PWM fan connector layout, the missing pin is the speed sensing one: both fans are wired in parallel, but only the speed sensing signal coming from one of them is used. This is a very common approach in many multi-fan graphics cards.
The non-sensed fan was the failing one, so the fact that it had stopped was not detected.
The only clue was the observed temperature increase on the GPU, and the rising fan% of the working fan trying to compensate for it...
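
Since an unsensed fan cannot report its own failure, that combined clue (temperature up and fan% up at the same time) is the only thing monitoring software could watch for. A minimal Python sketch of such a check follows; the function name, the sample format and the 5 °C / 10 % margins are all hypothetical choices for illustration, not values taken from any monitoring tool.

```python
from statistics import mean


def cooling_degraded(baseline, recent, temp_margin_c=5.0, fan_margin_pct=10.0):
    """Return True when both temperature and fan duty are clearly above baseline.

    baseline, recent: lists of (temperature_c, fan_percent) samples taken
    under a comparable workload. The margins are arbitrary example values.
    """
    base_temp = mean(t for t, _ in baseline)
    base_fan = mean(f for _, f in baseline)
    now_temp = mean(t for t, _ in recent)
    now_fan = mean(f for _, f in recent)
    hotter = (now_temp - base_temp) >= temp_margin_c
    fan_working_harder = (now_fan - base_fan) >= fan_margin_pct
    return hotter and fan_working_harder
```

For example, cooling_degraded([(62, 55), (63, 56)], [(71, 80), (72, 82)]) returns True, while feeding it the same samples for both arguments returns False. Samples should be compared under the same project's tasks, since different workloads produce different plateau temperatures.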

Ok, let's go.
Now the graphics card is on the table, along with two spare fans. I'm changing both, so that they are well matched to each other.
The fan mounting frame can be detached from the heatsink by releasing four fixing latches, two on each side.
Once the mounting frame is loose, the intermediate male-female connectors have to be detached. Rather than pulling the cables, I prefer to insert the blade of a small flat screwdriver into the slit in between and pry them apart.
Each fan is attached to the mounting frame by screws at 5 fixing points.
A proper magnetized-head screwdriver will help now, and the damaged fan is loose.
Better try not to drop these small screws. They tend to hide in unimaginable places...
After repeating this with the second fan and reassembling in reverse order, the new fans are now mounted on the frame.
And after returning the frame to its original position, the graphics card is repaired.

Once the repaired graphics card is mounted on the mainboard again, the original thermal behavior is restored, as seen in the Psensor readings.
Second question: What's that spike-like sudden drop seen in the GPU temperature graph?
It is exactly the transition between an Asteroids@home task and a GPUGrid task.
Asteroids@home tasks are less power demanding than GPUGrid ones, which explains the observed difference in plateau temperature, while the spike is the transient near-zero power consumption between the two consecutive tasks.

Third question: What happened to the defective fan to make it stop?
Defective fan forensics
Careful observers may have noticed a small burn on the fan's plastic-coated label.
A close microscopic view confirms this.
And after removing the label, the failing component is found to be the fan driver chip.
Here is a close-up of the chip's surface, and here another at printed circuit board level.

Tip: When one of these small components short-circuits, there is a great amount of current available on the +12 VDC rail for it to get roasted...
🔥

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 55809 - Posted: 24 Nov 2020 | 23:41:04 UTC

Great repair summary. I have an EVGA 2070 that has a failed/failing fan that most of the time won't spin up. Sometimes I get lucky on a reboot and it will spin up and run until rebooted again.

I figure it has something to do with the driver chip also.

The card is under warranty but I have been reluctant to RMA it and lose production. I moved the card from the middle of the stack to the outside of the stack where it can breathe better. The card runs about 10 degrees warmer than it does when both fans are spinning. And I lose about 100 MHz of clocks due to downclocking.

Will have to do something about it eventually, probably when the warranty is about to run out or I decide to upgrade it with something better.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 55810 - Posted: 25 Nov 2020 | 0:15:02 UTC - in response to Message 55808.

The magic smoke makes ICs work. If it is ever released, the IC won't work anymore.

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 55812 - Posted: 25 Nov 2020 | 0:55:14 UTC - in response to Message 55810.

The magic smoke makes ICs work. If it is ever released, the IC won't work anymore.

That is gold!

ServicEnginIC, your microscope takes excellent pictures!

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 55813 - Posted: 25 Nov 2020 | 17:31:26 UTC - in response to Message 55808.
Last modified: 25 Nov 2020 | 17:33:29 UTC

probably stemmed from all the corrosion on pin#3.

what's the humidity like where this system runs?

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 55814 - Posted: 25 Nov 2020 | 19:56:27 UTC

That's not corrosion. That is the pin and trace burned up. The IC died not just internally.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 55815 - Posted: 25 Nov 2020 | 20:21:47 UTC - in response to Message 55813.

I live in an inland, mid-height (580 m above sea level) zone on the Canary Island of Tenerife.
Humidity is usually in the range 40 - 60 % RH most of the year.
During winter, we have to use a dehumidifier in the living room to make it habitable...
Really, not a good environment for electronics.
But this time I agree with Keith Myers that the cause was an internal short circuit, creating temperatures high enough for the Magic Smoke to be liberated (I liked it !-) and for some nearby PCB traces to get burnt.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55816 - Posted: 25 Nov 2020 | 21:50:32 UTC - in response to Message 55809.

... under warranty but I have been reluctant to RMA it and lose production.

I have the same issue with an MSI Z490-A PRO motherboard. When I load the memory properly or load all the slots, it won't boot. When I just use slots 3 and 4 it runs normally, and even though it reports single-channel memory it benchmarks well. It has the latest BIOS version and all MSI will do is tell me they'll give me an RMA#.

Next machine will be Asrock/evga, I'm thinkin.

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 55817 - Posted: 25 Nov 2020 | 22:09:11 UTC - in response to Message 55814.

That is the pin and trace burned up. The IC died not just internally.

Ouch, Keith.
Would an aftermarket liquid cooling system work?

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 55820 - Posted: 26 Nov 2020 | 0:35:15 UTC - in response to Message 55816.

... under warranty but I have been reluctant to RMA it and lose production.

I have the same issue with an MSI Z490-A PRO motherboard. When I load the memory properly or load all the slots, it won't boot. When I just use slots 3 and 4 it runs normally, and even though it reports single-channel memory it benchmarks well. It has the latest BIOS version and all MSI will do is tell me they'll give me an RMA#.

Next machine will be ASRock/EVGA, I'm thinkin.

Any time you are dealing with an LGA socket and missing memory channels, the first thing to do is pull the CPU, examine the socket for any pins in the outside rows that are out of alignment, then reseat the CPU and wiggle it a bit in the socket before clamping down the retention mechanism.

Reboot and see if the missing memory channels show up. The pad-to-pin alignment is fairly critical, on the order of 40 microns center to center.

Keith Myers
Message 55821 - Posted: 26 Nov 2020 | 1:41:46 UTC - in response to Message 55817.

That is the pin and trace burned up. The IC died not just internally.

Ouch, Keith.
Would an aftermarket liquid cooling system work?

Not really, the fan controller died. The card still works quite well, just at a higher temp than it would if both fans ran all the time.

I run my GPU fans at 100% all the time. I normally use EVGA Hybrid GPUs exclusively, but in this host I made my first attempt at custom GPU water cooling with a water-blocked 1080 Ti and 2080.

The issue is that I can't really fit the usual hybrid card between the two custom-blocked cards because the hybrid hoses occupy the same location as the bridge between the two custom cards.

So I ended up using standard air-cooled cards for the middle position between the two custom blocks. That card gets hot because there is no room to breathe. It runs both its fans all the time with no issue. But the one card that has an intermittent fan did not cut it in that location. So it got moved to the outside of the stack, next to the side panel, and it cools fairly well even with just one fan running most of the time.

I have plenty of GPUs I can substitute in that host, just not of the same caliber as the 2070. I never could figure out where to mount a hybrid card in that location because the hoses are not long enough to reach where I actually could mount a hybrid's radiator. The roof of the case is occupied by two 360mm radiators for the two custom loops.

bozz4science
Message 55829 - Posted: 26 Nov 2020 | 17:10:51 UTC
Last modified: 26 Nov 2020 | 17:20:34 UTC

I would be curious about your opinion on the storage requirements for a dual boot system. Currently, I plan to install Win 10 and Ubuntu 20.04 LTS. Should a 120GB SSD suffice for this purpose? Or rather 250GB? Didn't mean to interrupt the ongoing discussion here, but I never seemed to find the right moment to ask this question.

Profile ServicEnginIC
Message 55830 - Posted: 26 Nov 2020 | 18:03:24 UTC - in response to Message 55829.

My personal preference for dual boot systems is to install each operating system on its own independent drive, and use Linux GRUB to select which OS starts at each boot.
For Linux, a 120 GB SSD is currently enough, if used only for BOINC. For Windows, 240 GB is perhaps better than 120 GB.
I set Linux as the default OS, because it is not necessary to log in after a reboot for BOINC to start processing. This way, if there is a power outage while I'm not present, processing resumes on the Linux host as soon as power comes back.
Advantages of using two disks:
- If one of them fails, you'll always have the other one left to keep processing going.
- If you get new hardware, you can detach (for example) the Linux SSD and attach it to the new hardware, and in a moment you'll have two systems processing concurrently instead of only one.
Cons of using two disks:
- Two SSDs cost twice as much as one...

Keith Myers
Message 55831 - Posted: 26 Nov 2020 | 18:15:23 UTC - in response to Message 55830.

+100

Sound advice and what I recommend also.

Pop Piasa
Message 55832 - Posted: 26 Nov 2020 | 18:52:34 UTC - in response to Message 55820.

Any time you are dealing with an LGA socket and missing memory channels, the first thing to do is pull the CPU, examine the socket for any pins in the outside rows that are out of alignment, then reseat the CPU and wiggle it a bit in the socket before clamping down the retention mechanism.



Thanks Keith, I'll give that another try the next time we run out of work. The first time I did that I only checked for bent pins like the MSI support auto-reply states. Socket 1200 must have very tiny pins because I really didn't see them. Next time I'll use a loupe to examine things. I really appreciate your learned advice.

Keith Myers
Message 55833 - Posted: 26 Nov 2020 | 19:43:13 UTC - in response to Message 55832.

Any time you are dealing with an LGA socket and missing memory channels, the first thing to do is pull the CPU, examine the socket for any pins in the outside rows that are out of alignment, then reseat the CPU and wiggle it a bit in the socket before clamping down the retention mechanism.



Thanks Keith, I'll give that another try the next time we run out of work. The first time I did that I only checked for bent pins like the MSI support auto-reply states. Socket 1200 must have very tiny pins because I really didn't see them. Next time I'll use a loupe to examine things. I really appreciate your learned advice.


Yes, I have had issues with both a TR socket and a 2011-v3 socket that wouldn't read all the memory. A wiggle on the TR got it centered on the pins to read all channels.

On the 2011-v3 socket I had to move about 8 pins that were out of alignment. A 10X jeweler's loupe and strong illumination are best. I use a very bright tactical flashlight shining at a low angle across the pins.

What you are looking for is any pin ball-tip reflection that is out of alignment in X-Y with the other pins in its row and column.

Then I used a sewing needle to gently nudge the pins back into alignment.

Took about an hour and in the end I had all my memory channels reading correctly and the cpu is running well overclocked to this day.
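
The visual check described above can be sketched in code, under some loud assumptions: that the pin-tip reflections have somehow been measured as X-Y coordinates, and that the pins sit on a plain rectangular grid (real LGA sockets may use a staggered layout). The function name find_misaligned_pins, the 1.0 mm pitch and the sample coordinates are all hypothetical; the 0.04 tolerance simply echoes the roughly 40 micron figure mentioned earlier in the thread. A pin is flagged when it sits too far from the median of its row or its column.

```python
from statistics import median

def find_misaligned_pins(pins, tolerance):
    """Return sorted (row, col) indices of pins that are out of line.

    pins: dict mapping (row, col) -> (x, y) measured tip position.
    tolerance: largest allowed deviation, in the same unit as the positions.
    """
    rows, cols = {}, {}
    for (r, c), (x, y) in pins.items():
        rows.setdefault(r, []).append(y)   # pins in one row should share a Y value
        cols.setdefault(c, []).append(x)   # pins in one column should share an X value
    row_y = {r: median(ys) for r, ys in rows.items()}
    col_x = {c: median(xs) for c, xs in cols.items()}
    return sorted(
        (r, c)
        for (r, c), (x, y) in pins.items()
        if abs(x - col_x[c]) > tolerance or abs(y - row_y[r]) > tolerance
    )

# Hypothetical 3x3 patch of pins on a 1.0 mm pitch; pin (1, 2) has been nudged.
pitch = 1.0
pins = {(r, c): (c * pitch, r * pitch) for r in range(3) for c in range(3)}
pins[(1, 2)] = (2.0 + 0.15, 1.0 - 0.10)
print(find_misaligned_pins(pins, tolerance=0.04))  # prints [(1, 2)]
```

Using the median rather than the mean keeps one badly bent pin from shifting the reference line for its whole row or column.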

Profile ServicEnginIC
Message 55839 - Posted: 28 Nov 2020 | 22:03:59 UTC - in response to Message 55833.
Last modified: 28 Nov 2020 | 22:07:31 UTC

On the 2011-v3 socket I had to move about 8 pins that were out of alignment. A 10X jeweler's loupe and strong illumination are best. I use a very bright tactical flashlight shining at a low angle across the pins.

What you are looking for is any pin ball-tip reflection that is out of alignment in X-Y with the other pins in its row and column.

Then I used a sewing needle to gently nudge the pins back into alignment.

Fantastic!
With socket LGA775 it was much easier.
Doing this with a 2011 socket is truly a feat...

Pop Piasa
Message 55840 - Posted: 28 Nov 2020 | 23:40:00 UTC - in response to Message 55833.
Last modified: 28 Nov 2020 | 23:58:12 UTC

Okay, so I'm supposed to find the outer rows of pins on the processor and check the alignment with a loupe under bright light. Next time I reboot to patch my funky Windows OS I'll give it a go. This Z490 board seems flawless otherwise, and with 2 GTX 1650s putting out around 525,000 between them while my i5-10400 crunches 8 threads of WCG tasks, they're averaging close to the 225,000 benchmark that was no doubt set by a Linux machine when rod4x4 surveyed the stats.

Guess I shouldn't dis MSI because of my lack of tech savvy.
Thanks again.

Keith Myers
Message 55842 - Posted: 29 Nov 2020 | 3:25:35 UTC - in response to Message 55840.

I did some research on the 2011-v3 socket pin assignments and looked for the pins that handle Channel D.

Found what I needed in the Intel Core i7 Processor for LGA2011-v3 Socket document. The section labeled Processor Land Listing showed the pin assignments. That document also listed the maximum deviation of the absolute position of the land grid pins, which is a lot tighter than I would have expected. It doesn't take much misalignment to have a memory channel go missing.

They were in the outer row on the side nearest the VRM heatsink.

bozz4science
Message 55857 - Posted: 30 Nov 2020 | 13:51:46 UTC

Most NVMe SSDs that don't come with a heatsink included have a warranty sticker on top. Some manufacturers say that their stickers are meant to improve heat transfer, as there are apparently woven copper lines in them. While I question that, they also void the warranty if the sticker is removed. Now my question, if the mobo comes with included heatsinks for the M.2 NVMe drives:
(1) Should I just install the SSDs as they are, without the heatsink?
(2) Should I just put the heatsink on top of the sticker? Is this safe to do, in the sense of the heatsink's thermal pad directly touching the SSD's sticker?

If removing the sticker were the only way to properly and safely install the heatsink, I would probably opt for option 1. Thanks in advance for your input.

Ian&Steve C.
Message 55858 - Posted: 30 Nov 2020 | 14:57:16 UTC - in response to Message 55857.

i just put it over the sticker.

bozz4science
Message 55859 - Posted: 30 Nov 2020 | 15:09:00 UTC - in response to Message 55858.

Perfect, so that's what I'll do as well. Thanks!

Profile Retvari Zoltan
Message 55863 - Posted: 1 Dec 2020 | 0:44:07 UTC - in response to Message 55858.

i just put it over the sticker.
+1

Profile ServicEnginIC
Message 55867 - Posted: 1 Dec 2020 | 23:16:45 UTC
Last modified: 1 Dec 2020 | 23:20:40 UTC

On June 15th 2020 I related how I had to replace the motherboard, memory and processor in a system with a failing SATA controller.

- Cause: Motherboard's SATA controller section failure.

- Solution: Motherboard replacement.

Since it was a veteran system with a motherboard for Socket 775 processors and DDR3 memory, replacing the motherboard with a current one also meant replacing the CPU and RAM.

The failing motherboard in question is this one.
But recently I thought: this motherboard also has an integrated IDE controller; will it still be working?
The only way to find out is to give it a try...
Ok, I have the motherboard, memory and processor.
From my wardrobe of fossils I rescued:
- An old scrapped minitower chassis
- An IDE 160 GB hard disk drive and IDE cable
- An IDE DVD optical drive, and DVD media with the Ubuntu 20.04.1 LTS image, downloaded from the official Ubuntu site.
- A 600 W PSU left over from a previous PSU upgrade in a triple GPU system
- Two copper-core heatsinks for 775 processors, discarded because their mounting backplates were broken
- A working low-power GT 1030 GPU, left over from a GPU upgrade on another system
- An ancient Wireless-G PCI network adapter
Therefore, I had everything I needed to assemble a "new" host.
I started by cutting out four healthy mounting pieces for the CPU heatsink and fitting them to the motherboard.
Then, with a little handiwork, the motherboard and peripherals were assembled into the chassis.
Finally, the PSU went into position too, just above the CPU and next to the graphics card in this compact mini-tower chassis.

Power on!
The initial beep sounds, followed by video; entering BIOS setup is successful, and both the IDE hard disk and the optical drive are recognized. Ok!
Time for OS installation. The Linux install media is inserted into the optical drive... but it isn't recognized, and a subsequent attempt to eject it doesn't work.
Well, there is a Spanish saying: "Mal comienzo, buen final" ("Bad start, happy ending")
Poking a watchmaker's screwdriver into the emergency eject hole brings the media tray out.
And a look at the eject mechanism pulleys confirms that the drive belt is broken. It is quite a usual failure, and I like to keep spare belts available (Spanish "correas").
With a new belt fitted and the drive reassembled, the Ubuntu installation media runs happily, and OS installation can be carried out.

Was it worth it or not?
It can be seen working at GPUGrid as host #567828 since November 15th.
It has crunched its first million credits with zero errored tasks, and its current RAC is over 65000 after 16 days of work.
This is what I expected to achieve with its low-power 30 W TDP GPU...
I really enjoy giving a second or third life to hardware resources!

PS: Due to the diverse provenance of the scrapped parts composing this host, I've called it: "Mr. Frankhostein" ;-)

bozz4science
Message 55870 - Posted: 2 Dec 2020 | 23:08:05 UTC
Last modified: 2 Dec 2020 | 23:10:33 UTC

Sorry for diverting attention once again away from the recent discussion. Currently I am very confused as to what PSU characteristics are required to be compatible with the Asus ROG Strix B550 Gaming mobos. As far as I understand it, you are supposed to run an 8 pin + 4 pin connector in addition to the standard 24 pin connector powering the mobo. The 8 pin is the usual CPU power connector that supplies the mobo's CPU socket, but what is the 4 pin connector for? The B550 Gaming E/F/F Wifi all seem to have the additional 4 pin connector and I am not sure if the PSU I found on sale today will fit the bill (Corsair RM750i). Am I missing something, or is there really a connector missing to run the mobo (safely)? Many of the PSUs I researched seem to be incompatible with these boards. Is the additional power only needed for extreme overclocking, so that the board can be run safely without it, or is it a necessary requirement? The manual of the mentioned mobos labels these connectors EATX12V (4+8 pin). Thanks for any advice.

Ian&Steve C.
Message 55871 - Posted: 2 Dec 2020 | 23:59:52 UTC - in response to Message 55870.
Last modified: 3 Dec 2020 | 0:00:48 UTC

the 4-pin is exactly half of the 8-pin connector. most PSUs have the 8-pin actually break out into 2x4-pin.

however, the extra 4-pin is totally unnecessary. you do not have to use it, just the 8-pin is enough. the extra 4-pin is intended to add extra power for extreme overclocking, like using LN2 where you could be pulling huge amounts of power.

personally, having an 8-pin + 4-pin on a B550 board doesn't make sense to me. B550 is a more budget, lower-end board. and the guys who will seriously do extreme overclocking with LN2 aren't going to use a B550 board lol, they will be using a top end X570. it just seems like something the board manufacturers do because it costs them an extra 5 cents to add the 4-pin, to justify marking the board up another $5.

Keith Myers
Message 55872 - Posted: 3 Dec 2020 | 0:33:48 UTC

Every power supply I own is modular. And they all come with at least 3 EATX power cables.

(1) 8-pin EATX12V
(2) 4-pin EATX12V

The extra 4 pin EATX cable is usually intended for multi-gpu usage. The reason is to reduce as much as possible the 12V current draw being supplied by the standard 24-pin EATX motherboard cable.

Plenty of stories about burning up the 24 pin cable when running all possible PCIE slots with high powered gpus when you don't have auxiliary 12V power being supplied.

bozz4science
Message 55874 - Posted: 3 Dec 2020 | 12:11:06 UTC
Last modified: 3 Dec 2020 | 12:14:29 UTC

Thank you both for your explanations! Much appreciated.

however, the extra 4-pin is totally unnecessary. you do not have to use it, just the 8-pin is enough. the extra 4-pin is intended to add extra power for extreme overclocking, like using LN2 where you could be pulling huge amounts of power.

The extra 4 pin EATX cable is usually intended for multi-gpu usage. The reason is to reduce as much as possible the 12V current draw being supplied by the standard 24-pin EATX motherboard cable.

So in essence, could I summarise your replies as follows: It is generally safe to run B550 mainstream chipset boards without the extra 4 pin connector as long as I am not planning to run all PCIe slots with high-end GPUs. However, it is not a disadvantage to connect the 4 pin EATX 12V connector in addition to the 8 pin EATX 12V connector for the CPU; rather, it avoids unnecessary strain on the PSU? Definitely not in the league of LN2 cooling, Ian, lol

it just seems like something the board manufacturers do because it costs them an extra 5 cents to add the 4-pin, to justify marking the board up another $5.
That definitely sounds like something that hardware producers would do...

In the end I opted for a mainstream chipset board that IMO was the right balance between features and price. The Asus Strix B550-E Gaming still offers 2x Gen4 x16 slots that run in dual x8 mode. The rest is pretty much standard for mainstream retail boards, and the price premium of nearly 50% over "higher-end" mainstream B550 boards was not justified for me personally. This is pretty much all I will require for the foreseeable future. In addition, I didn't like that the X570 boards all require active cooling for the chipset, which is an additional source of noise and point of failure. For my first build, I really wanted to go with cost-efficient components.

While it appears that the mentioned PSU actually had only 1x 8-pin EATX12V connector in its version prior to 2018, I checked my PSU shortlist against what you just taught me, and I can now confirm that most modern PSUs actually do come with the very same split that you pointed out, Keith.
(1) 8-pin EATX12V
(2) 4-pin EATX12V


Is ATX12V a connector standard evolution of the EPS12V connector? I saw these names interchangeably in various manuals and wondered whether they are the same or compatible.

Ian&Steve C.
Message 55875 - Posted: 3 Dec 2020 | 14:53:53 UTC - in response to Message 55874.

it won't hurt anything to plug it in if you have it. if you don't have the extra connector, don't worry about it, you don't need to replace the PSU just for that connector.

Keith Myers
Message 55877 - Posted: 3 Dec 2020 | 18:33:28 UTC - in response to Message 55874.

Is ATX12V a connector standard evolution of the EPS12V connector? I saw these names interchangeably in various manuals and wondered whether they are the same or compatible.

They are used interchangeably but actually have different origins. The EPS12V standard came out of the Server System Infrastructure (SSI) group.

It has a 24-pin ATX motherboard connector (same as ATX12V v2.x), an 8-pin secondary connector and an optional 4-pin tertiary connector. Rather than include the extra cable, many power supply makers implement the 8-pin connector as two combinable 4-pin connectors to ensure backwards compatibility with ATX12V motherboards.


The ATX12V standard came out of the ATX group.

Most common PCs and motherboards are built to ATX standards, so they should be paired with ATX power supplies.

Read through the power supply section of the ATX wiki.

https://en.wikipedia.org/wiki/ATX#ATX_power_supply_revisions

And there is an upcoming transition called ATX12VO that removes all of the voltages except for 12V from the motherboard main connector and replaces the 24 pin connector with a 10 pin one.

bozz4science
Message 55878 - Posted: 3 Dec 2020 | 18:45:45 UTC

Very enlightening! Thanks, even though I feel a bit stupid now as I had the very same article opened this morning ... Old habit to search the wiki article in German first even though the English versions tend to be much better organized. Thanks again for the short 101 of PSU connectors :)

Keith Myers
Message 55879 - Posted: 3 Dec 2020 | 19:18:37 UTC

I'm interested in what changes the new ATX12VO standard is going to cause in the industry. A lot more motherboard real estate is going to be used up by 12V DC-DC converters to create the +5V and +3.3V needed for SATA storage devices. Also the SATA power connectors will have to move off the power supply and onto the motherboard.

The benefit will be no more burnt-up 12V pins, melted Molex housings and yellow wires on the ATX 24-pin motherboard connector. The pin and wire gauge will go up considerably to handle the higher current draws on the 12V lines.

The disadvantage will likely be even fewer multi-PCIe-slot motherboards, with more boards supporting only a single GPU. That will probably force people who need to host multiple GPUs onto the HEDT and server platforms.

Profile ServicEnginIC
Message 55880 - Posted: 4 Dec 2020 | 8:34:22 UTC
Last modified: 4 Dec 2020 | 8:49:02 UTC

Sorry for diverting attention once again away from the recent discussion. Currently I am very confused as to what PSU characteristics are required...

bozz4science, you're welcome.
Every question regarding hardware is on topic here.
And this has led me to learn new subjects thanks to the kind clarifications from Ian&Steve C. and Keith Myers.

On the practical side:
it won't hurt anything to plug it in if you have it. if you don't have the extra connector, don't worry about it, you don't need to replace the PSU just for that connector.

+1

I've been running this triple GPU system for months without any problem. It is Host #480458 at GPUGrid.
The motherboard is a Gigabyte Z390 UD for LGA1151 processors, and it has three PCI Express slots for graphics cards.
The motherboard's extra 4-pin supply connector is left unconnected, because the 750 W PSU has no connector available to match it.
All three graphics cards are based on the GTX 1650 GPU. This way, there isn't any problem when GPUGrid tasks are restarted on a different card.
Two of the cards (cards 2 and 3) get their power directly from the PCIe slots. Their rated TDP is 75 W each.
And for card number 1 I chose a model with an extra 6-pin PCIe power connector, so as not to stress the motherboard supply too much.
This last card is a factory-overclocked model, with a rated TDP increased to 85 W.
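
A minimal sketch of the power budget described above, assuming each card really draws its rated TDP, that a PCIe slot delivers at most 75 W, and that a 6-pin PCIe cable is rated for 75 W (these limits and the helper name slot_power_range are assumptions for illustration, not measurements from this host). Since the split between slot and cable on card 1 is not known, the sketch reports a range for what the motherboard has to deliver.

```python
SLOT_LIMIT_W = 75      # assumed per-slot limit
SIX_PIN_LIMIT_W = 75   # assumed rating of a 6-pin PCIe power cable

def slot_power_range(tdp_w, has_six_pin):
    """Return (min, max) watts a card may draw through its PCIe slot."""
    if not has_six_pin:
        return (tdp_w, tdp_w)              # everything comes from the slot
    low = max(0, tdp_w - SIX_PIN_LIMIT_W)  # cable carries as much as it can
    high = min(tdp_w, SLOT_LIMIT_W)        # slot carries as much as it can
    return (low, high)

# (rated TDP in watts, has a 6-pin connector?) for cards 1, 2 and 3
cards = [(85, True), (75, False), (75, False)]
ranges = [slot_power_range(tdp, six_pin) for tdp, six_pin in cards]

total_tdp = sum(tdp for tdp, _ in cards)
board_min = sum(low for low, _ in ranges)
board_max = sum(high for _, high in ranges)
print(total_tdp, board_min, board_max)  # prints 235 160 225
```

Under these assumptions the three cards total 235 W, of which somewhere between 160 W and 225 W passes through the motherboard slots.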

Pop Piasa
Message 55881 - Posted: 5 Dec 2020 | 0:43:15 UTC - in response to Message 55880.

I've been running this triple GPU system for months without any problem. It is Host #480458 at GPUGrid.
The motherboard is a Gigabyte Z390 UD for LGA1151 processors, and it has three PCI Express slots for graphics cards.
The motherboard's extra 4-pin supply connector is left unconnected, because the 750 W PSU has no connector available to match it.
All three graphics cards are based on the GTX 1650 GPU. This way, there isn't any problem when GPUGrid tasks are restarted on a different card.
Two of the cards (cards 2 and 3) get their power directly from the PCIe slots. Their rated TDP is 75 W each.
And for card number 1 I chose a model with an extra 6-pin PCIe power connector, so as not to stress the motherboard supply too much.
This last card is a factory-overclocked model, with a rated TDP increased to 85 W.


I thought I solved the issue of the extra 4-pin CPU power input by using a 4 to 2x4 pin splitter adaptor. I guess it probably was a waste of time and money powering a 65W i5-10400 but MSI stated that all the CPU power plugs must be connected when I couldn't get it to boot. Turns out Keith Myers nailed the problem as bent processor pins.

This thread is awesome (grosso) for us, ServicEnginIC. Thanks again.

Profile ServicEnginIC
Message 55884 - Posted: 5 Dec 2020 | 9:14:05 UTC

Taken from the manual of the recently mentioned Z390 UD motherboard:
Notice the last warning in the picture... Perhaps not everyone who handles hardware knows this...

Profile Retvari Zoltan
Message 55885 - Posted: 5 Dec 2020 | 12:53:01 UTC - in response to Message 55881.
Last modified: 5 Dec 2020 | 12:54:41 UTC

I guess it probably was a waste of time and money powering a 65W i5-10400 but MSI stated that all the CPU power plugs must be connected when I couldn't get it to boot. Turns out Keith Myers nailed the problem as bent processor pins
You have found the reason for your system's boot failure, and the way you've fixed it shows that it has nothing to do with the number of power connectors, yet you conclude that you don't need the extra 4-pin power connector.
The real reason for this incorrect conclusion is that Intel made you think that your i5-10400 consumes 65W, while in real life it consumes about 50% more than that when you actually use it.
You can check the actual power consumption with CoreTemp (or similar tools).

Originally (in electronics) TDP stood for Total Dissipated Power, which is a real-life measure of the sum of the power dissipation of all components at full load. This is used for the design of the cooling solution to make critical components stay under their maximum allowed working temperature.

Intel changed the definition of TDP to Thermal Design Power, which may look the same, but read every word of their definition carefully:
What is Thermal Design Power (TDP)?

TDP stands for Thermal Design Power, in watts, and refers to the power consumption under the maximum theoretical load. Power consumption is less than TDP under lower loads. The TDP is the maximum power that one should be designing the system for. This ensures operation to published specs under the maximum theoretical workload.
Have you noticed the word "theoretical"? I assure you that real-world workloads (like crunching) are way beyond that, therefore you should not design the CPU power lines and cooling of cruncher PCs by their TDP figures.
They are giving a quite polite hint of it:
What is the maximum power consumption for my processor?
Under a steady workload at published frequency, it is TDP. However, during turbo or certain workload types such as Intel® Advanced Vector Extensions (Intel® AVX) it can exceed the maximum TDP, but only for a limited time, or

    · Until the processor hits a thermal throttle temperature, or
    · Until the processor hits a power delivery limit.

The final blow to their definition of TDP comes when you click on the TDP text or the question mark next to it on any processor's product specification page (for example the i5-10400):
TDP
Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload. Refer to Datasheet for thermal solution requirements.
This definition is slightly different from the previous one, but it sums up the previous two quotes:
The main point is that the i5-10400 has a TDP of 65W at 2.9GHz with all cores active, while they advertise it as a 4.3GHz CPU, which it actually is, but its TDP is much higher than 65W at 4.3GHz.
This is true for all Intel processors which have a "base frequency" and a "turbo frequency".
Intel never discloses the TDP of their processors at "turbo frequency", but they advertise them as being that ("turbo frequency") fast.
This is a questionable way of making their processors more appealing.
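
For Linux hosts, where CoreTemp is not available, a rough way to check the real package power is to sample the kernel's RAPL energy counter twice and divide by the elapsed time. The sketch below assumes an Intel CPU whose counter is exposed under /sys/class/powercap/intel-rapl:0 (the path may differ per system, and reading it may require root); the function names and the sample readings are hypothetical, chosen so the result lands at 50% above a 65 W TDP.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL counter; adjust for your system.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")

def average_power_w(e_start_uj, e_end_uj, seconds, max_range_uj):
    """Average power in watts between two energy readings in microjoules."""
    delta = e_end_uj - e_start_uj
    if delta < 0:                 # counter wrapped around between the readings
        delta += max_range_uj
    return delta / 1e6 / seconds

def measure_package_power(interval_s=5.0):
    """Sample the energy counter twice and return the average watts."""
    max_range = int((RAPL_DIR / "max_energy_range_uj").read_text())
    start = int((RAPL_DIR / "energy_uj").read_text())
    time.sleep(interval_s)
    end = int((RAPL_DIR / "energy_uj").read_text())
    return average_power_w(start, end, interval_s, max_range)

# Made-up readings: 487.5 J consumed over 5 s.
print(average_power_w(1_000_000, 488_500_000, 5.0, 2**32))  # prints 97.5
```

On a host that exposes the counter, calling measure_package_power() while BOINC is crunching gives a figure that can be compared directly against the advertised TDP.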

Pop Piasa
Message 55886 - Posted: 5 Dec 2020 | 17:11:45 UTC - in response to Message 55885.

Thanks much Zoltan, yes, when I look at the MSI Afterburner hardware monitor it shows that when running around 90% capacity, my 10-400 consumes close to 70 watts so I sure won't call the extra power plug on this z-490 board unnecessary. I hope that splitting one of the power leads into two plugs doesn't cause me problems down the road. I should have upgraded to a semi-modular PSU. The current one is a EVGA 750W White model.

I'm assuming that the PCIE slots are powered by these plugs also. Is that correct? That adds up to 150 watts powering 2 GTX 1650s, if so.

Thanks for posting all that info. Every day is a good day to learn.

Pop Piasa
Message 55887 - Posted: 5 Dec 2020 | 17:20:47 UTC - in response to Message 55884.

Thank you too, ServicEnginIC. I don't recall if MSI was that detailed or not with their instruction. I admit I probably glanced at the directions and then went about slapping it in with a total lack of attention.

That's what makes you guys such valuable mentors in this thread!

Multi gracias!

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 55888 - Posted: 5 Dec 2020 | 18:37:57 UTC - in response to Message 55886.

But the extra 4-pin IS unnecessary if you’re already using an 8-pin. The 8-pin alone can supply over 200W. Your 70W chip isn’t stressing it at all.
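
A back-of-the-envelope sketch of that headroom, for anyone who wants to plug in their own numbers. An 8-pin CPU connector carries four +12V pins and a 4-pin one carries two; the 6 A per pin used here is an assumed, fairly conservative figure for illustration only, since the real limit depends on the terminals and wire gauge of the particular PSU and board. The function name is made up for this example.

```python
def connector_watts(pins_12v, amps_per_pin, volts=12.0):
    """Power a connector can deliver, given its number of +12V pins."""
    return pins_12v * amps_per_pin * volts


ASSUMED_AMPS_PER_PIN = 6.0  # assumption for illustration, not a datasheet value

eps_8pin = connector_watts(4, ASSUMED_AMPS_PER_PIN)  # four +12V pins
atx_4pin = connector_watts(2, ASSUMED_AMPS_PER_PIN)  # two +12V pins
cpu_draw = 70  # watts, the reading reported for the 10-400 above

print(f"8-pin alone: {eps_8pin:.0f} W, headroom over CPU: {eps_8pin - cpu_draw:.0f} W")
print(f"4-pin alone: {atx_4pin:.0f} W")
```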

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 55889 - Posted: 6 Dec 2020 | 2:12:23 UTC

The PCIE slots are powered mainly from the 24 pin EATX connector. That is why an extra EATX12V connector on the motherboard is desirable when all the slots are occupied by high powered GPU cards (200W or more), so you don't risk burning up the 24 pin connector.

Most modern CPUs pull enough power to require at least the 4 pin CPU EATX power plug to be connected, or preferably the 8 pin connector.

Most mobo manuals state the 8 pin should be connected if overclocking.

Also, how much power is pulled from the slot depends on the card hardware and firmware. According to the specification, the PCIE slot can supply a maximum of 75W, of which only 66W is for the 12V lines. The slot also has 3.3V lines that get the rest of the 75W allotment. But a card does not necessarily pull the max 66W from the slot if it is designed to pull the majority of its power from the auxiliary PCIe 12V connectors that high powered cards have.
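
The slot budget above can be written out as a small worked example. The 75 W and 66 W limits are the ones quoted in this post; the helper name is hypothetical, and the result is only a lower bound, since a card may draw less than the maximum from the slot and more from its auxiliary connectors.

```python
SLOT_TOTAL_W = 75  # maximum the slot may supply
SLOT_12V_W = 66    # portion available on the 12V lines

slot_3v3_w = SLOT_TOTAL_W - SLOT_12V_W  # what is left for the 3.3V lines
slot_12v_amps = SLOT_12V_W / 12         # current on the slot's 12V lines


def min_aux_watts(card_watts, slot_watts=SLOT_TOTAL_W):
    """Least power a card must take from its auxiliary PCIe connectors."""
    return max(0, card_watts - slot_watts)


print(f"3.3V share of the slot: {slot_3v3_w} W")
print(f"12V slot current: {slot_12v_amps} A")
print(f"200 W card needs at least {min_aux_watts(200)} W from aux connectors")
print(f"Two slot-powered cards can pull up to {2 * SLOT_TOTAL_W} W through the board")
```

The last line is the 150 W case asked about earlier: two slot-powered cards at the 75 W limit each.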

Profile ServicEnginIC
Message 55891 - Posted: 7 Dec 2020 | 11:47:32 UTC

But the extra 4-pin IS unnecessary if you’re already using an 8-pin. The 8-pin alone can supply over 200W.

I agree that in my case currently it is not necessary, but what if it was?

I'll try to make a solution with my available resources.
Returning to this triple GPU system, it is supplied by a semi-modular Mars Gaming MPII 750 PSU.
This PSU has a hardwired 20+4 pin ATX motherboard connector, and 4+4 pin connectors for supplying the motherboard with +12V.
It also has one modular +12V output supplying 2x PCIE 6+2 pin power connections, and three more combined modular outputs, two of them remaining available.
And I had kept (just in case) the cut-off CPU 4+4 pin power wiring and two combined modular cables from a broken PSU... That is all I need, apart from some (fun) handiwork.
Tools and materials required: Cutting pliers, cutter, gas lighter, soldering iron, solder, heat-shrink sleeve, and several cable ties.
I'll start by separating the necessary connectors, and then removing the unnecessary red (+5V) and orange (+3.3V) wires. Now only the black (ground) and yellow (+12V) wires remain.
(- Let's plug in the soldering iron now, so that it is hot enough when needed -)
Then I strip about 12 mm (~1/2 inch) of insulation from every wire end, by carefully cutting around it and peeling it off.
Now I fan out the stripped wire ends into what I call "the shape of a peacock's tail".
Then these spread wire ends are overlapped and twisted together to obtain this result.
(It's said that the world can be divided into two groups: people who twist clockwise, and those who prefer twisting counterclockwise... It doesn't matter, the result is the same ;-)
Now slide about 32 mm (~1 1/4 inch) of heat-shrink sleeve over each group of wires, and push it well away from the wire ends.
Let me introduce at this point a tool that I initially forgot: this practical "third hand" to hold the wires while soldering them.
Once the soldering iron is hot enough (important), heat the stripped wire end well while generously applying solder along it, as can be seen (?) in this first-take video.
And in this other video we can see how the yellow wires are joined together by melting their pre-applied solder.
Finally, we protect the bare junction with the pre-inserted (?!?) heat-shrink sleeve. I usually use the colder blue portion of a gas flame, spreading the heat quickly along the whole sleeve until it shrinks completely.
In the same way, I'll join all the black wires together and protect them with insulating sleeve.
Once the soldering is finished, don't forget to unplug the soldering iron and pick up all the leftovers.
We finish by bundling the wires with evenly spaced cable ties.
That's all. Here we have the final result.

Does it work?
Here are before, after 01, after 02, and after 03 images of Host #480458

Pop Piasa
Message 56052 - Posted: 18 Dec 2020 | 20:05:09 UTC - in response to Message 55833.

During the server outage yesterday I tried out Keith Myers' suggestion of checking the CPU socket pins to explain my missing memory channel and he was spot-on. Two bent pins, 2nd & 3rd from the bottom of the 3rd column from the right when viewing the socket with the alignment corner bottom left side.

I used my smallest flat jewelers screwdriver, but it still felt a bit like using a Track Hoe to straighten a parking bollard, looking through the loops. Fortunately, my old hand was steady enough and I now have dual-channel DDR4-3000 running @ 2933 MHz, even with a locked 10-400 processor (I didn't know that was possible). My bad to diss my MSI Z490 board, it rocks!

Anyway, thanks to Keith and ServicEnginIC, also anyone else who advised me on this problem.

Profile ServicEnginIC
Message 56053 - Posted: 18 Dec 2020 | 22:14:02 UTC - in response to Message 56052.

Congratulations!
Hurrah to your courage, and to Keith Myers for his wise advice.
🙌

Keith Myers
Message 56054 - Posted: 18 Dec 2020 | 23:39:44 UTC - in response to Message 56052.

Happy to hear of your successful socket surgery Pop.

It is a bit nerve wracking to intentionally introduce any object into the field of LGA pins in a socket.

But with careful patience and steady hands, you can perform some mild manipulations to get all your memory channels readable.

Pop Piasa
Message 56055 - Posted: 19 Dec 2020 | 1:42:08 UTC - in response to Message 55891.

(It's said that the world can be divided into two groups: people who twist clockwise, and those who prefer twisting counterclockwise...


Do you suppose that has anything to do with what hemisphere you live in, amigo?

I guess that question needs to be answered by rod4x4.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 56056 - Posted: 19 Dec 2020 | 3:30:31 UTC - in response to Message 56055.

(It's said that the world can be divided into two groups: people who twist clockwise, and those who prefer twisting counterclockwise...


Do you suppose that has anything to do with what hemisphere you live in, amigo?

I guess that question needs to be answered by rod4x4.

Clockwise or counter-clockwise, I am dizzy at the skill and precision of all the work ServicEnginIC presents. Attention to detail is amazing! Well done ServicEnginIC!

Pop Piasa
Message 56081 - Posted: 19 Dec 2020 | 21:37:30 UTC - in response to Message 56056.

(It's said that the world can be divided into two groups: people who twist clockwise, and those who prefer twisting counterclockwise...


Do you suppose that has anything to do with what hemisphere you live in, amigo?

I guess that question needs to be answered by rod4x4.

Clockwise or counter-clockwise, I am dizzy at the skill and precision of all the work ServicEnginIC presents. Attention to detail is amazing! Well done ServicEnginIC!


Dittos, mate! I would feel quite secure knowing he serviced the ventilator I was on, were I in that situation. 👍👍

Profile ServicEnginIC
Message 56218 - Posted: 1 Jan 2021 | 0:06:16 UTC

Pop Piasa
Message 56252 - Posted: 9 Jan 2021 | 2:11:11 UTC

I've a discovery to share with anyone who runs a Dell OptiPlex mini-tower.

Because of the design of the CPU cooler, these cases direct the exhaust airstream directly across the GPU cooler intake if the graphics card is in the top (fastest) PCIE slot. This is compounded by the card being upside-down from the ATX boards and pulling air downward from above.

My GTX 750ti Dell card was running 70C and would only cool to 67C with the case side panel removed, until I set a 3 inch PVC tube coupling atop the spiral cooler.

What a difference!

The tube pulled cool air from above the CPU exhaust stream and reduced recycling of its own exhaust heat, dropping temperature by several degrees C.

I was able to improve cooling and achieve a GPU temperature of 124F/52C by fashioning an extension of metal foil duct tape to scoop air from the open side of the machine. I used tiny strips of the tape to affix the tube to the fan shroud.

I closed the case again and my GPU ran at around 57C, crunching MDADs. It's running at up to 60C while crunching Fahcore CUDA tasks at this time, with the CPU crunching WCG tasks.

Profile ServicEnginIC
Message 56253 - Posted: 9 Jan 2021 | 10:02:18 UTC - in response to Message 56252.

Thank you very much for sharing it.
By putting imagination into practice, much can be done to improve cooling.
I particularly like solutions of the kind you are describing.
There is much to gain and little to lose... Well... Not exactly... You have lost 10 ºC from your GPU temperature ;-)
Well done!

rod4x4
Message 56254 - Posted: 9 Jan 2021 | 13:45:30 UTC - in response to Message 56252.

I've a discovery to share with anyone who runs a Dell OptiPlex mini-tower.

Because of the design of the CPU cooler, these cases direct the exhaust airstream directly across the GPU cooler intake if the graphics card is in the top (fastest) PCIE slot. This is compounded by the card being upside-down from the ATX boards and pulling air downward from above.

My GTX 750ti Dell card was running 70C and would only cool to 67C with the case side panel removed, until I set a 3 inch PVC tube coupling atop the spiral cooler.

What a difference!

The tube pulled cool air from above the CPU exhaust stream and reduced recycling of its own exhaust heat, dropping temperature by several degrees C.

I was able to improve cooling and achieve a GPU temperature of 124F/52C by fashioning an extension of metal foil duct tape to scoop air from the open side of the machine. I used tiny strips of the tape to affix the tube to the fan shroud.

I closed the case again and my GPU ran at around 57C, crunching MDADs. It's running at up to 60C while crunching Fahcore CUDA tasks at this time, with the CPU crunching WCG tasks.

Assessing the issue and implementing a fix: great job. Many thanks for sharing. Now I have to go and have a second (third or fourth) look at my Hosts.

Pop Piasa
Message 56257 - Posted: 10 Jan 2021 | 2:34:09 UTC

Thanks, guys. I thought about posting a picture, but my finished product isn't as photogenic as those ServicEnginIC has chronicled. 😉

Pop Piasa
Message 56259 - Posted: 10 Jan 2021 | 4:18:52 UTC - in response to Message 56253.
Last modified: 10 Jan 2021 | 5:15:26 UTC

Incidentally, my single-fan GPU cooler runs quieter at 100% speed with the intake mod. Also a bonus.
Additionally, the larger diameter end of the CPVC coupler should mate with the fan shroud. This was the best performance for me.

Profile ServicEnginIC
Message 56298 - Posted: 16 Jan 2021 | 18:54:58 UTC

Taking advantage of the current impasse at Gpugrid, I've calmly carried out some pending hardware changes on several of my hosts.
One of these changes consisted of moving my highest-performance graphics card so far, based on a GTX 1660 Ti GPU, from a host with a 2nd generation PCIe motherboard to one with a 3rd generation PCIe motherboard.

This particular graphics card is very hard to keep cool.
I've made an air flow improvement on the receiving host that I'll share next.

The chassis of this host had no upper air exhaust other than the opening for the PSU.
Hot air is less dense, so natural convection makes it accumulate in the upper part of the chassis.
Adding a new fan-assisted air exhaust at the top will better evacuate hot air from the top and improve cold air circulation from the bottom.

- How can we cut a proper opening to fit an 80x80x25 mm fan?
The easiest way I've found is with a 79 mm bi-metal hole saw (crown drill). It is suitable for drilling the aluminum or mild steel that computer chassis are generally made of.

- Is it safe to drill into a computer chassis with hardware mounted in it?
Normally it is a very risky manoeuvre, because any metal splinter or shaving will surely short-circuit the most expensive component (if Murphy's law applies ;-)

I've developed my own technique for this.
I had my eye on this 140 mm diameter plastic vegetable cream container...
First I cover the inside top of the chassis with American tape. We call it American tape (cinta americana) in Spain. I don't know what it is called in America (?)
Then, in the same way, I attach the (ultra-tech) leftovers container on the inside, directly below the chosen fan position.
I start by drilling a 6 mm pilot hole at the center, using a common 6 mm metal drill bit.
Then the 79 mm hole saw comes into action. Slowly, and with several pauses, because at low revolutions the electric drill might overheat!
Take special care and apply little pressure at the end, when the cut is about to detach the inner circular plate.
Here we have it.
Now I position and center the fan in place, and mark the positions of the fan mounting holes with a long pencil lead.
I prefer to drill these holes first with a 2.5 mm metal drill bit. This way, it is easier to keep to the marked position. Then I enlarge them with the 6 mm bit.
Once all drilling is finished, we can detach our protective fixture and all its adhesive tape.
And now the fan and its protective grille are fixed in position, with the air flow directed outwards.
This is the final look of the resulting host.

It is prepared for the new Gpugrid WUs to come...
🤔️

Profile ServicEnginIC
Message 56304 - Posted: 18 Jan 2021 | 6:54:41 UTC

Another change consisted of:
- Replacing the motherboard, which had a failed SATA controller, on one system.
- Cloning this host's previous IDE mechanical HDD to a much faster SATA SSD.
- Replacing the GT 1030 graphics card with a GTX 750.
Here are pictures of this system:
Before

and After


Some considerations that may be of interest:
- I've found Linux to be very resilient to this kind of refurbishment, usually adapting to the changes with no need for further action.
- I've discovered a very useful native Linux command to clone an old drive to a new one.
Taking this as an example:
Here we can see the old (year 2006) 160 GB IDE drive, and the new 240 GB SATA SSD replacing it.
Here the differences between IDE (parallel) and SATA (serial) connections can be seen in detail.
And here is a picture of the drive arrangement used for the cloning operation.
The 2.5" SATA SSD is installed in a SATA to USB external enclosure, while the 3.5" HDD is connected to an IDE to USB adapter and an external PSU.
Linux assigned the HDD (source) to /dev/sde, and the SSD (destination) to /dev/sdd.
In this situation, the following terminal command was used:

sudo dd if=/dev/sde of=/dev/sdd bs=1M

As written, this command gives no progress indication. Patience is required, but after about one hour the following message appeared:

160041885696 bytes (160 GB, 149 GiB) copied, 4090,62 s, 39,1 MB/s
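
As a quick sanity check on the figures dd reported, the arithmetic can be reproduced in a few lines of Python. Nothing here touches a drive: the two numbers are copied from the message above, and the variable names are my own.

```python
bytes_copied = 160_041_885_696  # from the dd message
seconds = 4090.62               # from the dd message (decimal comma in the original locale)

gigabytes = bytes_copied / 10**9          # decimal GB, as used on drive labels
gibibytes = bytes_copied / 2**30          # binary GiB
mb_per_second = bytes_copied / seconds / 10**6
minutes = seconds / 60

print(f"{gigabytes:.0f} GB = {gibibytes:.0f} GiB")
print(f"{mb_per_second:.1f} MB/s over {minutes:.0f} minutes")
```

The results agree with the reported 160 GB, 149 GiB and 39.1 MB/s, and the duration works out to a little over an hour.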

And after that, the SSD was used to successfully replace the old HDD in the refurbished system.

Notes:
- Please be extremely careful to adapt the indicated command to your particular source (if) and destination (of) drives. Otherwise, data may be overwritten if the wrong destination is chosen.
- The destination drive will be an identical copy of the source, so it must be of equal or greater size.
- If the destination drive is larger, partition resizing tools (GParted, for example) can be used afterwards to claim the extra space.
- Not only Linux drives can be cloned: this tool is also useful for gaining storage space on other operating systems, with no risk (the source drive remains unchanged as a backup).
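
The first two notes can be turned into a small pre-flight check. This is only a sketch: the helper name is hypothetical, the 240 GB figure is the nominal size of the SSD rather than a measured one, and the script just prints the command instead of running it. Real sizes in bytes can be read beforehand with lsblk -b or blockdev --getsize64. The status=progress option is a GNU dd feature (coreutils 8.24 or later) that prints periodic transfer statistics, which helps with the long silent wait.

```python
def build_dd_command(source, destination, source_bytes, destination_bytes):
    """Return a dd argument list, refusing obviously unsafe combinations."""
    if source == destination:
        raise ValueError("source and destination are the same device")
    if destination_bytes < source_bytes:
        raise ValueError("destination drive is smaller than the source")
    return [
        "sudo", "dd",
        f"if={source}",
        f"of={destination}",
        "bs=1M",
        "status=progress",  # GNU dd: show periodic transfer statistics
    ]


hdd_bytes = 160_041_885_696  # source size, from the dd message above
ssd_bytes = 240_000_000_000  # nominal 240 GB destination (assumed, not measured)

command = build_dd_command("/dev/sde", "/dev/sdd", hdd_bytes, ssd_bytes)
print(" ".join(command))
```

Device names change between machines and even between reboots, so the printed command still deserves a careful look before it is pasted into a terminal.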

Profile ServicEnginIC
Message 56323 - Posted: 26 Jan 2021 | 19:11:06 UTC

Re-Turning to the origins

I started publishing this thread on November 2nd 2019, with an article regarding fans and their affairs.
Yesterday, I switched on one of my hosts for a Linux kernel update, and a typical loud noise started to sound.
This time, I took a short video of the method I usually use to diagnose which fan needs replacing.
Diagnosing defective fan - Video
When it is not clear which of the fans is the origin of the noise, I make sure by stopping the suspect ones one at a time.
Please use a soft tool for this, if you want to keep your finger count at ten!
I'm using a cotton swab in this video.
Working near the noise source, I first stopped the rear chassis fan, then the CPU heatsink fan, and the noise finally ceased when I stopped the PSU fan. That's it!
Here is a picture of the disassembled PSU and the replacement fan candidate, 120x120x25 mm. So far, replacement candidate: Pass
In this picture you can see the specifications of the original and replacement fans. They are very similar, and even the supply connectors are the same. Replacement candidate: Pass
But a closer look at both supply connectors... and the red (+12 V) and black (ground) supply poles are swapped. Replacement candidate: Doesn't pass!
Please, be very careful. Different manufacturers may employ different conventions, as in this case.
Connecting a fan with reversed polarity may leave it not turning, or even cause permanent damage if the fan is not reverse-polarity protected.
Also, before removing the original fan, please take note of the air flow direction and keep it when fitting the replacement.
No problem. In this new image, the reversed polarity is solved by a "cut and paste" operation on the supply wires.
Now the fan's supply connector is plugged back into its original position inside the PSU.
And after reassembling in reverse order, this system is ready again.

Keith Myers
Message 56324 - Posted: 26 Jan 2021 | 20:05:57 UTC

I would have just swapped the pins in the connector. Faster than firing up the soldering iron.

A sewing needle is all that is required.

Profile ServicEnginIC
Message 56330 - Posted: 30 Jan 2021 | 18:59:59 UTC - in response to Message 56324.

I would have just swapped the pins in the connector. Faster than firing up the soldering iron.
A sewing needle is all that is required.

Good point, Keith.
If you pull the wire while pressing the terminal's retaining tab through the side window of the connector housing, it is possible to extract it.
Actually, that was my first option.
But it was not the first time that I did this operation on that old connector, and one of the metallic retaining tabs broke.
So I preferred to plug in the soldering iron and reuse the original, newer connector.

Profile ServicEnginIC
Message 56341 - Posted: 31 Jan 2021 | 21:11:22 UTC

Extracting motherboard's plastic spacers - Videotrick

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56342 - Posted: 31 Jan 2021 | 21:30:36 UTC - in response to Message 56341.

I have found that placing a small-size socket wrench over them works too.
But they are all gone, thank goodness. I have only metal ones now.

Profile ServicEnginIC
Message 56349 - Posted: 1 Feb 2021 | 5:56:33 UTC - in response to Message 56342.

But they are all gone, thank goodness. I have only metal ones now.

I still use them at holes that don't match the chassis, as reinforcement ;-)

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Scientific publications
wat
Message 56350 - Posted: 1 Feb 2021 | 19:01:44 UTC
Last modified: 1 Feb 2021 | 19:05:58 UTC

Don't know if this is a fitting place to get advice with my current issue: My computer doesn't start/boot.

Background:
Having built my first PC back in December, I was quite thrilled that my system did post right from the start and has been running smoothly ever since. It was mostly used for light programming stuff and running BOINC in the background. Never noticed any issues. Last Friday, I was working late and noticed that my Win 10 OS didn't report the right time. It occasionally happens, so I just went into the settings and synced with the time server (as I have done multiple times so far). While the sync field was still "greyed out", as in still processing the sync command, I headed into another program. That's when my computer froze. As BOINC was running in the background, I can only assume that the CPU and RAM were under heavy load at that point in time.

I'd describe my troubleshooting skills as mediocre, so I went online and researched various ways of handling this issue. As nothing helped "unfreezing" it, I wanted to just shut it down. But it didn't react to my desperate attempts to do just that.

Then the PC wouldn't let me power it down. My mouse still worked fine and could click, as did my keyboard, but Task Manager did not respond. Windows didn't let me pull up the terminal to try shutting it down via the command line. Even a "hard" shutdown by pressing the power button didn't work.

I went to bed and left my PC powered on in this "stuck" state. The next morning, nothing had changed. As I needed to drive away for the weekend, I saw no way around pulling the power cable from the wall.

Now, after coming back to my apartment, I discovered that my PC won't power back on. I checked every cable. I made sure that the power supply cable was plugged in correctly and was not sitting too loosely. Switching my power supply on from 0 to - only lights up the power LED of my GPU, but the PC doesn't power on when I press the power button. Checked another power supply cable. Nothing. Pulled all PSU cables and reconnected them. Nothing. No fans spinning, no input detected on monitor... I don't see any indication at all that the computer is reacting to me pressing the power button.

I only have clearing CMOS left, but would only like to proceed if this is the right next step. I kind of fear that I damaged the PSU or other parts by having pulled the plug. Can you advise how to proceed with troubleshooting my PC not powering on?

By the way, I haven't changed anything hardware-wise since the initial build in December '20.

This is the said host if it helps at all:
Troubled host

Keith Myers
Message 56351 - Posted: 1 Feb 2021 | 19:22:09 UTC

Well, since I just had to contend with the same issue (the computer would not boot past the "Press DEL or F1" to enter BIOS display, and did not respond to any keyboard input), I thought I would share what ended up being the reason for my failure to boot.

I chased red herrings for a while until I pulled all my drive data connections.

That got the computer to boot, and I plugged each drive back in until it failed to boot again. Turned out my oldest SSD was preventing the computer from booting.

So with basic troubleshooting, reduce down to cpu, 1 stick of RAM, 1 gpu and no data drives to see if you can at least get into the BIOS.
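
For those who like to see a procedure written down, the elimination method can be sketched as a loop. This is only an illustration: boots stands for you pressing the power button and watching what happens, and the part names and the simulated fault below are made up to mirror my SSD case.

```python
def find_suspects(minimal, extras, boots):
    """Add parts back one at a time; collect those that stop the system booting.

    Returns None if even the minimal set fails, which means the fault is
    among the minimal parts and those need swapping instead.
    """
    if not boots(minimal):
        return None
    working = list(minimal)
    suspects = []
    for part in extras:
        if boots(working + [part]):
            working.append(part)   # part is fine, leave it installed
        else:
            suspects.append(part)  # part breaks the boot, leave it out
    return suspects


def simulated_boots(parts):
    """Stand-in for a real power-on test: here the old SSD is the culprit."""
    return "old SSD" not in parts


minimal = ["CPU", "RAM stick 1", "GPU 1"]
extras = ["RAM stick 2", "NVMe SSD", "old SSD", "SATA HDD"]

suspects = find_suspects(minimal, extras, simulated_boots)
print(suspects)
```

Adding parts back one at a time costs more reboots than plugging everything in at once, but each failed boot then points at exactly one part.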

bozz4science
Message 56352 - Posted: 1 Feb 2021 | 19:34:29 UTC
Last modified: 1 Feb 2021 | 19:39:30 UTC

All right, thanks. I'll try that out next. Just have 2 NVMe SSDs installed on the mobo and 1 SATA SSD.

Profile ServicEnginIC
Message 56353 - Posted: 1 Feb 2021 | 19:36:03 UTC

So with basic troubleshooting, reduce down to cpu, 1 stick of RAM, 1 gpu and no data drives to see if you can at least get into the BIOS.

+1

Profile ServicEnginIC
Message 56354 - Posted: 1 Feb 2021 | 19:43:48 UTC
Last modified: 1 Feb 2021 | 19:45:10 UTC

If it still doesn't start with the remaining hardware, try again after swapping it for the removed parts.
If this second try succeeds, you may suspect a defective element among those lying on the table...

bozz4science
Message 56355 - Posted: 1 Feb 2021 | 21:05:24 UTC
Last modified: 1 Feb 2021 | 21:09:04 UTC

So far nothing... Still going through the process of elimination. However, I do notice a light "buzzing" sound, as if a fly were caught in one of those electric fly traps, every time I switch off the PSU. Definitely a very nerve-wracking process.

I am starting to suspect some failure with the PSU. Can a PSU really have such a short lifetime? Hasn't been 2 months ...

Should I remove the SATA drive only or also the NVMEs, one of them being the boot drive?

Profile ServicEnginIC
Message 56356 - Posted: 1 Feb 2021 | 21:30:11 UTC - in response to Message 56355.

I am starting to suspect some failure with the PSU

Me too

bozz4science
Message 56357 - Posted: 1 Feb 2021 | 21:55:57 UTC

Finally, .... it booted directly into the OS. That elimination process worked like a charm.

It turned out that the culprit was one of my 2 RAM sticks. (?!)
Strangely, it didn't boot up with either stick in slot 4, but once I had tried out all combinations and plugged the first removed module (from slot 4) back into slot 2, it did boot up. I have no clue how that could have gotten corrupted. As I couldn't believe that, I moved the former slot 2 stick to slot 4 and it booted up as well. (???)

Kind of afraid now that this can/could happen again, as the parts aren't easily accessible under my massive CPU cooler, but glad it worked out fine in the end.

Thanks for your hints!

Profile ServicEnginIC
Message 56358 - Posted: 1 Feb 2021 | 22:02:21 UTC - in response to Message 56357.

🍾

bozz4science
Message 56359 - Posted: 1 Feb 2021 | 22:08:17 UTC

Definitely feels like a reason to celebrate :) Thanks again!

Keith Myers
Message 56360 - Posted: 1 Feb 2021 | 23:34:30 UTC

Great news. Kudos for sticking with the troubleshooting formula.

Generally, if new electronics are going to fail, you can expect them to do so within the first month or so of being put into use.

This is what the electronics industry calls "infant mortality". It exposes some flaw in the manufacturing process, poor product design, or part selection inappropriate for the actual usage.

If a device survives past this stage, you can expect it to last exactly one day past its warranty period. {sarcasm hat on}

Or in reality until some catastrophic system failure like a lightning strike or power mishap or physical damage.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56361 - Posted: 2 Feb 2021 | 20:12:14 UTC - in response to Message 56357.
Last modified: 2 Feb 2021 | 20:16:23 UTC

It turned out that the culprit was one of my 2 RAM sticks. (?!)
Strangely, it didn't boot up with either stick in slot 4, but once I had tried out all combinations and plugged the first removed module (from slot 4) back into slot 2, it did boot up. I have no clue how that could have gotten corrupted. As I couldn't believe that, I moved the former slot 2 stick to slot 4 and it booted up as well. (???)

Kind of afraid now that this can/could happen again, as the parts aren't easily accessible under my massive CPU cooler, but glad it worked out fine in the end.
My advice for the next time it happens:
The memory slots are directly connected to the CPU. If the memory slot is not physically damaged (there are no foreign objects between the gold plated connector pins), then you should remove your CPU from its socket, visually check its pins for bent ones, and if there are none, re-seat your CPU and try memory slot 4 again.

In the meantime you should check if there's a new BIOS for your MB on the manufacturer's webpage. If there is, you should flash it (using a pendrive).

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56362 - Posted: 2 Feb 2021 | 20:43:15 UTC
Last modified: 2 Feb 2021 | 20:44:49 UTC

If on an LGA socket, yes, slightly misaligned pins can cause the loss of a memory channel.

First release the hold down bracket and wiggle the cpu substrate in the socket to try and better align the socket pins with the package pads.

If that doesn't reclaim the missing channels, then remove the cpu and look for misaligned pins in the socket.

Use a high intensity flashlight at a low angle to look for the reflections off the pin ball tips to see if all the pins are aligned in columns and rows.

Use a magnifying glass and a sewing needle to gently nudge the pins that have reflections out of line with the nearest neighbors in each column and row.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56363 - Posted: 2 Feb 2021 | 20:48:57 UTC - in response to Message 56362.
Last modified: 2 Feb 2021 | 20:51:57 UTC

He's using an AMD Ryzen 7 3700X, which has pins on its underside.

Its socket still could have some bad contacts, so removing the CPU and putting it back might help.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56364 - Posted: 2 Feb 2021 | 23:58:15 UTC

I hadn't looked to see what type of cpu he had.

I've never had any issues with memory channels missing on a PGA socket. In 20 years of PGA socket use.

I HAVE had issues with LGA sockets not reading all memory channels correctly though. Multiple times.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56366 - Posted: 3 Feb 2021 | 22:08:36 UTC

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56373 - Posted: 6 Feb 2021 | 16:50:34 UTC

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56374 - Posted: 6 Feb 2021 | 16:51:56 UTC

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56375 - Posted: 6 Feb 2021 | 16:52:54 UTC
Last modified: 6 Feb 2021 | 16:53:11 UTC

Please, excuse me for my last three posts.
They are a collateral consequence of too much idle time lately, in the wait for new Work Units at Gpugrid...

😊

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56488 - Posted: 13 Feb 2021 | 22:29:32 UTC

Returning to the matter

One of my hosts recently had the following problem:
When I happened to pass by this host late in the afternoon, I noticed that it was repeatedly restarting, with no video, and the motherboard POST was reporting a problem with the video card(s) (one long beep followed by three short ones).
I switched it off and left it for a later diagnosis.

When I returned with more time available, the system didn't even start at all.
This can be seen in the following video:

System not starting - Video

Clues for diagnosis:
- When the power switch is pressed, the PSU fan starts turning, and so does the rear fan, which is connected directly to +12V through a molex connector.
- No beep from the system POST (Power On Self Test) is heard: neither the single normal beep, nor any beep combination for errors.
- The CPU fan is stopped, and so are the fans on both graphics cards.
- The motherboard's +5VSB monitoring LED is on.

To rule out a problem in any peripheral component, I simplified the system: both graphics cards, the memory modules, the PCI WiFi card and the SATA drives were disconnected, with no change, as can be seen:

Simplified system not starting - Video

With the above simplified configuration, at least an error beep combination indicating lack of RAM should be heard, in the event that "the processor heart is beating"... This isn't a good sign :-(

Next step: removing the motherboard for a closer examination.
And immediately the problem was discovered.
As can be seen in the previous image, the two +12V supply lines on both the motherboard and PSU connectors were totally destroyed (burnt).
Is this the end for both motherboard and PSU?

🤔️🤔️🤔️

-1) I have special affection for that motherboard: It is the one with which I assembled the first computer for my son. (Currently he is using a new computer assembled by himself)
-2) I never give up without giving a try
-3) Good challenge for a hardware enthusiast to get some fun!

Let's go.
I started by removing the burnt plastic from the motherboard power connector with a cutter, then polishing, tinning, and joining together both +12V supply pins.
A piece of 16 AWG cable about two inches long was then attached to them.
The two original cables bringing current from the PSU to the motherboard were 18 AWG. 16 AWG cable has about 1.6 times the copper cross-section of 18 AWG and can carry correspondingly more current, as seen in the tables at this useful link: American Conductor Stranding - AWG Table
The next step was to cut off the burnt portion of the +12V yellow cables coming from the PSU and solder them together.
After that, the burnt plastic and burnt electrical terminals were removed with the cutter from the PSU female connector, leaving a "passthrough channel".
The reworked female connector from the PSU was then attached to the male connector on the motherboard.
Finally, the +12V supply cables coming from the motherboard and the PSU were soldered together and covered with heat-shrink sleeving.
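
The gauge comparison can be checked with the standard AWG diameter formula. This is only a sketch: it compares the copper cross-section of solid conductors, which is a rough proxy for current capacity, not an ampacity rating for any particular cable.

```python
import math

def awg_diameter_mm(gauge):
    """Diameter of a solid AWG conductor: 0.127 mm * 92 ** ((36 - gauge) / 39)."""
    return 0.127 * 92 ** ((36 - gauge) / 39)

def awg_area_mm2(gauge):
    """Cross-sectional area of a solid round conductor of the given gauge."""
    d = awg_diameter_mm(gauge)
    return math.pi * d ** 2 / 4

a16, a18 = awg_area_mm2(16), awg_area_mm2(18)
print(f"16 AWG: {a16:.2f} mm2, 18 AWG: {a18:.2f} mm2, ratio {a16 / a18:.2f}")
```

This gives roughly 1.31 mm2 against 0.82 mm2, a ratio of about 1.6.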

Time to check whether the efforts are rewarded or not!

This is the final look for this system.
It is currently processing tasks at Primegrid, in keeping with the performance of its two GTX 750 GPUs, which is not enough to finish the current heavy ADRIA tasks at Gpugrid in time.
Temperatures and behavior are completely normal.

🤗️

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56490 - Posted: 13 Feb 2021 | 22:50:31 UTC

A lot of time to shadetree engineer a fix.

I'm just not that sentimental about hardware, especially OLD hardware.

If it was me, it would have been sent to recycling.

Kudos for the effort.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56492 - Posted: 14 Feb 2021 | 0:08:37 UTC
Last modified: 14 Feb 2021 | 0:10:26 UTC

A lot of time to shadetree engineer a fix.

Recovering OLD hardware is not the only purpose of these kinds of "crazy" repairs.
- In a way, they are an excuse to keep in practice the kind of skills I need in my daily work as a field service engineer.
- I consider myself one of those fortunate people who, moreover, enjoy doing it ;-)

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56493 - Posted: 14 Feb 2021 | 0:32:46 UTC - in response to Message 56492.

Well I have been soldering since I was a pup. Don't think I would ever forget that skill.

About the only thing that I have forgotten how to do is multi-layer PCB repair which I learned to do with NASA 5300.b certification. But since you have to maintain that cert yearly with testing, that has fallen by the wayside. Pretty sure that I would botch that level of repair if tasked.

What I still get satisfaction from is being able to attach a small SMD device I knocked off the motherboard corner when I carelessly let the mobo rotate a few degrees while securing the board in the case. Of course I cursed the mobo manufacturer for putting a device in a keep out zone in the first place in my opinion.

And that was without the benefit of a hot-air SMD rework station that I don't have access to anymore. Just used my trusty Hakko workstation.

Always amazes me that I can even find those small devices when I knock them off in the first place. A small resistor necessary for letting the BMI interface on my server motherboard work.

The last boo-boo was a device near the back of the cpu socket that lets the onboard LAN interface work on my daily driver. I inadvertently scraped it off along with the residue of the double stick foam tape that I use to secure a 40mm fan to the socket backside with AMD cpus.

An old trick I learned to keep temps down from back in the K6 days.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56667 - Posted: 22 Feb 2021 | 22:01:17 UTC

Recycling to win

I've always worried about one of my graphics cards, currently the highest performance one, being too hot while processing.
This led me to write a related thread called "Fighting temperature at hardware level".
I also tried replacing the thermal compound, with results reported previously in this same thread.
This card is an Asus DUAL-GTX1660TI-O6G, and it is currently running 24/7 at my host #569442

I also had a retired Asus GTX650TIBDC2OC2GD5.
It was a good crunching graphics card for me in my main host, until it became obsolete, overtaken by newer technologies.
It is based on a GTX 650 Ti Boost GPU.
One of its strongest points is its excellent, heat pipe based, dual PWM fan heatsink.
Is it possible to reuse this heatsink to manage the overheating problem on the newer GTX 1660 Ti?
Let's give it a try!

Starting point:

GTX 1660 Ti in this host reaching and steadily holding 80 ºC while processing PrimeGrid CUDA tasks at maximum performance.

Check points:

-1) TDP.
Rated TDP for Turing GTX 1660 Ti is 120 Watts.
Rated TDP for the old Kepler GTX 650 Ti Boost is 134 Watts. Higher, good!

-2) Old heatsink mechanically fitting the new card.
The old heatsink hits two choke components on the new card. It is a problem.
Some drilling in the old heatsink's aluminium block made space for both components. Problem solved.

-3) Mechanical fit of the female threads between the two heatsinks.
They aren't compatible. It is a problem.
But I have a 3 mm diameter threading tool, and suitable 3 mm screws, washers and springs. Some additional work... Problem solved.

-4) Memory cooling.
Four of the six memory chips are not covered by the old heatsink. This may cause them to overheat. It is a problem.
But there is enough room under the old heatsink to fit an individual adhesive heatsink on each uncovered memory chip. Problem solved.

-5) Fan compatibility.
The PWM fans on the original heatsink were wired independently, and routed to a single connector by means of a concentrator cable.
The PWM fans on the recycled heatsink are wired in parallel, with the RPM signal taken from only one of them. It is a problem.
But the individual connectors on each fan are compatible, so the recycled fans can be connected independently using the concentrator cable from the original heatsink. Problem solved.

After solving every intermediate problem, the recycled heatsink is finally attached to the GTX 1660 Ti graphics card.
And after attaching the fan frame and its electrical connection, we've got this final result.

And here we have the comparative images of the system
Before and After

Has all this work been worth it?
Let's put it to the test...

Ok. System starts (It's great news!)
At rest, temperatures for both processor and GPU are below 30 ºC.
And when processing at full performance, the GPU temperature stabilizes at 65 ºC... This is 15 ºC less than the 80 ºC reached with the original heatsink under the same conditions. I like it!

Emboldened by this, I decided to go one step further and try some overclocking.
This would have been unthinkable with the original heatsink.
With the fan setting fixed at 80%, a +100 MHz offset on the GPU clock and a +500 MHz offset on the memory clock, the system seems to be stable and the temperature remains at a surprising 56 ºC. I like it even more!
This card was processing Gpugrid's new heavy ADRIA tasks in times ranging from 93193 to 93427 seconds.
The first task after the new configuration took 86893 seconds. Better than I expected, and my personal record for this card... at a steady temperature of 56 ºC.

Definitely, for me, it was worth the work.
And not to mention the fun I got ;-)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56672 - Posted: 23 Feb 2021 | 1:00:47 UTC - in response to Message 56667.

This card was processing Gpugrid new heavy ADRIA tasks in times ranging from 93193 to 93427 seconds.
First task after new configuration took 86893 seconds. Better than I expected, and my personal record for this card... at the steady temperature of 56 ºC.
You can try to squeeze out the remaining 8m 13s to hit the 24h bonus.
Perhaps you can do that without further overclocking your GPU, if you simply stop crunching CPU tasks on that host.
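
For reference, the arithmetic behind that gap can be sketched as follows. It assumes, for simplicity, that the bonus only requires the run time itself to fit within 24 hours, and ignores download, upload and queue time:

```python
DAY_S = 24 * 3600                      # 24 h window, in seconds
before_s = (93193, 93427)              # run times reported before the heatsink swap
after_s = 86893                        # first run time after the swap and overclock

gap_s = after_s - DAY_S
speedup_pct = [100 * (b - after_s) / b for b in before_s]
print(f"over 24 h by {gap_s} s ({gap_s // 60} min {gap_s % 60} s)")
print(f"run time reduced by {min(speedup_pct):.1f}% to {max(speedup_pct):.1f}%")
```

So the task is still 493 seconds over the window, after a run time reduction of a little under 7%.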

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,718,591,418
RAC: 2,169,719
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56673 - Posted: 23 Feb 2021 | 1:24:42 UTC

I have a GTX 1650 S with a “problem”. The computer where the card was installed (AMD 2600 on an AMD 450 motherboard) did not start after it had run for several months without being switched off/on.
I tried to troubleshoot the computer and came to the conclusion that it must be the motherboard. Thankfully my computer technician got the computer working again with another GPU.
In the meantime, as I assumed it was a defective motherboard, I tried the GPU in several other computers, always with the same or a similar result:
1st computer: After I installed the GPU and started the computer, the fan on the GPU spun (I can't remember if I had an image on the monitor). After one or two restarts, the fan stopped spinning. I installed the old, working GPU; its fan spun, but there was no image. I could not get it to work again.
2nd computer: After I installed the GPU in the second slot of this computer http://www.gpugrid.net/show_host_detail.php?hostid=523675, the computer started, the second card was recognized and BOINC downloaded a second WU for the GPU. However, the GPU crashed several times, and after some crashes I noticed that the second GPU was no longer recognized and its fan had stopped spinning. I tried several times, always with the same result.
3rd computer: After I installed the GPU and started the computer, the fan on the GPU spun (I can't remember if I had an image on the monitor). After one or two restarts, the fan stopped spinning. I installed the old, working GPU; its fan spun, but there was no image. Thankfully my computer technician got the computer working again with the old GPU.
So I am hesitating to try this particular GPU in a fourth computer… but as there is a GPU shortage, I am wondering what it might be and what to check?
After the second COVID lockdown, I might be able to take it to an electronics workshop; in Peru there are some that repair GPUs, motherboards etc.

tullio
Send message
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Scientific publications
wat
Message 56675 - Posted: 23 Feb 2021 | 5:49:30 UTC

On my GTX 1650, GPU-Z does not show the fan RPM but says it is at 51%. Tasks complete in about 47 hours.
Tullio
____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56680 - Posted: 23 Feb 2021 | 17:52:43 UTC - in response to Message 56673.

I have a GTX 1650 S with a “problem”...
.
So I am hesitating to try this particular GPU in a fourth computer.

What is worrying in this case is that a presumed fault in that GPU may cause any system where it is tested to become faulty as a result...
Some comments:

Fans not spinning on a graphics card isn't necessarily due to a malfunction.
Some graphics card models are designed so that the fans start turning only when the GPU reaches a certain temperature.
If the GPU is under low load, there isn't enough power dissipation for the temperature to rise above the level set for the fans to start spinning.

If it was me, I'd start by checking and cleaning the card's PCIE contacts as described in this previous post.
This would be followed by removing the GPU heatsink and thoroughly cleaning the dust from its metal fins, the fan blades, and the whole circuitry.
Dust + humidity can cause troublesome problems for electronics working at such high frequencies as a GPU does.
Next step: cleaning the old grease off the GPU chip and renewing it. I prefer to use a good non-conductive, self-spreading thermal grease for this.
And after reassembling everything in reverse order:
I'd first try testing it in a minimum-risk configuration, with every drive disconnected, both signal and supply cables.
Of course, the +12V PCIE supply connector for the graphics card must be connected.
In this configuration, you can try to start the system to check whether there is video and whether it is possible to enter the BIOS.
If there is no video, you can suspect a truly serious problem with the graphics card.
If you are able to enter the BIOS, move between different menus, and everything looks normal, then you can take the risk of a further test, after switching the system off and reconnecting the OS drive...

🤞️🤞️

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56686 - Posted: 24 Feb 2021 | 18:23:41 UTC

Is there a hardware doctor in the house? This one isn't looking too happy, so while we have a pause, I'll pull it out and take a look.



The same card, a few minutes apart. Both fans have been bouncing around for a few days. Fan % has been stable at 88%. GPU clock and power follow the fans - apply a bit of fan, the card speeds up and draws more power. It's an Asus GTX 1660 Super Dual Evo, under warranty. The supplier says Asus is likely to take it back and refund the price, but there are no new ones in the UK to buy with the proceeds. I can check for dust bunnies etc., but if I go much further it'll probably void the warranty and break the card completely. It's reached 96% of a D3RBandit task (slowly!), and it's displaying a perfectly clear image on my 1920x1200 monitor, so it can't be too far gone.

Suggestions?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 56687 - Posted: 24 Feb 2021 | 18:56:06 UTC - in response to Message 56686.

it’s definitely thermal throttling. And the fans look really iffy. Are they actually spinning when it says they aren’t? At those temps the fans should be 100%.

Do you have an option to send the card for RMA (repair or replacement)? Usually in situations like this the warranty will either fix the card or replace it with an equal product, not just refund the purchase price. At least that’s been my experience in the US with ASUS. I would look up the RMA process for the U.K. on ASUS’s website.
____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56688 - Posted: 24 Feb 2021 | 19:19:51 UTC

In the UK, the retailer is given the legal responsibility for sorting all that out. In this case, it's a local independent system builder / gamer hangout / business support / trade counter: I've used them for decades, and they have a good reputation. No problems there.

At the moment, I'm leaving the box undisturbed under my workbench until this task is finished - I haven't eyeballed the fans yet. It's moved on two checkpoints since I posted, so I may get a chance to look later tonight - otherwise tomorrow morning. The machine also has a GTX 1650, so I'll move that to the 16x slot: it can keep crunching smaller jobs while I think about it.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 56689 - Posted: 24 Feb 2021 | 19:44:27 UTC - in response to Message 56688.
Last modified: 24 Feb 2021 | 19:49:15 UTC

while the retailer has the legal responsibility to sort it out for you, do you have the CHOICE to instead send it directly to ASUS for RMA?

i do believe the problem is likely with the fans. the % is just what it's set to, but the tach reading shows how fast it's actually spinning. and in the case of your two pics, it's indicating that the fans aren't spinning or are intermittently spinning.

bumping the fan percentage is probably providing enough power in the motor to overcome some stiction, allowing the fans to spin up, cooling the GPU below thermal throttling limit, allowing the GPU to increase the clocks and this is likely why this results in observed increase in power consumption.
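
The commanded-versus-measured distinction can be checked on logged sensor readings. A minimal sketch, assuming the readings have already been exported into dictionaries; the key names, thresholds and sample values are hypothetical and are not GPU-Z's actual log format:

```python
def find_stalled_samples(samples, min_percent=50, max_rpm=100):
    """Return samples where the fan is commanded on but is barely turning.

    Each sample is a dict with the hypothetical keys 'fan_percent' and 'fan_rpm'.
    """
    return [s for s in samples
            if s["fan_percent"] >= min_percent and s["fan_rpm"] <= max_rpm]

# Made-up readings, for illustration only.
log = [
    {"time": "18:00", "fan_percent": 88, "fan_rpm": 0, "gpu_temp_c": 85},
    {"time": "18:05", "fan_percent": 88, "fan_rpm": 2900, "gpu_temp_c": 71},
]
for s in find_stalled_samples(log):
    print(f"{s['time']}: {s['fan_percent']}% commanded, {s['fan_rpm']} RPM measured")
```

Any sample this flags is one where the percentage setting alone would have looked healthy.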
____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56691 - Posted: 24 Feb 2021 | 20:29:58 UTC - in response to Message 56686.

I very much like the monitoring tool you're showing pics of. Thank you!
It is very complete and comprehensive.
I didn't know about it, since lately I've been crunching under Ubuntu Linux most of the time. There I use the Psensor utility.
I've made a note to test it on my dual Linux/Windows 10 host, the next time I enter Windows for my periodic update schedule.

Regarding the problem itself, I agree with you both that the fans are not working as expected for those GPU temperatures and fan % settings.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56692 - Posted: 24 Feb 2021 | 21:33:27 UTC

Well, the plucky little guy has crunched what may very well be its last WU - task 32541017. Valid, and no reported errors: though the 3 day+ runtime will mess up rod's statistics!

I've pulled the card, and the host is flying again on one wing. No obvious signs of damage, and the fans turn to the finger without stiffness. I couldn't see them in situ because the second card was too close.

Yes, ASUS do have a UK centre with a direct RMA procedure. But it handles so many classes of equipment it's hard to navigate, and when you get there it says both barcodes aren't valid. And neither matches the serial number on the invoice. I'll leave it for tonight, and try again tomorrow.

mmonnin
Send message
Joined: 2 Jul 16
Posts: 337
Credit: 7,617,757,013
RAC: 10,860,147
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56693 - Posted: 24 Feb 2021 | 22:00:13 UTC

Pull the cooler off to see if there is good contact. Reapply paste, etc. Be careful of any thermal pads on GDDR. "Warranty void if removed" stickers are illegal.

https://www.ifixit.com/News/11748/warranty-stickers-are-illegal#:~:text=Most%20consumers%20don't%20know,language%20of%20your%20warranty%20says.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 56694 - Posted: 24 Feb 2021 | 22:30:17 UTC - in response to Message 56693.

Pull the cooler off to see if there is good contact. Reapply paste, etc. Be careful of any thermal pads on GDDR. "Warranty void if removed" stickers are illegal.

https://www.ifixit.com/News/11748/warranty-stickers-are-illegal#:~:text=Most%20consumers%20don't%20know,language%20of%20your%20warranty%20says.


They’re illegal in the US. But maybe not in the U.K. where Richard is located.

I still think it’s a fan issue since the % shows 88% but the tach wasn’t showing that the fan was actually spinning
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56696 - Posted: 25 Feb 2021 | 0:52:51 UTC - in response to Message 56691.
Last modified: 25 Feb 2021 | 0:55:29 UTC

While it isn't as snazzy looking as GPU-Z, you do have a pretty good looking gpu GUI monitoring application called gpu-mon available in Linux. It is part of the gpu-utils suite by Ricks-Lab.

https://github.com/Ricks-Lab/gpu-utils

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56698 - Posted: 25 Feb 2021 | 6:16:23 UTC - in response to Message 56696.

While it isn't as snazzy looking as GPU-Z, you do have a pretty good looking gpu GUI monitoring application called gpu-mon available in Linux. It is part of the gpu-utils suite by Ricks-Lab.

Certainly, it looks very attractive for close GPU monitoring. Thank you very much.

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Scientific publications
wat
Message 56699 - Posted: 25 Feb 2021 | 15:29:52 UTC
Last modified: 25 Feb 2021 | 15:44:16 UTC

I am also in need of a GPU diagnosis. Same symptom as before. WU downloaded, computation starts successfully, then randomly during the run I sometimes hear the fans suddenly spinning down (for no apparent reason) and, looking at GPU-Z, the GPU just stops computing... Almost as if it is just exhausted after a couple of hours without any interruption and needs some rest. If I pause/suspend the WU and then restart from the last checkpoint, it immediately starts computing again.

Today I let it rest for 20 min, as I was curious if it’d start again on its own, but unfortunately it didn't. Continuously suspending/unsuspending feels like it cannot be my remedy forever....
Kind of bummed with this.

GPU suddenly stops

GPU starts back up again after suspension/restart

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56700 - Posted: 25 Feb 2021 | 18:05:23 UTC

Some sort of Windows sleep/hibernate/idle detection going on?

Windows doesn't see any mouse/keyboard activity and so idles the gpu?

Some other Windows monitoring software idling the gpu?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 56701 - Posted: 25 Feb 2021 | 18:43:34 UTC - in response to Message 56700.

Some sort of Windows sleep/hibernate/idle detection going on?

Windows doesn't see any mouse/keyboard activity and so idles the gpu?

Some other Windows monitoring software idling the gpu?


in addition, maybe some driver wonkiness? with strange issues like this I always recommend trying to wipe out (use DDU on windows, from Safe Mode), and a full re-install with the package from Nvidia, not allowing windows to install drivers automatically.

also what are the BOINC compute settings? are you giving it 100% CPU "time"? do you have anything setup to pause the GPU crunching? perhaps a setting to pause when the computer is in use? or an exclusive app setting that stops boinc when some other app is running? are you running other projects? is BOINC switching the computation to another project? all things you should check off the list.

something strange that caught my eye when looking at his pics, was that when the GPU load stops, the PCIe bus load shoots up, and vice versa when computation restarts. I'm having a hard time explaining what might cause that.
____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56702 - Posted: 25 Feb 2021 | 19:06:07 UTC - in response to Message 56699.

WU downloaded, computation starts successfully, then randomly during the run I sometimes hear the fans suddenly spinning down (for no apparent reason) and, looking at GPU-Z, the GPU just stops computing...

I recently noticed similar behavior while experimenting with overclocking my reworked GTX 1660 Ti graphics card, as described in previous posts.
When power consumption reached the rated TDP and I tried GPU clock frequencies above a certain limit, the GPU clock suddenly dropped to a minimum value of 300 MHz.
The only solution was a system restart, and trying lower frequencies.
At borderline frequencies this didn't happen immediately: sometimes after a few seconds, and sometimes after a few minutes.
As this system is running under Linux, and yours under Windows, I guess that there might be some kind of self-protection built into the graphics card firmware, and thus OS independent.

As can be seen in the GTX 1660 SUPER specifications, the stock Boost clock for this GPU is 1785 MHz.
In your images, your GPU clock frequency at full performance reaches 1950 MHz, considerably higher than that...
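
A drop like this can be watched for from a script rather than by eye. A rough sketch, assuming `nvidia-smi` is installed and on the PATH; the 300 MHz floor is simply the value observed above, not a documented constant, and the dictionary keys are names chosen for this example:

```python
import subprocess

FIELDS = "clocks.sm,temperature.gpu,power.draw,power.limit"
KEYS = ("sm_clock_mhz", "temp_c", "power_w", "power_limit_w")

def parse_status(csv_line):
    """Parse one 'csv,noheader,nounits' line; unreadable values become None."""
    values = []
    for field in csv_line.split(","):
        try:
            values.append(float(field.strip()))
        except ValueError:        # e.g. "[N/A]" when a sensor is not reported
            values.append(None)
    return dict(zip(KEYS, values))

def read_status():
    """Query the first GPU through nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_status(out.splitlines()[0])

def looks_stuck(status, floor_mhz=300):
    """True when the SM clock has fallen to the assumed protective floor."""
    clock = status["sm_clock_mhz"]
    return clock is not None and clock <= floor_mhz
```

Calling `looks_stuck(read_status())` every few seconds while a task is running would show when the clock collapses, and the temperature and power values from the same reading give some context for why.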

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56703 - Posted: 25 Feb 2021 | 19:08:23 UTC - in response to Message 56701.

something strange that caught my eye when looking at his pics, was that when the GPU load stops, the PCIe bus load shoots up, and vice versa when computation restarts. I'm having a hard time explaining what might cause that.

I wonder about the same.

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Scientific publications
wat
Message 56704 - Posted: 25 Feb 2021 | 20:39:35 UTC
Last modified: 25 Feb 2021 | 20:49:04 UTC

A bit of background that might add some value to exploring this further.

The monitor is connected to the 1660 Super card, being the primary one as opposed to the 750 Ti. While the card was overclocked from when I initially deployed the system here up until a week ago when Ian&steve advised to scale it back or even forfeit it altogether. So I guess this is not the issue with my card ServicEginIC. It is only running at moderate clock speeds (+40 MHz Core / + 200 MHz mem) as opposed to ~125 MHz/+250 MHz offset on the core/mem clock on other projects. Additionally, I adjusted the thermal limit down to 67 C to keep the fans down as I am directly sitting next to the computer.

Never experienced this problem on other projects except for MLC (BOINC project) and F@H. Those problems occurred around the same time I installed the CUDA development toolkit and CUDA runtime back in January, when I tinkered a bit with NN programming with Keras. After this caused said issues across multiple projects, I uninstalled it as soon as I finished my project.

Compute settings are: 100% CPU time, keep tasks in memory upon suspension, no exclusive applications set to date, and suspend computation if an application uses > 65% of CPU (mainly programming stuff). The last setting is, however, completely unrelated to my issue: every time this happened, I was not running any application at or over the defined threshold, and all other running WUs were crunching along just fine.

I am beginning to suspect that this might be the culprit here. Maybe a clean install of the latest driver is the way to go... I'll tackle this as soon as my GPUGrid tasks finish up.

Might the high PCIe bus load be caused by the CPU trying to take over and continue computing the GPU task?


If this won't solve it, I'll continue looking into the possibility of some software or Windows setting putting the GPU into hibernation mode like Keith suggested.

Thx for sharing your thoughts with me!

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 56705 - Posted: 26 Feb 2021 | 13:11:17 UTC - in response to Message 56704.

The card was overclocked from when I initially deployed the system until a week ago, when Ian&Steve advised me to scale it back or even forgo it altogether. So I guess this is not the issue with my card, ServicEnginIC. It is only running at moderate clock offsets (+40 MHz core / +200 MHz mem)
You should revert to the factory overclock, or even set lower frequencies / a lower power limit.
Overclocking the GPU memory is highly not recommended for GPUGrid.

as opposed to ~125 MHz/+250 MHz offset on the core/mem clock on other projects.
Stable GPU overclocking achieved with other projects is deceitful.
90% GPU usage by the GPUGrid app results in higher GPU power draw at lower GPU frequency than 90% GPU usage by other projects. Therefore you should not consider the apps of other projects as a reference for calibrating the GPU overclock for GPUGrid.

Additionally, I adjusted the thermal limit down to 67 C to keep the fans down as I am directly sitting next to the computer.
Lowering the thermal limit would increase the fan speed (at the same GPU frequencies/voltages). Or maybe I'm missing something?

Never experienced this problem on other projects except for MLC (Boinc project) and F@H.
That doesn't matter.

Those problems occurred around the same time I installed the CUDA development toolkit and CUDA runtime back in January, when I tinkered a bit with NN programming with Keras. After this caused said issues across multiple projects, I uninstalled it as soon as I finished my project.
A full uninstall is recommended in this case.
Download the latest driver, download DDU, then disable the networking on your PC and start DDU (it will restart in safe mode, do the cleaning, then restart in normal mode), then install the latest driver, lastly enable the networking.

Maybe a clean install of the latest driver is the way to go...
Do not set any overclocking after you have installed the latest driver. Let it crunch 10 tasks, then you can try increasing the GPU clocks.

Might the high PCIe bus load be caused by the CPU trying to take over and continue computing the GPU task?
The GPUGrid app is constantly polling the GPU. When the GPU is in normal state, it will return some subresult to the CPU. The CPU does some Double Precision calculations with it, then puts it back to the GPU. When the GPU locks up, it doesn't return anything, so the polling is repeated at a much higher rate, which results in higher PCIe bus load.

If this won't solve it, I'll continue looking into the possibility of some software or Windows setting putting the GPU into hibernation mode like Keith suggested.
It's not Windows, it's the overclocking, or the interference of some GPU tool/app with the GPUGrid app.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 56706 - Posted: 26 Feb 2021 | 13:45:29 UTC - in response to Message 56705.
Last modified: 26 Feb 2021 | 14:28:04 UTC

Overclocking the GPU memory is highly not recommended for GPUGrid.

I've not had a single issue with mild overclocks on GPU memory with GPUGrid. Personally I only overclock the memory to the default P0 state clocks; on Turing this is +400 MHz. this has never caused an issue across many, many GPUs, and I know Keith OCs his memory even further without issue. the OP doesn't appear to be even pushing the clocks that far, so I doubt this is an issue for him, unless there is something defective with the GPU hardware.

Lowering the thermal limit would increase the fan speed (at the same GPU frequencies/voltages). Or maybe I'm missing something?

You're missing something. In Windows overclocking with newer nvidia GPUs, you can set thermal limits for the overclock with certain software. it will limit the clock speeds based on temperatures. fan speeds are ONLY controlled by the fan curves that are set, whether it be the default or a custom user curve.


The GPUGrid app is constantly polling the GPU. When the GPU is in normal state, it will return some subresult to the CPU. The CPU does some Double Precision calculations with it, then puts it back to the GPU. When the GPU locks up, it doesn't return anything, so the polling is repeated at a much higher rate, which results in higher PCIe bus load.

do you know for a fact that the application operates this way? In regards to the shuffling of data to the CPU for DP processing: that sounds like a waste of resources when any GPU that is capable of processing GPUGRID tasks is also capable of DP processing. it would make a lot more sense to have the GPU do that, and it would be faster. Do you have information from the devs about this? Can you link to it please?

polling certainly is the reason for the CPU thread being pegged to 100% for each GPU task, as you see this with most nvidia/CUDA loads in other projects, but only GPUGRID has the high PCIe bus use. and from my experience, PCIe bus load is only high when computation is actually happening (at least under Linux, I can't really attest to PCIe load under the windows app). i see about 25% PCIe bus load on a PCIe 3.0 x16 link, around 40% on a PCIe 3.0 x8 link, and 80-90% on a 3.0 x4 link.

My interpretation of the PCIe bus use for GPUGRID is that it's constantly reading data out of the disk into memory and sending it over to the GPU, or constantly swapping data between the system memory and GPU, across the PCIe bus. it's clear that the app doesn't cache all the necessary data for each task, since very little GPU memory is used. And certain beta tasks that have popped up in the past, which used a large portion of GPU memory, also saw a huge reduction in PCIe use.

but this is all contrary to what the OP is seeing, he's seeing low PCIe use during computation, and high use when not. That's completely opposite how these tasks usually operate.
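
As a rough cross-check of the percentages above, they can be converted to absolute throughput. This is only a sketch under two assumptions that are not from the thread: that the reported "bus load" is a fraction of the link's theoretical bandwidth, and that PCIe 3.0 delivers about 985 MB/s per lane (8 GT/s with 128b/130b encoding). The function name is made up for illustration.

```bash
#!/bin/bash
# Convert a PCIe 3.0 bus-load percentage into approximate MB/s.
# Assumption: load % is relative to the theoretical link bandwidth,
# taken here as ~985 MB/s per lane for PCIe 3.0.
pcie3_throughput_mbs() {
    local lanes=$1 load_pct=$2
    awk -v l="$lanes" -v p="$load_pct" \
        'BEGIN { printf "%.0f\n", l * 985 * p / 100 }'
}

pcie3_throughput_mbs 16 25   # x16 link at 25 %
pcie3_throughput_mbs 8 40    # x8 link at 40 %
pcie3_throughput_mbs 4 85    # x4 link at 85 % (middle of the 80-90 % range)
```

Under those assumptions all three cases land in roughly the same 3-4 GB/s range, which would fit an app that moves a fairly constant amount of data regardless of link width. On Linux, `nvidia-smi dmon -s t` should show the actual PCIe Rx/Tx throughput for comparison.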

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 56707 - Posted: 26 Feb 2021 | 13:57:48 UTC - in response to Message 56706.
Last modified: 26 Feb 2021 | 14:03:11 UTC

Overclocking the GPU memory is highly not recommended for GPUGrid
My motivation for this was just to reduce the memory clock penalty (relative to the P0 state) that NVIDIA applies whenever the card is used for CUDA-enabled applications, which was discussed here some time ago.

Stable GPU overclocking achieved with other projects is deceitful. ...Therefore you should not consider the apps of other projects as a reference for calibrating the GPU overclock for GPUGrid.
I guess you are most certainly right on this point Zoltan!

A full uninstall is recommended in this case.
That's what my gut feeling is telling me. I will probably look into the DDU-tool as soon as my task pipeline has emptied.

Very interesting to hear about the app inner workings and the GPU polling by the CPU. Never thought about it this much before.

do you know for a fact that the application operates this way?
Would be highly interested to hear more about this!

It's not Windows, it's the overclocking, or the interference of some GPU tool/app with the GPUGrid app
I still believe it's the latter one. I'll know more in a few days, when I can analyse whether a clean install of the driver solved the issue.

the OP doesn't appear to be even pushing the clocks that far, so I doubt this is an issue for him.
I am 100% with you on this one. Especially, as I have intermittently been running at full stock speeds and the same issue occurred.

he's seeing low PCIe use during computation, and high use when not. That's completely opposite how these tasks usually operate.
I am getting ever more confused .... I'll see if reinstalling drivers will do the trick for me.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Message 56708 - Posted: 26 Feb 2021 | 14:36:39 UTC - in response to Message 56692.

ASUS do have a UK centre with a direct RMA procedure. But it handles so many classes of equipment it's hard to navigate, and when you get there it says both barcodes aren't valid. And neither matches the serial number on the invoice. I'll leave it for tonight, and try again tomorrow.

Couldn't get the ASUS site to co-operate, so went the reseller route. Phone call: they looked up the order number in seconds, confirmed warranty status, and issued an RMA without quibble. And emailed me a label for courier collection, valid at my local convenience store.

So it's up to Asus now. I'm not holding my breath that the rest of the procedure will be so slick.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 56709 - Posted: 26 Feb 2021 | 15:04:04 UTC - in response to Message 56708.

ASUS do have a UK centre with a direct RMA procedure. But it handles so many classes of equipment it's hard to navigate, and when you get there it says both barcodes aren't valid. And neither matches the serial number on the invoice. I'll leave it for tonight, and try again tomorrow.

Couldn't get the ASUS site to co-operate, so went the reseller route. Phone call: they looked up the order number in seconds, confirmed warranty status, and issued an RMA without quibble. And emailed me a label for courier collection, valid at my local convenience store.

So it's up to Asus now. I'm not holding my breath that the rest of the procedure will be so slick.


good luck with the RMA!

I've only dealt with ASUS RMA a few times (and of course my experience is with their North America system) for a couple GPUs and a motherboard. all times were relatively painless, but they didn't give a lot of communication during the process. but hopefully you should get a replacement product instead of refund.


Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 56710 - Posted: 26 Feb 2021 | 19:35:40 UTC - in response to Message 56706.
Last modified: 26 Feb 2021 | 19:38:33 UTC

Lowering the thermal limit would increase the fan speed (at the same GPU frequencies/voltages). Or maybe I'm missing something?
You're missing something. In Windows overclocking with newer nvidia GPUs, you can set thermal limits for the overclock with certain software. it will limit the clock speeds based on temperatures. fan speeds are ONLY controlled by the fan curves that are set, whether it be the default or a custom user curve.
I thought that it works that way, but in this case what is the point of overclocking the GPU?
Does the thermal limit not cancel the GPU frequency increase? (rhetorical question)

The GPUGrid app is constantly polling the GPU. When the GPU is in normal state, it will return some subresult to the CPU. The CPU does some Double Precision calculations with it, then puts it back to the GPU. When the GPU locks up, it doesn't return anything, so the polling is repeated at a much higher rate, which results in higher PCIe bus load.
do you know for a fact that the application operates this way?
We know nothing about the new app. This is intentional from the project's part.
However the previous app worked that way, and I have a strong feeling that the present one works the same way.

In regards to the shuffling of data to the CPU for DP processing. that sounds like a waste of resources when any GPU that is capable of processing GPUGRID tasks, also are capable of DP processing. it would make a lot more sense to have the GPU do that, and it would be faster. Do you have information from the devs about this?
Yes I have.
Double Precision is intentionally crippled in the gaming GPUs.
It was crippled by the driver software before, but in current GPUs it's done by the hardware.

Can you link to it please?
Sure. Here you are: https://www.gpugrid.net/forum_thread.php?id=4259&nowrap=true#42862

EDIT:
note that different batches of workunits could behave in different ways (i.e. tolerate different overclocking), even though the same app processes them.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 56711 - Posted: 26 Feb 2021 | 20:39:20 UTC - in response to Message 56710.
Last modified: 26 Feb 2021 | 21:00:16 UTC

Yes I have.
Double Precision is intentionally crippled in the gaming GPUs.
It was crippled by the driver software before, but in current GPUs it's done by the hardware.

Sure. Here you are: https://www.gpugrid.net/forum_thread.php?id=4259&nowrap=true#42862


maybe you didn't understand the question.

what you linked states that certain calculations "can't be done by the GPU" and some loads are being sent to the CPU. absolutely nothing states that these calculations are double precision. this appears to be speculation on your part. as even GPUs from 5 years ago were more than capable of doing DP loads, and doing it on the GPU, even if crippled artificially by nvidia, is still faster than DP on a single CPU thread. CPUs from that time could do like 100-200GFlops of FP64 (multithreaded), but GPUs are on the order of TFlops. and at least as fast as a CPU on the lower end cards.

unless the developer simply didn't have the knowledge to be able to code in GPU DP, but I doubt that's the case.

and you're comparing the old apps, so I don't think you can logically compare to the new apps. they've made great efficiency improvements from those days if they've been able to get an app from 30-40% GPU utilization to up to 90+% on MUCH faster hardware.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 56712 - Posted: 26 Feb 2021 | 20:53:12 UTC

Anyway, please note the first bold recommendation on Gpugrid's Performance tab (currently still broken).

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 56713 - Posted: 26 Feb 2021 | 21:00:19 UTC
Last modified: 26 Feb 2021 | 21:01:55 UTC

True! That's why I approached this very slowly and carefully, working myself upwards to find a stable clock rate. I first ran a couple WUs after slightly offsetting clock speeds to make sure there weren't any issues and continued only after the runs with the prior OC setting were successful.

While my 750 Ti has already crunched many WUs in the past, my "newer" card has only crunched ~10 WUs so far, due to the recent WU drought since the beginning of this year. But I must have scaled it upwards too quickly (probably impatience after the long pause + way longer runtimes), as evidenced by 2 WUs having thrown errors in the past week. That's when I approached you all for advice.

Since then, I scaled the OC way back as can be seen on the provided screenshots from GPU-Z in my prior post.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 56723 - Posted: 28 Feb 2021 | 10:44:41 UTC - in response to Message 56711.

Sure. Here you are: https://www.gpugrid.net/forum_thread.php?id=4259&nowrap=true#42862

what you linked states that certain calculations "can't be done by the GPU" and some loads are being sent to the CPU. absolutely nothing states that these calculations are double precision.
You are right about that post, but the actual precision of the calculations done by the CPU makes no difference regarding the question in the OP (i.e. the reason behind the higher bus usage when there's no GPU usage).

this [i.e. FP64 is done by the CPU] appears to be speculation on your part.
Everything about the app is speculation, as there's no available documentation for it.
We infer the way it works from the errors we encounter and from the way we fix / avoid those errors.

as even GPUs from 5 years ago were more than capable of doing DP loads, and doing it on the GPU, even if crippled artificially by nvidia, is still faster than DP on a single CPU thread. CPUs from that time could do like 100-200GFlops of FP64 (multithreaded), but GPUs are on the order of TFlops. and at least as fast as a CPU on the lower end cards.

unless the developer simply didn't have the knowledge to be able to code in GPU DP, but I doubt that's the case.
They know how to do SP on the GPU, and the GPU-to-CPU ratio of FP32 processing speed is much higher than the GPU-to-CPU ratio of FP64 processing speed. So it is less likely that the calculations that need to be done on the CPU are FP32.

and you're comparing the old apps, so I don't think you can logically compare to the new apps. they've made great efficiency improvements from those days if they've been able to get an app from 30-40% GPU utilization to up to 90+% on MUCH faster hardware.
The 30-40% GPU utilization was a rare and extreme exception. It was always in the 83-93% range (on a system whose CPU is not overcommitted).

I remember that many years ago some developer posted something like "FP64 is done on the CPU", but I couldn't find it, as I don't remember the exact wording. Maybe this post got hidden in the meantime.

However there's a post by GDF from the GTX580->GTX680 times (it was 9 years ago):
https://www.gpugrid.net/forum_thread.php?id=2776&nowrap=true#24089

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 56724 - Posted: 28 Feb 2021 | 15:59:11 UTC
Last modified: 28 Feb 2021 | 16:16:53 UTC

Experiences of a newbie overclocker

A long time ago, I concluded that overclocking-related activities were very time and effort consuming.
Since then, I decided to get factory-overclocked hardware (if any), and leave all the related testing to the manufacturer.
Well, I confess that I've suffered a relapse.

Encouraged by the good results of reworking my GTX 1660 Ti graphics card, a challenge arose: to get this 120 W TDP, 1536 CUDA core graphics card to earn the full bonus with the current batch of heavy ADRIA tasks.

My special thanks to:
Keith Myers for this post.
OCNfranz for this post
Ian&Steve C. for this post
Retvari Zoltan for this post

The mean time for this card on its first ten (10) ADRIA tasks of the current batch was 93370 seconds.
The full bonus is earned if the time elapsed between the task being sent by the server and a valid result being uploaded and reported is less than 24 hours (86400 seconds).
That is an excess of 6970 seconds of processing time (1.94 hours), plus download time, plus time to start the task, plus time to upload the result and report it. A performance increase of at least 8.1 % must be achieved.
Too much of a difference, it is impossible! :-( (a regular mind thinking).
Squeeze it to the maximum and give it a try! (a hardware enthusiast's mind thinking otherwise ;-)
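
The arithmetic above can be reproduced with a small script. This is just a sketch: the function names are made up for illustration, and download, start-up and upload times are not accounted for, so the real requirement is somewhat higher.

```bash
#!/bin/bash
# Minimum speed-up (in %) needed for an average runtime to fit inside
# the bonus window. Transfer and start-up overheads are NOT included.
required_speedup_pct() {
    local avg_runtime_s=$1 deadline_s=$2
    awk -v t="$avg_runtime_s" -v d="$deadline_s" \
        'BEGIN { printf "%.1f\n", (t / d - 1) * 100 }'
}

# Processing time in excess of the deadline, in hours.
excess_hours() {
    local avg_runtime_s=$1 deadline_s=$2
    awk -v t="$avg_runtime_s" -v d="$deadline_s" \
        'BEGIN { printf "%.2f\n", (t - d) / 3600 }'
}

required_speedup_pct 93370 86400   # average runtime vs 24 h deadline
excess_hours 93370 86400
```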

As I related in the previously mentioned "Recycling to win" post, my first attempt resulted in a processing time of 86892.38 seconds on this card, running under the Linux operating system.
Actions taken for this:
- I opened a Terminal window and executed the following command, following Keith Myers' suggestion:

sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus

After that and a reboot, on entering NVIDIA X Server Settings, a new option "Enable GPU Fan Settings" is available in the "Thermal Settings" section.
- I enabled this option and set both fans to 80%. An appreciable temperature decrease was observed, from 65 ºC to 56 ºC. This is a good starting point, so that temperature is not a problem when starting to overclock.

Also, new options for applying frequency offsets to both GPU and memory clocks are now available in the "PowerMizer" section. These offsets are set to "0" by default, and return to this value every time the system is rebooted.
- Following Ian&Steve C.'s advice, an offset of 500 MHz was applied to the memory clock, thus raising it from the P2 default frequency of 11500 MHz to 12000 MHz. This is the default memory clock when working at the P0 level, so the memory should be able to manage it without problems.
- An offset of 100 MHz was applied to the GPU clock. This resulted in the GPU clock frequency rising from the default 1920 MHz to 2010 MHz.
At this point, the GPU is hitting its rated TDP of 120 Watts, or even a bit more, as shown by nvidia-smi.

Under these conditions, one ADRIA task was processed, with the processing time reduced to the already mentioned 86892 seconds.

Not enough to get full bonus.
And trying to raise the GPU clock offset above 100 MHz trips some kind of protection that lowers the frequencies to minimum values: 300 MHz for the GPU clock, and 810 MHz for the memory clock.
Well... This time, achieving the challenge is impossible...
...Or not?
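
The fallback described above could be watched for automatically while testing offsets. The following is only a sketch: the function name is made up, and the 300 MHz / 810 MHz thresholds are the values observed on this particular card, so they are assumptions anywhere else. Note also that nvidia-smi may report the memory clock on a different scale than NVIDIA X Server Settings.

```bash
#!/bin/bash
# Classify one "graphics clock, memory clock" sample (both in MHz).
# Thresholds are the fallback values reported in this post for one
# specific card; treat them as assumptions on other hardware.
is_fallback_state() {
    local gr_mhz=$1 mem_mhz=$2
    if [ "$gr_mhz" -le 300 ] && [ "$mem_mhz" -le 810 ]; then
        echo "fallback"
    else
        echo "ok"
    fi
}

is_fallback_state 2010 6000   # normal crunching clocks
is_fallback_state 300 810     # protection tripped

# Live samples could be taken with (not executed here):
#   nvidia-smi --query-gpu=clocks.gr,clocks.mem --format=csv,noheader,nounits -l 5
```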

I had read the above mentioned OCNfranz post with the recipe to modify the TDP limit value for the GPU. And I thought: Is it possible not only to lower this value, but also to raise it?
The rated TDP for the GTX 1660 Ti GPU is 120 Watts. A good engineer should apply a minimum 5% margin for driver electronics and related circuitry.
So let's bet on the Nvidia and Asus engineers being good ones. 5% of 120 Watts is 6 Watts.
I raise my bet to 126 Watts by means of the following command:

sudo nvidia-smi --power-limit=126

With the following response. It seems to work!
Important note: Please be careful. I don't recommend doing this. It might cause the GPU or its associated electronics to burn out, with no way back!
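
Before raising the limit as described above, it may be worth checking what range the board actually accepts, since the driver enforces a maximum. The helper below is only a sketch with a made-up name: it computes "default plus a margin" and clamps it to a maximum, and it changes nothing on the GPU. The 130 W and 125 W maxima in the examples are invented values for illustration, not figures for this card.

```bash
#!/bin/bash
# Compute a power limit of "default + margin %", clamped to the board
# maximum. Pure arithmetic; nothing is changed on the GPU.
clamped_power_limit() {
    local default_w=$1 margin_pct=$2 max_w=$3
    awk -v d="$default_w" -v m="$margin_pct" -v x="$max_w" 'BEGIN {
        target = d * (1 + m / 100)
        if (target > x) target = x
        printf "%.0f\n", target
    }'
}

# Figures from this post: 120 W default, 5 % margin.
# The maximum values below are hypothetical.
clamped_power_limit 120 5 130
clamped_power_limit 120 10 125

# On a real system the limits can be read with (not executed here):
#   nvidia-smi --query-gpu=power.default_limit,power.max_limit --format=csv,noheader,nounits
```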

Will this allow increasing the GPU clock frequency offset without tripping the protections?
Now it was possible to raise the GPU clock offset to 130 MHz.
A new Gpugrid ADRIA task was started, and nvidia-smi reported as this.
This time, also following Retvari Zoltan's advice, the GPU task was processed exclusively, with no CPU tasks running.

And the result: Task #32548599 was processed in 86322.69 seconds. Below 86400 for the first time!...
...But adding download and upload times for that task, full bonus was missed by... 29 seconds.

One last fine tuning: I had made my tests by raising the GPU clock offset in 10 MHz increments. An offset of 140 MHz was found to be unstable. Let's try 135 MHz...
With this setup, Task #32548977 was processed and finished in 85871.06 seconds, getting the full bonus for the first time, with a time margin of 8 minutes and one second left. Challenge completed!
A new Task #32549543 was processed under the same conditions, resulting in a processing time of 85845.95 seconds. This is my absolute record for this card, and it also received the full bonus, with a time margin of 8 minutes and 25 seconds left.
A third Task #32550212 was started, but it finished prematurely after 2509.29 seconds, with the error "CUDA_ERROR_INVALID_PC (718)". This might indicate that this setup is at the very limit of the hardware's capabilities... (?)

Conclusions:
- I've performed these tests on a reworked graphics card, with oversized heat dissipation capability compared to the original card. This reinforces my feeling that the work was worth it.
The original DUAL-GTX1660TI-O6G Asus card is endowed with excellent electronics, but I've got the impression that the stock heatsink is not at the same level.

My empirical tests seem to confirm some of Retvari Zoltan's assertions:
- Overcommitting the CPU decreases the performance of highly demanding GPU tasks such as the Gpugrid ones.
I've reduced to only one concurrent CPU task as my standard for this system, thus leaving 75% of CPU cores available.
- The same overclocking setup is not necessarily valid for all kinds of tasks.
Testing with PrimeGrid tasks (Gpugrid tasks are currently very scarce), a maximum GPU clock frequency offset of 115 MHz was found to be stable. Higher offsets than this caused the frequencies to drop spontaneously, as previously mentioned.
I'm leaving the 115 MHz offset as my standard for this graphics card, at least until a steady supply of Gpugrid tasks is available again.
With this setup, a new Task #32550279 has started, and I'm waiting for it to finish. No hope of getting the full bonus this time.

Please, feel free to share your own experiences, or ask for any clarification.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 56726 - Posted: 28 Feb 2021 | 19:42:19 UTC
Last modified: 28 Feb 2021 | 19:42:46 UTC

Great post. Yes, tuning for overclocking does take time for the trial and error.

Best outcome is a 100% stable OC for all the projects you are running for a set and forget configuration.

BTW, you can set up a BASH script to automate your overclock. You can run it before you start BOINC and all your gpu overclocks will automatically be applied.

Here is mine from this daily driver. Ask for clarification of anything you don't understand.


#!/bin/bash

# Enable persistence mode so the driver stays loaded between jobs
/usr/bin/nvidia-smi -pm 1

# Allow non-root users to change application clocks
/usr/bin/nvidia-smi -acp UNRESTRICTED

# Set the power limit of each GPU to 200 W
nvidia-smi -i 0 -pl 200
nvidia-smi -i 1 -pl 200
nvidia-smi -i 2 -pl 200

# Select PowerMizer mode 1 ("Prefer Maximum Performance") on each GPU
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUPowerMizerMode=1"

# Enable manual fan control and run both fans of each GPU at 100%
/usr/bin/nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:2]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[fan:3]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:4]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[fan:5]/GPUTargetFanSpeed=100"

# Apply memory and graphics clock offsets at performance level 4
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:0]/GPUGraphicsClockOffset[4]=80"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:1]/GPUGraphicsClockOffset[4]=80"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:2]/GPUGraphicsClockOffset[4]=80"
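
The same settings can also be written as a loop, which may be easier to maintain when the number of GPUs changes. This is only a sketch of the script above, not something tested on hardware: the `run` and `apply_overclock` helpers and the `DRY_RUN` variable are made up for illustration, and it assumes two fan interfaces per GPU (fan indices 2*i and 2*i+1, as in the script above). With `DRY_RUN=1`, the default here, it only prints the commands instead of executing them.

```bash
#!/bin/bash
# Loop-based variant of the overclock script above (sketch).
# DRY_RUN=1 prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_overclock() {
    local gpu_count=$1 power_w=$2 mem_offset=$3 core_offset=$4 level=$5
    local i
    run /usr/bin/nvidia-smi -pm 1
    run /usr/bin/nvidia-smi -acp UNRESTRICTED
    for ((i = 0; i < gpu_count; i++)); do
        run /usr/bin/nvidia-smi -i "$i" -pl "$power_w"
        run /usr/bin/nvidia-settings -a "[gpu:$i]/GPUPowerMizerMode=1"
        run /usr/bin/nvidia-settings -a "[gpu:$i]/GPUFanControlState=1"
        # Assumes two fan interfaces per GPU, numbered consecutively
        run /usr/bin/nvidia-settings -a "[fan:$((2 * i))]/GPUTargetFanSpeed=100"
        run /usr/bin/nvidia-settings -a "[fan:$((2 * i + 1))]/GPUTargetFanSpeed=100"
        run /usr/bin/nvidia-settings \
            -a "[gpu:$i]/GPUMemoryTransferRateOffset[$level]=$mem_offset" \
            -a "[gpu:$i]/GPUGraphicsClockOffset[$level]=$core_offset"
    done
}

# Same values as the script above: 3 GPUs, 200 W, +800 mem, +80 core, level 4
apply_overclock 3 200 800 80 4
```

Running it once in dry-run mode and comparing the printed commands with the original script is a cheap way to check the fan numbering before setting `DRY_RUN=0`.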

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 56728 - Posted: 1 Mar 2021 | 6:15:33 UTC - in response to Message 56726.

Thank you for sharing your knowledge.
It is a condensed manual of overclocking related commands.
I appreciate it very much.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 56729 - Posted: 1 Mar 2021 | 6:30:12 UTC - in response to Message 56728.

The fan control interfaces changed from Pascal to Turing so you have to be aware of that. Two individual interfaces on Turing compared to one for Pascal

Also the number of power levels changed from 3 to 4, so you have to configure for that also.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,825,716,430
RAC: 19,538,814
Level
Tyr
Message 56737 - Posted: 2 Mar 2021 | 16:48:09 UTC - in response to Message 56709.

good luck with the RMA!

Despatched the parcel on Friday.

Monday, 13:26
We have received the following items from you on RMA ...

Tuesday, 14:49
Test Result: Tested
FAIL - FAN CLOSEST TO THE PORTS KEEPS STOPPING AND STARTING WHERE AS THE OTHER ONE IS CONSISTANT
Action Taken: Item credited back to original card on credit note Mxxxxxx.

Can't quibble with that for legal compliance, but where do I get a replacement card from ??!!

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 56738 - Posted: 2 Mar 2021 | 17:33:37 UTC - in response to Message 56737.

That's why I have been sitting on doing nothing for two cards that have one of the fans stop running. They still are able to keep fairly cool and the clocks don't drop too badly.

Still within their warranty period for another year but without any replacement stock available, I don't need the cash, I need to keep using the cards.

I'll revisit the issue when I get closer to the warranty period ending and hope there will be stock available for replacement.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 56739 - Posted: 2 Mar 2021 | 18:28:26 UTC - in response to Message 56738.

That's why I have been sitting on doing nothing for two cards that have one of the fans stop running. They still are able to keep fairly cool and the clocks don't drop too badly.

Still within their warranty period for another year but without any replacement stock available, I don't need the cash, I need to keep using the cards.

I'll revisit the issue when I get closer to the warranty period ending and hope there will be stock available for replacement.



Keith, in your case, EVGA is a lot better about this. they will replace the card with the same thing, or something comparable (even giving upgrades to whatever they have).

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 56740 - Posted: 2 Mar 2021 | 19:55:27 UTC - in response to Message 56739.

Yes, true in the past . . . but in this new reality, they have just been refunding your purchase.

That is what I have been reading in the EVGA forums.

Unless you have specific recent experiences within the last few months.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 56744 - Posted: 2 Mar 2021 | 21:19:53 UTC - in response to Message 56740.
Last modified: 2 Mar 2021 | 21:27:15 UTC

I've never seen EVGA refund on a one-time RMA. they always replace it with SOMETHING. I would call their support line to double check. I haven't seen any reports of them issuing refunds recently, at least not for a one-time RMA. i've seen some exceptions where someone RMA'd a 3080 or a 3090 recently like 3-4 times in a very short amount of time (1-2 months) and EVGA offered to just refund at that point. but that's an extraordinary situation. Their normal process is to always replace. their RMA info doesn't even mention giving refunds (outside of refunding collateral for advanced RMA)

I sent in a 2080ti last month, they sent me the exact same model back. I guess they had it in stock. EVGA does keep many models on hand (not for sale) for RMA exchanges. I wouldn't be surprised if they have your model available too (but they won't tell you for sure if they do, they'll just tell you "if we don't have the exact model, we'll replace it with something comparable").
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56746 - Posted: 2 Mar 2021 | 21:41:33 UTC

I actually haven't started any RMA process. I have a RTX 2070 with the fan over the VRM's not running.

I have the blower fan on a RTX 2080 Hybrid not running.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 56749 - Posted: 2 Mar 2021 | 21:56:31 UTC - in response to Message 56746.
Last modified: 2 Mar 2021 | 21:56:50 UTC

I know, I'm just saying that you shouldn't be all that scared about getting a refund instead of an actual replacement product with EVGA in the US. no need to really hold off. just make sure you send it in before the warranty expires lol.
____________

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56750 - Posted: 2 Mar 2021 | 23:54:42 UTC - in response to Message 56749.

I know, I'm just saying that you shouldn't be all that scared about getting a refund instead of an actual replacement product with EVGA in the US. no need to really hold off. just make sure you send it in before the warranty expires lol.

I sent in a bad EVGA GTX 980 and got a new GTX 1070 in return.
I wish I could do it again.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56761 - Posted: 6 Mar 2021 | 23:34:09 UTC



Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56762 - Posted: 8 Mar 2021 | 22:20:21 UTC

Computer Gremlins

Today I received a WhatsApp message from my sister, reporting strange behavior of her home computer.
She sent me the image that can be seen below, with an error about overcurrent on a USB device, although no devices were connected at that moment.
I was writing a reply saying that I couldn't help, when I realized that today is a rainy day here in Tenerife.
This can be seen in the second image, taken directly from my home weather station.
Problems like this are typically caused by damp dust in inconvenient places on the motherboard electronics. In this case, it is most likely short-circuiting the USB overcurrent signal.
Sometimes those errors mysteriously resolve themselves... until the next rainy day comes.
If this is the problem, the real solution is a thorough motherboard cleaning.

Error


Weather

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56763 - Posted: 9 Mar 2021 | 2:10:52 UTC

Or spiderwebs on the electronics allowing dust to glom onto parts and connections.

I had to "debug" my weather station outside sensor array last summer to get rid of spurious readings and glitches.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56769 - Posted: 13 Mar 2021 | 23:25:40 UTC

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56770 - Posted: 14 Mar 2021 | 2:51:56 UTC - in response to Message 56769.

Hahah ha LOL. I would have changed the labeling of "patiently" to "im-patiently"

What else ya gonna do . . . . twiddle your thumbs . . . . or dig around in the bit box and scramble something up to keep your mind active.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56774 - Posted: 16 Mar 2021 | 23:00:19 UTC

Laptops Design Engineers LOL

I recently discovered the meaning of LOL (Thank you, Keith)
I had seen it mentioned here and there, so I looked it up on Wikipedia, and it is the acronym for "Laughing Out Loud"
Then I thought that perhaps it might have been coined by laptop design engineers, thinking about the field service engineers for laptops: "I managed to pack all these small things in here, now I want to see how you manage to service it, hahahaaah LOL"
One example can be seen in the following HP Support video, on how to replace the Real Time Clock and BIOS setup backup battery.
https://www.youtube.com/watch?v=nuy0gA38G1I

I had to replace this battery in a laptop that was causing problems due to spontaneous CMOS memory corruption and the RTC returning to 1/1/2000 when the main battery was unplugged.
The life of these backup batteries is about three to five years. Therefore, by the time they are exhausted, the product warranty has usually expired...
Here we have the problematic laptop.
After removing the bottom cover by operating the opening levers, we get this view of the inside.
A close look shows that the RTC battery is enclosed under an intermediate frame, as seen in the previous video.
If I can avoid dismantling the whole laptop to gain access to the battery... I will! ... How?
I'm opening an imaginary window in the intermediate frame, wide enough to easily remove the old battery and fit the new one.
Here we have the window opened, and the battery to be replaced is in sight, protected by an insulating shroud.
This is the battery out of its home through the window. It is a 3 V CR2032 lithium battery, not a regular one, but a wired one.
In a quick check, the old battery reads 0.498 volts: it is completely dead.
Checking the new replacement battery, the reading is 3.310 volts. This is a normal value for a new one.
The second half of the intervention consists of recovering the original cables and connector and attaching them to the new battery.
Here it is halfway done; later, a heat-shrink sleeve is applied for perfect insulation.
The new battery is then put back into position, ready for the photo finish.
This time there were neither leftover nor missing screws... because it wasn't necessary to unscrew any!

L😂️L

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56775 - Posted: 16 Mar 2021 | 23:12:27 UTC - in response to Message 56774.

You were fortunate for having a little glimpse of the battery and cable through that aperture.

If it would have not been at least partially exposed you would have had to dismantle the backside covering to find the battery location.

And now future battery replacement is just as easy. Smart idea to notate the date of replacement.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56789 - Posted: 21 Mar 2021 | 12:59:14 UTC - in response to Message 56769.
Last modified: 21 Mar 2021 | 13:34:28 UTC


Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56805 - Posted: 28 Mar 2021 | 15:37:27 UTC

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56827 - Posted: 8 Apr 2021 | 21:20:31 UTC - in response to Message 56805.


A couple of notes about the previous image:

-1) If you thought that the main motif is familiar to you, you're right.
It is the back of the package of an LGA775 socket Intel processor. More specifically, an Intel Core 2 Quad Q8400

-2) Regarding the score shown: 02021
2021 might appear to be a prime number, but it is not. It is the product of two primes: 43 and 47
Pacman has eaten 43 round terminals at 47 points each, which results in the 02021 score shown.
Why a leading zero?.. If Pacman ate all 775 terminals, the final score would be 36425 points.
2021?.. What does that sound like to me?

-3) It is proven that a lack of work, like the current one, leads me to think of strange things.
The dead time would otherwise be spent watching statistics...


Yes! You're a really sharp observer: You have noticed that I said "a couple of notes", and finally it was three ;-)
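
The arithmetic in note 2 is easy to check. A minimal Python sketch (the function and constant names are only illustrative, they are not part of the original image):

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order (trial division)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


POINTS_PER_TERMINAL = 47   # points per round terminal, from note 2
LGA775_TERMINALS = 775     # terminals on an LGA775 package

print(prime_factors(2021))                                   # [43, 47]
print(43 * POINTS_PER_TERMINAL)                              # 2021
print(f"{LGA775_TERMINALS * POINTS_PER_TERMINAL:05d}")       # 36425
```

The five-digit format shows why the score needs a leading zero: the maximum possible score already fills all five digits.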

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56895 - Posted: 26 May 2021 | 17:41:10 UTC

Can anyone edify me on what effect the anti-mining feature now included on RTX30xx GPUs has on CUDA capability? They only talk about hash rate and my knowledge is lacking. Will these GPUs be slowed down running ACEMD tasks (once they are CUDA 1.2 updated)

TIA for any tutoring!
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56896 - Posted: 26 May 2021 | 18:33:43 UTC - in response to Message 56895.

Can anyone edify me on what effect the anti-mining feature now included on RTX30xx GPUs has on CUDA capability? ...

I'd be interested in the same topic.
Merely for information, since I don't own any graphics card based on Ampere GPUs so far.
And I don't expect to get one in the near future, given that I've noticed a crazy shortage of latest generation cards on the local market.
I took a look yesterday, and the most powerful card I found available was a GTX 1650 (Turing GPU series) graphics card...

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56897 - Posted: 26 May 2021 | 21:21:57 UTC - in response to Message 56896.

...the most powerful card I found available was a GTX 1650 (Turing GPU series) graphics card...


Same here in the US for the most part. You can get better cards if you want to pay way more than they used to be worth. Even the 1650s are overpriced, as mine were $140 US before all this madness.

It seems the only way to get a powerful GPU is to buy a pre-built machine that includes it in the package, and very few appear to have Turing GPUs. Mostly Ampere in that market sector.
Do they even still make Turing GPUs?
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 56898 - Posted: 26 May 2021 | 22:15:40 UTC - in response to Message 56895.
Last modified: 26 May 2021 | 22:17:23 UTC

Can anyone edify me on what effect the anti-mining feature now included on RTX30xx GPUs has on CUDA capability? They only talk about hash rate and my knowledge is lacking. Will these GPUs be slowed down running ACEMD tasks (once they are CUDA 1.2 updated)

TIA for any tutoring!


It wont have any impact on anything other than the Ethereum algorithm. The ETH algorithm (also known by its name, Dagger-Hashimoto, or sometimes referred to as ethash) is unique in how it works, and that makes it easy for nvidia to program detection into their GPUs.

note, that Nvidia's "anti-mining" feature ONLY targets this algorithm, and other forms of mining on GPUs with other algorithms are NOT targeted. you can absolutely still use these GPUs with other mining algorithms unrestricted.

this feature will not affect any form of CUDA/OpenCL compute workloads that are not using the ETH algorithm, that means that GPUGRID (or any other BOINC project) will not be affected.

but of course, right now, it's all moot anyway, since the only cards that have this feature are Ampere cards, and Ampere cards don't work here on GPUGRID yet since the developers haven't updated the application to support them.
____________

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56900 - Posted: 26 May 2021 | 22:52:49 UTC - in response to Message 56898.

It wont have any impact on anything other than the Ethereum algorithm.


That, hombre, is wonderful news for this project. The only missing factor is available time to upgrade ACEMD tasks to the latest CUDA.
Or will that have to come from the software vendor?

(As always, thanks for sharing your knowledge)
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56906 - Posted: 28 May 2021 | 10:28:10 UTC
Last modified: 28 May 2021 | 10:30:25 UTC

Can anyone edify me on what effect the anti-mining feature now included on RTX30xx GPUs has on CUDA capability?
It wont have any impact on anything other than the Ethereum algorithm.

Nice to know.
Thank you very much, both of you.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56937 - Posted: 6 Jun 2021 | 21:17:38 UTC

Graphics card cleaning

During a temperature survey, I noticed high values for one of the graphics cards in a triple GPU system.
The Psensor monitoring application refers to this graphics card as "NVIDIA GeForce GTX 1650 2"
It is physically the middle card in this 3x GTX 1650 blended GPU system, and has the worst air circulation of the three.
The temperature of this card was hitting 81 ºC, and then stabilizing at 80 ºC.
This is reason enough for me to check for, and rule out, any problem in its cooling system.
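
For anyone who prefers to script this kind of temperature survey instead of watching Psensor, here is a minimal Python sketch. It assumes the NVIDIA driver's nvidia-smi utility is on the PATH; the 80 ºC limit and the function names are illustrative assumptions, not a recommendation from NVIDIA or Psensor.

```python
import subprocess

TEMP_LIMIT_C = 80  # assumed alert threshold, chosen from the readings quoted above


def parse_gpu_temps(csv_text):
    """Parse 'index, name, temperature' lines into (index, name, temp_c) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        # First field is the index, last is the temperature; the name is in between.
        readings.append((int(parts[0]), ", ".join(parts[1:-1]), int(parts[-1])))
    return readings


def hot_gpus(readings, limit_c=TEMP_LIMIT_C):
    """Return only the readings at or above the temperature limit."""
    return [r for r in readings if r[2] >= limit_c]


def query_gpu_temps():
    """Ask nvidia-smi for the current temperature of every GPU in the host."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_temps(result.stdout)
```

On a host with the driver installed, `hot_gpus(query_gpu_temps())` returns the cards that deserve a closer look. Note that nvidia-smi numbers the cards by its own index, which does not necessarily match the names shown by Psensor.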

Both fans are turning, so no problem there.
A full maintenance consists of removing the heatsink and fans, cleaning them and the circuitry thoroughly, and replacing the thermal paste.
Here is the affected graphics card on the table.
This other image shows the heatsink and the fan frame removed.
I'm starting by cleaning the anodized aluminum heatsink with a stiff-bristle brush.
I usually remove any labels attached to the fins, to improve air circulation as much as possible.
I also removed any aged thermal paste leftovers using cotton swabs and alcohol.
Here is the heatsink before and after cleaning.
Now it is time for the fan frame.
I'll use a soft-bristle brush and an optics blower to clean the whole assembly.
Here is the fan frame before and after cleaning.
And lastly, I'm cleaning the accumulated dust from the circuitry, and removing thermal paste leftovers from the GPU chip and its silicon die. Here is its final look.
Now it is time to reassemble everything, starting by dispensing some self-spreading thermal paste over the GPU chip's silicon surface.
My favorite way to remount the heatsink is to lay it face up on the edge of the table, and then bring the graphics card circuitry towards it while checking for a perfect match by sighting through the card's fixing holes.
Then I loosely fit the screws on two corners, then on the other two, and tighten them completely by turning gradually to the end, following a cross pattern.
Now the graphics card is ready for the fan frame to be attached again (please, don't forget to reattach the fan's electrical connector first ;-)
Finally, the graphics card is back in its working position.

Currently, this graphics card is running slightly overclocked (+55 MHz), with a fixed 70% fan rate to enhance air circulation. That is enough to get the 48H bonus that it was missing earlier.
The difference in working temperatures can be appreciated by comparing the before and after graphs.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56948 - Posted: 12 Jun 2021 | 0:46:04 UTC

Desoldering - Soldering Practices



We're using this old wireless mouse to take a look at some electrical desoldering - soldering basics.
It has a real problem: activating the left key switch has been getting harder and harder, making it uncomfortable to double click, or even to single click.
This mouse has three switches, the central one being in perfect condition and nearly unused in daily routine.
The main objective is to see how to desolder both switches in order to swap them, and how to solder them again.
The secondary objective is to restore this mouse to comfortable use, instead of discarding it. (Yes, I'm a hard recycler, you know ;-)
And a tertiary goal would be to encourage someone to try his/her own desoldering - soldering practice. Will you? Any electronics scrap would be useful as practice material...

Let's start...
Ooops! Two Phillips screws removed, and we have the mouse taken apart into pieces.
Here we have a close look at the main circuit. The two microswitches for left and right keys are clearly visible, and the third one is partially hidden under the wheel mechanism.
The wheel mechanism is easily removed by grasping two plastic tabs, and the two microswitches involved are in sight. I've marked the defective left key microswitch with a black dot, so there is no chance of confusing it with the good one after desoldering both.
For better handling while desoldering, I've held the circuit in a holding clamp.
Now we are desoldering the three electrical terminals of both microswitches. Several tips to keep in mind here:
- Be patient and wait for the soldering iron to reach its proper working temperature. If it is not hot enough, the soldering tin will not become completely fluid.
- I'm using a desoldering pump. Charge the desoldering pump plunger by pressing it to its latching position.
- Heat each terminal until the soldering tin is molten, testing it by pushing the terminal and checking that it tilts. But not for too long: the microswitches have a plastic body that would melt if overheated.
- Bring the desoldering pump nozzle as close as possible to the soldering point, withdraw the soldering iron tip, and immediately press the desoldering pump release button.
These operations can be seen in sequence in the following video:


After desoldering the terminals, push them gently to check that they move freely in their mounting holes in the printed circuit board (PCB).
This can be seen in the following video:

If any of them doesn't come free, it is usually better to re-solder it with fresh soldering tin, and then desolder it again.

When all terminals are free, it will be possible to extract the microswitch by pulling it and tilting it with a zigzag movement, as shown in this video:


Following the procedure described, both microswitches are now released.
This other image shows the desoldering pump used, the mouse PCB, the holding clamp, the soldering tin reel, and soldering tin leftovers from desoldering.
Now we have swapped the two microswitches, and both are inserted in their new locations, ready for soldering.
Several other tips:
- I'm temporarily fixing the microswitch's central terminal with a small amount of fresh soldering tin, while pushing it into its final position.
- I'm permanently soldering the other two terminals, melting fresh soldering tin at each terminal base, then finishing with an upward movement of the soldering iron tip along the terminal. I'm being brief in these operations, so as not to heat the terminals too much. Otherwise, the microswitch's plastic body might overheat and deform.
- Then, I'm permanently soldering the central terminal the same way.
All these steps can be seen in the next video:


Here are the finished electrical connections after soldering.
And here is the final look of the mouse PCB after swapping the microswitches.

It's time for reassembly and testing: Ok!


And finally, an image listing the material used, hoping you have enjoyed it as I did 🤗️

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57207 - Posted: 17 Jul 2021 | 21:58:09 UTC

Well... I'm still learning about publishing contents 😊
Excuse me for repeating the previous post; this time the videos are re-encoded for better compatibility with more platforms (I hope)
After trying several combinations, I found the best compatibility with the .mp4 format and the x264 video codec.
Utility used for re-encoding: Any Video Converter





Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 57208 - Posted: 18 Jul 2021 | 0:19:10 UTC

I find that Noctua and Be-Quiet fans are excellent in my rigs on all three aspects of fan performance- lifespan, CFM/watt and noise rating.
Neither manufacturer uses anything but the latest bearing types. They are a bit more expensive, but you get what you pay for.

I'm sure there are other brands that are as highly recommendable also in that same price range. Just my take.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57216 - Posted: 19 Jul 2021 | 11:28:26 UTC - in response to Message 57208.

The new fan series (P14, P12, P8) from Arctic Cooling is excellent.
I use Noctua (NH-U12A with NF-A12x25 plus some older models) as well, but the Arctic P series is even better.
They have very low power consumption, they are silent, they have very good bearings and produce a "beamed" airflow.

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Scientific publications
wat
Message 57217 - Posted: 19 Jul 2021 | 11:49:28 UTC - in response to Message 57216.
Last modified: 19 Jul 2021 | 11:50:06 UTC

I second that! They offer great value, and their 3 and 5 fan boxes especially (depending on availability and where you live) are a great deal. They can be a little more cumbersome to clean, though. I got mine (P14 PWM) in a set of 5 for as little as 25€ last year, and that is the price of only ~1 or 2 Noctua fans.

Arctic also offers a new series called Bionix, and I find that their Bionix F140 PWM fans in particular are very powerful exhaust fans that still run quietly, albeit noticeably. At a rated 104 CFM / 176 m³/h they do move quite some air.

Their "Co"-branded fan series (short for continuous) comes with a dual ball bearing that offers even quieter operation than their standard P-series. It should offer a longevity bonus, but it is too soon for me to verify their claims based on my own experience of just 2 years with them.
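
For reference, the two airflow figures quoted above are the same rating in different units. A minimal sketch of the conversion, assuming only the exact definition 1 ft = 0.3048 m (the function name is illustrative):

```python
CUBIC_METERS_PER_CUBIC_FOOT = 0.3048 ** 3  # 1 ft = 0.3048 m exactly


def cfm_to_m3_per_hour(cfm):
    """Convert an airflow in cubic feet per minute to cubic meters per hour."""
    return cfm * CUBIC_METERS_PER_CUBIC_FOOT * 60


print(round(cfm_to_m3_per_hour(104), 1))  # 176.7
```

So each CFM is roughly 1.7 m³/h, which is handy when comparing datasheets that only list one of the two units.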

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57218 - Posted: 19 Jul 2021 | 13:27:25 UTC

Thank you for sharing your experiences.
It's also my opinion that mounting quality fans is a wise investment.
I was searching for a slim 92 mm PWM fan to complete my spares box, and I found this Noctua NF-A9x14 PWM available at a local supplier.
It should be able to replace the fan on any of my CPU coolers if needed.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57226 - Posted: 25 Jul 2021 | 16:47:00 UTC - in response to Message 57218.

I received the Noctua NF-A9x14 PWM fan that I ordered.
I was curious to test it as a replacement for an original CPU cooler fan.
And I decided to test it in my most challenging host for this, this triple GPU system.
I'm replacing the original fan on its Arctic Freezer 13 CO CPU cooler.
Under it, there is an Intel Core i7-9700F CPU.
Here is how it originally looks.
The first step consists of removing the CPU cooler fan frame. It is easy on this model, since it is held by four plastic tabs, two on each side. Nice for periodic maintenance.
The Noctua fan package comes with a very complete set of accessories.
And the fan's appearance is very attractive.
I'm keeping the original fan untouched, and I'll attach the new one by means of cable ties.
I find that cable ties are a very useful wildcard for many situations like this...
This is a front view of the new fan mounted, and this other is a rear view.
Now it is time for testing the cooling performance.
PrimeGrid is the most CPU power demanding project that I run. During tests, the system was processing three Genefer 21 GPU tasks and four PPS-Mega (LLR) CPU tasks, with one CPU core intentionally left free so as not to overcommit the processor.
I took screenshots of Psensor readings before and after replacing the fan. Let's compare them:

Arctic original regular fan: 65 ºC average temperature at stability.

Noctua replacement slim fan: 67 ºC average temperature at stability.

My conclusion:
Airflow from the original fan is stronger, resulting in a working temperature about 2 ºC lower at full CPU load than the one achieved with the slim fan.
Therefore, I'd better reserve the slim fan for situations where some physical limitation prevents using a regular one.
Now, the system under test is working again with the original CPU cooler fan.
Performance in new processors depends on working temperatures, as higher turbo frequencies are applied when lower temperatures are maintained.
As a general rule, the lower the temperatures achieved, the better for performance and hardware lifespan.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57229 - Posted: 26 Jul 2021 | 22:44:29 UTC

🔥

Local summer in the Canary Islands, a good season to continue with temperature tests...
After playing with the CPU cooler fan on this system in my previous post, I thought: I should be able to improve heat dissipation, let's give it a try!
I remembered that I had some of this high performance thermal compound left. It's time to squeeze it all out.
Arctic Freezer 13 CO is a CPU cooler based on a solid copper core, so it meets Conductonaut thermal compound requirements (liquid metal must not be used on aluminium surfaces).
The thermal paste currently used on the CPU cooler was NOX TG-1. It is a general purpose, non-conductive, high quality thermal paste that is easy to apply.
The first step in replacing the existing paste with Conductonaut thermal compound is removing the old paste leftovers and thoroughly cleaning both processor and cooler surfaces with alcohol.
I've removed the Intel Core i7-9700F CPU from the motherboard and put it in an anti-static box for more comfortable and safe handling.
Important to keep in mind: Conductonaut compound is based on a liquid, electrically conductive metal alloy. Any spillage over the motherboard must be avoided.
And especially for me on this motherboard, my favorite one, a Gigabyte Z390 UD.
I would recommend the Ultra Durable line of Gigabyte motherboards: None of the ones I'm using has failed yet, after several years of non-stop 24/7 operation...
Following the Conductonaut application instructions, the next step consists of dispensing a small amount at the center of the processor's metallic surface.
It is curious how it arranges in a ball-shaped drop.
Now it's time to spread it over the whole surface. For this, I'm using one of the special swabs included in Conductonaut kit.
After patiently spreading, dispensing more compound little by little when necessary, the whole surface becomes covered.
The same procedure is applied to the CPU cooler surface.
Starting with a small amount of Conductonaut thermal compound, and finishing when the whole surface is covered.
Now, the processor is installed again in its socket, along with the mounting bracket for the CPU cooler.
I like this kind of fixing that can be mounted from above, like the Arctic Freezer 13 CO ones, because dismounting the motherboard is not required.
Finally, the CPU cooler is carefully put back and fixed in its working position.
This image shows one of the cooler corners. A small, ball-shaped amount of thermal compound was found overflowing at this corner, and it was carefully removed by aspirating it with the thin curved needle and syringe supplied with the kit.
And the system is ready again for working hard.

Let's test it!

Initial NOX TG-1 thermal paste: 65 ºC average temperature at stability.


Replacement Conductonaut thermal compound: 62 ºC average temperature at stability.


For me, this 3 ºC temperature reduction at processor full load was worth it. 👍️

Notes:
- The original idea for me to try Thermal Grizzly Conductonaut thermal compound came from Retvari Zoltan's advice in Message #53412. I'm still grateful for this.
- I like to say that I'm involved in a continuous learning process. My special thanks to Gpugrid platform, for harboring threads like this one and many others that make sharing experiences possible.

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 57258 - Posted: 2 Sep 2021 | 16:39:22 UTC
Last modified: 2 Sep 2021 | 17:18:51 UTC

Finally invested a little time to optimise my GPU (Asus Strix 1660S) for power efficiency. Although I went through the whole process using the E@H FGRP app, the same conclusions should hold for other projects. Additionally, the shorter runtimes give a larger sample to test your GPU settings and validate your claims against.

I started by running 1 task initially, but never got to 100% load, thus went for 2 units that were running concurrently. That brought avg. utilisation up to 99%.

Moving on to my testing results using 2 concurrent WUs. I gathered the data using GPU-Z and HWiNFO.
Before:
Power limit (PL): no
Power usage: 118W (avg)
V core: 1.125V (max) / 1.037V (avg) / 0.997V (min)
Core freq: 2025MHz (max.) / 2015 MHz (avg) / 1995 MHz (min)
Mem freq: 7100 MHz (all)
Fan: 40%
T GPU: 63.2C (max) / 62.8C (avg) / 61.8C (min)
T VRM: 54C (all)
T Hotspot: 75.8C (max) / 74.5C (avg)

Same data reported after I incorporated the following changes:
- upward revision of fan curve (little more than 1% for every degree C to still keep noise down)
- lower OC on core
- lowered PL

Power limit (PL): yes @83%
Power usage: 103W (avg)
V core: 1.012V (max) / 0.991V (avg) / 0.975V (min)
Core freq: 2010MHz (max.) / 1998 MHz (avg) / 1980 MHz (min)
Mem freq: 7100 MHz (all)
Fan: 58%
T GPU: 54.5C (max) / 53.7C (avg) / 53.3C (min)
T VRM: 49C (all)
T Hotspot: 67.9C (max) / 66.5C (avg)

Average runtimes (estimated from WU results log) based on running 1 or 2 units concurrently:
- for 1 WU: 455 sec
- for 2 WUs (original): 880 sec
- for 2 WUs (new): 890 sec

That is roughly 1.13% slower for 12.71% lower power draw. This large gap also leaves a lot of room for any inaccuracies in the estimates of average runtimes for the 3 tiers listed above.

In terms of throughput efficiency, see the following tables. First I list the estimated avg. WUs computed per hour. Then I compute the resulting credits (1 WU awards 3,465 credits). The third table lists credit per W (normalised efficiency):

WUs/h
- for 1 WU: 7.91 WUs/h
- for 2 WUs (original): 8.18 WUs/h
- for 2 WUs (new): 8.09 WUs/h

credit/h
- for 1 WU: 27,408
- for 2 WUs (original): 28,344
- for 2 WUs (new): 28,032

normalised credit/h/W
- for 1 WU (avg. 115W): 238
- for 2 WUs (original): 240
- for 2 WUs (new): 272

In the end, my GPU is now running on average 9.1C cooler on the core and drawing 15W less power, with a mere 17MHz lower average core clock, while the VRMs are 5C cooler and the fans still run moderately quiet. The core clock no longer boosts up and down as frequently, leading to a more stable boost clock and voltage, and the card generates 13.3% more credits (normalised for avg power draw). This experiment was well worth it for me!
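
For anyone wanting to repeat this comparison with their own numbers, the arithmetic behind the three tables can be sketched in a few lines of Python. This is only an illustration: the function and variable names are made up here, the inputs are the runtimes, power draws and credit per WU quoted above, and because it does not round WUs/h before multiplying, credit/h can differ from the tables by a few credits.

```python
# Inputs copied from the post above; all names are illustrative only.
CREDIT_PER_WU = 3465  # credits awarded per work unit


def throughput(concurrent_wus, runtime_s, avg_power_w):
    """Return (WUs per hour, credit per hour, credit per hour per watt)."""
    wus_per_hour = concurrent_wus * 3600 / runtime_s
    credit_per_hour = wus_per_hour * CREDIT_PER_WU
    return wus_per_hour, credit_per_hour, credit_per_hour / avg_power_w


# (concurrent WUs, average runtime in seconds, average power draw in watts)
configs = {
    "1 WU": (1, 455, 115),
    "2 WUs (original)": (2, 880, 118),
    "2 WUs (new)": (2, 890, 103),
}

results = {name: throughput(*args) for name, args in configs.items()}
for name, (wus, credit, eff) in results.items():
    print(f"{name}: {wus:.2f} WUs/h, {credit:,.0f} credit/h, {eff:.0f} credit/h/W")

# Relative efficiency gain of the tuned settings over the original ones
gain = results["2 WUs (new)"][2] / results["2 WUs (original)"][2] - 1
print(f"Efficiency gain: {gain:.1%}")
```

Swapping in runtimes and power readings from another card or project gives a like-for-like efficiency figure, as long as the credit per WU is fixed.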

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 57263 - Posted: 2 Sep 2021 | 21:32:21 UTC - in response to Message 57258.

Nice study, very well documented and grounded.
Good job!

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 57419 - Posted: 1 Oct 2021 | 15:18:35 UTC

Recently I have been cleaning my computer and stumbled upon several stains on the backplate of my GPU (1660Super). When I touched it, it felt oily/fatty like grease and smelled a bit like rubber. This stuff is leaking only on the back of the card as far as I can tell... The GPU works fine and never experienced any issues. Research on the web suggested that continued heat on the VRM might lead to evaporation of several key substances from the VRM cooling pads and can easily be cleaned with some alcohol. Is that so or could you think of something else? The process of cleaning is trivial but requires the disassembly of the card and that in turn would require to break the "warranty void" stickers on the backplate screws. I know this is an ancient topic and has been discussed numerous times here but it still causes some hesitation on my end.

1. What do you think caused this? Have you experienced sth similar yourself? Could it really just be evaporated substances from the cooling pads?
2. If the card works just fine, would you even care to clean the stains off or would you consider this just cosmetics? Could this also be corrosive on the card's components?
3. If cleaning it means breaking the warranty seals, would you consider the official RMA process or just do it yourself? I really don't want this process to somehow interfere with the warranty with ASUS.

See attached for some pictures of the stains.
Backplate #1 (Imgur)
Backplate #2 (Imgur)

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 57420 - Posted: 1 Oct 2021 | 15:50:24 UTC - in response to Message 57419.

It's silicone oil seeping out of the thermal pads from the constant heat load. it's totally normal and non-conductive so it will not damage anything.

you can clean it up with a towel or q-tip soaked in alcohol if you want. not necessary though for anything other than aesthetic reasons.

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 57421 - Posted: 1 Oct 2021 | 15:51:25 UTC - in response to Message 57420.

Thanks! That’s all I need to hear :) very reassuring!

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 57422 - Posted: 1 Oct 2021 | 17:41:16 UTC

It's silicone oil seeping out of the thermal pads from the constant heat load. it's totally normal and non-conductive so it will not damage anything.

+1

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 57435 - Posted: 3 Oct 2021 | 21:11:28 UTC
Last modified: 3 Oct 2021 | 21:13:32 UTC

Symptoms:
- Laptop gradually getting slower, and lately unresponsive, mouse cursor moving in jumps, applications taking a long time to start...
- Laptop Cooling fan gradually getting louder, and lately sounding like a drone ready to take off.
- Laptop's upper surface gradually getting warmer, and lately really hot.

Cause:
This time it is really easy to guess. I leave it as homework for the reader 😏

Solution:
Let's detail the corrective actions, consisting of a full maintenance of the laptop cooling system.

After removing a first upper trim cover, a narrow strip of the laptop hardware comes into sight.
It includes some components involved in cooling, such as the copper heat pipe, cooler fan, and heatsink grill frame...
What is that strange mess between the cooler fan and the heatsink grill? It is circled in red on this close-up image.

To gain more access, I'm removing the laptop's flat keyboard and then the cooler fan, and the CPU and GPU locations come into sight.
The cooler fan is fixed by two screws, one of them hidden under the laptop's upper frame... But this is not the first time that I've serviced it. Thinking of future interventions, I made what I call a "Service orifice" that allows me to easily undo the hidden screw and extract the cooler fan. Otherwise, an endless number of screws have to be removed to access it "the official" way.

After extracting the cooler fan, a strange "cushion" inserted between it and the heatsink grill is released.

A further 🔎️ close-up of it reveals that it is made of a tangle of pet hair 🤦‍♂️️

Now I'm opening the centrifugal blower frame and removing the accumulated dust from every blade. I'm using a narrow brush for this.

I usually keep a "logbook" for each of my systems, and I see that the thermal paste is more than six years old... I'll take the chance to renew it.
The heatsink body is fixed by six screws marked from 1 to 6.
I've removed it by gradually loosening the screws in descending order "6" to "1". Here it is.

Now I'm thoroughly cleaning the heatsink grill, and cleaning the old thermal paste from the square copper surface, and the silicon CPU and GPU surfaces.
I've been careful not to damage or remove the thermal pad covering the gap over the GPU silicon body.

Finally, I've applied new self-spreading thermal paste over CPU and GPU silicon.

Now, all that's left is to reassemble in reverse order, starting by reinstalling the heatsink at its original place and gradually tightening the screws in ascending order "1" to "6".
Then, inserting and fixing the cooler fan, and reattaching its cable.
Before continuing, I'll log the maintenance operation, using the optical unit inner surface as a "bulletin board".

After reconnecting and reinstalling the flat keyboard and the upper trim cover, the maintenance operation is finished... for now.

Our pussy cat really loves this warm surface for taking her naps... 😄

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 57781 - Posted: 10 Nov 2021 | 13:33:21 UTC

I am currently considering an upgrade (CPU as GPU pricing is still just madness) and I am not quite sure about the cooling requirements of AMD's top desktop processors. Because I am expecting the latest CPUs that are set to launch early Q1 2022 with the 3D-V cache design to be way more expensive than the current line up, I am eying an upgrade to a 5950X coming from a 3700X. (also I don't expect much performance gain from the larger L3 cache on the chip alone for BOINC)

As I never really understood the notion of TDP, I am wondering whether a dual tower cooler, such as the NH-D15 from Noctua, will do the job. Is there a need for a water cooling solution only to get the most out of the processor or is it essential to keep temps down, longevity up and clock boost behaviour stable? Would highly appreciate your input!

According to this Noctua list, this cooler should handle a 5950X well.


Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 57783 - Posted: 10 Nov 2021 | 14:12:26 UTC - in response to Message 57781.
Last modified: 10 Nov 2021 | 14:16:20 UTC

I am currently considering an upgrade (CPU as GPU pricing is still just madness) and I am not quite sure about the cooling requirements of AMD's top desktop processors. Because I am expecting the latest CPUs that are set to launch early Q1 2022 with the 3D-V cache design to be way more expensive than the current line up, I am eying an upgrade to a 5950X coming from a 3700X. (also I don't expect much performance gain from the larger L3 cache on the chip alone for BOINC)

As I never really understood the notion of TDP, I am wondering whether a dual tower cooler, such as the NH-D15 from Noctua, will do the job. Is there a need for a water cooling solution only to get the most out of the processor or is it essential to keep temps down, longevity up and clock boost behaviour stable? Would highly appreciate your input!

According to this Noctua list, this cooler should handle a 5950X well.




I have the 5950X, overclocked to 4.4GHz all-core, and on water cooling. My GPU (3080Ti, 300W) is in the same loop, and the loop consists of 2x 360x25mm radiators.

with little load from the GPU, the CPU runs about 72C running full tilt on Universe@home. the CPU uses about 165W in this configuration. with the GPU running full tilt @300W, the added heat to the loop brings the CPU temp up to around 80C. the problem with heat on these chips is less the total thermal power, but the power in combination with the small die area. just can't get the heat out of the small area fast enough. for comparison, my 24-core EPYC systems, which are also water cooled with a single 360mm rad, and using ~200W have no problem staying around 50C under full load thanks to the much larger die area for cooling.

IMO, I think the Noctua D15 should be able to handle the chip.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 57788 - Posted: 10 Nov 2021 | 16:05:46 UTC - in response to Message 57781.

(also I don't expect much performance gain from the larger L3 cache on the chip alone for BOINC)


There is more improvement from the dual-die Ryzens than just the larger L3 cache.

The write bandwidth out to main memory is also doubled compared to the single-die Ryzens.

At stock configuration the Noctua cooler you mention would be totally sufficient.

When overclocked though, the cpu probably won't be able to sustain the same clocks under 24/7 distributed computing. The cpu will simply throttle down a bit to maintain its TDP spec.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 57806 - Posted: 11 Nov 2021 | 20:03:51 UTC - in response to Message 57781.

As I never really understood the notion of TDP, I am wondering whether a dual tower cooler, such as the NH-D15 from Noctua, will do the job.

Noctua cooler NH-D15 specifications are rather impressive.
It will properly handle that AMD 5950X processor you are thinking of, as long as you install right at its back a good fan for extracting outwards the heated air flux, if it is intended to be installed in a closed chassis.
Also, extra height of this cooler (165 mm) must be taken into account, because not every standard chassis will have enough room. Probably, only extra-wide chassis will be able to harbor it with all covers mounted.
I had that problem with an old extra-tall Gigabyte G-Power II Pro CPU cooler, as I related in my previous Message #55129.
And checking height compatibility of memory modules will also be necessary, if they are to be mounted under the front protruding fan. This is stated on the main NH-D15 page, in the paragraph "High RAM compatibility in single fan mode".
For RAM modules taller than the standard (up to 32mm), front fan would have to be removed, and air flux would decrease with only one central fan.
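
The two clearance checks described above can be written down as a small Python sketch. The 165 mm cooler height and the 32 mm RAM limit are the figures quoted in this post; the function name and the example chassis values are hypothetical, so check your own case and memory datasheets.

```python
# Dimensions quoted in the post above; everything else is illustrative.
COOLER_HEIGHT_MM = 165            # NH-D15 total height
MAX_RAM_HEIGHT_UNDER_FAN_MM = 32  # tallest RAM that fits under the front fan


def check_fit(chassis_cpu_clearance_mm, ram_height_mm):
    """Return a list of fit problems; an empty list means no problem found."""
    problems = []
    if chassis_cpu_clearance_mm < COOLER_HEIGHT_MM:
        problems.append(
            f"cooler needs {COOLER_HEIGHT_MM} mm, "
            f"chassis offers only {chassis_cpu_clearance_mm} mm"
        )
    if ram_height_mm > MAX_RAM_HEIGHT_UNDER_FAN_MM:
        problems.append(
            f"RAM is {ram_height_mm} mm tall, front fan in its default "
            f"position clears only {MAX_RAM_HEIGHT_UNDER_FAN_MM} mm"
        )
    return problems


# Hypothetical example: a chassis with 160 mm clearance and 40 mm tall RAM
for problem in check_fit(chassis_cpu_clearance_mm=160, ram_height_mm=40):
    print("Problem:", problem)
```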

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 57812 - Posted: 12 Nov 2021 | 10:04:48 UTC
Last modified: 12 Nov 2021 | 10:05:49 UTC

Thanks to you all! As always very much appreciated!

My guess was that the dual tower cooler should indeed be well equipped to handle the processor but just wanted to be sure. Should have indicated that I already own the cooler and that I am currently "stuck" on the AM4 platform (B550 mobo, 3700X, 1660S, 2 NVME SSDs (dual boot), large chassis, 4 intake and 3 outflow fans) and that it was just a matter on how to best upgrade at a reasonable price. My intention is to skip the next gen of AMD CPUs with the new socket as I will wait for DDR5 to mature a bit before I enter the market for some bleeding edge DRAM :)

I have the 5950X, overclocked to 4.4GHz all-core

Damn, that's impressive and still must be a reasonable beast alongside your EPYC CPUs, especially at that TDP rating and price point relative to the server-class chips.
the CPU uses about 165W in this configuration

Didn't know that it could go that high, so I'll keep that in mind. My PSU has more than enough headroom to handle that though
just can't get the heat out of the small area fast enough. for comparison, my 24-core EPYC systems, which are also water cooled with a single 360mm rad, and using ~200W have no problem staying around 50C under full load thanks to the much larger die area for cooling

It is so interesting to learn from you. Wouldn't have thought about that for ages as this is way out of my league to ever consider. But your train of thought is as always impeccable. Also makes sense if one considers the server-side applications in a very small closed rack. You'd want to have a chip that is easier to cool, especially with these fanless CPU heatsinks.
1.
IMO, I think the Noctua D15 should be able to handle the chip.

2.
At stock configuration the Noctua cooler you mention would be totally sufficient.

3.
Noctua cooler NH-D15 specifications are rather impressive. It will properly handle that AMD 5950X processor you are thinking of

That is enough validation for me! Thanks
When overclocked though, the cpu probably won't be able to sustain the same clocks under 24/7 distributed computing.

For sure Keith! However, I'll not be pushing the CPU to its limits, but rather like to see it running 24/7. I never overclock my CPU manually, but let Ryzen do its thing and just watch where it'll boost to. I do have a great cooling setup and am rather happy about how well the chip handles its BOINC life so far. Interesting to me still is that the CPU under full load (~95-100%) is running considerably cooler than if just working on the desktop machine without running Boinc where it'll likely boost up to 4.3GHz and temps skyrocket, and thus fans kicking in more than under heavy load. Ryzens do boost clocks quite aggressively under stock conditions anyway it seems to me
There is more improvement from the dual die Ryzens than just the larger L3 cache. The write bandwidth out to main memory is doubled from the single die Ryzens also.

I'll read more about that. While it does sound impressive, I am still not sure if it is worth it to wait another 6m to see where they'll be priced at and their relative superiority over let's say the 5950X
as long as you install right at its back a good fan for extracting outwards the heated air flux, if it is intended to be installed in a closed chassis.

An Arctic Bionix P14 fan is handling most of the outflow job and is doing it rather well. In addition I have 2 upper fans mounted as exhaust fans. My chassis is not very well thought out for these fans though, as they have only very narrow openings (BeQuiet silent case) --> I do however not close the case with the front panel and just let it breathe air through the mesh panel directly. CPU temps in summer reach 69-73C and in winter 62-69C (with less fan effort)
extra height of this cooler (165 mm) must be taken into account

I was anxious about that when building it last December but I had everything planned out carefully over the course of weeks.
And checking height compatibility of memory modules will be also necessary [...] For RAM modules taller than the standard (up to 32mm), front fan would have to be removed

For that reason I stuck with the Corsair Vengeance RAM modules which comply with that 32 mm specification requirement.

For now my upgrade path is on the AM4 platform for the next 1.5/2 yrs and then maybe to the next gen AMD platform once prices settle down, especially for DDR5

As always, I do appreciate your feedback!

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Level
Trp
Message 57813 - Posted: 12 Nov 2021 | 12:04:22 UTC - in response to Message 57806.


Noctua cooler NH-D15 specifications are rather impressive.

I have this cooler in one of my systems, and I am very satisfied with it.
A clear recommendation from my side.
As you mentioned, a look at the dimensions is essential before buying.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 57814 - Posted: 12 Nov 2021 | 18:00:15 UTC - in response to Message 57806.

For RAM modules taller than the standard (up to 32mm), front fan would have to be removed, and air flux would decrease with only one central fan.

Removing the fan is not necessary at all.

The beauty of the Noctua fan mounting with movable clips means all you have to do is move the front fan higher on the finstack array to clear the memory modules.

The cooling won't be affected much, if at all, by the fan partially blowing over the top of the finstack, but you will have to account for even more clearance to the side panel.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Message 57815 - Posted: 12 Nov 2021 | 18:09:27 UTC
Last modified: 12 Nov 2021 | 18:09:48 UTC

The pricing on the 5950X is already under pressure from the announcement of the Zen3 w/V-cache cpus coming out 1Q 2022.

Seen them down to $725 currently from their $799 MSRP. I suspect they will go even lower the closer the launch of the V-cache cpus gets.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 57829 - Posted: 13 Nov 2021 | 20:25:29 UTC

Symptom:
- 24/7 working host spontaneously restarted.
- A preliminary visual inspection gives no clues, everything seems as usual... But every sense may be useful for diagnosis:
- A typical overheated electronics smell can be noticed at short distance.
- Touching the upper chassis surface directly over the PSU location, it is really hot.

Cause:
- PSU fan not turning at required speed.

Solution:
- PSU fan replacement.

The guest star of the show this time: My triple-GPU Host #480458.
This is its current inner appearance.
Everything points to a damaged fan inside the PSU.

Taking a close look at the PSU warnings, one says: "Not user serviceable". But this kind of statement doesn't usually stop a true hardware enthusiast from giving it a try...
The other warning, "Hazardous voltages contained within this power supply", please always keep in mind.
Always unplug the PSU before starting to handle it. And dangerous voltages may remain inside even when disconnected.

Usually, to remove the U-shaped PSU cover it is necessary to unscrew 4 conical-head Phillips screws, one at every corner.
This is the panorama that I found: typical accumulated dust, but no apparently burnt electronic components in sight. Good!
As suspected, the PSU original fan propeller is abnormally hard to turn by hand, and then it vibrates and quickly stops.
A closer look with microscope camera (*) reveals a darkened overheated zone around the central sleeve bearing... Defective fan confirmed.

I like to keep a variety of fans in my spares drawer, as fans are among the parts most prone to failure.
It's time for this LUFT KLD01299120CBWH fan to come into action.
It is a high-flow ball bearing 120 x 120 x 25 mm fan, suitable for replacing the damaged one. Its specifications can be found at the manufacturer's website.

And I'm going to take the opportunity to improve the whole system ventilation:
Most manufacturers design their PSUs with a variable voltage supply for the fan. The higher the temperature inside the PSU, the higher the fan's supply voltage.
There is a thermistor (temperature dependent resistor) monitoring the temperature of power stage, and dedicated circuitry controls the voltage supply for the fan according to it.
I'm bypassing this circuitry, and I'll directly connect the new fan to +12 VDC rail output, thus being permanently supplied at its maximum flow rated voltage.
The overall health of the system will benefit from this ventilation increase.

I will also take the opportunity to clean the accumulated dust from all the circuitry and heatsinks.
And to re-tighten every loose screw, to recover optimum electrical and thermal conductivity.

Ok, the work is finished.
After reinstalling the PSU, the system returned to action.

When all of the above happens on a Sunday afternoon, as it did, it's good to have a willful hardware enthusiast available at your own home 🙋‍♂️️🤗️

(*) More archive images from microscope camera can be seen at this link: Hardware Microcosmos

bozz4science
Joined: 22 May 20
Posts: 109
Credit: 102,786,176
RAC: 103,187
Level
Cys
Message 58046 - Posted: 10 Dec 2021 | 12:39:54 UTC

Dear all, my desktop machine is currently driving me crazy as I suspect one of the case fans is starting to fail and makes sudden rattling noises that tend to go on for a few minutes and then suddenly subside again. While I suspect a failing fan, it is hard to pinpoint as many fans are connected in series (meaning >1 fan connected to the same mobo fan header). Are there any tricks that can help me to pinpoint which fan is the culprit here? Are case fans really this susceptible to fail even after only 1 year? What are typical symptoms I can check for (visible, audible or sensory cues) to help with this besides trying to pinpoint which fan makes the rattling sound and just replace it?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 58047 - Posted: 10 Dec 2021 | 14:03:53 UTC - in response to Message 58046.

when you hear the noise, just stick your hand in there and stop the fans one by one with your hand. finding the one making the noise should be quick.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 58048 - Posted: 10 Dec 2021 | 16:00:25 UTC - in response to Message 58046.

Are there any tricks that can help me to pinpoint which fan is the culprit here?

Fan problems are highly recurrent in 24/7 working hosts.
It usually happens that you stop one of them for a while, and when turning it on again that typical rattling noise starts to sound...
I published my own tricks at earlier Message #56323, including a descriptive video.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 58050 - Posted: 10 Dec 2021 | 20:18:06 UTC

My old Corsair AX1200s have been running continuously for 7 years, two more than their warranty. Some have failed for obvious reasons and been scrapped. I'm replacing dead PSUs with Seasonic TX-850s or TX-1000s. They have a 12-year warranty so they must be confident it lasts.
Has anyone tried the PassMark Inline PSU Tester?
https://www.passmark.com/products/inline-psu-tester/index.php
$590 seems exorbitant. I set an eBay search hoping to get a used one cheap but none have been listed.
I have a Thermaltake Dr Power II meter but I've yet to see a PSU fail its test even when I was convinced that the PSU was the problem.
I think it was Keith that suggested using IPMI to watch the voltages on a running computer to see if PSU is failing. Is this it?
https://github.com/ipmitool/ipmitool
If anyone knows of a good tutorial to learn how to do that it would be greatly appreciated. Swapping parts between a good computer and an ailing computer is time-consuming and not always definitive.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Message 58052 - Posted: 10 Dec 2021 | 21:30:11 UTC - in response to Message 58050.

Interesting questions.
In previous Message #53401 I published something about an off-line PSU tester that I've tried.
But as you say, its major ability is to confirm obvious problems...
To reveal more subtle PSU problems, an in-line PSU tester would be a very useful tool.
Any experiences regarding this will be welcome.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Message 58053 - Posted: 10 Dec 2021 | 22:25:07 UTC - in response to Message 58050.

I think you need a good oscilloscope and a set of artificial variable loads to test a PSU. When a PSU is about to fail (making the PC unreliable), its DC output voltages are probably still within the specification under constant load (a running PC is not a constant load, even when it's idle). The unreliability is caused by the larger spikes (transients) on the DC voltage when a load (for example a GPU) is turned on or off. Such a test equipment (and the expertise to conduct the test) is way more expensive than the extra cost of a reliable PSU, however if you have an oscilloscope, you can test a live system (very carefully, as in the worst case you can break the components which is way more expensive than the PSU).
The unreliability of the PSU usually comes from the aging of the elecrtolytic (or similar) capacitors in it, as their capacity degrades very quickly when the eletcrolyte leaks or evaporates (or both) from them, making the switching elements switch more frequently, which is raising the temperature inside the PSU, which makes the elecrolyte evaporate even faster. The leaking elecrolyte usually leaves visible signs, so a technician can see which capacitor has to be replaced by new ones once the PSU is opened up (but it's a dangerous process, as there are high voltages inside the PSU even days after it's been unpugged from the wall outlet).
There are many DC-DC converters (it's similar to the PSU) on almost every component of a PC (on the motherboard for the CPU and the memory and the chipset, on the GPU, in the SSD drive etc.), as modern chips need very low voltages (around 1V) and high current. For achieving the highest possible efficiency these voltages are converted from the 12V rail (even the PSU itself supplies the 5V and the 3.3V through DC-DC converters from its own inside 12V rail). The electrolythic capactors of these DC-DC converters (if they need such) age the same way as those inside a PSU, but they are much harder to replace (as the motherboards are designed to spread the heat very well, so it's impossible to desolder a single component from them without special equipment). So an aging motherboard can tolerate lower spikes (and ripple) than the same mb when it was brand new.
General advice for selecting PSUs for crunching:
The PSU should have at least a 5-year warranty (the more the merrier).
The PSU should have at least 80 Plus Gold certification.
The maximum output power rating of the PSU should be 180-200% of the constant load of the cruncher PC.
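
The 180-200% rule above can be sketched as a small calculation. This is only an illustration; the function name and the 400 W example load are assumptions, not values from this thread.

```python
def recommended_psu_rating(constant_load_w: float,
                           low_factor: float = 1.8,
                           high_factor: float = 2.0) -> tuple[float, float]:
    """Return the (min, max) PSU rating in watts for a given constant load.

    The default factors encode the 180-200% rule of thumb.
    """
    if constant_load_w <= 0:
        raise ValueError("constant load must be positive")
    return constant_load_w * low_factor, constant_load_w * high_factor


# Hypothetical cruncher drawing a constant 400 W on the DC side.
low_w, high_w = recommended_psu_rating(400.0)
print(f"Look for a PSU rated between {low_w:.0f} W and {high_w:.0f} W")
```

The constant load should be the measured draw while crunching, not the sum of the components' nominal TDP figures.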

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58054 - Posted: 10 Dec 2021 | 22:25:08 UTC - in response to Message 58050.
Last modified: 10 Dec 2021 | 22:27:02 UTC

double post...

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Scientific publications
wat
Message 58062 - Posted: 11 Dec 2021 | 2:16:32 UTC - in response to Message 58050.


I think it was Keith that suggested using IPMI to watch the voltages on a running computer to see if PSU is failing. Is this it?
https://github.com/ipmitool/ipmitool
If anyone knows of a good tutorial to learn how to do that it would be greatly appreciated. Swapping parts between a good computer and an ailing computer is time-consuming and not always definitive.


ipmitool is useless without... IPMI. this is hardware built into the motherboard, if you have it. unless you have a server board, you probably don't have it. just downloading some tool won't give you IPMI. (akin to wanting to "download more RAM").

Supermicro boards have IPMI on most models. so do ASRock Rack, Tyan, Dell/HPE, and many other server manufacturers. usually if you have IPMI you'll have a dedicated LAN port specifically for that function, so you can remote into the system without it even being functional or even powered on. it's very handy. I use it to remotely power cycle a stuck system, remotely push BIOS updates, even VGA redirection.

but like I said, the system needs to have the necessary hardware built into it in the first place. there are some raspberry pi based solutions that provide similar functionality for normal consumer hardware, at least for IPKVM and remote power cycling. not sure about the other stuff.
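
For boards that do have a BMC, a rough sketch of the voltage check asked about above could look like this. It assumes `ipmitool sdr type Voltage` prints pipe-separated lines ending in a reading such as `12.19 Volts`. The sensor names (`12V`, `5VCC`) are hypothetical and differ between boards, and the ±5% band is the usual ATX tolerance for the positive rails.

```python
import subprocess

ATX_TOLERANCE = 0.05  # +/-5% band, the usual ATX limit for the positive rails


def parse_voltages(text: str) -> dict[str, float]:
    """Extract {sensor name: volts} from pipe-separated sensor lines."""
    readings = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Expect: name | id | status | entity | "<value> Volts"
        if len(fields) < 5 or not fields[-1].endswith("Volts"):
            continue
        try:
            readings[fields[0]] = float(fields[-1].split()[0])
        except ValueError:
            continue  # unparsable reading, skip it
    return readings


def out_of_tolerance(readings: dict[str, float],
                     nominal: dict[str, float],
                     tolerance: float = ATX_TOLERANCE) -> dict[str, float]:
    """Return the sensors whose reading deviates more than tolerance from nominal."""
    bad = {}
    for name, nominal_v in nominal.items():
        measured = readings.get(name)
        if measured is not None and abs(measured - nominal_v) > tolerance * nominal_v:
            bad[name] = measured
    return bad


def read_bmc_voltages() -> dict[str, float]:
    """Query the local BMC; needs IPMI hardware and ipmitool installed."""
    result = subprocess.run(["ipmitool", "sdr", "type", "Voltage"],
                            capture_output=True, text=True, check=True)
    return parse_voltages(result.stdout)


if __name__ == "__main__":
    # Hypothetical sensor names; check your own board's sensor list first.
    nominal_rails = {"12V": 12.0, "5VCC": 5.0}
    print(out_of_tolerance(read_bmc_voltages(), nominal_rails))
```

Keep in mind that a BMC reports slow, averaged readings, so polling like this can flag a sagging rail but would not catch the fast transients discussed earlier in the thread.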

____________

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Scientific publications
watwatwat
Message 58067 - Posted: 11 Dec 2021 | 10:16:19 UTC - in response to Message 58053.
Last modified: 11 Dec 2021 | 10:21:27 UTC

The maximum output power rating of the PSU should be 180-200% of the constant load of the cruncher PC.


An easier way to understand that is to say that the maximum efficiency of the PSU is at half its power rating. A computer will run best near that power. See the efficiency graph under Overview: https://seasonic.com/prime-tx#specification
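
As a quick check on an existing rig, the sketch below computes how far the load sits from the half-rating point and what the conversion loss costs at the wall. The 0.92 efficiency is a placeholder, not a value read from the linked graph, and the wattages are hypothetical.

```python
def load_fraction(dc_load_w: float, psu_rating_w: float) -> float:
    """Fraction of the PSU's rated output that the DC load represents."""
    if psu_rating_w <= 0:
        raise ValueError("PSU rating must be positive")
    return dc_load_w / psu_rating_w


def wall_power(dc_load_w: float, efficiency: float) -> float:
    """AC power drawn from the wall for a given DC load and efficiency (0-1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_load_w / efficiency


# Hypothetical rig: 425 W constant DC load on an 850 W PSU.
dc_load_w, rating_w = 425.0, 850.0
assumed_efficiency = 0.92  # placeholder; read the real value off the PSU's curve

fraction = load_fraction(dc_load_w, rating_w)
ac_w = wall_power(dc_load_w, assumed_efficiency)
print(f"Load is {fraction:.0%} of the rating")
print(f"Wall draw {ac_w:.0f} W, of which {ac_w - dc_load_w:.0f} W is heat in the PSU")
```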

I find it useful to put a label on each PSU with the date I placed it in service.

The unreliability of the PSU usually comes from the aging of the electrolytic (or similar) capacitors in it, as their capacitance degrades very quickly when the electrolyte leaks or evaporates (or both) from them, making the switching elements switch more frequently, which raises the temperature inside the PSU, which makes the electrolyte evaporate even faster.

I find it odd how many computer trays or cases align the PSU screw holes to have the PSU fan suck in the hot air from the computer. I use open trays and always point my PSU fan away from the motherboard.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58069 - Posted: 11 Dec 2021 | 11:45:25 UTC - in response to Message 58067.

I find it odd how many computer trays or cases align the PSU screw holes to have the PSU fan suck in the hot air from the computer. I use open trays and always point my PSU fan away from the motherboard.
That odd PSU orientation is inherited from the era when the only actively cooled component in a PC was the PSU. Back then PSUs had about 60-70% efficiency, which was quite good compared to regulated linear PSUs (not used in IBM PCs as far as I remember, but used in home computers like the Sinclair ZX series and the comparable Commodore series).

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 58073 - Posted: 11 Dec 2021 | 16:57:14 UTC

Some of the latest "prosumer" motherboards include IPMI interfaces, but these are usually cheaper knockoff versions of the interface normally found on real server motherboards.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58082 - Posted: 12 Dec 2021 | 15:38:13 UTC

Just as a reminder:

Remember that periodic maintenance is well worth it for your hosts' health.
In doubt?
I'm showing true images of a "poorly maintained host".
This system was experiencing excessive overall hardware temperatures and sporadic hang-ups.
Please, if you're weak-hearted, don't click on the link below...



After a thorough cleaning and CPU thermal paste replacement, problems were solved.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58211 - Posted: 26 Dec 2021 | 19:43:19 UTC

The hardware enthusiast's corner

The first post of this thread was published on 2 Nov 2019 | 21:44:13 UTC
For me, many hours of fun are involved, and a certain pride in sharing my wanderings.
Always with the aim that they can be useful to those who might eventually need them.
And I've also learned much invaluable knowledge from you, for which I'm grateful.

The thread being this old, the following label can currently be read at its very end:

Only the first post and the last 75 posts (of the 312 posts in this thread) are displayed.
Click here to also display the remaining posts.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,273,724
RAC: 13,371,735
Level
Tyr
Scientific publications
watwatwatwatwat
Message 58212 - Posted: 27 Dec 2021 | 21:47:58 UTC - in response to Message 58211.

It is customary at other projects' forums, when threads become too long or unwieldy, to close the original thread and create a successor thread, typically with an enumeration.

The hardware enthusiast's corner(2)

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,770,362,024
RAC: 21,500,013
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58214 - Posted: 28 Dec 2021 | 23:13:41 UTC - in response to Message 58212.
Last modified: 28 Dec 2021 | 23:29:48 UTC

It is customary at other projects' forums, when threads become too long or unwieldy, to close the original thread and create a successor thread, typically with an enumeration.

I'm taking this wise advice from Keith Myers, and this will be my last post in the original The hardware enthusiast's corner thread.
For better readability, future posts (if any) will be published in a new thread, The hardware enthusiast's corner (2).

Thank you all again for sharing experiences and giving this thread a 10K+ audience!
And special thanks to Gpugrid for harboring it.

ServicEnginIC
