The hardware enthusiast's corner (2)
It is customary on other projects' forums, when threads become too long or unwieldy, to close the original thread and open a successor thread, typically numbered.
Taking Keith Myers' wise advice, I am now opening this new "The hardware enthusiast's corner (2)" thread.
Previous posts remain accessible in the original thread, The hardware enthusiast's corner.
|
|
|
Assembling a computer for BOINC work is not only about building it, but also about maintaining it in perfect shape for hard, reliable crunching.
For me, this is what gives meaning to "The hardware enthusiast's corner (2)" thread.
It's delightful when a self-built host ends up serving something useful!!
My preferred BOINC project is Gpugrid, but I have several backup projects to productively fill the periods of ACEMD3 / Python task scarcity, like the previous weeks.
I was recently notified by PrimeGrid that my Gpugrid Host #557889 had discovered a previously unknown prime number large enough to enter the Top 5000 list in Chris Caldwell's The Largest Known Primes Database.
The number: 95635202^131072+1
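Since 131072 = 2^17, the number has the generalized Fermat form b^(2^n) + 1, with b = 95635202 and n = 17. For the curious, here is a short Python sketch. It is my own illustration, not PrimeGrid's tooling. It confirms that the exponent is a power of two and estimates the decimal digit count from logarithms, without ever building the huge integer:

```python
import math


def is_power_of_two(n: int) -> bool:
    """True when n is 2**k for some k >= 0."""
    return n > 0 and n & (n - 1) == 0


def gfn_digit_count(base: int, exponent: int) -> int:
    """Decimal digits of base**exponent + 1, computed via log10.

    For an even base, base**exponent is even, so it can never equal
    10**k - 1 (which is odd); adding 1 therefore never adds a digit.
    Double-precision log10 is accurate enough here unless the product
    lands extremely close to an integer.
    """
    if base % 2 != 0:
        raise ValueError("this shortcut assumes an even base")
    return math.floor(exponent * math.log10(base)) + 1


base, exponent = 95635202, 131072
print(is_power_of_two(exponent), exponent.bit_length() - 1)  # True 17
print(gfn_digit_count(base, exponent))  # a little over one million digits
```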
Here is a recent image of the inside of the lucky host.
The chassis housing it was recovered from scrap to give it a second life... ✅
It's something you can expect from a hardware enthusiast ;-)
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1340 Credit: 7,652,816,070 RAC: 13,498,476
|
Congratz on the new prime number discovery.
Looking forward to your insights and knowledge in the new thread.
|
|
|
Upgrading system RAM: it's easy
If you're able to change your car's wheel without calling a tow truck, then you're surely able to upgrade your system's RAM.
(And even if you aren't, I'd say that you're still able to upgrade your system's RAM ;-)
When new batches of ABOU Python GPU tasks appeared, I ran into problems with my two multi-GPU hosts.
My triple GTX 1650 GPU system had 32 GB of system RAM installed, while my twin GTX 1650 GPU system had 16 GB.
Roughly, each ABOU task needs 16 GB of system RAM to expand its environment. Therefore, I had to make sure that my triple GPU system didn't get more than two simultaneous ABOU tasks, and my twin GPU system no more than one...
So I decided to upgrade my triple GPU system from 32 to 64 GB RAM (and, as a knock-on effect, my twin GPU system from 16 to 32 GB RAM).
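The sizing rule above is simple division. At roughly 16 GB per ABOU task, the number of tasks a host can run at once is capped both by its RAM and by its GPU count. Here is a small Python sketch of that rule. It assumes one task per GPU and ignores the RAM the OS itself needs, so treat its result as an upper bound:

```python
from typing import Optional


def max_concurrent_tasks(system_ram_gb: float,
                         ram_per_task_gb: float = 16.0,
                         gpus: Optional[int] = None) -> int:
    """How many ~16 GB tasks fit in RAM, capped at one task per GPU.

    Ignores the RAM used by the OS and other software, so the result
    is an upper bound rather than a guarantee.
    """
    by_ram = int(system_ram_gb // ram_per_task_gb)
    return by_ram if gpus is None else min(by_ram, gpus)


for label, ram, gpus in [("triple GPU, before", 32, 3),
                         ("triple GPU, after", 64, 3),
                         ("twin GPU, before", 16, 2),
                         ("twin GPU, after", 32, 2)]:
    print(f"{label}: {max_concurrent_tasks(ram, gpus=gpus)} task(s)")
```

The before/after rows reproduce the limits described in this post: 2 → 3 tasks for the triple GPU host and 1 → 2 for the twin GPU host.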
First of all, please sit down for a few minutes to plan your move. Otherwise, you may get a nasty surprise...
As an example, take a look at the Gigabyte B365M H motherboard specifications. Two of my hosts are based on it.
The Memory section of its specifications states that the maximum supported configuration is 2 x DDR4 @2666 MHz DIMMs, up to a total of 32 GB of system RAM.
This system can't handle 64 GB of RAM. Installing 2 x 32 GB DIMMs would prevent the system from even starting!
Here are the specifications for the motherboard of the two systems to be upgraded: Gigabyte Z390 UD
It can handle 4 x DDR4 @2666 MHz DIMMs, up to a total of 128 GB of system RAM, with a maximum module size of 32 GB. Good!
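The "study your move" step boils down to three checks: slot count, per-module limit and total capacity. Here is a minimal Python sketch of those checks, using only the limits quoted above. The B365M H per-module limit isn't stated there, so it is left as None. Speed, ranks and the board's memory QVL are not modeled, so this complements the manual rather than replacing it:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BoardMemorySpec:
    name: str
    slots: int
    max_total_gb: int
    max_module_gb: Optional[int] = None  # None: limit not stated in the specs quoted


def check_dimm_plan(spec: BoardMemorySpec, modules_gb: List[int]) -> List[str]:
    """Return a list of problems; an empty list means the plan fits the stated limits."""
    problems = []
    if len(modules_gb) > spec.slots:
        problems.append(f"{len(modules_gb)} DIMMs but only {spec.slots} slots")
    if spec.max_module_gb is not None:
        for size in modules_gb:
            if size > spec.max_module_gb:
                problems.append(f"{size} GB module exceeds {spec.max_module_gb} GB per-module limit")
    total = sum(modules_gb)
    if total > spec.max_total_gb:
        problems.append(f"{total} GB total exceeds {spec.max_total_gb} GB board limit")
    return problems


B365M_H = BoardMemorySpec("Gigabyte B365M H", slots=2, max_total_gb=32)
Z390_UD = BoardMemorySpec("Gigabyte Z390 UD", slots=4, max_total_gb=128, max_module_gb=32)

for spec in (B365M_H, Z390_UD):
    issues = check_dimm_plan(spec, [32, 32])
    print(spec.name, "->", issues or "OK")
```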
I opened my hardware piggy bank and purchased 2 x DDR4 @2666 MHz 32 GB DIMMs.
The starting configuration for my triple GPU system was 4 x DDR4 @2666 MHz 8 GB DIMMs = 32 GB total system RAM.
Upgrading is as simple as opening the computer's case, removing the four existing DIMMs, and installing the two new ones.
Here is one tip, based on my experience:
When I install a new slot-based component (memory DIMM, graphics card, expansion card), I insert, remove and reinsert it three times.
This is what I call the slot and module contacts "becoming friends". It usually prevents present and future problems caused by poor electrical contact.
If modules refuse to go in, try turning them 180º... They are mechanically keyed.
Always check at the end that each DIMM is fully inserted into its slot and that the side latches are fully closed.
When in doubt, there is usually a section in the motherboard's manual explaining how memory modules should be installed for that particular model.
Now, the new system's memory section looks like this.
As you can see, I've installed both memory modules in slots of the same (grey) color. Please make sure to do it this way to take advantage of the dual-channel memory performance boost.
Each motherboard's user manual also describes this when a multi-channel memory architecture is available.
And the four DIMMs removed from the previous configuration are now reused to replace the existing 16 GB of RAM in my twin GPU system, upgrading it to 32 GB.
Here is its final new look.
After that, my triple GPU system has been able to successfully process three concurrent ABOU tasks without running short of system RAM.
This is a screenshot of BOINC Manager while they were processing.
I also took an nvidia-smi screenshot.
And the following Psensor screenshot, where the typical "GPU spikes" caused by the learning agents can be seen:
Even at this full load, 6% of system RAM still remains available.
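To keep an eye on that remaining headroom from a terminal, here is a Python sketch that reads /proc/meminfo on Linux. It computes MemAvailable as a percentage of MemTotal. I'm assuming MemAvailable is the figure that matters here; the 6% above came from a screenshot, and other tools may compute "available" differently:

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map /proc/meminfo keys to their numeric values (mostly in kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def available_percent(meminfo_text: str) -> float:
    """MemAvailable as a percentage of MemTotal."""
    info = parse_meminfo(meminfo_text)
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        print(f"{available_percent(meminfo.read_text()):.1f}% of system RAM available")
```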
Likewise, my twin GPU system can now process two concurrent ABOU tasks without any RAM worries.
It was worth it! 👍️
|
|
Aurum
Joined: 12 Jul 17 Posts: 401 Credit: 16,755,010,632 RAC: 220,113
|
Woke today to find 8 Linux Mint 20.3 computers no longer able to communicate. When I put a head (monitor and keyboard) on them, I can find nothing out of the ordinary except that they cannot communicate with the internet.
Strangely, one computer can communicate over my LAN but cannot communicate with the internet.
Each has a static IP and I keep them updated.
I turned off all my switches and turned them back on, starting from the hub. I've also rebooted several of them.
Any suggestions would be welcome. TIA
|
|
|
For the more common DHCP-allocated IP addresses, you don't just get an IP address. You also need (and get):
Subnet mask
Default gateway
DNS server
Of those three, the most likely culprit is the default gateway. Have you changed or reconfigured your router recently?
|
|
Aurum
|
For the commoner DHCP-allocated IP addresses, you don't just get an IP address. You also need (and get):
Subnet mask (255.255.255.0)
Default gateway (192.168.1.1, just checked and it still has this address)
DNS server (8.8.4.4,8.8.8.8)
Of those three, the most likely culprit is the default gateway. Have you changed or reconfigured your router recently? No changes to router or gateway but I did try rebooting them.
If the default gateway had changed then all computers would lose connection but 30 are still running fine.
All my computers are on a wired ethernet with the motherboard RJ-45 status lights on. I did unplug and plug the cables back in.
I ran Advanced IP Scanner looking for evidence of duplicate IP addresses even though I'd made no changes.
My wife and kids use DHCP wireless connections for their handbrains and laptops. All my BOINC computers have static IP addresses. DHCP has 192.168.1.2-99 available and I've never seen them try to use higher numbers.
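With the numbers given here (mask 255.255.255.0, gateway 192.168.1.1, DHCP pool 192.168.1.2-99), a static-address plan can be checked on paper using Python's standard ipaddress module. The sketch below flags duplicates, addresses outside the subnet, the gateway address and anything inside the DHCP pool. The host addresses in the example are hypothetical. It only checks the plan, not the live network, so a scan like Advanced IP Scanner is still needed to find a device actually squatting on an address:

```python
import ipaddress
from collections import Counter
from typing import Iterable, List


def check_static_plan(static_ips: Iterable[str],
                      network: str = "192.168.1.0/255.255.255.0",
                      gateway: str = "192.168.1.1",
                      dhcp_first: str = "192.168.1.2",
                      dhcp_last: str = "192.168.1.99") -> List[str]:
    """Flag static addresses that clash with the subnet, gateway or DHCP pool."""
    net = ipaddress.IPv4Network(network)
    gw = ipaddress.IPv4Address(gateway)
    pool_lo = ipaddress.IPv4Address(dhcp_first)
    pool_hi = ipaddress.IPv4Address(dhcp_last)

    addrs = [ipaddress.IPv4Address(ip) for ip in static_ips]
    problems = []
    for ip, count in Counter(addrs).items():
        if count > 1:
            problems.append(f"{ip}: assigned to {count} hosts")
    for ip in sorted(set(addrs)):
        if ip not in net:
            problems.append(f"{ip}: outside {net}")
        elif ip in (net.network_address, net.broadcast_address):
            problems.append(f"{ip}: network/broadcast address")
        elif ip == gw:
            problems.append(f"{ip}: same as the default gateway")
        elif pool_lo <= ip <= pool_hi:
            problems.append(f"{ip}: inside the DHCP pool {pool_lo}-{pool_hi}")
    return problems


# Hypothetical host addresses, for illustration only.
print(check_static_plan(["192.168.1.120", "192.168.1.121", "192.168.1.50"]))
```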
Most of my gear is long in the tooth, so every month or two I scrap out a PSU or a motherboard. One or two failing wouldn't even have gotten my attention, but 8, with one that can communicate on the LAN but not the WAN, is strange.
The deaf computers are split between two unmanaged 24-port switches. I'll try turning off all switches, the router and the gateway, then powering up in order: gateway, router, hub switch, then the other switches. If that doesn't bring them back to life, I'll just turn them off until I can attempt a fresh build. But I'd sure welcome a better suggestion.
|
|
|
check the cables too, just as a quick check. specific ports on the switch could also go bad, so try a different, known-good port.
what is the result of 'ifconfig' from a terminal? or 'ip a' if you don't have the ifconfig package (net-tools) installed
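Along the same lines as 'ip a', here is a Python sketch that reads the JSON output of `ip -j address show` and prints each interface's state and IPv4 addresses. It assumes the installed iproute2 supports the -j/-json flag, as recent versions do. An interface stuck in DOWN, or an UP interface with no IPv4 address, would point at the link or the static configuration rather than at the internet side. The interface names in any sample output are hypothetical:

```python
import json
import shutil
import subprocess
from typing import List, Tuple


def summarize_ip_json(ip_json: str) -> List[Tuple[str, str, List[str]]]:
    """(interface, operstate, IPv4 addresses) from `ip -j address show`."""
    rows = []
    for link in json.loads(ip_json):
        v4 = [f'{a["local"]}/{a["prefixlen"]}'
              for a in link.get("addr_info", [])
              if a.get("family") == "inet"]
        rows.append((link["ifname"], link.get("operstate", "?"), v4))
    return rows


if __name__ == "__main__":
    if shutil.which("ip"):
        result = subprocess.run(["ip", "-j", "address", "show"],
                                capture_output=True, text=True)
        if result.returncode == 0 and result.stdout.strip():
            try:
                rows = summarize_ip_json(result.stdout)
            except (json.JSONDecodeError, KeyError):
                rows = []  # output was not the JSON shape this sketch expects
            for name, state, v4 in rows:
                print(f"{name:<12} {state:<8} {', '.join(v4) or '(no IPv4)'}")
```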
|
|
|
|
I connect to the Internet at home via a fiber optic line from my ISP.
From time to time, I lose the whole Internet connection for no apparent reason (not your case, which is a partial loss).
I then switch off everything except the Ethernet switches, and boot first the Optical Network Terminal (ONT), then the Internet router, and finally the WiFi access points. This usually solves my problem.
Recently, I noticed a linux-firmware update. Since your hosts run Linux, you could check whether the non-connecting hosts share a common NIC that might be affected by this update in some way.
Temporarily trying a trusted USB-Ethernet or USB-WiFi adapter could help diagnose something like that.
|
|
Aurum
|
Solved, I hope. A couple of days ago I shut down my network and powered it back up from the router to the hub switch... That seemed to fix things. I thought maybe a glitchy computer had messed up the address tables in the switches.
Today I woke to find a big list of completed WUs that would not upload. I moved cables, but the problem followed the computers and not the cables. I shut down and rebooted the network. That didn't fix it. For some reason I opened my router GUI and clicked on one of the 7 hung computers. For some reason it had set the parental control to Deny Monday on a few of them. I clicked Allow Monday and they worked. A couple showed Allow All. These two required that I click Deny All, wait for it to set, and then click Allow All. Then they worked. First I had tried All Devices Allow All, but that just timed out before it finished applying.
Now I'm wondering if my Charter Spectrum WiFi router model Sagemcom F@st 5260 needs to be replaced.
Thinking of buying a WiFi router with 8 LAN ports so I can get rid of the Spectrum rental and my hub switch. Something like this but it's expensive:
https://www.newegg.com/asus-rt-ax88u-ca-ieee-802-11a-ieee-802-11b-ieee-802-11g-ieee-802-11n-ieee-802-11ac-ieee-802-11a/p/N82E16833320374?Item=9SIAD6H9RM4620&quicklink=true
Or maybe this refurbished one that's much cheaper: https://www.newegg.com/tp-link-archer-c5400x/p/N82E16833704584?quicklink=true
Maybe keeping the hub switch is okay, so I could use a 4 LAN port router. Or maybe there's a WiFi router that can be attached to a non-WiFi 8 LAN port router.
|
|
Keith Myers
|
Websites have been loading very slowly for me lately, across the board, so I just ran my two command-line statements to refresh everything (the first flushes the ARP/neighbor cache, the second flushes the systemd-resolved DNS cache). Boom! Now all the websites load fast and normally.
sudo ip -s -s neigh flush all
sudo systemd-resolve --flush-caches
|
|
|
an old router could certainly be an issue and cause all kinds of weird problems.
if you feel able, you could build your own router from an old PC. it doesn't require much power. pfSense is very robust and stable router software.
I run my router on pfSense using an Intel Atom 8-core processor and 8GB of ECC RAM, which is more than enough to run my VPN and several packet-related services in addition to all the routing functions.
the only downside is the initial configuration, and you'll need a switch and some access points for wifi. but the plus side is you get the flexibility to choose your own wifi access points and you can put them wherever you want. getting whole home wifi coverage is a lot easier with a few well placed APs than with a single consumer grade router that usually has bad radio/antenna properties and anemic hardware.
|
|
|
|
For some reason I opened my router gui and clicked on one of the 7 hung computers. For some reason it had turned the parental control to Deny Monday on a few of them. I clicked Allow Monday and they worked. A couple showed Allow All. These two required that I click Deny All, wait for it to set, and then click Allow All. Then they worked.
Glad that the mystery was solved, congratulations.
Most routers / access points have a management section with a configuration backup option.
I always keep an up-to-date backup of all these devices once everything is configured and working. I'd recommend this.
If I suspect that any parameter may have been altered, I restore the configuration from the backup, and the doubt vanishes.
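One cheap way to decide whether a restore is needed is to store a SHA-256 digest next to the known-good backup, then compare it with the digest of a fresh export. Here is a Python sketch under two assumptions: the router lets you export its configuration to a file, and the file names below are hypothetical placeholders. Many routers embed timestamps or encrypt their backups, so a mismatch means "look closer" rather than proof that a setting changed:

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a configuration export."""
    return hashlib.sha256(data).hexdigest()


def config_drifted(current_export: Path, known_good_backup: Path) -> bool:
    """True when a fresh export differs byte-for-byte from the saved backup."""
    return digest(current_export.read_bytes()) != digest(known_good_backup.read_bytes())


if __name__ == "__main__":
    # Hypothetical file names: substitute whatever your router's backup page saves.
    current = Path("router-config-today.bin")
    backup = Path("router-config-known-good.bin")
    if current.exists() and backup.exists():
        print("changed" if config_drifted(current, backup) else "identical")
```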
|
|
|
There is always a first time
During a routine temperature check of my working hosts today, I found a graphics card based on an Nvidia GT 1030 GPU showing 82 ºC.
A normal Psensor screenshot for that host looks like this: around 54 ºC at full load for the GT 1030 GPU.
That ASUS PH-GT1030-O2G graphics card is a low-power 30 W card, so there had to be some explanation for the abnormally high temperature...
...Jumping jack fan, for example?
With the card on the table, it could be seen that the blades had completely detached from the fan's hub.
Digging into my backup fan collection, I found a PWM fan directly compatible with the damaged one.
After a thorough heatsink clean and replacing the fan, the graphics card is ready to work again.
I had never come across anything like this before... But there is always a first time.
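A routine screening like this one can be partly scripted. Below is a Python sketch that queries `nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader,nounits` and flags any GPU at or above a limit. The 75 ºC default is a hypothetical threshold chosen between the normal 54 ºC and the 82 ºC seen here. It watches temperature rather than fan speed because nvidia-smi's fan.speed reports the speed the fan is intended to run at, as a percentage, not a measured RPM, so a fan with detached blades may still look normal there:

```python
import shutil
import subprocess
from typing import Dict, List, Optional

QUERY = ["nvidia-smi",
         "--query-gpu=index,name,temperature.gpu",
         "--format=csv,noheader,nounits"]


def parse_gpu_temps(csv_text: str) -> List[Dict]:
    """Parse 'index, name, temperature' lines into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, temp = rest.rsplit(",", 1)  # tolerate commas inside the name
        temp = temp.strip()
        temp_c: Optional[int] = int(temp) if temp.isdigit() else None  # "[N/A]" -> None
        gpus.append({"index": int(index), "name": name.strip(), "temp_c": temp_c})
    return gpus


def too_hot(gpus: List[Dict], limit_c: int = 75) -> List[Dict]:
    """GPUs whose reported temperature is at or above limit_c (hypothetical default)."""
    return [g for g in gpus if g["temp_c"] is not None and g["temp_c"] >= limit_c]


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        result = subprocess.run(QUERY, capture_output=True, text=True)
        if result.returncode == 0:
            for gpu in too_hot(parse_gpu_temps(result.stdout)):
                print(f"GPU {gpu['index']} ({gpu['name']}): {gpu['temp_c']} ºC, check its fan!")
```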
|
|
grep
Joined: 4 May 23 Posts: 3 Credit: 3,342,500 RAC: 0
|
Woke today to find 8 Linux Mint 20.3 computers no longer able to communicate. When I put a head on them I can find nothing out of the ordinary except that it cannot communicate with the internet.
Strangely one computer can communicate over my LAN but cannot communicate with the internet.
Each has a static IP and I keep them updated.
I turned off all my switches and turned them back on starting from the hub. I've also rebooted several of them.
Any suggestions would be welcome. TIA
something like this happens to me occasionally and it's almost always due to the network switch; if I unplug the power cable of the network switch and plug it back in, things usually go back to normal.
|
|
grep
|
Now I'm wondering if my Charter Spectrum WiFi router model Sagemcom F@st 5260 needs to be replaced.
Thinking of buying a WiFi router with 8 LAN ports so I can get rid of the Spectrum rental and my hub switch.
I took a different approach to this, as alluded to by others in this thread.
My current setup looks like this:
- Netgate 1100 pfSense router https://shop.netgate.com/products/1100-pfsense
- some basic unmanaged network switches https://www.amazon.com/Ethernet-Splitter-Optimization-Unmanaged-TL-SG105/dp/B00A128S24/
- TP-Link AXE5400 Tri-Band WiFi 6E Router (Archer AXE75) https://www.amazon.com/gp/product/B0B3SQK74L/
The Netgate 1100 is the primary router for the apartment network. It's the first-party offering from the people who make pfSense, which I highly recommend.
The network switches extend the router's Ethernet coverage to all wired network devices.
The AXE5400 router is set to "Access Point Mode" and connected to one of the network switches; it provides the WiFi coverage. I don't have a lot of space to cover, so it's plenty powerful. It's also kind of overkill; there are cheaper options available if all you need is a WiFi access point. Though I do appreciate the 6 GHz coverage for the single device I own that currently supports it.
If you are considering upgrades to your home network, then you might consider moving in the direction of splitting up the tasks between different devices, instead of just buying a single consumer-grade router. The wireless coverage can come from a separate device than the Ethernet router, which will give you more options than searching for just a new "WiFi router". Also, I chose the Netgate 1100 for this because I wanted to go with pfSense, but I did not want the hassle of having to piece together my own DIY pfSense router (there are tons of great YouTube videos out there about exactly this, highly recommended).
So far this setup is working well. I only have one device running BOINC, but I do have about 25 devices on the network, most with static IPs via the Netgate's DHCP server.
|
|