Mysterious Internet Outages

Piyono

Storage is cool
Joined
Jan 25, 2002
Messages
599
Location
Toronto
My internet's been going out daily since I configured the network as in the image below, and I'm darn near my wits' end trying to figure out what's going on.
Network Diagram Current.png
The outages come once or twice a day, without apparent rhyme or reason or correlation to any of my activities.

When an outage occurs the router remains responsive and the OpenWrt overview shows that the WAN connections are up and running. In fact, almost nothing appears out of the ordinary in the GUI.

The only way I've been able to restore service is with a reboot, either from within LuCI or at the power switch.

I've tried factory-resetting and re-flashing OpenWrt (currently at v19.07.5) but that doesn't help.

I've been logging everything I can using luci-app-statistics. Below is a screencap of the RRDTOOL report.

The only erratic behaviors I can identify are
a) the sudden surge in activity on the loopback adaptor which begins immediately at the start of the outage and ceases only when the router is reset. Loopback is quiet otherwise.
Screenshot_2020-12-13 OpenWrt - Interfaces - LuCI.png
and b) the spike in CPU load at the point of outage. The router is definitely doing something, just not what I want.
cpu.png
Annotation 2020-12-13 162416.png
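To put numbers on the loopback surge rather than eyeballing the graph, the interface counters in /proc/net/dev can be sampled twice and compared. This is only a rough sketch: it assumes Python is available wherever it runs (a stock OpenWrt image does not ship it), so the simplest route is probably to grab the output of `cat /proc/net/dev` over SSH and feed the text to the functions below. The function names are made up for illustration.

```python
def parse_proc_net_dev(text):
    """Parse the text of /proc/net/dev into {interface: counters}."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a normal interface line
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return counters


def byte_deltas(before, after):
    """Bytes received/sent per interface between two parsed samples."""
    deltas = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # interface appeared between samples
        deltas[name] = {
            "rx_bytes": new["rx_bytes"] - old["rx_bytes"],
            "tx_bytes": new["tx_bytes"] - old["tx_bytes"],
        }
    return deltas
```

Taking one sample during normal operation and one during an outage would show how much traffic lo is really carrying, but it won't say which process is generating it.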
I'm also capturing the system log to a file but I see nothing in it that would suggest that an outage is about to occur.
For example, here's a snippet of the system log corresponding to the screenshots above:
Code:
Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPDISCOVER(br-lan) xx:xx:xx:xx:xx:xx
Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPOFFER(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPREQUEST(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx
Sun Dec 13 14:56:49 2020 daemon.info dnsmasq-dhcp[1783]: DHCPACK(br-lan) 192.168.1.109 xx:xx:xx:xx:xx:xx mercku
Sun Dec 13 16:16:24 2020 daemon.err uhttpd[1280]: luci: accepted login on / for root from 192.168.1.222
192.168.1.222 is me logging in to reboot the router.
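Since the saved log file is long, one thing that might help is scanning it for stretches where nothing was logged at all, like the hour-plus of silence in the snippet above. Here is a minimal sketch, assuming every line starts with a timestamp in the same format as the snippet and that Python runs on a PC holding a copy of the log file. The function names and the 10-minute default are my own choices, and the day and month names are parsed with the default C/English locale.

```python
from datetime import datetime, timedelta

# Matches timestamps like "Sun Dec 13 14:56:49 2020"
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if absent."""
    parts = line.split(None, 5)
    if len(parts) < 5:
        return None
    try:
        return datetime.strptime(" ".join(parts[:5]), TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, min_gap=timedelta(minutes=10)):
    """Return (previous, next) timestamp pairs separated by >= min_gap."""
    gaps = []
    previous = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip lines without a parsable timestamp
        if previous is not None and ts - previous >= min_gap:
            gaps.append((previous, ts))
        previous = ts
    return gaps
```

A long gap only shows that the log went quiet, not why, so it's a way to line up the log against the graphs rather than a diagnosis.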

What should be my next troubleshooting step? I sit here all day watching graphs, the system log and the kernel log, but I'm not sure that these are giving me what I need.
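One option that doesn't involve staring at graphs is a small probe running on a LAN machine that records exactly when connectivity drops and comes back, for both the router and something outside. The sketch below is just an illustration: the targets in the usage note (192.168.1.1 as the router address, 1.1.1.1 port 53 as an outside host) are assumptions, not anything confirmed about this network, and the function names are hypothetical.

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_intervals(samples):
    """Turn time-ordered (timestamp, ok) samples into (start, end) outages.

    start is the first failed sample, end is the first good sample after
    it, or None if the outage was still ongoing at the last sample.
    """
    intervals = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals


def monitor(targets, interval=30.0, rounds=None, probe=tcp_probe):
    """Probe each target repeatedly and print every UP/DOWN change."""
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for name, (host, port) in targets.items():
            ok = probe(host, port)
            if last.get(name) != ok:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(stamp, name, "UP" if ok else "DOWN")
                last[name] = ok
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Something like `monitor({"router": ("192.168.1.1", 80), "outside": ("1.1.1.1", 53)})` would then print a timestamped line whenever either target changes state. If "outside" goes DOWN while "router" stays UP, that matches the symptom of the router staying responsive during an outage, and the timestamps can be compared against the logs.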

I'm going to try swapping the TP-Link for the D-Link router (the one in the diagram that's currently being used as a switch) and see if that changes anything.

The paranoid delusional in me is wondering whether this is the effect of malicious software operating on a device within or without my network.

Any tips?
 

Piyono

I suppose there could also be a hardware issue with the TP-Link. I swapped it for the D-Link running DD-WRT. So far (9 hours) no dropouts.
 

fb

Storage is cool
Joined
Jan 31, 2003
Messages
726
Location
Östersund, Sweden
I know almost nothing about networking, but at my previous job I saw the same problem: an "extra" switch in an office was failing with age and produced the same symptoms you describe.
The troubleshooting process was the same as what you are doing now: disconnect equipment one piece at a time and see whether the problem goes away.

I hope you find the problem. It looks promising.
 

Piyono

The previous configuration involved a DSL modem in the garage, and there was no cable running between the office and the garage, so it's hard to compare. Anyway, right now we're using the Mercku as the primary router. That kind of sucks because the firmware is very basic, but it also hasn't dropped the connection.
 