Consistent packet loss every ~30 seconds starting at 2nd hop/neighborhood node

Craig9S
Craig9S Posts: 4 Spectator
edited July 10 in Connectivity

As the subject says, we recently started noticing instances where our connection would seemingly pause for a few seconds. I've done soft reboots and hard reboots of both modem & router (Arris SB8200 & TP-Link AC1900) with zero change. Started running tracert & ping tests on each node today and the second hop, our Spectrum neighborhood node, seems to be where the problem arises. Decided to use PingPlotter again too, and 142.254.158.153 seems to definitely be the node in question.

The ping -t tests I ran on each of those hops have shown zero issues with my internal network, but the 2nd hop and all other hops upstream have the same time-outs every ~30 seconds or so. See screenshot below for further detail. The PingPlotter results match precisely with my tracert & ping -t tests run earlier this morning as well.
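
A rough way to check the "every ~30 seconds" pattern from a saved `ping -t` log is to find where each run of time-outs begins and measure the spacing between runs. The sketch below is only an illustration: the function names are made up, the sample log is synthetic (not data from this thread), and the line index is only a rough proxy for elapsed seconds, because a timed-out request waits longer than a successful one.

```python
from statistics import median

def timeout_burst_starts(lines):
    """Return the line index where each consecutive run of time-outs begins."""
    starts = []
    previous_timed_out = False
    for index, line in enumerate(lines):
        timed_out = "Request timed out" in line
        if timed_out and not previous_timed_out:
            starts.append(index)
        previous_timed_out = timed_out
    return starts

def typical_gap(starts):
    """Median spacing between burst starts, or None with fewer than two bursts."""
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return median(gaps) if gaps else None

# Synthetic sample (an assumption for illustration): 3 time-outs in every 30 lines.
sample = []
for line_number in range(120):
    if line_number % 30 in (27, 28, 29):
        sample.append("Request timed out.")
    else:
        sample.append("Reply from 192.0.2.1: bytes=32 time=11ms TTL=63")

starts = timeout_burst_starts(sample)
print("burst starts:", starts)
print("typical gap in lines:", typical_gap(starts))
```

A steady gap between bursts would support the idea of a periodic fault rather than random congestion, though it does not say which hop is responsible.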

We had a similar issue shortly after getting our service installed almost a year ago where I was able to provide data to the online support rather than someone ask me to reset my modem or waste all our time. They actually sent out a tech right away who verified what I provided & requested an escalation for a tech to inspect & repair the neighborhood node. They did so relatively promptly & we honestly haven't had an issue up until the past week or so.

So I suppose I'm asking whether the neighborhood node is in fact the issue, and whether I should follow the same path as last time, or whether someone from Spectrum can just test it remotely and skip all the needless time & steps by sending someone out to fix the problem at the node right away.

Comments

  • RAIST515O
    RAIST515O Posts: 156 Contributor
    edited June 29

    Don't mean to sound nitpicky or anything... but you may not want to rely on a high-traffic DNS server for ICMP ECHO testing. Priority queueing policies in place there may negatively skew the results, so your pings may not get as much consideration as they would from something like a robust .com page, a game-hosting server, or another service that may be more accepting of ICMP ECHO requests.

  • Craig9S
    Craig9S Posts: 4 Spectator

    I understand what you're saying as a generality, but our connectivity issues match accurately with how the data is shown above, and as I mentioned, all my other testing supports this data. It's clearly happening at the 2nd hop & at every other node upstream of it. Regardless, here is more data showing the same issues starting at the 2nd hop.

  • Jaleesa_F
    Jaleesa_F Posts: 414 ✅ Verified Employee Moderator

    Hello @Craig9S

    Sorry to hear that you're experiencing connectivity issues with our service, I am happy to help! Are you able to bypass your router for testing purposes?

  • Craig9S
    Craig9S Posts: 4 Spectator

    Jaleesa, no need. We just heard from a neighbor behind us & a couple over that they’re having the exact same issue. Already have a tech coming to confirm tomorrow and hopefully they request someone to fix the issue at the node shortly after.

  • RAIST515O
    RAIST515O Posts: 156 Contributor

    Looks like there is actually a bigger problem on that one plot, at the IXP into the third-party peer CenturyLink/Level3.

    Which is unfortunately nothing new. It happens regularly to many last mile ISPs... and we get stuck with it until new route announcements come out or whatever is triggering the congestion otherwise gets resolved.

  • RAIST515O
    RAIST515O Posts: 156 Contributor
    edited June 30

    Looking at it closer... it appears that may actually be the point where there is a genuine concern—the third party exchange point into a peering partner's network.

    It looks more like what you have been tracking along the way is a delayed or dropped response to an ICMP ECHO request, because utilization at that hop has crossed a set threshold for deprioritizing those responses.

    They are configured to focus on forwarding packets as a priority... responding to ICMP ECHO is more towards (if not at) the bottom of the priority scale.

    If you look closer you'll see that in that first graph all 599 packets actually made it to the endpoint. In the third one all 163 again made it all the way to the endpoint. Yes the echo requests may have gotten delayed or just ignored but the packets were actually forwarded probably to the next hop all the way to the end of the route.

    That second graph, however, breaks from that pattern, and it happens right there at the exchange point trying to get into Level3's network... all 388 packets made it to that exchange point. A substantial chunk is getting queued/delayed/ignored, but only one never actually made it to the final hop.

    So out of just over 1000 packets sent across these tests, only one appears to have actually been lost, and that appears to happen at the handoff into a third-party's network.

    Don't get me wrong... there is more jitter than we the users would ideally like to see... but that is more an ongoing issue all ISPs wrangle with because the automated systems that try to balance congestion sometimes cannot react fast enough. It would take a crazy amount of aggressive micromanaging to really eliminate those kinds of spikes, and that's why you often run into a pretty liberal level of tolerance for that specific issue.

    BUT... the heavy queuing at that IXP trying to get into Level3 and the hops beyond that is something someone upstream should probably look into since it's indeed a path into some measurable level of packet loss.

    But... then again, that loss may be occurring on someone else's network. Spectrum may not have any influence there other than reporting it to their peering partner (Level3/CenturyLink).
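
For reference, the end-to-end loss implied by the counts quoted in this comment can be worked out directly. This is a minimal sketch that uses only the numbers stated above (599, 388 and 163 packets sent across the three graphs, with one packet not reaching the final hop); the variable names are illustrative.

```python
# Packets sent in each of the three graphs, as quoted in the comment.
sent_per_test = [599, 388, 163]
# Packets that never reached the final hop, per the same comment.
lost_end_to_end = [0, 1, 0]

total_sent = sum(sent_per_test)
total_lost = sum(lost_end_to_end)
loss_percent = 100 * total_lost / total_sent

print(f"{total_lost} of {total_sent} packets lost end to end ({loss_percent:.3f}%)")
```

That comes to 1 of 1150 packets, or roughly 0.087% loss measured at the destination, which is a different figure from the per-hop reply loss shown on intermediate hops.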

  • RAIST515O
    RAIST515O Posts: 156 Contributor

    A pathping report along the route you're actually suffering issues with may be a more useful report.

    It can be a bit tricky to format properly as a copy/paste though... a screen shot may be easier to read.

  • HT_Greenfield
    HT_Greenfield Posts: 795 Contributor

    Just curious if you could run that just to whatever the router shows as the default gateway, and then again to dns.google but with the router bypassed as in the laptop connected directly to modem WAN port.

    Ref.: 🔗 https://community.tp-link.com/en/business/forum/topic/617864

    Ref.: 🔗 https://community.tp-link.com/en/home/forum/topic/617782

  • Craig9S
    Craig9S Posts: 4 Spectator
    edited July 1

    UPDATE:
    A service tech came to the house. Of course the lovely folks at Spectrum didn't pass along any of the info I provided them and only told him "He's having connectivity issues." Happens every time. Anyways, I provided him with a quick summary on the info I provided above and he said, "Yeah, sounds like it's probably that outside node. I still have to check your signals quick, but I'll look outside after that." 10 minutes later and he said, "Yep, that's definitely the issue. I put in a maintenance request on it and it appears they've already got one scheduled for tonight."

    No issues so far this morning. Ping test to the same problem address showed zero packet loss this morning, so whatever the issue was there, they apparently fixed it. No clue if there are other issues that caused the problem like y'all mention above, but we both work from home, so just having our steady signal back today is rather refreshing.

  • Jake9432423
    Jake9432423 Posts: 2 Spectator

    Experiencing the same issue recently. The Internet is unusable unless I use a VPN. Spectrum support has been utterly useless and I feel like unless this is resolved soon my only option is to cancel.

  • RAIST515O
    RAIST515O Posts: 156 Contributor
    edited July 2

    @Jake9432423

    Unfortunately, that is only showing a bad delay responding to a lower priority ICMP ECHO request... which is common practice when under heavier loads. It appears packets are still forwarding properly, which is their primary function.

    All 25 pings made it to the end point in 34-35ms with no packet loss.

    Even though it may raise questions over the load level at a particular hop, a report like this may more or less be viewed as working as intended because the endpoint result does not appear to be negatively impacted <yet>.

  • Jake9432423
    Jake9432423 Posts: 2 Spectator

    @RAIST515O

    Okay sure, I took a few more samples that apparently do include packet loss / errors:

    I don't know which node is causing the problem but I have been having issues connecting to websites for around a week now. It's only fixed when I am using a VPN, probably because it's not passing through the spectrum nodes.

    I did explain it to the support person over the phone today and they're sending out a tech as part of the standard procedure, and I'll need to show my findings to him so they can send out a maintenance crew to whatever the faulty node may be.

  • RAIST515O
    RAIST515O Posts: 156 Contributor
    edited July 2

    Unfortunately... still kinda the same story there as well.

    Some requested responses were ignored, others were queued for delayed responses.

    All 15 pings made it to the dns server endpoint in roughly 34-38ms... the YouTube server in <presumably> roughly 34-44ms (they all appear to have arrived, just no lower priority echo response given for some...).

    Granted, not an ideal jitter factor... but for the contents of the packets being sent back and forth, not exactly going to set off alarms further up the chain of command.

    Best case... someone might monitor a node for a high error rate of normal traffic.

    As a general rule, ping replies are pretty low on the priority list for things like routers, DNS servers, and media servers... it's not all that uncommon to see them outright ignored in some cases. They can actually pose a security risk for high traffic areas.

    They're best used more as a test for echo, or watching for a pattern that demonstrates increased loads/intermittent outages and such that may indicate specific issues.

    For example: if a ping is forwarded through a previous hop and arrives at the next one in an appropriate time frame... then that earlier hop is doing its job properly without issues. Now, if they get queued and the FORWARD (not echo reply) is grossly delayed, that may be a cause for concern... as it may have an adverse effect at the endpoint. The question then becomes whether that same forward queuing is happening to HIGHER priority packets.
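
The rule of thumb described above can be sketched in a few lines of Python. This is a simplified illustration, not a diagnostic tool: the function name, the `tolerance` cutoff and the sample figures are all assumptions made up for the example, and it only compares per-hop reply loss against loss at the destination.

```python
def classify_hops(hops, tolerance=0.5):
    """Label each hop by comparing its reply loss with loss at the destination.

    hops: list of (label, reply_loss_percent), ordered first hop to destination.
    tolerance: hypothetical cutoff (percent) below which loss is treated as zero.
    """
    destination_loss = hops[-1][1]
    verdicts = {}
    for label, loss in hops:
        if loss <= tolerance:
            verdicts[label] = "ok"
        elif destination_loss <= tolerance:
            # The hop skipped some echo replies, yet traffic still got through.
            verdicts[label] = "reply-only loss"
        else:
            # Loss is also visible at the destination, so forwarding may be affected.
            verdicts[label] = "possible forwarding loss"
    return verdicts

# Made-up figures for illustration only, not taken from the plots in this thread.
sample_route = [
    ("hop 1", 0.0),
    ("hop 2", 12.0),
    ("hop 3", 9.0),
    ("destination", 0.0),
]

for label, verdict in classify_hops(sample_route).items():
    print(f"{label}: {verdict}")
```

With the sample figures, hops 2 and 3 come out as reply-only loss because nothing is lost at the destination. If the destination also showed loss, the same hops would be flagged for a closer look.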

This discussion has been closed.