High ping spikes and packet loss across all LAGs before reaching the internet

sikhlife Posts: 4 Spectator
edited March 6 in Connectivity

Hey folks,

I am hoping someone can help me with this issue before I try to find a way to escalate it outside of Spectrum. I have been having these specific issues for the last 9 months to almost a year. I am at the point where my Field Ops Supervisor has finally escalated this to "Engineering" and even had the ISP Team (NOC?) take a look, and they are all telling me "this is fine and expected". I don't think major ping spikes or packet loss that happen consistently are "fine and expected", but I've never managed an ISP network, only datacenters and now AWS managed networks.


Details needed before reading my message below

  • Modem has been replaced 10+ times (I lost count after 10 and stopped updating my notes)
  • Outside line from the pole with tap to my house has been swapped 3 times
  • Outside lines from box mounted on house to inside have been swapped 5 times
  • Connectors have been swapped over 7 times on all cables, including outside
  • Outside line to inside splitter has been replaced 4 times
  • Splitter has been replaced 5 times
  • Inside line from splitter to modem has been replaced 6 times
  • Plan: Business Internet with 600 down, 35 up
  • All traffic is going to AWS US-EAST-1 (Ashburn, VA)
  • All testing has been done wired.
  • All testing has been done behind my router AND directly connected to the modem


My issue has always been that whenever there is any "significant" and consistent download traffic, my pings spike significantly (normally 55-60ms, spikes are 120ms+) and I see packet loss. To me it seemed like bufferbloat, so I would lower my bandwidth cap from 600 to 400 and it would resolve my ping spikes for an hour or two, and then I would drop down to 200 and it would completely get rid of my issues. For the last 6 months, though, even capping my download at 100 still shows massive ping spikes and packet loss.
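For anyone who wants to put numbers on symptoms like these, here is a minimal sketch of summarizing a ping log. The sample values, the `summarize_pings` name, and the "2x the median" spike rule are all assumptions for illustration, not measurements from this connection.

```python
from statistics import median

def summarize_pings(samples_ms, spike_factor=2.0):
    """Summarize round-trip times; a value of None marks a probe with no reply."""
    replies = [s for s in samples_ms if s is not None]
    lost = len(samples_ms) - len(replies)
    baseline = median(replies) if replies else None
    # Assumption: a "spike" is any reply at or above spike_factor x the median.
    spikes = [s for s in replies if baseline is not None and s >= spike_factor * baseline]
    return {
        "baseline_ms": baseline,
        "spike_count": len(spikes),
        "loss_pct": 100.0 * lost / len(samples_ms) if samples_ms else 0.0,
    }

# Hypothetical samples: mostly 55-60ms, two spikes above 120ms, one lost probe.
samples = [56, 58, 57, 125, 59, None, 60, 131, 55, 57]
summary = summarize_pings(samples)
print(summary)
```

Using the median as the baseline keeps a handful of spikes from dragging the "normal" figure upward, which an average would do.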

During the last 4 years that I've had various issues in this area, I started running speedtests hourly to show the issues because I was being blamed every time. Once I switched my speedtests to run every 10 minutes, everyone involved stopped blaming me because it was clear that at certain times I wasn't getting enough download or upload. It was also very apparent that at certain times, even at 600 download or 35 upload, the plant could not deliver my service and I would start seeing bufferbloat-type issues (ping spikes, packet loss, etc.).
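As a rough illustration of how periodic speedtest results can show that shortfalls cluster at certain times, the sketch below groups results by hour of day. The 600 Mbps figure is the plan rate mentioned above; the 80% shortfall threshold, the `shortfall_by_hour` name, and the sample results are assumptions, not data from this thread.

```python
from collections import defaultdict

PLAN_DOWN_MBPS = 600   # plan download rate from the post
SHORTFALL_RATIO = 0.8  # assumption: below 80% of plan counts as a shortfall

def shortfall_by_hour(results):
    """results: iterable of (hour_of_day, download_mbps) tuples.

    Returns {hour: fraction of tests in that hour that fell short}.
    """
    totals = defaultdict(int)
    short = defaultdict(int)
    for hour, down_mbps in results:
        totals[hour] += 1
        if down_mbps < PLAN_DOWN_MBPS * SHORTFALL_RATIO:
            short[hour] += 1
    return {hour: short[hour] / totals[hour] for hour in sorted(totals)}

# Hypothetical results: fine at 2pm, mostly short at 8pm.
results = [(14, 610), (14, 590), (20, 310), (20, 450), (20, 605), (20, 470)]
by_hour = shortfall_by_hour(results)
print(by_hour)
```

Running tests more often, as with the move from hourly to every 10 minutes, gives each hourly bucket more samples, so a pattern is harder to dismiss as a one-off.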

Last week, on Wednesday (2/28) and Thursday (2/29), the engineers and the ISP Team looked into this issue with my Field Ops Supervisor and the supervisor he assigned to my case. Funny thing is, they had been working on escalating my issue for the previous 2-3 weeks, and during the last 10 days (Wed 2/21 to Friday 3/1) my internet was almost perfect. I would have to drop my download speeds to avoid the bufferbloat ping spikes, but my overall pings were significantly lower (down from 55-60ms to 35-40ms) with absolutely no packet loss.

I was told by my Field Ops Supervisor on Friday (3/1) that "the engineers have said that everything looks good on our end and if he would stop running those speedtests, it would get rid of all of his issues. It is his prerogative if he wants to keep running those speedtests but they are the cause of all of his issues and stopping them will resolve all of his issues". I told him that at midnight I will stop my speedtests and will call him back if the issues still persist. At 12:02am on 3/2, all speedtests were stopped and the machine running the speedtests was shut down. My network was completely flat after this and I have graphs from my router to prove it.

On Saturday (3/2), around 2pm, my ping went from 35-40ms to 75-80ms. I thought this was very strange, so I started running PingPlotter while I streamed TV or played games. My ping was averaging 101.3ms to USE1. I didn't bother reporting it because I wanted to wait for a normal work week to gather more data before giving my Field Ops Supervisor an update on the issue.

On Monday (3/4), I was in Zoom meetings from 9am to 2pm and I was seeing my connection / signal strength go yellow more often and sometimes red. As it would start going red, people would say "hey your screen is lagging" and/or "hey you are breaking up". During this time, I had my router pulled up and the most traffic on my network was 5.6 Mbps download and 3.12 Mbps upload. During work hours, all other traffic (outlets, lights, doorbell, etc.) is deprioritized and my work computer and my partner's computer are prioritized. So during my Zoom meetings, nothing was influencing my bad connection besides the Charter network itself.

... got "body is 1551 characters too long" error message, posting rest of the message in the comment ...


Answers

  • sikhlife
    sikhlife Posts: 4 Spectator

    I updated my Field Ops Supervisor on this with screenshots in an email thread we have going. Yesterday (3/5), I spoke to my supervisor on the phone and after taking a look at my node and my neighbors' modems, all he could say was "I know this is frustrating but as the engineers have said, this isn't an issue on our end. I don't know what else to tell you and I don't necessarily want you to leave us since you have been a customer for so long (almost 20 years), but it seems like we aren't the right fit for you". I asked him how any of this is acceptable since I have stopped "any and all testing" and have proven it with data. He said "I don't know what to tell you but what I have been educated and told by the engineers". I asked him if he could escalate this one more time to the engineers, because now things are bad and I am not running any tests, and he said "I will try but honestly I can tell you right now this won't be seen. It was almost impossible to get them to look at this issue the first time and it's almost unheard of that they take a look into singular issues like this". I reiterated that this is a Spectrum network issue and not something I am doing to influence this. I was directly plugged into the modem while talking to him on the phone and sent him screenshots in real time of what I was seeing.

    The only option left on the table was to try yet another modem swap, but this time a Business modem since I am a business customer, and he happened to have these "brand new 4 port modems that just came out of the box that he would love to get to me to see if that solves the issues". Well, here I am on the new modem as of 2pm yesterday and my issues have gotten worse. My ping spikes are worse and my packet loss is now consistently 1-2%.

    I am attaching MTR, PingPlotter and cloudping screenshots to this post. I hope someone can help me with this issue and maybe finally get this looked at.

    Thanks for taking the time to read this post and possibly help with this issue,

    Sikh

  • James_M
    James_M Posts: 4,728 ✅ Verified Employee Moderator

    Hi & welcome!

    Sorry for any issue you are experiencing with your service. I was able to locate your account using your registration information and I see that you are a Business Customer.

    We are limited in this Community in the support we can provide to Business Customers. Typically, the path we would follow would be to escalate to Field Operations for further investigation. Since you are already in communication with Field Operations directly through a previous escalation, the only other suggestion would be to reach out to your Sales Executive to see if they have any other suggestions since they would be familiar with your business needs.

    I will tag a couple of our regular contributors, in case they have any other suggestions to add: @HT_Greenfield, @RAIST515O

  • RAIST515O
    RAIST515O Posts: 90 Contributor

    Sounds like you may need a full fiber solution without any coax involved? You will likely still suffer some level of bloat and loaded latency issues, but depending on the solution it may offer a marked improvement. There are just some screwy infrastructure issues at play that may be hard to overcome... especially if you are in an "older" neighborhood.

    Loaded latency spiking has been a challenge for the HFC DOCSIS systems for as long as I can remember... and that goes back to our beta testing with the TimeWarner/RoadRunner rollouts back around '95/96.

    The whole "shared pipe" aspect you may run into does not help matters either.

    Unfortunately, Bus. Class really doesn't help much with the pitfalls either... the ToS and all can offer better support when problems arise and all... but... what you are dealing with ultimately boils down to a systemic architecture kind of thing. The nature of how everything works just has a few shortcomings when things come under load.

    It gets better with each iteration of DOCSIS and the eventual hardware/firmware improvements with the subsequent generations in hardware... but the rollouts of those improvements take time. From nabbing the spectrum from the FCC, integrating that into the networks and upgrading the plants and extended nodes to take advantage of it, then getting the client hardware also in synch--they are always behind the eight ball, and the end-users can come up a day late and a dollar short.

    We still have a big DOCSIS 3.0 presence in our market, despite the massive push to get everyone to start using at least ONE OFDMA channel (the big game changer in 3.1 going forward).

    Sometimes issues can get compounded by a particular application also. Back when I was using a PS3 and PS4, I shunted them through a switch to even out their packet flows. Horrible netcode... any buffer bloating really knackered things for them. Some people did a similar thing through a PC with two NICs.

    IDK if that would be a practical approach or not... finding some way to isolate the critical applications to their own network segment of sorts and prio that segment, perhaps even running it through a dedicated tunnel or something so it may also get priority on the other side as it goes upstream.

    For that matter... perhaps even just encrypting through a VPN might help if it is falling victim to some form of shaping/queuing somewhere out in the wild. Many out there offer decent free trials (or even ongoing free use) you could tinker with... May be worth a shot.

  • RAIST515O
    RAIST515O Posts: 90 Contributor

    On that VPN thought... may be something to really consider trying.

    Completely forgot about how I have had to use that to address stutter/stalling for Crunchyroll. I fire one up routinely on my phone for a host of reasons, so it escaped me... but I also use Windscribe on my TV's Android boxes sometimes when things get wonky network-wise.

  • HT_Greenfield
    HT_Greenfield Posts: 708 Contributor
    edited March 7

    ICMP echo and relay are two different things. Echo priority will vary anywhere from "best effort" at best, to none at all, as you can see. It is impossible for the relay performance of any router to not be at least as good as the echo performance of every router beyond it, as well as the end point. All but one of the 239 packets round-tripped the last router with excellent latency, and the one that didn't, didn't even get very far out of the gate. The Twitchy autonomous system endpoint is blocking ICMP echo requests, so you run Wireshark and determine the QoS data exchange turn-around delay with that host; however much longer that is than the ICMP echo of the last router is all on that host. Quit worrying about trivial stats, anyway. It's not a horse race. No offense.

  • HT_Greenfield
    HT_Greenfield Posts: 708 Contributor

    Correction: "Twitch Interactive Inc." (not "Twitchy".)
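HT_Greenfield's point about echo versus relay can be sketched in a few lines: a packet that gets an echo reply from a later hop must have been forwarded through every earlier hop, so the forwarding loss up to any hop can be no worse than the lowest echo loss seen at that hop or beyond it. This is only an illustration that assumes every probe takes the same forward path; the `forwarding_loss_bound` name and the per-hop figures are hypothetical and are not read from the attached screenshots.

```python
def forwarding_loss_bound(echo_loss_pct):
    """Upper bound on real forwarding loss up to each hop.

    echo_loss_pct: echo loss % per hop in path order; None means the hop
    never answers echo requests (for example, an endpoint that blocks ICMP).
    """
    bounds = []
    best = None
    # Walk from the far end back, tracking the lowest echo loss seen so far.
    for loss in reversed(echo_loss_pct):
        if loss is not None:
            best = loss if best is None else min(best, loss)
        bounds.append(best)
    bounds.reverse()
    return bounds

# Hypothetical trace: hop 2 shows 12% echo loss, but later hops reply cleanly,
# so that 12% is the router deprioritizing echo, not dropping forwarded traffic.
trace = [0.0, 12.0, 0.0, 0.4, 0.0, None]
print(forwarding_loss_bound(trace))
```

A bound of None for the last entry means nothing can be inferred from ICMP alone for a host that does not answer, which is why timing the actual application traffic to that host is the next step.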