My ISP has been having intermittent upload issues [1] (multiple times per day, for several minutes at a time), which impacted my ability to work from home. I’ve had to call technicians out several times, and eventually I got tired of spending the first 20 minutes of every visit:
- Waiting for them to plug into the modem,
- Watching them run the usual speed tests and diagnostics,
- Being shown that the current signal and speeds look great,
- Re-explaining the issues I’m having and waiting for the signal to cut out so they can see it [2],
- And only then having them pay attention to my explanation.
Which, sure, I get it. Plenty of customers are having problems with their Wi-Fi, their devices, or something else that isn’t the ISP’s hardware. I’m sure plenty of people also claim they’re tech-savvy and already know what the problem is because they asked ChatGPT; I dealt with exactly this in my first few jobs, which were customer-facing.
I’m also sympathetic to how hard intermittent faults are to diagnose; I’ve worked on bugs where it took an entire team two months to find the root cause. It’s a lot easier to diagnose an issue if you can replicate it on demand.
So by visit #6 (yes, really [3]), I figured I should have a way to replicate the issue on demand, and my hunch was that a signal integrity issue would get worse as upstream traffic increased. The short speed tests the technicians ran at the start of each visit clearly weren’t long enough, so I needed something that would keep the upstream saturated long enough for them to see the problem. I decided to upload 1GB files in parallel to the fastest S3 provider I could find, from a PC with an NVMe SSD wired directly to my router - doing my best to ensure my ISP was the bottleneck.
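For rough sizing: at, say, 100 Mbps of upstream (an example figure, not my actual plan speed), 100GB is about 800,000 megabits, which works out to roughly 8,000 seconds - a bit over two hours of sustained upload, far longer than any quick speed test.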
I wrote a short bash script to create however many 1GB files I needed before the technician arrived, saved it as `large_file_generate.sh`, and ran `large_file_generate.sh 100` to fill `~/large` with 100GB of files:

```bash
#!/usr/bin/env bash
# Usage: large_file_generate.sh <count>
# Creates <count> 1GB files of random data in ~/large.
mkdir -p ~/large
cd ~/large || exit 1
for ((i = 1; i <= $1; i++)); do
  echo "Generating file $i"
  head -c 1G /dev/urandom > "1G-$i.txt"
done
```
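If you want to double-check the bottleneck assumption, a quick sequential read of one of the generated files should report throughput far above any residential uplink (the file name here just matches the script’s output):

```bash
# Read one generated file back and report throughput; an NVMe SSD should
# manage well over 1 GB/s, so the disk won't be the limiting factor.
dd if="$HOME/large/1G-1.txt" of=/dev/null bs=1M status=progress
```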
I then configured Rclone with my S3 provider’s credentials as a remote named `s3`, created a bucket called `upload-signal-hammer`, and then, as the technician arrived, began the upload with `rclone copy -v -L ~/large s3:upload-signal-hammer/`.
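Rclone already uploads several files in parallel (the `--transfers` flag, which defaults to 4), which is what lets a single `rclone copy` keep the upstream saturated. To push the link harder, or to get a live rate to point at instead of per-file log lines, something like the following works (the transfer count is just an illustration):

```bash
# Same upload with more parallel file transfers and a continuously-updated
# progress/speed readout.
rclone copy -L --transfers 8 --progress ~/large s3:upload-signal-hammer/
```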
The upload issues started almost as soon as the technician walked in, and speeds plummeted to under 10% of what I pay for. I explained that I was trying to demonstrate the issue and showed the speeds I was getting from a directly wired device; they plugged into the modem, and when I peered over at their diagnostics I saw the upload channels lit up red as “disconnected.” Finally we understood each other, and the technician called in the issue and started a deeper look for the root cause.
Of course, this didn’t fix the problem, but being able to replicate it on demand made the subsequent discussions with my ISP and their technicians much easier.
Appendix
1. My ISP uses DOCSIS 3.1 “high split” and bonds 8 SC-QAM channels with 2 OFDMA channels on the upstream. The SC-QAM channels are usually fine (lower bandwidth, higher stability), but the OFDMA channels cut out frequently (higher bandwidth, lower stability). When the OFDMA channels cut out, my upload speed drops to ~10-15% of what I pay for; when all channels cut out, traffic can no longer leave my network at all. I work from home and can’t drop out of work calls several times per day, so this is unacceptable.
2. For some inscrutable reason, the modems used don’t keep or report signal quality logs that the technician can access (so they can only see the live state of the connection), and seemingly the company doesn’t keep or share detailed notes from past service calls with technicians on future calls. Either of those would have solved this and saved the ISP time and money, despite being very cheap to implement. Oh well.
3. The noise issue isn’t between my house and the drop; it’s between the drop and the node, so the technicians who come out to my house can’t do anything about it. They have to log the issue and then roll out a lineman and a bucket truck to try to fix it, which has produced some improvements (e.g. the SC-QAM channels no longer cut out >20x per day, which was the original issue) but not a complete fix.