How I Spent $149 on a Data Pack My Network Devoured in One Night (And Built an Automated Fix)

Last Friday, Comcast went down. I don’t know why. Comcast doesn’t really explain itself. One moment there was internet; the next moment there wasn’t, in the proud tradition of cable ISPs everywhere. The UniFi Dream Machine Pro Max did exactly what it was supposed to do: it silently failed over to my backup WAN, a UniFi U5G 5G backup device with a prepaid eSIM data pack.

That is great. That is the whole point of having a backup WAN.

The problem is that my network apparently treats backup cellular data with the same respect it treats a gigabit fiber connection, which is to say none at all. My home network has 51 devices on it, and none of them had any idea they were burning through a $149, 50GB eSIM data pack.

## The Numbers Are Brutal

I have syslog flowing off my network gear into Loki, so I can tell you exactly how this unfolded, because the U5G logs its data usage to syslog every hour like clockwork.

Here’s how the carnage looked:

| Time (Local) | Total Usage | Change |
|---|---|---|
| May 30, 9:04 AM | 9.0 MB | Normal standby |
| May 30, 9:17 AM | **66.5 MB** | Failover begins |
| May 30, 10:17 AM | 191.5 MB | +125 MB/hr |
| May 30, 11:17 AM | 316.3 MB | +125 MB/hr |
| May 30, 3:17 PM | 690.7 MB | Still climbing |
| May 30, **3:17 PM → 4:17 PM** | 0.7 GB → **7.0 GB** | 💀 Something woke up |
| May 30, 4:17 PM → 5:17 PM | 7.0 GB → **17.7 GB** | 10+ GB/hr |
| May 30, 5:17 PM → 6:17 PM | 17.7 GB → **28.9 GB** | 11+ GB/hr |
| May 30, 6:17 PM → 7:17 PM | 28.9 GB → **30.2 GB** | Starts tapering |
| May 31, 11:17 PM | **53.6 GB** | Final tally |

That spike between 3:17 PM and 6:17 PM is something else. Usage climbed from 0.7 GB to 28.9 GB in three hours, with the last two running above 10 GB/hr. That’s roughly 21 Mbps of average sustained throughput, peaking near 25 Mbps, on a cellular backup link that was supposed to be, you know, a *backup*.
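For anyone checking the math, converting the hourly readings into average throughput takes a couple of lines. This is a minimal sketch, assuming decimal units (1 GB = 8,000 megabits); the helper name `avg_mbps` is hypothetical and not part of the watchdog.

```python
def avg_mbps(start_gb: float, end_gb: float, hours: float) -> float:
    """Average throughput in Mbps between two cumulative usage readings."""
    megabits = (end_gb - start_gb) * 8000   # 1 GB = 1000 MB = 8000 megabits
    seconds = hours * 3600
    return megabits / seconds


# Readings taken from the table above.
print(round(avg_mbps(0.7, 28.9, 3), 1))    # 3:17 PM -> 6:17 PM window
print(round(avg_mbps(17.7, 28.9, 1), 1))   # busiest single hour
```

The same function works on any pair of rows in the table, which makes it easy to see which hours were quiet and which were not.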

Here is the thing though: none of this is unusual usage for our household. Between media files downloading, streaming, and general day-to-day activity across 51 devices, we routinely push 4 to 5 TB through Xfinity in a month. On a gigabit cable connection that is completely fine and nobody thinks twice about it. On a 50 GB prepaid eSIM data pack, it is catastrophic. The network failed over silently, nobody had any idea anything had changed, and everything just kept running at full speed like it always does.

To be fair, UniFi does send a notification when a WAN failover occurs. I saw it. I glanced at it and moved on, because a small push notification does not really convey the urgency of “your metered backup connection is now carrying your entire household.” What I actually needed was a direct message that told me which connection I was on, what speed limit had just been applied, and what it would cost me if I ignored it. A targeted Discord DM with that context is a fundamentally different thing than a generic system alert, and it turns out that difference matters quite a bit when $149 is on the line.

By the time the eSIM data pack ran out, we’d burned through 53.6 GB. The pack was 50 GB. The carrier helpfully provided an overage data allotment at a rate that I’d rather not think about too hard. Total damage: somewhere north of $149.

Comcast was back up by morning. The network failed back over gracefully. Nobody noticed anything was wrong except me, sitting here looking at a Loki dashboard wondering what happened.

## Why There Was No Safety Net

The UniFi gear is genuinely excellent at failover. It detected the outage, cut over to WAN3 in seconds, and restored connectivity without dropping a single active session. That’s impressive and it’s exactly what you want from a router. What it doesn’t have is any native concept of “apply a speed limit when this specific WAN is active.”

You can set WAN rate limits in UniFi’s Traffic Management. You can enable Smart Queue (HTB shaping) on any WAN interface. What you can’t do is tell it “enable these settings *only when WAN3 is the active uplink*.” That’s a static configuration, not a conditional one. The router doesn’t have a “budget mode” toggle that fires automatically on failover.

So the network failed over, every device on the LAN kept doing exactly what it was doing at full speed, and 53 GB disappeared into the ether over about 18 hours.

## The Fix: Automated Failover Detection and Rate Capping

The solution is to build the conditional logic that UniFi doesn’t provide natively. The UDMP exposes a full REST API, and one of the things it reports is which WAN interface is currently the active uplink. You can also push network configuration changes to it via PUT requests. Those two things together are everything you need.

Here’s the architecture:

```
wan-watchdog (Docker on Unraid)
  ↓ polls UDMP API every 30 seconds
  ↓ reads uplink.comment on the UDM Pro Max device
  ↓ "WAN"  → Xfinity is active, no cap needed
  ↓ "WAN3" → 5G backup active, apply rate limit
      → UDMP: PUT networkconf with wan_smartq_enabled + wan_provider_capabilities
      → Discord: DM notification on state change
```

The watchdog is a small Python container that runs on Unraid with `--restart unless-stopped`. It authenticates to the UDMP, polls the device stat endpoint, and compares the current active WAN to the previous state. On a state change it either applies or removes a Smart Queue configuration on the WAN3 interface.
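The state comparison is the heart of it, and it can be kept free of any network code. What follows is a rough sketch rather than the actual watchdog source. It assumes the device stat response has already been parsed into a list of dicts and that the gateway can be picked out by a `mac` field; the function names are hypothetical.

```python
from typing import Optional

PRIMARY = "WAN"   # uplink.comment when Xfinity is the active uplink
BACKUP = "WAN3"   # uplink.comment when the 5G backup is the active uplink


def active_wan(devices: list, gateway_mac: str) -> Optional[str]:
    """Return uplink.comment for the gateway, or None if it cannot be found."""
    for dev in devices:
        if dev.get("mac") == gateway_mac:
            return dev.get("uplink", {}).get("comment")
    return None


def next_action(previous: Optional[str], current: Optional[str]) -> Optional[str]:
    """Decide what this poll should do: 'apply_cap', 'remove_cap', or None."""
    if current is None or current == previous:
        return None                      # no reading, or nothing changed
    if current == BACKUP:
        return "apply_cap"               # just landed on the metered link
    if current == PRIMARY and previous == BACKUP:
        return "remove_cap"              # failed back to primary
    return None
```

Keeping the decision in a pure function means the polling loop only has to fetch, call `next_action`, act on the result, and remember the current value for the next pass.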

The rate cap itself is configured via two fields on the WAN3 network config object:

```python
conf["wan_provider_capabilities"] = {
    "upload_kilobits_per_second": 10000,    # 10 Mbps up
    "download_kilobits_per_second": 30000,  # 30 Mbps down
}
conf["wan_smartq_enabled"] = True
```

30 Mbps down is plenty for everything a household actually needs. Browsing works. Video calls work. Streaming in HD works if you’re not doing it on 12 devices simultaneously. It is the background stuff, backup jobs, app updates, Plex metadata fetching, NAS sync tasks, that does not need to run at full speed and absolutely will if you let it.

On failback to primary WAN, the watchdog sets `wan_smartq_enabled = False` and removes the provider capabilities, restoring full-speed operation. The network does not notice. The users do not notice. I get a Discord DM either way.
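Applying and removing the cap are mirror images, so they can be written as a pair of small helpers that take the WAN3 config object and return a modified copy. This is a sketch under the assumption that `conf` is the plain dict fetched from the UDMP; the helper names `with_cap` and `without_cap` are hypothetical.

```python
import copy

# Limits from the config snippet above, in kilobits per second.
CAP = {
    "upload_kilobits_per_second": 10000,
    "download_kilobits_per_second": 30000,
}


def with_cap(conf: dict) -> dict:
    """Return a copy of the WAN3 config with Smart Queue and the cap enabled."""
    new = copy.deepcopy(conf)
    new["wan_provider_capabilities"] = dict(CAP)
    new["wan_smartq_enabled"] = True
    return new


def without_cap(conf: dict) -> dict:
    """Return a copy of the WAN3 config with Smart Queue off and the cap removed."""
    new = copy.deepcopy(conf)
    new.pop("wan_provider_capabilities", None)
    new["wan_smartq_enabled"] = False
    return new
```

Working on a copy keeps the last known-good config intact if the write to the UDMP fails.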

## One Gotcha: CSRF Tokens

UniFi’s API requires a CSRF token on all write operations. You get it from the login response header (`X-Csrf-Token`) and need to include it on every subsequent PUT or POST. The cookie alone is not enough, and if you miss it you get a 403 that looks exactly like a permissions problem. I spent more time on this than I’d like to admit.

Also, the local service account I use for API access needed its Network permission upgraded from `Viewer` to `Administrator` to allow config writes. Read operations work fine on a read-only account; write operations do not. Obvious in retrospect.

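```python
import requests

# BASE, USER and PASS are module-level settings defined elsewhere in the script.

def udmp_login():
    s = requests.Session()
    s.verify = False  # TLS certificate verification is disabled for the UDMP
    r = s.post(f"{BASE}/api/auth/login", json={"username": USER, "password": PASS})
    r.raise_for_status()
    # Writes need the CSRF token from the login response on every later request.
    csrf = r.headers.get("X-Csrf-Token", "")
    if csrf:
        s.headers.update({"X-Csrf-Token": csrf})
    return s
```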

That’s it. One extra header. Costs half an afternoon to figure out if you don’t know to look for it.
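On the write side, a thin wrapper can make both failure modes loud instead of leaving you to stare at a bare 403. This is a sketch, not the watchdog's actual code. The `networkconf` URL path and the `_id` key are assumptions based on how the UniFi Network API is commonly described, and `put_networkconf` is a hypothetical name.

```python
def put_networkconf(session, base: str, conf: dict) -> dict:
    """PUT a network config object back to the UDMP using a logged-in session."""
    # Fail early if login did not leave a CSRF token on the session.
    if "X-Csrf-Token" not in session.headers:
        raise RuntimeError("no CSRF token on session; log in again before writing")

    # Assumed path; adjust the site name if yours is not "default".
    url = f"{base}/proxy/network/api/s/default/rest/networkconf/{conf['_id']}"
    r = session.put(url, json=conf)

    # A 403 here can mean a missing or stale token, or an account without write access.
    if r.status_code == 403:
        raise PermissionError(
            "403 from UDMP: check the X-Csrf-Token header and that the "
            "account has Administrator on Network"
        )
    r.raise_for_status()
    return r.json()
```

It takes the session returned by the login function as-is, so the CSRF header set at login is sent with every write.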

## What the Discord Notifications Look Like

When Xfinity goes down and WAN3 kicks in, I get this:

> 📡 **WAN Failover** — switched to **WAN3 (5G backup)**
> ✅ Rate cap applied: **30 Mbps down / 10 Mbps up**
> Watching for Xfinity to recover…

And when it comes back:

> ✅ **WAN Restored** — back on **Xfinity (primary)**
> ✅ Rate cap removed — running at full speed.

This is the difference between knowing something happened and knowing what to do about it. The message tells me which connection is active, confirms the speed cap was applied successfully, and makes it obvious when things are back to normal. The notifications come through an existing bot so there is no new infrastructure needed on the Discord side.

## Running It

The container runs on Unraid with a persistent env file at `/mnt/user/appdata/wan-watchdog/watchdog.env` so it survives reboots. Docker’s `--restart unless-stopped` handles everything else. Total resource footprint is negligible: it’s a Python process that wakes up every 30 seconds, fires off two HTTP requests, and goes back to sleep.

```bash
docker run -d \
  --name wan-watchdog \
  --restart unless-stopped \
  --network host \
  --env-file /mnt/user/appdata/wan-watchdog/watchdog.env \
  wan-watchdog:latest
```
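Inside the container, everything in that env file shows up as ordinary environment variables. A minimal sketch of reading them at startup follows. The variable names (`UDMP_URL`, `UDMP_USER`, `UDMP_PASS`, `POLL_SECONDS`) are hypothetical placeholders, not the names the real watchdog uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    base_url: str
    username: str
    password: str
    poll_seconds: int


def load_config(env=os.environ) -> Config:
    """Build a Config from environment variables, failing fast if any are missing."""
    required = ("UDMP_URL", "UDMP_USER", "UDMP_PASS")   # hypothetical names
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise SystemExit(f"missing required settings: {', '.join(missing)}")
    return Config(
        base_url=env["UDMP_URL"].rstrip("/"),
        username=env["UDMP_USER"],
        password=env["UDMP_PASS"],
        poll_seconds=int(env.get("POLL_SECONDS", "30")),  # 30 s default, as above
    )
```

Exiting on a missing setting pairs well with `--restart unless-stopped`: a misconfigured container fails visibly in `docker ps` instead of polling with empty credentials.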

The code is in version control alongside everything else in the homelab config repo. If Unraid dies and I rebuild, this comes back up with everything else.

## Lessons Learned

**Failover is not free.** Having a backup WAN is great. Having a backup WAN with no data budget awareness is a $149 lesson. The router’s job is to keep you connected. Managing what that connection costs is your problem.

**Log everything.** The only reason I can tell you exactly when the failover happened, how fast data was being consumed, and at what point things went sideways is because syslog from the U5G was flowing into Loki. Without that I would have a mysterious data overage charge and no idea what caused it. With it I have a complete timeline down to the hour.

**A notification is only as good as its context.** UniFi does send a failover alert. I saw it and ignored it because it did not tell me anything actionable. A Discord DM that says “you are now on metered cellular and your speed has been capped to 30 Mbps” is something I will actually respond to. The medium matters less than what the message contains.

**The API is your friend.** The UniFi API is well-documented and reasonably consistent. If you are running UDMP hardware and not automating things via the API, you are leaving capability on the table. The UI is great for initial configuration. The API is how you build logic the UI does not support.

**What happened at 3:17 PM remains an open question.** Usage went from 690 MB to 7 GB in a single hour. I have 51 suspects and no alibi for any of them. This investigation is ongoing.

*The wan-watchdog source is in the homelab-config repo. The key dependencies are `requests` and a correctly-permissioned local UDMP admin account. Configs in version control as always.*
