AWS Spot NAT Instance

My work often involves the restricted private networks found in large enterprises, so I run a similarly provisioned personal AWS VPC for experiments. This comes with the challenge of providing internet egress for instances in RFC 1918 private subnets.

AWS provides several solutions for internet egress. After spending some time considering these, I settled on a NAT instance running on Spot. The primary driver of this choice is cost.

Reliability

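| Risk         | ETTD | ETTR | EETF     | Impact | Notes               |
|--------------|------|------|----------|--------|---------------------|
| Spot Restart | 5m   | 5m   | 30 days  | 100%   | Every month**       |
| EC2 Fails    | 5m   | 5m   | 90 days  | 100%   | Every three months* |
| AZ Fails     | 5m   | 2h   | 180 days | 100%   | Every six months*   |
| Region Fails | 5m   | 4h   | 730 days | 100%   | Every two years*    |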

Looking specifically at Spot Restart:

43,200 = 60 min * 24 hours * 30 days (valid minutes)
10     = 5 ETTD + 5 ETTR             (bad minutes)
99.98% = (valid - bad) / valid * 100 (percentage of good minutes)

That’s nearly four nines of reliability even with the introduction of Spot. I’ve also used conservative values for ETTD and ETTR, even though the Auto Scaling group generally recovers within two minutes.
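The same arithmetic can be applied to every row of the risk table. The following is a minimal sketch, assuming each risk is considered on its own and strikes once per EETF window; the function name `availability_percent` and the `RISKS` mapping are illustrative and not part of the project.

```python
MINUTES_PER_DAY = 60 * 24


def availability_percent(ettd_min: float, ettr_min: float, eetf_days: float) -> float:
    """Percentage of good minutes in one failure window.

    Assumes exactly one failure per EETF window, costing ETTD + ETTR minutes.
    """
    valid = eetf_days * MINUTES_PER_DAY  # valid minutes in the window
    bad = ettd_min + ettr_min            # minutes lost to detection and repair
    return (valid - bad) / valid * 100


# (ETTD minutes, ETTR minutes, EETF days), taken from the risk table
RISKS = {
    "Spot Restart": (5, 5, 30),
    "EC2 Fails": (5, 5, 90),
    "AZ Fails": (5, 2 * 60, 180),
    "Region Fails": (5, 4 * 60, 730),
}

for risk, (ettd, ettr, eetf) in RISKS.items():
    print(f"{risk:<13} {availability_percent(ettd, ettr, eetf):.2f}%")
```

For Spot Restart this reproduces the 99.98% figure worked out by hand. The risks are not combined here; summing the bad minutes across risks over a common window would give a more pessimistic overall number.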

* Based on the 90% SLA; the actual SLO appears to be much higher.

** Based on the observed uptime of the current instance, launched April 2, 2020 at 8:36:05 PM UTC+11 (1919 hours).

Cost

The following costs are estimates:

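| Solution            | Network   | Cost/GB | Cost/hour | Cost/month |
|---------------------|-----------|---------|-----------|------------|
| NAT Gateway         | 5-45 Gbps | 0.059   | 0.059     | 42.48      |
| NAT Instance        | 0-5 Gbps  | 0-0.114 | 0.0059    | 4.25       |
| NAT Instance (spot) | 0-5 Gbps  | 0-0.114 | 0.0018    | 1.30       |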

Spot market costs are variable but current data suggests:

t3a.nano (1) … total 69% savings
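The monthly figures and the savings percentage can be checked with a few lines of arithmetic. This is a rough sketch, assuming a 720-hour month (30 days of 24 hours), which is what the cost table appears to use; per-GB data charges are left out, and the names `monthly_cost` and `savings_percent` are illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 720-hour month, consistent with the table


def monthly_cost(hourly_rate: float) -> float:
    """Fixed monthly cost from an hourly rate, excluding data transfer."""
    return hourly_rate * HOURS_PER_MONTH


def savings_percent(baseline_hourly: float, candidate_hourly: float) -> float:
    """Percentage saved by paying candidate_hourly instead of baseline_hourly."""
    return (1 - candidate_hourly / baseline_hourly) * 100


# Hourly rates copied from the cost table
RATES = {
    "NAT Gateway": 0.059,
    "NAT Instance": 0.0059,
    "NAT Instance (spot)": 0.0018,
}

for solution, rate in RATES.items():
    print(f"{solution:<20} {monthly_cost(rate):6.2f} per month")

spot_vs_on_demand = savings_percent(RATES["NAT Instance"], RATES["NAT Instance (spot)"])
print(f"Spot vs on-demand NAT instance: {spot_vs_on_demand:.0f}% savings")
```

With the table's rates, the Spot saving over an on-demand NAT instance comes out at about 69%, in line with the figure quoted above. Spot prices move, so the rates would need refreshing before relying on the result.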

Performance

The performance test can be reproduced with the following command:

yum install python python-pip -y \
 && pip install --upgrade pip \
 && pip install speedtest-cli \
 && speedtest-cli

The t3.nano instance sustained several Gbit/s in each direction:

Download: 3283.42 Mbit/s Upload: 2274.26 Mbit/s

Conclusion

This solution has proven acceptable for small egress bandwidth requirements of up to about 5 Gbps. The GitHub project can be found here.