Yahoo Follow-up
On Friday we reported connectivity issues with Yahoo. Based on emails we received, we initially thought the problem was either at Yahoo or inside Verizon's network. We later determined that neither was at fault and that the issue was more likely at Level3. Yahoo's official response is here.
The first indication that the problem was at Level3 came from a post to the NANOG mailing list showing the output of a traceroute to Yahoo. Here are the last few hops; notice how the latency jumps at hop 15 (hanaro.hsa4.level3.net) and stays high beyond it:
13 * 70 ms 77 ms ge-0-3-0-69.bbr2.sanjose1.level3.net [4.68.18.2]
14 * 78 ms 71 ms so-14-0.hsa4.sanjose1.level3.net [4.68.114.158]
15 487 ms 449 ms 459 ms hanaro.hsa4.level3.net [4.79.60.22]
16 * * * Request timed out.
17 * * * Request timed out.
18 * 586 ms * te-8-1.bas-a2.sp1.yahoo.com [209.131.32.19]
19 * 570 ms * f1.www.vip.sp1.yahoo.com [209.131.36.158]
20 * * 591 ms f1.www.vip.sp1.yahoo.com [209.131.36.158]
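To make the jump easier to see, here is a minimal Python sketch that parses the hops quoted above and reports the first hop whose median round-trip time rises sharply over the previous responding hop. The function names and the 200 ms threshold are illustrative assumptions, not part of the original NANOG post, and the parser only handles the Windows tracert layout shown here.

```python
from statistics import median

# Last hops of the traceroute quoted above (Windows tracert layout:
# hop number, three probes that are either "*" or "<n> ms", then the host).
TRACE = """\
13 * 70 ms 77 ms ge-0-3-0-69.bbr2.sanjose1.level3.net [4.68.18.2]
14 * 78 ms 71 ms so-14-0.hsa4.sanjose1.level3.net [4.68.114.158]
15 487 ms 449 ms 459 ms hanaro.hsa4.level3.net [4.79.60.22]
16 * * * Request timed out.
17 * * * Request timed out.
18 * 586 ms * te-8-1.bas-a2.sp1.yahoo.com [209.131.32.19]
19 * 570 ms * f1.www.vip.sp1.yahoo.com [209.131.36.158]
20 * * 591 ms f1.www.vip.sp1.yahoo.com [209.131.36.158]
"""


def parse_hop(line):
    """Return (hop_number, host, rtts_ms) for one tracert line."""
    tokens = line.split()
    hop = int(tokens[0])
    rtts = []
    i = 1
    for _ in range(3):
        if tokens[i] == "*":      # this probe got no reply
            i += 1
        else:                     # "<n> ms", or "<1 ms" on fast links
            rtts.append(int(tokens[i].lstrip("<")))
            i += 2                # skip the value and the "ms" unit
    return hop, " ".join(tokens[i:]), rtts


def find_latency_jump(hops, threshold_ms=200):
    """Return the first hop whose median RTT exceeds the previous
    responding hop's median by at least threshold_ms, or None.

    threshold_ms is an arbitrary illustrative value (assumption).
    """
    previous = None
    for hop, host, rtts in hops:
        if not rtts:              # every probe timed out; nothing to compare
            continue
        current = median(rtts)
        if previous is not None and current - previous >= threshold_ms:
            return hop, host, previous, current
        previous = current
    return None


HOPS = [parse_hop(line) for line in TRACE.splitlines()]
print(find_latency_jump(HOPS))
```

On the hops above, this flags hop 15 (hanaro.hsa4.level3.net), where the median goes from 74.5 ms at hop 14 to 459 ms. A single traceroute is only a hint, of course; the routing data is what confirmed the cause.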
Later, one of our readers found that a BGP peer of Level3 was advertising itself as the best path via San Jose for a large number of routes. The advertisement came from AS9318 (Hanaro Telecom) and made Yahoo and many other sites reached via Level3 unavailable for about an hour. To show that it wasn't just Yahoo that was affected, that reader did a route lookup for www.merit.edu (host of the NANOG mailing list). Here is the output provided to the Internet Storm Center:
BGP routing table entry for 198.108.0.0/14
Bestpath Modifiers: deterministic-med
Paths: (2 available, best #1)
Not advertised to any peer
9318 9318 11164 237, (aggregated by 237 lo0x0.2.nl-chi3.mich.net)
AS-path translation: { APNIC-AS-3-BLOCK APNIC-AS-3-BLOCK WILLINET NSFNETTEST14 }
lo-22.hsa4.SanJose1 (metric 161) from lo-22.err1.SanJose1 (lo-22.err1.SanJose1)
Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate, best
Community: North_America Lclprf_100 Level3_Customer United_States San_Jose
Originator: hsa4.SanJose1
9318 9318 11164 237, (aggregated by 237 lo0x0.2.nl-chi3.mich.net)
AS-path translation: { APNIC-AS-3-BLOCK APNIC-AS-3-BLOCK WILLINET NSFNETTEST14 }
lo-22.hsa4.SanJose1 (metric 161) from lo-22.err2.SanJose1 (lo-22.err2.SanJose1)
Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate
Community: North_America Lclprf_100 Level3_Customer United_States San_Jose
Originator: hsa4.SanJose1
Here is what Level3's looking glass service returns now for the same query, www.merit.edu via San Jose:
BGP routing table entry for 198.108.0.0/14
Bestpath Modifiers: deterministic-med
Paths: (2 available, best #2)
Not advertised to any peer
7911 237 237 237 237
AS-path translation: { WCG NSFNETTEST14 NSFNETTEST14 NSFNETTEST14 NSFNETTEST14 }
lo-22.car4.SanJose1 (metric 141) from lo-22.err2.SanJose1 (lo-22.err2.SanJose1)
Origin IGP, metric 0, localpref 100, valid, internal
Community: North_America Lclprf_100 Level3_Customer United_States San_Jose 7911:777 7911:7705
Originator: car4.SanJose1
7911 237 237 237 237
AS-path translation: { WCG NSFNETTEST14 NSFNETTEST14 NSFNETTEST14 NSFNETTEST14 }
lo-22.car4.SanJose1 (metric 141) from lo-22.err1.SanJose1 (lo-22.err1.SanJose1)
Origin IGP, metric 0, localpref 100, valid, internal, best
Community: North_America Lclprf_100 Level3_Customer United_States San_Jose 7911:777 7911:7705
Originator: car4.SanJose1
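The difference between the two outputs is easiest to see in the AS paths. The following Python sketch, offered only as an illustration, pulls the AS-path lines out of each output and summarizes the neighbor AS (first in the path), the origin AS (last in the path), and the path length. The excerpts are trimmed from the outputs above; the function names and the regular expression are assumptions tailored to this particular looking-glass layout, not a general BGP parser.

```python
import re

# Lines excerpted from the two looking-glass outputs above.
DURING = """\
BGP routing table entry for 198.108.0.0/14
9318 9318 11164 237, (aggregated by 237 lo0x0.2.nl-chi3.mich.net)
Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate, best
9318 9318 11164 237, (aggregated by 237 lo0x0.2.nl-chi3.mich.net)
Origin IGP, metric 0, localpref 100, valid, internal, atomic-aggregate
"""

AFTER = """\
BGP routing table entry for 198.108.0.0/14
7911 237 237 237 237
Origin IGP, metric 0, localpref 100, valid, internal
7911 237 237 237 237
Origin IGP, metric 0, localpref 100, valid, internal, best
"""

# An AS-path line is a run of AS numbers, optionally followed by a comma
# and aggregator details. This pattern is an assumption about the layout.
AS_PATH_LINE = re.compile(r"^\s*((?:\d+\s+)*\d+)(?:,.*)?$")


def as_paths(text):
    """Return every AS path in the output as a tuple of AS numbers."""
    paths = []
    for line in text.splitlines():
        match = AS_PATH_LINE.match(line)
        if match:
            paths.append(tuple(int(asn) for asn in match.group(1).split()))
    return paths


def summarize(text):
    """Summarize the distinct AS paths found in one looking-glass output."""
    return [
        {"neighbor_as": path[0], "origin_as": path[-1], "length": len(path)}
        for path in sorted(set(as_paths(text)))
    ]


for label, output in (("during", DURING), ("after", AFTER)):
    print(label, summarize(output))
```

Run on these excerpts, the summary shows neighbor AS 9318 during the incident and 7911 afterwards, with origin AS 237 in both cases. In other words, the destination network did not change; only the neighbor through which Level3's San Jose routers reached it did.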
Over at Netcraft, the brief outage shows up as a red area on the bottom-right side of the status graphic.
So, bottom line: it wasn't Yahoo having the problems. It was a BGP routing issue that affected the reachability of many sites whose routes were advertised through Level3. Unfortunately, this is one of the Internet's "dirty little secrets": BGP updates are the lifeblood of the Internet, yet there are many ways these route advertisements can fail. There have been many suggestions for improvement (see the soBGP and S-BGP projects), and even the US Department of Homeland Security has tried to get some traction in making improvements to the routing infrastructure. But the Internet remains vulnerable to these types of configuration errors and to intentional false routing advertisements.
Marcus H. Sachs
Director, SANS Internet Storm Center