Over the weekend, Cloudflare released a tool called isBGPSafeYet.com to "track deployments and filtering of invalid routes by the major networks". The release has drawn a lot of attention, both positive and critical, and more than one might expect for such a technical topic.
If you haven't seen or heard of the website, this is what it looks like as of April 20, 2020:
Looks kind of alarming, doesn't it? In case you're not familiar with some of the terminology being thrown around, let's review these concepts. If you're already familiar with BGP & RPKI technologies, feel free to skip ahead.
BGP is the dynamic routing protocol[1] that quite literally runs the internet. ISPs, cloud providers, and larger enterprises use BGP to advertise their networks to one another, which is how anything on the internet knows how to get to anything else. Take the example below[2]:
In this traceroute, the highlighted hop 7 shows where BGP is happening: AS22773, or Cox Communications (my home's ISP), directly interconnects with AS15169, or Google. My home network can reach Google through this interconnection, which is commonly referred to as a peering arrangement.
As you can probably imagine, there's a lot more to BGP than this, very little of which is in-scope for this article. There are lots of great resources out there to learn more about BGP and internet peering — I'll link to some of my favorites in the Resources section.
RPKI, or Resource Public Key Infrastructure, is a security mechanism introduced via RFC 6480 in February 2012.
To understand RPKI, one must first understand the problem it's trying to solve: BGP Hijacking. Cloudflare's controversial isBGPSafeYet.com website actually does a fantastic job of illustrating what BGP Hijacking is:
In a nutshell, an attacker can advertise networks they don't own to ISPs and cause traffic to web resources within those networks to route through the attacker's network. This can lead to the web resources appearing completely down, or worse, the traffic can be intercepted and sent on to its original destination with the end-user being none the wiser.
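To make the mechanics concrete, here is a minimal toy model in Python, not a real BGP implementation. It assumes a provider that applies no filtering and picks the best route purely by AS-path length (real BGP best-path selection has many more steps). The `Announcement` class, the `best_route` function, the documentation prefix `203.0.113.0/24`, and the documentation ASNs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    prefix: str      # the advertised network, e.g. "203.0.113.0/24"
    as_path: tuple   # ASNs from nearest neighbor to origin (origin is last)

    @property
    def origin(self) -> int:
        return self.as_path[-1]


def best_route(announcements):
    """Toy best-path selection: the shortest AS path wins.

    Assumption: no filtering at all, and AS-path length is the only criterion.
    """
    return min(announcements, key=lambda a: len(a.as_path))


# The legitimate owner (AS64496) is three AS hops away from this provider.
legit = Announcement("203.0.113.0/24", (64500, 64501, 64496))
# An "attacker" (AS64511) directly connected to the provider claims the same prefix.
hijack = Announcement("203.0.113.0/24", (64511,))

chosen = best_route([legit, hijack])
print(chosen.origin)  # 64511: traffic for the prefix now flows to the attacker
```

Nothing in this toy selection logic asks whether AS64511 is entitled to originate the prefix, which is the gap described above.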
Does this really happen?
Yes, this really happens! Here are just a few examples:
This Cloudflare blog article also recaps some of the more infamous hijacks:
- 1997 - AS7007 mistakenly (re)announces 72,000+ routes (becomes the poster-child for route filtering).
- 2008 - ISP in Pakistan accidentally announces IP routes for YouTube by blackholing the video service internally to their network.
- 2017 - Russian ISP leaks 36 prefixes for payments services owned by Mastercard, Visa, and major banks.
- 2018 - BGP hijack of Amazon DNS to steal cryptocurrency.
By default, BGP allows this to happen; it's up to the network operators and providers to prevent it.
RPKI is a certificate system which may be used by an organization to cryptographically sign which networks they own, and from where they should originate. These signed records are called ROAs (Route Origin Authorizations).
There are two primary requirements to ensure RPKI is effective:
1. Sign ROAs
A BGP-operating organization must sign and issue ROAs for their networks, which are validated and maintained by Regional Internet Registries. While this is a critical first step, it doesn't solve any problems by itself.
2. Reject Invalid Routes
For this organization's ROAs to do anyone any good, internet providers upstream from the organization have to reject any route advertisements for networks covered by the organization's ROAs but originated by someone else.
For example, if Google says 220.127.116.11/18 should only originate from Google (AS15169) via their ROAs, and 18.104.22.168/18 is suddenly advertised from an attacker (whether maliciously or by mistake), internet providers should reject the route so that the traffic is never sent to the attacker:
You might be wondering why a provider would accept a route to a network they don't own in the first place. This is an important question! Network providers should never accept route advertisements from customers or peers unless the organization is authorized to advertise that prefix. Unfortunately, it's impossible to enforce such a statement on a global scale.
RPKI's goal is to provide a mechanism through which participating networks can prevent these hijacks from happening, or at least reduce their impact.
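The two requirements above (signing ROAs, then rejecting invalid routes) can be sketched as a small origin validation check. This is a simplified illustration, loosely following the valid / invalid / not-found states used in route origin validation. It is not how any particular validator or router implements it. The `ROA` class, the `validate` and `accept` functions, the documentation prefixes, and the documentation ASNs are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    prefix: ipaddress.IPv4Network  # network the holder has signed (IPv4 only in this sketch)
    max_length: int                # longest prefix length allowed to be announced
    asn: int                       # the only AS authorized to originate it


def validate(route_prefix: str, origin_asn: int, roas) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        # A ROA "covers" the route if the route falls inside the ROA's prefix.
        if route.version == roa.prefix.version and route.subnet_of(roa.prefix):
            covered = True
            if route.prefixlen <= roa.max_length and origin_asn == roa.asn:
                return "valid"
    # Covered by at least one ROA but matching none: invalid. No ROA at all: not-found.
    return "invalid" if covered else "not-found"


def accept(route_prefix: str, origin_asn: int, roas) -> bool:
    """Requirement 2: reject only invalid routes; unsigned (not-found) routes still pass."""
    return validate(route_prefix, origin_asn, roas) != "invalid"


# Requirement 1: the owner (AS64496) has signed a ROA for its network.
roas = [ROA(ipaddress.ip_network("203.0.113.0/24"), 24, 64496)]

print(validate("203.0.113.0/24", 64496, roas))   # valid
print(validate("203.0.113.0/24", 64511, roas))   # invalid (wrong origin)
print(validate("198.51.100.0/24", 64511, roas))  # not-found (no ROA exists)
```

Note that a network without any ROA comes back as not-found rather than invalid, which is why signing alone (requirement 1) and filtering alone (requirement 2) each accomplish little without the other.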
If you've read the above background information, or are already familiar with these technologies, you can see that RPKI adoption is an important step to improving the security of the internet. The discussion has been ongoing in the community for nearly a decade. Many content providers (like Cloudflare) have championed the implementation of RPKI, and some major internet carriers have recently taken a major step by rejecting RPKI-invalid route advertisements from ever entering their networks (Telia, NTT).
To date, Cloudflare has been doing fantastic work related to routing security in the internet operator community; they're probably the largest vocal advocate of RPKI adoption in the world. However, the release of isBGPSafeYet.com has been very controversial in the community, and with good reason.
Public shaming is rarely a successful or healthy catalyst for change, and Cloudflare's new tool enables potentially under-educated end-users to quickly (2 clicks) call out their service providers for not "implement[ing] BGP safely", which isn't necessarily accurate.
A couple of highly-respected network engineers (@Benjojo12, @JobSnijders) tested Cloudflare's own claim to "implement BGP safely", and found some interesting results. They intentionally advertised a network onto the internet (22.214.171.124/24) from an invalid ASN, which should be considered an RPKI-invalid network. Cloudflare's own RPKI Validation Tool confirms this:
If Cloudflare truly rejects all invalid RPKI networks as they claim to, traffic originating from this network (from the "attacking" source) should be unreachable for Cloudflare. In other words, you shouldn't be able to reach Cloudflare from that network. And yet:
This screenshot shows that traffic originating from the "attacker" can reach Cloudflare's network, and that Cloudflare's network has reachability back to the "attacker" network. Another engineer set up a site using Cloudflare Workers that also shows Cloudflare able to reach their own invalidity test site.
To sum up our take on isBGPSafeYet.com, and Cloudflare:
We love Cloudflare. This site is served over Cloudflare's CDN, we exclusively use their DNS services, and we recommend them to just about all our customers.
However, we believe isBGPSafeYet.com is a well-intentioned, poorly-implemented scare tactic that will only serve to raise RPKI awareness in a negative light, at a very sensitive time in the global internet community due to the COVID-19 pandemic. And, if Cloudflare is going to be the RPKI standard-bearer for everyone to follow, shouldn't they implement RPKI correctly themselves?
Cloudflare's method of testing a network's "safe" implementation of BGP is reachability to their network 126.96.36.199/24, which is intentionally advertised with an invalid ROA. However, reachability to this network is not, on its own, a reliable indicator of an unsafe BGP implementation. Not all network providers accept a full routing table from their upstream carriers. Indeed, many providers only accept routes local to a particular region, with a default route to cover the rest. In these cases, the provider has no control over its reachability to an RPKI-Invalid network; that depends on whether its upstream carriers reject the route. This further illustrates that the approach taken by Cloudflare's isBGPSafeYet.com tool is not the right way to foster the adoption of RPKI.
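The default-route case can be illustrated with a small longest-prefix-match lookup. This is a sketch under assumptions, not a model of any real router: the `lookup` function, the next-hop labels, and the documentation prefixes (with `203.0.113.0/24` standing in for an RPKI-invalid test network) are hypothetical.

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: return the next hop of the most specific matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# A provider that carries only regional routes plus a default route.
# It never accepted the invalid test prefix (assumed here to be 203.0.113.0/24).
table = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-carrier",
    ipaddress.ip_network("198.51.100.0/24"): "regional-peer",
}

print(lookup(table, "198.51.100.7"))  # regional-peer
print(lookup(table, "203.0.113.10"))  # upstream-carrier (matched only by the default route)
```

Even though this provider holds no route for the invalid prefix, packets toward it still leave via the default route, so a reachability test reflects the upstream carrier's filtering rather than the provider's own.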
What is Stellar doing about RPKI?
At Stellar, we place extremely high importance on routing security. You can read more about our routing policies here. In summary:
Customer & Peer Routing
We ensure invalid routes never enter our network from customers and peers by only accepting routes that are either a) listed in an Internet Routing Registry, b) manually validated by our engineering team, or both.
We also use BGP policies to ensure only routes we or our customers are authorized to advertise ever leave our network.
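As a rough illustration of the inbound policy described above, the check below accepts a customer or peer route only if it appears in a registry-derived list or a manually validated list. It is a simplified sketch and not Stellar's actual tooling: the `accept_inbound` function, both sets, and the documentation prefixes and ASNs are hypothetical, and real filters are typically generated from registry data and applied as router policy.

```python
import ipaddress


def accept_inbound(prefix, origin_asn, irr_routes, manually_validated):
    """Accept a route only if it is registered or has been validated by hand."""
    route = (ipaddress.ip_network(prefix), origin_asn)
    return route in irr_routes or route in manually_validated


# Hypothetical data: (prefix, origin ASN) pairs.
irr_routes = {(ipaddress.ip_network("198.51.100.0/24"), 64500)}
manually_validated = {(ipaddress.ip_network("192.0.2.0/24"), 64501)}

print(accept_inbound("198.51.100.0/24", 64500, irr_routes, manually_validated))  # True
print(accept_inbound("192.0.2.0/24", 64501, irr_routes, manually_validated))     # True
print(accept_inbound("203.0.113.0/24", 64511, irr_routes, manually_validated))   # False
```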
Route Origin Authorization
As a cloud provider, our primary objective in the RPKI ecosystem is to prevent the hijacking of our own networks by a third party. This is accomplished by signing and issuing ROAs, which tell upstream RPKI participants which origin ASNs are valid for the networks they cover. At Stellar, we issue ROAs for our address space:
Rejecting RPKI-Invalid Routes
The above is the BGP routing table entry for Cloudflare's intentionally invalid RPKI test network. The fact that it exists in our routing table means that we do not yet reject RPKI-Invalid routes from our upstream providers.
While it's been on our roadmap for some time, we've placed a higher priority on other objectives, such as our customer & peer routing policies, global network redundancy, and making our network faster and more reliable. Additionally, there are very few options available for the server implementation of RPKI - all of them open source. While we fully support and contribute to the open source community, the use of open source software in production networks requires a great deal of testing, validation, and architecture effort to ensure what we implement meets our Service Level Agreement across all our data centers.
We do plan to implement an RPKI solution to reject invalid routes, but it is still in progress at this time.
BGP & Internet Peering
- DrPeering: Internet Peering Playbook
- Cloudflare: BGP Routing Explained
- CBTNuggets: What is Border Gateway Protocol?
- Dynamic routing protocols are used between routers in complex and/or highly available networks to automatically route traffic over multiple paths, react to failures, and communicate which networks can be reached via which paths.
- I've modified the output of this traceroute for readability. I obfuscated some of the intermediate AS22773 hops, removed all the per-hop latency statistics, and changed all the RFC1918/RFC6598 AS numbers to AS22773 since that's where they truly originate (by default, traceroute shows these as