Developer Notes

Beware Cloudflare

May 10, 2021

Cloudflare is a DNS provider that offers a range of add-on services to help protect and speed up websites. We use it extensively because even the free tier offers a number of features that are very useful to us.

However, as we discovered recently, Cloudflare can also cause problems - even for websites that aren't using it.

Losing Our Place in Google's Search Index

We recently launched a small website on a newly purchased domain. We submitted a sitemap to Google, but since we weren't particularly interested in attracting traffic at that time, we didn't make any efforts to promote the site.

After a week or so, we checked the site in Google Search Console. Everything was as expected - our sitemap had been processed, and the pages were indexed. Over the next few weeks, however, we noticed that pages were being removed from the search index until there was just a single page left.

A closer look at the Search Console coverage report showed that our pages had been excluded as "Duplicate without user-selected canonical". When we examined the excluded URLs, we found that Google had selected similar URLs on some decidedly X-rated domains as canonical instead of ours. Searching for our content on Google returned hits on the bad domains, not on our own.

So what happened?

Our Content on Unknown Domains

We initially thought that our content had been scraped, but when we looked at the pages that Google was linking to, they turned out to be identical to ours. Our website was, for some reason, available on a number of domains unknown to us, with names containing a lot of "xx"s and words like "badgirl" and "porn". That didn't make much sense from a "stealing content" perspective. We would have expected to see ads, or perhaps malware and links to other X-rated websites, but there was nothing of that kind.

We then tried to make small changes to our content to see what would happen. Our changes appeared immediately on both our own site and on the bad domains, so that pretty much ruled out the scraping theory. Then we looked in our server logs, and lo and behold - the requests for the bad domains were in our access logs. For some reason our server was responding to requests made on the bad domains, and the responses were being rendered in the browser, without any errors.
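A log check of this kind can be scripted. The following is a minimal sketch, assuming each access log line starts with the requested host name (for example from a custom nginx log format beginning with `$host`; the default "combined" format does not record the host). The function name, the sample host names and the log layout are all hypothetical.

```python
from collections import Counter


def count_unknown_hosts(lines, known_hosts):
    """Count requests per host name that is not in known_hosts.

    Assumption: the first whitespace-separated field of every log line
    is the requested host name.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0].lower()
        if host not in known_hosts:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; real logs would be read from a file.
    sample = [
        'example.com 203.0.113.5 "GET / HTTP/1.1" 200',
        'unknown-domain.test 198.51.100.7 "GET / HTTP/1.1" 200',
        'unknown-domain.test 198.51.100.8 "GET /about HTTP/1.1" 200',
    ]
    unknown = count_unknown_hosts(sample, known_hosts={"example.com"})
    for host, hits in unknown.most_common():
        print(host, hits)
```

Sorting the result by hit count gives a quick list of the foreign domains that are sending the most traffic to the server.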

The next step was to take a look at the IP addresses the bad domains were pointing to. They turned out to belong to Cloudflare. That meant that the bad domains were using Cloudflare's DNS in proxy mode.
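Whether an address belongs to Cloudflare can be checked against the IP ranges that Cloudflare publishes. The sketch below is illustrative only: the ranges are a partial snapshot that may be out of date, and the function name is hypothetical. Refresh the list from Cloudflare before relying on the result.

```python
import ipaddress

# Partial snapshot of Cloudflare's published IPv4 ranges (an assumption:
# the list changes over time and is not complete here).
CLOUDFLARE_IPV4_RANGES = [
    "173.245.48.0/20",
    "141.101.64.0/18",
    "108.162.192.0/18",
    "162.158.0.0/15",
    "104.16.0.0/13",
    "104.24.0.0/14",
    "172.64.0.0/13",
]

NETWORKS = [ipaddress.ip_network(cidr) for cidr in CLOUDFLARE_IPV4_RANGES]


def is_cloudflare_ip(address):
    """Return True if address falls inside one of the ranges listed above."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in NETWORKS)
```

Running `is_cloudflare_ip` on each address that a suspicious domain resolves to shows whether that domain is being served through Cloudflare's proxy rather than directly from an origin server.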

Proxying Through Cloudflare

Cloudflare's DNS can be used in two different modes. "DNS only" works just like any other DNS service, i.e. looking up a domain name will get you the IP address of the server hosting that domain. In the "Proxied" mode, however, Cloudflare will respond with its own IP address. It will accept each request on behalf of the server and do some processing before passing it on. (Examples of this processing are filtering a request through Cloudflare's firewall service, blocking repeated login requests, and responding to requests for assets from its CDN.)

When a browser connects to a website via Cloudflare's proxy, there are two network connections being made:

  1. The browser connects to Cloudflare
  2. Cloudflare connects to the web server (the "origin" server)

This is how the bad domains were configured. Someone had been running a number of X-rated websites on a web server, and had pointed several different domains to that server's IP address using Cloudflare in proxy mode. At some point that server had been decommissioned, returning its IP address to the provider's pool, but the Cloudflare config had been left in place, still pointing to that IP address. Then we came along and provisioned a new server, and we just happened to get that particular IP address. So now our server was getting a large number of requests for content on the previous tenant's websites.

But why did our web server process requests for domains that we hadn't configured, and why did we not get an SSL certificate error in the browser when visiting the bad domain?

nginx Request Processing and Cloudflare SSL Modes

Our server is running nginx. Our site is the only website on that server, so nginx is configured with a single server block that only accepts HTTPS. Plain HTTP is not supported.

When nginx receives a request, it will first try to find a matching server block in its configuration, e.g. a matching domain name. If the lookup fails, it will forward the request to the default server block. If there is no default server block configured, it will just forward the request to the first server block in its list.
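The selection order described above can be modelled in a few lines. This is a simplified sketch, not nginx's actual implementation: it only handles exact server names (wildcard and regex names are ignored) and considers the blocks for a single listen address and port. The `ServerBlock` class and `select_server` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerBlock:
    names: List[str]        # values from the server_name directive
    default: bool = False   # True if the listen directive has default_server


def select_server(host, blocks):
    """Pick the block that would handle a request for host.

    blocks must be non-empty and in configuration-file order.
    """
    # 1. A block whose server_name matches the requested host.
    for block in blocks:
        if host.lower() in block.names:
            return block
    # 2. Otherwise the block explicitly marked as default.
    for block in blocks:
        if block.default:
            return block
    # 3. Otherwise the first block in the list.
    return blocks[0]
```

With only one block configured, `select_server("unknown-domain.test", [site])` returns `site`: every unknown host name ends up at the only website on the server.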

We hadn't configured a default server block - we didn't think we needed one - so nginx directed requests for unknown domains (including the bad domains) to our website. But the responses should have caused an error in the browser since we don't have the bad domains in our SSL certificate. Why didn't that happen?

The answer is that, when Cloudflare is being used in proxy mode, the HTTPS connection from the browser is terminated at the edge of Cloudflare's network. From the browser's perspective it is connected to Cloudflare, not to our server, and it is Cloudflare's certificate that is being used to encrypt the connection.

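What happens between Cloudflare and the origin server depends on how each individual domain has been configured. Cloudflare provides three different options:

  1. Flexible: Cloudflare connects to the origin server over plain HTTP.
  2. Full: Cloudflare connects over HTTPS but does not validate the origin server's certificate.
  3. Full (strict): Cloudflare connects over HTTPS and requires a valid certificate on the origin server.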

Evidently, the bad domains had been configured to use "full SSL", which meant that Cloudflare accepted the responses from our server, even though our certificate wasn't valid for those domains. It then forwarded the responses to the browser, this time using its own, valid certificate.
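The difference between accepting any certificate and validating it can be illustrated with Python's standard `ssl` module. This is only an analogy for how a client such as a proxy might connect to an origin server, not Cloudflare's actual implementation; the function name is hypothetical. The lenient context mirrors the behaviour we observed with "full SSL", while the validating context corresponds to Cloudflare's stricter "Full (strict)" mode.

```python
import ssl


def origin_tls_context(validate_certificate):
    """Build a client-side TLS context for connecting to an origin server."""
    context = ssl.create_default_context()
    if not validate_certificate:
        # Encrypt the connection but accept any certificate, including one
        # issued for a different domain or a self-signed one.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

A connection made with `origin_tls_context(False)` would succeed against our server for any host name, whereas `origin_tls_context(True)` would fail the handshake because our certificate does not cover the bad domains.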

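To summarize:

  1. The bad domains were still pointing, through Cloudflare's proxy, to an IP address that had since been reassigned to our server.
  2. With no default server block configured, nginx served our website in response to requests for those domains.
  3. Cloudflare, configured for "full SSL" on those domains, accepted our certificate without validating it and passed the responses on to the browser under its own valid certificate.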

From Google's perspective this meant our content was available on multiple URLs, and it seems to have selected those with the most incoming links (watching porn on the web is apparently a popular pastime). As a result, our low-traffic website got bumped from the index.

So What to Do About It?

It is worth pointing out that this could happen to anyone regardless of which DNS service is being used. If you provision a web server and happen to get an IP address that is the target of another domain proxied through Cloudflare using the "Full SSL" setup, your server will get the traffic, and its responses will be treated as valid.

We briefly considered contacting Cloudflare Support to get the DNS entries for the bad domains taken down, but this would have taken some time (we would have had to look through our logs to find all the different bad domains that were pointing to our IP). It might also be something that Cloudflare wouldn't want to deal with, as it would require some way of verifying that we are the rightful users of that particular IP address.

So we opted for the simplest approach and re-provisioned the server with a new IP address. That did the job (except we could now have a similar problem with different domains that we just haven't discovered yet).

If you are using nginx, the best solution is probably to create a default nginx server block to handle this scenario. We have been experimenting with something like this:

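server {
    # Catch-all for any host name not matched by another server block.
    listen 443 ssl default_server;
    server_name _;
    # Placeholder certificate; it does not need to be valid for any domain.
    ssl_certificate      /etc/nginx/ssl/dummy.cert;
    ssl_certificate_key  /etc/nginx/ssl/dummy.key;
    return 404;
}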

The certificate and key used in this server block don't have to be valid. They can be from another website or self-signed. The files just have to exist at the configured paths, or nginx won't start.

Anyone connecting directly to the server using a domain we don't know about will get a certificate error. If they connect through Cloudflare on a domain configured to use "full SSL", they will get a 404 error. Google will ignore the URL in either case, which is what we want.
