CloudFlare

From IndieWeb

CloudFlare is a service that promises, in its own words, to "supercharge your website". Essentially, it provides a distributed DNS service and a CDN that acts as a caching reverse proxy for traffic between the user and the website. A free basic plan is available, along with paid plans at $20 (pro) or $200 (business) a month.

HTTPS is included in every plan: CloudFlare provides a "Universal SSL" feature that automatically creates and manages an SSL certificate for each customer's domains. This is available in the free plan as well, making CloudFlare effectively a free certificate authority!

Requirements and setup

  • To use CloudFlare, you need a domain and a registrar that allows you to change the DNS servers (NOT just DNS records).
  • You must provide either an IP address (A record) or a hostname (CNAME record) for your website. CloudFlare also allows using a CNAME for the root domain!
  • For every A or CNAME record (e.g. different subdomains) you can choose whether or not to use the CloudFlare proxy/CDN.
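
Since proxying is chosen per record, it can be useful to check whether a given hostname is actually being served through the proxy. The following is a minimal sketch, not an official method: it assumes that proxied responses carry a "Server: cloudflare" header and/or a "CF-RAY" header, and the function name is made up for this example.

```python
from typing import Mapping


def served_via_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic: True if response headers look like they came through the CloudFlare proxy.

    Assumption: proxied responses carry "Server: cloudflare" and/or a "CF-RAY" header.
    """
    # Header names are case-insensitive, so normalize them before looking anything up.
    normalized = {name.lower(): value for name, value in headers.items()}
    server = normalized.get("server", "").strip().lower()
    return server == "cloudflare" or "cf-ray" in normalized
```

With the requests module this could be called as served_via_cloudflare(requests.head(url).headers), where url is one of your own hostnames.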

SSL setups

  • Flexible SSL only encrypts the connection between the user and the CloudFlare proxy edge location; the traffic to the origin server is not encrypted! For static sites with mostly public content this should be fine, since an attack close to the end user (e.g. ISP, untrusted WiFi hotspot) is more likely than an attack on the core infrastructure (IMHO).
  • Full SSL encrypts the connection between the user and the CloudFlare proxy edge location as well as the connection between the edge location and the origin server. The origin server can use a self-signed certificate. It's not end-to-end encryption though: CloudFlare decrypts the traffic at the edge, so you need to trust them with it.
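
One practical consequence of Flexible SSL is that the origin server sees plain HTTP even when the visitor used HTTPS, so a naive "redirect everything to HTTPS" rule on the origin can loop. Below is a rough sketch of how an origin might recover the visitor's scheme. It assumes CloudFlare forwards it in the "X-Forwarded-Proto" header and in a JSON "CF-Visitor" header; the function name is hypothetical, and these headers should only be trusted on requests that really come from CloudFlare.

```python
import json
from typing import Mapping


def visitor_scheme(headers: Mapping[str, str], default: str = "http") -> str:
    """Best-effort guess of the scheme the visitor used to reach the proxy."""
    normalized = {name.lower(): value for name, value in headers.items()}

    # Assumption: the proxy sets X-Forwarded-Proto to "http" or "https".
    forwarded = normalized.get("x-forwarded-proto", "").strip().lower()
    if forwarded in ("http", "https"):
        return forwarded

    # Assumption: CF-Visitor is a small JSON object such as {"scheme":"https"}.
    try:
        scheme = json.loads(normalized.get("cf-visitor", "")).get("scheme")
    except (ValueError, AttributeError):
        scheme = None
    return scheme if scheme in ("http", "https") else default
```

An origin would then redirect to HTTPS only when visitor_scheme(...) returns "http", rather than looking at the scheme of the connection it received.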

Possible issues and disadvantages

  • CloudFlare may block access to fight potential attackers, change the HTML source of your pages (e.g. add some custom code), minify or combine JS/CSS etc., either through services you request ("CloudFlare apps") or on their own because they decide it's required for better performance or security. Generally the less you pay the less control you have over how they "supercharge" your website. There's no visible CloudFlare branding on the site, but it's visible in the source and on the custom error pages they may show on your site.
    • In particular, CloudFlare shoves annoying CAPTCHAs right in the face of Tor users by default. You can whitelist Tor traffic.
  • This mechanism also blocks VPNs and other unusual/potentially malicious source IPs, which can interfere with wanted crawlers as well (e.g. fetching feeds, verifying Webmentions).
    • Webmention services like Brid.gy and Webmention.app get a 503 error when "Bot Fight Mode" (free) and/or "Super Bot Fight Mode" (pro) are turned on. It is not possible to use Firewall rules to bypass these features (this is frequently requested in the Cloudflare forums). The only workaround is to turn them off completely.
  • Universal SSL works with SNI only.
  • If you use your domain for email, you may no longer be able to use email services provided by your registrar. CloudFlare does not provide any email services, so you have to set up MX records pointing to your own or a hosted mail service.
  • A lot of global traffic passes through CloudFlare, which means the Internet is becoming more centralized.
  • According to the CloudFlare Terms of Service, "copies of [abuse complaints] may be provided to the CloudFlare user, the user’s hosting provider, posted on CloudFlare’s website, and/or provided to third party services such as ChillingEffects.org". This has already resulted in people being targeted for reporting abuse.

Avoiding CAPTCHA pages with Python

Cloudflare seems to look for "real" browser request headers and for the ability to store session cookies. If either is missing, it shows an HTTP 403 page with a CAPTCHA challenge.

To avoid this, use the Session object from the requests Python module. After a dummy request, the Session object stores the session cookie and sends it with each subsequent request, thus behaving "properly" in the eyes of CloudFlare.

Example code:

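   import requests

   # browser-like request headers, which Cloudflare seems to look for
   headers = {
       "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0",
       "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
       "Accept-Language": "en-US,en;q=0.5",
       # "br" is left out: requests only decodes Brotli if a brotli package is installed
       "Accept-Encoding": "gzip, deflate",
       "DNT": "1",
       "Connection": "keep-alive",
       "Upgrade-Insecure-Requests": "1",
       "Pragma": "no-cache",
       "Cache-Control": "no-cache",
   }

   session = requests.Session()
   # set the headers once so every request made through this session sends them
   session.headers.update(headers)

   # make the dummy request, so the session stores the cookies it receives
   session.get("https://www.artstation.com/", timeout=10)

   # make the real request, which now should work:
   url = "https://www.artstation.com/users/{}/likes.json?page={}".format("username", 1)
   res = session.get(url, timeout=10)
   # raise on an error status (e.g. a 403 challenge page) instead of failing in res.json()
   res.raise_for_status()
   print(res.json())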
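
The session trick is not guaranteed to work, so a script may want to notice when it still got a challenge or block page. The helper below is only a heuristic sketch: it assumes such pages arrive with status 403 or 503 (the codes mentioned on this page) and a "Server: cloudflare" header, and the function name is made up for this example. A genuine 403 from an origin behind the proxy would match as well.

```python
from typing import Mapping


def looks_like_cloudflare_block(status_code: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: True if a response looks like a CloudFlare challenge or block page."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    from_cloudflare = normalized.get("server", "").strip().lower() == "cloudflare"
    # Assumption: challenge and block pages use status 403 or 503.
    return from_cloudflare and status_code in (403, 503)
```

With the example above, this could be called as looks_like_cloudflare_block(res.status_code, res.headers) before trying to parse the body as JSON.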

Downtime

2022-10-25

See Also