common crawl

From IndieWeb


Common Crawl is an open repository of web crawl data that is widely used for web analysis and for training generative text (AI) models.

How to Remove Content from Common Crawl

You can request that content be removed from the Common Crawl dataset by emailing the Common Crawl team directly.

IndieWeb Examples

  • capjamesg requested removal of his personal website from Common Crawl.

See Also