BeautifulSoup

From IndieWeb


BeautifulSoup is an HTML parsing library for Python. BeautifulSoup4 delegates the actual parsing to one of several possible HTML/XML parsers (Python's built-in html.parser, lxml, or html5lib) and provides a convenient interface for navigating and searching the resulting tree, even when the HTML is very broken. Earlier versions used complex regular expressions, instead of a "real" parser, to do this.
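A minimal sketch of choosing a parser, assuming the bs4 package is installed. The helper name parse_fragment and the sample markup are made up for illustration. It uses html.parser because that ships with Python; "lxml" or "html5lib" can be passed instead if those packages are installed, and each may repair broken markup somewhat differently.

```python
from bs4 import BeautifulSoup


def parse_fragment(markup, parser="html.parser"):
    """Parse markup with the named parser and return the soup.

    parse_fragment is a hypothetical helper, not part of bs4.
    "html.parser" is Python's built-in parser; "lxml" and "html5lib"
    only work if those third-party packages are installed.
    """
    return BeautifulSoup(markup, parser)


# Deliberately broken HTML: neither <p> nor <b> is ever closed.
soup = parse_fragment("<p>Unclosed <b>bold text")

# The tree is still navigable despite the missing end tags.
print(soup.b.get_text())
print(soup.p.get_text())
```

Passing the parser name explicitly avoids depending on whichever parser happens to be installed on a given machine.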

In the context of an IndieWeb site, BeautifulSoup is used by microformats parsing libraries, and is also often used directly, for example to find <link rel> elements in an external document.
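Finding <link rel> elements might look roughly like the sketch below, assuming the bs4 package is installed. The function name find_link_rels, the sample document, and its URLs are all invented for illustration. In practice the HTML would come from fetching the external page, and relative href values would still need to be resolved against that page's URL.

```python
from bs4 import BeautifulSoup

# Sample document; the URLs are placeholders, not real endpoints.
SAMPLE_HTML = """
<html><head>
<link rel="webmention" href="https://example.com/webmention">
<link rel="stylesheet" href="/style.css">
<link rel="me authn" href="https://github.com/example">
</head><body><p>Hello</body></html>
"""


def find_link_rels(html):
    """Map each rel value to the list of href values that declare it.

    find_link_rels is a hypothetical helper, not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    rels = {}
    # rel=True and href=True keep only <link> tags that have both attributes.
    for link in soup.find_all("link", rel=True, href=True):
        # bs4 treats rel as multi-valued, so link["rel"] is a list of tokens.
        for rel in link["rel"]:
            rels.setdefault(rel, []).append(link["href"])
    return rels


rels = find_link_rels(SAMPLE_HTML)
print(rels.get("webmention"))
```

Because one <link> may carry several space-separated rel values, the sketch indexes by individual value rather than by the raw attribute string.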