pup

 pup  is a command line tool for parsing and transforming HTML using some CSS selectors.

pup supports returning the HTML elements themselves, or a JSON representation of the element, so that other tools such as jq can then further extract information without falling back to regex or other FSM.

It seems does not follow in-browser JavaScript DOM conventions, such as innerText or innerHTML properties. Some users may find that surprising.

Selecting using hierarchy of tags only is documented (see below) to have caused some users some issues.