To get the innerHTML of the first <pre> element in an HTML document on Linux, you can use command-line tools like curl (to fetch the HTML) and xmllint or pup (to parse and extract the content).
### 1. curl + pup (Recommended)

If you have pup installed (a command-line HTML parser), you can run:
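```bash
curl -s <URL> | pup --pre 'pre:first-of-type'
```

- `curl -s <URL>` fetches the HTML silently.
- `pup --pre 'pre:first-of-type'` selects each `<pre>` that is the first `<pre>` among its siblings and prints it as HTML. `--pre` preserves the whitespace inside preformatted elements.

Piping `pup 'pre'` into `head -n 1` does not give you the first element. pup pretty-prints each match over several lines, so `head -n 1` keeps only the first line of output. Also note that `:first-of-type` can match more than one `<pre>` when they sit in different parent elements.

pup always prints the selected element's outer HTML (including the `<pre>` tags). It has no innerHTML display function, and `'pre html'` is read as a descendant selector for `<html>` elements inside `<pre>`, so it matches nothing. If the text content is enough, use the `text{}` display function:

```bash
curl -s <URL> | pup 'pre:first-of-type text{}'
```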
### 2. curl + xmllint (XPath)

If you prefer xmllint (part of libxml2), you can use XPath:
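```bash
curl -s <URL> | xmllint --html --xpath '(//pre)[1]' - 2>/dev/null
```

- `--html` parses the input with libxml2's HTML parser, which tolerates real-world markup.
- `--xpath '(//pre)[1]'` selects the first `<pre>` element in the document. The parentheses matter: `//pre[1]` means "every `<pre>` that is the first `<pre>` child of its parent" and can return several elements.
- `-` reads the document from standard input, and `2>/dev/null` hides the parser's warnings about non-conforming HTML.

To get only the innerHTML (the child nodes, without the enclosing `<pre>` tags), select the element's children with `node()`:

```bash
curl -s <URL> | xmllint --html --xpath '(//pre)[1]/node()' - 2>/dev/null
```

By contrast, `string((//pre)[1])` returns only the text content, with all markup stripped. Depending on the libxml2 version, xmllint may print a newline after each selected node, so the `node()` output can differ slightly in whitespace from the original markup.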
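To reuse the xmllint approach on either a URL or a saved file, the pipeline can be wrapped in a small shell function. This is a sketch that assumes curl and an xmllint build with `--xpath` support are installed; the name `first_pre_inner_html` and the simple URL-versus-file check are illustrative choices, not part of either tool.

```bash
# Sketch: print the innerHTML of the first <pre> in a URL or a local HTML file.
# Assumes curl and libxml2's xmllint (with --xpath support) are installed.
# first_pre_inner_html is a hypothetical helper name.
first_pre_inner_html() {
  case $1 in
    # Remote page: -f makes curl fail on HTTP errors instead of passing the error page on.
    http://*|https://*) curl -fsS "$1" ;;
    # Anything else is treated as a local file path.
    *) cat -- "$1" ;;
  esac | xmllint --html --xpath '(//pre)[1]/node()' - 2>/dev/null
}
```

Usage: `first_pre_inner_html page.html` or `first_pre_inner_html https://example.com/`. Using `(//pre)[1]` rather than `//pre[1]` keeps the result to a single element even when several `<pre>` blocks live in different parent elements.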
### 3. grep (Quick & Dirty)

If the HTML is simple and well formatted, you can use grep:
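```bash
curl -s <URL> | grep -oP '<pre>\K[^<]*' | head -n 1
```

- `-o` prints only the matched part, and `-P` enables Perl-compatible regular expressions (GNU grep only).
- `\K` drops everything matched so far (the `<pre>` tag) from the output.
- `[^<]*` captures everything up to the next `<`.

⚠️ Warning: grep works line by line, so this only finds `<pre>` tags without attributes, returns only the part of the content on the same line as the tag, and stops at the first nested tag (such as `<code>` or `<span>`).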
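If GNU grep and GNU coreutils are available, a somewhat sturdier sketch of the same regex idea reads the whole document as a single record with `-z`, so the match can span lines and include nested tags. It is still not a real parser: it assumes the first `</pre>` closes the first `<pre>`, so nested `<pre>` elements or a `</pre>` inside a comment or script will break it. The function name `first_pre_grep` is hypothetical.

```bash
# Sketch: innerHTML of the first <pre> using GNU grep; still regex-based, not a parser.
# Assumes GNU grep (for -P and -z) and GNU head (for -z).
# first_pre_grep is a hypothetical helper name.
first_pre_grep() {
  # grep -z reads the whole input as one NUL-terminated record, so a match can span lines.
  # (?si) lets '.' match newlines and makes the tag names case-insensitive.
  # <pre(?:\s[^>]*)?> matches the opening tag with or without attributes, but not <preview>.
  # \K and the (?=</pre>) lookahead leave only the text between the tags in the output.
  # head -z -n 1 keeps the first NUL-terminated match; tr then removes its trailing NUL.
  grep -zoP '(?si)<pre(?:\s[^>]*)?>\K.*?(?=</pre>)' | head -z -n 1 | tr -d '\0'
}
```

Usage: `curl -s <URL> | first_pre_grep`. The lazy `.*?` stops at the first closing tag, and the output is raw markup, so entities such as `&lt;` stay encoded, as they would in innerHTML.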