On Linux, how do I get the innerHTML of the first HTML `pre` element without `pup`?
Jun 3, 2:01pm

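To extract the innerHTML of the first <pre> element from an HTML file or standard input on Linux without pup, you can combine standard command-line tools such as grep, sed, or awk. These approaches use regular expressions rather than an HTML parser. They suit simple, predictable markup, for example lowercase <pre> tags with no </pre> inside comments or attribute values. Three options follow: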

Method 1: Using grep

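This method extracts the content between the first <pre> and </pre> tags when both tags are on the same line:

bash
grep -oP '<pre[^>]*>\K.*?(?=</pre>)' file.html | head -n 1
  • -o prints only the matched part, and -P enables Perl-compatible regular expressions.
  • <pre[^>]*>\K matches the opening <pre> tag (with any attributes), and \K drops it from the printed match. A lookbehind such as (?<=<pre[^>]*>) cannot be used instead, because PCRE rejects lookbehinds of unbounded length and grep reports an error.
  • .*? lazily matches as few characters as possible. By default, . does not match a newline.
  • (?=</pre>) is a positive lookahead for the closing </pre> tag, so the tag itself is not printed.
  • head -n 1 keeps only the first match.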

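For multi-line content: If the <pre> content spans multiple lines, add -z so grep reads the whole input as a single NUL-terminated record. Also add (?s) so that . matches newlines. Because grep -z ends each match with a NUL byte instead of a newline, use head -z (GNU coreutils) to keep the first match and tr to strip the NUL:

bash
grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' file.html | head -z -n 1 | tr -d '\0'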
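The multi-line grep approach can be wrapped in a reusable function. This is a minimal sketch that assumes GNU grep built with PCRE support and GNU head. The name first_pre_grep is hypothetical, not an existing command. It reads a file argument or, with no argument, standard input.

```shell
# first_pre_grep is a hypothetical helper name, not an existing command.
# Assumes GNU grep built with PCRE support (-P) and GNU head (-z).
first_pre_grep() {
  # -z: treat the whole input as one NUL-terminated record so a match can span lines
  # -o: print only the matched text; -P: use Perl-compatible regular expressions
  # (?s): let . match newlines; \K: drop the opening tag from the printed match
  # (?=</pre>): stop before the closing tag without printing it
  grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' "$@" |
    head -z -n 1 | # keep only the first NUL-terminated match
    tr -d '\0'     # remove the NUL terminator that grep -z appends
}
```

Usage: first_pre_grep file.html, or pipe HTML into it with cat file.html | first_pre_grep.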

Method 2: Using sed

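This method works line by line and stops at the first closing tag, so it handles multi-line content without -z or PCRE:

bash
sed -n '/<pre[^>]*>/,/<\/pre>/{/<\/pre>/{s/<\/pre>.*//;s/.*<pre[^>]*>//;p;q;};s/.*<pre[^>]*>//;p;}' file.html
  • The range /<pre[^>]*>/,/<\/pre>/ selects the lines from the first opening tag through the next closing tag.
  • s/.*<pre[^>]*>// removes the opening tag and anything before it, so text on the same line as the tag is kept.
  • On the closing line, s/<\/pre>.*// removes the closing tag and anything after it. Then p prints the remainder and q quits, so later <pre> blocks are ignored.
  • Newlines inside the block are preserved, as they are in the innerHTML of a <pre> element.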
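The same sed logic is easier to follow with one command per line. This is a minimal sketch that assumes GNU sed, the default on most Linux systems. The function name first_pre_sed is hypothetical. It prints the lines from the first opening tag through the next </pre>, trims the tags, and quits.

```shell
# first_pre_sed is a hypothetical helper name, not an existing command.
# Assumes GNU sed and lowercase <pre> tags.
first_pre_sed() {
  sed -n '
    # Select lines from the first opening <pre ...> tag to the next </pre>
    /<pre[^>]*>/,/<\/pre>/{
      # On the closing line: cut the closing tag and everything after it,
      # cut the opening tag and everything before it (if present),
      # print what is left, then quit so later <pre> blocks are ignored
      /<\/pre>/{
        s/<\/pre>.*//
        s/.*<pre[^>]*>//
        p
        q
      }
      # Any other line in the range: cut a leading opening tag, then print
      s/.*<pre[^>]*>//
      p
    }
  ' "$@"
}
```

Cutting at </pre> before removing the opening tag matters when two blocks share a line. Otherwise the greedy .* would jump to the last <pre> on that line.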

Method 3: Using awk

For more control, use awk:

bash
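awk '!flag && match($0, /<pre[^>]*>/) {flag=1; $0=substr($0, RSTART+RLENGTH)} flag && /<\/pre>/ {sub(/<\/pre>.*/, ""); print; exit} flag' file.html
  • The first rule finds the first opening <pre> tag, sets flag, and drops the tag and anything before it.
  • On the closing line, the second rule drops </pre> and anything after it, prints the rest, and exits, so only the first block is returned.
  • The final flag pattern prints every line in between.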
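Here is the awk approach spread over several lines with comments. This is a minimal sketch using only POSIX awk features. The function name first_pre_awk is hypothetical.

```shell
# first_pre_awk is a hypothetical helper name, not an existing command.
# Uses only POSIX awk features (match, RSTART, RLENGTH, substr, sub).
first_pre_awk() {
  awk '
    # First opening tag: start capturing and drop the tag plus anything before it.
    # match() finds the leftmost <pre ...>, unlike a greedy sub(/.*<pre[^>]*>/, "").
    !inside && match($0, /<pre[^>]*>/) {
      inside = 1
      $0 = substr($0, RSTART + RLENGTH)
    }
    # Closing tag: drop it plus anything after it, print the rest, stop reading input
    inside && /<\/pre>/ { sub(/<\/pre>.*/, ""); print; exit }
    # Lines strictly inside the block are printed unchanged
    inside { print }
  ' "$@"
}
```

Because exit stops reading at the first </pre>, this also avoids scanning the rest of a large file.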

Example Usage

If your HTML looks like this:

html
<html>
<body>
<pre>
  This is the first pre block.
  It can span multiple lines.
</pre>
<pre>This is the second pre block.</pre>
</body>
</html>

Running:

bash
grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' file.html | head -z -n 1 | tr -d '\0'

Will output (after an empty line, which comes from the newline directly after <pre>):

text
  This is the first pre block.
  It can span multiple lines.
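To reproduce the example end to end, the hypothetical function run_example below writes the sample HTML to a temporary file created with mktemp. It then runs the multi-line grep extraction on that file. This is a sketch that assumes bash (for local), GNU grep with PCRE support, and GNU head.

```shell
# run_example is a hypothetical helper name used only for this demonstration.
run_example() {
  local tmp
  tmp="$(mktemp)" || return 1
  # Write the sample HTML exactly as shown above
  cat > "$tmp" <<'EOF'
<html>
<body>
<pre>
  This is the first pre block.
  It can span multiple lines.
</pre>
<pre>This is the second pre block.</pre>
</body>
</html>
EOF
  # Extract the first <pre> block's content, then clean up the temporary file
  grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' "$tmp" | head -z -n 1 | tr -d '\0'
  rm -f "$tmp"
}
```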

Jun 3, 2:10pm

This is a shared conversation and may include unverified content that doesn't reflect Mistral AI's views.
