Extracting HTML content on Linux

On Linux, how do I get the innerHTML of the first HTML `pre` element without `pup`?

Jun 3, 2:01pm

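To extract the innerHTML of the first <pre> element from an HTML file or input on Linux without pup, you can combine standard command-line tools such as grep, sed, and awk. These regex- and line-based approaches work for simple, predictable HTML (for example, no <pre> tags inside comments or scripts), but they are not HTML parsers. Here are three options: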

Method 1: Using `grep`

This method extracts the content between the first <pre> and </pre> tags that open and close on the same line:

```bash
grep -oP '<pre[^>]*>\K.*?(?=</pre>)' file.html | head -n 1
```

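-oP enables Perl-compatible regular expressions (-P) and prints only the matched part (-o).
<pre[^>]*>\K matches the opening <pre> tag, with or without attributes, and \K drops it from the reported match. A lookbehind such as (?<=<pre[^>]*>) is rejected by grep -P, because PCRE requires lookbehinds to have a bounded length.
.*? lazily matches the content. Note that . does not match newlines unless the (?s) flag is set.
(?=</pre>) is a positive lookahead for the closing </pre> tag.
head -n 1 keeps only the first match.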
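A minimal sketch of the single-line approach as a reusable function. It assumes GNU grep built with PCRE support (-P). The function name first_pre_same_line and the sample input are hypothetical and used only for illustration, and the function reads HTML on standard input. The sample also shows the line-by-line limitation: the multi-line block is skipped.

```bash
# Hypothetical helper: print the innerHTML of the first <pre>...</pre> that
# starts and ends on the same line. Assumes GNU grep with -P (PCRE) support.
first_pre_same_line() {
  # \K discards the matched opening tag; (?=</pre>) stops before the closing tag.
  grep -oP '<pre[^>]*>\K.*?(?=</pre>)' | head -n 1
}

# Hypothetical sample: the first <pre> spans three lines, the second fits on one.
sample='<pre>
multi-line content
</pre>
<pre class="note">one-liner</pre>'

# Prints "one-liner": grep reads line by line, so the multi-line block never matches.
printf '%s\n' "$sample" | first_pre_same_line
```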

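For multi-line content: if the <pre> content spans several lines, add -z. This makes grep treat NUL rather than newline as the line terminator, so a normal text file becomes a single record. Add (?s) so . also matches newlines. Use head -z (GNU coreutils) so the first match is kept whole instead of being cut at its first newline, and use tr to remove the trailing NUL: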

```bash
grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' file.html | head -z -n 1 | tr -d '\0'
```
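A minimal sketch of the multi-line variant as a function. It assumes GNU grep (for -z and -P) and GNU coreutils head (for -z). The name first_pre_multiline and the sample input are illustrative only. The result keeps the newline that follows <pre>, so it starts with an empty line.

```bash
# Hypothetical helper: print the innerHTML of the first <pre> element, even when
# it spans several lines. Assumes GNU grep (-z, -P) and GNU head (-z); reads stdin.
first_pre_multiline() {
  # -z: input and output records are NUL-terminated, so a text file is one record.
  # (?s): "." also matches newlines.  \K: drop the opening tag from the match.
  grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' |
    head -z -n 1 |  # keep only the first NUL-terminated match
    tr -d '\0'      # remove the NUL terminator that grep -z adds
}

# Sample input: a three-line block followed by a one-line block.
printf '<pre>\nline 1\nline 2\n</pre>\n<pre>second</pre>\n' | first_pre_multiline
```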

Method 2: Using `sed`

This method works line by line and assumes the opening and closing tags are on their own lines:

```bash
sed -n '/<pre[^>]*>/,/<\/pre>/{p;/<\/pre>/q;}' file.html | sed '1d;$d'
```

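The first sed prints from the first <pre> line through the next </pre> line, then quits (q), so later <pre> blocks are never read.
1d;$d removes the first and last lines of that output (the tag lines themselves).
To join the content into a single line, append | tr -d '\n'. This discards the line breaks that <pre> is meant to preserve, so it is optional.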
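The same sed pipeline can be wrapped in a function that reads standard input. The name first_pre_sed is hypothetical. This is a sketch that assumes the tags sit on their own lines: with <pre>text</pre> on a single line, deleting the tag lines would also delete the content. The trailing semicolon before } is there so the script should also be accepted by BSD sed.

```bash
# Hypothetical helper: print the lines inside the first <pre> block.
# Assumes <pre> and </pre> are each on their own line; reads stdin.
first_pre_sed() {
  # Print from the first <pre...> line through the next </pre> line, then quit
  # so later blocks are never read; the second sed drops the two tag lines.
  sed -n '/<pre[^>]*>/,/<\/pre>/{p;/<\/pre>/q;}' | sed '1d;$d'
}

# Sample input with two blocks; only "  a" and "  b" are printed.
printf '<html>\n<pre>\n  a\n  b\n</pre>\n<pre>\n  c\n</pre>\n' | first_pre_sed
```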

Method 3: Using `awk`

For more control, use awk:

```bash
awk '/<pre[^>]*>/ {flag=1; next} flag && /<\/pre>/ {exit} flag' file.html
```

This prints the lines between the first <pre> line and the next </pre> line, then exits, so only the first block is returned. Like Method 2, it assumes the tags are on their own lines.
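Here is a sketch of the awk approach as a function, first_pre_awk (a hypothetical name), that reads standard input. Exiting at the first closing tag is what limits the output to the first block. Piping into head -n 1 would instead keep only the first line of that block.

```bash
# Hypothetical helper: print the lines inside the first <pre> block using awk.
# Assumes <pre> and </pre> are each on their own line; reads stdin.
first_pre_awk() {
  awk '
    !inside && /<pre[^>]*>/ { inside = 1; next }  # opening tag: start, skip the tag line
    inside && /<\/pre>/     { exit }              # first closing tag: stop reading
    inside                  { print }             # content lines of the first block
  '
}

# Sample input with two blocks; only "  a" and "  b" are printed.
printf '<pre>\n  a\n  b\n</pre>\n<pre>\n  c\n</pre>\n' | first_pre_awk
```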

Example Usage

If your HTML looks like this:

```html
<html>
<body>
<pre>
  This is the first pre block.
  It can span multiple lines.
</pre>
<pre>This is the second pre block.</pre>
</body>
</html>
```

Running:

```bash
grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' file.html | head -z -n 1 | tr -d '\0'
```

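outputs the following. The match begins with the newline right after <pre>, so the first output line is empty:

```text

  This is the first pre block.
  It can span multiple lines.
```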
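The example can be reproduced end to end with the sketch below. It also drops the first output line when that line is empty, which approximates what a browser reports as innerHTML, since the HTML parser ignores a single newline immediately after a <pre> start tag. Like Method 1, it assumes GNU grep and GNU head. The temporary file name comes from mktemp.

```bash
# Write the example document to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<html>
<body>
<pre>
  This is the first pre block.
  It can span multiple lines.
</pre>
<pre>This is the second pre block.</pre>
</body>
</html>
EOF

# Multi-line grep from Method 1, then delete line 1 only if it is empty,
# mimicking how an HTML parser ignores a newline right after <pre>.
result=$(grep -zoP '(?s)<pre[^>]*>\K.*?(?=</pre>)' "$tmp" | head -z -n 1 | tr -d '\0' | sed '1{/^$/d;}')
rm -f "$tmp"
printf '%s\n' "$result"
```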

Jun 3, 2:10pm

This is a shared conversation and may include unverified content that doesn't reflect Mistral AI's views.