Bash: remove headers from HTTP response

Question

If I have text containing HTTP headers and body, for example:

 HTTP/1.1 200 OK
 Cache-Control: public, max-age=38
 Content-Type: text/html; charset=utf-8
 Expires: Fri, 22 Nov 2013 06:15:01 GMT
 Last-Modified: Fri, 22 Nov 2013 06:14:01 GMT
 Vary: *
 X-Frame-Options: SAMEORIGIN
 Date: Fri, 22 Nov 2013 06:14:22 GMT

 <!DOCTYPE html>
 <html>
 <head>
 <title>My website</title>
 </head>
 <body>
 Hello world!
 </body>
 </html>

and this text is piped from a command, how can I remove the headers to leave only the body?

(The headers use \r\n as line breaks, and \r\n\r\n marks the end of the headers and the beginning of the body.)

Here is what I have tried ( ... stands for any command, such as cat or curl , that writes HTTP headers and a body to stdout):

SED

My first idea was to do a substitution with sed to remove everything up to the first occurrence of \r\n\r\n :

 ... | sed 's|^.*?\r\n\r\n||'

But this does not work, mainly because sed processes its input one line at a time, so the pattern never sees more than one line. (Also, sed does not support the non-greedy quantifier ? .)

Grep

I also thought about using grep with a positive lookbehind for \r\n\r\n :

 ... | grep -oP '(?<=\r\n\r\n).*'

But this does not work either (mainly because grep only works on separate lines).

pcregrep has a multi-line mode ( -M ), but pcregrep is often not available (it is not installed by default on Ubuntu 12.04, Mac OS X 10.7, etc.), and I need a solution that does not require any non-standard tools.

Perl

Then I thought about doing a substitution with perl , using the /s modifier so that . also matches line breaks:

 ... | perl -pe 's/^.*?\r\n\r\n//s'

I think this is closer to a working solution. However, I think Perl's input record separator ( $/ ) defaults to \n and needs to be changed to \r\n so that . can match \r\n . The -0 option can set $/ to a single character, but not to several characters. I tried this, but I do not think it is correct:

 ... | perl -pe '$/ = "\r\n"; s/^.*?\r\n\r\n//s'

Also, I think ^ matches “start of line”, but should match “start of file”.

Offset and substring

I had the idea of getting the offset of \r\n\r\n using:

 BodyOffset=$(expr index "$MyHttpText" "\r\n\r\n")

and then extracting the body as a substring using:

 HttpBody=${MyHttpText:BodyOffset}

Unfortunately, the Mac OS X version of expr does not support index . In addition, I would prefer a solution that does not require creating variables, if possible.

Parameter Substitution

Another idea I had was to use parameter substitution, where # means "remove from the front of $MyHttpText the shortest part that matches *\r\n\r\n ":

 HttpBody=${MyHttpText#*\r\n\r\n}

But I'm not sure how to use this in a pipeline, and I would prefer a solution that does not require variables.

+6

bash regex grep perl sed

Tachyonvortex Nov 24 '13 at 19:09

5 answers

Your Perl one-liner cannot delete the headers because it reads only one line of input at a time. You need to disable the input record separator so that all of the input is read as a single string.

 perl -0777 ...

+2

TLP Nov 24 '13 at 19:12

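It is also interesting to do this in pure bash (builtins only):

 #!/bin/bash
 # Print only the body: skip everything up to and including the first CR-only line.
 FLAG=
 while IFS= read -r LINE || [ -n "$LINE" ]  #<-- while a line (even an unterminated last one) can be read
 do                                         #<-- do the following actions
     if [ -n "$FLAG" ]                      #<-- if: the flag is set
     then
         printf '%s\n' "$LINE"              #<-- copy the input line to the output
     elif [ "${LINE:0:1}" = $'\r' ]         #<-- else: if the line starts with \r (the blank line ending the headers)
     then
         FLAG=true                          #<-- then raise the flag
     fi
 done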

+1

thom Nov 24 '13 at 21:44

 ... | perl -ne 'print if $after_header; $after_header = 1 if /^\r$/'

0

hobbs Nov 24 '13 at 21:19

By default, curl does not print the response headers; it does so only if you pass an option such as -I (capital i) or -D (dump headers). So make sure that none of these options appears in your curl call!

0

Kat Oct 21 '14 at 14:32

pfnuesel · Accepted Answer · 2013-11-24T19:24:44+0000

sed can do this:

 sed '1,/^$/d' data.txt

This command deletes everything from line 1 through the first empty line ( ^$ ). It works if your line breaks are \n . If they are \r\n , you can either convert the input with dos2unix (and back with unix2dos ), or add the \r character to the sed regex:

 sed '1,/^\r$/d' data.txt

However, this last command only works if the line breaks are \r\n . To make it work with both kinds of line break, you can use:

 sed '1,/^\r\{0,1\}$/d' data.txt

Here we look for a line that is empty apart from an optional \r character (0 or 1 occurrences).
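One caveat, stated as an assumption rather than a guarantee: the \r escape inside a sed regex is understood by GNU sed, but it may not be recognised by other implementations, such as the sed shipped with Mac OS X. A sketch of a more portable variant lets the shell put a literal carriage return into the sed script instead. The function name strip_headers , the variable cr and the sample input are made up for illustration.

```shell
# Hypothetical helper: delete everything up to and including the first line
# that is empty or contains only a carriage return.
strip_headers() {
    cr=$(printf '\r')                  # a literal CR; $(...) only strips trailing newlines
    sed "1,/^${cr}\\{0,1\\}\$/d"       # sed sees: 1,/^<CR>\{0,1\}$/d
}

# Made-up sample responses, one with \r\n line breaks and one with \n.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\n' | strip_headers
printf 'HTTP/1.1 200 OK\nContent-Type: text/html\n\n<html>\n' | strip_headers
```

Both calls delete the same range, so the same function works whichever kind of line break the headers use.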
