Ruby: comparing hex values with "==" in a string

I read in the header of the image file and do a quick comparison to see which type of file it actually is. BMP, GIF, and PNG are all easy, since their headers contain BM, GIF, and PNG to identify themselves. JPG throws me for a bit of a loop.

The first 3 bytes of a JPG are typically 0xFF 0xD8 0xFF, and for the life of me I cannot get a simple comparison to return true, no matter how I set it up.

I read in the first 4 bytes:

 if data[0, 3] == "\xff\xd8\xff"
   puts "This is a JPG"
 end

I know I'm close, but I just can't get it to work. Please let me know what I'm missing.

Note: I know there are gems for this, but I would rather not use one.

4 answers

This is a character encoding problem. Reading the first 4 bytes of a JPEG with a length argument returns a binary (ASCII-8BIT) string:

 head = File.read("some.jpg", 4) # => "\xFF\xD8\xFF\xE1"
 head.encoding                   # => #<Encoding:ASCII-8BIT>

String literals, on the other hand, are UTF-8 encoded:

 jpg_prefix = "\xff\xd8\xff" # => "\xFF\xD8\xFF"
 jpg_prefix.encoding         # => #<Encoding:UTF-8>

Comparing a UTF-8 string with an ASCII-8BIT string returns false when both contain non-ASCII bytes, even if the bytes are identical:

 head[0,3] == jpg_prefix # => false 
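To confirm that only the encodings differ and not the bytes, here is a small self-contained check. It assumes the default UTF-8 source encoding (Ruby 2.0 and later); the variable names are illustrative.

```ruby
utf8   = "\xFF\xD8\xFF"                # a literal in a UTF-8 source file: tagged UTF-8, but not valid UTF-8
binary = [0xFF, 0xD8, 0xFF].pack("C*") # pack("C*") produces an ASCII-8BIT string

p utf8.bytes == binary.bytes # => true: the raw bytes are identical
p utf8.valid_encoding?       # => false
p utf8 == binary             # => false: different encodings and neither string is ASCII-only
p "BM" == "BM".b             # => true: ASCII-only strings compare equal across encodings
```

The last line also explains why the BMP, GIF, and PNG checks seemed to work: "BM", "GIF", and "PNG" are plain ASCII, so their encoding never gets in the way.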

You must set the encoding explicitly with String#force_encoding:

 jpg_prefix = "\xff\xd8\xff".force_encoding(Encoding::ASCII_8BIT) # => "\xFF\xD8\xFF"
 jpg_prefix.encoding                                             # => #<Encoding:ASCII-8BIT>
 head[0,3] == jpg_prefix                                         # => true
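On Ruby 2.0 and later, String#b is a shorter alternative: it returns a binary (ASCII-8BIT) copy instead of mutating the receiver. That also matters if a file uses `# frozen_string_literal: true`, where calling force_encoding on a literal raises FrozenError. A minimal sketch, with `head` standing in for bytes read from a file:

```ruby
# String#b returns a new ASCII-8BIT string; the original literal is left untouched.
JPEG_PREFIX = "\xFF\xD8\xFF".b

head = "\xFF\xD8\xFF\xE1".b # stand-in for File.read("some.jpg", 4)

p JPEG_PREFIX.encoding == Encoding::ASCII_8BIT # => true
p head[0, 3] == JPEG_PREFIX                    # => true
p head.start_with?(JPEG_PREFIX)                # => true
```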

Concatenating single-byte strings created with Integer#chr (as suggested by Mario Visic) also works:

 jpg_prefix = 0xff.chr + 0xd8.chr + 0xff.chr # => "\xFF\xD8\xFF"
 jpg_prefix.encoding                         # => #<Encoding:ASCII-8BIT>

Or using Array#pack :

 jpg_prefix = ["FFD8FF"].pack("H*") # => "\xFF\xD8\xFF"
 jpg_prefix.encoding                # => #<Encoding:ASCII-8BIT>
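Putting this together, a gem-free detector for the four formats in the question could look like the sketch below. The `SIGNATURES` table and `image_type` method are hypothetical names; the magic numbers are the standard leading bytes of each format (PNG's full 8-byte signature, "GIF8" to cover both GIF87a and GIF89a). The key point is that both sides of every comparison are binary: File.binread returns ASCII-8BIT, and each literal is converted with String#b.

```ruby
require "tempfile"

# Hypothetical signature table: the leading "magic" bytes of each format,
# all converted to binary (ASCII-8BIT) so they compare cleanly with file data.
SIGNATURES = {
  jpeg: "\xFF\xD8\xFF".b,
  png:  "\x89PNG\r\n\x1A\n".b,
  gif:  "GIF8".b, # matches both GIF87a and GIF89a
  bmp:  "BM".b
}.freeze

# Returns the detected format as a Symbol, or nil when no signature matches.
def image_type(path)
  head = File.binread(path, 8) || "".b # fall back to "" in case binread returns nil (e.g. empty file)
  SIGNATURES.each { |type, magic| return type if head.start_with?(magic) }
  nil
end

# Usage: write a fake JPEG header to a temporary file and detect it.
Tempfile.create("sample") do |f|
  f.binmode
  f.write("\xFF\xD8\xFF\xE0".b)
  f.flush
  p image_type(f.path) # => :jpeg
end
```

Reading only the first 8 bytes keeps the check cheap even for large images.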

Your code works fine for me when data is a String, so data is probably an Array of byte values instead.

Try the following:

 if data[0,3] == [0xff, 0xd8, 0xff] 

as your condition.
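If you are not sure whether `data` is a String or an Array of bytes, comparing integer byte values sidesteps encodings entirely. A minimal sketch with a hypothetical `jpeg?` helper; `each_byte.first(3)` avoids building a byte array for the whole string:

```ruby
# Hypothetical helper: compares integer byte values, so a String's encoding
# does not matter, and an Array of byte values is accepted as well.
def jpeg?(data)
  head = data.is_a?(String) ? data.each_byte.first(3) : data.first(3)
  head == [0xFF, 0xD8, 0xFF]
end

p jpeg?("\xFF\xD8\xFF\xE0")       # => true (UTF-8 literal)
p jpeg?("\xFF\xD8\xFF\xE0".b)     # => true (binary string)
p jpeg?([0xFF, 0xD8, 0xFF, 0xE0]) # => true (array of byte values)
p jpeg?("GIF89a")                 # => false
```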


You should be able to compare the file data against a string built from character codes, for example:

 if data[0, 3] == 0xff.chr + 0xd8.chr + 0xff.chr
   puts "This is a JPG"
 end

If you get stuck, you can always look at the fastimage gem's source; its type detection code is here: https://github.com/sdsykes/fastimage/blob/master/lib/fastimage.rb#L337-L354

As others (@Stefan) have pointed out, the strings in your original example did not match because their encodings were different.

 # Check the encodings of our strings:
 "\xff\xd8\xff".encoding                    #=> #<Encoding:UTF-8>
 (0xff.chr + 0xd8.chr + 0xff.chr).encoding  #=> #<Encoding:ASCII-8BIT>
 # Compare our two strings with different encodings:
 utf8 = "\xff\xd8\xff"
 ascii = 0xff.chr + 0xd8.chr + 0xff.chr
 utf8 == ascii                              #=> false
 utf8.force_encoding("ASCII-8BIT") == ascii #=> true

Your original code would work just fine if the literal's encoding were ASCII-8BIT.
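Since the question is literally about comparing hex values, another encoding-agnostic option is to turn the leading bytes into a hex string with String#unpack1 (Ruby 2.4 and later) and compare that. `unpack` reads raw bytes regardless of the string's encoding, and the "H" directive emits lowercase hex digits, high nibble first. The `leading_hex` name is hypothetical:

```ruby
# Hypothetical helper: returns the first `nbytes` bytes of `data` as lowercase hex.
# "H6" means 6 hex digits (3 bytes), high nibble first.
def leading_hex(data, nbytes = 3)
  data.unpack1("H#{nbytes * 2}")
end

p leading_hex("\xFF\xD8\xFF\xE0".b)           # => "ffd8ff"
p leading_hex("\xFF\xD8\xFF\xE0") == "ffd8ff" # => true (a UTF-8 literal works too)
p leading_hex("\x89PNG".b, 4)                 # => "89504e47"
```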


File identification is a good thing to delegate to someone else if possible. The ruby-filemagic gem will do this:

 gem 'ruby-filemagic' 

Calling `file` on a FileMagic instance returns a description string:

 require 'filemagic'
 magic = FileMagic.new
 p magic.file("/tmp/pic1.jpg") # => "JPEG image data, JFIF standard 1.02"

The returned string can be matched with regular expressions:

 case magic.file(path)
 when /JPEG/
   # do JPEG stuff
 when /GIF/
   # do GIF stuff
 else
   # we don't recognize it
 end

ruby-filemagic uses the libmagic library, which recognizes a large number of file types.

The documentation is sparse (the README doesn't even have a hello-world example), and the gem hasn't been updated in a few years, but don't let that stop you from trying it. It is easy to use and quite robust: I have production code using it today, and it still works well.

If for some reason you cannot use the gem, but you are on a *nix system with the file command available, you can get the same functionality by shelling out to it:

 p `file /tmp/pic1.jpg` # => "/tmp/pic1.jpg: JPEG image data, JFIF standard 1.02\n"

On Debian, the file command is provided by the file package. Your OS may vary.
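Backticks go through a shell, so a path containing spaces or quotes can break the command. A safer sketch passes the arguments separately with Open3.capture2 from the standard library and asks file for just the MIME type (`--brief` and `--mime-type` are standard flags of the common file implementation; check your platform's man page). The `mime_type_via_file` name is hypothetical:

```ruby
require "open3"

# Hypothetical helper: asks file(1) for a MIME type such as "image/jpeg".
# Arguments are passed as separate strings, so no shell is involved;
# "--" stops file from treating a path that starts with "-" as an option.
def mime_type_via_file(path)
  out, status = Open3.capture2("file", "--brief", "--mime-type", "--", path)
  status.success? ? out.strip : nil
rescue Errno::ENOENT # the file command is not installed
  nil
end

p mime_type_via_file("/tmp/pic1.jpg")
```

Note that by default file reports an unreadable path in its normal output rather than failing (its `-E` flag changes that), so a successful exit status alone does not prove the path was read.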


Source: https://habr.com/ru/post/1483365/

