How to open and display raw binary data in Python?

This seems to be the type of question that should have many duplicates and many answers, but my searches only led to disappointment and futile solutions.

In Python (preferably 3.x), I would like to know how I can open a file of arbitrary type, read the bytes stored on disk, and present these bytes in their most "native", "original", "raw" form, before any encoding is applied.

If the file is stored on disk as a stream 00010100 10000100 ... , then this is what I would like to present on the screen.

Questions like this usually provoke replies such as "why do you want to know?" and "what is the use case?". My use case is curiosity.

Before you mark this as a duplicate, please make sure that the answer you have in mind really answers the question (and does not just discuss encodings, etc.). Thanks!

EDIT AFTER THE FIRST THREE ANSWERS:

Thanks to the three respondents so far, and especially to Yu.F. Sebastian for the extended discussion. From what has been said, it appears that my question boils down to how bytes in files are physically written to disk and how they can be read and represented. As it stands, it seems that Python cannot give a representation of the bytes in their original form; instead they are available in various representations: integers, hexadecimal values, ASCII, etc. Since the issue is not resolved, I will leave the question open for more input.

+1
3 answers

'rb' mode lets you read raw binary data from a file in Python:

    with open(filename, 'rb') as file:
        raw_binary_data = file.read()

Here type(raw_binary_data) == bytes. bytes is an immutable sequence of bytes in Python.

Do not confuse bytes with their textual representation: print(raw_binary_data) shows you a textual representation of the data. For example, the byte 127 (base 10: decimal), which you can also write as bin(127) == '0b1111111' (base 2: binary) or as hex(127) == '0x7f' (base 16: hexadecimal), is displayed as b'\x7f' (seven ASCII characters are printed). Bytes in the printable ASCII range are shown as the corresponding ASCII characters, for example b'\x41' is displayed as b'A' (65 == 0x41 == 0b1000001).
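As a small sketch of the point above (the variable names are made up for illustration), the same one-byte value can be rendered in several textual forms, none of which is "the" byte itself:

```python
# One byte value, several textual representations of it.
value = 127
raw = bytes([value])              # a bytes object holding exactly one byte

as_binary = bin(value)            # base 2 text
as_hex = hex(value)               # base 16 text
as_repr = repr(raw)               # what print(raw) would show
printable = repr(bytes([0x41]))   # a byte in the printable ASCII range

print(as_binary, as_hex, as_repr, printable)
```

Each of these results is an ordinary str that describes the byte; the bytes object itself holds only the value 127.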

The byte 0x7f is not stored on disk as the seven ASCII binary digits 1111111, nor as the two ASCII hexadecimal digits 7F, nor as the three decimal digits 127. b'\x7f' is a textual representation of the byte that can be used to specify it in Python source code (you won't find the literal seven ASCII characters b'\x7f' on disk either). This code writes a single byte to disk:

    with open('output.bin', 'wb') as file:
        file.write(b'\x7f')
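One way to convince yourself that only a single byte lands on disk is to check the file size afterwards. This is a sketch only; it writes into a temporary directory (an assumption made here so that nothing is left behind) rather than the current directory:

```python
import os
import tempfile

# Write one byte into a throwaway directory, then inspect the result.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.bin')
    with open(path, 'wb') as file:
        file.write(b'\x7f')

    size_on_disk = os.path.getsize(path)   # file length in bytes
    with open(path, 'rb') as file:
        content = file.read()

print(size_on_disk, content)
```

The file is one byte long, not seven (the length of the text b'\x7f') and not eight (the number of binary digits).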

Some symbols must be used to represent the bytes. What are they?

OS interfaces (the way to access hardware such as disks) are defined in terms of bytes, for example POSIX read(2). In other words, bytes are the fundamental unit here: you can read and write bytes directly, and you don't need any intermediate representation. See "Richard Feynman. Why."

How bytes are represented physically between the OS drivers and the hardware can be anything, and you do not need to worry about it: it is hidden behind a uniform OS interface. See "How are data physically written, read, and stored on hard drives?"

You can call os.read() directly in Python, but you don't need to; file.read() does it for you (Python 3 file objects are implemented directly on top of the POSIX interface; Python 2 I/O uses the C stdio library, which in turn uses the OS interfaces to implement its functions).
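To illustrate, here is a minimal sketch that reads the same file once through os.read() on a file descriptor and once through an ordinary file object. The sample bytes and the temporary file are assumptions made for the example:

```python
import os
import tempfile

# Create a small file to read back (hypothetical sample data).
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'\x14\x84')
finally:
    os.close(fd)

# Low-level read: os.open() returns a file descriptor, os.read() returns bytes.
fd = os.open(path, os.O_RDONLY)
try:
    low_level = os.read(fd, 1024)   # read up to 1024 bytes
finally:
    os.close(fd)

# The usual way: a file object opened in 'rb' mode yields the same bytes.
with open(path, 'rb') as file:
    high_level = file.read()

os.remove(path)
print(low_level == high_level)
```

Both calls hand back a bytes object; neither exposes anything "below" bytes.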

As you pointed out, it is the OS drivers and the hardware that determine how bytes are written, yet the Python interpreter is able to read them. So it is reading something. What is it? It is not reading the magnetic orientation of particles on the disk, is it? It is reading something symbolic, and I want access to that.

It reads bytes. A hard drive is a small computer in its own right, so interesting things may happen inside it, but that does not change the fact that all of it is hidden from you (as far as anything "symbolic", that is software, is concerned).

The book "Code: The Hidden Language of Computer Hardware and Software" gives a very gentle introduction to how information is represented in computers; the word "byte" is not defined until page 180. For an overview of the levels of abstraction used in computers, the course "From NAND to Tetris" can help.

+3

If you are OK with bytes:

    with open('yourfile', 'rb') as fobj:
        raw_bytes = fobj.read()
    print(raw_bytes)

If you really want binary digits:

    with open('yourfile', 'rb') as fobj:
        raw_bytes = fobj.read()
    print(' '.join(map(lambda x: '{:08b}'.format(x), raw_bytes)))
+2

Python 3 exposes file data as bytes. The type is basically a sequence of integers from 0 to 255, that is, a sequence of bytes. bytes objects have some convenience methods (for example, decoding to a string), and they are displayed similarly to strings when printed.
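A short sketch of what "a sequence of integers" means in practice (the sample data is made up for illustration):

```python
data = b'A\x7f'                    # hypothetical sample data, two bytes

first = data[0]                    # indexing a bytes object yields an int
values = list(data)                # all byte values as plain integers
text = data[:1].decode('ascii')    # decoding turns bytes into a str

print(first, values, text)
```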

To get a bitwise representation, you must open the file in binary ('b') mode.

bin() converts integers to a binary representation, but you have to strip the first two characters (the '0b' prefix) and pad the result with 0s.

    with open(filename, 'rb') as my_file:
        my_bytes = my_file.read()
    bin_list = [bin(i)[2:].rjust(8, '0') for i in my_bytes]
    print(' '.join(bin_list))
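For large files it may be preferable not to read everything at once. The following is only a sketch of one possible variation: iter_bits and chunk_size are hypothetical names, and io.BytesIO stands in for a file opened in 'rb' mode so the example is self-contained:

```python
import io

def iter_bits(stream, chunk_size=4096):
    """Yield each byte of a binary stream as an 8-character string of 0s and 1s."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:              # empty bytes means end of stream
            break
        for byte in chunk:         # iterating over bytes yields ints
            yield format(byte, '08b')

# io.BytesIO stands in for open(filename, 'rb') here.
sample = io.BytesIO(b'\x14\x84')
bit_string = ' '.join(iter_bits(sample))
print(bit_string)
```

format(byte, '08b') does the prefix stripping and zero padding in one step, so it can replace the bin(i)[2:].rjust(8, '0') idiom.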
+1

Source: https://habr.com/ru/post/958150/

