How to generate assembly code with gcc that can be assembled with nasm

I am trying to learn assembly language as a hobby, and I often use gcc -S to produce assembly output. That part is simple, but I cannot assemble the result. I was curious whether this can be done at all. I tried both the default assembly output and Intel syntax using -masm=intel . Neither can be assembled with nasm and linked with ld .

Therefore, I would like to ask whether it is possible to generate assembly code that can then be assembled and linked.

To be more precise, I used the following C code.

    >> cat csimp.c
    int main (void){
        int i,j;
        for(i=1;i<21;i++)
            j= i + 100;
        return 0;
    }

I generated the assembly with gcc -S -O0 -masm=intel csimp.c , then tried to assemble it with nasm -f elf64 csimp.s and link it with ld -m elf_x86_64 -s -o test csimp.o . The output I got from nasm reads:

    csimp.s:1: error: attempt to define a local label before any non-local labels
    csimp.s:1: error: parser: instruction expected
    csimp.s:2: error: attempt to define a local label before any non-local labels
    csimp.s:2: error: parser: instruction expected

This is most likely because the output does not follow NASM syntax. I hope I can fix this without manually correcting the output of gcc -S .


Edit

I was given a hint that my problem was solved in another question. Unfortunately, after testing the method described there, I was unable to produce an assembly file that nasm accepts. You can see the objconv output below. Therefore, I still need your help.

    >> cat csimp.asm
    ; Disassembly of file: csimp.o
    ; Sat Jan 30 20:17:39 2016
    ; Mode: 64 bits
    ; Syntax: YASM/NASM
    ; Instruction set: 8086, x64

    global main:                     ; **the ':' should be removed !!!**

    SECTION .text                    ; section number 1, code

    main:   ; Function begin
            push    rbp                          ; 0000 _ 55
            mov     rbp, rsp                     ; 0001 _ 48: 89. E5
            mov     dword [rbp-4H], 1            ; 0004 _ C7. 45, FC, 00000001
            jmp     ?_002                        ; 000B _ EB, 0D

    ?_001:  mov     eax, dword [rbp-4H]          ; 000D _ 8B. 45, FC
            add     eax, 100                     ; 0010 _ 83. C0, 64
            mov     dword [rbp-8H], eax          ; 0013 _ 89. 45, F8
            add     dword [rbp-4H], 1            ; 0016 _ 83. 45, FC, 01
    ?_002:  cmp     dword [rbp-4H], 20           ; 001A _ 83. 7D, FC, 14
            jle     ?_001                        ; 001E _ 7E, ED
            pop     rbp                          ; 0020 _ 5D
            ret                                  ; 0021 _ C3
    ; main End of function

    SECTION .data                    ; section number 2, data

    SECTION .bss                     ; section number 3, bss

Explicit solution:

I made a mistake while cleaning up the objconv output. I should have run:

    sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" csimp.asm
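To make the effect of each expression visible, here is a minimal sketch that applies the same four sed expressions to a few lines written in the style objconv emits. The sample lines and the function name clean_objconv are assumptions made for illustration; they are not captured tool output.

```shell
#!/bin/bash
# clean_objconv: read objconv-style NASM text on stdin, write cleaned text to stdout.
# (Hypothetical helper name; the sed expressions are the ones from the question.)
clean_objconv() {
    sed -e 's/align=1//g' \
        -e 's/[a-z]*execute//g' \
        -e 's/: *function//g' \
        -e '/default *rel/d'
}

# Illustrative sample, modelled on objconv's output style (assumption).
sample='default rel
global main: function
SECTION .text   align=1 execute
SECTION .data   align=1 noexecute'

printf '%s\n' "$sample" | clean_objconv
```

The "default rel" line is dropped, ": function" disappears from the global declaration, and the align=1 , execute and noexecute attributes are stripped from the section lines, leaving only trailing whitespace behind.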

All steps can be combined into a bash script:

    #!/bin/bash

    a=$( echo "$1" | sed "s/\.c$//" ) # strip the file extension .c

    # compile to an object file with minimal extra information
    gcc -fno-asynchronous-unwind-tables -s -c ${a}.c

    # convert the object file to nasm format
    ./objconv/objconv -fnasm ${a}.o

    # remove unnecessary objconv information
    sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" ${a}.asm

    # run nasm for 64-bit binary
    nasm -f elf64 ${a}.asm

    # link --> see comment of MichaelPetch below
    ld -m elf_x86_64 -s ${a}.o

When I run this script, I get an ld warning:

  ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080 

The executable produced this way fails with a segmentation fault. I would be grateful for your help.

+5
3 answers

The entry point error you encountered comes from running ld on an object file whose entry point is named main , while ld looks for an entry point named _start .

There are several considerations. First, if you link against the C library to use functions such as printf , the link step expects main as the entry point, but if you do not link against the C library, ld expects _start . Your script is very close, but you need some way to distinguish which entry point is required in order to fully automate the process for any source file.

For example, the following conversion applies your approach to a source file that uses printf . It was converted to nasm using objconv as follows:
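One possible way to make that distinction automatically is sketched below. It is only a heuristic under a stated assumption: that symbols the file does not define itself (such as printf ) show up as extern declarations in the converted .asm file. The helper names needs_libc and choose_linker are hypothetical.

```shell
#!/bin/bash
# needs_libc FILE: succeed if the NASM file declares an external symbol.
# Assumption: undefined symbols such as printf appear as "extern" lines.
needs_libc() {
    grep -qE '^[[:space:]]*extern[[:space:]]' "$1"
}

# choose_linker BASENAME: print "gcc" or "ld" for BASENAME.asm.
choose_linker() {
    if needs_libc "$1.asm"; then
        echo gcc   # the C runtime supplies _start and calls main
    else
        echo ld    # no C runtime: the file itself must provide _start
    fi
}
```

Calling choose_linker csimp on the loop example should print ld , provided csimp.asm contains no extern lines. Keep in mind that nothing calls _start , so code entered there has no return address for a trailing ret to use; it has to end the process with an exit system call instead.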

Create an object file:

 gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj 

Convert the object file to a nasm-format assembly file using objconv :

 objconv -fnasm s3.obj 

(note: my version of objconv added DOS line endings; maybe I missed an option, so I just ran the output through dos2unix )

Using a slightly modified version of your sed call, clean up the contents:

    sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' \
        -e 's/: *function//g' -e '/default *rel/d' s3.asm

(note: if no standard library functions are used and you link with ld , change main to _start by adding the following expressions to your sed call)

 -e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/' 

(there are probably more elegant expressions for this, this was just an example)
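As a quick check of what those two expressions do, the sketch below runs them over a few lines in the style of the objconv listing from the question. The sample text and the function name rename_entry are assumptions for illustration only.

```shell
#!/bin/bash
# rename_entry: apply the two rename expressions to NASM text on stdin.
# (Hypothetical helper name; the expressions are the ones given above.)
rename_entry() {
    sed -e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
}

# Illustrative sample in objconv style (assumption).
sample='global main
main:   ; Function begin
        nop
; main End of function'

printf '%s\n' "$sample" | rename_entry
```

The second expression replaces everything from " main" to the end of the line, so it also rewrites comments and would hit any other token that starts with main after a space. That is harmless for a listing like this one, but worth tightening for larger files.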

Assemble with nasm (overwriting the original object file):

 nasm -felf64 -o s3.obj s3.asm 

Link using gcc :

 gcc -o s3 s3.obj 

Test:

    $ ./s3
    sizeof test : 40
    myint : 0 0
    mychar : 4 4
    myptr : 8 8
    myarr : 16 16
    myuint : 32 32
+3

There are many different assembly languages. For each processor there are several possible syntaxes (for example, "Intel syntax" and "AT&T syntax"), and on top of that completely different directives, preprocessors, and so on. This adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.

GCC can only generate one dialect of assembly language for 32-bit 80x86. This means that it cannot work with NASM, FASM, MASM, TASM, A86/A386, etc. It works only with GAS (and possibly YASM in its "AT&T mode").

Of course, you can compile code with three different compilers into 3 different dialects of assembly, then write 3 more code fragments yourself (in 3 different dialects of assembly), assemble everything (each piece with its own assembler) into object files, and link all the object files together.

+3

You basically cannot, at least not directly. GCC can emit assembly in Intel syntax, but NASM/MASM/TASM each have their own flavor of Intel syntax. They are largely based on it, but there are differences that the assembler may not understand and therefore fail to assemble.

The closest you can get is probably to have objdump show the disassembly in Intel format:

 objdump -d $file -M intel 

Peter Cordes points out in the comments that the assembler directives will still target GAS, so NASM, for example, will not recognize them. They usually have the same name, but GAS-style directives begin with a dot, as in .section text (vs. section text ).
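To get a feel for how many such directives there are, the following sketch lists every line that starts with a dot in a short sample. The sample imitates the typical first lines of gcc -S -masm=intel output and is an assumption, as is the helper name list_gas_directives . NASM treats a leading dot as the start of a local label, which is presumably why it reported errors on lines 1 and 2 of csimp.s in the question.

```shell
#!/bin/bash
# list_gas_directives [FILE...]: print, with line numbers, every line whose
# first token starts with a dot (GAS directives and dot-prefixed labels).
list_gas_directives() {
    grep -nE '^[[:space:]]*\.[A-Za-z_]' "$@"
}

# Illustrative sample resembling the top of gcc -S -masm=intel output (assumption).
sample=$(printf '\t.file\t"csimp.c"\n\t.intel_syntax noprefix\n\t.text\n\t.globl\tmain\n\t.type\tmain, @function\nmain:\n\tpush\trbp\n\tmov\trbp, rsp\n')

printf '%s\n' "$sample" | list_gas_directives
```

Run against a real file, for example list_gas_directives csimp.s , this shows the lines that would have to be translated or removed by hand before NASM could accept the file.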

+2

Source: https://habr.com/ru/post/1241889/

