C code to assembly using gcc and gdb
Reading the disassembly of C programs that we can comfortably write is a great way to learn assembly language, to try architecture-specific optimizations, and to see what is happening under the hood.
In this post, we will see how to translate a small C program to assembly (using flat assembler).
Consider the following program, where the code for gcd is taken from Rosetta Code:
#include <stdio.h>

int gcd(int u, int v) {
    return (v != 0)?gcd(v, u%v):u;
}

int main() {
    int n,m;
    scanf("%d%d",&n,&m);
    printf("%d \n",gcd(n,m));
    return 0;
}
Compile it to 32-bit code with
gcc -o gcd gcd.c -m32
and disassemble:
gdb ./gcd
(gdb) disas gcd
We will see something like this:
and
(gdb) disas main
shows like this:
From the disassembly, we can see that the function arguments are pushed from right to left, and that the local variables are allocated space on the stack.
To turn the disassembly into fasm source, we need to replace all relative references with labels, replace memory references with names, and remove all "PTR" keywords. Starting from the fasm example that produces a dynamically linked executable for Linux (fasm 1.70.03 is used here), we may write it as:
and assemble:
./fasm gcd.asm
The assembled code will behave the same way, but the executable produced is about 10 times smaller! With the assembly code, we have more liberty to use architecture-specific instructions. And if we see unnecessary register spills, we can modify the code to avoid them. (Using the "register" keyword and the -O3 option in gcc also makes good use of registers.)
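As a hedged illustration of the register point, here is a sketch of a loop-based variant of the same function. The name gcd_iter is hypothetical and not part of the original program. The idea is only that a loop leaves the compiler (or a hand translation) free to keep u and v in registers rather than passing them through the stack on every recursive call. Whether gcc actually spills less depends on its version and flags, so it is worth checking with disas.

```c
/* Sketch only: gcd_iter is a hypothetical name, not from the original program. */
int gcd_iter(int u, int v)
{
    /* Same recurrence as gcd(v, u % v), expressed as a loop. */
    while (v != 0) {
        int r = u % v;  /* remainder of the current pair */
        u = v;
        v = r;
    }
    return u;
}
```

Calling gcd_iter(n,m) from main in place of gcd(n,m) and then running disas gcd_iter in gdb shows what the compiler made of it.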
p.s.
- By default, the disassembly syntax is not Intel. To change it, use
  set disassembly-flavor intel
  You may consider placing it in $HOME/.gdbinit to use Intel syntax every time.
- The -m32 option in gcc is not required if a 32-bit Linux distro is used.
- The -g option is helpful in debugging the executable. We can check the disassembly instruction by instruction and also deduce the operator precedence. You'll never need another silly book on C. When in doubt, go to the root!
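To try the operator precedence point, a few tiny functions like the ones below could be compiled with -g and disassembled. This is only a sketch, and the function names are hypothetical. The order in which the multiply, subtract and compare instructions appear in the disassembly shows how the compiler grouped each expression.

```c
/* Sketch only: these helper names are hypothetical. */

/* '*' binds tighter than '+', so this is a + (b * c). */
int mul_before_add(int a, int b, int c)
{
    return a + b * c;
}

/* '-' associates left to right, so this is (a - b) - c. */
int sub_left_to_right(int a, int b, int c)
{
    return a - b - c;
}

/* '!=' binds tighter than '?:', as in the gcd function. */
int pick_nonzero(int u, int v)
{
    return v != 0 ? v : u;
}
```

For example, mul_before_add(1, 2, 3) evaluates to 7 rather than 9, and the disassembly should show the multiplication happening before the addition.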