Detecting recursion in a C file using Python

I need to detect direct and indirect recursion in a rather large set (5,000-15,000) of C files (not C++).

Files are already pre-processed.

The code is pretty "old school" for security reasons, so there are no fancy things like function pointers, only functions that pass variables, and some function macros that do the same.

The most natural way to detect recursion is to build a directed call graph in which each function is a node with an edge to every function it calls. If the graph has any cycles, then we have recursion.
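To illustrate the call-graph idea, here is a minimal sketch of cycle detection, assuming the graph is already available as a dict that maps each function name to the functions it calls. The name `has_cycle` and the sample graphs are hypothetical. The search is iterative so that long call chains do not hit Python's own recursion limit.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> iterable of nodes) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    for root in graph:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, callees = stack[-1]
            for callee in callees:
                state = color.get(callee, WHITE)
                if state == GREY:
                    # Edge back to a function on the current path: recursion.
                    return True
                if state == WHITE:
                    color[callee] = GREY
                    stack.append((callee, iter(graph.get(callee, ()))))
                    break
            else:
                # Every callee has been explored; this function is done.
                color[node] = BLACK
                stack.pop()
    return False


# Hypothetical graphs: the first is indirectly recursive, the second is not.
recursive_graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
plain_graph = {"main": ["f", "g"], "f": ["g"], "g": []}
print(has_cycle(recursive_graph), has_cycle(plain_graph))
```

A function that calls itself shows up as an edge from a node to itself, so direct recursion is covered by the same check.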

A regular expression that finds function calls is trivial, but I also need to know which function each call was made from.

PyCParser was nice, but it complains about a lot of things, such as undefined variables, or typedefs whose source type is undefined or defined in another file, all of which is completely irrelevant to my use case. The project uses a custom dependency management system, so some includes and the like are added automatically. I need PyCParser to not care about anything other than FuncCall and FuncDef nodes, and I do not think there is a way to limit the parsing process to just that.

I would prefer not to write a parser, since I definitely do not have time to learn how to do that in Python and then implement a solution.

So, how can I tell which function a given call belongs to in a C file?

A solution in Python is preferred.
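For illustration only, here is one rough way to attribute each call to its enclosing function without a real parser: track brace depth and remember the last name followed by `(` at file scope. This is a heuristic sketch, not a C parser. It assumes preprocessed input with no comments and no braces or parentheses inside string literals, and the names `calls_by_function`, `TOKEN` and `NOT_CALLS` are hypothetical.

```python
import re
from collections import defaultdict

# Keywords that are followed by "(" but are not function calls.
NOT_CALLS = {"if", "for", "while", "switch", "return", "sizeof"}
# Either a structural character, or an identifier followed by "(".
TOKEN = re.compile(r"[{};]|\b([A-Za-z_]\w*)\s*\(")


def calls_by_function(source):
    """Map each function defined in `source` to the set of names it calls."""
    graph = defaultdict(set)
    depth = 0          # current brace nesting level
    current = None     # function whose body we are inside, if any
    candidate = None   # last "name(" seen at file scope
    for match in TOKEN.finditer(source):
        name = match.group(1)
        text = match.group(0)
        if name is not None:
            if name in NOT_CALLS:
                continue
            if depth == 0:
                candidate = name           # possibly a function header
            elif current is not None:
                graph[current].add(name)   # a call inside a function body
        elif text == "{":
            if depth == 0:
                current = candidate        # the body of `candidate` starts here
            depth += 1
        elif text == "}":
            depth -= 1
            if depth == 0:
                current = candidate = None
        elif depth == 0:
            candidate = None               # ";" at file scope: it was a prototype
    return graph


SAMPLE = """
extern void test1();

void test2()
{
   test1();
}

int main()
{
   test2();
}
"""
print(dict(calls_by_function(SAMPLE)))
# {'test2': {'test1'}, 'main': {'test2'}}
```

Real code will contain constructs this misses (initializer lists, old-style declarations, attributes), so the result should be treated as an approximation.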


Why not compile the code, disassemble it with objdump, and parse the resulting assembly?

test1.c:

extern void test2();

void test1()
{
   test2();
}

test2.c:

extern void test1();

void test2()
{
   test1();
}


int main()
{
   test2();
}

Compile, then disassemble:

gcc -g test1.c test2.c -o myprog

objdump -d myprog > myprog.asm

Then parse the disassembly. The interesting part looks like this, for example:

00401630 <_test1>:
  401630:   55                      push   %ebp
  401631:   89 e5                   mov    %esp,%ebp
  401633:   83 ec 08                sub    $0x8,%esp
  401636:   e8 05 00 00 00          call   401640 <_test2>
  40163b:   c9                      leave  
  40163c:   c3                      ret    
  40163d:   90                      nop
  40163e:   90                      nop
  40163f:   90                      nop

00401640 <_test2>:
  401640:   55                      push   %ebp
  401641:   89 e5                   mov    %esp,%ebp
  401643:   83 ec 08                sub    $0x8,%esp
  401646:   e8 e5 ff ff ff          call   401630 <_test1>
  40164b:   c9                      leave  
  40164c:   c3                      ret    

A short Python script then builds a dictionary of function => set of called functions:

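import re
import collections

# Maps each function name to the set of functions it calls.
calldict = collections.defaultdict(set)

# A "call" instruction whose target symbol is shown in angle brackets.
callre = re.compile(r".*\scall\s+.*<(.*)>")
# A function header line such as "00401630 <_test1>:".
funcre = re.compile(r"[0-9a-f]+\s<(.*)>:")

current_function = ""

with open("myprog.asm") as f:
    for line in f:
        m = funcre.match(line)
        if m:
            # Start of a new function body.
            current_function = m.group(1)
        else:
            m = callre.search(line)
            if m:
                called = m.group(1)
                calldict[current_function].add(called)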

Then, with this dictionary, a simple check finds pairs of functions that call each other:

for function,called_set in calldict.items():
    for called in called_set:
        callset = calldict.get(called)
        if callset and function in callset:
            print(function,called)

Output:

_test2 _test1
_test1 _test2
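The pair check only reports functions that call each other directly (or themselves). As a hedged extension for longer indirect chains, the sketch below takes a dictionary of the same shape as `calldict` and reports every function that can reach itself through any number of calls. The name `recursive_functions` and the sample data are hypothetical, and the search is quadratic in the worst case, which may need a strongly-connected-components algorithm on very large graphs.

```python
def recursive_functions(calldict):
    """Return the set of functions that can reach themselves through calls."""
    found = set()
    for start in calldict:
        seen = set()
        pending = list(calldict[start])   # functions still to explore
        while pending:
            func = pending.pop()
            if func == start:
                # We got back to where we started: `start` is recursive.
                found.add(start)
                break
            if func in seen:
                continue
            seen.add(func)
            pending.extend(calldict.get(func, ()))
    return found


# Hypothetical call dictionary: _a -> _b -> _c -> _a is a three-function cycle.
example = {
    "_main": {"_a"},
    "_a": {"_b"},
    "_b": {"_c"},
    "_c": {"_a"},
}
print(sorted(recursive_functions(example)))
# ['_a', '_b', '_c']
```

Here `_main` is not reported, because it calls into the cycle but is never called back.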

Working from the objdump/asm output, in the same spirit as callcatcher, avoids having to parse the C source.


Source: https://habr.com/ru/post/1695412/

