How to get function declaration or definition using regex

I want to get only prototypes of functions, for example

    int my_func(char, int, float)
    void my_func1(void)
    my_func2()

from C files using regex and Python.

Here is my regex format: ".*\(.*|[\r\n]\)\n"

+4
7 answers

This is a handy script that I wrote for such tasks, but it does not give the return types. It extracts only function names and argument lists.

    # Extract routine signatures from a C++ module
    import re

    def loadtxt(filename):
        "Load text file into a string. File exceptions are allowed to propagate."
        with open(filename) as f:
            return f.read()

    # regex: group 1 = whole signature, group 2 = name, group 3 = arguments
    rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
    code = loadtxt('your file name here')
    # control-flow keywords that look like calls followed by a brace
    cppwords = ['if', 'while', 'do', 'for', 'switch']
    procs = [(i.group(2), i.group(3)) for i in re.finditer(rproc, code)
             if i.group(2) not in cppwords]

    for i in procs:
        print(i[0] + '(' + i[1] + ')')
+6

See if your C compiler has an option to output a file containing only the prototypes of what it compiles. For gcc, it is -aux-info FILENAME.
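As a rough sketch of how that output could be consumed from Python: the helper names `parse_aux_info` and `prototypes_from_gcc`, the `prototypes.aux` file name, and the sample line are all made up for illustration, and the line layout assumed by the regex (a leading `/* file:line:flags */` comment followed by the declaration) should be checked against what your gcc version actually writes.

```python
import os
import re
import subprocess

# Assumed layout of one -aux-info line:
#   /* file.c:12:NC */ extern int my_func (char, int, float);
# Verify this against the output of your own gcc before relying on it.
AUX_LINE = re.compile(
    r"^/\*\s*(?P<file>.+?):(?P<line>\d+):(?P<flags>\w+)\s*\*/\s*(?P<decl>.*?;)"
)

def parse_aux_info(text):
    """Return (file, line, declaration) tuples found in -aux-info text."""
    results = []
    for line in text.splitlines():
        m = AUX_LINE.match(line)
        if m:
            results.append((m.group("file"), int(m.group("line")), m.group("decl")))
    return results

def prototypes_from_gcc(c_file, aux_file="prototypes.aux"):
    """Compile c_file with gcc -aux-info and parse the file it writes."""
    subprocess.run(
        ["gcc", "-c", c_file, "-o", os.devnull, "-aux-info", aux_file],
        check=True,
    )
    with open(aux_file) as f:
        return parse_aux_info(f.read())
```

Only `parse_aux_info` is pure text processing; `prototypes_from_gcc` needs gcc on the PATH and a file that actually compiles.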

+2

I think regex is not the best solution in your case. There are many pitfalls, such as comments, text inside string literals, etc., but if your function prototypes have a common style:

 type fun_name(args); 

then \w+ \w+\(.*\); should work in most cases:

    mn> egrep "\w+ \w+\(.*\);" *.h
    md5.h:extern bool md5_hash(const void *buff, size_t len, char *hexsum);
    md5file.h:int check_md5files(const char *filewithsums, const char *filemd5sum);
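Since the question asks for Python, here is a minimal sketch of the same search done with the `re` module. The function name `grep_prototypes` and the sample header text are invented for illustration; only the pattern comes from the answer.

```python
import re

# Same pattern as the egrep call: "type name(args);" on a single line.
PROTO_RE = re.compile(r"\w+ \w+\(.*\);")

def grep_prototypes(text):
    """Return the stripped lines of text that contain a match for PROTO_RE."""
    return [line.strip() for line in text.splitlines() if PROTO_RE.search(line)]

# Hypothetical header contents, used only for illustration.
sample_header = """\
extern bool md5_hash(const void *buff, size_t len, char *hexsum);
int check_md5files(const char *filewithsums, const char *filemd5sum);
char *md5_name(void);
#define MD5_LEN 32
"""

for line in grep_prototypes(sample_header):
    print(line)
```

The `char *md5_name(void);` line is not reported, because the pattern expects a plain word, one space, and the function name. That is one of the pitfalls of the "common style" assumption.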
+1

I think this should do it:

 r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$" 

which decomposes into:

    string begin: ^
    any number of whitespaces (including none): \s*
    return type:
      - start with letter or _: [\w_]
      - continue with any letter, digit or _: [\w\d_]*
    any number of whitespaces: \s*
    any number of any characters (to allow pointers, arrays and so on; could be replaced with more detailed checking): .*
    any number of whitespaces: \s*
    function name:
      - start with letter or _: [\w_]
      - continue with any letter, digit or _: [\w\d_]*
    any number of whitespaces: \s*
    open arguments list: \(
    arguments (allow none): .*
    close arguments list: \)
    any number of whitespaces: \s*
    string end: $

This does not correctly match every possible combination, but it should work in most cases. If you want it to be more accurate, just let me know.
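A small sketch of how this pattern behaves when applied line by line in Python. The names `FUNC_RE` and `matching_lines` and the sample text are assumptions made for the example, not part of the answer.

```python
import re

# The pattern from this answer, applied to one line at a time.
FUNC_RE = re.compile(r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$")

def matching_lines(code):
    """Return every line of code that the pattern accepts."""
    return [line for line in code.splitlines() if FUNC_RE.match(line)]

# Hypothetical input: the three examples from the question plus two other lines.
sample = """\
int my_func(char, int, float)
void my_func1(void)
my_func2()
int counter = 0;
if (counter > 0)
"""

for line in matching_lines(sample):
    print(line)
```

The three lines from the question are accepted and the assignment is rejected, but `if (counter > 0)` is accepted as well, so a keyword filter like the `cppwords` list in the script above is still needed. A prototype that ends with `;` is rejected, because the pattern requires the line to end right after the closing parenthesis.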

EDIT: Disclaimer - I'm completely new to both Python and Regex, so please be lenient;)

+1

There are many pitfalls in trying to "parse" C code (or even to extract some information from it) using only regular expressions. I would definitely take a C grammar for your favorite parser generator (say, Bison or a Python alternative; example C grammars are everywhere) and add actions to the appropriate rules.

Also, be sure to run the C preprocessor on the file before parsing.

+1

The regular expression below also handles definitions of destructors and const member functions:

 ^\s*\~{0,1}[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*(const){0,1}$ 
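To see what the two additions change, here is a short comparison against the pattern from the earlier answer. The names `BASE_RE` and `EXT_RE` and the sample signatures are made up for illustration.

```python
import re

# Pattern from the earlier answer (no destructor or const support).
BASE_RE = re.compile(r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$")
# Pattern from this answer: optional leading ~ and optional trailing const.
EXT_RE = re.compile(
    r"^\s*\~{0,1}[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*(const){0,1}$"
)

# Hypothetical C++ signature lines, used only for illustration.
samples = ["~Widget()", "int size() const", "void reset()"]

for s in samples:
    print(s, "| base:", bool(BASE_RE.match(s)), "| extended:", bool(EXT_RE.match(s)))
```

With these samples the extended pattern accepts all three lines, while the base pattern accepts only `void reset()`.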
0

I built on Nick Dandoulakis's answer for a similar use case. I wanted to find the definition of the socket function in glibc. This finds a bunch of functions with "socket" in the name, but socket itself is not found, which underlines what many others have said: there may be better ways to extract this information, such as tools provided by compilers.

    # find_functions.py
    #
    # Extract routine signatures from a C++ module
    import re
    import sys

    def loadtxt(filename):
        # Load text file into a string. File exceptions are not handled here.
        with open(filename) as f:
            return f.read()

    # regex: group 1 = whole signature, group 2 = name, group 3 = arguments
    rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
    file = sys.argv[1]
    code = loadtxt(file)
    # control-flow keywords that look like calls followed by a brace
    cppwords = ['if', 'while', 'do', 'for', 'switch']
    procs = [i.group(1) for i in re.finditer(rproc, code)
             if i.group(2) not in cppwords]

    for i in procs:
        print(file + ": " + i)

Then

    $ cd glibc
    $ find . -name "*.c" -print0 | xargs -0 -n 1 python find_functions.py | grep ':.*socket'
    ./hurd/hurdsock.c: _hurd_socket_server (int domain, int dead)
    ./manual/examples/mkfsock.c: make_named_socket (const char *filename)
    ./manual/examples/mkisock.c: make_socket (uint16_t port)
    ./nscd/connections.c: close_sockets (void)
    ./nscd/nscd.c: nscd_open_socket (void)
    ./nscd/nscd_helper.c: wait_on_socket (int sock, long int usectmo)
    ./nscd/nscd_helper.c: open_socket (request_type type, const char *key, size_t keylen)
    ./nscd/nscd_helper.c: __nscd_open_socket (const char *key, size_t keylen, request_type type,
    ./socket/socket.c: __socket (int domain, int type, int protocol)
    ./socket/socketpair.c: socketpair (int domain, int type, int protocol, int fds[2])
    ./sunrpc/key_call.c: key_call_socket (u_long proc, xdrproc_t xdr_arg, char *arg,
    ./sunrpc/pm_getport.c: __get_socket (struct sockaddr_in *saddr)
    ./sysdeps/mach/hurd/socket.c: __socket (int domain, int type, int protocol)
    ./sysdeps/mach/hurd/socketpair.c: __socketpair (int domain, int type, int protocol, int fds[2])
    ./sysdeps/unix/sysv/linux/socket.c: __socket (int fd, int type, int domain)
    ./sysdeps/unix/sysv/linux/socketpair.c: __socketpair (int domain, int type, int protocol, int sv[2])

In my case, this and this can help me, although it looks like I will need to read the assembly code in order to reuse the strategy described there.

0

Source: https://habr.com/ru/post/1285648/

