The complexity of the memset function in C

I was discussing a code snippet with some friends, and the use of the memset function in C came up: what is the order, in Big-O notation, of this function when it is used to initialize an array of size N?

+6
4 answers

On a system where you have direct access to the page tables and they are stored hierarchically, memset could be implemented in O(log n) by replacing the object's entire virtual address mapping with copy-on-write references to a single page filled with the requested byte. Note that if you intend to make any future modifications to the object, the normal O(n) cost of memset is merely deferred: the page faults that create private copies of the individual pages pay it back as they are modified.
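
A minimal sketch of the zero-fill case on a POSIX system (assuming MAP_ANONYMOUS is available, as on Linux and the BSDs): an anonymous private mapping is backed by a shared zero page, so obtaining n zeroed bytes costs only page-table work, and the O(n) cost is deferred to the copy-on-write faults taken when the pages are actually written.

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t n = (size_t)1 << 30;  /* 1 GiB */

        /* Anonymous private mappings are zero-filled: every virtual page is
         * initially backed by a single shared zero page, so this "memset to 0"
         * costs only page-table setup, not a gigabyte of stores. */
        unsigned char *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* The deferred O(n) cost shows up here: each write faults in a
         * private copy of the page being touched. */
        for (size_t i = 0; i < n; i += 4096)
            p[i] = 1;

        munmap(p, n);
        return 0;
    }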

+12

You asked about complexity, but you probably intended to ask about performance.

Complexity, denoted with O(n), is a concept describing how the number of operations in an algorithm is forced to grow as the size of the problem grows. O(n) means that some number of steps proportional to the size of the input must be performed. It does not say what that proportion is. memset is O(n). O(n²) means that some number of steps proportional to n² must be performed. memset is not O(n²), because setting 2n bytes takes twice as much work as setting n bytes, not four times as much.
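
For concreteness, here is a byte-at-a-time memset sketch: it performs exactly n one-byte stores, which is what O(n) captures, and doubling n doubles the work.

    #include <stddef.h>

    /* Naive memset: performs exactly n one-byte stores, so the number of
     * operations grows linearly with n -- O(n), not O(n^2). */
    void *naive_memset(void *s, int c, size_t n)
    {
        unsigned char *p = s;
        while (n--)
            *p++ = (unsigned char)c;
        return s;
    }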

You are most likely interested in the performance of memset, because the library version of memset is much faster than the C version you could write yourself.

The library version is much faster because it uses specialized instructions. Most common modern processors have instructions that write 16 bytes to memory in a single instruction. Library developers write critical functions such as memset in assembly language, or something close to it, so they have access to all of those instructions.

When you write in C, it is difficult for the compiler to take advantage of these instructions. For example, the pointer to the memory you are setting may not be aligned to a multiple of 16 bytes. The authors of memset write code that checks the pointer and branches to different code for each case, setting individual bytes at the start until the pointer is aligned, so that the fast instructions that store 16 bytes at a time can be used for the bulk of the buffer. This is just one of several complications library developers face when writing routines such as memset.
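
A simplified sketch of that idea, assuming an x86-64 target with SSE2 intrinsics (an illustration, not the actual library implementation, which handles many more cases): set leading bytes one at a time until the pointer is 16-byte aligned, store 16 bytes per instruction for the bulk, then finish the tail byte by byte.

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>
    #include <stdint.h>

    void *sse2_memset(void *s, int c, size_t n)
    {
        unsigned char *p = s;
        unsigned char byte = (unsigned char)c;

        /* Head: set bytes individually until p is 16-byte aligned. */
        while (n > 0 && ((uintptr_t)p & 15) != 0) {
            *p++ = byte;
            n--;
        }

        /* Bulk: one aligned 16-byte store per iteration. */
        __m128i v = _mm_set1_epi8((char)byte);
        while (n >= 16) {
            _mm_store_si128((__m128i *)p, v);
            p += 16;
            n -= 16;
        }

        /* Tail: remaining bytes one at a time. */
        while (n--)
            *p++ = byte;

        return s;
    }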

Because of these complications, the compiler cannot easily take your C implementation of memset and turn it into the fast code that experts write. When the compiler sees a loop in C code that writes one byte at a time, it usually generates assembly language that writes one byte at a time. Optimizers are getting smarter, but the complications limit how much they are allowed to do and how much they can do without generating a lot of code to handle cases that rarely occur.

+11

The complexity is O(n). That's basic stuff.

+1

Some C libraries provide vectorized versions of memset() . If your compiler does not perform automatic vectorization and loop unrolling, your for loop will be slower than the vectorized memset() . Vectorized or not, memset() is limited by memory bandwidth, and the minimum time is proportional to the size of the array divided by the memory bandwidth, i.e. it is an O(n) operation because the memory bandwidth is constant.
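
A rough way to see the bandwidth limit is to time memset on a large buffer and divide the byte count by the elapsed time (a hypothetical benchmark sketch assuming a POSIX clock_gettime; the buffer size and the reported figure depend on your machine):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        size_t n = (size_t)1 << 30;          /* 1 GiB buffer */
        unsigned char *buf = malloc(n);
        if (!buf)
            return 1;

        memset(buf, 0, n);                   /* warm-up pass: pay the page faults */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        memset(buf, 0xAB, n);                /* timed pass */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        /* Effective write bandwidth: bytes set divided by elapsed time. */
        printf("%.2f GB/s\n", n / secs / 1e9);

        free(buf);
        return 0;
    }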

On NUMA machines, memsetting very large arrays can be threaded to give a speedup on the order of the number of NUMA nodes. See this answer for some benchmarks.
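
A sketch of a threaded memset using OpenMP (assuming OpenMP is available; real NUMA-aware code would also pin threads and rely on first-touch page placement): each thread fills its own contiguous chunk, so on a NUMA machine most stores go to memory on the thread's own node.

    #include <omp.h>
    #include <stddef.h>
    #include <string.h>

    /* Threaded memset: each thread fills its own contiguous chunk. */
    void parallel_memset(void *s, int c, size_t n)
    {
        #pragma omp parallel
        {
            int nt = omp_get_num_threads();
            int id = omp_get_thread_num();
            size_t chunk = (n + (size_t)nt - 1) / (size_t)nt;
            size_t begin = (size_t)id * chunk;
            if (begin < n) {
                size_t len = (n - begin < chunk) ? (n - begin) : chunk;
                memset((char *)s + begin, c, len);
            }
        }
    }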

+1

Source: https://habr.com/ru/post/921368/
