Is this the correct and portable way to check if 2 c-lines overlap in memory?

It may not be the most effective way, but is it correct and portable?

int are_overlapping(const char *a, const char *b) { return (a + strlen(a) == b + strlen(b)); } 

To clarify: what I'm looking for is overlapping in memory , not in actual content. For example:

 const char a[] = "string"; const char b[] = "another string"; are_overlapping(a, b); // should return 0 are_overlapping(a, a + 3); // should return 1 
+44
c
Jul 09 '13 at 23:25
source share
5 answers

Yes, your code is correct. If two lines end at the sampling location, they by definition overlap - they have the same zero delimiter. Either both lines are the same, or one substring of the other.

Everything that concerns your program is a completely clear behavior, therefore, when adopting compilers that are compliant with standards, it should be completely portable.

The corresponding bit in the standard is from 6.5.9 Equality operators (main attention):

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at the beginning) or a function, both are pointers to one of the last elements of the same array object , or one is a pointer to one end of the end of one array object, and the other is a pointer to the beginning of another array object that occurs immediately after the first array object in the address space.

+33
Jul 09 '13 at 23:47 on
source share

Reflecting on zdan's comments on my previous post (which will probably be deleted soon), I came to the conclusion that endpoint verification is enough.

If there is any overlap, a null terminator will make the two lines no different. Let's look at some of the possibilities.

If you start with

 a 0x10000000 "Hello" and somehow add b 0x10000004 "World", 

you will have one word: HellWorld, since W will overwrite \ 0. They would end at the same endpoint.

If you somehow write to the same starting point:

 a 0x10000000 "Hello" and b 0x10000000 "Jupiter" 

You will have the word Jupiter and will have the same endpoint.

Is there a case where you can have the same endpoint and not have overlap? View.

 a = 0x1000000 "Four" and b = 0x1000004 "". 

It will also give overlap.

I can’t think that someday you will have an overlap where you will not have the corresponding endpoints - it is assumed that you write to the end of the line with zero completion.

So the short answer is: Yes, your check is enough.

+12
Jul 09 '13 at 23:47 on
source share

This probably does not apply to your use case, since your question is only about C-lines, but the code will not work if the data has embedded NUL bytes in the lines.

 char a[] = "abcd\0ABCD"; char *b = a + 5; 

In addition, your decision is straightforward and correct. It works because you use == to compare pointers and according to the standard (from C11 6.5.9 / 6)

Two pointers compare the same if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at the beginning) or a function, both are pointers to one of the last elements of the same array object , or one is a pointer to one end of the end of one array object, and the other is a pointer to the beginning of another array object that occurs immediately after the first array object in the address space.

However, relational operators are more strict (from C11 6.5.8 / 5):

When comparing two pointers, the result depends on the relative locations in the address space of the objects that it points to. If two pointers to types of objects point to the same object or both point one after the last element of the same array object, they compare equal ones. If the objects pointed to are members of the same aggregate object, pointers to structure elements announced later are compared more than pointers to elements declared earlier in the structure, and pointers to array elements with large index values ​​are compared more than pointers to elements of the same array with lower index values. All pointers to members of the same union object are compared the same way. If the expression P points to an element of an array object, and the expression Q points to the last element of the same array object, the expression of the pointer Q + 1 is compared to P. In all other cases, the behavior is undefined.

The last sentence is a kicker.

Some of them made an exception to the fact that your code can calculate the length of the overlap twice and tried to take precautions to avoid this. However, the reduction efficiency of this calculation is compared with an additional comparison of the iteration pointer, or includes undefined or implementation-defined behavior. Assuming you need a portable and compatible solution, the actual average gain is probably zero and not worth the effort.

+2
Jul 10 '13 at 2:35
source share

This solution is still the same worst performance, but optimized for hits - you don't need to parse both lines.

 char * temp_a = a; char * temp_b = b; while (*temp_a != '\0') { if (temp_a++ == b) return 1; } // check for b being an empty string if (temp_a == b) return 1; /* but if b was larger, we aren't done, so you have to try from b now */ while (*temp_b != '\0') { if (temp_b++ == a) return 1; } /* don't need the a==b check again here return 0; 

Apparently, only the single equality (unevenness) of the pointer is portable in C, so the following solutions are not portable - everything is lower than before I knew it.

Your solution is valid, but why calculate strlen on the second line? You know the start and end point of one line, just see if it is between them (inclusive). saves your pass through the second line - O (M + N) to O (M)

 char * lower_addr_string = a < b ? a : b char * higher_addr_string = a > b ? a : b length = strlen(lower_addr_string) return higher_addr_string >= lower_addr_string && higher_addr_string <= lower_addr_string + length; 

alternatively, do the parsing yourself.

 char * lower_addr_string = a < b ? a : b char * higher_addr_string = a > b ? a : b while(*lower_addr_string != '\0') { if (lower_addr_string == higher_addr_string) return 1; ++lower_addr_string; } /* check the last character */ if (lower_addr_string == higher_addr_string) return 1; return 0; 
+1
Jul 10 '13 at 1:38
source share

Yes, your check is correct, but, of course, it is not the most effective (if by "efficiency" you mean computational efficiency). The obvious intuitive inefficiency in your implementation is based on the fact that when the lines actually overlap, strlen calls will iterate over their common part twice.

A slightly different approach can be used for formal effectiveness.

 int are_overlapping(const char *a, const char *b) { if (a > b) /* or `(uintptr_t) a > (uintptr_t) b`, see note below! */ { const char *t = a; a = b; b = t; } while (a != b && *a != '\0') ++a; return a == b; } 

An important note about this version is that it performs a relational comparison of two pointers that do not guarantee pointing to the same array, which formally leads to undefined behavior. He will work in practice in a system with a flat memory model, but may attract criticism from a reviewer of pedantic code. To get around this problem formally, you can convert pointers to uintptr_t before doing relational comparisons. Thus, undefined behavior is converted to behavior defined by the implementation, with the correct semantics for our purposes in most (if not all) traditional implementations with a flat memory model.

This approach is free from the “double counting” problem: it analyzes the non-overlapping part of the string that is “earlier” in memory. Of course, in practice, the benefits of this approach may not exist. This will depend on the quality of your strlen implementation as well as on the properties of the actual input.

For example, in this situation

 const char *str = "Very very very long string, say 64K characters long......"; are_overlapped(str, str + 1); 

my version will detect overlap much faster than yours. My version will do this in 1 iteration of the loop, while your version will spend 2 * 64K iterations (assuming a naive strlen implementation).

If you decide to dive into the area of ​​questionable comparisons of pointers, the above idea can be redefined as

 int are_overlapping(const char *a, const char *b) { if (a > b) { const char *t = a; a = b; b = t; } return b <= a + strlen(a); } 

This implementation does not perform additional pointer comparisons at each iteration. The price we pay for this is that it always iterates to the end of one of the lines, rather than ending earlier. However, it is still more efficient than your implementation, as it only calls strlen once.

+1
Jul 10 '13 at 1:43
source share



All Articles