I found a fragment similar to this in some C++ code that I am preparing for a 64-bit port.
    int n;
    size_t pos, npos;
    while ((pos = find(ch, start)) != npos) {
        n++;  // overflows if the loop iterates too many times
        /* ... */
    }
While I seriously doubt that this would actually cause a problem, even in memory-intensive applications, it is worth looking at from a theoretical standpoint, because similar errors could surface that will cause problems. (Change n to short in the example above, and even small files can overflow the counter.)
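To make the parenthetical concrete, here is a minimal sketch of the same counting loop with the counter type as a template parameter. The helper name count_occurrences is hypothetical, std::string::find stands in for the find call in the fragment, and std::uint16_t stands in for a 16-bit short so that the wraparound is well defined rather than implementation-dependent.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: counts occurrences of ch in text, using a counter
// of type Counter. The counter is never compared with pos or npos, so no
// conversion between the narrow and wide types ever appears in the code.
template <typename Counter>
Counter count_occurrences(const std::string& text, char ch) {
    Counter n = 0;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = text.find(ch, start)) != std::string::npos) {
        start = pos + 1;  // continue searching after the current match
        ++n;              // silently wraps once Counter's range is exceeded
    }
    return n;
}
```

For a string of 70,000 matching characters, count_occurrences<std::size_t> returns 70000, while count_occurrences<std::uint16_t> returns 4464 (70000 - 65536). Nothing in the loop condition hints at the difference.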
Static analysis tools are useful, but they cannot detect this kind of error directly. (Not yet, anyway.) The counter n does not participate in the while expression at all, so this is not as simple as other loops (where a type conversion gives the error away). Any tool would have to determine that the loop executes more than 2^31 times, which means it would have to estimate how many times the expression (pos = find(ch, start)) != npos evaluates to true, and that is no small feat! Even if the tool could determine that the loop can execute more than 2^31 times (say, because it recognizes that the find function is working on a string), how could it know that the loop won't execute more than 2^64 times, overflowing a size_t value too?
It seems clear that a human eye is needed to conclusively identify and fix this kind of error, but are there patterns that give such an error away, so that it can be checked manually? What similar errors should I be watching for?
EDIT 1: Since the short, int and long types are inherently problematic, this kind of error could be found by examining every instance of those types. However, given their ubiquity in legacy C++ code, I'm not sure this is practical for a large piece of software. What else gives this error away? Is every while loop likely to exhibit some form of it? (for loops, of course, are not immune!) How bad is this kind of error if we are not dealing with 16-bit types such as short?
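As a rough illustration of what "examining every instance of these types" could look like, here is a purely textual sketch. The helper name flag_narrow_declarations and its regular expression are assumptions made for illustration, not an established tool; the pattern knows nothing about typedefs, macros, function parameters, or multi-line declarations, so both false positives and misses are to be expected.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based numbers of the lines that look
// like they declare a variable of type short, int, or long.
std::vector<std::size_t> flag_narrow_declarations(
        const std::vector<std::string>& lines) {
    // Type keyword, whitespace, an identifier, then '=', ';' or ','.
    static const std::regex decl(R"(\b(short|int|long)\s+[A-Za-z_]\w*\s*[=;,])");
    std::vector<std::size_t> flagged;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (std::regex_search(lines[i], decl)) {
            flagged.push_back(i + 1);
        }
    }
    return flagged;
}
```

Run over the lines of the first fragment, this flags the declaration of n but says nothing about whether n is ever incremented in a loop, which is why every hit would still need a manual look.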
EDIT 2: Here is another example showing how this error appears in a for loop.
    int i = 0;
    for (iter = c.begin(); iter != c.end(); iter++, i++) {
        /* ... */
    }
This is essentially the same problem: the loop counts with a variable that never interacts directly with a wider type. The variable can still overflow, but no compiler or tool detects a casting error. (Strictly speaking, there is none.)
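For comparison, here is a minimal sketch of the same loop with the counter's type taken from the container instead of being hard-coded as int. The function name count_steps is hypothetical, and the sketch assumes the container exposes the usual difference_type member, as the standard containers do.

```cpp
#include <vector>  // only needed for the usage shown below the block

// Hypothetical helper: walks the container exactly like the loop above,
// but the counter is as wide as the container's own difference type.
template <typename Container>
typename Container::difference_type count_steps(const Container& c) {
    typename Container::difference_type i = 0;
    for (auto iter = c.begin(); iter != c.end(); ++iter, ++i) {
        /* ... */
    }
    return i;
}
```

Calling count_steps on a std::vector<int> with five elements returns 5. The point of the design is only that the counter now grows with the platform: on a typical 64-bit target the difference type is 64 bits wide, so the counter cannot fall behind the iterator range it is tracking.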
EDIT 3: The code I'm working with is very large. (10-15 million lines of code for C++ alone.) It is infeasible to inspect all of it by hand, so I am interested in ways to identify this kind of problem automatically, even if that results in a high false positive rate.