A more accurate and elegant solution (which I called the Main solution ) is as follows:
Main decision
char *internalRepeat(char *s, int n, size_t total) { return (n > 0) ? strcat(internalRepeat(s, n - 1, total + strlen(s)), s) : strcpy(malloc(total + 1), ""); } char *repeat(char *s, int n) { return internalRepeat(s, n, 0); }
This is the beauty of recursion. The key to this solution uses recursion to gradually build the length of the result. The total
parameter does this (not including the NUL terminator). When the recursion completes, the result buffer is allocated once (including the NUL terminator), and then we use recursion to add each copy of s
to the result. The main solution works as follows:
- Returns a zero-length string for any number of repetitions of an empty string.
- Returns a zero-length string for zero or negative iterations of a non-empty string.
- Returns a string without zero length for a nonzero positive repeat number on a non-empty string.
If you create a program based on the above functions, the following statements:
printf("Repeat \"\" 0 times: [%s]\n", repeat("", 0)); printf("Repeat \"\" 3 times: [%s]\n", repeat("", 3)); printf("Repeat \"abcde\" 0 times: [%s]\n", repeat("abcde", 0)); printf("Repeat \"abcde\" 1 times: [%s]\n", repeat("abcde", 1)); printf("Repeat \"abcde\" 4 times: [%s]\n", repeat("abcde", 4));
will produce the following result:
Repeat "" 0 times: [] Repeat "" 3 times: [] Repeat "abcde" 0 times: [] Repeat "abcde" 1 times: [abcde] Repeat "abcde" 4 times: [abcdeabcdeabcdeabcde]
EDIT: an optimized solution. Read on if you are interested in optimization methods.
All other sentences here are mostly executed in O (n ^ 2) and allocate memory at each iteration. Although the Basic Solution is elegant, it uses only one malloc()
and accepts only two operators, surprisingly the Basic Solution also has an O (n ^ 2) runtime . This makes it very inefficient if the string s
long and means that the basic solution is no more efficient than any other sentence here.
Optimized Solution
The following is an optimal solution to this problem, which actually runs in O (n) :
char *internalRepeat(char *s, int n, size_t total, size_t len) { return (n > 0) ? strcpy(internalRepeat(s, n - 1, total, len), s) + len : strcpy(malloc(total + 1), ""); } char *repeat(char *s, int n) { int len = strlen(s); return internalRepeat(s, n, n * len, len) - (n * len); }
As you can see, it now has three statements and uses another len
parameter to cache length s
. It recursively uses len
to calculate the position in the result buffer where the n
copy of s
will be placed, so we can replace strcat()
with strcpy()
each time s
to the result. This gives the actual runtime of O (n), not O (n ^ 2).
What is the difference between basic and optimized solutions?
All other solutions used strcat()
at least n
times in line s
to add n
copies of s
to the result. The problem here is that the strcat()
implementation hides inefficiency. Internally strcat()
can be thought of as:
strcat = strlen + strcpy
ie, when adding, you first need to find the end of the line you are adding before you can make the application. This hidden overhead means that actually when creating n
copies of a string, it is necessary to check the length of n
and n
physical copy operations. However, the real problem is that for every copy of s
that we add, our result increases. This means that each subsequent length check inside strcat()
on the result also increases. If we compare the two solutions using the βnumber of times we need to scan or copy s
β as our base for comparison, we can see where the difference lies in the two solutions.
For n
copies of string s
main solution is as follows:
strlen's/iteration: 2 strcpy's/iteration: 1 Iteration | Init | 1 | 2 | 3 | 4 | ... | n | Total | ----------+------+---+---+---+---+-----+---+------------+ Scan "s" | 0 | 1 | 2 | 3 | 4 | ... | n | (n+1)(n/2) | Copy "s" | 0 | 1 | 1 | 1 | 1 | ... | 1 | n |
whereas the Optimized solution works as follows:
strlen's/iteration: 0 strcpy's/iteration: 1 Iteration | Init | 1 | 2 | 3 | 4 | ... | n | Total | ----------+------+---+---+---+---+-----+---+------------+ Scan "s" | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | Copy "s" | 0 | 1 | 1 | 1 | 1 | ... | 1 | n |
As you can see from the table, the main solution performs (n ^ 2 + n) / 2 scans of our string due to the built-in length check in strcat()
, whereas the Optimized solution always does (n + 1) scans. This is why the main solution (and any other strcat()
runs in O (n ^ 2), while the Optimized solution runs in O (n).
How does O (n) compare with O (n ^ 2) in real life?
Runtime makes a huge difference when using large strings. As an example, let's take the s
1 MB line that we want to create 1000 copies (== 1 GB). If we have a 1 GHz processor that can scan or copy 1 byte / cycle, then 1000 copies of s
will be generated as follows:
Note: n is taken from the above performance tables and represents one s scan.
Basic: (n + 1) * (n / 2) + n = (n ^ 2) / 2 + (3n / 2) = (10^3 ^ 2) / 2 + (3 * 10^3) / 2 = (5 * 10^5) + (1.5 * 10^2) = ~(5 * 10^5) (scans of "s") = ~(5 * 10^5 * 10^6) (bytes scanned/copied) = ~500 seconds (@1GHz, 8 mins 20 secs). Optimised: (n + 1) = 10^3 + 1 = ~10^3 (scans of "s") = ~10^3 * 10^6 (bytes scanned/copied) = 1 second (@1Ghz)
As you can see, an optimized solution that completes almost instantly destroys the main solution, which takes almost 10 minutes. However, if you think that making the string s
less will help, the following result will horrify you. Again, on a machine with a frequency of 1 GHz, which processes 1 byte / cycle, we take s
as 1 KB (1,000 times less) and make 1,000,000 copies (total == 1 GB, as before). This gives:
Basic: (n + 1) * (n / 2) + n = (n ^ 2) / 2 + (3n / 2) = (10^6 ^ 2) / 2 + (3 * 10^6) / 2 = (5 * 10^11) + (1.5 * 10^5) = ~(5 * 10^11) (scans of "s") = ~(5 * 10^11 * 10^3) (bytes scanned/copied) = ~50,000 seconds (@1GHz, 833 mins) = 13hrs, 53mins, 20 secs Optimised: (n + 1) = 10^6 + 1 = ~10^6 (scans of "s") = ~10^6 * 10^3 (bytes scanned/copied) = 1 second (@1Ghz)
This is really an amazing difference. The optimized solution is executed at the same time as before, since the total amount of recorded data is the same. However, Basic Solution comes in half a day , creating the result. This is the difference in runtime between O (n) and O (n ^ 2).