Compilation problem with split / splitter

Question

Here is some simple code:

    import std.algorithm;
    import std.array;
    import std.file;

    void main(string[] args)
    {
        auto t = args[1].readText()
            .splitter('\n')
            .split("---");
    }

It looks like it should work, but it does not compile. DMD 2.068.2 fails with this error:

    Error: template std.algorithm.iteration.splitter cannot deduce function from argument types !()(Result, string), candidates are:
    ...
    Error: template instance std.array.split!(Result, string) error instantiating

It compiles if I insert .array before .split.

Am I missing something? Or is this a bug? I did a brief search in the bug tracker but found nothing.

d phobos

sigod Oct 28 '15 at 0:03

1 answer

Adam D. Ruppe · Accepted Answer · 2015-10-28T01:57:41+0000

Bottom line: problems like this can often be fixed by sticking an .array call before the offending function. This gives it a buffer with enough functionality to run the algorithm.

What follows is the reasoning behind the library's design and a couple of other ideas you can use:

The reason this doesn't compile comes down to the philosophy behind std.algorithm and ranges: they are kept as cheap as possible so that cost decisions are pushed up to the top level.

In std.algorithm (and in most well-written ranges and range-consuming algorithms), template constraints reject any input that does not offer what the algorithm needs for free. Similarly, transforming ranges such as filter, splitter, etc. return only the capabilities they can offer at minimal cost.

By rejecting such input at compile time, they force the programmer to decide at the highest level how to pay those costs. You can rewrite the function to work differently, you can buffer the data yourself in various ways to pay the cost up front, or you can do whatever else you find that works.

So here is what happens with your code: readText returns an array, which is an almost full-featured range. (Because it returns a string made of UTF-8, it does not actually offer random access as far as Phobos is concerned, since finding a Unicode code point in a variable-length UTF-8 sequence requires scanning all of it. Confusingly, the language itself sees it differently; search the D forums for "autodecode" if you want to know more. Scanning is not minimal cost, so Phobos will never attempt it unless you specifically ask for it.)

In any case, readText returns a range with plenty of features, including savability, which splitter needs. Why does splitter need saving? Consider the result it promises: a range of strings, each starting at the last split point and continuing to the next split point. What does the implementation look like when you write it for the most generic range it can handle cheaply?

Something along these lines: first, save your starting position so you can return it later. Then, using popFront, advance through the input until you find the split point. When you do, return the saved range up to the split point. Then popFront past the split point and repeat the process until you have consumed the whole thing ( while(!input.empty) ).
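The steps above can be sketched roughly as follows. This is only an illustration of the idea, not the actual Phobos implementation: splitForward is a hypothetical helper name, it collects the pieces eagerly instead of lazily, and it does not emit a trailing empty piece when the input ends with a separator.

```d
import std.algorithm : equal, filter;
import std.range : take;
import std.range.primitives;

// Hypothetical helper (not part of Phobos): splits any forward range on
// elements equal to `sep`, using only save, front, popFront and empty.
auto splitForward(R, E)(R input, E sep)
if (isForwardRange!R)
{
    alias Piece = typeof(input.save.take(0));
    Piece[] pieces;
    while (!input.empty)
    {
        auto start = input.save;      // remember where this piece begins
        size_t count = 0;
        while (!input.empty && input.front != sep)
        {
            input.popFront();         // scan forward to the split point
            ++count;
        }
        pieces ~= start.take(count);  // saved range, cut off at the split point
        if (!input.empty)
            input.popFront();         // step over the separator itself
    }
    return pieces;
}

void main()
{
    // A filter result is a forward range without slicing or indexing.
    auto src = [1, 2, 0, 3, 4].filter!(a => a != 4);
    auto pieces = splitForward(src, 0);
    assert(pieces.length == 2);
    assert(pieces[0].equal([1, 2]));
    assert(pieces[1].equal([3]));
}
```

Note that each returned piece has to be walked again by the caller, since take only wraps the saved range; that second pass is the extra cost discussed further down.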

So, since the implementation of splitter requires the ability to save the starting point, it requires at least a forward range (which is just a savable range). Andrei now feels that naming things like this is a bit silly, because there are so many names, but back when he was writing std.algorithm he still believed in giving them all names.

Not all ranges are forward ranges! Arrays are: saving them is as easy as returning a slice from the current position. Many numeric algorithms are too: saving them just means keeping a copy of the current state. Most transformation ranges are savable if the range they transform is savable; again, all they need to do is return the current state.

... as I write this, in fact, I think your example should be savable. And indeed, there is an overload that takes a predicate and compiles!

http://dlang.org/phobos/std_algorithm_iteration.html#.splitter.3

    import std.algorithm;
    import std.array;
    import std.stdio;

    void main(string[] args)
    {
        auto t = "foo\n---\nbar"
            .splitter('\n')
            .filter!(e => e.length)
            .splitter!(a => a == "---");
        writeln(t);
    }

Output: [["foo"], ["bar"]]

Yes, it compiles and splits on lines equal to a particular value. The other overload, .splitter("---"), fails to compile because that overload requires a sliceable range (or a narrow string, which Phobos refuses to slice generically but knows it actually can slice anyway, so the function special-cases it. You see this throughout the library.)

But why does it require slicing instead of just saving? Honestly, I don't know. Maybe I'm missing something too, but the existence of an overload that works tells me that my conception of the algorithm is correct; it can be done this way. I do believe slicing is a bit cheaper, but the save version is cheap enough too: you would keep a count of how many items you popped to reach the split point, then return saved.take(that_count). Maybe that is the reason right there: you would iterate over the elements twice, once inside the algorithm and again outside, and the library considers that costly enough to push the decision up a level. (The predicate version sidesteps this by making your function do the scanning, so Phobos no longer considers it its problem; you know what your own function is doing.)

I can see the logic in that. I could go either way on it, since the decision to actually run over the elements again still lives outside, but I do understand why it might be undesirable to do that without some thought.
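A small compile-time check can make the difference between the two overloads visible. This is a sketch that assumes the Phobos constraints are as described above (the predicate overload needs a forward range; the separator overload needs slicing or a narrow string); the variable names are made up for the illustration.

```d
import std.algorithm : splitter;
import std.range.primitives : hasLength, hasSlicing, isForwardRange;

void main()
{
    auto lines = "foo\n---\nbar".splitter('\n');
    alias Lines = typeof(lines);

    // The result of splitter can be saved, but not sliced or measured.
    static assert(isForwardRange!Lines);
    static assert(!hasSlicing!Lines);
    static assert(!hasLength!Lines);

    // The predicate overload only asks for a forward range, so it compiles.
    static assert(__traits(compiles, lines.splitter!(a => a == "---")));

    // The separator overload asks for slicing, so it is rejected.
    static assert(!__traits(compiles, lines.splitter("---")));
}
```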

Finally, why doesn't splitter offer indexing or slicing on its output? Why doesn't filter offer it either? Why DOES map offer it?

Well, this again comes down to the low-cost philosophy. map can offer it (provided its input does), because map does not actually change the number of elements: the first element in the output is also the first element of the input, just with some function run on it. The same goes for the last element, and for all the others in between.

filter changes this. Filtering the odd numbers out of [1,2,3] yields just [2]: the length is different, and 2 is now at the beginning instead of the middle. But you cannot know where it is until you actually apply the filter; you cannot jump around without buffering the result.

splitter is similar to filter. It changes the placement of the elements, and the algorithm does not know where the splits fall until it actually runs through the elements. It can tell as you iterate, but not ahead of iteration, so indexing would be O(n), which is too expensive. Indexing is supposed to be extremely cheap.

Anyway, we now understand why the principle is there: to let you, the end programmer, make the decisions about costly things like buffering (which requires more memory than is free) or additional iteration (which requires more CPU time than is free to the algorithm). We also have some idea of why splitter needs what it needs from thinking about its implementation. So we can look at ways to satisfy the algorithm: either use the version that eats a few more processor cycles and write it with our custom comparison function (see the example above), or provide slicing somehow. The most straightforward way is to buffer the result in an array.
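The difference between map and filter described above can be checked directly with the range traits from std.range.primitives. A minimal sketch, using the same [1,2,3] input:

```d
import std.algorithm : equal, filter, map;
import std.range.primitives : hasLength, isRandomAccessRange;

void main()
{
    // map keeps element positions, so it forwards indexing and length
    // from the underlying array.
    auto mapped = [1, 2, 3].map!(a => a * 10);
    static assert(isRandomAccessRange!(typeof(mapped)));
    static assert(hasLength!(typeof(mapped)));
    assert(mapped[1] == 20);
    assert(mapped.length == 3);

    // filter moves elements around, so it offers neither; the result
    // can only be walked from the front.
    auto evens = [1, 2, 3].filter!(a => a % 2 == 0);
    static assert(!isRandomAccessRange!(typeof(evens)));
    static assert(!hasLength!(typeof(evens)));
    assert(evens.equal([2]));
}
```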

    import std.algorithm;
    import std.array;
    import std.file;

    void main(string[] args)
    {
        auto t = args[1].readText()
            .splitter('\n')
            .array // add an explicit buffering call, understanding this will cost us some memory and cpu time
            .split("---");
    }

You could also buffer it locally yourself to reduce the cost of the allocation, but however you do it, the cost has to be paid somewhere. Phobos prefers that you, the programmer who understands the needs of your program and whether you are willing to pay these costs, make that decision, instead of paying it on your behalf without telling you.
