API Design for Array Functions

I am developing an API in Java for a set of numerical algorithms that operate on arrays of doubles (for real-time financial statistics, as it happens). For performance reasons, the API must work with primitive arrays, so List<Double> and the like are not an option.

A typical use case might be an algorithm object that takes two input arrays and returns an output array containing the result computed from the two inputs.

I would like to establish consistent conventions on how array parameters are used in the API, in particular:

  • Should I include offsets in all functions so that users can act on parts of a larger array, for example someFunction(double[] input, int inputOffset, int length)?
  • If a function needs both input and output parameters, should the input or the output come first in the parameter list?
  • Should the caller allocate the output array and pass it as a parameter (so that it can potentially be reused), or should the function create and return a new output array on each call?

The goal is to strike a balance between efficiency, simplicity for API users, and consistency, both within the API and with established conventions.

Clearly there are many options, so what is the best API design?

+6
6 answers

That really sounds like three questions, so here are my opinions on each.

Of course, this is very subjective, so your mileage may vary:

  • Yes. Always include offset and length. If most uses of a particular function do not need these parameters, overload the function so that offset and length are not required.

  • For this, I would follow the convention used by System.arraycopy:

    arraycopy(Object src, int srcPos, Object dest, int destPos, int length)

  • The performance difference here will be negligible unless the caller invokes your functions repeatedly. For a one-off call there should be no difference. If the functions are called repeatedly, have the caller pass in an already allocated array.
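The three points above can be sketched together. This is only an illustration: the class name ElementwiseAdd and the element-wise addition are hypothetical and do not come from the question; only the parameter ordering is borrowed from System.arraycopy.

```java
import java.util.Arrays;

// Hypothetical utility class; the name and the operation are illustrative only.
final class ElementwiseAdd {
    private ElementwiseAdd() {}

    /**
     * Full form, ordered like System.arraycopy: each source with its
     * position first, then the destination with its position, then length.
     * The caller owns and allocates dest.
     */
    static void add(double[] a, int aPos, double[] b, int bPos,
                    double[] dest, int destPos, int length) {
        for (int i = 0; i < length; i++) {
            dest[destPos + i] = a[aPos + i] + b[bPos + i];
        }
    }

    /** Convenience overload: whole arrays, starting at index 0. */
    static void add(double[] a, double[] b, double[] dest) {
        add(a, 0, b, 0, dest, 0, a.length);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0, 4.0};
        double[] b = {10.0, 20.0, 30.0, 40.0};
        double[] out = new double[4];

        add(a, b, out);             // whole arrays
        System.out.println(Arrays.toString(out));

        add(a, 2, b, 0, out, 0, 2); // a[2..3] + b[0..1] written to out[0..1]
        System.out.println(Arrays.toString(out));
    }
}
```

The short overload simply delegates to the full one, so the behavior is defined in a single place.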

+2
  • If you do this, also provide the default overload (starts at 0, full length).
  • I think most users expect the output to come second. However, if you can use varargs, that might change your mind.
  • I like having the caller pass in the output array, but with the option of passing null, in which case the method allocates it.

Elaborating on the varargs comment: let's say you have a method that adds two arrays. If you put the output array as the 1st argument and the 2 input arrays at the end, it is trivial to extend the method to add N arrays.
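A minimal sketch of that idea, assuming a hypothetical class VarargsSum whose sum method takes the output array first and any number of inputs after it:

```java
import java.util.Arrays;

// Hypothetical class; shows output-first ordering with a varargs tail.
final class VarargsSum {
    private VarargsSum() {}

    /**
     * Adds any number of input arrays element-wise into out and returns out.
     * Assumes every input is at least as long as out, and that out is not
     * itself one of the inputs (it is zeroed before accumulation).
     */
    static double[] sum(double[] out, double[]... inputs) {
        Arrays.fill(out, 0.0);
        for (double[] in : inputs) {
            for (int i = 0; i < out.length; i++) {
                out[i] += in[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] out = new double[2];
        // Two inputs or three: the same method handles both.
        sum(out, new double[] {1.0, 2.0}, new double[] {3.0, 4.0});
        System.out.println(Arrays.toString(out));
        sum(out, new double[] {1.0, 2.0}, new double[] {3.0, 4.0}, new double[] {5.0, 6.0});
        System.out.println(Arrays.toString(out));
    }
}
```

Java only allows a varargs parameter in the last position, which is why the output has to come first in this style.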

Elaborating on #3: allowing callers to pass in the output array is sometimes more efficient. And even if the gain is negligible, your users dealing with primitive arrays have probably come from C or FORTRAN, will assume the gain is large, and will complain if you do not let them be “efficient” :-)
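The "null means allocate" option from the third bullet could look roughly like this. The class NullableOutput and the multiply operation are made up for the example.

```java
import java.util.Arrays;

// Hypothetical class; illustrates an optional, caller-supplied output array.
final class NullableOutput {
    private NullableOutput() {}

    /**
     * Multiplies a and b element-wise. If out is null, a new array is
     * allocated; otherwise the result is written into out. The array that
     * holds the result is returned in both cases.
     */
    static double[] multiply(double[] a, double[] b, double[] out) {
        if (out == null) {
            out = new double[a.length];
        }
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] * b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {2.0, 3.0};

        double[] fresh = multiply(a, b, null);      // method allocates
        double[] buffer = new double[2];
        double[] reused = multiply(a, b, buffer);   // caller's buffer is filled

        System.out.println(Arrays.toString(fresh));
        System.out.println(reused == buffer);
    }
}
```

Returning the array in both cases lets callers write the same code whether or not they supplied a buffer.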

+2

Assuming you are working with arrays small enough to be allocated on the stack or in Eden, allocation is very fast. So there is no harm in having functions allocate their own arrays to return results, and it is a big win for readability.

I would suggest starting with functions that operate on entire arrays, and introducing variants that operate on just a slice of an array only if you find they are useful.

+2

When designing an API that exposes many functions, the main thing is its internal consistency. Everything else comes a distant second.

The decision on whether to pass index/length pairs depends on how you expect the API to be used. If you expect users to write a series of method calls that read from or write to different segments of the same array, as with System.arraycopy, you need index/length pairs. Otherwise, it is overkill.

Whether input or output comes first is your decision, but once you make it, stick to it across all methods with similar signatures.

Passing in the output buffer is a reasonable option only if the client actually reuses the buffer. Otherwise, creating and maintaining an additional set of API methods is wasted effort. Of course, this decision is closely tied to your choice about index/length pairs: if you take an index and a length, you should also take the output buffer.
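To make the reuse case concrete, here is a sketch in which a client walks over consecutive segments of one array and reuses a single scratch buffer. The class WindowedDiff, its diff method, and the "largest move per window" calculation are all assumptions invented for the example.

```java
import java.util.Arrays;

// Hypothetical class; shows index/length pairs combined with a reused buffer.
final class WindowedDiff {
    private WindowedDiff() {}

    /**
     * Writes the length - 1 successive differences of
     * src[srcPos .. srcPos + length - 1] into dest, starting at destPos.
     */
    static void diff(double[] src, int srcPos, double[] dest, int destPos, int length) {
        for (int i = 0; i < length - 1; i++) {
            dest[destPos + i] = src[srcPos + i + 1] - src[srcPos + i];
        }
    }

    /** Largest absolute single-step move in each full window of the input. */
    static double[] maxMovePerWindow(double[] prices, int window) {
        int windows = prices.length / window;
        double[] result = new double[windows];
        double[] scratch = new double[window - 1]; // allocated once, reused per window
        for (int w = 0; w < windows; w++) {
            diff(prices, w * window, scratch, 0, window);
            double max = 0.0;
            for (double d : scratch) {
                max = Math.max(max, Math.abs(d));
            }
            result[w] = max;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] prices = {1.0, 3.0, 2.0, 10.0, 10.0, 15.0};
        System.out.println(Arrays.toString(maxMovePerWindow(prices, 3)));
    }
}
```

Without the offset and the caller-supplied buffer, each window would need a copied sub-array and a freshly allocated result.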

+1

I think API design is largely subjective and/or should be strongly influenced by the API's use cases. The use cases of your API, in turn, depend entirely on the client code.

Having said all this, I would personally take advantage of method overloading and go with the following structure:

Method with all parameters:

void someFunction(int[] input1, int[] input2, int offset, int length, int[] output)

This is the main function. All the other functions simply call this one with the appropriate parameters.

int[] someFunction(int[] input1, int[] input2, int offset, int length)

This calls the first function, but allocates and returns the output array on behalf of the caller.

void someFunction(int[] input1, int[] input2, int[] output)

int[] someFunction(int[] input1, int[] input2)

Note that the general strategy is to make the parameter list shorter by eliminating the “optional” parameters.

In general, I try to avoid changing a method's behavior depending on whether a parameter (for example, the output array) is null. That can make bugs hard to catch. Therefore, I prefer two distinct calling styles: one where the output parameter is provided (and required), and one where the method returns its output.
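The four signatures above could be wired together roughly as follows. The method bodies are assumptions: the answer does not say what someFunction computes, so element-wise addition stands in for it, and the result is assumed to be written starting at index 0 of the output. The class name OverloadSketch is also hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class; one primary method plus three delegating overloads.
final class OverloadSketch {
    private OverloadSketch() {}

    /** Primary method: every parameter is explicit and output is required. */
    static void someFunction(int[] input1, int[] input2, int offset, int length, int[] output) {
        Objects.requireNonNull(output, "output"); // null is an error, not a mode switch
        for (int i = 0; i < length; i++) {
            // Placeholder computation (assumption): element-wise addition.
            output[i] = input1[offset + i] + input2[offset + i];
        }
    }

    /** Allocates and returns the output on behalf of the caller. */
    static int[] someFunction(int[] input1, int[] input2, int offset, int length) {
        int[] output = new int[length];
        someFunction(input1, input2, offset, length, output);
        return output;
    }

    /** Whole arrays, caller-supplied output. */
    static void someFunction(int[] input1, int[] input2, int[] output) {
        someFunction(input1, input2, 0, input1.length, output);
    }

    /** Whole arrays, allocated output. */
    static int[] someFunction(int[] input1, int[] input2) {
        return someFunction(input1, input2, 0, input1.length);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {10, 20, 30};
        System.out.println(Arrays.toString(someFunction(a, b)));
        System.out.println(Arrays.toString(someFunction(a, b, 1, 2)));
    }
}
```

Because a missing output is rejected with Objects.requireNonNull rather than silently allocated, a caller who forgets to create the buffer fails fast at the call site.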

+1

I would use a List<Double>, and have the methods return the result as a new List:

 public List<Double> someFunction(List<Double> input) 
0

Source: https://habr.com/ru/post/905880/