Good practice when manipulating data in Java

Question

Is it wrong to modify the data in place, for example:

Sorter.mergeSort(testData); //(testData is now sorted)

Or do I need to create a copy of the data, sort the copy, and return it, like this:

  sortedData = Sorter.mergeSort(testData); // (sortedData is now sorted and testData remains unsorted)?

I have several sorting methods, and I want them to be consistent in how they handle their input. My insertionSort method can work directly on the unsorted data. If I want to leave the unsorted data intact, however, I have to create a copy of it inside insertionSort, sort the copy, and return that (which seems redundant). My mergeSort method, on the other hand, has to create a copy of the unsorted data anyway, so I ended up with a workaround that also seems redundant, since the helper already returns a new sorted list:

  List<Comparable> sorted = mergeSortHelper(target);
  target.clear();
  target.addAll(sorted);

Please let me know which one is better, thanks!

+6

java sorting

user2792010 Sep 18 '13 at 15:17

5 answers

Mikefhay · Answer 1 · 2013-09-18T15:23:10+0000

It depends on whether you are optimizing for performance or for functional purity. Functional purity is usually not emphasized in Java; for example, Collections.sort sorts the list that you give it (even though it is implemented by copying the elements into an array first).

I would optimize for performance here, since that looks more like typical Java, and anyone who wants to keep the original can always copy the collection first, for example Sorter.mergeSort(new ArrayList<>(testData));
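To illustrate the point about copying first, here is a minimal sketch using Collections.sort from the standard library. The class name InPlaceSortDemo and the helper sortedCopy are hypothetical names chosen for this example; they are not part of the question's Sorter class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InPlaceSortDemo {
    /** Hypothetical helper: returns a sorted copy and leaves the input untouched. */
    static <T extends Comparable<? super T>> List<T> sortedCopy(List<T> input) {
        List<T> copy = new ArrayList<>(input); // copy the elements first
        Collections.sort(copy);                // then sort the copy in place
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> testData = new ArrayList<>(Arrays.asList(3, 1, 2));

        List<Integer> sorted = sortedCopy(testData);
        System.out.println(testData); // [3, 1, 2]
        System.out.println(sorted);   // [1, 2, 3]

        Collections.sort(testData);   // in-place: the caller's list is reordered
        System.out.println(testData); // [1, 2, 3]
    }
}
```

With an in-place sort as the primitive, the copying behaviour costs the caller one extra line, whereas a copying primitive cannot be turned back into an allocation-free in-place sort.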

StuPointerException · Answer 2 · 2013-09-18T15:26:27+0000

Best practice is to be consistent.

Personally, I prefer that my methods do not modify their input parameters, since modification may not be acceptable in every situation (you force the caller to create a copy if they need to keep the original order).

That said, there are clear performance advantages to modifying the input (especially for large lists), so this may be appropriate for your application.

As long as the behaviour is understood by the end user, you are covered either way!

Paul · Answer 3 · 2013-09-18T16:10:07+0000

In Java, I usually provide both options (when writing reusable utility methods):

  /** Return a sorted copy of the data from col. */
  public <T extends Comparable<T>> List<T> mergeSort(Collection<T> col);

  /** Sort the data in col in place. */
  public <T extends Comparable<T>> void mergeSortIn(List<T> col);

I am making some assumptions about signatures and types here. That said, the Java norm is (or at least was; see * below) to mutate state. This is often dangerous, especially across API boundaries, for example when your library modifies a collection passed in by its "client" code. Minimizing state overall, and mutable state in particular, is often a sign of a well-designed application or library.
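A minimal sketch of how the two variants could sit side by side, assuming static methods on the question's Sorter class and a straightforward top-down merge sort. The private merge helper and the bounded wildcard on Comparable are assumptions made for this example, not something taken from the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class Sorter {
    /** Return a sorted copy of the data from col; col itself is not modified. */
    public static <T extends Comparable<? super T>> List<T> mergeSort(Collection<T> col) {
        List<T> items = new ArrayList<>(col);
        if (items.size() <= 1) {
            return items;
        }
        int mid = items.size() / 2;
        List<T> left = mergeSort(items.subList(0, mid));
        List<T> right = mergeSort(items.subList(mid, items.size()));
        return merge(left, right);
    }

    /** Sort the data in col in place by delegating to the copying variant. */
    public static <T extends Comparable<? super T>> void mergeSortIn(List<T> col) {
        List<T> sorted = mergeSort(col); // mergeSort copies, so clearing col is safe
        col.clear();
        col.addAll(sorted);
    }

    /** Hypothetical helper: merges two already sorted lists into a new list. */
    private static <T extends Comparable<? super T>> List<T> merge(List<T> left, List<T> right) {
        List<T> out = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            // "<=" takes from the left first on ties, which keeps the sort stable
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<Integer> testData = new ArrayList<>(Arrays.asList(4, 2, 7, 1));
        List<Integer> sortedData = mergeSort(testData);
        System.out.println(testData);   // [4, 2, 7, 1]
        System.out.println(sortedData); // [1, 2, 4, 7]

        mergeSortIn(testData);
        System.out.println(testData);   // [1, 2, 4, 7]
    }
}
```

Because merge sort needs auxiliary storage anyway, writing the copying variant first and layering the in-place variant on top keeps the clear/addAll step in one documented place. Note that mergeSortIn requires a modifiable list.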

It looks like you want to reuse the same test data. To do that, I would write a method that creates the test data and returns it. That way, if you need the same test data again in another test (i.e., to test the mergeSort() and insertionSort() implementations on the same data), you simply create and return it again. This is what I usually do when writing unit tests (e.g. in JUnit).
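The factory-method idea can be sketched as follows. The class name SortTestData, the method createTestData, and the fixture values are all hypothetical; Collections.sort stands in here for whichever in-place sort is under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortTestData {
    /** Builds a fresh, mutable, unsorted list on every call (hypothetical fixture values). */
    static List<Integer> createTestData() {
        return new ArrayList<>(Arrays.asList(5, 3, 8, 1, 9, 2));
    }

    public static void main(String[] args) {
        List<Integer> first = createTestData();
        Collections.sort(first); // stand-in for an in-place sort under test

        List<Integer> second = createTestData(); // unaffected by the sort above
        System.out.println(first);  // [1, 2, 3, 5, 8, 9]
        System.out.println(second); // [5, 3, 8, 1, 9, 2]
    }
}
```

Each test gets its own list, so it no longer matters whether the sort under test mutates its argument.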

In any case, if your code is a library class / method for use by other people, you should clearly document its behavior.

As an aside: in "real" code, there should be no reason to reveal that merge sort is the implementation used. The caller should care about what the method does, not how it does it, so the name would usually not be mergeSort(), insertionSort(), etc.

(*) In some newer JVM languages, there has been a conscious move away from mutable data. Clojure's core data structures are immutable and the language is strongly functional (at least in normal single-threaded application development). Scala provides a parallel set of collection libraries that do not change the state of collections. This has great advantages in multi-threaded, multi-processor applications, and it is not as expensive as one might expect thanks to the smart algorithms used by these collections.

TheLostMind · Answer 4 · 2013-09-18T15:36:43+0000

In your particular case, modifying the "actual" data is more efficient. You are sorting the data, and it has been observed that working with sorted data is more efficient than working with unsorted data. So I do not see why you would need to keep the unsorted data. See: "Why is a sorted array faster to process than an unsorted array?"

Philipp sander · Answer 5 · 2013-09-18T15:42:22+0000

Methods should modify the object they are given, like Arrays#sort.

Immutable objects (like String), however, can only return "new" objects, like String#replace.
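A short sketch of the contrast between the two standard-library calls mentioned above. The class name MutableVsImmutable is a hypothetical name for this example.

```java
import java.util.Arrays;

public class MutableVsImmutable {
    public static void main(String[] args) {
        int[] numbers = {3, 1, 2};
        Arrays.sort(numbers);                         // returns void and reorders the array itself
        System.out.println(Arrays.toString(numbers)); // [1, 2, 3]

        String original = "unsorted";
        String replaced = original.replace("un", ""); // returns a new String
        System.out.println(original);                 // unsorted
        System.out.println(replaced);                 // sorted
    }
}
```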
