Fast loading of elements into an array / list with a fixed index without duplicates

My requirement is to inject rows into the array that are not in the array. I also need to maintain fixed indexes, as this array will be used with a different data structure with a one-to-one relationship with each index. I am currently using the ArrayList class and checking with the indexOf () method to check if it exists first, if not, add it to the list using the add () method with one argument. I am not familiar with classes in java and therefore I can’t understand how I can implement it using HashMap or something else (trie or else), which will speed up the loading process.

Does sequential search indexOf () in ArrayList ? My task is to reduce processing time when loading words into an array without inserting duplicates and maintain a fixed index of elements. If the word under test is already in the array, then the index into which it is already inserted is required, since this index is necessary for indexing into some other structure and performing some processing. Any suggestions for improving this process?

UPDATE

There is an array, I have some documents, from where I need to scan every word and find unique words in the documents. But also I need to count the number of duplicates. In other words, I need to calculate the number of frequencies of unique terms found in documents. I support ArrayList<Integer[]> time frequency (number of terms x number of documents). I extract one word and then check if it is in the list of words using the indexOf () method. If it is not in the list of words, I insert the word in the list and select a new line in the 2d array ( Array<Integer[]> ), and then set the counter of the term element in the 2d array to 1. If the word is already in the word array, then I I use the index of the word in the array for indexing in the matrix row Array<Integer[]> and use the current number of the processed document to access the cell and increase the score.

My question is to reduce the processing time of indexOf () for every word that I am currently using. I need to get the index of a word in an array of words, if it is already there, and if it is not there, I need to insert it into the array dynamically.

Code example

 import java.io.*; import java.util.ArrayList; import static java.lang.Math.log; class DocumentRepresentation { private String dirPath; private ArrayList<String> fileNameVector; private ArrayList<String> termVector; private ArrayList<Integer[]> tf; /* store it in natural 2d array */ private Integer df[]; /* do normal 1d array */ private Double idf[]; /* do normal 1d array */ private Double tfIdf[][]; /* do normal 2d array */ DocumentRepresentation (String dirPath) { this.dirPath = dirPath; fileNameVector = new ArrayList<String> (); termVector = new ArrayList<String> (); tf = new ArrayList<Integer[]> (); } /* Later sepatere the internal works */ public int start () { /* Load the files, and populate the fileNameVector string */ File fileDir = new File (dirPath); int fileCount = 0; int index; if (fileDir.isDirectory () == false) { return -1; } File fileList[] = fileDir.listFiles (); for (int i=0; i<fileList.length; i++) { if (fileList[i].isFile () == true) { fileNameVector.add (fileList[i].getName ()); // System.out.print ("File Name " + (i + 1) + ": " + fileList[i].getName () + "\n"); } } fileCount = fileNameVector.size (); for (int i=0;i<fileNameVector.size (); i++) { System.out.print ("Name " + (i+1) + ": " + fileNameVector.get (i) + "\n"); } /* Bind the files with a buffered reader */ BufferedReader fileReaderVector[] = new BufferedReader [fileCount]; for (int i=0; i<fileCount; i++) { try { fileReaderVector[i] = new BufferedReader (new FileReader (fileList[i])); } /* Not handled */ catch (FileNotFoundException e) { System.out.println (e); } } /* Scan the term frequencies in the tf 2d array */ for (int i=0; i<fileCount; i++) { String line; try { /*** THIS IS THE PLACE OF MY QUESTION **/ while ((line = fileReaderVector[i].readLine ()) != null) { String words[] = line.split ("[\\W]"); for (int j=0;j<words.length;j++) { if ((index = termVector.indexOf (words[j])) != -1) { tf.get (index)[i]++; /* increase the tf count */ } else { termVector.add (words[j]); Integer temp[] = new Integer [fileCount]; for (int k=0; k<fileCount; k++) { temp[k] = new Integer (0); } temp[i] = 1; tf.add (temp); index = termVector.indexOf (words[j]); } System.out.println (words[j]); } } } /* Not handled */ catch (IOException e) { System.out.println (e); } } return 0; } } class DocumentRepresentationTest { public static void main (String args[]) { DocumentRepresentation docSet = new DocumentRepresentation (args[0]); docSet.start (); System.out.print ("\n"); } } 

Note: code is disabled to focus on the issue.

+4
source share
3 answers

LinkedHashMap can meet all your requirements right away, with good performance features.

Keys will be your positions, and values ​​will be indices. If you insert elements in ascending index order, then iterating over the map also returns elements in ascending index order.

Here is a sample code:

 LinkedHashMap<Item,Integer> map = new LinkedHashMap<Item,Integer>(); 

Get item index:

 Integer index = map.get(item); if (index != null) { // already in the map; use `index' } else { // not in the map } 

Add item with the following index:

 if (!map.containsKey(item)) { map.put(item, map.size()); } 

Iterate over the elements in ascending index order:

 for (Entry<Item,Integer> e : map.entrySet()) { Item item = e.getKey(); int index = e.getValue(); ... } 

What this cannot do effectively is to get the value at a specific index, but my reading of your question indicates that you really don't need it.

+5
source

ArrayList.indexOf() does a linear search, so O (n).

If you really need to go to ArrayList, you can create two collections: ArrayList and HashSet. Add and remove items to both collections. Before adding, call HashSet.contains() to see if this element exists.

Encapsulate your ArrayList and HashSet in your class.

+1
source

If you want to leave ArrayList , you can have a HashSet as support, at the same time the cost of dual memory.

You can use HashSet.add() , if return is true, you can add an element to ArrayList too

0
source

Source: https://habr.com/ru/post/1394769/


All Articles