Word Count in a Document

I have a directory containing 1000 .txt files. For each word, I want to know how many of the 1000 documents it occurs in. That is, even if the word "cow" occurs 100 times in document X, it still counts as one for that document. If it also occurs in another document, the count increases by one. So the maximum is 1000, if "cow" appears in every document. How can I do this in a simple way, without using any external library? This is what I have so far:

    private Hashtable<String, Integer> getAllWordCount() {
        Hashtable<String, Integer> result = new Hashtable<String, Integer>();
        HashSet<String> words = new HashSet<String>();
        try {
            for (int j = 0; j < fileDirectory.length; j++) {
                File theDirectory = new File(fileDirectory[j]);
                File[] children = theDirectory.listFiles();
                for (int i = 0; i < children.length; i++) {
                    Scanner scanner = new Scanner(new FileReader(children[i]));
                    while (scanner.hasNext()) {
                        String text = scanner.next().replaceAll("[^A-Za-z0-9]", "");
                        if (words.contains(text) == false) {
                            if (result.get(text) == null)
                                result.put(text, 1);
                            else
                                result.put(text, result.get(text) + 1);
                            words.add(text);
                        }
                    }
                }
                words.clear(); // runs once per directory, after all of its files
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println(result.size());
        return result;
    }
2 answers

You also need a HashSet<String> in which you store each unique word read from the current file.

Then, for each word you read, check whether it is already in the set. If it is not, increment the corresponding value in the result map (or add a new entry if there was none, as you already do) and add the word to the set.

Remember to reset the set when you start reading a new file.
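
The steps above can be sketched as follows. This is only a minimal illustration, not the asker's code: the class name DocumentFrequency, the method documentFrequency, and the choice to pass each document as an in-memory string instead of reading files are assumptions made to keep the example self-contained. It uses the same character filter as the question and relies on Set.add returning true only the first time a word is seen.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DocumentFrequency {

    // Hypothetical helper: counts, for each word, how many documents contain it
    // at least once. Documents are plain strings here (an assumption).
    static Map<String, Integer> documentFrequency(List<String> documents) {
        Map<String, Integer> result = new HashMap<>();
        for (String document : documents) {
            // A fresh set per document plays the role of "resetting the set".
            Set<String> seen = new HashSet<>();
            for (String token : document.split("\\s+")) {
                String word = token.replaceAll("[^A-Za-z0-9]", "");
                if (word.isEmpty()) {
                    continue; // token was only punctuation or whitespace
                }
                // add() returns true only for the first occurrence in this document
                if (seen.add(word)) {
                    result.merge(word, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("cow cow cow milk", "the cow", "milk, please");
        Map<String, Integer> df = documentFrequency(docs);
        System.out.println(df.get("cow"));  // prints 2
        System.out.println(df.get("milk")); // prints 2
    }
}
```

To apply this to files, each file's text would be fed through the same inner loop, with one new set per file.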


How about this?

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Hashtable;
    import java.util.Scanner;

    private Hashtable<String, Integer> getAllWordCount() {
        Hashtable<String, Integer> result = new Hashtable<String, Integer>();
        HashSet<String> words = new HashSet<String>();
        try {
            for (int j = 0; j < fileDirectory.length; j++) {
                File theDirectory = new File(fileDirectory[j]);
                File[] children = theDirectory.listFiles();
                for (int i = 0; i < children.length; i++) {
                    // Collect the unique words of this file
                    Scanner scanner = new Scanner(new FileReader(children[i]));
                    while (scanner.hasNext()) {
                        String text = scanner.next().replaceAll("[^A-Za-z0-9]", "");
                        words.add(text);
                    }
                    scanner.close();
                    // Each unique word adds one to its document count
                    for (String word : words) {
                        Integer count = result.get(word);
                        if (count == null) {
                            result.put(word, 1);
                        } else {
                            result.put(word, count + 1);
                        }
                    }
                    words.clear(); // reset before the next file
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println(result.size());
        return result;
    }

Source: https://habr.com/ru/post/1343093/

