Hello,
I am currently working on word prediction in Java. For this I use an N-gram-based model, but I am running into memory problems.
At first I had a model like this:
    import java.io.Serializable;

    public class NGram implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient int count; // transient: not written during serialization
        private int id;
        private NGram next;

        public NGram(int idP) {
            this.id = idP;
        }
    }
But it requires a lot of memory, so I thought I needed an optimization: if I have "hello world" and "hello people", instead of storing two separate N-grams I could keep a single node for "hello" that then has two possible continuations, "world" and "people".
To make this clearer, here is my new model:
    import java.io.Serializable;
    import java.util.HashMap;

    public class BNGram implements Serializable {
        private static final long serialVersionUID = 1L;
        private int id;
        private HashMap<Integer, BNGram> next; // continuations, keyed by word ID
        private int count = 1;

        public BNGram(int idP) {
            this.id = idP;
            this.next = new HashMap<Integer, BNGram>();
        }
    }
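For context, this is roughly how the trie gets filled, written as a method inside BNGram (a simplified sketch; in reality my token IDs come from a word-to-ID dictionary that I have left out):

    // Sketch: insert one window of token IDs below this node,
    // creating children on demand and counting repeated continuations.
    public void addSequence(int[] tokenIds) {
        BNGram node = this;
        for (int tokenId : tokenIds) {
            BNGram child = node.next.get(tokenId);
            if (child == null) {
                // First time we see this continuation: new node, count starts at 1.
                child = new BNGram(tokenId);
                node.next.put(tokenId, child);
            } else {
                // Seen before: just bump the frequency.
                child.count++;
            }
            node = child;
        }
    }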
But it seems that my second model consumes twice as much memory... I think this is due to the HashMap, but I don't know how to shrink it. I tried various alternative Map implementations, such as the ones in Trove, but it did not change anything.
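To show what I mean, the Trove variant I tried looks essentially like this (a sketch assuming Trove 3's package layout; the class name TBNGram is mine; TIntObjectHashMap removes the Integer boxing of the keys, but every node still owns its own map instance):

    import gnu.trove.map.hash.TIntObjectHashMap;
    import java.io.Serializable;

    public class TBNGram implements Serializable {
        private static final long serialVersionUID = 1L;
        private int id;
        private int count = 1;
        // Primitive int keys avoid boxing, but each node still carries
        // a separate map object with its own backing arrays.
        private TIntObjectHashMap<TBNGram> next = new TIntObjectHashMap<TBNGram>();

        public TBNGram(int idP) {
            this.id = idP;
        }
    }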
To give you an idea: for a 9 MB text containing 57,818 distinct words (distinct words, not the total word count), my javaw process consumes 1.2 GB of memory after generating the N-grams. Yet if I save the model through a GZIPOutputStream, it takes only about 18 MB of disk space.
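For reference, the save code is essentially this (a sketch; the method signature is mine, but the stream chain is what I use):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.util.zip.GZIPOutputStream;

    // Serialize the whole trie through GZIP; this produces the ~18 MB file.
    public static void save(BNGram root, String path) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(path)));
        try {
            out.writeObject(root);
        } finally {
            out.close();
        }
    }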
So my question is: how can I use less memory? Is there something I can do with compression in memory, the way the serialized form is compressed on disk? I need to embed this in another application, so I really have to reduce the memory usage.
Thank you very much, and sorry for my bad English.
Ziath