We built a system that analyzes data and presents the results in plain English (i.e., no charts or diagrams). The current implementation relies on a large set of patterns plus some randomization to vary the text as much as possible.
We would like to switch to something more advanced, in the hope that the resulting text will be less repetitive and sound less robotic. I have searched Google extensively but could not find anything concrete to start from. Any ideas?
EDIT: The data passed to the NLG engine is in JSON format. Here is an example based on web analytics data. A JSON file can contain, for example, a metric (e.g. visits), its values for the last X days, whether the latest value was expected or not, and which dimensions (e.g. countries or marketing channels) influenced its change.
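To make the input more concrete, here is a rough sketch of what such a payload could look like, parsed in Python. Every field name (`metric`, `values`, `expected`, `is_expected`, `dimensions`) and every number is made up for illustration and is not our actual schema.

```python
import json

# Hypothetical payload: all field names and numbers are illustrative only.
payload = json.loads("""
{
  "metric": "visits",
  "values": [8100, 8250, 8333, 10000],
  "expected": 9091,
  "is_expected": false,
  "dimensions": [
    {"name": "country", "value": "UK"},
    {"name": "channel", "value": "ABC email campaign"}
  ]
}
""")


def day_over_day_change(values):
    """Relative change between the last two daily values."""
    previous, latest = values[-2], values[-1]
    return (latest - previous) / previous


change = day_over_day_change(payload["values"])
vs_expected = payload["values"][-1] / payload["expected"] - 1

print(f"{payload['metric']}: {change:+.0%} DoD, {vs_expected:+.0%} vs expected")
```

The NLG engine receives facts of this kind and has to turn them into sentences.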
The current implementation may give something like this:
Total visits to the UK mainly from the ABC email campaign reached 10K (+ 20% DoD) and were 10% higher than expected. Users basically landed on the XXX page, while the increase was consistent across devices.
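For reference, the pattern-plus-randomization approach works roughly like the minimal sketch below. The templates, the verb pool, and the `render` helper are invented for this post and are far simpler than the real system.

```python
import random

# Hypothetical phrase pools; the real patterns are more numerous.
VERBS = ["reached", "hit", "climbed to"]
TEMPLATES = [
    "Total {metric} from {country} {verb} {value} ({change:+.0%} DoD).",
    "{metric_cap} from {country} {verb} {value}, {change:+.0%} day over day.",
]


def render(facts, rng):
    """Pick a random template and a random verb, then fill in the facts."""
    template = rng.choice(TEMPLATES)
    return template.format(
        metric=facts["metric"],
        metric_cap=facts["metric"].capitalize(),
        country=facts["country"],
        verb=rng.choice(VERBS),
        value=facts["value"],
        change=facts["change"],
    )


facts = {"metric": "visits", "country": "the UK", "value": "10K", "change": 0.20}
rng = random.Random(0)  # seeded only so that runs are repeatable
sentences = [render(facts, rng) for _ in range(3)]
for sentence in sentences:
    print(sentence)
```

The variety is limited to the number of templates times the number of synonyms, which is why the output starts to feel repetitive quickly.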
We are looking for a way to depend less on templates, make the text sound more natural, and use a richer vocabulary.