Visualize large datasets with Hadoop

I'm looking for a framework, a combination of frameworks, best practices, or a tutorial on visualizing large datasets using Hadoop.

I am not looking for infrastructure to visualize the mechanics of running Hadoop jobs or managing disk space on Hadoop. I am looking for an approach or guide for visualizing the data stored in HDFS itself, using graphs, charts, and so on.

For example, let's say I have a set of data points stored in several files in HDFS, and I would like to show a histogram of the data. Is my only option to write a custom MapReduce job that works out which bucket each point falls into, writes the resulting counts to a file, and then use a graphics library to visualize that file?
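To make the approach described in this question concrete, here is a minimal sketch of the bucket-and-count logic in plain Python. It is not tied to any particular tool: the function names, the bucket width, and the one-number-per-line input format are all assumptions made for illustration. On a real cluster the map and reduce steps would run as separate programs (for instance via Hadoop Streaming, reading stdin and writing stdout), and the resulting counts would be passed to a charting library instead of the text renderer used here.

```python
import math
from itertools import groupby

BUCKET_WIDTH = 10.0  # hypothetical bucket width, chosen only for illustration


def map_point(line, width=BUCKET_WIDTH):
    """Map step: turn one numeric line into a (bucket_start, 1) pair."""
    value = float(line.strip())
    bucket_start = math.floor(value / width) * width
    return bucket_start, 1


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts for each bucket.

    Expects pairs sorted by bucket, which is what the shuffle phase
    of a MapReduce job would provide.
    """
    for bucket, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield bucket, sum(count for _, count in group)


def histogram(lines, width=BUCKET_WIDTH):
    """Run map, sort (standing in for the shuffle), and reduce locally."""
    mapped = sorted(map_point(line, width) for line in lines if line.strip())
    return dict(reduce_counts(mapped))


def render(hist):
    """Draw a crude text histogram; a plotting library would replace this."""
    return "\n".join(
        f"{bucket:>8.1f} | {'#' * count}" for bucket, count in sorted(hist.items())
    )


if __name__ == "__main__":
    sample = ["1", "5", "12", "25.5", "29", "-3"]  # made-up data points
    print(render(histogram(sample)))
```

The output of the reduce step is small (one row per bucket) regardless of how large the input is, which is why the final charting step can usually run on a single machine.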

Do I need to build a custom solution, or is someone else already doing this? I have searched online, but I could not find anything directly related.

Thanks for the help.

1 answer

We do something similar at Datameer. The files go through a few more processing steps before they reach our visualizations, but we run everything on Hadoop to begin with, so the files are never far away.


Source: https://habr.com/ru/post/1439850/

