I want to turn an existing Python-based implementation (a data analysis tool that operates on an event stream) into a Storm topology.
During the research phase, my team and I used Python pandas to develop a prototype of our tool, and we found it very useful in terms of programmer productivity. Now we want to build a Storm topology that does the same thing. We aim to reuse our existing Python modules as bolts, or at least make an informed decision about whether that is a good idea.
Are there any restrictions on using a Python script that depends on external libraries as a Storm bolt on a cluster? Also, does anyone have a sense of what performance penalty there would be for using an interpreted, non-JVM language like Python instead of Java for our bolts? The pandas library itself is designed for high performance.
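For context, this is roughly how I understand a Python bolt would work. As far as I can tell, non-JVM bolts run as subprocesses and talk to Storm through the multilang protocol: JSON messages over stdin/stdout, each terminated by a line containing `end`. The sketch below is only an illustration under that assumption. It skips the initial handshake, uses in-memory streams instead of real stdin/stdout, and `analyze` is a hypothetical stand-in for our pandas code.

```python
import io
import json


def read_message(stream):
    """Read one multilang message: JSON text terminated by an 'end' line."""
    lines = []
    for line in stream:
        if line.strip() == "end":
            break
        lines.append(line)
    return json.loads("".join(lines))


def send_message(stream, message):
    """Write one multilang message as JSON followed by an 'end' line."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def analyze(values):
    # Hypothetical placeholder: our existing pandas-based analysis would go here.
    return [sum(values) / len(values)]


def process_one(in_stream, out_stream):
    """Handle a single incoming tuple: emit a result anchored to it, then ack it."""
    tup = read_message(in_stream)
    result = analyze(tup["tuple"])
    send_message(out_stream, {"command": "emit", "anchors": [tup["id"]], "tuple": result})
    send_message(out_stream, {"command": "ack", "id": tup["id"]})


# Simulate one tuple arriving from Storm, using in-memory streams.
incoming = io.StringIO(
    json.dumps({"id": "42", "comp": "spout", "stream": "default", "task": 1,
                "tuple": [1.0, 2.0, 3.0]}) + "\nend\n"
)
outgoing = io.StringIO()
process_one(incoming, outgoing)
print(outgoing.getvalue())
```

If this picture is right, every tuple is serialized to JSON and crosses a process boundary in both directions, which is the overhead I am mostly worried about, independent of how fast pandas itself is.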
Thanks.