Cassandra and Pig integration - is it possible without Hadoop?

Question

I am trying to set up a test cluster of Cassandra + Pig. The Cassandra wiki page makes it sound like you need Hadoop to integrate with Pig,

but the README in cassandra-src/contrib/pig makes it sound like you can run Pig on Cassandra without Hadoop.

If Hadoop is optional, what do you lose by not using it?

+4

java cassandra hadoop apache-pig

marathon Jan 11 '12 at 2:30

2 answers

It's not required. Cassandra has its own implementations of Pig's LoadFunc and StoreFunc, which allow you to query and store data.

Hadoop and Cassandra are different in many ways. It is hard to say what you lose without knowing exactly what you are trying to accomplish.

-1

ligerdave Jan 11 '12 at 2:59

nickmbailey · Accepted Answer · 2012-01-11T05:13:29+0000

Hadoop is optional only for testing. To do anything at any kind of scale, you will need Hadoop as well.

Running without Hadoop means that you are running Pig in local mode. This basically means that all the data is processed by the same Pig process you are running in. This works fine with a single node and sample data.

When working with any significant amount of data or multiple machines, you will want to run Pig in Hadoop mode. By running Hadoop task trackers on your Cassandra nodes, you can take advantage of the benefits of MapReduce: the load is distributed across the nodes, and data locality reduces network transfer.
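
To make the load distribution and data locality point concrete, here is a toy Python sketch. It is not how Hadoop's scheduler or Cassandra's input format actually work; the function assign_splits, the node names and the split layout are all made up for illustration. It only shows the idea: if a worker runs on every node that holds the data, each chunk can be processed where it lives, whereas a single local process has to pull everything over the network.

```python
def assign_splits(split_replicas, workers):
    """Toy scheduler (hypothetical, not Hadoop's real algorithm).

    split_replicas: dict mapping a split id to the nodes that store its data.
    workers: nodes that run a worker process (think: task tracker).
    Returns (assignment, remote_reads), where remote_reads counts splits
    that had to be processed on a node that does not store them.
    """
    load = {worker: 0 for worker in workers}
    assignment = {}
    remote_reads = 0
    for split, replicas in sorted(split_replicas.items()):
        # Prefer workers that already hold the data (data locality).
        candidates = [node for node in replicas if node in load]
        if not candidates:
            # No co-located worker: the data must cross the network.
            candidates = list(workers)
            remote_reads += 1
        # Pick the least-loaded candidate (load distribution).
        chosen = min(candidates, key=lambda node: (load[node], node))
        assignment[split] = chosen
        load[chosen] += 1
    return assignment, remote_reads


# Made-up cluster: four data splits, each stored on two of three nodes.
splits = {
    "split-0": ["node-a", "node-b"],
    "split-1": ["node-b", "node-c"],
    "split-2": ["node-c", "node-a"],
    "split-3": ["node-a", "node-b"],
}

# Hadoop-style: one worker on every Cassandra node.
distributed, distributed_remote = assign_splits(
    splits, ["node-a", "node-b", "node-c"]
)

# Local-mode-style: a single process on a separate client machine.
local, local_remote = assign_splits(splits, ["client"])

print(distributed, distributed_remote)
print(local, local_remote)
```

In this toy setup the distributed run spreads the four splits over the three nodes and reads nothing remotely, while the single-process run handles every split itself and fetches all four over the network.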
