I want to do GeoIP lookups on my data in Spark. For this, I am using the MaxMind GeoIP database.
What I want to do is initialize the GeoIP database object once on each partition, and then use it to look up the city associated with an IP address.
Does Spark have an initialization phase for each node, or should I check whether an instance variable is undefined and, if so, initialize it before continuing? For instance, something like this (this is Python, but I want a Scala solution):
    class IPLookup(object):
        database = None

        def getCity(self, ip):
            # Load the database on first use only
            if not self.database:
                self.database = self.initialise(geoipPath)
            ...
Of course, this requires Spark to serialize the entire object, which the docs warn against.
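To make the check-then-initialize idea concrete, here is a minimal sketch that keeps the cache on the class instead of on an instance, so no loaded database has to travel with a pickled object. Everything specific in it is an assumption for illustration: `load_database` is a hypothetical stand-in for opening the MaxMind file, `geoip.db` is a placeholder path, and the sample IP and city are made up. It also assumes the class lives in a module that the workers can import. The closest Scala analogue would presumably be an `object` holding a `lazy val`.

```python
def load_database(path):
    # Hypothetical loader: a real version would open the MaxMind file at `path`.
    return {"203.0.113.7": "Exampleville"}


class IPLookup(object):
    database = None   # class-level cache, shared within one Python process
    load_count = 0    # only here to show how often the loader actually ran

    @classmethod
    def get_city(cls, ip, path="geoip.db"):  # placeholder path
        # Initialise on first use; later calls reuse the cached object.
        if cls.database is None:
            cls.database = load_database(path)
            cls.load_count += 1
        return cls.database.get(ip)


def lookup_partition(ips):
    # Written to be passed to rdd.mapPartitions(lookup_partition):
    # takes an iterator of IPs and yields (ip, city) pairs.
    for ip in ips:
        yield ip, IPLookup.get_city(ip)
```

Called locally on a plain list, `lookup_partition` loads the stub database once and reuses it for every subsequent IP; whether the same holds across tasks on a cluster depends on how the class reaches the workers.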