What's the best way to get a ton of small pieces of data synchronized between your Mac App and the Internet?

I am considering MongoDB right now. Here is what needs to happen: in my Finch application (see finchformac.com for details) I have thousands and thousands of entries per day for each user: which window they had open, the time it was opened, the time it was closed, and the tag they assigned to it (if they chose one). I need this data backed up online so that it can sync with the user's other Macs, etc. I also need to be able to draw graphs from the data, which means some complex queries touching hundreds of thousands of records.
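To give a sense of the shape of the data, a single entry might look roughly like this (the field names are illustrative, not the actual Finch schema):

    # illustrative only: field names are assumptions, not Finch's actual schema
    entry = {
      user_id:   'abc123',
      window:    'Google Chrome - Gmail',
      opened_at: Time.utc(2012, 2, 1, 9, 15, 0),
      closed_at: Time.utc(2012, 2, 1, 9, 42, 30),
      tag:       'email'
    }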

At the moment I have tried Ruby/Rails/Mongoid with a JSON parser on the application side, sending the data up in batches of 10,000 records at a time and processing it into other collections with a background MapReduce task. But it all seems to block and is ultimately too slow. What recommendations (if anyone has any) are there for how to do this?

1 answer

You have a difficult problem, which means you need to break it down into smaller, more easily solved problems.

The problems (as I see them):

  • You have an application that collects data. You just need to store this data somewhere locally until it is synchronized with the server.
  • You have gotten the data to the server, and now you need to get it into the database fast enough that ingestion doesn't fall behind.
  • You have to report on this data, and that sounds complicated in its own right.

You probably want to write this as an API of some kind. For simplicity (and because you have plenty of spare processing cycles on the clients), have the client do the work of shaping each piece of data so it arrives as JSON, ready to import into the database. Once you have JSON you don't need Mongoid (you can throw the JSON into the database directly). You also probably don't need Rails, since you are just building a simple API, so stick with plain Rack or Sinatra (possibly using something like Grape).
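As a minimal sketch of what that endpoint could look like (assuming Sinatra and the mongo Ruby driver; the database, collection, and route names are made up for illustration):

    require 'sinatra'
    require 'json'
    require 'mongo'

    # illustrative names only; assumes the mongo Ruby driver and a local mongod
    DB = Mongo::Client.new(['localhost:27017'], database: 'finch')

    post '/entries' do
      # the client has already shaped its records as an array of JSON documents,
      # so they go straight into the collection with no Mongoid layer in between
      docs = JSON.parse(request.body.read)
      DB[:entries].insert_many(docs)
      status 201
    end

The client would POST each batch (the question mentions 10,000 records at a time) to that route as a single JSON array.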

Now you need to address the "it all seems to block and is ultimately too slow" part. We have already removed Mongoid (so there is no conversion from JSON -> Ruby objects -> JSON) and Rails. Before we get to the MapReduce over this data, you need to make sure it is being loaded into the database quickly enough. Odds are you should architect the whole thing so that your MapReduce supports your reporting functions; for syncing the data itself you shouldn't need to do anything beyond inserting the JSON. If the data is not being written to your database fast enough, you should consider sharding your data set. Sharding will probably be done on some user-based key, but you know your data schema better than I do. Pick a shard key so that when multiple users sync at the same time, they are likely to land on different servers.
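If you do go the sharding route, the setup itself is a couple of admin commands. A sketch via the mongo Ruby driver, where the database, collection, and key names are assumptions:

    require 'mongo'

    # assumes a sharded cluster is already running and you are connected through
    # mongos; database, collection, and field names are illustrative
    admin = Mongo::Client.new(['localhost:27017'], database: 'admin')

    admin.database.command(enableSharding: 'finch')

    # shard on a user-based key so that different users' sync batches tend to
    # land on different shards (an existing, non-empty collection needs an
    # index on the shard key before this command will succeed)
    admin.database.command(shardCollection: 'finch.entries', key: { user_id: 1 })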

Once you have solved problems 1 and 2, you can work on your reporting. This should be backed by your MapReduce functions inside Mongo. My first comment on this part is to make sure you are running at least Mongo 2.0; in that release 10gen sped up MapReduce (my tests show it is significantly faster than 1.8). Beyond that you can get further gains by sharding and by directing reads to the secondaries in your replica set (are you using a replica set?). If that still isn't enough, consider structuring your schema to support your reporting functions. That lets you spend your clients' spare cycles doing the work rather than loading down the servers, but leave that optimization until you have proven that the conventional approaches won't work.
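For the "structure your schema to support reporting" idea, one common pattern (not something the app necessarily uses today) is to maintain a pre-aggregated summary collection with upserts as entries arrive, so graphs read a handful of small documents instead of re-running MapReduce over the raw entries. A rough sketch with the mongo Ruby driver, where all names and fields are assumptions:

    require 'date'
    require 'mongo'

    # for reporting reads against a replica set you could also pass
    # read: { mode: :secondary_preferred } to the client
    client = Mongo::Client.new(['localhost:27017'], database: 'finch')
    daily  = client[:daily_totals]

    # bump a per-user, per-day, per-tag counter each time an entry is stored;
    # entry is assumed to carry Time values for opened_at/closed_at
    def record_entry(daily, entry)
      daily.update_one(
        { user_id: entry['user_id'],
          day:     entry['opened_at'].to_date.to_s,
          tag:     entry['tag'] },
        { '$inc' => { 'seconds' => entry['closed_at'] - entry['opened_at'] } },
        upsert: true
      )
    end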

I hope the wall of text helps a bit. Good luck!


Source: https://habr.com/ru/post/1398435/

