Store a large number of images on multiple servers

I would like to know the best way to store a large number of images across multiple servers, the way Google or Facebook do.

It seems that storing them in the file system is better than storing them inside a database, but what about using a NoSQL database like Cassandra?

Do Google and Facebook store the same image on multiple servers for load balancing? How does it work? What is the best solution?

thanks a lot

+6
3 answers

There is nothing wrong with your assumption. As mentioned, there are caveats, but it can be done, and many people and companies successfully store files in Apache Cassandra.

  • zjffdu / cassandra-fs is the first solution I would consider. It was developed 2 years ago, though, so I would be a little cautious about using it out of the box. Apache Cassandra is now at version 1.0.x, with 1.1.x on the way; 2 years ago it was at version 0.6.x or so. Much has changed and improved in those 24 months.
  • semantico / cassandra-fs is a fork, last touched 7 months ago
  • favoritas37 / cassandra-fs is another fork, last touched 3 months ago, which indicates compatibility with the Cassandra 1.0.5 branch

The principle is to take the file, split it into many chunks, and save those chunks as columns in a row. To retrieve it, read each column back, reassemble the file, and voila.
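The chunking idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for a Cassandra row (column name to column value), and the names `store_file`, `load_file`, the `chunk-NNNNNNNN` column naming, and the 1 MiB chunk size are all hypothetical, not taken from cassandra-fs or any Cassandra client library.

```python
from typing import Dict, Iterator

# Hypothetical chunk size; a real deployment would tune this.
CHUNK_SIZE = 1024 * 1024  # 1 MiB

# Stand-in for a column family: row key -> {column name -> column value}.
Table = Dict[str, Dict[str, bytes]]


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def store_file(table: Table, row_key: str, data: bytes,
               chunk_size: int = CHUNK_SIZE) -> None:
    """Save each chunk as one column of the row identified by `row_key`."""
    # Zero-padded indexes keep the column names in chunk order when sorted.
    table[row_key] = {
        f"chunk-{index:08d}": chunk
        for index, chunk in enumerate(split_into_chunks(data, chunk_size))
    }


def load_file(table: Table, row_key: str) -> bytes:
    """Read every column of the row in name order and reassemble the file."""
    columns = table[row_key]
    return b"".join(columns[name] for name in sorted(columns))


if __name__ == "__main__":
    table: Table = {}
    image = bytes(range(256)) * 10  # 2560 bytes of fake image data
    store_file(table, "cat.jpg", image, chunk_size=1000)
    print(len(table["cat.jpg"]), "columns stored")
    print("round trip ok:", load_file(table, "cat.jpg") == image)
```

With a real cluster, the dict would be replaced by reads and writes through a Cassandra client, but the split and reassemble logic stays the same.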

Cassandra FAQ: Large file and blob storage

... files of about 64 MB or less can be easily stored in a database without breaking them into smaller pieces ...

Lucene implementation on Cassandra

... its files are divided into blocks (whose sizes are limited), where each block (see FileBlock) is stored as the column value in the corresponding row ...

You will get better feedback on the Cassandra mailing list and IRC.

Finally, this paper, written by people at Facebook in 2009, should help answer your more fundamental questions: Cassandra - A Decentralized Structured Storage System.

+4

Note: I know this is an old question; I just want to address some misconceptions about cost, as I am doing this right now as a test.

Contrary to what DavidB thinks, it will not cost millions. Even if you run dedicated hosted hardware, you can easily stay at a few thousand dollars per month (BTDT; one of my clients runs an 8-node cluster for about $800 per month). However, that is a maintenance headache you want to avoid, and Cassandra on EC2 is much easier to handle.

You can easily run a sizable production cloud on EC2 for less than $1,000 per month, and you can have R&D clouds for less than $100 per month (I spent about $52 last month on a 10-machine test cluster). I highly recommend using TurnKey Linux to manage and provision your R&D farm, as their tools let you migrate instances from your desktop to almost any virtual hosting platform in a few minutes (and vice versa). In addition, they have really good integration with EC2.

For really serious traffic levels, Pinterest once stated that they spend between $15 and $50 per hour depending on server load, autoscaling to meet traffic needs; see http://www.theregister.co.uk/2012/04/30/inside_pinterest_virtual_data_center/ for details.
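To put the hourly figures next to the monthly ones quoted earlier, here is a rough conversion. It is only back-of-the-envelope arithmetic: it assumes a 30-day month and a constant hourly rate, which an autoscaling setup would not actually have, and the function name `monthly_cost` is made up for this sketch.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def monthly_cost(hourly_rate_usd: float, days: int = DAYS_PER_MONTH) -> float:
    """Convert a constant hourly spend into an approximate monthly spend."""
    return hourly_rate_usd * HOURS_PER_DAY * days


if __name__ == "__main__":
    low = monthly_cost(15)   # quiet periods
    high = monthly_cost(50)  # peak load
    print(f"${low:,.0f} to ${high:,.0f} per month")
    # prints "$10,800 to $36,000 per month"
```

So even at that scale the bill is in the tens of thousands of dollars per month, not millions.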

The real cost is in configuring and managing your distributed Cassandra instance. Fortunately, Netflix has just released a ton of management tools for exactly that. You can find them here: https://github.com/netflix - there are also a lot of interesting videos about how Netflix uses AWS, in particular about moving data from Cassandra to S3 - see their blog here: http://techblog.netflix.com/2012/12/videos-of-netflix-talks-at-aws-reinvent.html

+1

If you want to store data in a "cloud" environment, it is best to use a hosted cloud service such as Google App Engine or Amazon Web Services. You will not be able to build your own, if that is the question: it would cost millions of dollars, plus the resources to manage it. And yes, Google and Facebook use thousands of servers to spread their data across their clouds.

-1

Source: https://habr.com/ru/post/911587/

