Rails Architecture Considerations

I am creating a Rails site that, among other things, lets users build their own recipe repository. Recipes are entered manually or via a link to another site (think epicurious, cooks.com, etc.). I am writing the scripts that will scrape the recipe from these sites given a link from the user, and (legal issues aside) this part isn't giving me any trouble.

However, I'm not sure where the code for these scraper scripts should live. My first thought was to put it in the Recipe model, but that seems too busy a place for it; would a library or helper be more appropriate?

In addition, as I mentioned, I am creating several scrapers for different recipe websites. It seems to me that the elegant way to do this is to define an interface (or an abstract base class) specifying a set of methods for creating a recipe object from a link, but I'm not sure that's the best approach here either. How should I build this OO relationship, and where should the code go?

+4
7 answers

There are two obvious sides to this thing. First, how you'll store recipes: that's your models. Obviously the models won't scrape other sites, because they have a single responsibility: persisting valid data. Your controller(s) that kick off the scrape-and-store process shouldn't contain the scraping code either (though they will call it).

In Ruby we don't deal in abstract classes and interfaces; it's duck typing, so it's enough for your scrapers to implement a well-known method or set of methods. Your scraping engines should look alike, especially in terms of the public methods they expose.
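For example, a minimal sketch of that duck-typed contract; the class names and the scrape method here are illustrative, not prescribed:

 class EpicuriousScraper
   def scrape(url)
     # fetch and parse an epicurious.com page, return a recipe hash
   end
 end

 class CooksScraper
   def scrape(url)
     # fetch and parse a cooks.com page, return a recipe hash
   end
 end

 # Callers never check the type; any object that responds to #scrape will do.
 [EpicuriousScraper.new, CooksScraper.new].each do |scraper|
   scraper.scrape("http://www.example.com/some-recipe")
 end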

You can put your scrapers (and here's the lame answer) anywhere. lib is fine, but if you want to make it a plugin, that might not be a bad idea. See my question here, with a terrific answer from well-known Rails guy Yehuda Katz, for some other ideas. Overall: there is no one right answer, but there are some wrong ones.

+2

The scraping engine should be a standalone gem or plugin. For quick and dirty, you can put it inside lib; that's a common convention anyway. It should probably expose a factory class that creates instances of the different scraper types depending on the URL, so that for clients, using it is as simple as:

 Scraper.scrape(url) 
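A minimal sketch of that factory, reusing the hypothetical EpicuriousScraper and CooksScraper classes sketched in the previous answer:

 require "uri"

 class Scraper
   # Known hosts mapped to their scraper classes; extend as you add sites.
   PARSERS = {
     "epicurious.com" => EpicuriousScraper,
     "cooks.com"      => CooksScraper,
   }.freeze

   def self.scrape(url)
     host = URI.parse(url).host.to_s.sub(/\Awww\./, "")
     klass = PARSERS[host]
     raise ArgumentError, "no scraper registered for #{host}" unless klass
     klass.new.scrape(url)
   end
 end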

In addition, if this is a long-running task, you might consider using Resque or Delayed::Job to offload the work to separate processes.
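For instance, assuming the delayed_job or resque gem is set up and the Scraper.scrape factory sketched above, the hand-off could look like:

 # url is the user-submitted link.
 # With delayed_job, the delay proxy pushes the call onto the job queue:
 Scraper.delay.scrape(url)

 # With Resque, wrap the call in a small job class:
 class ScrapeJob
   @queue = :scrapers

   def self.perform(url)
     Scraper.scrape(url)
   end
 end

 Resque.enqueue(ScrapeJob, url)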

+1

Concentrate on getting the thing working first, before moving it to a gem/plugin. Also, forget about the interface/abstract class; just write code that does the job. The only thing your model needs to know is that the recipe is remote, and its URL. You can put all the scraper code in app/scrapers. Here is an example implementation sketch:

 class RecipePage
   def initialize(url)
     @url = url
     @parser = get_parser
   end

   def get_attributes
     raise "trying to scrape unknown site" unless @parser
     @parser.recipe_attributes(get_html)
   end

   private

   def get_html
     # use your favorite HTTP library to fetch the HTML from @url
   end

   def get_parser
     # match @url to a parser class, i.e. return the domain camelized,
     # or nil if you are not handling that particular site yet
     EpicurusComParser
   end
 end

 class EpicurusComParser
   def self.recipe_attributes(html)
     # this does the hard job of querying the html to extract
     # the title, text and image of the recipe, returned as a hash
     {
       :title => "recipe title",
       :text  => "recipe text",
       :image => "recipe_image_url",
     }
   end
 end

Then, in your model:

 class Recipe < ActiveRecord::Base
   after_create :scrape_recipe, :if => :recipe_url

   private

   def scrape_recipe
     # do this in the background, e.g. in a DelayedJob
     recipe_page = RecipePage.new(self.recipe_url)
     self.update_attributes(recipe_page.get_attributes.merge(:scraped => true))
   end
 end

Then you can create more parsers: CookComParser, etc.

+1

Utility classes that are not part of the MVC design often go in the lib folder. I have also seen people put them in the models folder, but lib is really the "right" place.

You could then instantiate the recipe scraper in the controller as needed, feeding the resulting data to the model.
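A hypothetical controller action along those lines; RecipeScraper and its attributes method are illustrative names, not an established API:

 class RecipesController < ApplicationController
   def create
     attrs = if params[:recipe][:url].present?
               # hand the URL to the scraper and build from what it extracts
               RecipeScraper.new(params[:recipe][:url]).attributes
             else
               params[:recipe]
             end
     @recipe = current_user.recipes.build(attrs)
     if @recipe.save
       redirect_to @recipe
     else
       render :new
     end
   end
 end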

0

Not everything in app/models has to be an ActiveRecord model. Since the scrapers are directly tied to the business logic of your application, they belong in the app directory, not the lib directory. They are also neither controllers, nor views, nor helpers (helpers exist only to help views). So they belong in app/models. I would make sure to namespace them, purely for organizational purposes, under app/models/scrapers or something like that.
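A sketch of such a namespaced, plain-Ruby class; the names are illustrative:

 # app/models/scrapers/epicurious.rb
 module Scrapers
   class Epicurious
     def initialize(url)
       @url = url
     end

     # Returns a hash of recipe attributes; no ActiveRecord involved.
     def attributes
       # fetch @url and parse out :title, :text, :image, etc.
     end
   end
 end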

0

I would set up a rake task that scrapes a site and creates the new records. Once that works, use a background processor or a cron job to run the rake task.
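A sketch of such a rake task, assuming the scraped flag and the Scraper.scrape factory that appear elsewhere in this thread:

 # lib/tasks/scrape.rake
 namespace :recipes do
   desc "Scrape any recipes that so far have only a URL"
   task :scrape => :environment do
     Recipe.where(:scraped => false).find_each do |recipe|
       attrs = Scraper.scrape(recipe.recipe_url)
       recipe.update_attributes(attrs.merge(:scraped => true))
     end
   end
 end

A cron entry or background processor can then run rake recipes:scrape on a schedule.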

0

I would create a folder in lib called scrapers, and inside it one file per scraper, named epicurious.rb, cooks.rb, and so on. You can then define a base scraper class containing the methods common to all scrapers, similar to the following:

lib/scrapers/base.rb

 class Scrapers::Base
   def shared_1
   end

   def shared_2
   end

   def must_implement1
     raise NotImplementedError
   end

   def must_implement2
     raise NotImplementedError
   end
 end

lib/scrapers/epicurious.rb

 class Scrapers::Epicurious < Scrapers::Base
   def must_implement1
   end

   def must_implement2
   end
 end

Then call the appropriate class from your controller using Scrapers::Epicurious.new, or call a class method on Scrapers::Base that picks the appropriate implementation based on the argument passed.
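That class-method dispatcher might look like the following sketch; Scrapers::Cooks is an assumed sibling of Scrapers::Epicurious:

 require "uri"

 class Scrapers::Base
   def self.for_url(url)
     case URI.parse(url).host
     when /epicurious\.com\z/ then Scrapers::Epicurious.new
     when /cooks\.com\z/      then Scrapers::Cooks.new
     else raise ArgumentError, "no scraper for #{url}"
     end
   end
 end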

0

Source: https://habr.com/ru/post/1308539/

