What is the best way to process the source data files in a web application?

I have about 30 MB of text data that is fundamental to the algorithms I use in my web application.

On the one hand, the data is part of the algorithm, and changes to the data can cause the whole algorithm to fail. This is why I keep the data in text files under source control, and all changes are automatically checked (pre-commit). This currently gives me a good level of control. Distributing the data with the source as we create more web instances is not a problem, because it is tagged along with the source. I currently have the following issues:

  • I often end up developing specialized tools to manage the files, replicating the functionality of database access tools.
  • I would like to provide non-developers with controlled web access to this data.

On the other hand, it is data, and it “belongs” to the database. I would like to place it in the database, but then I will have the following problems:

  • How do I sync this database with source code? The release contains both code and data.
  • How do I ship the data when I create a new web server instance?
  • How do I set up data pre-validation?

Things I have looked at so far:

  • Sqlite (does not allow non-developer access)
  • Building a carefully prepared database that data users edit to create “patches” to the “real” database, which developers then accept, test and commit. It sounds very complicated. I have not fully thought this through yet, and I'm sure that I am reinventing the wheel here and some SO user will show me the error of my ways ...

BTW: I also have a “regular” database, with things that are not algorithmic data.
BTW2: I added the Python tag because I am currently using Python, Django, Apache and Nginx, Linux (and some lame developers use Windows).

Thanks in advance!

UPDATE

Some examples of the data (the algorithms deal with natural language):

  • Cities of the world and their alternative names
  • Currency Names
  • Hotel coordinates

The list goes on, but imagine trying to make sense of the sentence “Romantic hotel for 2 in Rome arriving in Italy next monday” if someone changes the coordinates that tell me that Rome is in Italy, or if someone adds “romance” as an alternative name for Las Vegas (yes, the example is lame, but I hope you get the drift).

+4
2 answers

Ok, here is the idea:

  • Send all the data as it is now.
  • Have the installation script load it into the appropriate database.
  • Allow users to modify this database and give them a "restore to original" button, which simply reinstalls from a text file.
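A minimal sketch of this first approach, assuming a hypothetical tab-separated text format (`city<TAB>alternative name`) and an SQLite table called `city_names`. Neither the format, the table, nor the function name comes from the question; they are only there to illustrate that "install" and "restore to original" can be the same operation.

```python
import sqlite3


def install_from_text(conn, lines):
    """Load the shipped text data into the (hypothetical) city_names table.

    Running it again acts as "restore to original": user edits are
    discarded and the pristine rows from the text data are reloaded.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        city, alt_name = line.split("\t")
        rows.append((city, alt_name))

    conn.execute(
        "CREATE TABLE IF NOT EXISTS city_names "
        "(city TEXT NOT NULL, alt_name TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back the delete + inserts on error
        conn.execute("DELETE FROM city_names")
        conn.executemany("INSERT INTO city_names VALUES (?, ?)", rows)
    return len(rows)


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM city_names").fetchone()[0]


# Stand-in for the text file that ships with the source.
SHIPPED = ["# city\talternative name", "Rome\tRoma", "Las Vegas\tVegas"]

conn = sqlite3.connect(":memory:")
install_from_text(conn, SHIPPED)

# A user edit made through whatever web UI sits on top of the database.
with conn:
    conn.execute("INSERT INTO city_names VALUES (?, ?)", ("Las Vegas", "romance"))
rows_after_edit = count_rows(conn)

# "Restore to original" simply reinstalls from the text data.
install_from_text(conn, SHIPPED)
rows_after_restore = count_rows(conn)
```

Because the text files stay the single source of truth, the existing pre-commit checks keep working unchanged; the database is just a disposable copy.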

Alternatively, this route could be easier, especially when updating an installation:

  • Send all the data as it is now.
  • Allow users to modify the data and save the modified versions in the appropriate database.
  • Let the code look in the database first, falling back to the text files if the corresponding data is not found there. Do not let the code modify the text files in any way.
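The second approach could look roughly like the sketch below. It assumes a hypothetical `key<TAB>value` text format and an SQLite table named `overrides`; the class and function names are invented for illustration and are not part of any existing library.

```python
import sqlite3


def parse_text_data(lines):
    """Parse hypothetical 'key<TAB>value' lines into a dict of shipped defaults."""
    defaults = {}
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            key, value = line.split("\t", 1)
            defaults[key] = value
    return defaults


class OverridableData:
    """Look in the database first, fall back to the shipped text data."""

    def __init__(self, conn, defaults):
        self._conn = conn
        self._defaults = defaults  # never written to by this class
        conn.execute(
            "CREATE TABLE IF NOT EXISTS overrides "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key):
        row = self._conn.execute(
            "SELECT value FROM overrides WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # a user-modified version wins
        return self._defaults.get(key)  # otherwise use the text data

    def set_override(self, key, value):
        with self._conn:  # commit the single write
            self._conn.execute(
                "INSERT OR REPLACE INTO overrides (key, value) VALUES (?, ?)",
                (key, value),
            )


# Stand-in for a shipped text file of currency names.
shipped = parse_text_data(["USD\tUS dollar", "EUR\teuro"])
data = OverridableData(sqlite3.connect(":memory:"), shipped)

before = data.get("USD")
data.set_override("USD", "United States dollar")
after = data.get("USD")
```

Since only the `overrides` table is ever written, shipping a new release of the text files never conflicts with user edits; deleting a row from `overrides` reverts that entry to the shipped value.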

In any case, you can save your current testing code; you just need to add tests that ensure that the database correctly overrides text files.

+1

I would call this a resource: data that your application relies on, but not data that your application manages. Images, CSS, and templates are similar resources, and you keep all of them under version control.

In this case, you can split your data out into a separate package. In a Python distribution environment, use a separate egg that your application depends on; package deployment tools such as pip and buildout automatically download the dependency. That way you can version the data independently, use other tools on it, etc.

Finally, select a format that can be managed effectively with a version control system. This means a text format. You can parse this format at startup, but by keeping it as text, you can properly track changes to it. It can be a SQL dump (CREATE TABLE and INSERT statements) loaded into an sqlite database at startup, or some other Python-based structure. You can also load the data on demand, caching it in memory if necessary.
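As a rough illustration of the SQL-dump variant, the sketch below loads a text dump into an in-memory sqlite database at startup using the standard-library `sqlite3` module. The `currency` table and its rows are made-up sample data, and `load_resource_db` is a hypothetical helper name; in a real package the dump would be read from a file shipped alongside the code.

```python
import sqlite3

# Stand-in for a version-controlled text file shipped in the data package.
SQL_DUMP = """
CREATE TABLE currency (code TEXT PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO currency VALUES ('EUR', 'euro');
INSERT INTO currency VALUES ('USD', 'US dollar');
"""


def load_resource_db(sql_text):
    """Build a fresh in-memory database from a plain-text SQL dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)  # runs every statement in the dump
    return conn


conn = load_resource_db(SQL_DUMP)

# Each row is a (code, name) pair, so the cursor converts directly to a dict.
names = dict(conn.execute("SELECT code, name FROM currency"))
```

The dump stays an ordinary text file, so diffs and pre-commit checks keep working, while the application gets SQL queries over the data. If the data is ever edited through sqlite tooling, `Connection.iterdump()` can write it back out as SQL text for committing.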

Take a look at pytz for a great example of how another resource-heavy project manages such structures.

+1

Source: https://habr.com/ru/post/1342499/

