Production, test, and development environments and security

What are the current methods for letting developers build systems that handle personal data? Can anyone point me to a guide to “best practices” for this kind of thing?

We have a Catch-22 here: developers need to write applications that work against systems holding data that is considered "private." IT administration would like our developers not to have access to the data (i.e., to be given the schema or structure, but not the data itself), while most developers (including me) would like access to production data, because not having a representative dataset can lead to wrong assumptions (e.g., about data formats) and bugs later on.

Are there any formalized “best practices” for this? Official guidelines from some "BigCo" (for example, Microsoft or IBM) would be especially helpful, because we need to convince management.

+3
6 answers

My view of the world may be different, since I'm based in the UK, but for the past 20-plus years I have worked mainly in the public sector on systems that process confidential data. The rules are **completely** cut and dried: no production data is permitted in development environments.

As a matter of principle, we do not want to be the ones held responsible for losing confidential data. Users do an excellent job of that themselves.

Within the past 12 months, my wife moved from the same kind of regime to a private-sector one where developers are allowed access to production data, and she is terrified by it. The legal implications (in the UK, at least) can be serious.

Developers do **not** need access to production data. It is just laziness. Define and create test data for specific test cases (including edge cases), and don't rely on whatever production data happens to contain.
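
Defining test data per test case might look like the minimal sketch below. The `Customer` record, its fields, and the chosen edge cases are hypothetical illustrations, not taken from any real schema; the idea is that each row exists on purpose to exercise one case.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Hypothetical record type standing in for a real table row."""
    customer_id: int
    surname: str
    postcode: str
    birth_date: date


# Each fixture deliberately covers one case, instead of hoping that a
# production extract happens to contain it.
EDGE_CASE_CUSTOMERS = [
    Customer(1, "Smith", "SW1A 1AA", date(1980, 5, 17)),             # ordinary row
    Customer(2, "O'Brien", "EH1 1YZ", date(2000, 2, 29)),            # apostrophe, leap day
    Customer(3, "Müller-Lüdenscheidt", "M1 1AE", date(1970, 1, 1)),  # non-ASCII, hyphen
    Customer(4, "", "", date(1900, 1, 1)),                           # empty strings, very old date
    Customer(5, "X" * 255, "B33 8TH", date.today()),                 # maximum-length name, today
]


def is_valid_surname(surname: str, max_len: int = 255) -> bool:
    """Example rule under test: non-empty and within an assumed column width."""
    return 0 < len(surname) <= max_len
```

Running the rule over the fixtures, `[is_valid_surname(c.surname) for c in EDGE_CASE_CUSTOMERS]` gives `[True, True, True, False, True]`: the empty surname is the one case the rule should reject.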

If you **must** use production data (i.e., you can convince someone who doesn't know any better that this is acceptable), make sure the data is anonymized **before** it reaches the development environment.
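
One way to make sure identifiers are replaced before an extract leaves production is to pseudonymize them with a keyed hash (HMAC), so the same real value always maps to the same token and joins between tables still work. This is a swapped-in technique, not the answerer's method; the key handling and the `pseudonymize`/`anonymize_row` helpers are assumptions. Note that a keyed hash is pseudonymization rather than full anonymization, so the remaining columns still need review.

```python
import hashlib
import hmac
import secrets

# Assumption: the key is generated and kept inside the production boundary and
# is never shipped with the extract, so tokens cannot be recomputed from guesses.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Return a stable, non-reversible token for `value` using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def anonymize_row(row: dict, id_columns: set, drop_columns: set) -> dict:
    """Pseudonymize identifier columns and drop free-text personal columns entirely."""
    out = {}
    for column, value in row.items():
        if column in drop_columns:
            continue  # never leaves production
        out[column] = pseudonymize(str(value)) if column in id_columns else value
    return out
```

For example, `anonymize_row({"patient_id": "P-000123", "name": "Jane Doe", "visit_count": 3}, {"patient_id"}, {"name"})` keeps `visit_count`, drops `name`, and replaces `patient_id` with a 16-character hex token that is identical every time the same ID is processed with the same key.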

+5

Often a subset of sanitized data is provided that is representative of the private data but contains no personal data.
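
Producing such a subset can be as simple as sampling rows and keeping only an allow-list of columns judged non-personal. A sketch, assuming rows arrive as dictionaries; the column names, the 10% fraction, and the `sanitized_subset` helper are hypothetical.

```python
import random


def sanitized_subset(rows, allowed_columns, fraction=0.1, seed=42):
    """Return a reproducible random sample of `rows`, keeping only allow-listed columns.

    An allow-list (rather than a block-list) means a newly added personal
    column is excluded by default instead of leaking through.
    """
    rng = random.Random(seed)  # fixed seed -> the same subset on every run
    k = max(1, int(len(rows) * fraction)) if rows else 0
    sample = rng.sample(rows, k)
    return [{c: r[c] for c in allowed_columns if c in r} for r in sample]


orders = [{"order_id": i, "email": f"user{i}@example.com", "amount": i * 10} for i in range(100)]
subset = sanitized_subset(orders, ["order_id", "amount"])
```

Here `subset` holds 10 of the 100 orders, each with only `order_id` and `amount`; the `email` column never makes it into the copy.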

+3

At my company, we started using the Red-gate data generator to generate test data. It has a limited set of options, but with them you can create very usable test data. Yes, I would rather use live production data, but that isn't feasible (especially when you have to take HIPAA into account). It uses a regular expression for each column and lets you use a lookup table for related tables.
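
The idea of a generator per column plus a lookup into related tables can be imitated in plain Python, as in the sketch below. It is not Red-gate's API or behaviour. For brevity it uses a hypothetical mini-pattern syntax (`9` = digit, `A` = uppercase letter, anything else literal) instead of full regular expressions, and the table and column names are made up.

```python
import random
import string

rng = random.Random(0)  # seeded so the generated data is reproducible


def from_pattern(pattern: str) -> str:
    """Expand a mini-pattern: '9' -> random digit, 'A' -> random uppercase letter."""
    out = []
    for ch in pattern:
        if ch == "9":
            out.append(rng.choice(string.digits))
        elif ch == "A":
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # literal character
    return "".join(out)


# Parent table: each column has its own pattern-driven generator.
patients = [
    {"patient_id": i, "mrn": from_pattern("MRN-999999"), "zip": from_pattern("99999")}
    for i in range(1, 6)
]

# Child table: the foreign key is drawn from the parent's generated keys
# (the "lookup table" for related tables), so joins remain valid.
parent_ids = [p["patient_id"] for p in patients]
visits = [
    {"visit_id": v, "patient_id": rng.choice(parent_ids), "code": from_pattern("AA99")}
    for v in range(1, 21)
]
```

Drawing child keys only from generated parent keys is what keeps referential integrity intact, which matters as soon as the application joins the two tables.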

+2

At MediumCo, we strip proprietary data out of our production data in Test and Dev. At times it has been a little painful not to have exactly representative data, but customers have asked for this before, and it is usually not a problem, since there is plenty of fake proprietary data in those environments.

+1

I don't have a paper with recommendations or anything like that. But I would think that if you develop in an environment that is as secure as the one where the production data lives, there would not be many arguments against it.

That is, if your production database lives in a data center hosted, controlled, and secured by your IT staff, and your development database lives in the same setting and offers no new ways to access the information, you would be in good shape. As an additional sign of good faith, it would be nice to offer anyone concerned about security the chance to run some kind of penetration test, to confirm that you are telling the truth about security.

The other side of this, of course, is the cost of not using the data: working without it will lead to more complex code that will cost $xxxxxx.xx in development, whereas giving a small subset of your development team access to that data costs little or nothing.

0

To avoid having to sanitize/anonymize the data by hand, you can use random text replacement: replace every alphanumeric character in every text field with a random alphanumeric character. This:

  • preserves the length, size, etc. of the data from the developer's point of view
  • causes no problems with character sets
  • leaves dates and numbers untouched, which allows accurate testing of date and quantity ranges
  • satisfies most privacy requirements

If you want to go a little further, you can use random digit-for-digit replacement for phone numbers and postal codes, and random alphanumeric replacement in the other text fields.
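
The character-for-character replacement described above can be sketched as follows. This is an illustration under stated assumptions, not code from the answer: letters become random ASCII letters of the same case, ASCII digits become random digits (which gives the digit-for-digit behaviour for phone numbers and postal codes), and spaces and punctuation are kept, so the character count and format survive. Only string values are touched; dates and numeric values pass through unchanged.

```python
import random
import string

# OS-entropy source: scrambled values cannot be replayed from a known seed.
_rng = random.SystemRandom()


def scramble_text(value: str) -> str:
    """Replace letters with same-case random letters and digits with random digits.

    Spaces, punctuation and the number of characters are preserved.
    """
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(_rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(_rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


def scramble_value(value):
    """Scramble strings only; dates, ints, floats, None, etc. pass through unchanged."""
    return scramble_text(value) if isinstance(value, str) else value
```

For instance, `scramble_text("Jane O'Brien, +44 20 7946 0958")` returns something like `"Qxtr Z'Kwplsa, +81 37 2046 5513"`: same length, same punctuation and spacing, letters and digits randomized.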

Having an automated replacement script lets you regularly take fresh data dumps from the live system, so your tests stay realistic in terms of data volume and variability.
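
An automated pass over each fresh dump might look like the sketch below. It assumes the dump is CSV and that a hypothetical `SCRAMBLE_COLUMNS` set names the personal text columns; every other column (IDs, dates, amounts) is copied unchanged. The scrambler is repeated in compact form so the script runs on its own.

```python
import csv
import io
import random
import string

_rng = random.SystemRandom()

# Hypothetical set of columns holding personal text.
SCRAMBLE_COLUMNS = frozenset({"name", "email", "phone"})


def scramble_text(value: str) -> str:
    """Letters -> random same-case letters, digits -> random digits, rest kept."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(_rng.choice(string.digits))
        elif ch.isalpha():
            out.append(_rng.choice(string.ascii_uppercase if ch.isupper() else string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


def scramble_dump(src, dst, columns=SCRAMBLE_COLUMNS):
    """Stream a CSV dump from `src` to `dst`, scrambling only the named columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in columns:
            if row.get(col):  # skip absent or empty values
                row[col] = scramble_text(row[col])
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny in-memory dump standing in for a real export file.
    dump = io.StringIO("id,name,email,signup_date\r\n1,Jane Doe,jane@example.com,2021-03-04\r\n")
    out = io.StringIO()
    scramble_dump(dump, out)
    print(out.getvalue())
```

For real files, open both ends with `newline=""`, as the `csv` module documentation recommends; the same function then works unchanged in a scheduled job.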

It does mean that a few operations won't be realistic (for example, indexing of name fields, whose values in real life cluster around common letters), but these should be limited.
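
The indexing caveat can be made concrete by comparing first-letter distributions before and after scrambling. The surname list below is made up and deliberately skewed; the point is only that real names cluster on a few initials, while scrambled ones spread roughly uniformly over all 26 letters, which changes how an index on that column behaves.

```python
import random
import string
from collections import Counter

rng = random.Random(1)  # seeded so the comparison is repeatable

# Hypothetical surnames, skewed towards a few initials the way real name data tends to be.
surnames = ["Smith"] * 40 + ["Jones"] * 30 + ["Taylor"] * 20 + ["Brown"] * 10

real_initials = Counter(name[0] for name in surnames)

# Character-for-character scrambling turns an uppercase initial into a uniformly
# random uppercase letter, so the scrambled initial can be sampled directly.
scrambled_initials = Counter(rng.choice(string.ascii_uppercase) for _ in surnames)

print("distinct initials, real:", len(real_initials))  # 4 (S, J, T, B)
print("distinct initials, scrambled:", len(scrambled_initials))
print("most common real initial:", real_initials.most_common(1))  # [('S', 40)]
```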

0

Source: https://habr.com/ru/post/896313/

