Need a better approach to the madness of database architecture

We are creating a planning system for our launch.

This is just normal, except for the automatic search function that we want to implement. LAMP architecture. Nothing special.

This is what the DB looks like. Three main tables:

  • Offices (id, start_time, stop_time)
  • People (id, office_id, start_time, stop_time)
  • Schedule (id, people_id, start_time, stop_time)

start_time / stop_time - TIMESTAMPS.

Tables do not have to be that way. This is exactly what we have.

Offices table has opening / closing times for offices. This table can be 365 days per office, since the opening / closing times are not the same. Please note that this can be up to 1000 offices. This makes approximately 365,000 entries in the table.

People have a connection / vacation time. This is clearly more restrictive than an office. Again, every day of the year, people can have different hours of visits. Each office has about 50 people. This amounts to 1,000 offices * 365 days * 50 employees = 18,250,000 records.

Schedule - who will meet whom. Each person can have up to 10 appointments per day. Yes, at the moment this can easily make 182.5 million rows in this table.

Nothing strange but big numbers. What the application needs to do is provide an office, a person to meet and a duration, show the first 5 available dates.

From what we believe, this application will completely destroy our server. We just desperately need it. The first thing we thought was "this is generally impossible." But hey! everything is possible in software, right?

PS: If someone thinks of a better approach that makes the application viable, we REALLY evaluate it.

Thanks so much for reading. Hope some hardcore programmer can lend a hand to us.

UPDATE:

For testing purposes, we created two exactly the same tables:

meetings and offices (id, profesional, start, stop).

The identifier is primary, the rest is the BTREE index. SQL is something like this (which does not work 100%):

SELECT a.profesional, a.stop AS desde, Min(b.start) AS hasta FROM meetings AS a JOIN meetings AS b ON a.profesional=b.profesional AND a.stop < b.start WHERE a.profesional = 1 AND b.profesional = 1 GROUP BY a.start UNION SELECT m.profesional, MIN(m.start), MIN(j.start) FROM offices m JOIN meetings j ON j.profesional = m.profesional WHERE j.profesional = 1 AND m.profesional = 1 UNION SELECT m.profesional, MAX(j.stop), MAX(m.stop) FROM offices m JOIN meetings j ON j.profesional = m.profesional WHERE j.profesional = 1 AND m.profesional = 1 ORDER BY desde ASC 

We have done the following. Add only 1 office with 240 days. Each day has 8 meetings, which is a total of about 2,000 lines. To complete such a request, 2.6 (!) Seconds are required. Is the request incorrect? Can I rewrite it?

+4
source share
2 answers

If you were given a person, would this not reduce the number of lines in the schedule to consider a 50,000-fold reduction? The number of office rows will also be reduced to a few hundred, if you only consider this office. The correct index will find these rows as soon as possible.

In addition, do people really plan a full year of meetings in advance, or is it likely that you will only have a fully reserved database for a month or two in the future? You can always move old data to the archive if you have any performance problems in your main database.

In addition, with a “do” rating it’s easy to think too much. You should try to find out how many people each office will have on average and how many meetings they will have on average a day. "Up to 10 meetings per day" can easily turn into "usually twice a day." Depends on what kind of business we are talking about, of course.

And don't forget to deduct the weekend. They are 2/7 years.

+6
source

Your application requires one important request. Find the intervals defined

  (OfficeOpenIntervals INTERSECT PeopleAtOfficeIntervals) MINUS ScheduleIntervals 

and search at these intervals, next to or after a specific date.

Using appropriate indexes and query restriction (finding only one person over the next 60 days, etc.) is likely to be fine. Handling time intervals for operations is complex, but you can test various indexes and methods for writing queries.


Another option (if you check and do not find an effective way of indexing) is to have a separate AvailableSlots table, which would be the first when there are no scheduled meetings filled with all available days when a person is in the office (this will be OfficeOpenIntervals INTERSECT PeopleAtOfficeIntervals ). Then, each time the appointment is added to the Schedule , the corresponding row in this AvailableSlots table will either be deleted, updated, or split into two rows, which will save the remaining available intervals for the person who was scheduled for the meeting.

Thus, a request to display the first 5 available dates should be searched only in this table.

This is not a normalized solution, and integrity must be maintained by stored procedures (for all operations, such as adding a schedule, leaving the office, launching, etc.). The initial population may also take time and space, but you do not need to fill the table for centuries. It can be just a few months, and additional populations can be done later (once a month or a year or when necessary).

0
source

Source: https://habr.com/ru/post/1395091/


All Articles