Followers / following database structure

My site has a followers/following system (like Twitter). My dilemma is designing a database structure to track who follows whom.

As a result, I created a table like this:

    id  | user_id | followers | following
    1   | 20      | 23,58,84  | 11,156,27
    2   | 21      | 72,35,14  | 6,98,44,12
    ... | ...     | ...       | ...

Basically, I thought that each user would have one row, with one column for their followers and one for the users they follow. Each of those columns would hold a comma-separated list of user IDs.

Is this an effective way to handle it? If not, what is the best alternative?

Thanks.

+6
4 answers

That is the worst way to do it, because it goes against normalization. Use two separate tables instead: Users and User_Followers. Users stores the user information. User_Followers looks like this:

    id | user_id | follower_id
    1  | 20      | 45
    2  | 20      | 53
    3  | 32      | 20

User_Id and Follower_Id will be foreign keys that reference the Id column in the Users table.
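
As a rough illustration of this two-table layout, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the answer; the `name` column, the sample user names, and the use of SQLite rather than the site's actual database are assumptions made only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL               -- hypothetical column for the example
);
CREATE TABLE User_Followers (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES Users(id),
    follower_id INTEGER NOT NULL REFERENCES Users(id)
);
""")

# Sample users (names are made up) and the three rows from the table above.
conn.executemany("INSERT INTO Users (id, name) VALUES (?, ?)",
                 [(20, "alice"), (32, "bob"), (45, "carol"), (53, "dave")])
conn.executemany(
    "INSERT INTO User_Followers (user_id, follower_id) VALUES (?, ?)",
    [(20, 45), (20, 53), (32, 20)])

# Who follows user 20?
followers_of_20 = [row[0] for row in conn.execute(
    "SELECT follower_id FROM User_Followers "
    "WHERE user_id = ? ORDER BY follower_id", (20,))]

# A follower that does not exist in Users is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO User_Followers (user_id, follower_id) VALUES (?, ?)",
        (20, 999))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The foreign keys are what keep the follower rows from pointing at users that do not exist, which the comma-separated columns cannot guarantee.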

+20

One weakness of your representation is that each relationship is encoded twice: once in the row of the follower and once in the row of the user being followed. That makes data integrity harder to maintain and updates tedious.

I would make one table for users and one table for relationships. The relationship table will look like this:

    id  | follower | following
    1   | 23       | 20
    2   | 58       | 20
    3   | 84       | 20
    4   | 20       | 11
    ...

Thus, adding a new relationship is just an insert, and removing a relationship is just a delete. It is also much easier to aggregate counts to determine how many followers a given user has.
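
The insert, delete, and count operations described above could look roughly like this. This is a sketch with Python's sqlite3 module; the table name `relationships` and the helper functions `follow`, `unfollow`, and `follower_count` are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE relationships (
        id        INTEGER PRIMARY KEY,
        follower  INTEGER NOT NULL,   -- the user who follows
        following INTEGER NOT NULL    -- the user being followed
    )
""")

def follow(follower, following):
    """Adding a relationship is a single INSERT."""
    conn.execute(
        "INSERT INTO relationships (follower, following) VALUES (?, ?)",
        (follower, following))

def unfollow(follower, following):
    """Removing a relationship is a single DELETE."""
    conn.execute(
        "DELETE FROM relationships WHERE follower = ? AND following = ?",
        (follower, following))

def follower_count(user_id):
    """Counting followers is a plain aggregate over one column."""
    return conn.execute(
        "SELECT COUNT(*) FROM relationships WHERE following = ?",
        (user_id,)).fetchone()[0]

# The rows from the table above: 23, 58 and 84 follow 20; 20 follows 11.
for f in (23, 58, 84):
    follow(f, 20)
follow(20, 11)

count_before = follower_count(20)
unfollow(58, 20)
count_after = follower_count(20)
```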

+4

There is a better physical structure than the one suggested by the other answers:

    CREATE TABLE follower (
        user_id INT,      -- References user.
        follower_id INT,  -- References user.
        PRIMARY KEY (user_id, follower_id),
        UNIQUE INDEX (follower_id, user_id)
    );

InnoDB tables are clustered, so secondary indexes behave differently than in heap-based tables and can carry unexpected overhead if you are not aware of this. Having a surrogate primary key id just adds another index for no good reason [1] and makes the indexes on {user_id, follower_id} and {follower_id, user_id} fatter than they need to be (because secondary indexes in a clustered table implicitly include a copy of the PK).

The table above has no surrogate key id and (assuming InnoDB) is physically represented by just two B-trees (one for the primary/clustering key and one for the secondary index), which is about as efficient as it gets for searching in both directions [2]. If you only need one direction, you can drop the secondary index and go down to a single B-tree.
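
To show the composite-key layout in action, here is a sketch using Python's sqlite3 module. It is not InnoDB: SQLite's `WITHOUT ROWID` option, which stores the table as a B-tree keyed on the primary key, is used here only as a rough stand-in for a clustered table, and the sample IDs are taken from the earlier examples in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE follower (
        user_id     INTEGER NOT NULL,   -- references user
        follower_id INTEGER NOT NULL,   -- references user
        PRIMARY KEY (user_id, follower_id),   -- lookup: followers of a user
        UNIQUE (follower_id, user_id)         -- lookup: users someone follows
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO follower VALUES (?, ?)",
                 [(20, 23), (20, 58), (11, 20)])

# Direction 1: who follows user 20? (served by the primary key)
followers_of_20 = [r[0] for r in conn.execute(
    "SELECT follower_id FROM follower "
    "WHERE user_id = ? ORDER BY follower_id", (20,))]

# Direction 2: whom does user 20 follow? (served by the reverse index)
followed_by_20 = [r[0] for r in conn.execute(
    "SELECT user_id FROM follower "
    "WHERE follower_id = ? ORDER BY user_id", (20,))]

# The composite primary key also rules out duplicate relationships.
try:
    conn.execute("INSERT INTO follower VALUES (?, ?)", (20, 23))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Without a surrogate id, the pair of user IDs is itself the row identity, so the same relationship cannot be stored twice.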

By the way, what you did was a violation of the principle of atomicity and, therefore, 1NF.


[1] And each additional index takes up space, lowers cache effectiveness, and slows down INSERT/UPDATE/DELETE.

[2] From followee to follower and vice versa.

+2

No, the approach you described has several problems.

First, storing multiple data points as a comma-separated string has a number of problems. It is hard to join on (you can join using LIKE, but it will be slow), it is hard and slow to search, and it cannot be indexed the way you would want.
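
To make the search problem concrete, here is a small sketch with Python's sqlite3 module. The table name `follow_lists` is hypothetical; the follower strings are the ones from the question. It looks for rows whose follower list contains user 5, who appears in neither list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follow_lists (user_id INTEGER PRIMARY KEY, followers TEXT)")
conn.executemany("INSERT INTO follow_lists VALUES (?, ?)",
                 [(20, "23,58,84"), (21, "72,35,14")])

# Naive substring search: '5' also matches inside '58' and '35'.
naive = [r[0] for r in conn.execute(
    "SELECT user_id FROM follow_lists "
    "WHERE followers LIKE ? ORDER BY user_id", ("%5%",))]

# Delimiter-aware search: wrap the list in commas so only whole IDs match.
# The pattern still starts with a wildcard, so an ordinary index on the
# column would not help and every row has to be examined.
careful = [r[0] for r in conn.execute(
    "SELECT user_id FROM follow_lists "
    "WHERE ',' || followers || ',' LIKE ? ORDER BY user_id", ("%,5,%",))]
```

The naive query reports both rows even though user 5 follows nobody, and the corrected query needs string manipulation on every row just to answer a simple membership question.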

Second, if you store both a list of followers and a list of followed users, you have redundant data (the fact that A follows B appears in two places). That wastes space and also lets the data get out of sync: if the database shows A in B's list of followers but does not show B in A's list of followed users, the data is inconsistent in a way that is very hard to recover from.

Use a join table instead. This is a separate table in which each row holds a user ID and a follower ID. It lets you store each fact in one place, lets you index and join, and lets you add extra columns to the row, for example to record when the following relationship began.
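
A join table with one extra column might be sketched as follows, again with Python's sqlite3 module. The table names `users` and `follows`, the `followed_at` column, and all sample names and dates are hypothetical and serve only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE follows (
    user_id     INTEGER NOT NULL REFERENCES users(id),
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followed_at TEXT    NOT NULL,   -- extra column: when the follow began
    PRIMARY KEY (user_id, follower_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO follows VALUES (?, ?, ?)",
                 [(1, 2, "2013-01-05"),
                  (1, 3, "2013-02-10"),
                  (2, 1, "2013-03-01")])

# Join back to users to list alice's followers, oldest follow first.
rows = conn.execute("""
    SELECT u.name, f.followed_at
    FROM follows AS f
    JOIN users AS u ON u.id = f.follower_id
    WHERE f.user_id = ?
    ORDER BY f.followed_at
""", (1,)).fetchall()
```

Each relationship lives in exactly one row, so there is no second copy that can drift out of sync.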

+1

Source: https://habr.com/ru/post/957216/