How to implement the Twitter retweet action in my database

I am building a web application like Twitter. I need to implement the retweet action, and one tweet can be retweeted by one person several times.

I have a basic table of tweets that has columns for:

Tweets: tweet_id | tweet_text | tweet_date_created | tweet_user_id

(where tweet_id is the primary key for tweets, tweet_text contains the tweet text, tweet_date_created is the DateTime when the tweet was created, and tweet_user_id is a foreign key to the users table and identifies the user who created the tweet)

Now I am wondering how I can implement the retweet action in my database.

Option 1

Should I create a new join table that looks like this:

Retweets: tweet_id | user_id | retweet_date_retweeted

(where tweet_id is a foreign key to the tweets table, user_id is a foreign key to the users table and identifies the user who retweeted the tweet, and retweet_date_retweeted is a DateTime that indicates when the retweet was made.)

Pros: There will be no empty columns; when a user retweets, a new row is created in the retweets table.

Cons: The query will be more complicated. It needs to join the two tables and somehow sort the tweets by two dates (when the tweet is not a retweet, sort it by tweet_date_created; when it is a retweet, sort it by retweet_date_retweeted).

Option 2

Or should I implement it in the tweets table with a parent_id column, so that it looks like this:

Tweets: tweet_id | tweet_text | tweet_date_created | tweet_user_id | parent_id

(where all columns remain unchanged, and parent_id is a foreign key to the same tweets table. When a tweet is created, parent_id remains empty. When a tweet is retweeted, parent_id contains the id of the original tweet, tweet_user_id contains the user who retweeted, tweet_date_created contains the DateTime when the retweet was made, and tweet_text remains empty, because we do not allow users to change the original tweet when retweeting.)

Pros: The query process is much more elegant since I don’t need to join two tables.

Cons: There will be empty cells every time a tweet is retweeted. So if there are 1,000 tweets in my database and each of them is retweeted 5 times, there will be 5,000 rows with an empty tweet_text in my tweets table.
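
To make option 2 concrete, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than the real database, and the sample rows are made up for illustration. It suggests that sorting needs only one column, but that displaying the text of a retweet still takes a self-join on parent_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweets (
    tweet_id           INTEGER PRIMARY KEY,
    tweet_text         TEXT,                                -- NULL for retweets
    tweet_date_created TEXT NOT NULL,
    tweet_user_id      INTEGER NOT NULL,
    parent_id          INTEGER REFERENCES tweets(tweet_id)  -- NULL for originals
);
-- Illustrative rows only (not from the question)
INSERT INTO tweets VALUES
    (1, 'hello', '2012-07-27 00:04:30', 1, NULL),
    (2, 'world', '2012-07-27 00:04:47', 2, NULL),
    (3, NULL,    '2012-07-27 00:06:37', 1, 2),   -- user 1 retweets tweet 2
    (4, NULL,    '2012-07-27 00:07:11', 1, 2);   -- and retweets it again
""")

# Timeline for user 1: sorted on a single column; the self-join only
# fetches the original text for retweet rows.
rows = conn.execute("""
    SELECT t.tweet_id,
           COALESCE(p.tweet_text, t.tweet_text) AS text,
           t.tweet_date_created,
           t.parent_id IS NOT NULL AS is_retweet
    FROM tweets AS t
    LEFT JOIN tweets AS p ON p.tweet_id = t.parent_id
    WHERE t.tweet_user_id = ?
    ORDER BY t.tweet_date_created DESC
""", (1,)).fetchall()

for row in rows:
    print(row)
```

Both retweets of tweet 2 show up as separate rows, so repeated retweets by the same user need no special handling in this layout.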


What is the most effective way? Is it better to have empty cells or make the query process cleaner?

+6
2 answers

IMO option 1 would be better. The query to join the tweet and retweet tables is not complicated at all and can be done with a left or inner join, depending on whether you want to show all tweets or only tweets that have been retweeted. And the join query should perform well because the table is narrow, the columns being joined are ints, and each of them will have an index because of the FK constraints.

Another recommendation is not to prefix all your columns with tweet or retweet; that can be inferred from the table the data is stored in, for example:

    tweet
        id
        user_id
        text
        created_at

    retweet
        tweet_id
        user_id
        created_at

And example joins:

    # Return all tweets which have been retweeted
    SELECT count(*), t.id
    FROM tweet AS t
    INNER JOIN retweet AS rt ON rt.tweet_id = t.id
    GROUP BY t.id;

    # Return tweet and possible retweet data for a specific tweet
    SELECT t.id
    FROM tweet AS t
    LEFT OUTER JOIN retweet AS rt ON rt.tweet_id = t.id
    WHERE t.id = :tweetId;

- Update per request -

Below is just a demo showing why I would choose option #1. There are no foreign keys and no indexes; you will have to add them yourself. But the results should demonstrate that the joins will not be too painful.

    CREATE TABLE `tweet` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `user_id` int(10) unsigned NOT NULL,
      `value` varchar(255) NOT NULL,
      `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=utf8;

    CREATE TABLE `retweet` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `tweet_id` int(10) unsigned NOT NULL,
      `user_id` int(10) unsigned NOT NULL,
      `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

    # Sample Rows

    mysql> select * from tweet;
    +----+---------+----------------+---------------------+
    | id | user_id | value          | created_at          |
    +----+---------+----------------+---------------------+
    |  1 |       1 | User1 | Tweet1 | 2012-07-27 00:04:30 |
    |  2 |       1 | User1 | Tweet2 | 2012-07-27 00:04:35 |
    |  3 |       2 | User2 | Tweet1 | 2012-07-27 00:04:47 |
    |  4 |       3 | User3 | Tweet1 | 2012-07-27 00:04:58 |
    |  5 |       1 | User1 | Tweet3 | 2012-07-27 00:06:47 |
    |  6 |       1 | User1 | Tweet4 | 2012-07-27 00:06:50 |
    |  7 |       1 | User1 | Tweet5 | 2012-07-27 00:06:54 |
    +----+---------+----------------+---------------------+

    mysql> select * from retweet;
    +----+----------+---------+---------------------+
    | id | tweet_id | user_id | created_at          |
    +----+----------+---------+---------------------+
    |  1 |        4 |       1 | 2012-07-27 00:06:37 |
    |  2 |        3 |       1 | 2012-07-27 00:07:11 |
    +----+----------+---------+---------------------+

    # Query to pull all tweets for user_id = 1, including retweets and order from newest to oldest

    mysql> select * from (
        ->   select t.* from tweet as t where user_id = 1
        ->   union
        ->   select t.* from tweet as t
        ->   where t.id in (select tweet_id from retweet where user_id = 1)
        -> ) a order by created_at desc;
    +----+---------+----------------+---------------------+
    | id | user_id | value          | created_at          |
    +----+---------+----------------+---------------------+
    |  7 |       1 | User1 | Tweet5 | 2012-07-27 00:06:54 |
    |  6 |       1 | User1 | Tweet4 | 2012-07-27 00:06:50 |
    |  5 |       1 | User1 | Tweet3 | 2012-07-27 00:06:47 |
    |  4 |       3 | User3 | Tweet1 | 2012-07-27 00:04:58 |
    |  3 |       2 | User2 | Tweet1 | 2012-07-27 00:04:47 |
    |  2 |       1 | User1 | Tweet2 | 2012-07-27 00:04:35 |
    |  1 |       1 | User1 | Tweet1 | 2012-07-27 00:04:30 |
    +----+---------+----------------+---------------------+

Note that in the last result set we were able to include retweets as well, and to display the retweet of #4 before the retweet of #3.
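
Since the demo leaves out foreign keys and indexes, here is a hedged sketch of what adding them could look like. It uses SQLite through Python's sqlite3 module purely so that it runs anywhere; the index names are made up, and which indexes actually pay off depends on your queries. In MySQL the same idea needs the InnoDB engine, because MyISAM (used in the demo tables) does not enforce foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE tweet (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    value      TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE retweet (
    id         INTEGER PRIMARY KEY,
    tweet_id   INTEGER NOT NULL REFERENCES tweet(id),
    user_id    INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Hypothetical index names; they cover "rows for one user, newest first"
-- and the join from retweet back to tweet.
CREATE INDEX idx_tweet_user_created   ON tweet (user_id, created_at);
CREATE INDEX idx_retweet_user_created ON retweet (user_id, created_at);
CREATE INDEX idx_retweet_tweet        ON retweet (tweet_id);
""")

conn.execute("INSERT INTO tweet (id, user_id, value) VALUES (1, 1, 'User1 | Tweet1')")
conn.execute("INSERT INTO retweet (tweet_id, user_id) VALUES (1, 2)")

# A retweet that points at a tweet that does not exist should be refused.
try:
    conn.execute("INSERT INTO retweet (tweet_id, user_id) VALUES (999, 2)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print("retweet of a missing tweet rejected:", fk_rejected)
```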

- Update -

You can accomplish what you are asking for by changing the query a bit:

    mysql> select * from (
        ->   select t.id, t.value, t.created_at
        ->   from tweet as t
        ->   where user_id = 1
        ->   union
        ->   select t.id, t.value, rt.created_at
        ->   from tweet as t
        ->   inner join retweet as rt on rt.tweet_id = t.id
        ->   where rt.user_id = 1
        -> ) a order by created_at desc;
    +----+----------------+---------------------+
    | id | value          | created_at          |
    +----+----------------+---------------------+
    |  3 | User2 | Tweet1 | 2012-07-27 00:07:11 |
    |  7 | User1 | Tweet5 | 2012-07-27 00:06:54 |
    |  6 | User1 | Tweet4 | 2012-07-27 00:06:50 |
    |  5 | User1 | Tweet3 | 2012-07-27 00:06:47 |
    |  4 | User3 | Tweet1 | 2012-07-27 00:06:37 |
    |  2 | User1 | Tweet2 | 2012-07-27 00:04:35 |
    |  1 | User1 | Tweet1 | 2012-07-27 00:04:30 |
    +----+----------------+---------------------+
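
Because the question allows one person to retweet the same tweet several times, a possible variation is to use UNION ALL, which skips the duplicate elimination that UNION performs, and to add a flag column so the application can tell originals from retweets. The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module, reuses three of the sample tweets, and adds one hypothetical extra retweet row; the shown_at and is_retweet column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                    value TEXT NOT NULL, created_at TEXT NOT NULL);
CREATE TABLE retweet (id INTEGER PRIMARY KEY, tweet_id INTEGER NOT NULL,
                      user_id INTEGER NOT NULL, created_at TEXT NOT NULL);
INSERT INTO tweet VALUES
    (1, 1, 'User1 | Tweet1', '2012-07-27 00:04:30'),
    (3, 2, 'User2 | Tweet1', '2012-07-27 00:04:47'),
    (4, 3, 'User3 | Tweet1', '2012-07-27 00:04:58');
INSERT INTO retweet VALUES
    (1, 4, 1, '2012-07-27 00:06:37'),
    (2, 3, 1, '2012-07-27 00:07:11'),
    (3, 3, 1, '2012-07-27 00:09:00');  -- hypothetical second retweet of tweet 3
""")

# Own tweets are dated by tweet.created_at, retweets by retweet.created_at.
timeline = conn.execute("""
    SELECT t.id, t.value, t.created_at AS shown_at, 0 AS is_retweet
    FROM tweet AS t
    WHERE t.user_id = :uid
    UNION ALL
    SELECT t.id, t.value, rt.created_at AS shown_at, 1 AS is_retweet
    FROM tweet AS t
    INNER JOIN retweet AS rt ON rt.tweet_id = t.id
    WHERE rt.user_id = :uid
    ORDER BY shown_at DESC
""", {"uid": 1}).fetchall()

for row in timeline:
    print(row)
```

Tweet 3 appears twice, once per retweet, each at the time that retweet was made.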
+10

I would choose option 2 with a minor change: the parent_id column in the tweet table should point to the row itself if it is not a retweet. Then the query is very simple:

    SELECT tm.Id, tm.UserId, tc.Text, tm.Created,
           -- for a retweet, report the user who posted the original tweet
           CASE WHEN tm.Id <> tc.Id THEN tc.UserId ELSE NULL END AS OriginalAsker
    FROM tweet tm
    LEFT JOIN tweet tc ON tm.ParentId = tc.Id
    ORDER BY tm.Created DESC

(tc is the parent row, the one that holds the content: it has the tweet text, the id of the original poster, etc.)

The reason for the rule of pointing to itself when a tweet is not a retweet is that it is then easy to add more joins to the original tweet. You simply join the table with tc and do not care whether it is a retweet or not.

Not only is the query simple, but it will also perform much better than option 1, because the sorting is done using only one physical column that can be indexed.

The only drawback is that the database will be a bit larger.
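
One practical detail of the self-pointing rule is that a new tweet's id is usually not known until the row has been inserted. A minimal sketch of one way to handle that, assuming SQLite through Python's sqlite3 module: insert the row, then set ParentId to its own Id in the same transaction. The helper names post_tweet and retweet are hypothetical, and the last column reports the parent row's UserId, which is an assumption about what the query is meant to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (
    Id       INTEGER PRIMARY KEY,
    UserId   INTEGER NOT NULL,
    Text     TEXT,                          -- NULL on retweet rows
    Created  TEXT NOT NULL,
    ParentId INTEGER REFERENCES tweet(Id)   -- own Id for original tweets
);
CREATE INDEX idx_tweet_created ON tweet (Created);  -- the single sort column
""")

def post_tweet(user_id, text, created):
    """Insert an original tweet, then point ParentId at the new row itself."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO tweet (UserId, Text, Created) VALUES (?, ?, ?)",
            (user_id, text, created))
        conn.execute("UPDATE tweet SET ParentId = Id WHERE Id = ?",
                     (cur.lastrowid,))
        return cur.lastrowid

def retweet(user_id, parent_id, created):
    """Insert a retweet row that points at the original tweet."""
    with conn:
        cur = conn.execute(
            "INSERT INTO tweet (UserId, Created, ParentId) VALUES (?, ?, ?)",
            (user_id, created, parent_id))
        return cur.lastrowid

first = post_tweet(1, "hello", "2012-07-27 00:04:30")
retweet(2, first, "2012-07-27 00:06:37")

# Every row has a parent, so a plain join is enough.
rows = conn.execute("""
    SELECT tm.Id, tm.UserId, tc.Text, tm.Created,
           CASE WHEN tm.Id <> tc.Id THEN tc.UserId END AS OriginalPoster
    FROM tweet AS tm
    JOIN tweet AS tc ON tc.Id = tm.ParentId
    ORDER BY tm.Created DESC
""").fetchall()

for row in rows:
    print(row)
```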

+1

Source: https://habr.com/ru/post/921324/
