Repeatable pagination with randomly ordered rows

I have an API that returns paginated rows from a database. It works, but when I order the rows using RANDOM() , I get duplicates on consecutive pages. Is it possible to set a random seed per request?

Alternatively, can I set an arbitrary SEED globally to force RANDOM() to generate the same values for every request? Then I could just change the global seed every 3 minutes or something like that ...


I use this code:

 SELECT * FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5 

Now I want to pass a seed to this query so that the random results paginate consistently. Should I do it like this:

 SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5
 SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 10 LIMIT 5

And will the results be paginated correctly?

+11
4 answers

If the order only needs to be “shuffled”, not truly random ...

( Update: see my other answer for a more flexible and randomized solution. )

Consider the "random" order you get when you use ORDER BY random() : for each row, PostgreSQL calls random() , gets a value, and uses that value to decide where the row sorts in the result set.

To make this repeatable, you have to manipulate the seed, which feels hacky. According to the docs :

The effects will persist until the end of the session, unless overridden by another SET

I think this means that with a connection pool, a setseed on one connection leaks into whatever process uses that connection next.

What about modulo?

I have a case where I don't need true randomness. My criteria:

  • not the same order every time
  • predictable ordering on pages of the same result set so that we don't get duplicates on subsequent pages

For example, this would be good:

  • Listing 1
    • page 1: items 1, 4
    • page 2: items 3, 2
  • Listing 2 (another user, or the same user coming back later)
    • page 1: items 3, 1
    • page 2: items 2, 4

To get something like this, modulo works well. For example, ORDER BY id % 7, id for all pages of request 1 and ORDER BY id % 11, id for all pages of request 2. That is, for each row, take its id modulo some number and sort by the remainder. Rows with the same remainder are sorted by id (to make the sort stable).

The modulus can be chosen randomly for the first page and then reused as a parameter in each subsequent page request.
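The scheme above can be simulated on the application side. A minimal Python sketch (the ids, moduli, and page size are invented for illustration; in practice the ordering happens in SQL):

```python
import random

def paginated(ids, modulus, page, per_page):
    # Order ids by (id % modulus, id): the same total order for every page request,
    # mirroring ORDER BY id % modulus, id in SQL
    ordered = sorted(ids, key=lambda i: (i % modulus, i))
    start = page * per_page
    return ordered[start:start + per_page]

ids = list(range(1, 21))
modulus = random.choice([7, 11, 13])  # pick once, then reuse for every page

# Requesting every page with the same modulus yields no duplicates and no gaps:
pages = [paginated(ids, modulus, p, 5) for p in range(4)]
assert sorted(sum(pages, [])) == ids
```

As long as the client keeps passing the same modulus, consecutive pages never overlap; a different modulus gives a different (but still stable) shuffle.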

You can see how this would look for your data, for example:

 echo "select id, id % 77 FROM my_table ORDER BY id % 77, id" | psql my_db > sort.txt

A prime modulus is likely to give you the best variation. And if your ids start at 1 (so that % 77 would return the first 77 rows in their normal order), you could instead take the modulus of a timestamp field. For instance:

ORDER BY (extract(epoch from inserted_at)* 100000)::bigint % 77

But you would need a functional index to make this fast.

+7

With this union all trick, the random order is repeatable:

 select a, b
 from (
   select setseed(0.1), null as a, null as b
   union all
   select null, a, b from t
   offset 1
 ) s
 order by random()
 offset 0 limit 5;
+4

You can use setseed(dp) to seed random() , with the seed in [-1.0, 1.0]. For instance:

 engine=> SELECT SETSEED(0.16111981);
  setseed
 ---------

 (1 row)

 engine=> SELECT RANDOM();
       random
 -------------------
  0.205839179921895
 (1 row)

 engine=> SELECT RANDOM();
       random
 -------------------
  0.379503262229264
 (1 row)

 engine=> SELECT RANDOM();
       random
 -------------------
  0.268553872592747
 (1 row)

 engine=> SELECT RANDOM();
       random
 -------------------
  0.788029655814171
 (1 row)

And, of course, every time you reseed, you get the same result:

 engine=> SELECT SETSEED(0.16111981), RANDOM();
  setseed |      random
 ---------+-------------------
          | 0.205839179921895
 (1 row)

 engine=> SELECT SETSEED(0.16111981), RANDOM();
  setseed |      random
 ---------+-------------------
          | 0.205839179921895
 (1 row)

 engine=> SELECT SETSEED(0.16111981), RANDOM();
  setseed |      random
 ---------+-------------------
          | 0.205839179921895
 (1 row)

 engine=> SELECT SETSEED(0.16111981), RANDOM();
  setseed |      random
 ---------+-------------------
          | 0.205839179921895

(clarification: the output was copied from psql ; engine is the name of my database)
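If you want each user's pagination to stay stable but differ between users, the application could derive the setseed() argument from a session token. A hypothetical sketch ( session_seed is an invented helper, not part of any library):

```python
import hashlib

def session_seed(token: str) -> float:
    # Hash the token and scale the top 64 bits into [-1.0, 1.0),
    # the range accepted by PostgreSQL's setseed()
    h = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big")
    return h / 2**63 - 1.0

# Then run, on the same connection and in this order:
#   SELECT setseed(%s);  -- with session_seed(token)
#   SELECT * FROM "table" ORDER BY random() OFFSET %s LIMIT %s;
```

The same token always maps to the same seed, so repeated page requests from one session see one consistent random order. Note the caveat above: with a connection pool, the seed must be set on every request, not assumed to persist.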

+2

Specify exact row ids (pre-randomized)

This query will return the rows with ids 4, 2, 1, and 4 again, in exactly that order.

 SELECT items.id, items.name
 FROM items
 -- unnest expands array values into rows
 INNER JOIN unnest(ARRAY[4,2,1,4]) AS item_id ON items.id = item_id

Yields:

  id |     name
 ----+---------------
   4 | Toast Mitten
   2 | Pickle Juicer
   1 | Horse Paint
   4 | Toast Mitten

Knowing this, you can specify exactly which ids should appear on each page, in whatever order you like.

For example, you could SELECT id FROM items ORDER BY random() , split the list into "pages" of 5 ids each, and store it in your application's memory, in Redis, or wherever. For each requested page, run the query above with the corresponding block of 5 ids.
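The shuffle-once-then-slice step might look like this in Python (a sketch; build_pages is an invented helper, and the ids would really come from the database):

```python
import random

def build_pages(ids, per_page):
    # Shuffle once, then split into fixed pages; persist this list
    # per user/session so every page request sees the same shuffle
    shuffled = ids[:]
    random.shuffle(shuffled)
    return [shuffled[i:i + per_page] for i in range(0, len(shuffled), per_page)]

# The ids for page n are then bound into the unnest(ARRAY[...]) query above
pages = build_pages(list(range(1, 13)), 5)
```

Because the shuffle happens exactly once, later pages can never repeat an id from an earlier page, no matter how many requests arrive.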

Options:

  • For true randomness, you can install pgcrypto and ORDER BY gen_random_uuid() .
  • You can omit the ORDER BY and shuffle the ids in memory in your application language.
  • You can generate a different shuffle per user or per day.
0

Source: https://habr.com/ru/post/974297/
