Huge leaderboard with filtering

We are building a massive multi-player educational game with several million entries on the leading board (based on accumulated HP). After the game, we need to show the leaderboard and rating of this player / student. But for this leaderboard there are several filters (global / by country, by month / year / today, by age, etc.) that can be mixed, for example. "Get the leaderboard for my Country for the last month." The number of combinations is ~ 20.

My problem is how to keep a structure that is regularly updated; recalculation of ranking should be made after each game. A typical complete leaderboard currently has ~ 5 million entries for players from 150 countries.

  • I used to have a MySQL cluster table (userid, xps, countryid) with 3 nodes, but ordering by XP (either in the DBMS or in an application that required all the data from the database) turned out to be too slow, more (> 20 thousand users) . This is an interesting post , but again, half a second for each request is too much.

  • Then we used REDIS (see this post ), but the filtering problem is here. We used separate lists for TOP 5, and the rest. TOP-5 was updated instantly, for the rest there was a slight delay of 20-30 minutes. We actually rated this user based on a cached instance of Leaderboard (using real XP, but not cached), so that was acceptable. Real-time on non-Top5 is not a prerequisite. This is good for one global ranking, but how to filter the results based on the month and / or country and / or age. Do I need to keep a list for each combination of filters?

  • We also tested custom structures in Java (using it as a Java caching server, similar in functionality to REDIS), while still experimenting with it. What is the best combination of structures to achieve our goal? As a result, we used one list for each filtering combination, for example. Map<FilteringCombination, SortedList<User>>and then perform a binary search in the list of a specific key. Thus, the finished game requires that several inserts say that X, but this requires the space X * NumOfPlayers, which is X times more than saving one list (not sure if this can fit in memory, but we can always create a cluster here splitting combinations on different servers). Here the problem is how to rebuild the cache in the event of a failure, but this is another problem that we can deal with.

  • , , (, 0-100xp, 101 - 1000xp, 1001 - 10000xp ..). xp . , , , , , XP , .

  • Cassandra, white-rows, , .

, , . ( UserX) Top5, (, 2 2 ) :

    Global TOP 5        My Global Ranking (425)   My Country Ranking     Other Rankings      
1. karen (12000xp)          423. george              1. david    
2. greg (11280xp)           424. nancy               2. donald 
3. philips (10293xp)      **425. UserX**             3. susan
4. jason (9800xp)           426. rebecca           **4. UserX** 
5. barbara (8000xp)         427. james               5. teresa

SO , Leaderboard. ( + + ( / ))?

+4
3

- . , , . , MySQL. , n- , SQL- .

, , REDIS. VoltDB, , .

, , , . ( ) , . 10 20 1 , .

. , , . , - . ( ). ( - , , ):

enum Country {
    ...
}

class User {
    String givenName;
    String familyName;
    int xp;
    Country country;
    int age;
}

class LeaderBoard {
    Set<User> users;
    Map<Integer, Set<User>> xpIndex;
    Map<Country, Set<User>> countryIndex;
    Map<Integer, Set<User>> ageIndex;
}

. :

private setUserAge(User user, int age) {
    assert users.contains(user);
    assert ageIndex.get(user.getAge()).contains(user);
    ageIndex.get(user.getAge()).remove(user);
    if (!ageIndex.containsKey(age)) {
        ageIndex.put(age, new TreeSet<>());
    }
    ageIndex.get(age).add(user);
    user.setAge(age);
}

, , :

countryIndex.get(Country.Germany).stream()
    .filter(ageIndex.get(20)::contains)
    .sorted(User::compareRank)
    ...

SortedSet<User> germanUsers = new TreeSet<>(User::compareRank);
germanUsers.addAll(countryIndex.get(Country.Germany));
germanUsers.retainAll(ageIndex.get(20));

, - , . paralellStream.

. , , . , , .

, , . HashMap vs TreeMap .

, , - . , . , ( Java 8).

, (, -), - .

0

, . :

Map<Country, Map<Age, Map <TimingIdentifier, List<User>>>> ( )

: "", , ( ). Age (All-Age) TimeIdentifier (All-Time). TimeIdentifier: [All-Time, Month, Week, Day]

, . Map<Filter1,Map<Filter2,Map<Filter3,Map<Filter4 ..other Map Keys here..,List<User>>>>

. , , . , FilterCombination:

class FilterCombination {
    private int CountryId;
    private int AgeId;
    private int TimeId;
    ...
}

Map<FilterCombination, List<User>> ( )

TreeSet, . ? , (. ), , , Java (. here). , VS - List.add(index, Object), O (n). LinkedList .add(index, Object), , , k- ( - O (n)). , .

. , (. ). ( ), O (logn + n) ( + List.add(, )).

- , , O (logn + n) + ?

* , , User XP (+ timestamp, ), Id, User -Id ).

**

1st: XP

- 2- : XP

, , . , , XP ( , , XP, ).

XP . , ( ), XP ... XP timestamps, log (n).

0

The easiest option is to select a sorted set of Redis and use the master subordinates for replication. Enable RDB on each slave and backup RDB files to S3. Using Kafka to save all records before they go to Radish. Thus, we can later ignore the missing transactions.

0
source

Source: https://habr.com/ru/post/1569427/


All Articles