Sort faster in a racket using a hash table

So, I have an example of a list of items like this

(define A (list 'a 'c 'd 'e 'f 'e 'a)) 

Now I want to make a rating from this example

 (define (scan lst) (foldl (lambda (element a-hash) (hash-update a-hash element add1 0)) (hash) lst)) 

The result should be like this:

 > #(('a . 2) ('f . 1) ('e . 2) ....) 

Since the scan function will create a hash table containing unique keys and the number of repetitions of this key (if it catches an unindexed key, it will create a new location for this new key, starting from 0).

Then I would like to sort this hash table because it is unsorted:

 (define (rank A) (define ranking (scan A)) (sort ranking > #:key cdr))) 

Thus, the result will look like this:

# ((a. 2) ('e. 2) (' f. 1) ...)

Now I would like to truncate the hash table and throw the bottom to the threshold n = 1 (otherwise, we will select only elements with more than two repetitions).

 (define (truncate lst n) (define l (length lst)) (define how-many-to-take (for/list ([il] #:when (> (cdr (list-ref lst i)) n)) i)) (take lst (length how-many-to-take))) 

Thus, the result may look like this:

(('a. 2) (' e. 2))

However, on a large scale, this procedure is not very effective, it takes too much time. Will you offer better performance?

Thank you very much,

Part 2:

I have this data structure:

 (automaton x (vector (state y (vector abc)) (state y (vector abc)) ...)) 

Then I randomly generate a population of 1000 of them. Then I browse and classify them using the above functions. If I scan them as is, it will take a long time. If I try to flatten them into a list like this

 (list xyabcyab c...) 

it will take even more time. Here is the smoothing function:

 (define (flatten-au au) (match-define (automaton x states) au) (define l (vector-length states)) (define body (for/list ([i (in-range l)]) (match-define (state yz) (vector-ref states i)) (list y (vector->list z)))) (flatten (list x body))) 

The scan function will look a little different:

 (define (scan population) (foldl (lambda (auto a-hash) (hash-update a-hash (flatten-automaton auto) add1 0)) (hash) population)) 
+5
source share
1 answer

Yes, I think I see a problem. Your algorithm has a run time of O (n ^ 2) ("n-square"). This is because you are calculating from one to the length of the list, and then for each index doing list-ref , which takes time proportional to the size of the index.

This is very easy to fix.

In fact, there is no reason to sort it or convert it to a list if that is what you want; just filter the hash table directly. Like this...

 #lang racket (define A (build-list 1000000 (Ξ» (idx) (random 50)))) (define (scan lst) (foldl (lambda (element a-hash) (hash-update a-hash element add1 0)) (hash) lst)) (define ht (scan A)) (define only-repeated (time (for/hash ([(kv) (in-hash ht)] #:when (< 1 v)) (values kv)))) 

I added a call to time to find out how long it would take. For a list of one million, on my computer it takes a measured time of 1 millisecond.

Asymptotic complexity is important!

+5
source

Source: https://habr.com/ru/post/1237992/


All Articles