Shuffling a List of Millions of Elements in Python Efficiently

I read this answer as potentially the best way to randomize a list of strings in Python. I'm just wondering whether it is also the most efficient way, because I have a list of about 30 million items, built with the following code:

import json
from sets import Set
from random import shuffle

a = []

for i in range(0,193):
    json_data = open("C:/Twitter/user/user_" + str(i) + ".json")
    data = json.load(json_data)
    for j in range(0,len(data)):
        a.append(data[j]['su'])
new = list(Set(a))
print "Cleaned length is: " + str(len(new))

## Take Cleaned List and Randomize it for Analysis
shuffle(new)

If there is a more efficient way to do this, I would really appreciate any advice.

Thanks,

+3
3 answers

Some possible suggestions:

import json
from random import shuffle

a = set()
for i in range(193):
    with open("C:/Twitter/user/user_{0}.json".format(i)) as json_data:
        data = json.load(json_data)
        a.update(d['su'] for d in data)

print("Cleaned length is {0}".format(len(a)))

# Take Cleaned List and Randomize it for Analysis
new = list(a)
shuffle(new)


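  • The only way to find out whether this is faster is to profile it!
  • Do you prefer sets.Set to the built-in set() for a reason?
  • Each file is now opened in a with block (so it is closed automatically, even if an error occurs).
  • You never use "a" as a list, only convert it to a set; why not make it a set from the start?
  • Rather than looping over an index and then looking each item up by that index, the code iterates over the data items directly...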
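
To make the "profile it" point concrete, here is a minimal timing sketch. It does not read the real JSON files; `files` is a hypothetical in-memory stand-in for the parsed data, and the function names are made up for illustration. It compares building a list and deduplicating afterwards against updating a set from the start.

```python
import timeit

# Hypothetical stand-in for the parsed JSON files: 20 "files" of 10,000
# records each, with overlapping 'su' values so there are duplicates.
files = [[{"su": "user_{0}".format(f * 5000 + j)} for j in range(10000)]
         for f in range(20)]

def list_then_set():
    # Approach from the question: append everything, deduplicate at the end.
    a = []
    for data in files:
        for j in range(len(data)):
            a.append(data[j]["su"])
    return list(set(a))

def set_from_start():
    # Approach from the answer: feed a set directly.
    a = set()
    for data in files:
        a.update(d["su"] for d in data)
    return list(a)

def best_time(func, repeat=3):
    # Best of `repeat` single runs, in seconds.
    return min(timeit.repeat(func, number=1, repeat=repeat))

timings = {
    "list then set": best_time(list_then_set),
    "set from start": best_time(set_from_start),
}
for name, seconds in timings.items():
    print("{0}: {1:.4f} s".format(name, seconds))
```

The numbers depend on the machine and on how many duplicates the data contains, so it is worth rerunning this on a sample of the real files before drawing conclusions.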
+4

Besides shuffle, another way to randomize the list is to pair each element with a random key, sort on the key, and then strip the keys off again:

import json
import random

a = set()
for i in range(193):
    # "with" closes each file as soon as it has been parsed
    with open("C:/Twitter/user/user_" + str(i) + ".json") as json_data:
        data = json.load(json_data)
    a.update(d['su'] for d in data)

# Report len(a): "new" does not exist yet at this point
print("Cleaned length is: " + str(len(a)))

# Pair every element with a random key, sort on the key, then drop the keys
new = [(random.random(), el) for el in a]
new.sort()
new = [el for _, el in new]
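
The same random-key idea can be sketched more compactly with the key argument of sorted(). This is only an illustration: `shuffled_by_random_key` and the sample `names` list are hypothetical, not part of the original answer. Keep in mind that sorting is O(n log n), while random.shuffle runs in O(n), so this is unlikely to beat shuffle on 30 million items.

```python
import random

def shuffled_by_random_key(items):
    """Return a new list with the elements of `items` in random order."""
    # sorted() calls the key function once per element, so each element
    # gets its own random sort key; the keys are discarded afterwards.
    return sorted(items, key=lambda _: random.random())

names = ["alice", "bob", "carol", "dave", "erin"]  # hypothetical sample data
result = shuffled_by_random_key(names)
print(result)
```

Unlike shuffle, this leaves the input untouched and returns a new list, which costs extra memory for a very large input.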
+2

I don't know if it will be faster, but you can try numpy's shuffle (numpy.random.shuffle).
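
A minimal sketch of what that could look like, assuming NumPy is installed. The `names` list is hypothetical sample data standing in for the cleaned list. Whether this is actually faster than random.shuffle for 30 million strings is not something the example shows; converting the list to an array copies the data, so it would have to be measured.

```python
import numpy as np

names = ["alice", "bob", "carol", "dave", "erin"]  # hypothetical sample data

arr = np.array(names)        # copies the strings into a fixed-width array
np.random.shuffle(arr)       # shuffles the array in place
shuffled = arr.tolist()      # back to a list of plain Python strings
print(shuffled)
```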

0

Source: https://habr.com/ru/post/1784207/