I am trying to make the following code multithreaded, and I just can't get it to work. The code below (from which I removed most of the body for illustration purposes) currently runs fine, but slowly (approximately 5 minutes for a list of 3600 tweets).
```python
import dataset
import datetime
import json

with open("postgresConnecString.txt", 'r') as f:
    DB_CONNECTIONSTRING = f.readline()
DB = dataset.connect(DB_CONNECTIONSTRING)

def load_tweet(tweet, tweets_saved):
    """Takes a tweet (dictionary) and upserts its contents to a PostgreSQL database"""
    try:
        data = {'tweet_id': tweet['tweet_id'],
                'tweet_json': json.dumps(tweet)}
```
So I was looking for an option to do this multithreaded. However, I have not yet found a way to:

- Multi-thread the extraction process;
- Print the counter at every 100, 500 or 1000 tweets;
Going through this tutorial has not given me the understanding I need yet: the class concept for each thread, what I need to configure in the class, and implementing the queue are very difficult for me to grasp at the moment; I'm just getting started.
- Can anyone show me how to run the script above using multiple threads?
- How many threads should I use? Python currently uses ~1% of my CPU and ~10% of my RAM when running the script (my system specifications).
- How can I take care of incrementing the counter safely (using a Lock()?) and printing it when the counter hits a multiple of 100?
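Not an answer from the original post, but a minimal sketch of the worker-pool pattern the questions above describe: several threads pull tweets from a `queue.Queue`, and a `threading.Lock` protects the shared counter so it can be incremented and printed every 100 tweets. `load_tweet` here is a placeholder that just records the tweet; in the real script it would do the database upsert, and the thread count (`n_threads=4`) is an arbitrary starting point.

```python
import json
import queue
import threading

saved = []                      # stand-in for the database in this sketch
counter = 0
counter_lock = threading.Lock()

def load_tweet(tweet):
    """Placeholder for the real upsert; here it just records the tweet."""
    saved.append(json.dumps(tweet))

def worker(q):
    global counter
    while True:
        tweet = q.get()
        if tweet is None:       # sentinel: no more work for this thread
            q.task_done()
            break
        load_tweet(tweet)
        with counter_lock:      # the Lock makes increment-and-test atomic
            counter += 1
            if counter % 100 == 0:
                print(f"{counter} tweets saved")
        q.task_done()

def run(tweets, n_threads=4):
    q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for tweet in tweets:
        q.put(tweet)
    for _ in threads:
        q.put(None)             # one sentinel per thread so each one exits
    for t in threads:
        t.join()

run([{'tweet_id': i} for i in range(250)])
```

Note that because the work is I/O-bound (waiting on the database), threads can help despite the GIL; for CPU-bound work they would not.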
EDIT: as requested, here are the relevant snippets from the profiling results (using dataset.upsert):
```
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  5898  245.133    0.042  245.133    0.042  :0(_connect)
  5898   12.137    0.002   12.206    0.002  :0(execute)
```
Here is the second attempt, with dataset.insert instead of dataset.upsert:
```
1386332 function calls (1382960 primitive calls) in 137.255 seconds

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  2955  122.646    0.042  122.646    0.042  :0(_connect)
```
Last (and certainly not least), here is the runtime when using plain psycopg2 code:

```
63694 function calls (63680 primitive calls) in 2.203 seconds
```
In conclusion: do not use dataset if performance matters (although it took me 10 minutes to write the psycopg2 code, versus ~10 seconds for dataset's .upsert).
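For context, a sketch of what the plain-psycopg2 upsert could look like. The table and column names (`tweets`, `tweet_id`, `tweet_json`) are assumptions based on the snippet above, and `ON CONFLICT` requires PostgreSQL 9.5+. The connection part is shown in comments only, since it needs a live database; the key point suggested by the profile (where `_connect` dominates) is to connect once and reuse the connection rather than reconnecting per tweet.

```python
import json

# Assumed schema: tweets(tweet_id PRIMARY KEY, tweet_json)
UPSERT_SQL = """
    INSERT INTO tweets (tweet_id, tweet_json)
    VALUES (%s, %s)
    ON CONFLICT (tweet_id) DO UPDATE SET tweet_json = EXCLUDED.tweet_json
"""

def tweet_rows(tweets):
    """Turn tweet dicts into parameter tuples for executemany."""
    return [(t['tweet_id'], json.dumps(t)) for t in tweets]

# With a live database you would then do (not run here):
#   import psycopg2
#   conn = psycopg2.connect(DB_CONNECTIONSTRING)   # connect ONCE, reuse it
#   with conn, conn.cursor() as cur:
#       cur.executemany(UPSERT_SQL, tweet_rows(tweets))
```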
- Now, regarding the original question: will I be able to reduce the ~2 seconds per file with multithreading? How?
Full code can be found here.