"IncompleteRead" error while retrieving Twitter data using Python

When running this program to retrieve Twitter data using Python 2.7.8:

```python
# -*- coding: utf-8 -*-
# imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

# setting up the keys
consumer_key = '…………...'
consumer_secret = '………...'
access_token = '…………...'
access_secret = '……………..'

class TweetListener(StreamListener):
    # A listener handles tweets that are received from the stream.
    # This is a basic listener that just prints received tweets to standard output.

    def on_data(self, data):
        print (data)
        return True

    def on_error(self, status):
        print (status)

# printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())

t = u"سوريا"
stream.filter(track=[t])
```

After running this program for 5 hours, I received this error message:

```
Traceback (most recent call last):
  File "/Users/Mona/Desktop/twitter.py", line 32, in <module>
    stream.filter(track=[t])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 316, in filter
    self._start(async)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 237, in _start
    self._run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 173, in _run
    self._read_loop(resp)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 225, in _read_loop
    next_status_obj = resp.read( int(delimited_string) )
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 543, in read
    return self._read_chunked(amt)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 612, in _read_chunked
    value.append(self._safe_read(chunk_left))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 660, in _safe_read
    raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(0 bytes read, 976 more expected)
```

I don't know how to fix this problem!

4 answers

Check whether you can process tweets fast enough by enabling the stall_warnings parameter:

 stream.filter(track=[t], stall_warnings=True) 

Tweepy handles these messages (check out the implementation here), and they tell you if you are falling behind, meaning you can't process tweets as fast as the Twitter API sends them to you. From the Twitter docs:

Setting this parameter to true will cause the delivery of periodic messages if the client risks being disconnected. These messages are sent only when the client is behind, and will appear with a maximum frequency approximately every 5 minutes.

In theory, you should get a message about disconnecting from the API in this situation. However, this is not always the case:

The streaming API will attempt to deliver a message indicating why the stream was closed. Please note that if the break occurred due to network problems or the client was reading too slowly, this message may not be received.

IncompleteRead may also be due to a temporary network problem and may never happen again. If this happens reproducibly after about 5 hours, lag is a good bet.
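One practical point for the question's code: as far as I can tell, in Tweepy 3.x the default StreamListener.on_data routes `warning` messages to an on_warning callback. TweetListener overrides on_data and just prints the raw JSON, so a stall warning would appear only as one more printed line. The sketch below uses only the standard library; the helper name is hypothetical and the message shape is assumed from Twitter's stall-warning documentation. It picks those warnings out of the raw data:

```python
import json


def extract_stall_warning(raw):
    """Return the warning payload if `raw` is a stall warning, else None.

    Assumed message shape (per Twitter's stall-warning docs), e.g.:
    {"warning": {"code": "FALLING_BEHIND", "message": "...", "percent_full": 60}}
    """
    try:
        msg = json.loads(raw)
    except ValueError:
        # Keep-alive newlines and other non-JSON chunks are not warnings.
        return None
    if isinstance(msg, dict) and "warning" in msg:
        return msg["warning"]
    return None


# Hypothetical use inside the question's TweetListener:
#
#     def on_data(self, data):
#         warning = extract_stall_warning(data)
#         if warning is not None:
#             print("STALL WARNING: %s" % warning.get("message"))
#         else:
#             print(data)
#         return True
```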


I had this problem too. The other answer is correct, in that the cause is almost certainly:

  • Your program isn't keeping up with the stream.
  • You get a stall warning if that is the case.

In my case, I was reading tweets into Postgres for later analysis, across a fairly dense geographic area as well as keywords (London, in fact, and about 100 keywords). Even if you are only printing them, your local machine may be doing many other things, and system processes get priority, so tweets back up until Twitter disconnects you. (This typically shows up as an apparent memory leak: the program grows until it is killed or Twitter disconnects, whichever comes first.)
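A toy model shows why even a small shortfall eventually gets you disconnected. If tweets arrive slightly faster than you process them, the backlog grows every second, and any finite buffer eventually fills up. The function name, rates, and capacity below are made up for illustration. They are not Twitter's actual limits.

```python
def seconds_until_full(arrival_rate, processing_rate, capacity):
    """Toy model of a client falling behind a stream.

    arrival_rate / processing_rate: messages per second (hypothetical values).
    capacity: how many undelivered messages can pile up before the server
    gives up (a made-up number for illustration).
    Returns the first whole second at which the backlog reaches capacity,
    or None if the client keeps up.
    """
    if processing_rate >= arrival_rate:
        return None  # the backlog never grows
    backlog = 0
    seconds = 0
    while backlog < capacity:
        seconds += 1
        backlog += arrival_rate - processing_rate
    return seconds
```

For example, seconds_until_full(10, 8, 100) returns 50: a shortfall of 2 messages per second fills a 100-message buffer in 50 seconds. With a much larger buffer, the same small shortfall takes hours to fill it. That is consistent with an error that only appears after a long run.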

What made sense here was to push the processing off to a queue. So I used a Redis and django-rq solution. It took about 3 hours to implement on dev and then on my production server, including research, installation, reworking existing code, being stupid about my installation, testing, and fixing typos as I went.

Then, in your Django project directory (YMMV for plain Python applications), start a worker in the background:

 python manage.py rqworker &

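You now have a queue! You can add jobs to it by changing your handler. At the top of the file:

```python
from __future__ import print_function  # Python 2.7: lets print be passed as a function
import django_rq
```

Then in your handler:

```python
def on_data(self, data):
    # Hand the raw tweet to an RQ worker instead of processing it here, so the
    # stream keeps reading. Swap print for your own processing function; RQ
    # workers must be able to import whatever function you enqueue.
    django_rq.enqueue(print, data)
    return True
```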
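If you don't want to run Redis, you can sketch the same idea in-process with the standard library's queue and threading modules: keep on_data cheap and do the slow work elsewhere. This is a swapped-in technique, not django-rq, and the class name is hypothetical. One caveat: with an unbounded in-memory queue, a consumer that is slower on average still falls behind. The backlog just moves into your own process's memory. So this approach mainly helps absorb bursts and stops slow processing from stalling the stream reader.

```python
import logging
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7


class QueuedHandler(object):
    """Hypothetical helper: on_data only enqueues; a worker thread processes."""

    def __init__(self, process):
        self.process = process  # the slow part, e.g. an INSERT into Postgres
        self.jobs = queue.Queue()  # unbounded: grows if the worker falls behind
        self.worker = threading.Thread(target=self._run)
        self.worker.daemon = True  # don't keep the interpreter alive on exit
        self.worker.start()

    def on_data(self, data):
        # Cheap and non-blocking, so the stream's read loop keeps draining the socket.
        self.jobs.put(data)
        return True

    def _run(self):
        while True:
            data = self.jobs.get()
            try:
                self.process(data)
            except Exception:
                # Log and keep going; one bad message shouldn't kill the worker.
                logging.exception("failed to process message")
            finally:
                self.jobs.task_done()
```

To use this with Tweepy, you would put the same enqueue-and-return logic in your StreamListener subclass's on_data. A single worker thread processes messages in arrival order.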

As an aside, if you are interested in tweets coming from Syria, rather than just tweets mentioning it, you can add a locations bounding box to the filter:

 stream.filter(track=[t], locations=[35.6626, 32.7930, 42.4302, 37.2182])

This is a very crude bounding box centred on Syria, and it will also pick up bits of Iraq and Turkey around the edges. Since locations is an extra option alongside track, it is worth pointing out this:

Bounding boxes do not act as filters for other filter parameters. For example, track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.

From this answer, which helped me, and the Twitter docs.
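Here is a simplified local model of the filter that makes the OR behaviour and the coordinate order concrete. It is an illustration, not how Twitter actually matches; for example, real track matching is word-based rather than a substring test. The function names are hypothetical. It assumes two things: locations is a flat list of [sw_lon, sw_lat, ne_lon, ne_lat] boxes (longitude first, as in the Syria box above), and a tweet's point coordinates come as (longitude, latitude).

```python
def in_bounding_box(lon, lat, box):
    """True if (lon, lat) lies inside box = [sw_lon, sw_lat, ne_lon, ne_lat]."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


def matches_filter(text, coords, track, locations):
    """Simplified model: a tweet matches if it has a track term OR is in any box.

    text: tweet text; coords: (lon, lat), or None for non-geo tweets.
    track: list of terms; locations: flat list, 4 numbers per box.
    """
    lowered = text.lower()
    term_hit = any(term.lower() in lowered for term in track)
    geo_hit = coords is not None and any(
        in_bounding_box(coords[0], coords[1], locations[i:i + 4])
        for i in range(0, len(locations), 4)
    )
    return term_hit or geo_hit
```

With the San Francisco box, "I love Twitter" with no coordinates matches through track alone, and a tweet saying "lunch" from inside the box matches through locations alone. Passing Damascus as (lat, lon) instead of (lon, lat) puts it outside the Syria box, which is an easy mistake to make.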

Edit: from your subsequent posts, I can see you are still using the Twitter API, so hopefully you have this sorted anyway, but hopefully it will be useful to someone else! :)


Where do these stall warnings appear? I set this parameter to true because I'm getting the same error as the OP, but I keep getting the error without any warning.


This worked for me:

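```python
from tweepy import OAuthHandler, Stream
# Assumes Tweepy 3.x, which reads the stream through requests/urllib3.
from requests.packages.urllib3.exceptions import ProtocolError

# StdOutListener is assumed to be a StreamListener subclass like TweetListener
# in the question; the credential variables are assumed to be defined elsewhere.
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)

while True:
    try:
        stream.filter(track=['python', 'java'], stall_warnings=True)
    except (ProtocolError, AttributeError):
        # The connection dropped mid-read: reconnect and keep streaming.
        continue
```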
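The loop above reconnects immediately and forever. Below is a gentler variant with hypothetical names and placeholder delays. It waits longer after each consecutive failure (exponential backoff) and can optionally give up. Which exceptions to retry depends on the Tweepy version. The asker's older, httplib-based Tweepy raised httplib.IncompleteRead directly, while the answer above catches urllib3's ProtocolError, so the exception types are a parameter. Twitter's streaming docs have their own reconnect guidance, so check those for real delay values.

```python
import time

try:
    from http.client import IncompleteRead  # Python 3
except ImportError:
    from httplib import IncompleteRead  # Python 2.7


def run_with_backoff(start_stream, retry_on=(IncompleteRead,), base_delay=1.0,
                     max_delay=60.0, max_retries=None, sleep=time.sleep):
    """Call start_stream() again after each listed error, doubling the wait.

    Returns the number of reconnects once start_stream() returns normally.
    Re-raises the last error once max_retries is exceeded (None = forever).
    Delays are placeholders; this version never resets them after a long
    healthy connection, which you may want to add.
    """
    delay = base_delay
    retries = 0
    while True:
        try:
            start_stream()
            return retries
        except retry_on:
            retries += 1
            if max_retries is not None and retries > max_retries:
                raise
            sleep(delay)  # wait before reconnecting
            delay = min(delay * 2, max_delay)
```

For example, reusing the stream object as the loop above does: run_with_backoff(lambda: stream.filter(track=[t], stall_warnings=True)).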

Source: https://habr.com/ru/post/977433/

