Print over 20 posts from the Tumblr API

Good afternoon,

I am very new to Python, but I am trying to write code that will let me download all posts (including their "notes") from a specified Tumblr account to my computer.

Given my inexperience with coding, I tried to find a ready-made script that would let me do this. I found some brilliant scripts on GitHub, but none of them returned the notes from Tumblr posts (as far as I can see, although please correct me if anyone knows otherwise!).

So I tried to write my own script. I had some success with the code below. It prints the last 20 posts from a given Tumblr (albeit in a rather ugly format: essentially hundreds of lines of text, all printed on a single line of a notepad file):

    # This script prints all the posts (including tags, comments) and also
    # the first 20 notes from the specified Tumblr blog.
    import pytumblr

    # Authenticate via API Key
    client = pytumblr.TumblrRestClient('myapikey')

    #offset = 0

    # Make the request
    client.posts('staff', limit=2000, offset=0, reblog_info=True,
                 notes_info=True, filter='html')

    # Print out into a .txt file
    with open('out.txt', 'w') as f:
        print >> f, client.posts('staff', limit=2000, offset=0,
                                 reblog_info=True, notes_info=True,
                                 filter='html')

However, I want the script to keep printing posts until it reaches the end of the specified blog.

I searched this site and found a very similar question ( Receiving only 20 posts returned using PyTumblr ) answered by the user poke. However, I cannot get poke's solution working for my data: when I run the following script, no output is generated at all.

    import pytumblr

    # Authenticate via API Key
    client = pytumblr.TumblrRestClient('myapikey')
    blog = ('staff')

    def getAllPosts(client, blog):
        offset = 0
        while True:
            posts = client.posts(blog, limit=20, offset=offset,
                                 reblog_info=True, notes_info=True)
            if not posts:
                return
            for post in posts:
                yield post
            offset += 20

It should be noted that there are several questions on this site (for example, Receiving more than 50 notes with the Tumblr API ) about Tumblr notes, most of which ask how to download more than 50 notes per post. I am completely satisfied with only 50 notes per post; it is the number of posts that I would like to increase.

In addition, I tagged this post as Python; however, if there is a better way to get the data I need using another programming language, that would be more than fine.

Thank you very much for your time!

1 answer

tl;dr: If you just want to see the answer, it is below the heading Fixed version.

The second piece of code is a generator that yields posts one by one, so you should use it as part of something like a loop and then do something with the output. Here is your code with some additional code that iterates over the generator and prints the data it returns.

    import pytumblr

    def getAllPosts(client, blog):
        offset = 0
        while True:
            posts = client.posts(blog, limit=20, offset=offset,
                                 reblog_info=True, notes_info=True)
            if not posts:
                return
            for post in posts:
                yield post
            offset += 20

    # Authenticate via API Key
    client = pytumblr.TumblrRestClient('myapikey')
    blog = ('staff')

    # Use the generator getAllPosts
    for post in getAllPosts(client, blog):
        print(post)

However, there are a couple of bugs in this code. getAllPosts will not only yield every post, it will also yield other things, because it iterates over the whole API response, as you can see from this example session I ran in ipython in my shell.

    In [7]: yielder = getAllPosts(client, 'staff')

    In [8]: next(yielder)
    Out[8]: 'blog'

    In [9]: next(yielder)
    Out[9]: 'posts'

    In [10]: next(yielder)
    Out[10]: 'total_posts'

    In [11]: next(yielder)
    Out[11]: 'supply_logging_positions'

    In [12]: next(yielder)
    Out[12]: 'blog'

    In [13]: next(yielder)
    Out[13]: 'posts'

    In [14]: next(yielder)
    Out[14]: 'total_posts'
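This key-yielding behavior can be reproduced without touching the API at all. A minimal sketch, using a hand-written stand-in for the response dictionary:

```python
# Iterating over a dict yields its keys, not its values -- which is
# exactly what the generator above does with the API response.
response = {'blog': {'name': 'staff'}, 'posts': [], 'total_posts': 0}

keys = [key for key in response]
print(keys)  # ['blog', 'posts', 'total_posts']
```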

This is because the posts object in getAllPosts is a dictionary that contains much more than just every post from the staff blog: it also has elements such as the number of posts, the blog's description, when it was last updated, and so on. The code as-is could also end up in an infinite loop, because the following condition:

    if not posts:
        return

will never be true due to the structure of the response, because an empty Tumblr API response from pytumblr looks like this:

    {'blog': {'ask': False,
              'ask_anon': False,
              'ask_page_title': 'Ask me anything',
              'can_send_fan_mail': False,
              'can_subscribe': False,
              'description': '',
              'followed': False,
              'is_adult': False,
              'is_blocked_from_primary': False,
              'is_nsfw': False,
              'is_optout_ads': False,
              'name': 'asdfasdf',
              'posts': 0,
              'reply_conditions': '3',
              'share_likes': False,
              'subscribed': False,
              'title': 'Untitled',
              'total_posts': 0,
              'updated': 0,
              'url': 'https://asdfasdf.tumblr.com/'},
     'posts': [],
     'supply_logging_positions': [],
     'total_posts': 0}

if not posts checks the truthiness of this whole structure, not of its posts field (which is the empty list here), so the loop will never terminate, because the response dictionary itself is never empty (see Truth value testing in Python).
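A minimal sketch of those truth-value rules, using a trimmed-down stand-in for the real response:

```python
# A dict is truthy as soon as it has any keys, even if every value
# inside it is empty -- so 'not posts' stays False for the response.
empty_response = {'blog': {'name': 'asdfasdf'}, 'posts': [], 'total_posts': 0}

print(bool(empty_response))           # True: the dict itself has keys
print(not empty_response)             # False: so the 'return' never runs
print(bool(empty_response['posts']))  # False: the empty post list is falsy
```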


Fixed version

Here's the code (mostly tested) that keeps the loop from your getAllPosts implementation, uses the function to retrieve the posts, and writes them to a file called (BLOG_NAME)-posts.txt .

    import pytumblr

    def get_all_posts(client, blog):
        offset = 0
        while True:
            response = client.posts(blog, limit=20, offset=offset,
                                    reblog_info=True, notes_info=True)
            # Get the 'posts' field of the response
            posts = response['posts']
            if not posts:
                return
            for post in posts:
                yield post
            # Move to the next offset
            offset += 20

    client = pytumblr.TumblrRestClient('secrety-secret')
    blog = 'staff'

    # Use our function
    with open('{}-posts.txt'.format(blog), 'w') as out_file:
        for post in get_all_posts(client, blog):
            print >>out_file, post
            # if you're in python 3.x, use the following
            # print(post, file=out_file)
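Since the original goal included the notes, here is a hypothetical sketch of pulling them out of each post dictionary. The 'notes', 'type', and 'blog_name' fields follow the Tumblr API v2 response format when notes_info=True is set; summarize_notes and sample_post are illustrative names, not part of pytumblr:

```python
# Hypothetical helper: extract the (up to 50) notes carried by a post
# dict returned with notes_info=True, as (type, blog_name) pairs.
def summarize_notes(post):
    notes = post.get('notes', [])
    return [(n.get('type'), n.get('blog_name')) for n in notes]

# Hand-written sample post standing in for a real API response
sample_post = {'id': 1, 'notes': [{'type': 'like', 'blog_name': 'alice'},
                                  {'type': 'reblog', 'blog_name': 'bob'}]}
print(summarize_notes(sample_post))  # [('like', 'alice'), ('reblog', 'bob')]
```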

This will just be a plain text dump of the API post responses, so it is up to you to prettify it or process it further if you need to.
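For example, one way to prettify the dump (a sketch, not part of the answer's code) is to serialize the posts with the standard json module instead of printing the raw Python repr:

```python
import json

# Illustrative stand-in data; real posts would come from get_all_posts()
posts = [{'id': 1, 'summary': 'hello'}, {'id': 2, 'summary': 'world'}]

# indent=2 produces a readable, diff-friendly dump instead of one long line
pretty = json.dumps(posts, indent=2, sort_keys=True)
print(pretty)
```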


Source: https://habr.com/ru/post/1273403/
