How can I programmatically generate appropriate tags for a database of URLs?

I am writing an RSS reader in Python as a learning exercise, and I would really like to be able to tag individual entries with search keywords. Unfortunately, most real-world feeds do not include keyword metadata. My test database currently holds about 60,000 entries from roughly 600 feeds, so manual tagging is not practical. So far I have come up with only two approaches:

1: Use the Natural Language Toolkit to extract keywords:

  • Pros: flexible; no dependency on external services;
  • Cons: only the entry summary can be indexed, not the full article; nontrivial: writing a high-quality keyword extraction tool is a project in itself (a minimal sketch follows this list);

2: Use the Google AdWords API to get keyword suggestions from an article URL:

  • Pros: high-quality keywords; based on the entire text of the article; easy to use;
  • Cons: not free (?); request rate limits are unknown; I am afraid my account could get banned, leaving me unable to run advertising campaigns for my commercial sites.
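For option 1, here is a minimal sketch of what I mean by the "project in itself" problem: it only uses NLTK's English stopword list plus plain token frequency over the entry summary, which is far from a real keyword extractor. The HTML-stripping regex and the frequency scoring are just illustrative assumptions.

    # Minimal, frequency-based keyword extraction over an entry summary.
    # Assumptions: summaries are short HTML snippets, English stopwords are a
    # reasonable filter, and plain token frequency is "good enough" scoring.
    import re
    from collections import Counter

    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)   # one-time download of the stopword list

    def extract_keywords(summary_html, max_keywords=5):
        """Return the most frequent non-stopword words of an entry summary."""
        text = re.sub(r"<[^>]+>", " ", summary_html)     # crude HTML stripping
        words = re.findall(r"[a-z]{3,}", text.lower())   # simple tokenization
        stop = set(stopwords.words("english"))
        counts = Counter(w for w in words if w not in stop)
        return [word for word, _ in counts.most_common(max_keywords)]

    if __name__ == "__main__":
        sample = "<p>Python 3 introduces new syntax for function annotations.</p>"
        print(extract_keywords(sample))       # e.g. ['python', 'introduces', ...]

Even this toy version illustrates the limitation above: it only ever sees the summary, so the resulting tags reflect whatever little text the feed author included.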

Does anyone have any other suggestions? And are my fears about my AdWords account being banned unfounded?

+4
2 answers

You could use the Delicious API, which offers good tags for URLs.

An example of using the API from Python: http://www.michael-noll.com/projects/delicious-python-api/
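For reference, a hedged sketch of asking Delicious for a URL's popular tags via its public JSON feed is below. The endpoint, the `hash` parameter (an MD5 of the URL) and the `top_tags` field are assumptions based on the old feeds API and may be out of date or unavailable; treat it as illustrative only.

    # Hedged sketch: fetch popular Delicious tags for a URL.
    # The feed URL, the `hash` parameter and the `top_tags` field are
    # assumptions about the old public feeds API and may no longer work.
    import hashlib
    import json
    import urllib.request

    def delicious_tags(url):
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        feed = "http://feeds.delicious.com/v2/json/urlinfo/data?hash=" + digest
        with urllib.request.urlopen(feed) as resp:
            data = json.load(resp)
        if not data:                     # expected: a list with one dict per URL
            return []
        top_tags = data[0].get("top_tags") or {}
        # Sort tag names by their bookmark counts, most popular first.
        return sorted(top_tags, key=top_tags.get, reverse=True)

    if __name__ == "__main__":
        print(delicious_tags("http://www.python.org/"))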

Another alternative is OpenCalais.

+1

There are a number of free and commercial text-annotation tools and services you could consider, depending on your specific needs; several are listed in this question:

Is there a better tool than OpenCalais?

Some of them extract named entities, some provide a measure of keyword relevance, and others provide topic tags.

+2

Source: https://habr.com/ru/post/1335796/

