I am writing an RSS reader in Python as a training exercise, and I would really like to be able to tag individual entries with search keywords. Unfortunately, most real-world channels do not include keyword metadata. My test database currently holds about 60,000 entries from approximately 600 channels, so manual tagging is not feasible. So far I have found only two options:
1: Use the Natural Language Toolkit to extract keywords:
- Pros: flexible; no dependency on external services;
- Cons: only the summary of the article can be indexed, not the full text; nontrivial: writing a high-quality keyword extraction tool is a project in itself;
2: Use the Google AdWords API to get keyword suggestions from an article URL:
- Pros: high quality keywords; based on the entire text of the article; easy to use;
- Cons: not free (?); request rate limits are unknown; I am afraid that my account could be banned, leaving me unable to run advertising campaigns for my commercial sites.
Can anyone suggest other approaches? And are my fears about my AdWords account being banned unfounded?
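To make option 1 concrete, here is a minimal sketch of summary-only keyword extraction. It deliberately does not use NLTK: it swaps in plain word-frequency counting from the standard library, so it only illustrates the general idea and its limits. The function name `extract_keywords`, the `limit` parameter, and the tiny stop-word list are all assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real one would be far larger.
STOP_WORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to with".split()
)


def extract_keywords(text, limit=5):
    """Return up to `limit` of the most frequent non-stop-words in `text`."""
    # Lowercase the text and keep tokens of two or more characters.
    words = re.findall(r"[a-z][a-z0-9'-]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


summary = "The new Python release improves the parser and the Python tooling."
print(extract_keywords(summary, limit=3))  # 'python' comes first
```

Raw frequency on a short feed summary is a weak signal, which is exactly the "summary only" drawback listed above; a real extractor would at least need stemming and some weighting against the rest of the corpus.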