Python: extracting hashtags from a text file

Question

I wrote the code below to extract hashtags, as well as tags that start with "@", add them to a list, and sort them in descending order of count. The problem is that the text is not cleanly formatted: some hashtags have no spaces between them. You can check this by uncommenting the print call inside the for loop, which shows words such as:

#Socality#thisismycommunity#themoderndayexplorer#modernoutdoors#mountaincultureelevated

So the .split() method cannot separate these. What would be the best practice here?

Here is the .txt file

Grateful for your time.

name = input("Enter file:")
if len(name) < 1 : name = "tags.txt"
handle = open(name)
tags = dict()
lst = list()

for line in handle :
    hline = line.split()
    for word in hline:
        if word.startswith('@') : tags[word] = tags.get(word,0) + 1
        else :
            tags[word] = tags.get(word,0) + 1
        #print(word)

for k,v in tags.items() :
    tags_order = (v,k)
    lst.append(tags_order)

lst = sorted(lst, reverse=True)[:34]
print('Final Dictionary: ' , '\n')
for v,k in lst :
    print(k , v, '')

python hashtag

Rui torres Feb 05 '18 at 22:13

1 answer

usr2564301 · Accepted Answer · 2018-02-05T22:30:56+0000

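You can use a regular expression for this. It finds every tag; that is, a # or @, followed by everything up to the next whitespace, # or @.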

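import re

tags = []
# Text mode in Python 3 already translates \r\n and \r line endings,
# so the old 'U' flag is not needed.
with open('../Downloads/tags.txt') as f:
    for line in f:
        # collect every "#tag" or "@tag" found on this line
        tags += re.findall(r'[#@][^\s#@]+', line)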

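This builds a single flat list; every match is added to tags, which you can then count, sort, or store in a dictionary as in your own code.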

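How the regular expression works:

[#@] - a single character, either # or @
[^\s#@]+ - one or more characters that are not whitespace (\s matches spaces, tabs, and line breaks), # or @; as many as possible, at least one.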

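So, findall does not need whitespace between tags: every # or @ ends the current match and starts a new one, which is exactly the case that .split() cannot handle.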
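As a quick check, here is a minimal sketch that runs the same pattern on a run-together string like the one in the question. The sample string is typed in by hand for illustration (the "@someone" mention is made up), not read from the asker's file.

```python
import re

# Hand-written sample modelled on the question's output; "@someone" is a
# made-up mention, and nothing here is read from tags.txt.
sample = "#Socality#thisismycommunity @someone#modernoutdoors"

# Each '#' or '@' starts a new match, so no separating whitespace is needed.
found = re.findall(r'[#@][^\s#@]+', sample)
print(found)
```

Each tag comes back as its own list item, with the leading # or @ kept.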

findall returns a list, so if you would rather handle each tag separately, you can loop over the result:

for tag in re.findall(r'[#@][^\s#@]+', line):
    # process "tag" any way you want here, for example:
    print(tag)
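To tie this back to the counting and sorting in the question, here is one possible sketch using collections.Counter from the standard library. The helper name count_tags and the sample lines are my own and only illustrative; this is not the only way to do it.

```python
import re
from collections import Counter

TAG_PATTERN = re.compile(r'[#@][^\s#@]+')

def count_tags(lines):
    """Count every #hashtag and @mention found in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(TAG_PATTERN.findall(line))
    return counts

# Hypothetical sample lines standing in for the contents of tags.txt.
sample_lines = [
    "#Socality#thisismycommunity @someone\n",
    "#Socality #modernoutdoors#Socality\n",
]
counts = count_tags(sample_lines)

# most_common(34) mirrors the [:34] slice in the question's code.
for tag, n in counts.most_common(34):
    print(tag, n)
```

For a real file, an open file object can be passed directly, for example count_tags(handle) inside a with open(name) as handle: block. Note that most_common orders ties by first appearance, whereas sorting (count, tag) tuples in reverse, as in the question, orders ties by tag name.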

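Line endings (\r\n on Windows, \r on older Mac systems) need no special treatment here. When a file is opened in text mode, Python 3 translates them for you.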
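Even if a carriage return does survive (for instance when the text comes from somewhere other than a text-mode file), the pattern copes with it, because \s also matches \r. A small sketch with a hand-written string:

```python
import re

# Hand-written text mixing Windows (\r\n), old Mac (\r) and Unix (\n) endings.
raw = "#one#two\r\n@three\r#four\n"

# \s covers \r and \n, so neither can end up inside a tag.
tags = re.findall(r'[#@][^\s#@]+', raw)
print(tags)
```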
