Remove utm_* parameters from a URL in Python

Question

I am trying to remove all utm_* parameters from a list of URLs. The closest thing I have found is this gist: https://gist.github.com/626834

Any ideas?

python

Kostas, Jul 24 '12 at 22:42

4 answers

Jon clements · Answer 1 · 2012-07-24T23:03:47+0000

It is a little long, but it uses the standard url* modules (urllib and urlparse) rather than hand-parsing the string, which avoids errors.

    from urllib import urlencode
    from urlparse import urlparse, parse_qs, urlunparse

    url = 'http://whatever.com/somepage?utm_one=3&something=4&utm_two=5&utm_blank&something_else'
    parsed = urlparse(url)
    # Parse the query into a dict of lists, keeping parameters that have no value
    qd = parse_qs(parsed.query, keep_blank_values=True)
    # Keep only the parameters whose name does not start with 'utm_'
    filtered = dict(
        (k, v)
        for k, v in qd.iteritems()
        if not k.startswith('utm_'))
    newurl = urlunparse([
        parsed.scheme,
        parsed.netloc,
        parsed.path,
        parsed.params,
        urlencode(filtered, doseq=True),  # query string
        parsed.fragment
    ])
    print newurl
    # e.g. 'http://whatever.com/somepage?something=4&something_else='
    # (urlencode writes the blank value with a trailing '='; dict order may vary)
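The code above is written for Python 2. A minimal sketch of the same idea for Python 3 follows, assuming the `urllib.parse` module; it swaps `parse_qs` for `parse_qsl` so that parameter order is preserved, and the function name `strip_utm` and the second sample URL are made up for illustration.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def strip_utm(url):
    """Return url without query parameters whose name starts with 'utm_'."""
    parsed = urlparse(url)
    # parse_qsl yields (name, value) pairs in their original order
    pairs = parse_qsl(parsed.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not k.startswith('utm_')]
    # Rebuild the URL with the filtered query string
    return urlunparse(parsed._replace(query=urlencode(kept)))

# The question asks about a list of URLs, so apply the function to each one
urls = [
    'http://whatever.com/somepage?utm_one=3&something=4&utm_two=5&utm_blank&something_else',
    'http://whatever.com/other?utm_source=feed#top',  # hypothetical URL
]
cleaned = [strip_utm(u) for u in urls]
print(cleaned)
```

Note that re-encoding the query writes blank parameters with a trailing `=` (`something_else=`), so the result is equivalent to the input but not always byte-for-byte identical.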

jadkik94 · Answer 2 · 2012-07-24T23:12:11+0000

Simple, it works, and it is based on the link you posted, BUT it relies on re, so I am not sure it won't break for some reason that I can't think of :)

    import re

    def trim_utm(url):
        if "utm_" not in url:
            return url
        # Split into: everything up to '?', the query string, and the '#fragment'
        matches = re.findall(r'(.+\?)([^#]*)(.*)', url)
        if len(matches) == 0:
            return url
        match = matches[0]
        query = match[1]
        sanitized_query = '&'.join([p for p in query.split('&') if not p.startswith('utm_')])
        return match[0]+sanitized_query+match[2]

    if __name__ == "__main__":
        tests = [
            "http://localhost/index.php?a=1&utm_source=1&b=2",
            "http://localhost/index.php?a=1&utm_source=1&b=2#hash",
            "http://localhost/index.php?a=1&utm_source=1&b=2&utm_something=no#hash",
            "http://localhost/index.php?a=1&utm_source=1&utm_a=yes&b=2#hash",
            "http://localhost/index.php?utm_a=a",
            "http://localhost/index.php?a=utm_a",
            "http://localhost/index.php?a=1&b=2",
            "http://localhost/index.php",
            "http://localhost/index.php#hash2"
        ]

        for t in tests:
            trimmed = trim_utm(t)
            print t
            print trimmed
            print
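The test loop above prints each result without saying what to expect. As a rough check, here is the same function run under Python 3 (an assumption; the answer targets Python 2) against a few of those URLs, with expected values worked out by hand. The `expected` dict is added for illustration only.

```python
import re

def trim_utm(url):
    if "utm_" not in url:
        return url
    # Groups: everything up to '?', the query string, the '#fragment'
    matches = re.findall(r'(.+\?)([^#]*)(.*)', url)
    if len(matches) == 0:
        return url
    match = matches[0]
    query = match[1]
    sanitized_query = '&'.join(p for p in query.split('&') if not p.startswith('utm_'))
    return match[0] + sanitized_query + match[2]

expected = {
    "http://localhost/index.php?a=1&utm_source=1&b=2":
        "http://localhost/index.php?a=1&b=2",
    "http://localhost/index.php?a=1&utm_source=1&utm_a=yes&b=2#hash":
        "http://localhost/index.php?a=1&b=2#hash",
    # Every parameter is removed, but the '?' itself is left behind
    "http://localhost/index.php?utm_a=a":
        "http://localhost/index.php?",
    # 'utm_' appears only in a value, so the URL is unchanged
    "http://localhost/index.php?a=utm_a":
        "http://localhost/index.php?a=utm_a",
    "http://localhost/index.php#hash2":
        "http://localhost/index.php#hash2",
}
for url, want in expected.items():
    assert trim_utm(url) == want, (url, trim_utm(url))
```

The dangling `?` in the third case is harmless for most servers, but it is one difference from the urlparse-based answers, which drop it.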

mVChr · Answer 3 · 2012-07-24T23:00:08+0000

    import re
    from urlparse import urlparse, urlunparse

    url = 'http://www.someurl.com/page.html?foo=bar&utm_medium=qux&baz=qoo'
    parsed_url = list(urlparse(url))
    # Index 4 of the parsed tuple is the query string
    parsed_url[4] = '&'.join(
        [x for x in parsed_url[4].split('&') if not re.match(r'utm_', x)])
    utmless_url = urlunparse(parsed_url)

    print utmless_url
    # 'http://www.someurl.com/page.html?foo=bar&baz=qoo'
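Because this approach filters the raw query text instead of decoding and re-encoding it, the surviving parameters come out exactly as they went in. A sketch for Python 3, assuming `urllib.parse` and using `str.startswith` in place of `re.match`; the function name `drop_utm_raw` and the second URL are hypothetical.

```python
from urllib.parse import urlparse, urlunparse

def drop_utm_raw(url):
    parts = urlparse(url)
    # Filter the raw 'name=value' chunks without decoding them
    kept = [p for p in parts.query.split('&') if not p.startswith('utm_')]
    return urlunparse(parts._replace(query='&'.join(kept)))

print(drop_utm_raw('http://www.someurl.com/page.html?foo=bar&utm_medium=qux&baz=qoo'))
# http://www.someurl.com/page.html?foo=bar&baz=qoo

# A valueless flag and a percent-encoded value pass through untouched
print(drop_utm_raw('http://www.someurl.com/page.html?flag&q=a%20b&utm_source=x'))
# http://www.someurl.com/page.html?flag&q=a%20b
```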

Adders · Answer 4 · 2017-10-29T09:03:46+0000

How about this one? Nice and easy:

    url = 'https://searchengineland.com/amazon-q3-ad-revenues-surpass-1-billion-roughly-2x-early-2016-285763?utm_source=feedburner&utm_medium=feed&utm_campaign=feed-main'
    print url[:url.find('?utm')]
    # https://searchengineland.com/amazon-q3-ad-revenues-surpass-1-billion-roughly-2x-early-2016-285763
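The slice works for the URL shown because its query string starts with a utm_ parameter and contains nothing else. A small Python 3 sketch of how it behaves in other cases; the wrapper name `cut_at_utm` and the example.com URLs are made up for illustration.

```python
def cut_at_utm(url):
    # Same slicing as above, wrapped in a function
    return url[:url.find('?utm')]

# Query made only of utm_ parameters: works as intended
print(cut_at_utm('https://example.com/post?utm_source=feed&utm_medium=feed'))
# https://example.com/post

# A non-utm parameter after the first utm_ one is dropped too
print(cut_at_utm('https://example.com/post?utm_source=feed&id=7'))
# https://example.com/post

# No '?utm' in the URL: find() returns -1, so the last character is cut off
print(cut_at_utm('https://example.com/post?id=7&utm_source=feed'))
# https://example.com/post?id=7&utm_source=fee
```

So this shortcut is only safe when every URL in the list is known to have the `?utm_...` form.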
