Parsing using templates

Requirements: I have a Python project that parses data feeds from several sources in different formats (Atom, valid XML, invalid XML, CSV, near-garbage, etc.) and inserts the resulting data into a database. The catch is that the information needed to parse each feed must also be stored in the database.

Current solution: My previous solution was to store small Python scripts that are eval'd on the raw data and return a data object for the parsed data. I would really like to get away from this method, since it obviously opens a nasty security hole.

Ideal solution: What I am looking for is what I would describe as a template-driven feed parser for Python, so that I can write a template file for each feed format, and that template would be used to make sense of the various data formats.

I have had limited success finding something like this in the past, and I was hoping someone might have a good suggestion.

Thanks everyone!

1 answer

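Instead of eval-ing stored scripts, why not turn them into a package of parser modules? CSV is one thing, XML is quite another, so rather than forcing everything through one mechanism, why not write small modules that all expose the same API? Plain Python (rather than a template DSL) seems well suited to this.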

For example, here is an approach I have seen used in one small script:

Main program:

...
def import_plugin(name):
    mod = __import__(name)
    components = name.split('.')
    for comp in components[1:]:
        mod = getattr(mod, comp)
    return mod

...
feed_parser = import_plugin('parsers.%s' % feed['format'])
data = feed_parser.parse_feed(...)
...

parsers/csv.py:

#!/usr/bin/python
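# absolute_import makes "import csv" below load the standard library module,
# not this file (parsers/csv.py), under Python 2
from __future__ import absolute_import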

import urllib2
import csv

def parse_feed(...):
    ...
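
Since the format name comes from the database, it is probably worth validating it before it ends up in an import path. Below is a minimal sketch of the same plugin lookup using importlib.import_module from the standard library. The names load_parser_module and _SAFE_NAME are hypothetical, and the "parsers" package with a parse_feed function per module is assumed from the example above, not something the sketch creates.

```python
import importlib
import re

# Hypothetical rule: a format name must be a simple lowercase identifier,
# so a value stored in the database cannot smuggle dots or paths into the import.
_SAFE_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def load_parser_module(format_name, package="parsers"):
    """Import and return the module <package>.<format_name>.

    Raises ValueError for names that do not look like plain identifiers,
    and ImportError if no such module exists.
    """
    if not _SAFE_NAME.match(format_name):
        raise ValueError("unsafe parser name: %r" % (format_name,))
    return importlib.import_module("%s.%s" % (package, format_name))
```

With a parsers package on the import path, the main program would then call something like load_parser_module(feed['format']).parse_feed(...), with the arguments left to whatever API the modules share.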

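Alternatively, you could write a set of parser classes and a registration function, so that several parsers can be tried in turn for the same MIME type (each one either returns data or signals "cannot parse this"):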

class BaseParser(object):
    ...

class CSVParser(BaseParser):
    ...
register_feed_parser(CSVParser, ['text/plain', 'text/csv'])
...

parsers = get_registered_feed_parsers(feed['mime_type'])
data = None
for parser in parsers:
    try:
        data = parser(feed['data'])
        if data is not None:
            break
    except ParsingError:
        pass
...
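
To make the registry idea concrete, here is a minimal runnable sketch, assuming Python 3. The function names register_feed_parser and get_registered_feed_parsers and the ParsingError exception follow the outline above, but their bodies, the parse method, the _REGISTRY dict, and the parse_with_fallback helper are all assumptions made for illustration.

```python
import csv
import io


class ParsingError(Exception):
    """Raised when a parser cannot handle the given data."""


# Assumed storage: MIME type -> list of parser classes, in registration order.
_REGISTRY = {}


def register_feed_parser(parser_cls, mime_types):
    for mime_type in mime_types:
        _REGISTRY.setdefault(mime_type, []).append(parser_cls)


def get_registered_feed_parsers(mime_type):
    # Return a copy so callers cannot mutate the registry by accident.
    return list(_REGISTRY.get(mime_type, []))


class BaseParser(object):
    def parse(self, data):
        raise NotImplementedError


class CSVParser(BaseParser):
    def parse(self, data):
        try:
            rows = list(csv.reader(io.StringIO(data), strict=True))
        except csv.Error as exc:
            raise ParsingError(str(exc))
        if not rows:
            raise ParsingError("no rows found")
        return rows


register_feed_parser(CSVParser, ["text/plain", "text/csv"])


def parse_with_fallback(feed):
    """Try each parser registered for the feed's MIME type; None if all fail."""
    for parser_cls in get_registered_feed_parsers(feed["mime_type"]):
        try:
            return parser_cls().parse(feed["data"])
        except ParsingError:
            continue
    return None
```

Only the MIME type and the raw data need to live in the database here; the parsing logic itself stays in version-controlled code, which avoids executing anything stored alongside the feeds.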

Source: https://habr.com/ru/post/1711091/

