Template extraction in python / php

Are there existing template extraction libraries in python or php? Perl has Template::Extract, but I could not find a similar implementation in python or php.

The only thing I can find in python is TemplateMaker ( http://code.google.com/p/templatemaker/ ), but it is not really a library for extracting templates.

+3
3 answers

After digging around some more, I found a solution for exactly what I was looking for. filippo posted a list of python screen-scraping solutions in this post: Options for HTML scraping? , including the scrapemark package ( http://arshaw.com/scrapemark/ ).

Hope this helps anyone looking for the same solution.
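
For anyone wondering what "template extraction" means in practice, here is a minimal sketch using only the standard library. It is not scrapemark's or Template::Extract's API; the `{{ name }}` placeholder syntax and the `extract` helper are assumptions made up for illustration. The idea is to turn the template into a regular expression and read the named groups back out.

```python
import re

def extract(template, text):
    """Fill the {{ name }} holes of `template` from `text`.

    Hypothetical helper: returns a dict of captured values, or None if the
    text does not fit the template. Each name may appear only once and must
    be a valid Python identifier.
    """
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{\{\s*(\w+)\s*\}\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)          # literal text must match exactly
        else:
            pattern += "(?P<%s>.*?)" % part     # a hole becomes a lazy named group
    match = re.fullmatch(pattern, text, re.DOTALL)
    return match.groupdict() if match else None

print(extract("<b>{{ first }} and {{ second }}</b>", "<b>red and green</b>"))
# {'first': 'red', 'second': 'green'}
```

Like any regex-based approach, this is purely literal: it knows nothing about HTML, so for real pages a dedicated scraping package is probably the better choice.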

+2

TemplateMaker really does what you need, at least according to its documentation. Instead of receiving the template as input, it infers ("learns") it from multiple documents. It then has an extract method for extracting data from other documents that were created using this template.

This example shows:

# Now that we have a template, let's extract some data.
>>> t.extract('<b>red and green</b>')
('red', 'green')
>>> t.extract('<b>django and stephane</b>')
('django', 'stephane')

# The extract() method is very literal. It doesn't magically trim
# whitespace, nor does it have any knowledge of markup languages such as
# HTML.
>>> t.extract('<b>  spacy  and <u>underlined</u></b>')
('  spacy ', '<u>underlined</u>')

# The extract() method will raise the NoMatch exception if the data
# doesn't match the template. In this example, the data doesn't have the
# leading and trailing "<b>" tags.
>>> t.extract('this and that')
Traceback (most recent call last):
...
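
The learn-then-extract idea above can be sketched with the standard library alone. This is not TemplateMaker's implementation: the `learn` and `extract` functions and the `HOLE` marker below are hypothetical names, and the character-level diff from difflib is a stand-in technique. It only works cleanly when the variable parts of the two samples share no characters, so treat it as an illustration of the concept rather than something robust.

```python
import re
from difflib import SequenceMatcher

HOLE = object()  # marks a variable region in the learned template

def learn(a, b):
    """Infer a template from two sample strings (hypothetical helper).

    Text common to both samples is kept as a literal; everything that
    differs collapses into a single HOLE.
    """
    template = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            template.append(a[i1:i2])
        elif not template or template[-1] is not HOLE:
            template.append(HOLE)
    return template

def extract(template, text):
    """Return the hole contents of `text` as a tuple, or raise ValueError."""
    pattern = "".join(
        "(.*?)" if part is HOLE else re.escape(part) for part in template
    )
    match = re.fullmatch(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("text does not match the learned template")
    return match.groups()

template = learn("<b>this and that</b>", "<b>alex and sue</b>")
print(extract(template, "<b>red and green</b>"))
# ('red', 'green')
```

As in the TemplateMaker example, extraction here is literal: whitespace is not trimmed, and text that lacks the surrounding "<b>" tags raises an error instead of returning partial data.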

In short, the difference from Perl's Template::Extract is that the template is inferred from sample documents rather than supplied as input.

+1

TemplateMaker http://www.holovaty.com/writing/templatemaker/

See also lxml.html or BeautifulSoup in python.

0

Source: https://habr.com/ru/post/1730332/

