XPath to extract text after br tags in R

How can I extract the text that follows the br tags in the HTML below?

 <div id='population'> The Snow Leopard Survival Strategy (McCarthy <em>et al.</em> 2003, Table II) compiled national snow leopard population estimates, updating the work of Fox (1994). Many of the estimates are acknowledged to be rough and out of date, but the total estimated population is 4,080-6,590, as follows:<br> <br> Afghanistan: 100-200?<br> Bhutan: 100-200?<br> China: 2,000-2,500<br> India: 200-600<br> Kazakhstan: 180-200<br> Kyrgyzstan: 150-500<br> Mongolia: 500-1,000<br> Nepal: 300-500<br> Pakistan: 200-420<br> Russia: 150-200<br> Tajikistan: 180-220<br> Uzbekistan: 20-50 </div> 

So far I have:

 xpathSApply(h, '//div[@id="population"]', xmlValue) 

but I'm stuck now ...

1 answer

It helps to understand that text is a node, too. All text nodes in the div that come after a <br/> can be selected with:

 //div[@id="population"]/text()[preceding-sibling::br] 
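As a rough illustration, here is how that expression might be used from R with the XML package from the question. The markup is a shortened stand-in for the original div (three countries only), and the trimming step is an addition of mine, not part of the answer: the selected text nodes carry leading spaces, and depending on the parser there may be a blank node between the first two br tags.

```r
library(XML)

# Shortened stand-in for the question's markup (assumption: three countries only)
html <- "<div id='population'>Intro text, as follows:<br> <br> Afghanistan: 100-200?<br> Bhutan: 100-200?<br> Uzbekistan: 20-50 </div>"
h <- htmlParse(html, asText = TRUE)

# Select every text node in the div that has a <br> sibling before it
txt <- xpathSApply(h, '//div[@id="population"]/text()[preceding-sibling::br]', xmlValue)

# Trim surrounding whitespace and drop any node that is blank afterwards
txt <- trimws(txt)
txt <- txt[nzchar(txt)]
txt
```

The introductory paragraph is excluded because no br precedes it, so only the country lines remain.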

Strictly speaking, text "between <br/> tags" would be:

 //div[@id="population"]/text()[preceding-sibling::br and following-sibling::br] 

... but I guess that is not what you want here, because it would skip the last entry (Uzbekistan), which has no <br/> after it.
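To see the difference between the two expressions above, the following sketch runs both against a shortened stand-in for the question's markup and lists what only the first one matches. The helper `get_text` is a hypothetical name introduced for this illustration; it is not part of the XML package.

```r
library(XML)

# Shortened stand-in for the question's markup (assumption: three countries only)
html <- "<div id='population'>Intro text, as follows:<br> <br> Afghanistan: 100-200?<br> Bhutan: 100-200?<br> Uzbekistan: 20-50 </div>"
h <- htmlParse(html, asText = TRUE)

# Hypothetical helper: run an XPath query, return trimmed, non-blank text values
get_text <- function(doc, xpath) {
  out <- trimws(unlist(xpathSApply(doc, xpath, xmlValue)))
  out[nzchar(out)]
}

# Text nodes with a <br> before them
after_br <- get_text(h, '//div[@id="population"]/text()[preceding-sibling::br]')

# Text nodes with a <br> both before and after them
between_br <- get_text(h, '//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]')

# Entries matched by the first query but missed by the second
missed <- setdiff(after_br, between_br)
missed
```

With this input, `missed` should hold only the final entry, since the closing text node is followed by the end of the div and not by another br.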


Source: https://habr.com/ru/post/919278/

