The following URL points to a page with numbers and tables, and I would like to read the first two columns of the table. The xpathSApply command works fine, but I need to match on several attributes at once, and I can't figure out how.
library(XML)
url = "http://floodobservatory.colorado.edu/SiteDisplays/1544data.htm"
doc = htmlTreeParse(url, useInternalNodes=TRUE)
A sample of the parsed data:
<tr height="20" style="height:15.0pt"> <td height="20" class="xl6521398" align="right" style="height:15.0pt">11-Oct-13</td> <td class="xl7321398">1853</td> <td class="xl7321398"></td> <td class="xl8121398">0.80</td> <td class="xl7221398" align="right">4.87</td> <td class="xl1521398"></td> <td class="xl1521398"></td> <td class="xl1521398"></td> <td class="xl1521398"></td> <td class="xl1521398"></td> <td class="xl1521398"></td> <td class="xl7421398"></td> <td class="xl7421398"></td> <td class="xl7421398"></td> <td class="xl7421398"></td> <td class="xl9621398"></td> <td class="xl7421398"></td> <td class="xl8121398"></td> </tr>
I need to read the values from two cells: one holds the date and the other holds the streamflow discharge. They have the following attributes:
<td height="20" class="xl6521398" ...> and <td class="xl7321398" ...>
In the sample above, I need to capture 11-Oct-13 and 1853.
I used the following commands to get the dates and the streamflow discharge:
dates = xpathSApply(doc, "//td[@class='xl6521398']", xmlValue)
streamflowdischarge = xpathSApply(doc, "//td[@class='xl7321398']", xmlValue)
Both commands run and return data, but the results also include values from other tables/cells, so the dates and the streamflow discharge values I care about no longer line up.
dates[1:10]
[1] "1-Jan-98" "2-Jan-98" "3-Jan-98" "31-Mar-98" "4-Jan-98" "30-Apr-98" "5-Jan-98"
[8] "31-May-98" "6-Jan-98" "30-Jun-98"
"31-Mar-98" appears between "3-Jan-98" and "4-Jan-98", which is not what I intended.
streamflowdischarge[1:10]
[1] "3108" "3076" "3051" "3111" "3064" "3043" "3007" "3066" "378" ""
"3108" does not correspond to "1-Jan-98"; this can be checked at the URL.
It looks like there are other tables/cells with the same class attributes that I don't want to capture. So it seems to me that I need to match on the full set of attributes, i.e.
<td height="20" class="xl6521398" align="right" style="height:15.0pt">
in order to get the date, and that I somehow have to make sure the streamflow discharge is retrieved from the same table row.
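To make the idea concrete, here is a minimal sketch of what I think is needed, run against a small hypothetical HTML snippet rather than the real page. The snippet, the stray second row, and the assumption that the date and discharge are always the first and second cells of a row are mine, based only on the sample row above; the real page may be laid out differently.

```r
library(XML)

# Hypothetical miniature of the page: one data row like the sample above,
# plus a stray row from "another table" that reuses the same classes
# in different column positions.
html <- '
<table>
  <tr height="20">
    <td height="20" class="xl6521398" align="right" style="height:15.0pt">11-Oct-13</td>
    <td class="xl7321398">1853</td>
    <td class="xl8121398">0.80</td>
  </tr>
  <tr>
    <td class="xl1521398"></td>
    <td class="xl6521398">31-Mar-98</td>
    <td class="xl7321398">3111</td>
  </tr>
</table>'
doc <- htmlParse(html, asText = TRUE)

# Several attribute tests can be combined with "and" inside one predicate.
dates_strict <- xpathSApply(
  doc,
  "//td[@class='xl6521398' and @height='20' and @align='right']",
  xmlValue
)

# Keep only rows whose 1st cell is a date cell and whose 2nd cell is a
# discharge cell, then read both cells from that same set of rows.
# (Assumption: date and discharge are always the first two cells of a row.)
row_xpath <- "//tr[td[1][@class='xl6521398'] and td[2][@class='xl7321398']]"
dates <- xpathSApply(doc, paste0(row_xpath, "/td[1]"), xmlValue)
streamflowdischarge <- xpathSApply(doc, paste0(row_xpath, "/td[2]"), xmlValue)

result <- data.frame(date = dates, discharge = streamflowdischarge,
                     stringsAsFactors = FALSE)
print(result)
```

Because both vectors are built from the same row filter, they should have one entry per matching row in document order, so the date and discharge stay paired. I have not confirmed that this filter excludes the unwanted cells on the actual page.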
Any suggestions, as well as alternative approaches, would be greatly appreciated.
I also tried readHTMLTable, but got an "index out of bounds" error.
Thanks, Satish