For a non-profit private school project, I am writing a program that searches for the lyrics of the song currently playing on Spotify. I have to do this in C# (a requirement), but I can use other languages if I want.
I found several sites that I can extract the lyrics from. I have already managed to download the full HTML, but I am not sure what to do after that. I asked my teacher, and she told me to use XML (which I also find difficult :p). I read a little about it and looked for examples, but did not find anything that seemed to apply to my case.
Time for some code.
Say I want to get the lyrics from musixmatch.com:
The HTML (reformatted for readability):
<span data-reactid="199">
  <p class="mxm-lyrics__content" data-reactid="200">First line of the lyrics! These words will never be ignored I don't want a battle </p>
  <div data-reactid="202">
    <div class="inline_video_ad_container_container" data-reactid="203">
      <div id="inline_video_ad_container" data-reactid="204">
        <div class="" style="line-height:0;" data-reactid="205">
          <div id="div_gpt_ad_outofpage_musixmatch_desktop_lyrics" data-reactid="206">
            <script type="text/javascript"> </script>
          </div>
        </div>
      </div>
    </div>
    <p class="mxm-lyrics__content" data-reactid="207">But I got a war More fancy lyrics And lines That I want to fetch And display Tralala lala Trouble! </p>
  </div>
</span>
Note that the first three lines of the lyrics are in the first <p>, and the rest are in the second <p> further down, after the ad container. Also note that both <p> tags have the same class. The full HTML source can be found here: view-source:https://www.musixmatch.com/lyrics/Bullet-for-My-Valentine/You-Want-a-Battle-Here%E2%80%99s-a-War (the fragment begins at line 97).
So, in this particular example, the lyrics are all there, with relatively little markup around them that I do not need. So far, I get the HTML with the following C#:
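Because both lyric blocks share the class `mxm-lyrics__content`, one approach is to select every `<p>` carrying that class and join their text in document order. Below is a minimal sketch of the XML route, using `System.Xml.Linq` from the .NET base library. It assumes the input is well-formed XML, which happens to hold for the fragment above but generally does not hold for a full HTML page (void tags such as `<br>`, entities like `&nbsp;`, inline scripts), so treat it as an illustration of selecting by class rather than a way to parse the whole page. `LyricsExtractor` and `Extract` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LyricsExtractor
{
    // Returns the text of every <p> whose class list contains "mxm-lyrics__content",
    // trimmed and joined with newlines in document order.
    // Assumes the input is well-formed XML (true for the fragment above, not for most full pages).
    public static string Extract(string fragment)
    {
        XElement root = XElement.Parse(fragment);
        var blocks = root
            .Descendants("p")
            .Where(p => ((string)p.Attribute("class") ?? "")
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Contains("mxm-lyrics__content"))
            .Select(p => p.Value.Trim());
        return string.Join(Environment.NewLine, blocks);
    }

    public static void Main()
    {
        // Shortened copy of the musixmatch fragment (the nested ad <div>s are trimmed).
        string fragment = @"<span data-reactid=""199"">
  <p class=""mxm-lyrics__content"" data-reactid=""200"">First line of the lyrics! These words will never be ignored I don't want a battle </p>
  <div data-reactid=""202"">
    <div class=""inline_video_ad_container_container"" data-reactid=""203"">
      <script type=""text/javascript""> </script>
    </div>
    <p class=""mxm-lyrics__content"" data-reactid=""207"">But I got a war More fancy lyrics And lines That I want to fetch And display Tralala lala Trouble! </p>
  </div>
</span>";

        // Prints the two lyric blocks on separate lines.
        Console.WriteLine(Extract(fragment));
    }
}
```

Checking for the class as one token of a space-separated list, rather than comparing the whole attribute, keeps the filter working if the site adds further classes to the same element.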
string source = "https://www.musixmatch.com/lyrics/Bullet-for-My-Valentine/You-Want-a-Battle-Here%E2%80%99s-a-War";
Getting the whole HTML works, but I am stuck on extracting the lyrics from it. Since the lyrics on this page are not inside an element with an id, I can't just use GetElementbyId. Can someone point me in the right direction? I want to support multiple sites, so I will have to do this several times, once for each site.
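To support several sites, one option is to keep the site-specific part as data: a table mapping each host to an XPath expression that locates its lyrics, so adding a site means adding an entry rather than new code. The sketch below assumes the markup has already been reduced to well-formed XML and uses `XPathSelectElements` from `System.Xml.XPath`. The names `MultiSiteLyrics`, `LyricsXPathByHost`, and `Extract` are hypothetical, and only the musixmatch entry is based on the page structure shown above. For real pages an HTML parser is more practical: if the `GetElementbyId` above comes from HtmlAgilityPack, its `DocumentNode.SelectNodes(xpath)` accepts the same XPath strings and tolerates malformed HTML (note that it returns `null` when nothing matches).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class MultiSiteLyrics
{
    // Hypothetical per-site table: host name -> XPath expression that locates the lyric blocks.
    // Only the musixmatch entry reflects the page structure from the question; other sites
    // need their own expressions, found by inspecting their HTML.
    private static readonly Dictionary<string, string> LyricsXPathByHost =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            // Matches <p> elements whose class list contains the token mxm-lyrics__content.
            ["www.musixmatch.com"] =
                "//p[contains(concat(' ', normalize-space(@class), ' '), ' mxm-lyrics__content ')]",
        };

    // Looks up the selector for the URL's host and returns the matched text blocks,
    // trimmed and joined with newlines. Returns null if the host has no registered selector.
    // Assumes the markup is already well-formed XML.
    public static string Extract(string url, string markup)
    {
        string host = new Uri(url).Host;
        if (!LyricsXPathByHost.TryGetValue(host, out string xpath))
        {
            return null;
        }

        XDocument doc = XDocument.Parse(markup);
        IEnumerable<string> blocks = doc.XPathSelectElements(xpath).Select(e => e.Value.Trim());
        return string.Join(Environment.NewLine, blocks);
    }

    public static void Main()
    {
        string url = "https://www.musixmatch.com/lyrics/Bullet-for-My-Valentine/You-Want-a-Battle-Here%E2%80%99s-a-War";
        string markup = @"<span>
  <p class=""mxm-lyrics__content"">First line of the lyrics!</p>
  <div><p class=""mxm-lyrics__content"">But I got a war</p></div>
</span>";

        Console.WriteLine(Extract(url, markup) ?? "No selector registered for this site.");
    }
}
```

Keeping the per-site knowledge in one table also makes it easy to see which entry to fix when a site changes its markup.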