Hey Guys -
I'm having difficulty doing something so thought I'd post. Overall, I'm trying to monitor an RSS feed and have multiple URLs (one per feed post), each listed between specific phrases, downloaded automatically as they appear in the feed. Below is where I've gotten before getting stuck, followed by my questions...
I started by using the two commands below to save the RSS feed to an XML file and read it back in.
Invoke-WebRequest -Uri 'http://site.com/feed' -OutFile e:\RSSParser\ParsedFeed.xml
$Content = Get-Content e:\RSSParser\ParsedFeed.xml
Although successful, I found that the actual download URLs that I'm trying to capture are not listed in the XML as individual / specific properties per post. They are in one of the properties, but surrounded by a lot of HTML code which isn't individually parsed. The good news is that the text before and after each post's URL is standard. Due to that, I decided to try to extract each URL by searching for text between two phrases. Below is a snippet of one of the URLs with the code I tried afterwards:
XML Snippet
<p class="uk-card uk-card-body uk-card-default uk-card-hover"><a href="http://mirror.math.princeton.edu/pub/ubuntu-iso/18.10/ubuntu-18.10-desktop-amd64.iso">Ubunti 18.10</a></p>
PowerShell Code (in addition to the above code)
$pattern = 'uk-card uk-card-body uk-card-default uk-card-hover"><a href='(.*?)'">'
$result = [regex]::match($Content, $pattern).Groups[1].Value
The expected result, of course, would be for $result to be http://mirror.math.princeton.edu/pub/ubuntu-iso/18.10/ubuntu-18.10-desktop-amd64.iso. The text on the line before the URL and the two characters after it (">) are the same for each post in the feed, so this should work for all of them.
Unfortunately, despite trying many variations of the $pattern string with single and double quotes, I get errors whenever it's run.
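For reference, here is a minimal sketch of one way the pattern could be quoted so that it parses. Inside a single-quoted PowerShell string, an unescaped ' ends the string, and the href in the snippet above is delimited by double quotes, so the capture group probably belongs between literal " characters. $sample is only a stand-in copied from the snippet, not the real feed content.

```powershell
# Stand-in for the feed content (copied from the XML snippet above).
$sample = '<p class="uk-card uk-card-body uk-card-default uk-card-hover"><a href="http://mirror.math.princeton.edu/pub/ubuntu-iso/18.10/ubuntu-18.10-desktop-amd64.iso">Ubunti 18.10</a></p>'

# Single-quoted string: the double quotes are literal, and no single quote appears inside it.
$pattern = 'uk-card uk-card-body uk-card-default uk-card-hover"><a href="(.*?)">'

# Groups[1] holds the text captured by (.*?), which is the URL.
$result = [regex]::Match($sample, $pattern).Groups[1].Value
$result
```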
Questions
- Given that the URL per post isn't in a dedicated XML property, am I on the right track or would something else be better?
- If on the right track, what do I need to change in my ending code to achieve the desired result?
- Once I get $result to work correctly, I still want to grab all URLs that match the criteria from the feed/XML and preferably write them to a txt file (one per line).
- Each time the RSS feed updates, it will still contain the old posts if there's been any change at all and I obviously don't want to redownload anything. What do you think would be the best solution to this? Prior to appending the URL to the txt file, have it search to see if it's listed already?
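To make the last two questions concrete, the idea could be sketched roughly as follows. This is only an illustration under a few assumptions: the function name Save-NewFeedUrl and its parameters are made up for this example, the pattern assumes the href is wrapped in literal double quotes as in the snippet above, and the HTML is assumed to appear unescaped in the saved feed file.

```powershell
# Hypothetical helper: extracts every matching URL from the feed text and
# appends only the ones not already recorded in a text file.
function Save-NewFeedUrl {
    param(
        [string]$Content,   # feed text as a single string
        [string]$UrlFile    # text file holding one URL per line
    )

    $pattern = 'uk-card uk-card-body uk-card-default uk-card-hover"><a href="(.*?)">'

    # Matches() returns every match; take capture group 1 from each and drop repeats.
    $found = @([regex]::Matches($Content, $pattern) |
        ForEach-Object { $_.Groups[1].Value } |
        Select-Object -Unique)

    # URLs recorded on earlier runs (empty if the file does not exist yet).
    $known = @()
    if (Test-Path -Path $UrlFile) { $known = @(Get-Content -Path $UrlFile) }

    # Keep only URLs not seen before, then append them one per line.
    $new = @($found | Where-Object { $known -notcontains $_ })
    if ($new.Count -gt 0) { Add-Content -Path $UrlFile -Value $new }

    # Return the new URLs so the caller can download just those.
    $new
}
```

It might be called as Save-NewFeedUrl -Content (Get-Content -Raw e:\RSSParser\ParsedFeed.xml) -UrlFile e:\RSSParser\Urls.txt, where Urls.txt is a hypothetical file name. Get-Content -Raw reads the file as one string rather than an array of lines, which is what the regex methods expect.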
Any suggestions are appreciated - Thanks!!
Ben K.