Forum Discussion
Retrieve content of an aspx page
- Aug 15, 2023
Hi pasquaale,
Are you talking about modern pages?
If you install "PnP PowerShell" (https://pnp.github.io/powershell/), you can use the "Export-PnPPage" command:
Export-PnPPage -Identity Home.aspx -Out Home.xml
It exports the page, including its text content, to an XML file that uses the PnP Provisioning Schema (https://github.com/pnp/PnP-Provisioning-Schema).
Best Regards,
Sven
Hi SvenSieverding,
I managed to retrieve the XML file with this command, but unfortunately the text is not in the format I need.
For example:
<pnp:CanvasControlProperties>
<pnp:CanvasControlProperty Key="Text" Value="<p>Inhaltsverzeichnis</p><p style="margin-left:40px;">1 - Mitarbeiterkürzel</p><p style="margin-left:40px;">2 - Abkürzungsverzeichnis Projektalltag</p><p style="margin-left:40px;">3 - Abkürzungsverzeichnis Bauwissen</p>" />
</pnp:CanvasControlProperties>
See the Value attribute. I need to retrieve just that value as plain text. Is there another solution?
I also tried the command
Get-PnPWikiPageContent
with my page's URL, but it outputs nothing.
Hi pasquaale,
SvenSieverding pointed you in the right direction.
Once you have the XML file, you can extract the desired text from it.
The first step is to load the contents of the XML file into a variable:
$data = [xml](Get-Content Home.xml -Encoding UTF8)
Then you can access the node that contains your text:
$textContents = $data.DocumentElement.SelectSingleNode("//*[@Key='Text']").Value
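Keep in mind that SelectSingleNode returns only the first match, so a page with several text web parts would give you just the first one. Here is a minimal sketch that collects every match, assuming the export looks like the fragment quoted earlier in this thread. The sample XML, the placeholder namespace URI, and the variable names are made up for illustration; in practice you would load Home.xml as shown above.

```powershell
# Hypothetical sample standing in for Home.xml. The namespace URI is a placeholder,
# and the HTML is entity-escaped, as it has to be inside an XML attribute.
$sample = @'
<pnp:Page xmlns:pnp="urn:example:pnp">
  <pnp:CanvasControlProperties>
    <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;First part&lt;/p&gt;" />
  </pnp:CanvasControlProperties>
  <pnp:CanvasControlProperties>
    <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Second part&lt;/p&gt;" />
  </pnp:CanvasControlProperties>
</pnp:Page>
'@
$data = [xml]$sample

# SelectNodes returns all matching nodes; the wildcard XPath avoids namespace handling.
$nodes = $data.SelectNodes("//*[@Key='Text']")

# GetAttribute reads the attribute explicitly instead of relying on
# PowerShell's property adaptation of XML nodes.
$allText = @($nodes | ForEach-Object { $_.GetAttribute('Value') })
$allText
```

Each entry in $allText is the raw content of one text web part, so it can still contain HTML.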
Now, depending on what is on your page, this can be plain text or HTML. In the latter case, you can for example remove the HTML tags like this:
$textContents -replace '<[^>]+>',' '
You might have to experiment a little with your specific pages to get a satisfactory result.
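As a starting point for that experimenting, here is a rough sketch that keeps one line per paragraph and also decodes HTML entities. ConvertTo-PlainText is a hypothetical helper name, not a PnP PowerShell cmdlet, and the sample markup is invented; it assumes the text web part only contains simple markup such as p and br tags.

```powershell
# Hypothetical helper; not part of PnP PowerShell.
function ConvertTo-PlainText {
    param([string]$Html)

    # Turn paragraph ends and line breaks into newlines so the line structure survives.
    $text = $Html -replace '</p\s*>|<br\s*/?>', "`n"
    # Drop all remaining tags.
    $text = $text -replace '<[^>]+>', ''
    # Decode entities such as &amp; that remain in the markup.
    $text = [System.Net.WebUtility]::HtmlDecode($text)
    # Trim each line and discard empty ones.
    ($text -split "`n" | ForEach-Object { $_.Trim() } | Where-Object { $_ }) -join "`n"
}

# Invented sample in the style of the exported Value attribute.
$html = '<p>Inhaltsverzeichnis</p><p style="margin-left:40px;">1 - Plans &amp; drawings</p>'
$plain = ConvertTo-PlainText -Html $html
$plain
```

A regex like this is not a real HTML parser, so nested lists, tables, or embedded web parts will probably need extra handling.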
Hope this helps,
Matthias
P.S.: Get-PnPWikiPageContent only works for wiki pages, not for modern pages.