Forum Discussion
Retrieve content of an aspx page
- Aug 15, 2023
Hi pasquaale,
are you talking about modern pages?
If you install "PnP PowerShell" (https://pnp.github.io/powershell/), you can use the "Export-PnPPage" command:
Export-PnPPage -Identity Home.aspx -Out Home.xml
It will export the page including the text content into an XML file using the PnP Provisioning Schema (https://github.com/pnp/PnP-Provisioning-Schema)
Best Regards,
Sven
- pasquaale Aug 16, 2023 Copper Contributor
Hi SvenSieverding,
I managed to retrieve the XML file with this command, but unfortunately the text is not in the desired format.
For example:
<pnp:CanvasControlProperties>
  <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;" />
</pnp:CanvasControlProperties>
See the value. I need to retrieve just the value as plain text. Is there another solution?
I tried the command Get-PnPWikiPageContent with my respective URL, but the command outputs nothing.
- SvenSieverding Aug 16, 2023 Bronze Contributor
Hi pasquaale,
that value is XML-escaped HTML:
<p>Inhaltsverzeichnis</p>
<p style="margin-left:40px;">1 - Mitarbeiterkürzel</p>
<p style="margin-left:40px;">2 - Abkürzungsverzeichnis Projektalltag</p>
<p style="margin-left:40px;">3 - Abkürzungsverzeichnis Bauwissen</p>
You can use these commands to convert it to HTML or text:
$value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;"
# Convert to html
$html=$value -replace "&lt;","<" -replace "&gt;",">" -replace "&quot;",""""
# Extract text from html
$text=$html -replace '<[^>]+>',''
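As a possible variation on the replacements above, the .NET method [System.Net.WebUtility]::HtmlDecode can decode every entity in one call instead of listing them one by one. The following is only a sketch: the sample value is shortened from the one in this thread, and replacing tags with a space (rather than an empty string) is an assumption made so that adjacent paragraphs do not run together.

```powershell
# Sample value as it appears in the exported XML attribute (shortened from the thread)
$value = '&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;'

# Decode all entities (&lt; &gt; &quot; &amp; ...) in one call
$html = [System.Net.WebUtility]::HtmlDecode($value)

# Replace each tag with a space, collapse repeated whitespace, trim the ends
$text = ($html -replace '<[^>]+>', ' ' -replace '\s+', ' ').Trim()

$text
```

A regex-based tag strip like this is fine for simple text web parts; pages with nested markup or embedded scripts may need a real HTML parser.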
Best Regards,
Sven
- Matthias_Glubrecht Aug 16, 2023 Microsoft
Hi pasquaale,
SvenSieverding pointed you in the right direction.
Once you have the xml file, you can easily proceed to extract the desired text from it.
The first step would be to load the contents of the xml file to a variable like so:
$data = [xml](Get-Content Home.xml -Encoding UTF8)
Then you can access the node that contains your text:
$textContents = $data.DocumentElement.SelectSingleNode("//*[@Key='Text']").Value
Now, depending on what is on your page, this can be plain text or HTML. In the latter case, you can for example remove the HTML tags like this:
$textContents -replace '<[^>]+>',' '
You might have to experiment a little with your specific pages to get a satisfactory result.
Hope this helps,
Matthias
P.S: Get-PnPWikiPageContent only works for Wiki pages, not for modern pages.
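The steps from this reply could be combined roughly as follows. This is a sketch, not a tested solution for every page layout: Get-PageText is a hypothetical helper name, the inline XML is a made-up, shortened stand-in for Home.xml, and its namespace URI is a placeholder rather than the real PnP Provisioning Schema namespace. It uses SelectNodes instead of SelectSingleNode on the assumption that a page may contain more than one text web part.

```powershell
# Hypothetical helper: returns the plain text of every "Text" property in an exported page
function Get-PageText {
    param([Parameter(Mandatory)][xml]$Xml)

    foreach ($node in $Xml.SelectNodes("//*[@Key='Text']")) {
        # The XML parser has already unescaped the attribute, so this is HTML
        $noTags = $node.GetAttribute('Value') -replace '<[^>]+>', ' '
        # Decode entities that may remain in the HTML itself, then tidy whitespace
        ([System.Net.WebUtility]::HtmlDecode($noTags) -replace '\s+', ' ').Trim()
    }
}

# Inline sample standing in for Home.xml (namespace URI is a placeholder)
$sample = [xml]@'
<pnp:CanvasControlProperties xmlns:pnp="urn:example:placeholder">
  <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Bauwissen&lt;/p&gt;" />
</pnp:CanvasControlProperties>
'@

$plain = Get-PageText -Xml $sample
$plain
```

With a real export, the sample would be replaced by the file loaded earlier, for example Get-PageText -Xml ([xml](Get-Content Home.xml -Encoding UTF8)).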