Forum Discussion
pasquaale, Copper Contributor
Aug 15, 2023
Retrieve content of an aspx page
Hey everyone,
how can we retrieve the content of our aspx pages? Specifically, we need to extract the text of each aspx page in Site Pages. I'm not asking for a "save as HTML" or "press Command+P" solution, but for an endpoint that returns the content of a specific aspx page.
- SvenSieverding, Bronze Contributor
Hi pasquaale,
are you talking about modern pages?
If you install PnP PowerShell (https://pnp.github.io/powershell/), you can use the Export-PnPPage command:
Export-PnPPage -Identity Home.aspx -Out Home.xml
It exports the page, including its text content, to an XML file that uses the PnP Provisioning Schema (https://github.com/pnp/PnP-Provisioning-Schema).
Best Regards,
Sven
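Since the question asks about every page in Site Pages, the single-page command above could be wrapped in a loop. The following is only a sketch under assumptions: the site URL is a placeholder, `Get-ExportFileName` and `Export-AllSitePages` are hypothetical helper names, and pages stored in subfolders of the library would likely need a folder-relative `-Identity`.

```powershell
# Hypothetical helper: maps a page file name to the name of its export file.
function Get-ExportFileName {
    param([Parameter(Mandatory)][string]$PageName)
    [System.IO.Path]::ChangeExtension($PageName, '.xml')
}

# Hypothetical helper: exports every .aspx page in the Site Pages library.
# Requires an active PnP PowerShell connection.
function Export-AllSitePages {
    param([string]$OutputFolder = '.')
    $items = Get-PnPListItem -List 'Site Pages' -PageSize 500
    foreach ($item in $items) {
        $name = $item['FileLeafRef']   # file name of the page, e.g. Home.aspx
        if ($name -like '*.aspx') {
            $target = Join-Path $OutputFolder (Get-ExportFileName -PageName $name)
            Export-PnPPage -Identity $name -Out $target
        }
    }
}

# Assumed placeholder URL; recent PnP PowerShell versions may also need -ClientId.
# Connect-PnPOnline -Url 'https://contoso.sharepoint.com/sites/example' -Interactive
# Export-AllSitePages -OutputFolder .\export
```

The connection and the call are left commented out so the helpers can be loaded without touching a tenant.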
- pasquaale, Copper Contributor
SvenSieverding, Matthias_Glubrecht, thank you very much, this will help me a lot!
- pasquaale, Copper Contributor
Hi SvenSieverding,
I managed to retrieve the XML file with this command, but unfortunately the text is not in the desired format, e.g.:
<pnp:CanvasControlProperties> <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;" /> </pnp:CanvasControlProperties>
See the Value attribute. I need to retrieve just that value as plain text. Is there another solution?
I also tried the Get-PnPWikiPageContent command with the URL of my page, but it outputs nothing.
- SvenSieverding, Bronze Contributor
Hi pasquaale,
that value is XML-escaped HTML:
<p>Inhaltsverzeichnis</p>
<p style="margin-left:40px;">1 - Mitarbeiterkürzel</p>
<p style="margin-left:40px;">2 - Abkürzungsverzeichnis Projektalltag</p>
<p style="margin-left:40px;">3 - Abkürzungsverzeichnis Bauwissen</p>
You can use these commands to convert it to HTML or plain text:
$value = "&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;"
# Convert to HTML by undoing the XML escaping
$html = $value -replace "&lt;","<" -replace "&gt;",">" -replace "&quot;",""""
# Extract the text by stripping all HTML tags
$text = $html -replace '<[^>]+>',''
Best Regards,
Sven
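As a possible variation on the commands above, the exported file can be loaded as an XML document. The parser then undoes the XML escaping on its own, so only the HTML tags need to be stripped. This is a minimal sketch, not the method from the thread: `Get-PageText` is a hypothetical helper name, the XPath uses `local-name()` on the assumption that the schema namespace differs between versions, and closing block tags become line breaks so that paragraphs do not run together.

```powershell
# Hypothetical helper: returns the plain text of every text web part
# found in a page template exported with Export-PnPPage.
function Get-PageText {
    param([Parameter(Mandatory)][xml]$Template)

    # local-name() avoids hard-coding the provisioning schema namespace.
    $nodes = $Template.SelectNodes("//*[local-name()='CanvasControlProperty'][@Key='Text']")
    foreach ($node in $nodes) {
        # The XML parser has already unescaped the attribute value into HTML.
        $html = $node.GetAttribute('Value')
        # Turn closing block tags and <br> into line breaks.
        $text = $html -replace '</(p|div|li|h[1-6])>', "`n" -replace '<br\s*/?>', "`n"
        # Drop the remaining tags, then decode HTML entities such as &amp;.
        $text = $text -replace '<[^>]+>', ''
        [System.Net.WebUtility]::HtmlDecode($text).Trim()
    }
}

# Usage with a file produced by Export-PnPPage:
# Get-PageText -Template ([xml](Get-Content -Raw -Path Home.xml))
```

A regular expression is enough for simple text web parts like the one shown; heavily nested or malformed markup would call for a real HTML parser.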