Forum Discussion
pasquaale
Aug 15, 2023 - Copper Contributor
Retrieve content of an aspx page
Hey everyone, how can we retrieve the content of our aspx pages? Specifically, we need to extract the text of each aspx page in Site Pages. I'm not asking for a "save as html" or "press command+p" so...
- Aug 15, 2023
Hi pasquaale,
Are you talking about modern pages?
If you install "PnP PowerShell" (https://pnp.github.io/powershell/), you can use the "Export-PnPPage" command:
Export-PnPPage -Identity Home.aspx -Out Home.xml
It will export the page, including the text content, into an XML file using the PnP Provisioning Schema (https://github.com/pnp/PnP-Provisioning-Schema).
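As a rough sketch of how the text could be pulled back out of such an export: load the file with an XML parser and read the Value attribute of each CanvasControlProperty whose Key is "Text" (the element shown in the excerpt further down this thread). The sample document below is a simplified stand-in, and its namespace URI is a placeholder rather than the real schema URI; the XPath uses local-name() so it should not depend on the schema version.

```powershell
# Simplified stand-in for an exported page; the namespace URI is a placeholder (assumption).
$sample = @'
<pnp:Sample xmlns:pnp="urn:example:pnp-placeholder">
  <pnp:CanvasControlProperties>
    <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;" />
  </pnp:CanvasControlProperties>
</pnp:Sample>
'@

# For a real export, load the file instead:
#   [xml]$doc = Get-Content -Raw -Path .\Home.xml
[xml]$doc = $sample

# local-name() ignores the namespace prefix, so no namespace manager is needed.
$nodes = $doc.SelectNodes("//*[local-name()='CanvasControlProperty'][@Key='Text']")

# The XML parser unescapes &lt; and &gt; for us, so this yields plain HTML.
$htmlFragments = foreach ($node in $nodes) { $node.GetAttribute('Value') }
$htmlFragments   # <p>Inhaltsverzeichnis</p>
```

Reading the attribute through the parser, rather than as raw file text, means the XML escaping is already undone and only the HTML tags remain to be removed.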
Best Regards,
Sven
pasquaale
Aug 16, 2023 - Copper Contributor
Hi SvenSieverding,
I managed to retrieve the XML file with this command, but unfortunately the text is not in the desired format,
e.g.:
<pnp:CanvasControlProperties>
<pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;" />
</pnp:CanvasControlProperties>
See the Value attribute. I need to retrieve just that value as plain text. Would there be another solution?
I tried the command
Get-PnPWikiPageContent
with my respective URL, but the command outputs nothing.
SvenSieverding
Aug 16, 2023 - Bronze Contributor
Hi pasquaale,
That value is XML-escaped HTML. Unescaped, it reads:
<p>Inhaltsverzeichnis</p>
<p style="margin-left:40px;">1 - Mitarbeiterkürzel</p>
<p style="margin-left:40px;">2 - Abkürzungsverzeichnis Projektalltag</p>
<p style="margin-left:40px;">3 - Abkürzungsverzeichnis Bauwissen</p>
You can use these commands to convert it to HTML or text:
$value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;"
# Convert to HTML by unescaping the XML entities
$html=$value -replace "&lt;","<" -replace "&gt;",">" -replace "&quot;",""""
# Extract text from HTML by removing all tags
$text=$html -replace '<[^>]+>',''
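The -replace chain above handles three entities and joins all paragraphs into one line. A possible variant, offered only as a sketch: let .NET's [System.Net.WebUtility]::HtmlDecode do the unescaping and turn paragraph ends into line breaks before stripping the tags. The function name ConvertTo-PlainText and its -Value parameter are made up for this example and are not part of PnP PowerShell.

```powershell
function ConvertTo-PlainText {
    param([Parameter(Mandatory)][string]$Value)

    # First pass: undo the XML escaping so we have real HTML markup.
    $html = [System.Net.WebUtility]::HtmlDecode($Value)
    # Turn paragraph ends and <br> into line breaks so paragraphs stay separated.
    $text = $html -replace '</p\s*>|<br\s*/?>', "`n"
    # Remove all remaining tags.
    $text = $text -replace '<[^>]+>', ''
    # Second pass: decode entities that were part of the HTML text itself (e.g. &amp;).
    $text = [System.Net.WebUtility]::HtmlDecode($text)
    $text.Trim()
}

$value = '&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;'
ConvertTo-PlainText -Value $value
```

For the two paragraphs in this sample, the result should be two lines, "Inhaltsverzeichnis" and "1 - Mitarbeiterkürzel". Stripping tags with a regular expression is fine for simple text web parts like this one; pages with scripts, tables, or other complex markup would probably need a real HTML parser.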
Best Regards,
Sven