Forum Discussion

pasquaale
Copper Contributor
Aug 15, 2023
Solved

Retrieve content of an aspx page

Hey everyone,

How can we retrieve the content of our aspx pages? Specifically, we need to extract the text of each aspx page in Site Pages. I'm not asking for a "save as HTML" or "press Command+P" solution, but for an endpoint that returns the content of a specific aspx page.

    • pasquaale
      Copper Contributor

      Hi SvenSieverding,

      I managed to retrieve the XML file with this command, but unfortunately the text is not in the desired format, for example:

                        <pnp:CanvasControlProperties>
                          <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;" />
                        </pnp:CanvasControlProperties>

      See the Value attribute: I need just that value as plain text. Is there another solution?
      I also tried the command

      Get-PnPWikiPageContent

      with the respective URL, but it outputs nothing.

      • SvenSieverding
        Bronze Contributor

        Hi pasquaale,

        that value is XML-escaped HTML. Unescaped, it reads:

        <p>Inhaltsverzeichnis</p>
        <p style="margin-left:40px;">1 - Mitarbeiterkürzel</p>
        <p style="margin-left:40px;">2 - Abkürzungsverzeichnis Projektalltag</p>
        <p style="margin-left:40px;">3 - Abkürzungsverzeichnis Bauwissen</p>


        You can use these commands to convert it to HTML or plain text:

        $value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;2 - Abkürzungsverzeichnis Projektalltag&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;3 - Abkürzungsverzeichnis Bauwissen&lt;/p&gt;"
        
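        # Convert to HTML by unescaping the XML entities (&amp; must come last to avoid double-unescaping)
        $html=$value -replace "&lt;","<" -replace "&gt;",">" -replace "&quot;","""" -replace "&amp;","&"
        
        # Extract text from HTML: end each paragraph with a line break, then strip the remaining tags
        $text=($html -replace '</p>',"`n" -replace '<[^>]+>','').Trim()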


        Best Regards,
        Sven
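
If the value is read from the exported XML file instead of being pasted as a string, an XML parser unescapes the attribute on its own, so the manual -replace chain should not be needed. The following is a minimal sketch under assumptions: the sample fragment, the placeholder namespace URI urn:example:pnp, and the variable names are made up for illustration and are not taken from the real template, which declares its own pnp namespace.

```powershell
# Hypothetical sample shaped like the exported fragment. The namespace URI is a
# placeholder so the snippet parses on its own; it is NOT the real PnP schema URI.
$xmlText = @'
<pnp:CanvasControlProperties xmlns:pnp="urn:example:pnp">
  <pnp:CanvasControlProperty Key="Text" Value="&lt;p&gt;Inhaltsverzeichnis&lt;/p&gt;&lt;p style=&quot;margin-left:40px;&quot;&gt;1 - Mitarbeiterkürzel&lt;/p&gt;" />
</pnp:CanvasControlProperties>
'@

$doc = [xml]$xmlText

# local-name() sidesteps the namespace prefix, so no XmlNamespaceManager is needed
$nodes = $doc.SelectNodes("//*[local-name()='CanvasControlProperty'][@Key='Text']")

$texts = foreach ($node in $nodes) {
    # GetAttribute returns the value with the XML entities already unescaped, i.e. plain HTML
    $html = $node.GetAttribute('Value')

    # End each paragraph with a line break, then drop the remaining tags
    $plain = $html -replace '</p>', "`n" -replace '<[^>]+>', ''

    # Decode HTML entities (such as &amp; or &nbsp;) that may remain in the text itself
    [System.Net.WebUtility]::HtmlDecode($plain).Trim()
}

$texts
```

For a real export, the here-string could be swapped for something like [xml](Get-Content -Raw -Encoding UTF8 <path to the exported file>). The regex-based tag stripping is a simplification that should be adequate for simple text web parts; it is not a full HTML parser.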
