Forum Discussion

Kirk Liemohn
Brass Contributor
Aug 01, 2018
Solved

Migrating HTML to Modern Pages - Looks great until page is edited, then content is lost

We are migrating HTML to modern pages using code like:

 

var page = context.Web.AddClientSidePage(pageName, true);
page.PageTitle = "some title";
string htmlContent = "<some html content from another system>";
ClientSideText textWebPart = new ClientSideText() { Text = htmlContent };
page.AddControl(textWebPart, -1);
page.Save(pageName);
page.Publish();

 

This works really well: the HTML comes over just fine until someone edits the page. Simply opening the page for editing in the SharePoint UI and immediately republishing it removes some content and markup. I have seen entire HTML tables and inline styling removed, while other content (including <img> tags) remains untouched.

 

I thought that the PnP team was working on ways to migrate things like wiki pages to modern pages. If so, they must be dealing with some of the same issues.  Are we taking the wrong approach?

 

cc: VesaJuvonen

  • BertJansen
    Aug 02, 2018

    The easiest way to understand what's valid HTML is to create a piece of text using all the layout and formatting options you need. Once you have that, you can grab the page list item and look at the CanvasContent1 field to obtain the generated HTML. You'll see that only a limited number of styles are supported, along with a fixed set of classes for color and size information. Anything else you use may display initially but will be lost when the page is edited.

19 Replies

  • Warwick Ward
    Bronze Contributor

    I'm facing the same issues now, using PnP.PowerShell to import HTML into text components.

     

    Transformation Framework currently only supports SharePoint Classic > SharePoint Modern pages.

     

    It would be good to know what HTML transformation it performs. Is there a reference?

     

    *** Update: it looks like this is the transformation code:

    https://github.com/pnp/modernization/blob/master/Tools/SharePoint.Modernization/SharePointPnP.Modernization.Framework/Transform/HtmlTransformator.cs

     

    Images

    - <img src=""> = is stripped (after import and Edit+Publish)

     

    HyperLinks

    - <a href=""></a> = is stripped (after import and Edit+Publish)

    - [[LINK | URL ]] = is transformed (after import and Edit+Publish)

  • Haven't tested this, but I suspect that it is down to some HTML not being supported by the modern editor, or to some content being considered unsafe and stripped out.
    It may be worth trying to add the same content via the classic UI to see if it behaves in the same way.
    By adding it via code, you could be bypassing some client-side sanitizing that is applied before saving.
    • Kirk Liemohn
      Brass Contributor

      Yes, that is certainly the case.

       

      I am finding that if I change:

       

      <div class="{custom-class}">

      <table {stuff1}>

      {stuff2}

      </table>

      </div>

       

      to:

       

      <div class="canvasRteResponsiveTable">

      <div class="tableWrapper">

      <table title="Table" {stuff1}>

      {stuff2}

      </table>

      </div>

      </div>

       

      then my tables are preserved.
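
The table change above could be applied in code before the HTML is handed to the text web part. What follows is a minimal C# sketch under stated assumptions, not the PnP framework's own implementation. The `ModernTableWrapper` class and `WrapTables` method are hypothetical names. The regular expression assumes tables are not nested and not already wrapped. Unlike the manual change above, it leaves any surrounding custom <div> in place rather than replacing it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps each <table> in the two <div> elements
// that were observed to survive an edit in the modern text editor.
public static class ModernTableWrapper
{
    // Matches a single <table ...>...</table> element.
    // Assumption: tables are not nested (the non-greedy match stops at the first </table>).
    private static readonly Regex TableRegex = new Regex(
        @"<table\b(?<attrs>[^>]*)>(?<body>.*?)</table>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Detects an existing title attribute (but not e.g. data-title).
    private static readonly Regex TitleAttrRegex = new Regex(
        @"(?<![\w-])title\s*=",
        RegexOptions.IgnoreCase);

    public static string WrapTables(string html)
    {
        return TableRegex.Replace(html, m =>
        {
            string attrs = m.Groups["attrs"].Value;

            // Add title="Table" only when the source table has no title of its own.
            if (!TitleAttrRegex.IsMatch(attrs))
            {
                attrs = " title=\"Table\"" + attrs;
            }

            return "<div class=\"canvasRteResponsiveTable\"><div class=\"tableWrapper\">"
                 + "<table" + attrs + ">" + m.Groups["body"].Value + "</table>"
                 + "</div></div>";
        });
    }
}

public static class Program
{
    public static void Main()
    {
        string source = "<table border=\"1\"><tr><td>x</td></tr></table>";
        // Prints the table wrapped in the two <div> elements.
        Console.WriteLine(ModernTableWrapper.WrapTables(source));
    }
}
```

The result could then be assigned to the Text property of the ClientSideText in the migration code. Running the helper twice on the same HTML would wrap the tables twice, and for anything beyond simple markup a real HTML parser is likely a safer choice than a regular expression.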

       

      I have also found Bert Jansen has created the following which may help: https://github.com/SharePoint/PnP-Tools/blob/master/Solutions/SharePoint.Modernization/SharePointPnP.Modernization.Framework/Transform/HtmlTransformator.cs

       

      I'd really like to know what other edges he has found...
