SOLVED

Migrating HTML to Modern Pages - Looks great until page is edited, then content is lost


We are migrating HTML to modern pages using code like:

 

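// Create (or overwrite) the modern page using the PnP client-side page API.
var page = context.Web.AddClientSidePage(pageName, true);
page.PageTitle = "some title";
// HTML exported from the source system, assigned as-is to a text web part.
string htmlContent = "<some html content from another system>";
ClientSideText textWebPart = new ClientSideText() { Text = htmlContent };
page.AddControl(textWebPart, -1);
page.Save(pageName);
page.Publish();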

 

This actually works really well.  The HTML comes over just fine until someone edits the page.  All you have to do is open the page for editing in the SharePoint UI and immediately publish it, and some content/markup gets removed.  I have seen entire HTML tables and inline styling removed.  Other content is just fine and remains untouched (including <img> tags).

 

I thought that the PnP team was working on ways to migrate things like wiki pages to modern pages. If so, they must be dealing with some of the same issues.  Are we taking the wrong approach?

 

cc: @Vesa Juvonen

14 Replies
Highlighted
Haven't tested this, but I suspect that it is down to some html not being supported by the modern editor, or because some content is probably considered unsafe and it's being stripped out.
May be worth trying to add the same content via the classic UI and see if it works in the same way.
By adding it via code, you could potentially be avoiding some client side sanitizing that is applied before saving it?
Highlighted
Perhaps some content cleaning on client side that you avoid when using c#?
Highlighted
Sorry for the double reply... I got a message saying that the first one failed...
Highlighted

Yes, that is certainly the case.

 

I am finding that if I change:

 

<div class="{custom-class}">

<table {stuff1}>

{stuff2}

</table>

</div>

 

to:

 

<div class="canvasRteResponsiveTable">

<div class="tableWrapper">

<table title="Table" {stuff1}>

{stuff2}

</table>

</div>

</div>

 

then my tables are preserved.
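The wrapping described above can be sketched as a small pre-processing step that runs on the HTML before it is assigned to the text web part. This is only a minimal sketch: `ModernTableWrapper` and `WrapTables` are hypothetical names, the regex approach is an assumption chosen for brevity (a real HTML parser would be more robust), and nested tables are not handled. It keeps any outer custom div in place and adds the two wrapper divs plus `title="Table"` around each table.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of PnP: wraps each <table> in the two divs
// that the thread reports the modern text editor expects.
public static class ModernTableWrapper
{
    // Matches one <table>...</table> element that is not already directly
    // preceded by the tableWrapper div emitted below (best-effort guard).
    // Nested tables are NOT handled: the lazy match stops at the first </table>.
    private static readonly Regex TableRegex = new Regex(
        @"(?<!<div class=""tableWrapper"">\s*)<table\b(?<attrs>[^>]*)>(?<body>.*?)</table>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Detects an existing title attribute (but not e.g. data-title).
    private static readonly Regex TitleAttr = new Regex(
        @"(?<![\w-])title\s*=", RegexOptions.IgnoreCase);

    public static string WrapTables(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        return TableRegex.Replace(html, m =>
        {
            string attrs = m.Groups["attrs"].Value;

            // Add title="Table" only when the table has no title yet.
            if (!TitleAttr.IsMatch(attrs))
            {
                attrs = " title=\"Table\"" + attrs;
            }

            return "<div class=\"canvasRteResponsiveTable\">"
                 + "<div class=\"tableWrapper\">"
                 + "<table" + attrs + ">" + m.Groups["body"].Value + "</table>"
                 + "</div></div>";
        });
    }
}
```

With the migration code from the first post, this would be used as `new ClientSideText() { Text = ModernTableWrapper.WrapTables(htmlContent) }`. Whether the wrapper alone is enough for a given table still has to be checked by editing and republishing a test page.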

 

I have also found Bert Jansen has created the following which may help: https://github.com/SharePoint/PnP-Tools/blob/master/Solutions/SharePoint.Modernization/SharePointPnP...

 

I'd really like to know what other edges he has found...

Highlighted

The PnP page transformation engine does handle the transformation from wiki HTML to HTML that the modern text editor accepts. As you've noted, you can simply assign HTML and it will show, but for the content to survive an edit it must be compliant with what the text editor supports.

 

See https://github.com/SharePoint/PnP-Tools/blob/master/Solutions/SharePoint.Modernization/SharePointPnP... for a starting basis for your own transformator.

Highlighted

Thanks, @Bert Jansen.  I was unable to @ mention you before for some reason...

 

I had found HtmlTransformator.cs, so it's good to know I was moving along the right path.  Do you have a list of gotchas that the text editor does not handle?  I can infer this from the code, but some of it does not seem to be a problem for me.  For example, I don't appear to need a <P> just before an image.  The biggest one so far has been <table> elements.  That, and the fact that a bunch of inline CSS is just plain stripped, which I can't do much about other than find workarounds using the techniques the current editor uses.

 

You clearly have a lot of knowledge here so I'll take any other information you can provide.

 

Thanks!

Kirk

Highlighted
Best Response confirmed by Kirk Liemohn (Occasional Contributor)
Solution

The easiest way to understand what's valid HTML is to create a piece of text in the editor using all the layout and formatting options you need. Once you have that, you can grab the page list item and look at the CanvasContent1 field to obtain the generated HTML. You'll see that only a limited number of styles are supported, along with a fixed set of classes for color and size information. Anything else you use can be displayed initially but will be lost during edit.
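To make that inspection less tedious, the generated HTML can be reduced to an inventory of the class names and inline style properties it uses. The following is a minimal sketch under stated assumptions: `EditorHtmlInventory` and its methods are hypothetical names, the input is assumed to be the CanvasContent1 value already read from the page list item as a string, and only double-quoted attributes are recognized.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: lists what the editor-generated HTML actually uses,
// so migrated HTML can be compared against it.
public static class EditorHtmlInventory
{
    // Only double-quoted class="..." and style="..." attributes are matched.
    private static readonly Regex ClassAttr = new Regex(
        @"(?<![\w-])class\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
    private static readonly Regex StyleAttr = new Regex(
        @"(?<![\w-])style\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Returns every distinct class name found in the HTML, sorted.
    public static SortedSet<string> CollectClasses(string html)
    {
        var result = new SortedSet<string>(StringComparer.Ordinal);
        foreach (Match m in ClassAttr.Matches(html ?? string.Empty))
        {
            string[] names = m.Groups[1].Value.Split(
                Whitespace, StringSplitOptions.RemoveEmptyEntries);
            foreach (string name in names)
            {
                result.Add(name);
            }
        }
        return result;
    }

    // Returns every distinct inline style property name (lower-cased), sorted.
    public static SortedSet<string> CollectStyleProperties(string html)
    {
        var result = new SortedSet<string>(StringComparer.Ordinal);
        foreach (Match m in StyleAttr.Matches(html ?? string.Empty))
        {
            string[] declarations = m.Groups[1].Value.Split(
                new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string declaration in declarations)
            {
                int colon = declaration.IndexOf(':');
                if (colon > 0)
                {
                    result.Add(declaration.Substring(0, colon).Trim().ToLowerInvariant());
                }
            }
        }
        return result;
    }
}
```

Running both methods on the HTML of a reference page built in the editor, and then on the HTML about to be migrated, gives two lists to compare; anything that appears only in the second list is a candidate for being dropped on the first edit.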

Highlighted

Thanks, that helps.  This will be tedious.

Highlighted

This is an older thread, but if anyone is still here, there seems to be a bug in SharePoint modern pages when table content is added through the UI.  We added a small table of content to a page, saved it, and published it; when we went back to edit the page, the content disappeared.

 

In other words, it might not just be a valid/invalid HTML thing. There might be a bigger issue here.

Highlighted

If you are using a "<table>" element, then you need to wrap it in additional divs that have the "canvasRteResponsiveTable" and "tableWrapper" classes.  See my comment further above for more details.

 

Highlighted

Thanks for the reply!  We are editing the content using the WYSIWYG editor, not through HTML, so adding an additional div is not relevant in this particular case.  Whatever HTML is being generated by the OOTB editor is not able to be rendered in edit mode.  As a result, switching to edit mode and saving the page with no changes will result in lost content.

Highlighted

This looks like a bug; tables created using the editor should always stay editable by the editor. Can you open a support case for this? Alternatively, describe a repro here and I'll get the right folks to look at it.

Highlighted

I agree that it is a bug.  You clearly said it was added through the UI, but I missed that before.

Highlighted

Ugh, I was not able to reproduce this on a brand new page, so perhaps it's a combination of the web parts that are already on the page or perhaps it's the combination of styles that I chose at the time.  Pure speculation: maybe selecting multiple table cells and choosing a font size somehow caused bad HTML to be generated or something.

 

I copied the rendered content (not the code) from the published page using ctrl+c, then used ctrl+v to paste the content back into the text web part.  I am now able to successfully edit the page.  Weird.

 

Not quite reproduction steps, but additional info:

- the page is an intranet homepage, so it has a PowerApps part, a Yammer part, some links and text parts, etc.

- the text content I was struggling with was a Heading, a 4x4 HTML table, and some paragraph text.  The 4x4 HTML table had some font formatting all done through the UI, but I remember struggling with the interface to get the colors to stick.  For example, I would select some text, change it to Red, then select some OTHER text and was not able to change the color to Green until I clicked outside the editor box and let the page auto-save.

- I was able to reproduce the "disappearing content" bug consistently by restoring the earlier version of the page

 

Thanks for the input in this thread! I agree I should open a support case.