Forum Discussion
Migrating HTML to Modern Pages - Looks great until page is edited, then content is lost
We are migrating HTML to modern pages using code like:
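// Create the modern (client-side) page
var page = context.Web.AddClientSidePage(pageName, true);
page.PageTitle = "some title";
// HTML exported from the source system
string htmlContent = "<some html content from another system>";
// Put the HTML into a text web part and add it to the page
ClientSideText textWebPart = new ClientSideText() { Text = htmlContent };
page.AddControl(textWebPart, -1);
// Save and publish the page
page.Save(pageName);
page.Publish();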
This actually works really well: the HTML comes over just fine until someone edits the page. All it takes is to edit the page from the SharePoint UI and immediately publish it, and some content/markup gets removed. I have seen entire HTML tables and inline styling removed. Other content remains untouched (including <img> tags).
I thought that the PnP team was working on ways to migrate things like wiki pages to modern pages. If so, they must be dealing with some of the same issues. Are we taking the wrong approach?
cc: VesaJuvonen
The easiest way to understand what HTML is valid is to create a piece of text using all the layout and formatting options you need. Once you have that, you can grab the page list item and look at the CanvasContent1 field to obtain the generated HTML. You'll see that only a limited number of styles and a fixed set of classes for color and size information are supported. Anything else you use may display initially but will be lost when the page is edited.
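Once the CanvasContent1 value is in hand, a small helper can list which class names and inline style properties the editor actually emitted. What follows is a minimal sketch, assuming the field value has already been read into a string. CanvasHtmlInspector and its methods are hypothetical names (not part of PnP), and the regex scan is a rough approximation rather than a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of PnP): rough regex scan of saved text web part HTML.
public static class CanvasHtmlInspector
{
    // Yields the raw value of every double-quoted attribute with the given name.
    private static IEnumerable<string> AttributeValues(string html, string attribute)
    {
        string pattern = "\\b" + Regex.Escape(attribute) + "\\s*=\\s*\"([^\"]*)\"";
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            yield return m.Groups[1].Value;
        }
    }

    // Distinct CSS class names found in class="..." attributes.
    public static SortedSet<string> ClassNames(string html)
    {
        var found = new SortedSet<string>(StringComparer.Ordinal);
        char[] whitespace = { ' ', '\t', '\r', '\n' };
        foreach (string value in AttributeValues(html, "class"))
        {
            foreach (string name in value.Split(whitespace, StringSplitOptions.RemoveEmptyEntries))
            {
                found.Add(name);
            }
        }
        return found;
    }

    // Distinct, lower-cased property names found in style="..." attributes.
    public static SortedSet<string> StyleProperties(string html)
    {
        var found = new SortedSet<string>(StringComparer.Ordinal);
        foreach (string value in AttributeValues(html, "style"))
        {
            foreach (string declaration in value.Split(';'))
            {
                int colon = declaration.IndexOf(':');
                if (colon <= 0) continue; // skip empty or malformed declarations
                string property = declaration.Substring(0, colon).Trim().ToLowerInvariant();
                if (property.Length > 0) found.Add(property);
            }
        }
        return found;
    }
}
```

Running both methods over the HTML the editor generated gives a rough whitelist to compare against the markup coming from the source system; anything outside that list is a candidate for being dropped on edit.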
19 Replies
- Warwick Ward (Bronze Contributor)
I'm facing the same issues now, using PnP.PowerShell to import HTML into text components.
The Transformation Framework currently only supports SharePoint classic to SharePoint modern pages.
It would be good to know which HTML transformation it performs. Is there a reference?
*** Update: it looks like this is the transformation code:
https://github.com/pnp/modernization/blob/master/Tools/SharePoint.Modernization/SharePointPnP.Modernization.Framework/Transform/HtmlTransformator.cs
Images
- <img src=""> is stripped (after import and Edit+Publish)
HyperLinks
- <a href=""></a> is stripped (after import and Edit+Publish)
- [[LINK | URL ]] is transformed (after import and Edit+Publish)
- BertJansen
Microsoft
Here's the latest code (https://github.com/pnp/pnpframework/blob/dev/src/lib/PnP.Framework/Modernization/Transform/HtmlTransformator.cs), which should be fairly similar. When it comes to inline images: these were never fully supported by the Page Transformation tech. During transformation, the default behavior was to create the images as separate image web parts and split the HTML into parts so the image web parts could fit in between. Now that inline images are supported on modern pages, Page Transformation should be updated to support them. That is a two-step process. First, the underlying pages API implementation needs to be updated, and that's done (see https://pnp.github.io/pnpcore/using-the-sdk/pages-webparts.html#using-inline-images-in-text-parts). Next is updating the page transformation bits themselves, but that work is still pending.
- Sorry for the double reply... I got a message saying that the first one failed...
- Perhaps there is some content cleaning on the client side that you bypass when using C#?
- I haven't tested this, but I suspect it is down to some HTML not being supported by the modern editor, or to some content being considered unsafe and stripped out.
It may be worth adding the same content via the classic UI to see whether it behaves the same way.
By adding it via code, you could be bypassing some client-side sanitizing that is applied before saving.
- Kirk Liemohn (Brass Contributor)
Yes, that is certainly the case.
I am finding that if I change:
<div class="{custom-class}">
<table {stuff1}>
{stuff2}
</table>
</div>
to:
<div class="canvasRteResponsiveTable">
<div class="tableWrapper">
<table title="Table" {stuff1}>
{stuff2}
</table>
</div>
</div>
then my tables are preserved.
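The same rewrite can be applied in code before the HTML is assigned to the text web part. This is only a sketch under a few assumptions: ModernTableWrapper is a hypothetical helper (not part of PnP), tables are not nested, and a regex is good enough for the markup at hand. It wraps each table rather than replacing an existing outer div, so a custom wrapper like the one above would still need to be removed separately, and running it twice would wrap the table twice.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of PnP): wraps tables in the divs the modern editor appears to expect.
public static class ModernTableWrapper
{
    // Matches one <table ...>...</table> block (assumption: tables are not nested).
    private static readonly Regex TableBlock = new Regex(
        @"<table\b(?<attrs>[^>]*)>(?<body>.*?)</table>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string WrapTables(string html)
    {
        return TableBlock.Replace(html, m =>
        {
            string attrs = m.Groups["attrs"].Value;

            // Add title="Table" only when the table has no title attribute yet.
            if (!Regex.IsMatch(attrs, @"\btitle\s*=", RegexOptions.IgnoreCase))
            {
                attrs = " title=\"Table\"" + attrs;
            }

            return "<div class=\"canvasRteResponsiveTable\">"
                 + "<div class=\"tableWrapper\">"
                 + "<table" + attrs + ">" + m.Groups["body"].Value + "</table>"
                 + "</div></div>";
        });
    }
}
```

With the code from the original question, the call would sit just before the text web part is created, for example htmlContent = ModernTableWrapper.WrapTables(htmlContent);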
I have also found Bert Jansen has created the following which may help: https://github.com/SharePoint/PnP-Tools/blob/master/Solutions/SharePoint.Modernization/SharePointPnP.Modernization.Framework/Transform/HtmlTransformator.cs
I'd really like to know what other edges he has found...
- BertJansen
Microsoft
The PnP page transformation engine does handle the transformation from wiki HTML to HTML that is compliant with the modern text editor. As you've noted, you can simply assign HTML and it will show, but for the HTML to survive editing it must be compliant with what the text editor supports.
See https://github.com/SharePoint/PnP-Tools/blob/master/Solutions/SharePoint.Modernization/SharePointPnP.Modernization.Framework/Transform/HtmlTransformator.cs as a starting point for your own transformator.