Aug 01 2018 07:00 AM
We are migrating HTML to modern pages using code like:
var page = context.Web.AddClientSidePage(pageName, true);
page.PageTitle = "some title";
string htmlContent = "<some html content from another system>";
ClientSideText textWebPart = new ClientSideText() { Text = htmlContent };
page.AddControl(textWebPart, -1);
page.Save(pageName);
page.Publish();
This actually works really well. The HTML comes over just fine until someone edits the page. All you have to do is edit the page from the SharePoint UI and immediately publish it and some content/markup gets removed. I have seen entire HTML tables removed and inline styling removed. Other content is just fine and remains untouched (including <img> tags).
I thought that the PnP team was working on ways to migrate things like wiki pages to modern pages. If so, they must be dealing with some of the same issues. Are we taking the wrong approach?
cc: @VesaJuvonen
Aug 01 2018 11:00 AM
Yes, that is certainly the case.
I am finding that if I change:
<div class="{custom-class}">
<table {stuff1}>
{stuff2}
</table>
</div>
to:
<div class="canvasRteResponsiveTable">
<div class="tableWrapper">
<table title="Table" {stuff1}>
{stuff2}
</table>
</div>
</div>
then my tables are preserved.
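A minimal sketch of that wrapping step in C#, using only a regular expression. The class name TableWrapper and the method WrapTables are made up for illustration, and the sketch assumes the incoming HTML has no nested tables and no tables that are already wrapped. Which wrapper markup the editor accepts is taken from the observation above, not from any documented contract.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of PnP or SharePoint.
public static class TableWrapper
{
    // Matches one whole <table>...</table> element.
    // Assumption: tables are not nested, so a lazy match up to the first </table> is enough.
    private static readonly Regex TablePattern = new Regex(
        @"<table\b(?<attrs>[^>]*)>(?<body>.*?)</table>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Detects an existing title attribute (but not data-title or similar).
    private static readonly Regex TitlePattern = new Regex(
        @"(?<![\w-])title\s*=",
        RegexOptions.IgnoreCase);

    public static string WrapTables(string html)
    {
        return TablePattern.Replace(html, m =>
        {
            string attrs = m.Groups["attrs"].Value;

            // Add title="Table" only when the table has no title yet.
            if (!TitlePattern.IsMatch(attrs))
            {
                attrs = " title=\"Table\"" + attrs;
            }

            // Wrap the table in the two divs described in the post above.
            return "<div class=\"canvasRteResponsiveTable\">"
                 + "<div class=\"tableWrapper\">"
                 + "<table" + attrs + ">" + m.Groups["body"].Value + "</table>"
                 + "</div></div>";
        });
    }
}
```

The result of WrapTables(htmlContent) would then be assigned to the Text property of the ClientSideText web part instead of the raw HTML. Running it twice on the same content would wrap the tables twice, so it should only be applied to unconverted source HTML.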
I have also found Bert Jansen has created the following which may help: https://github.com/SharePoint/PnP-Tools/blob/master/Solutions/SharePoint.Modernization/SharePointPnP...
I'd really like to know what other edges he has found...
Aug 01 2018 01:26 PM
The PnP page transformation engine does handle the transformation from wiki HTML to modern text editor compliant HTML. As you've noted, you can simply assign HTML and it will show, but for the HTML to survive an edit it must be compliant with what the text editor supports.
See https://github.com/SharePoint/PnP-Tools/blob/master/Solutions/SharePoint.Modernization/SharePointPnP... for a starting basis for your own transformator.
Aug 01 2018 05:56 PM
Thanks, @BertJansen. I was unable to @ mention you before for some reason...
I had found the HtmlTransformator.cs, so it's good to know I was moving along the right path. Do you have a list of gotchas that the text editor does not handle? I can infer this from the code, but some of it does not seem to be a problem for me. For example, I don't appear to need a <P> just before an image. The biggest one so far has been <table> elements. That, and the fact that a bunch of inline CSS is just plain stripped, which I can't do much about other than find workarounds using the techniques the current editor uses.
You clearly have a lot of knowledge here so I'll take any other information you can provide.
Thanks!
Kirk
Aug 01 2018 11:24 PM
Solution: The easiest way to understand what's valid HTML is to create a piece of text using all the layout and formatting options you need. Once you have that, you can grab the page list item and look at the canvascontent1 field to obtain the generated HTML. You'll see that only a limited number of styles are supported, along with a fixed set of classes for color and size information. Anything else you use can be initially displayed but will be lost during edit.
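Once the generated HTML from the canvascontent1 field is available as a string, a small helper can list which classes and inline style properties the editor actually emitted. This is only a sketch: the class name CanvasHtmlInventory and both methods are hypothetical, fetching the field value is left out, and the regex-based parsing assumes double-quoted attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for inspecting editor-generated HTML; not a SharePoint or PnP API.
public static class CanvasHtmlInventory
{
    // Returns the distinct class names found in class="..." attributes, sorted.
    public static SortedSet<string> ClassNames(string html)
    {
        var names = new SortedSet<string>(StringComparer.Ordinal);
        foreach (Match m in Regex.Matches(html, @"(?<![\w-])class\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase))
        {
            foreach (string name in m.Groups[1].Value.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                names.Add(name);
            }
        }
        return names;
    }

    // Returns the distinct CSS property names found in style="..." attributes, sorted.
    public static SortedSet<string> StyleProperties(string html)
    {
        var props = new SortedSet<string>(StringComparer.Ordinal);
        foreach (Match m in Regex.Matches(html, @"(?<![\w-])style\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase))
        {
            foreach (string declaration in m.Groups[1].Value.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            {
                int colon = declaration.IndexOf(':');
                if (colon > 0)
                {
                    // Keep only the property name, e.g. "text-align".
                    props.Add(declaration.Substring(0, colon).Trim().ToLowerInvariant());
                }
            }
        }
        return props;
    }
}
```

Comparing the two sets against the classes and styles used in the migrated HTML gives a rough list of what is likely to be lost on the first edit.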
Aug 02 2018 04:16 AM
Thanks, that helps. This will be tedious.
Oct 18 2018 09:48 AM
This is an older thread, but if anyone is still here, there seems to be a bug in SharePoint Modern Pages when table content is added through the UI. We added a small table of content to a page, saved it, and published it; when we went back to edit, the content disappeared.
In other words, it might not just be a valid/invalid HTML thing. There might be a bigger issue here.
Oct 18 2018 10:02 AM
If you are using a "<table>" element, then you need to wrap it with additional divs that have the "canvasRteResponsiveTable" and "tableWrapper" classes. See my comment further above for more details.
Oct 18 2018 10:56 AM
Thanks for the reply! We are editing the content using the WYSIWYG editor, not through HTML, so adding an additional div is not relevant in this particular case. Whatever HTML is being generated by the OOTB editor is not able to be rendered in edit mode. As a result, switching to edit mode and saving the page with no changes will result in lost content.
Oct 18 2018 11:38 PM
This looks like a bug: tables created using the editor should always stay editable by the editor. Can you open a support case for this? Alternatively, describe a repro here and I'll get the right folks to look at it.
Oct 19 2018 02:11 AM
I agree that it is a bug. You clearly said it was added through the UI, but I missed that before.
Oct 19 2018 06:00 AM
Ugh, I was not able to reproduce this on a brand new page, so perhaps it's a combination of the web parts that are already on the page or perhaps it's the combination of styles that I chose at the time. Pure speculation: maybe selecting multiple table cells and choosing a font size somehow caused bad HTML to be generated or something.
I copied the rendered content (not the code) from the published page using ctrl+c, then used ctrl+v to paste the content back into the text web part. I am now able to successfully edit the page. Weird.
Not quite reproduction steps, but additional info:
- the page is an intranet homepage, so it has a powerapp part, a yammer part, some links and text parts, etc.
- the text content I was struggling with was a Heading, a 4x4 HTML table, and some paragraph text. The 4x4 HTML table had some font formatting all done through the UI, but I remember struggling with the interface to get the colors to stick. For example, I would select some text, change it to Red, then select some OTHER text and was not able to change the color to Green until I clicked outside the editor box and let the page auto-save.
- I was able to reproduce the "disappearing content" bug consistently by restoring the version of the page
Thanks for the input in this thread! I agree I should open a support case.
Feb 10 2022 09:03 PM
I am getting a similar issue with SharePoint Online: migrated HTML content is lost when editing and saving the page. I can roll back to the previous version and get the content back. I think some security feature is blocking content that was migrated programmatically through a script.
It appears to be either a bug in the product or a security feature that sanitizes the content after each edit and save. Interestingly, this is not an issue at first: when the page is created programmatically, SPO allows the page to be saved and published. Very strange.
It is like having a ticket that lets me board the train the first time, but once I get off I cannot board the same train again 🙂 Microsoft is really up to something, with little or no information available to diagnose the issue.
Please do contribute if you are facing a similar issue on Microsoft SharePoint Online and have any suggestions.
Thanks in Advance. Happy to chat further.
Feb 13 2022 03:56 PM - edited Feb 13 2022 04:16 PM
I'm facing the same issues now, utilising PnP.PowerShell to import HTML into text components.
Transformation Framework currently only supports SharePoint Classic > SharePoint Modern pages.
Would be good to know the HTML transformation it performs, is there a reference?
*** Update looks like this is the transformation code
Images
- <img src=""> = is stripped (after import and Edit+Publish)
HyperLinks
- <a href=""></a> = is stripped (after import and Edit+Publish)
- [[LINK | URL ]] = is transformed (after import and Edit+Publish)
Jan 13 2024 11:37 PM
@BertJansen
It might be too late for you, but for anyone who's still facing the issue, this is how I solved it.
For images
$@"
<div class=""imagePlugin"" style=""background-color:transparent;position:relative;"" data-alignment=""Center""
data-imageurl=""<Full Image Url>"" data-uploading=""0""
data-widthpercentage=""100"">
<div style=""display: flex; flex-direction: column; position: relative; margin: 0px auto; width: 100%; align-items: center;"">
<div class="""" style=""position: relative; outline: none;"">
<div class="""" data-automation-id=""imageRead"">
<figure tabindex=""0"" role=""button"" class="""" >
<div class="""">
<div style="""" class="""">
<img style='width:100%' alt='<alt-text>' data-sp-originalimgsrc=""<Full Image Url>""
src='<Full Image Url>' />
</div>
</div>
</figure>
</div>
</div>
</div>
</div>");
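For tables
// Uses HtmlAgilityPack (NuGet package HtmlAgilityPack).
// The idea is the same again: wrap each table in the extra HTML nodes that SharePoint itself generates.
// The original post did not show how htmlDoc was loaded or where `tables` came from;
// the method wrapper, the LoadHtml call, and the SelectNodes query are assumptions.
using System;
using System.Collections;
using System.Collections.Specialized;
using HtmlAgilityPack;

static string WrapTables(string html)
{
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);
    // SelectNodes returns null when no <table> is found.
    HtmlNodeCollection tables = htmlDoc.DocumentNode.SelectNodes("//table");
    // Maps each original table's markup to its wrapped replacement.
    ListDictionary tableNodes = new ListDictionary();
    if (tables != null)
    {
        foreach (var tag in tables)
        {
            HtmlNode figureNode = HtmlNode.CreateNode("<figure class=\"table canvasRteResponsiveTable tableLeftAlign\" title=\"Table\"></figure>");
            var clonedTableNode = tag.Clone();
            clonedTableNode.AddClass("ck-table-resized");
            figureNode.AppendChild(clonedTableNode);
            // The indexer avoids the exception that Add throws when two tables have identical markup.
            tableNodes[tag.OuterHtml] = figureNode.OuterHtml;
            Console.WriteLine(figureNode.OuterHtml);
        }
    }
    foreach (DictionaryEntry tableNodeDict in tableNodes)
    {
        htmlDoc.DocumentNode.InnerHtml = htmlDoc.DocumentNode.InnerHtml.Replace(tableNodeDict.Key.ToString(), tableNodeDict.Value.ToString());
    }
    return htmlDoc.DocumentNode.InnerHtml;
}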