Migrating Rich Text data - embedded images




I am migrating rich text values from a SQL table column (exported to CSV) to an enhanced rich text column in a SharePoint list. The RTF in the SQL column starts with {\rtf. The code I developed is as follows:



import base64
import io
import aspose.words as aw
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.runtime.http.request_options import RequestOptions
from office365.runtime.http.http_method import HttpMethod
import json
import pandas as pd

site_url = 'https://aaaaaaaaaaaaaa.sharepoint.com/sites/ABCDevelopment'
client_id = 'aaaaaaa-7abe-4a8e-ac20-bbbbbbbbbbb'
client_secret = 'HaaaaaaaaaaaaaaapDH56KHoV448gYOBOJz/8M='
SharePointListName = "TestList"
df = pd.read_csv(r'C:\Users\ESS\Desktop\test2.csv',header=0)
context_auth = AuthenticationContext(url=site_url)  
context_auth.acquire_token_for_app(client_id=client_id, client_secret=client_secret)  
for i in range(len(df)):
# for i in range(10):
    rtfText1 = base64.b64decode(df.iloc[i, 1]).decode('UTF-16', "ignore")
    #Create Aspose document with rtf string
    input_stream = io.BytesIO(rtfText1.encode())
    doc = aw.Document(input_stream)

    # Convert RTF to HTML
    output_stream = io.BytesIO()
    # Use a SaveOptions object to specify the encoding for a document that we will save.
    save_options = aw.saving.HtmlSaveOptions()
    save_options.save_format = aw.SaveFormat.HTML
    #save_options.encoding = "utf-8"
    save_options.export_images_as_base64 = True
    save_options.images_folder = r"C:\Users\ESS\Desktop\AsposeImageFolder"

    # Write the converted document into the in-memory stream
    doc.save(output_stream, save_options)

    # Get the HTML content as a string
    html_string = output_stream.getvalue().decode("utf-8")

    ctx = ClientContext(site_url, context_auth)
    target_list = ctx.web.lists.get_by_title(SharePointListName)
    # add_item only queues the request; execute_query sends it to SharePoint
    target_list.add_item({'Title': df.iloc[i, 0], 'TestRTF': html_string}).execute_query()



This produces the required HTML, retaining bold, italics, bullet points, links and so on. Images from the original are also included as embedded <img> elements in the HTML. When the list item is created in the SharePoint list, however, the images show up as blank boxes. Example:




Does anyone have an idea what might cause this? We are migrating around 15,000 records from a SQL table to the SharePoint list.
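
To confirm how the images are embedded before an item is sent, the generated HTML can be inspected. The following is a minimal sketch using only the standard library; the helper name `summarize_img_sources` is hypothetical and not part of the script above.

```python
from html.parser import HTMLParser


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def summarize_img_sources(html):
    """Hypothetical helper: count <img> tags by how their src is expressed."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    summary = {"data_uri": 0, "url": 0, "missing": 0}
    for src in collector.sources:
        if not src:
            summary["missing"] += 1
        elif src.lower().startswith("data:"):
            summary["data_uri"] += 1
        else:
            summary["url"] += 1
    return summary
```

Calling `summarize_img_sources(html_string)` just before `add_item`, and again on the field value read back from the list, would show whether the `src` values are present going in and missing coming out.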


1 Reply

To add, this is an example of the HTML that is sent to SharePoint:




But what ends up in the RTF field in SharePoint is only:



The src attribute is somehow stripped out by SharePoint.
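
If the src is being dropped because it holds a Base64 data: URI (an assumption; nothing in this thread confirms the cause), one possible workaround is to store each image as a file and point src at its URL instead. The sketch below covers only the HTML rewriting step. Both `replace_data_uri_images` and the `store_image` callback are hypothetical names, and how the image is actually uploaded (for example to a document library) is left to the caller.

```python
import base64
import re

# Matches src="data:image/<type>;base64,<payload>" in either quote style.
_DATA_URI_SRC = re.compile(
    r"""src=(["'])data:image/([a-zA-Z0-9.+-]+);base64,([^"']+)\1""",
    re.IGNORECASE,
)


def replace_data_uri_images(html, store_image):
    """Swap each Base64 data: URI for the URL returned by store_image.

    store_image(image_bytes, extension) is a caller-supplied callback
    (hypothetical here) that saves the image somewhere SharePoint can
    reach and returns its URL.
    """

    def _swap(match):
        quote, subtype, payload = match.groups()
        image_bytes = base64.b64decode(payload)
        # "jpeg" -> "jpg", "svg+xml" -> "svg", otherwise use the subtype as-is
        extension = subtype.lower().split("+")[0]
        if extension == "jpeg":
            extension = "jpg"
        url = store_image(image_bytes, extension)
        return f"src={quote}{url}{quote}"

    return _DATA_URI_SRC.sub(_swap, html)
```

In the migration loop this would be applied to `html_string` before `add_item`, with `store_image` doing the upload and returning the file's URL. Images that already use a regular URL are left untouched.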