Ross_Cullen
Copper Contributor
May 28, 2024

What to do with document "content" from docx file

Hello,

I can successfully connect to the Graph API and retrieve a document from a SharePoint document library using:

https://graph.microsoft.com/v1.0/sites/{siteId}/drives/{driveId}/items/{itemId}/content

Below is a screenshot of the value returned in Postman for a sample document.

What exactly is this?
What am I supposed to do with this?

A lot of googling seems to imply I need to write this to a local docx file on my system. Is there no way to convert or parse it to JSON (or something similar) so I can query it directly?

Ultimately, the end goal is to do something like this:
1. An external system places a document in a SharePoint folder.
2. Use Graph to get the document.
3. Do some magic with the document retrieved from Graph.
4. Write a test that asserts "does the document contain the text 'my string value'".

I am using TypeScript with Playwright to write automated tests.

 

1 Reply

  • Ross_Cullen
    Copper Contributor
    Replying for anyone in the future: I found a solution that does what I need, and it may help others.
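    import * as mammoth from 'mammoth';

    // myDocument is the body of the Graph API response for the docx file,
    // i.e. the result of the ".../items/{itemId}/content" call, read as a stream of chunks.
    export async function ExtractDocumentTextContent(
      myDocument: AsyncIterable<Uint8Array>
    ): Promise<string> {
      try {
        // Collect the streamed chunks into one in-memory Buffer; no local file is needed.
        const chunks: Uint8Array[] = [];
        for await (const chunk of myDocument) {
          chunks.push(chunk);
        }
        const buffer = Buffer.concat(chunks);

        // mammoth reads the docx from the buffer and returns its plain text in result.value.
        const result = await mammoth.extractRawText({ buffer });
        return result.value;
      } catch (error) {
        console.error('Error extracting document content:', error);
        // Rethrow so a failed extraction fails the test instead of returning undefined.
        throw error;
      }
    }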

    This returns the text content of the docx file, which I can then use in assertions/expects:

    expect(textContent).toContain("My expected string");
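
On the original question of what the raw response actually is: a docx file is a ZIP package, so the /content response is binary ZIP data rather than text, which is presumably why it looks like gibberish in Postman. Below is a minimal sketch of a sanity check, assuming the response has already been read into a byte array. `looksLikeZipPackage` is a hypothetical helper name used only for illustration; it is not part of Graph, Playwright, or mammoth.

```typescript
// Signature at the start of a non-empty ZIP archive:
// the ASCII letters "PK" followed by the bytes 0x03 0x04.
const ZIP_SIGNATURE: number[] = [0x50, 0x4b, 0x03, 0x04];

// Hypothetical helper (an assumption for illustration, not a library API):
// returns true when the given bytes begin with the ZIP signature.
function looksLikeZipPackage(bytes: Uint8Array): boolean {
  if (bytes.length < ZIP_SIGNATURE.length) {
    return false;
  }
  return ZIP_SIGNATURE.every((expected, index) => bytes[index] === expected);
}
```

A Node `Buffer` is a `Uint8Array`, so the buffer built from the downloaded chunks can be passed in directly. Running this check before handing the data to mammoth separates "the download did not return a docx" (for example, a JSON error body) from "the docx does not contain the expected text".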
