Ross_Cullen
May 28, 2024 · Copper Contributor
What to do with document "content" from docx file
Hello,
I can successfully connect to and call the Graph API to retrieve a document from a SharePoint document library using:
https://graph.microsoft.com/v1.0/sites/{siteId}/drives/{driveId}/items/{itemId}/content
There is a screenshot below of the value returned in Postman for a sample document.
What exactly is this?
What am I supposed to do with this?
A lot of googling seems to imply I need to write this to a local .docx file on my system, but is there no way to convert/parse it to JSON or something similar so I can query it directly?
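What the endpoint returns is not JSON: it is the file's raw bytes. Graph answers /content with a redirect to a pre-authenticated download URL, which Postman and fetch follow automatically, and a .docx file is an Office Open XML package, i.e. a ZIP archive of XML parts, which is why Postman shows unreadable binary. The sketch below assumes Node 18+ (global fetch) and an access token obtained elsewhere; downloadItemContent and looksLikeZipPackage are hypothetical helper names, not part of any Graph SDK.

```typescript
// Base URL of the Graph endpoint from the question; the ids and token are placeholders.
const GRAPH_BASE = "https://graph.microsoft.com/v1.0";

// Hypothetical helper: download the raw bytes of a drive item.
// Graph replies with a redirect to a pre-authenticated download URL,
// which fetch follows by default (redirect: "follow").
async function downloadItemContent(
  siteId: string,
  driveId: string,
  itemId: string,
  accessToken: string, // assumption: a valid bearer token acquired elsewhere
): Promise<Uint8Array> {
  const url = `${GRAPH_BASE}/sites/${siteId}/drives/${driveId}/items/${itemId}/content`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) {
    throw new Error(`Graph request failed: ${response.status} ${response.statusText}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}

// Hypothetical helper: a .docx file is a ZIP package, so it starts with the
// ZIP local-file-header signature "PK\x03\x04" rather than with JSON text.
function looksLikeZipPackage(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 && // 'P'
    bytes[1] === 0x4b && // 'K'
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}
```

In other words, the bytes have to go through a .docx parser (or be saved to disk) before the text can be queried; there is no JSON view of the document body at this endpoint.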
Ultimately, the end goal is to do something like:
"external system places a document in a sp folder"
"use graph to get the document"
"do some magic with the document gotten from graph"
"write a test that checks whether the document contains the text 'my string value'"
I am using TypeScript with Playwright to write automated tests.
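The workflow above could be structured as one small function with the download and text-extraction steps passed in, so the Graph call and the .docx parsing can be swapped or stubbed independently. This is only a sketch under that assumption: documentContains, DownloadStep and ExtractTextStep are hypothetical names, and in a real test download would call the /content endpoint while extractText would use a .docx parser such as mammoth.

```typescript
// Hypothetical step signatures: plug in the real Graph download and .docx text extraction.
type DownloadStep = () => Promise<Uint8Array>;
type ExtractTextStep = (bytes: Uint8Array) => Promise<string>;

// Runs the end-to-end check: download the document, extract its text,
// and report whether the expected string occurs in it.
async function documentContains(
  download: DownloadStep,
  extractText: ExtractTextStep,
  expected: string,
): Promise<boolean> {
  const bytes = await download();
  if (bytes.length === 0) {
    // Fail loudly: an empty download usually means the file was not there yet.
    throw new Error("Downloaded document is empty");
  }
  const text = await extractText(bytes);
  return text.includes(expected);
}
```

In a Playwright spec this would read expect(await documentContains(download, extractText, "my string value")).toBe(true); keeping the steps separate also lets the extraction be tested against a local sample file without calling Graph.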
- Ross_Cullen · Copper Contributor
Replying for anyone in the future, since I found a solution for what I needed and it may help others:
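import * as mammoth from "mammoth";

// myDocument is the body returned by the Graph call to .../items/{itemId}/content,
// consumed as a stream of byte chunks (for example a Node.js Readable stream).
// DocxTestHelpers is a placeholder name for whatever helper class you use.
export class DocxTestHelpers {
  public static async ExtractDocumentTextContent(myDocument: AsyncIterable<Uint8Array>): Promise<string> {
    try {
      // Collect the streamed chunks into one Buffer holding the whole .docx file.
      const chunks: Buffer[] = [];
      for await (const chunk of myDocument) {
        chunks.push(Buffer.from(chunk));
      }
      const buffer = Buffer.concat(chunks);
      // extractRawText ignores formatting; the plain text is in result.value.
      const result = await mammoth.extractRawText({ buffer });
      return result.value;
    } catch (error) {
      console.error("Error extracting document content:", error);
      // Rethrow so the test fails here instead of asserting against undefined.
      throw error;
    }
  }
}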
This returns the plain text content of the .docx file, which I can then use in assertions/expects:
expect(textContent).toContain("My expected string");
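One thing to watch with an exact toContain check: mammoth's extractRawText ends each paragraph with two newlines, and Word documents can contain line breaks or non-breaking spaces, so a phrase that looks identical in Word may not match character for character. A small sketch, assuming whitespace differences should be ignored: normalizeWhitespace and containsNormalized are hypothetical helpers that collapse every run of whitespace (in JavaScript, \s also matches non-breaking spaces) into a single space before comparing.

```typescript
// Collapse any run of whitespace (spaces, tabs, newlines, non-breaking spaces)
// into a single space and trim the ends.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Whitespace-insensitive substring check for assertions on extracted text.
function containsNormalized(haystack: string, needle: string): boolean {
  return normalizeWhitespace(haystack).includes(normalizeWhitespace(needle));
}
```

In the Playwright test the assertion would then read expect(containsNormalized(textContent, "My expected string")).toBe(true); the trade-off is that a failure message shows only true/false rather than the text that was searched.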