Forum Discussion

hpage910
Copper Contributor
Feb 21, 2024

Microsoft Graph API SharePoint Search

I have written some Python to send a request to the Graph API to search a OneDrive for files containing keywords.

import requests


def get_results(keywords, drive_id, access_token):
    """Return every drive item matching all keywords, following @odata.nextLink."""
    search_query = ' AND '.join(keywords)
    top = 500  # Requested page size

    # URL of the first page; later pages come from @odata.nextLink.
    search_url = (
        f"https://graph.microsoft.com/v1.0/drives/{drive_id}"
        f"/root/search(q='{search_query}')?$top={top}"
    )
    headers = {'Authorization': f'Bearer {access_token}'}

    # Build a fresh list on every call: a mutable default argument
    # (total_results=[]) would keep its contents between calls.
    total_results = []
    while search_url:
        response = requests.get(search_url, headers=headers)
        response.raise_for_status()  # Fail loudly on HTTP errors
        search_results = response.json()
        total_results.extend(search_results.get('value', []))
        search_url = search_results.get('@odata.nextLink')  # None on the last page

    return total_results


However, when I run this I get around 7k results, although I know for a fact there should be around 10k. When I search the files in the OneDrive using body: keyword1 AND keyword2, all files are returned, but with this method, which pages through the results using the skip tokens, some files are not returned.

On top of this, I also get duplicates, so of the 7k files, only 6k are unique.
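
One client-side workaround for the duplicates (it does nothing to recover the missing files) would be to key the results by item `id` while paging. This is only a minimal sketch: it assumes each page is a dict shaped like the Graph response (`value` plus an optional `@odata.nextLink`), and `fetch_page` is a hypothetical callable standing in for the `requests.get(...).json()` call so that the logic can be run without network access.

```python
def collect_unique(first_url, fetch_page):
    """Follow @odata.nextLink pages, keeping the first item seen for each id.

    fetch_page is a hypothetical callable: URL -> parsed JSON dict.
    """
    unique = {}      # id -> item, in first-seen order
    duplicates = 0   # number of repeated ids that were dropped
    url = first_url
    while url:
        page = fetch_page(url)
        for item in page.get('value', []):
            if item['id'] in unique:
                duplicates += 1
            else:
                unique[item['id']] = item
        url = page.get('@odata.nextLink')  # None when there are no more pages
    return list(unique.values()), duplicates


# Tiny in-memory stand-in for two pages of results (made-up data).
fake_pages = {
    'page1': {'value': [{'id': 'a'}, {'id': 'b'}], '@odata.nextLink': 'page2'},
    'page2': {'value': [{'id': 'b'}, {'id': 'c'}]},
}
items, dropped = collect_unique('page1', fake_pages.__getitem__)
print(len(items), dropped)
```

Returning the duplicate count alongside the items makes it easy to see how many repeats the service sent back on a given run.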

I'm not sure what I've done wrong with this method, and I'm wondering if I could get some assistance with it.

Thanks 🙂

 

  • lulup2085
    Copper Contributor

    hpage910, I'm commenting to bring awareness to this issue. We ran into exactly the same problem and at first incorrectly assumed that the SDK we were using to make the API calls was at fault. However, when we tested with the web Graph Explorer, we still got duplicate search results.

    We uploaded a folder containing 10k unique text files (about 1 KB each) to OneDrive using the OneDrive Windows Sync Client. We then waited a few weeks to make sure everything had synced across the Graph API service.

    The test files have a modified date of April 15th, so we ran a Modified Date Range query between April 10th and April 20th. The Graph API reports the correct "Total" of 10k documents in the response object, and paging at 50 files per request it fetched 10k documents. But when we analyzed the document IDs in the returned results, we found only 8,902 unique documents and 1,098 duplicates.
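
The uniqueness check described above can be reproduced with a few lines of Python. This is a sketch that assumes the fetched results have been collected into a list of dicts, each with an `id` key; `audit_ids` is a hypothetical helper name and the sample data is made up.

```python
from collections import Counter


def audit_ids(items):
    """Return (total, unique, repeats) for a list of dicts with an 'id' key."""
    counts = Counter(item['id'] for item in items)
    total = sum(counts.values())   # every item fetched
    unique = len(counts)           # distinct ids
    return total, unique, total - unique


# Made-up sample: id 'x' appears three times, so two of the five are repeats.
sample = [{'id': 'x'}, {'id': 'y'}, {'id': 'x'}, {'id': 'z'}, {'id': 'x'}]
print(audit_ids(sample))
```

For the run described in this reply, such a check would report 10,000 fetched, 8,902 unique, and 1,098 repeats.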

    Many people are running into this issue, and there is no clarity about a solution. Microsoft Support, please fix this. We need the API to be deterministic and accurate. We tried using the "trimDuplicates" property in the request and adding a sort property to stop it from sorting by relevance, and we still got the same results.
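
For readers who want to repeat the experiment, here is a rough sketch of what such a request body might look like. It assumes the query went through the Microsoft Search API (`POST /v1.0/search/query`), which the reply does not state. The field names follow my understanding of the `searchRequest` resource and should be checked against the official documentation. `build_search_request` is a hypothetical helper, and the sort field name is only a placeholder.

```python
import json


def build_search_request(query_string, start, size, sort_field):
    """Build a search request body (assumed shape; verify against the docs)."""
    return {
        'requests': [{
            'entityTypes': ['driveItem'],
            'query': {'queryString': query_string},
            'from': start,    # zero-based offset of the first result
            'size': size,     # page size, e.g. 50 as in the test above
            # Explicit sort, intended to avoid the default relevance ranking.
            'sortProperties': [{'name': sort_field, 'isDescending': False}],
            'trimDuplicates': True,
        }]
    }


# Placeholder query and sort field, for illustration only.
body = build_search_request('keyword1 AND keyword2', 0, 50, 'lastModifiedDateTime')
print(json.dumps(body, indent=2))
```

As reported above, setting these two options did not remove the duplicates, so this is a way to reproduce the behaviour rather than a fix.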
