Forum Discussion

YYHoe
Copper Contributor
Jun 24, 2022
Solved

Help with Power Query from an online database

Hi all,

 

I am trying to set up a dynamic Excel sheet that lets me type keywords into a cell (or two) and search PubMed for matching scientific publications. I managed to set up two tables using Power Query that show the total number of publications, but the keywords are not dynamic (see the picture below and the attached link). In those tables, the Attributes are the different keywords, Value is just a text string, and Column1 in rows 3 to 6 holds search parameters of increasing stringency with their corresponding return values. The hyperlink and the XML file that result from the Power Query at B6 are also attached below.

 

 

Questions:

1. How can I turn the Attributes at A2 & L2 into dynamic text strings that feed into Power Query?

2. How can I give the Column1 entries at A3..A6 & L3..L6 proper labels?
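For question 1, the core idea is to assemble the search term from cell values instead of hard-coding it. The following is a minimal sketch of that idea, written in Python rather than Power Query M purely for illustration. The function names (`build_term`, `esearch_url`, `increasing_stringency`) are hypothetical, and the order in which filters are added is an assumption; the URL shape follows the esearch link posted later in this thread.

```python
from urllib.parse import quote

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(filters):
    """Join (keyword, field) pairs into a term such as 'stress[tiab]+human[tiab]'.

    Spaces inside a keyword are percent-encoded; the field tag is kept as-is.
    """
    return "+".join(f"{quote(keyword)}[{field}]" for keyword, field in filters)


def esearch_url(filters):
    """Build the esearch URL for the given filters."""
    return f"{ESEARCH}?db=pubmed&term={build_term(filters)}"


def increasing_stringency(filters):
    """Return one URL per row: first filter only, then the first two, and so on."""
    return [esearch_url(filters[:n]) for n in range(1, len(filters) + 1)]


if __name__ == "__main__":
    # In the workbook, the first keyword would come from a cell such as A2 or L2.
    filters = [("stress", "tiab"), ("clinical trial", "pt"),
               ("human", "tiab"), ("extract", "tiab")]
    for url in increasing_stringency(filters):
        print(url)
```

Swapping the first keyword (for example "stress" for "cortisol") then regenerates every row without editing the queries by hand.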

 

I also want to extend the dynamic query at row 6 so that it captures additional information such as PMIDs and lists them out (B9..B18; maybe with a limit of 20) (see below).

 

 

Finally, I want to use the PubMed API function efetch to extract the keywords associated with each article, identified by its PMID, and display them across the corresponding row starting from column D.  For example, the cell at D9 contains something like this: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=30796508&retmode=xml where the keywords I want listed along the row from D9 are Clinical trial, Curcumin, Diabetes, Metabolic syndrome, NAFLD and Phosphatidylserine. (see below).
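The efetch link above differs from one article to the next only in its `id` value, so it can be generated from the PMID column. A small sketch in Python, used for illustration only; the helper name `efetch_url` is hypothetical. E-utilities also accepts several comma-separated IDs in a single request, which the sketch allows for.

```python
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an efetch URL for one PMID or for a list of PMIDs."""
    if isinstance(pmids, (str, int)):
        pmids = [pmids]  # allow a single PMID as well as a list
    ids = [str(p).strip() for p in pmids]
    # Reject empty input and anything that is not purely numeric.
    if not ids or not all(p.isdigit() for p in ids):
        raise ValueError("PMIDs must be non-empty and numeric")
    return f"{EFETCH}?db=pubmed&id={','.join(ids)}&retmode=xml"


if __name__ == "__main__":
    print(efetch_url("30796508"))
    print(efetch_url([35708557, 35707866]))
```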

 

 

I hope someone will be able to help.  Any advice or suggestions would be much appreciated.  You can either reply to this thread or edit the Book1 file directly.  Thank you in anticipation.

 

Book1 

Power Query at B6 

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35708857&retmode=xml 

11 Replies

  • Lorenzo
    Silver Contributor

    Hi YYHoe 

     

    Part1 - Stress & Cortisol query
    See the attached file, where I created a table in sheet PARAM with the appropriate links. Up to you to change the labels.

    • My [Number of Results] for Cortisol are accurate; you were summing the [Year] column
    • The Source step of queries Stress & Cortisol is:
    = #sections[Section1][StressCortisolLinks]

    This is a "trick" to prevent a Power Query Firewall error

     

    Part2 - PMIDs

    In principle a similar approach can be taken. However, I have no idea what information you want to extract for each ID, nor in which Table that information is stored. Querying e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35708557&retmode=xml returns a Table with a bunch of nested Tables, some of which contain nested Tables as well

    ==> If you can't make it work, share a query that returns what you expect for one PMID

     

    Part3 - extract keywords by PMID

    In which nested Table are the keywords stored, assuming they exist (e.g. no keywords for 35708557 & 35707866)?

    • YYHoe
      Copper Contributor

      Hi Lorenzo ,

       

      Thank you so much for your suggestions.  Let me respond to you in parts below.

       

      Part1 - Stress & Cortisol query

      Great spot there on my mistake of summing the wrong information!  Those numbers looked odd to me but I ignored them until I could solve the bigger issues.  Your worksheet showed everything I wanted to do.  I just need to figure out the details of what you did there and why, so thank you!  And thank you for sharing your "trick", which I had absolutely no knowledge of.

       

      Part2 - PMIDs

      PMIDs are ID tags for all the articles that fulfil the search criteria from Part 1.  In this instance, there is actually an API call, made through a hyperlink, that returns an XML file with all the information I need nested within <IdList></IdList> (as below).  It limits the results to a maximum of 20 IDs, so I was hoping Power Query could help me extend that list.  If that is not possible, 20 IDs will still be a good start for me.
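On the 20-ID limit: esearch returns 20 IDs by default, and its documented `retmax` parameter raises that number (check the E-utilities documentation for the current upper bound). The sketch below, in Python for illustration only, appends `retmax` to an existing esearch URL and reads `<Count>` and `<IdList>` from the response text. The function names are hypothetical, and the element layout (`eSearchResult` as root, with `Count` and `IdList/Id` beneath it) is stated here as an assumption about the response format.

```python
import xml.etree.ElementTree as ET


def add_retmax(esearch_url, retmax=100):
    """Append a retmax parameter so esearch returns more than the default 20 IDs."""
    if retmax < 1:
        raise ValueError("retmax must be a positive integer")
    return f"{esearch_url}&retmax={retmax}"


def parse_esearch(xml_text):
    """Return (total_count, [pmid, ...]) from an esearch XML response."""
    root = ET.fromstring(xml_text)
    # Count is the total number of matches, not the number of IDs returned.
    total = int(root.findtext("Count", default="0"))
    pmids = [el.text for el in root.findall("./IdList/Id")]
    return total, pmids
```

Because the total count and the ID list come from the same response, a single request can feed both the "number of results" cell and the PMID column, which keeps Part 1 and Part 2 linked.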

       

      I managed to extract the IDs separately using a query in Excel.  However, this seems less elegant and loses the link to the Part 1 queries.  This discontinuity will keep me from my ultimate goal of creating a dynamic keyword search in Excel that displays all this information in one step.

       

      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20trial[pt]+human[tiab]+extract[tiab]

       

      Part3 - extract keywords by PMID

      To me, this is the most challenging part of all because the results vary and are nested many levels deep.  There are keywords for PMIDs 35708557 & 35707866, yes, but I suppose there may be articles with no keywords too.  This is, unfortunately, also the most crucial information I need for this project.  What I hope to achieve here is a query that links to the ID results in Part 2 and scrapes all keywords associated with each PMID.  I suspect all keywords are nested inside both <KeywordList> and <MeshHeadingList>/<MeshHeading> when a specific API call is run (see link below).  I tried Power Query but all I got was blank.
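To make the nesting concrete, here is a hedged sketch of walking an efetch response and collecting terms from both places named above. It is written in Python for illustration, not Power Query M. It assumes each article sits in a `MedlineCitation` element containing `PMID`, an optional `KeywordList/Keyword` list, and an optional `MeshHeadingList/MeshHeading/DescriptorName` list; the function name `keywords_by_pmid` is hypothetical.

```python
import xml.etree.ElementTree as ET


def _text(element):
    """Full text of an element, including text inside inline child tags."""
    return "".join(element.itertext()).strip()


def keywords_by_pmid(xml_text):
    """Map each PMID to its author keywords followed by its MeSH descriptors.

    Articles with neither list map to an empty list instead of being dropped.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        author_keywords = [
            _text(k) for k in citation.iterfind("KeywordList/Keyword")
        ]
        mesh_terms = [
            _text(d)
            for d in citation.iterfind("MeshHeadingList/MeshHeading/DescriptorName")
        ]
        result[pmid] = author_keywords + mesh_terms
    return result
```

Each list in the result corresponds to one worksheet row: the PMID goes in column B and the terms spread rightwards from column D. Keeping empty lists preserves the row for articles that have no keywords.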

       

      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35707866&retmode=xml

       

      Final goal

       

      What I hope to achieve is a dynamic keyword input from cells (perhaps A1..A3) that feeds the search/query, giving the total matching articles in Part 1 (A2..B6), then listing the matching article PMIDs in Part 2 ($B9..) and the specific keywords (MeshHeadings?) of each PMID in Part 3 (D..$9).  An example of the final output is shown below.

       

       

    • YYHoe
      Copper Contributor
      I will try out your suggestions in a few hours. Appreciate your reply very much, L z!
