Jun 23 2022 08:27 PM
Hi all,
I am trying to set up a dynamic Excel sheet that allows me to input certain keywords into a cell (or two) to search for matching scientific publications from Pubmed. I managed to set up two tables using Power Query to show the total publications but the keywords are not dynamic - see picture below & attached link; where Attributes are different keywords, Value is just a text string, and Column1 from row 3 to 6 are simply increasing stringency of search parameters with their corresponding return values. The hyperlink and xml file as a result of Power Query from B6 are also attached below.
Questions:
1. How can I turn the Attributes at A2 & L2 into dynamic text string to fit into Power Query?
2. How can I label Column1s at A3..A6 & L3..L6 into proper labels?
I also want to extend the dynamic query at row 6 to capture and list additional information like PMIDs and list them out (B9..B18; maybe with a limit of 20) (see below).
And finally using a Pubmed API function efetch to extract keywords associated with each of the articles defined by its PMIDs to display across the corresponding row from column D. For example, the cell at D9 contains something like this: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=30796508&retmode=xml where the keywords I want to extract to be listed from row D9 are Clinical trial, Curcumin, Diabetes, Metabolic syndrome, NAFLD and Phosphatidylserine. (see below).
I hope someone will be able to help. Any advice and suggestion will be very appreciated. You can either reply to this thread or directly edit into Book1 file. Thank you in anticipation.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35708857&retmode=xml
Jun 24 2022 01:24 AM
Hi @YYHoe
Part1 - Stress & Cortisol query
See attached file where I created a table in sheet PARAM with the appropriate links. Up to you to change the Labels
= #sections[Section1][StressCortisolLinks]
This is a "trick" to prevent a Power Query Firewall error
Part2 - PMIDs
In principal a similar approach can be taken. However, no idea at all what info. you want to extract for each ID nor in which Table that information is strored. Querying i.e. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35708557&retmode=xml returns a Table with a bunch of nested Tables, some of them having nested Tables as well
==> If you can't make it share a query that returns what you expect for one PMID
Part3 - extract keywords by PMID
In which nested Table are the Keywords stored assuming they exist (i.e. no keyword for 35708557 & 35707866)?
Jun 24 2022 03:13 AM
Jun 24 2022 08:10 PM - edited Jun 24 2022 08:12 PM
Hi @L z. ,
Thank you so much for your suggestions. Let me respond to you in parts below.
Part1 - Stress & Cortisol query
Great spot there on my mistake summing the wrong information! Those numbers looked odd to me but I ignored them until I can solve the bigger issues. Your worksheet showed everything I wanted to do. I just need to figure out the details of what you did there and why, so thank you! And thank you for sharing your "trick" that I had absolutely no knowledge of.
Part2 - PMIDs
PMIDs are id tags for all articles that fulfil the search criteria from Part 1. In this instance, there is actually an API using a hyperlink that returns an xml file with all the information I need nested within <IdList></IdList> (as below). It limits the results to max 20 IDs so I was hoping Power Query can help me extend that list. Otherwise, 20 IDs will be a good start for me if not possible.
I managed to extract the IDs using a query in Excel separately. However, this seems less elegant & lost the link to Part 1 queries. This discontinuity will prevent me from my ultimate goal in creating a dynamic keyword search in Excel with all these information displayed with one step.
Part3 - extract keywords by PMID
To me, this is the most challenging part of all because the results are varied & are nested many levels. There are keywords for PMIDs 35708557 & 35707866, yes, but I suppose there maybe articles with no keywords too. This is, unfortunately, also the most crucial information I need for this project. What I hope to achieve here is to have a query linking to the ID results in Part 2 to scrape all keywords associated to each PMID. I suspect all keywords are nested inside both <KeywordList> and <MeshHeadingList>/<MeshHeading> when a specific API is run (see link below). I tried Power Query but all I got was blank.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35707866&retmode=xml
Final goal
What I hope to achieve is to create a dynamic keyword input from cells (perhaps A1..A3) as part of search/query resulting in total matching articles in Part 1 (A2..B6), then linking the matching article PMIDs in Part 2 ($B9..) and the specific keywords (MeshHeadings?) in each PMID in Part 3 (D..$9). An example of the final output is shown below.
Jun 24 2022 09:29 PM
I took a closer look at the file StressCortisol.xlsx. The numbers in the tables are a little off.... For instance, the actual numbers extracted manually using the same hyperlinks are as listed in C10..C13 (see below). I wonder where the numbers in the table came from. Will try to go through the query details & see if I can catch them.
Jun 25 2022 10:13 PM
Hi @YYHoe
Clarification
Re. "there is actually an API...It limits the results to max 20 IDs so I was hoping Power Query can help me extend that list"
Power Query pulls data through a data connector - an API in your case. In other terms, it doesn't connect directly to a/your database. So, it no way, Power Query can overcome any limitation imposed by the connector in use. If the API limits the result to 20 PMIDs, the max. you can expect is... 20
More to come...
Jun 26 2022 12:30 AM
SolutionHi @YYHoe
I think I undestand what you ultimately expect to achieve. Hopefully the following will help
Part 1 to 2
Your first query that pulls numbers from i.e. https://pubmed.ncbi.nlm.nih.gov/?term=stress&filter=pubt.clinicaltrial&sort=pubdate returns only 2 fields: [Year] & [# Results] ==> No way to pull the corresponding list of PMIDs
At looked at the construction of the above link and the link to pull PMIDs but couldn't find something consistent enough to auto. construct the second link from the first
Part 2 & 3 - Get keywords for PMIDs
With the URLs you provided at looked at getting the PMIDs and their respective Keyworks from the nested Tables - see attached file
One thing to note regarding the URL for PMIDs. The one you provided is:
I also tested:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20tr...]
So, same search criteria but [pt] after clinical%20trial in case 1, [tiab] after clinical%20trial in case 2. Results are different:
You'll find the correspoding queries in the attached. Hope this helps a bit
Jun 26 2022 07:05 AM
Hi @YYHoe
Re. The numbers in the tables are a little off....
IMHO you should look at the query that's actually executed from the web page. For example - still with filter Clinical trial - if you only provide the term stress in the Search box, result = 35644, the actual query is:
("stress"[All Fields] OR "stressed"[All Fields] OR "stresses"[All Fields] OR "stressful"[All Fields] OR "stressfulness"[All Fields] OR "stressing"[All Fields]) AND (clinicaltrial[Filter])
After reading a bit the E-utilities doc. if I translate the above query as follow and pass it to PQ, result = 35617 (pretty close):
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28stress+OR+stressed+OR+stresses+OR+stressful+OR+stressfulness+OR+stressing%29+AND+%28clinical%20trial[Filter]%29
Similarly, with your example (stress AND human AND plant) OR (stress AND human AND herb), result = 919, actual query is:
((("stress"[All Fields] OR "stressed"[All Fields] OR "stresses"[All Fields] OR "stressful"[All Fields] OR "stressfulness"[All Fields] OR "stressing"[All Fields]) AND ("human s"[All Fields] OR "humans"[MeSH Terms] OR "humans"[All Fields] OR "human"[All Fields]) AND ("plant s"[All Fields] OR "planted"[All Fields] OR "planting"[All Fields] OR "plantings"[All Fields] OR "plants"[MeSH Terms] OR "plants"[All Fields] OR "plant"[All Fields])) OR (("stress"[All Fields] OR "stressed"[All Fields] OR "stresses"[All Fields] OR "stressful"[All Fields] OR "stressfulness"[All Fields] OR "stressing"[All Fields]) AND ("human s"[All Fields] OR "humans"[MeSH Terms] OR "humans"[All Fields] OR "human"[All Fields]) AND "herb"[All Fields])) AND (clinicaltrial[Filter])
translated as follow and passed to PQ, result = 918 (almost perfect):
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28%28%28stress+OR+stressed+OR+stresses+OR+stressful+OR+stressfulness+OR+stressing%29+AND+%28human s+OR+humans[MeSH Terms]+OR+humans+OR+human%29+AND+%28plant%20s+OR+planted+OR+planting+OR+plantings+OR+plants[MeSH Terms]+OR+plants+OR+plant%29%29+OR+%28%28stress+OR+stressed+OR+stresses+OR+stressful+OR+stressfulness+OR+stressing%29+AND+%28human%20s+OR+humans[MeSH Terms]+OR+humans+OR+human%29+AND+herb%29%29+AND+%28clinical%20trial[Filter]%29
Hope this helps
Jun 27 2022 12:03 AM
Jun 27 2022 05:05 AM
Hi @YYHoe
Forgot to mention the other day... re. the limit of 20 PMIDs. This is the Default limit. You can extend the list (assuming more than 20 ID exist) with the optional retmax parameter (tested here, works no problem)
Jun 28 2022 01:13 AM
Hi @L z.,
You have done it perfectly! Thank you very, very much!
Now I just have to understand all the steps you did in the file . . . . a daunting task for me.
Jun 28 2022 11:23 AM
Hi @YYHoe
Glad I could help a bit
In the meantime I tested a few things and there's always a significant diff. between what the Web site says (# results), the PubMed query (https://pubmed.ncbi.nlm.nih.gov/?term=...) and the esearch query (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=...) - TBH I don't understand
On the other hand I figure out that the esearch query provides a bunch of information, including the PMIDs (this closes the gap between Part 1 & 2). Attached is my last version if this can help
Good luck with your project
Jun 26 2022 12:30 AM
SolutionHi @YYHoe
I think I undestand what you ultimately expect to achieve. Hopefully the following will help
Part 1 to 2
Your first query that pulls numbers from i.e. https://pubmed.ncbi.nlm.nih.gov/?term=stress&filter=pubt.clinicaltrial&sort=pubdate returns only 2 fields: [Year] & [# Results] ==> No way to pull the corresponding list of PMIDs
At looked at the construction of the above link and the link to pull PMIDs but couldn't find something consistent enough to auto. construct the second link from the first
Part 2 & 3 - Get keywords for PMIDs
With the URLs you provided at looked at getting the PMIDs and their respective Keyworks from the nested Tables - see attached file
One thing to note regarding the URL for PMIDs. The one you provided is:
I also tested:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20tr...]
So, same search criteria but [pt] after clinical%20trial in case 1, [tiab] after clinical%20trial in case 2. Results are different:
You'll find the correspoding queries in the attached. Hope this helps a bit