Forum Discussion
Help with Power Query from an online database
- Jun 26, 2022
Hi YYHoe
I think I undestand what you ultimately expect to achieve. Hopefully the following will help
Part 1 to 2
Your first query that pulls numbers from i.e. https://pubmed.ncbi.nlm.nih.gov/?term=stress&filter=pubt.clinicaltrial&sort=pubdate returns only 2 fields: [Year] & [# Results] ==> No way to pull the corresponding list of PMIDs
At looked at the construction of the above link and the link to pull PMIDs but couldn't find something consistent enough to auto. construct the second link from the first
Part 2 & 3 - Get keywords for PMIDs
With the URLs you provided at looked at getting the PMIDs and their respective Keyworks from the nested Tables - see attached file
One thing to note regarding the URL for PMIDs. The one you provided is:https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20trial[pt]+human[tiab]+extract[tiab]
I also tested:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20trial[tiab]+human[tiab]+extract[tiab]So, same search criteria but [pt] after clinical%20trial in case 1, [tiab] after clinical%20trial in case 2. Results are different:
You'll find the correspoding queries in the attached. Hope this helps a bit
Hi Lorenzo ,
Thank you so much for your suggestions. Let me respond to you in parts below.
Part1 - Stress & Cortisol query
Great spot there on my mistake summing the wrong information! Those numbers looked odd to me but I ignored them until I can solve the bigger issues. Your worksheet showed everything I wanted to do. I just need to figure out the details of what you did there and why, so thank you! And thank you for sharing your "trick" that I had absolutely no knowledge of.
Part2 - PMIDs
PMIDs are id tags for all articles that fulfil the search criteria from Part 1. In this instance, there is actually an API using a hyperlink that returns an xml file with all the information I need nested within <IdList></IdList> (as below). It limits the results to max 20 IDs so I was hoping Power Query can help me extend that list. Otherwise, 20 IDs will be a good start for me if not possible.
I managed to extract the IDs using a query in Excel separately. However, this seems less elegant & lost the link to Part 1 queries. This discontinuity will prevent me from my ultimate goal in creating a dynamic keyword search in Excel with all these information displayed with one step.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20trial[pt]+human[tiab]+extract[tiab]
Part3 - extract keywords by PMID
To me, this is the most challenging part of all because the results are varied & are nested many levels. There are keywords for PMIDs 35708557 & 35707866, yes, but I suppose there maybe articles with no keywords too. This is, unfortunately, also the most crucial information I need for this project. What I hope to achieve here is to have a query linking to the ID results in Part 2 to scrape all keywords associated to each PMID. I suspect all keywords are nested inside both <KeywordList> and <MeshHeadingList>/<MeshHeading> when a specific API is run (see link below). I tried Power Query but all I got was blank.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=35707866&retmode=xml
Final goal
What I hope to achieve is to create a dynamic keyword input from cells (perhaps A1..A3) as part of search/query resulting in total matching articles in Part 1 (A2..B6), then linking the matching article PMIDs in Part 2 ($B9..) and the specific keywords (MeshHeadings?) in each PMID in Part 3 (D..$9). An example of the final output is shown below.
Hi YYHoe
I think I undestand what you ultimately expect to achieve. Hopefully the following will help
Part 1 to 2
Your first query that pulls numbers from i.e. https://pubmed.ncbi.nlm.nih.gov/?term=stress&filter=pubt.clinicaltrial&sort=pubdate returns only 2 fields: [Year] & [# Results] ==> No way to pull the corresponding list of PMIDs
At looked at the construction of the above link and the link to pull PMIDs but couldn't find something consistent enough to auto. construct the second link from the first
Part 2 & 3 - Get keywords for PMIDs
With the URLs you provided at looked at getting the PMIDs and their respective Keyworks from the nested Tables - see attached file
One thing to note regarding the URL for PMIDs. The one you provided is:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20trial[pt]+human[tiab]+extract[tiab]
I also tested:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=stress[tiab]+clinical%20trial[tiab]+human[tiab]+extract[tiab]
So, same search criteria but [pt] after clinical%20trial in case 1, [tiab] after clinical%20trial in case 2. Results are different:
You'll find the correspoding queries in the attached. Hope this helps a bit
- YYHoeJun 28, 2022Copper Contributor
Hi Lorenzo,
You have done it perfectly! Thank you very, very much!
Now I just have to understand all the steps you did in the file . . . . a daunting task for me.
- LorenzoJun 28, 2022Silver Contributor
Hi YYHoe
Glad I could help a bit
In the meantime I tested a few things and there's always a significant diff. between what the Web site says (# results), the PubMed query (https://pubmed.ncbi.nlm.nih.gov/?term=...) and the esearch query (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=...) - TBH I don't understand
On the other hand I figure out that the esearch query provides a bunch of information, including the PMIDs (this closes the gap between Part 1 & 2). Attached is my last version if this can help
Good luck with your project