Apr 20 2017 08:38 PM
Today a team member did a random public search on Bing using some keywords from an ongoing project. The first result was a link to a file on our internal MS Teams site. Is this supposed to happen? How can we prevent this?
Apr 20 2017 10:28 PM
Apr 21 2017 12:47 AM
Wow, I just searched Google for one of the tenant names I manage, and it returned the URL of the SPO Admin site... ^-^ cc @Vasil Michev @Tony Redmond FYI
Apr 21 2017 01:42 AM
I'm not seeing anything like this for multiple tenants I have accounts on, including longstanding ones not just demo/dev tenants.
Documents should absolutely not be showing up, not even as URLs that put you through an auth flow, as those documents are not exposed to the public internet whatsoever, so the search engines have no access to them. Like Loryan, I'd want to see proof of this, as it is a big deal if it is true. I'd also want full details of your configuration, especially whether you are running in hybrid, and if so, I'd recommend a serious audit of your network config to make sure the world isn't running around inside your SharePoint environment.
For URLs of your tenant that are internet-facing, it wouldn't shock me for a search engine to catch those, but obviously someone would need credentials to get in. All that is probably surfaced is whatever information the URL itself offers, so if you're giving your tenants top-secret names, you should rethink that. :)
Apr 21 2017 02:42 AM
Um, that sounds bad. I'm not able to reproduce it, though, at least not with Team sites or documents. My SPO admin site is visible in multiple articles, but that's my own fault :)
Apr 21 2017 03:53 AM
I don't see any problems with my tenant, but I would like to see the steps to reproduce the issue so I can check against them...
TR
Apr 21 2017 04:09 AM
Was the file shared with an anonymous link and was it referenced within a public page somewhere?
Apr 21 2017 05:05 AM
I've just seen the same issue. This doesn't look right.
Apr 21 2017 05:31 AM
Apr 21 2017 05:34 AM
Yep, simply search Google for your tenant name and tell me what you get :-)... just adding a screenshot here
Apr 21 2017 05:34 AM - edited Apr 21 2017 05:35 AM
It only happens with Bing for me.
Apr 21 2017 05:35 AM
Apr 21 2017 06:35 AM
It doesn't show up in Google, but it does show up in Yahoo and DuckDuckGo, which, if I am correct, use the Bing engine. See screenshot below. I have blocked out the search terms, as this is an ongoing project.
Apr 21 2017 06:40 AM
Apr 21 2017 07:23 AM
I have a site-level repro of this. Not down to documents or folders, but the site itself. This is a modern team site spun up as a result of creating a team in Microsoft Teams, which created the corresponding Office 365 Group and everything that comes with it.
Something seems to be visible to the search engines around URLs at least. @Vesa Juvonen, @Adam Harmetz this is probably something you guys would want to know about.
Apr 21 2017 08:45 AM
Thx @David Rosenthal for looping us in on this. We started an internal investigation right away to avoid confusion.
Apr 21 2017 09:56 AM
Thanks for looping us in.
If a search crawler discovers a link to an authenticated SharePoint Online site, it may add the link to the index. Because the site requires authentication, the site title and contents will not be indexed – only the presence of the URL. This should only occur when there is a link to the site collection somewhere on the public internet (e.g. someone might have used anonymous link sharing and posted that link somewhere an internet search crawler could find it).
The timing of this particular thread is an interesting coincidence, as we are doing work in this area. Just this month, to mitigate this, we have added a default robots.txt file to every site collection which instructs search crawlers not to index this URL. This will prevent search crawlers from adding new sites to their index. Existing sites should be removed from the search results the next time the site is indexed by the search crawler.
You can actually see the robots.txt file for your domain at https://<tenantURL>/robots.txt. The change is rolling out; it should be mostly complete, but there might be a few deployments that do not yet have it.
Hope this helps!
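To see how a well-behaved crawler would interpret such a file, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a blanket `User-agent: *` / `Disallow: /` rule; that body is a hypothetical illustration, not a copy of the file SharePoint Online actually serves, and `contoso.sharepoint.com` is a placeholder tenant. The sketch does not fetch anything live; to check a real tenant you could instead call `set_url("https://<tenantURL>/robots.txt")` followed by `read()` on the parser. Keep in mind that robots.txt is advisory: it only constrains crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption for illustration): ask every
# crawler to stay away from every path on the host.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder site URL on a hypothetical tenant.
    site = "https://contoso.sharepoint.com/sites/ProjectX"
    for agent in ("bingbot", "Googlebot"):
        print(agent, crawler_may_fetch(HYPOTHETICAL_ROBOTS_TXT, agent, site))
```

With the blanket rule above, both agents print `False`. An empty `Disallow:` line would permit everything, which is the behaviour a tenant would see before a restrictive robots.txt is in place.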
Apr 21 2017 10:19 AM
Apr 21 2017 10:34 AM
Great to hear about the work to better protect even the URLs. There seem to be other ways these are getting surfaced, though, as my tenant does not have, and has never had, anonymous sharing turned on. I can't rule it out, but I would also be very surprised to find that someone from my team had put one of these URLs out on the public internet.
Is this affected by Guest Access in Office 365 Groups or anything like that? Does that URL have to become visible to the world somehow in order for guests to get to it?