When to use and how to get querying proficiency

Alright, I will try to keep this short, but let me know if more info is needed.

We started a new project 3 months ago that was increasing the amount of data going into the Elasticsearch instance we use today. I suggested migrating to Azure Search for this new phase of the project, and so I started migrating.

Q1. Any recommended way for fast ingesting? 
 Using Azure Durable Functions to ingest about 600k documents took an absurd amount of time, over 4 hours, using the .NET SDK with batches of 100 documents. The documents are not large, but they are a bit complex.
Q2. Querying: for anything beyond the basics, it is really hard to find information, especially when using the .NET SDK 11. For example, I need to sort results by the value of a field inside an object inside the document (2nd level), where fieldX = A should come first, fieldX = B second, and so on. That seemed like a good case for semantic scoring, but again, I couldn't find any example of how to do that. (Suggestion: a querying playground with the hotel data set would be very helpful for this.)
Q3. Maybe a bug: is there any way to control the field conversion when the result set is deserialized? Background info: searchClient.SearchAsync<DocumentsResult>(query, options); one of the fields inside DocumentsResult is a string whose value can be "8" or "A8". I can see the data when querying the index in the portal, but when I read this field from the result, the value is always null.

Sorry for the long post.
Anyway, long story short: due to the time crunch, last week we migrated everything to managed SQL Server, where querying is extremely easy. But since the AMA was coming up, I figured I could try to get some answers for next time or for the next phase of the project.


Thanks

Pedro

 

12 Replies
Hey Pedro - I'm not quite sure I understand your third question. Are you trying to select a certain field but the data for it isn't being returned? Are you serializing the data and that's when it becomes null?
Hi @Pedro, let me start with Q1. You are correct that pushing content in batches is the best way to get content into the search service quickly. As a side note, the S2 and higher tiers are backed by premium storage, which also allows indexing to happen faster. However, the added cost does not always warrant the increase in performance. You might also be interested in this sample we have for optimizing indexing performance, which helps you find the optimal batch size: https://github.com/Azure-Samples/azure-search-dotnet-samples/tree/master/optimize-data-indexing

In addition, please keep in mind that you can also parallelize uploads, which can allow you to push data even faster. However, if you do this, it is important to watch for throttling and back off exponentially if you start seeing it. The above sample walks through this as well.
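
As a rough illustration of the batching, parallel upload, and exponential backoff described above, here is a minimal Python sketch (not the linked .NET sample). Everything in it is an assumption for illustration: `ThrottledError`, `upload_with_backoff`, and `index_all` are hypothetical names, and `upload` stands in for whatever call actually sends one batch to the service and returns the number of documents it accepted.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


class ThrottledError(Exception):
    """Hypothetical stand-in for 'the service is throttling this request'."""


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def upload_with_backoff(upload, batch, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call `upload(batch)`, retrying with exponential backoff plus jitter when throttled."""
    for attempt in range(max_retries + 1):
        try:
            return upload(batch)
        except ThrottledError:
            if attempt == max_retries:
                raise
            # The delay doubles on each attempt: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


def index_all(upload, documents, batch_size=100, workers=4, **retry_options):
    """Upload all batches in parallel and return the total count reported by `upload`."""
    batches = chunk(documents, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: upload_with_backoff(upload, b, **retry_options), batches)
        return sum(results)
```

The batch size and worker count are the two knobs worth measuring for a given index; the values here are placeholders, not recommendations.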

Hope that helps!

Liam
That is correct, the data ends up null during deserialization.
Hello Pedro - thank you for your question.

Regarding your 2nd question, you could use the "orderby" parameter to include your custom ordering functions. Please refer to https://docs.microsoft.com/en-us/azure/search/search-query-odata-orderby and https://docs.microsoft.com/en-us/rest/api/searchservice/Search-Documents for more details. If you have issues using the .Net SDK,

Did you mean 'semantic search' when you said 'semantic scoring'? If not, could you please elaborate?
Thanks for the clarification. The data ends up null because a decision was made that customers should own their data models as a best practice, so the built-in data types don't necessarily serialize all fields (you'll run into a similar issue with counts on facets). To get around the issue you're running into, I would recommend mapping the data from the response into classes that you create. You can do this pretty quickly with a package like Mapster: https://github.com/MapsterMapper/Mapster
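
The idea of mapping the response into a class you own can be sketched as follows. This is a Python illustration of the pattern, not Mapster and not the .NET SDK: `DocumentRow`, `to_row`, and the field names `id` and `code` are hypothetical, and the input is assumed to be an untyped key/value result (in the .NET SDK, requesting results as `SearchDocument` should give a comparable untyped shape).

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class DocumentRow:
    """Hypothetical application-owned model; field names are illustrative."""
    id: str
    code: Optional[str]


def to_row(raw: Mapping[str, Any]) -> DocumentRow:
    """Map one untyped search result to DocumentRow, coercing `code` to a string."""
    value = raw.get("code")
    return DocumentRow(
        id=str(raw["id"]),
        # Keep a real null as None, but turn 8 or "A8" into "8" or "A8".
        code=None if value is None else str(value),
    )
```

Doing the conversion by hand like this makes it explicit what happens when a value such as "8" arrives in a different type than the model expects.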
Thank you Liam, I will look into this example more.
Punnet, I'm not sure orderby would work for me, since I just want to display results with values A and B higher in the list. Sorry for the confusion, I was referring to relevance scoring: https://docs.microsoft.com/en-us/azure/search/index-add-scoring-profiles
Hello Pedro - thank you for the additional information. You could leverage a tag scoring profile in this particular case.
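
For reference, a tag scoring profile and a query that uses it might look roughly like the sketch below, written as Python dictionaries that mirror the REST JSON bodies. The names `boostByFieldX`, `fieldXTags`, and `fieldXValues` are hypothetical, and the sketch assumes the target is a filterable top-level field of type Edm.String or Collection(Edm.String); whether a field nested inside a complex object can be targeted directly is not confirmed in this thread.

```python
import json

# Hypothetical names: "boostByFieldX", "fieldXTags", "fieldXValues" are illustrative.
scoring_profile = {
    "name": "boostByFieldX",
    "functions": [
        {
            "type": "tag",
            "fieldName": "fieldXTags",  # assumed filterable string or string collection
            "boost": 5,
            "tag": {"tagsParameter": "fieldXValues"},
        }
    ],
}


def scoring_parameter(name, values):
    """Format a scoring parameter as 'name-value1,value2'."""
    return f"{name}-{','.join(values)}"


# Query body that selects the profile and passes the tags to boost.
search_request = {
    "search": "hotel",
    "scoringProfile": scoring_profile["name"],
    "scoringParameters": [scoring_parameter("fieldXValues", ["A", "B"])],
}

print(json.dumps(search_request, indent=2))
```

Note that, as far as I understand it, a tag function boosts documents matching any of the supplied tags; it would not by itself rank A ahead of B, so a strict A-then-B order may need a different approach.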
Derek, that sounds very promising. Do you have any examples of that? I'm confused because, through the SDK, the only way for me to call the search is to pass a class so it automatically deserializes the results for me. https://docs.microsoft.com/en-us/dotnet/api/overview/azure/search.documents-readme
So I'm wondering, how do I control that field mapping conversion myself? Is it through field mappings with annotations, like the Mapster example shows? (I have never seen Mapster, it seems very interesting; I have only used AutoMapper myself.)
Do you have any example of how to create the tag scoring profile for an object inside my main object?
Ahh the issue you're experiencing might be different than I was thinking. I've seen this issue when users try to pass data to their front-end and then nulls appear. If you're not getting the data at all, it might be a different issue.

I would recommend submitting an issue on the GitHub repo so one of the developers of the SDK can look into it and provide guidance: https://github.com/Azure/azure-sdk-for-net
Will do that, thank you Derek! I really appreciate your time!