How to build an intelligent travel journal using Azure AI

Microsoft

Jan 26, 2021

AI capabilities can enhance many types of applications, enabling you to improve your customer experience and solve complex problems. With Azure Cognitive Services, you can easily access and customize industry-leading AI models, using the tools and languages of your choice.

In this blog post, we’ll walk through an exercise that you can complete in under an hour to get started with Azure AI services. Many of us are dreaming of traveling again, and building this intelligent travel journal app can help you capture memories from your next trip, whenever that may be. We’ll provide high-level guidance and sample code to get you started, and we encourage you to play around with the code and get creative with your solution!

Features of the application:

Capture voice memos, voice tag photos, and transcribe speech to text.
Automatically tag your photos based on key phrase extraction and analysis of text in pictures.
Translate tags and text into the desired language.
Organize your memos by key phrase and find similar travel experiences you enjoyed with AI-powered search.

Prerequisites:

If you don't have an Azure subscription, create a free account before you begin. If you have a subscription, log in to the Azure Portal.
To run the provided sample code, you will need Visual Studio 2019 and .NET Core 3.1 or above (for FotoFly).
Refer to this tutorial for detailed guidance on how to publish a console app.

Key Azure technologies:

Speech Service batch transcription for speech-to-text transcription.
Text Analytics for key phrase/intent extraction.
Computer Vision for analyzing text in images.
Translator to normalize tags/text into the desired language.
Open source FotoFly library for photo tagging. Alternatively, you can use blob metadata, but functionality will be limited.
Azure Cognitive Search for AI-powered search.

NOTE: For more information, refer to the “References.txt” file in the respective folders of the JournalHelper library project in the sample solution provided with this blog.

Solution Architecture

App Architecture Description:

User records a voice memo; for example, to accompany an image they’ve captured. The recorded file is stored in a file repository (alternatively, you could use a database).
The recorded voice memo (e.g. .m4a) is converted into the desired format (e.g. .wav) for use with Azure’s Speech Service batch transcription capability.
The folder containing voice memos is uploaded to a Blob container.
Images are uploaded into a separate container for analysis of any text within the photos, using Azure Computer Vision.
Use Translator to translate text to different languages, as needed. This may be useful to translate foreign street signs, menus, or other text in images.
Extract tags from the generated text files using Text Analytics, and send tags back to the corresponding image file. Tags can be travel related (#milan, #sunset, #Glacier National Park), or based on geotagging metadata, photo metadata (camera make, exposure, ISO), and more.
Create a search indexer with Azure Cognitive Search, and use the generated index to search your intelligent travel journal.

Implementation

Sample code

The entire solution code is available for download at this link. Download or clone it, and follow the instructions in the ReadMe.md solution item for further setup.

Implementation summary

The sample is implemented using various client libraries and samples available for Azure Cognitive Services. All these services are grouped together into a helper library project named “journalhelper”. In the library we introduce a helper class to help with scenarios that combine various Cognitive Services to achieve desired functionality.

We use a .NET Core console app as the front end to test the scenarios. This sample also uses another open source library (FotoFly), which is ported to .NET Core here, to access and edit image metadata.

High-level overview of the steps, along with sample code snippets for illustration:

Start by batch transcribing voice memos and extracting key tags from the text output. Group the input voice memos into a folder, upload them into an Azure Blob container or specify a list of their URLs, and use batch transcription to get results back into the Azure Blob container, as well as a folder in your file system. The following code snippet illustrates how helper functions can be grouped together for a specific piece of functionality. It combines the local file system, Azure storage containers, and the Cognitive Services Speech batch transcription API.

Console.WriteLine("Uploading voice memos folder to blob container...");
Helper.UploadFolderToContainer(
    HelperFunctions.GetSampleDataFullPath(customSettings.SampleDataFolders.VoiceMemosFolder),
    customSettings.AzureBlobContainers.InputVoiceMemoFiles,
    deleteExistingContainer);

Console.WriteLine("Batch transcribing voice memos using containers...");
// NOTE: Set the Speech Service pricing tier to Standard for the call below to work.
await Helper.BatchTranscribeVoiceMemosAsync(
    customSettings.AzureBlobContainers.InputVoiceMemoFiles,
    customSettings.AzureBlobContainers.BatchTranscribedJsonResults,
    customSettings.SpeechConfigSettings.Key,
    customSettings.SpeechConfigSettings.Region);

Console.WriteLine("Extract transcribed text files into another container and folder, delete the intermediate container with json files...");
await Helper.ExtractTranscribedTextfromJsonAsync(
    customSettings.AzureBlobContainers.BatchTranscribedJsonResults,
    customSettings.AzureBlobContainers.InputVoiceMemoFiles,
    customSettings.AzureBlobContainers.ExtractedTranscribedTexts,
    HelperFunctions.GetSampleDataFullPath(customSettings.SampleDataFolders.BatchTranscribedFolder),
    true);
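The body of ExtractTranscribedTextfromJsonAsync is not shown in this post. As a rough illustration of what the extraction step might involve, here is a minimal sketch that pulls the display text out of one batch transcription result file. It assumes the v3.0 result layout, where a top-level "combinedRecognizedPhrases" array holds one entry per audio channel with a "display" field. The class and method names are hypothetical and are not part of the sample solution.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the sample solution.
public static class TranscriptionJsonSketch
{
    // Returns the display-form text of each channel found in one batch
    // transcription result file (assumed v3.0 layout). Returns an empty
    // list when the expected array is missing.
    public static IReadOnlyList<string> ExtractDisplayText(string json)
    {
        var texts = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        // "combinedRecognizedPhrases" is assumed to hold one entry per channel.
        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("combinedRecognizedPhrases", out JsonElement phrases)
            || phrases.ValueKind != JsonValueKind.Array)
        {
            return texts;
        }

        foreach (JsonElement phrase in phrases.EnumerateArray())
        {
            if (phrase.TryGetProperty("display", out JsonElement display)
                && display.ValueKind == JsonValueKind.String)
            {
                texts.Add(display.GetString());
            }
        }

        return texts;
    }
}
```

Each returned string could then be written to a text file named after the source voice memo, which is the form the tagging step expects.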

Next, create tags from the transcribed text. A sample helper function using the Text Analytics client library is listed below.

// Text Analytics: create a tags file for every transcribed text file in a folder.
// Requires: using System; using System.IO;
public static void CreateTagsForFolderItems(string key, string endpoint, string batchTranscribedFolder, string extractedTagsFolder)
{
    if (!Directory.Exists(batchTranscribedFolder))
    {
        Console.WriteLine("Input folder for transcribed files does not exist");
        return;
    }

    // ensure destination folder path exists
    Directory.CreateDirectory(extractedTagsFolder);
    TextAnalyticsClient textClient = TextAnalytics.GetClient(key, endpoint);

    var contentFiles = Directory.EnumerateFiles(batchTranscribedFolder);
    foreach (var contentFile in contentFiles)
    {
        // this method is synchronous, so block until the tags are available
        var tags = TextAnalytics.GetTags(textClient, contentFile)
            .ConfigureAwait(false).GetAwaiter().GetResult();

        // generate output file with tags
        string outFileName = Path.GetFileNameWithoutExtension(contentFile) + "_tags.txt";
        string outFilePath = Path.Combine(extractedTagsFolder, outFileName);
        File.WriteAllLines(outFilePath, tags);
    }
}

The actual client library or service calls are made as shown:

public static async Task<IEnumerable<string>> GetTags(TextAnalyticsClient client, string inputTextFilePath)
{
    string inputContent = await File.ReadAllTextAsync(inputTextFilePath);

    // combine recognized entities and key phrases into a single tag list
    var entities = EntityRecognition(client, inputContent);
    var phrases = KeyPhraseExtraction(client, inputContent);
    var tags = new List<string>();
    tags.AddRange(entities);
    tags.AddRange(phrases);
    return tags;
}
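Because entities and key phrases are simply concatenated, the same term (for example a city name) can show up twice in the tag list. If that matters for your journal, one option is to merge the two lists with case-insensitive de-duplication. The following is a minimal sketch under that assumption. The class and method names are hypothetical and are not part of the sample solution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the sample solution.
public static class TagMergeSketch
{
    // Merges entity and key phrase results into one tag list, trimming
    // whitespace, skipping blanks, and dropping case-insensitive duplicates.
    // The first spelling seen wins and input order is preserved.
    public static List<string> MergeTags(IEnumerable<string> entities, IEnumerable<string> phrases)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var merged = new List<string>();

        foreach (string tag in entities.Concat(phrases))
        {
            if (string.IsNullOrWhiteSpace(tag))
            {
                continue;
            }

            string trimmed = tag.Trim();
            if (seen.Add(trimmed))
            {
                merged.Add(trimmed);
            }
        }

        return merged;
    }
}
```

A call such as MergeTags(entities, phrases) could replace the two AddRange calls if duplicate tags are unwanted.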

Write the tags to the photo/image file, using the open source FotoFly library. Alternatively, you can update the Blob metadata with these tags and include that in the search index, but the functionality will be limited to using Azure Blob storage.

// copy the photo to the output folder, then tag the copy
string taggedPhotoFile = photoFile.Replace(inputPhotosFolder, OutPhotosFolder);
File.Copy(photoFile, taggedPhotoFile, true);

if (tags.Count > 0)
{
    ImageProperties.SetPhotoTags(taggedPhotoFile, tags);
}

Other useful functions to complete the scenario are:
1. Helper.ProcessImageAsync, and
2. Helper.TranslateFileContent

The first one can be used to extract text from images using OCR or regular text processing using Computer Vision. The second can detect the source language, translate using Azure’s Translator service into the desired output language, and then create more tags for an image file.
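To give a feel for the translation step, here is a minimal sketch that calls the Translator v3.0 REST endpoint directly with HttpClient and reads back the detected source language and the translated text. It is an illustration only and is not how Helper.TranslateFileContent is necessarily implemented. The class and method names are hypothetical, and the key and region values are assumed to come from your own Translator resource.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, not part of the sample solution.
public static class TranslatorSketch
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds the request body: a JSON array with one object holding the text.
    public static string BuildRequestBody(string text)
    {
        return JsonSerializer.Serialize(new[] { new { Text = text } });
    }

    // Reads the detected source language and the first translation
    // from a Translator v3.0 response body.
    public static (string DetectedLanguage, string Translation) ParseResponse(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement first = doc.RootElement[0];
        string language = first.GetProperty("detectedLanguage").GetProperty("language").GetString();
        string translation = first.GetProperty("translations")[0].GetProperty("text").GetString();
        return (language, translation);
    }

    // Sends one piece of text for translation. No source language is given,
    // so the service detects it and reports it in the response.
    public static async Task<(string DetectedLanguage, string Translation)> TranslateAsync(
        string text, string toLanguage, string key, string region)
    {
        string url = "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to="
            + Uri.EscapeDataString(toLanguage);

        using var request = new HttpRequestMessage(HttpMethod.Post, url);
        request.Content = new StringContent(BuildRequestBody(text), Encoding.UTF8, "application/json");
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        request.Headers.Add("Ocp-Apim-Subscription-Region", region);

        using HttpResponseMessage response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return ParseResponse(await response.Content.ReadAsStringAsync());
    }
}
```

The translated text could then be passed through the same tag extraction step as the transcribed voice memos.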

Finally, use Azure Cognitive Search to create an index from the extracted text files saved in the Blob container, enabling you to search for documents and create journal text files. For example, you can search for images by cities or countries visited, date, or even cuisines. You can also search for images by camera-related metadata or geolocation.

In this sample we have demonstrated simple built-in skillsets for entity and language detection. The solution can be further enhanced by adding additional data sources to process tagged images and their metadata, and adding additional information to the searches.

NOTE: The helper functions can be made more generic to take additional skillset input.

public static async Task CreateSearchIndexerAsync(
    string serviceAdminKey, string searchSvcUrl,
    string cognitiveServiceKey,
    string indexName, string jsonFieldsFilePath,
    string blobConnectionString, string blobContainerName)
{
    // This is a temporary arrangement; this function is not complete.
    IEnumerable<SearchField> fields = SearchHelper.LoadFieldsFromJSonFile(jsonFieldsFilePath);

    // create index
    var searchIndex = await Search.Search.CreateSearchIndexAsync(
        serviceAdminKey, searchSvcUrl, indexName, fields.ToList());

    // get indexer client
    var indexerClient = Search.Search.GetSearchIndexerClient(serviceAdminKey, searchSvcUrl);

    // create azure blob data source
    var dataSource = await Search.Search.CreateOrUpdateAzureBlobDataSourceAsync(
        indexerClient, blobConnectionString, indexName, blobContainerName);

    // create skill set with minimal skills
    List<SearchIndexerSkill> skills = new List<SearchIndexerSkill>();
    skills.Add(Skills.CreateEntityRecognitionSkill());
    skills.Add(Skills.CreateLanguageDetectionSkill());
    var skillSet = await Search.Search.CreateOrUpdateSkillSetAsync(
        indexerClient, indexName + "-skillset", skills, cognitiveServiceKey);

    // create indexer
    var indexer = await Search.Search.CreateIndexerAsync(
        indexerClient, dataSource, skillSet, searchIndex);

    // wait for some time to have indexer run and load documents
    // (Task.Delay avoids blocking a thread inside an async method)
    await Task.Delay(TimeSpan.FromSeconds(20));

    await Search.Search.CheckIndexerOverallStatusAsync(indexerClient, indexer);
}
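The fixed 20-second wait in the function above is simple, but it can be too short for a large container and needlessly long for a small one. One alternative is to poll a status check until it reports completion or a timeout elapses. The following is a minimal, generic sketch of that idea. The class and method names are hypothetical, and the isDone delegate stands in for whatever indexer status check you choose to use.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of the sample solution.
public static class PollingSketch
{
    // Calls isDone repeatedly until it returns true or the timeout elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> isDone, TimeSpan interval, TimeSpan timeout)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (await isDone().ConfigureAwait(false))
            {
                return true;
            }

            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }

            // pause between checks without blocking a thread
            await Task.Delay(interval).ConfigureAwait(false);
        }
    }
}
```

With a helper like this, the indexer could be checked every few seconds and the wait would end as soon as the documents are loaded.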

Finally, search documents and generate the corresponding journal files, utilizing the following functions:

Helper.SearchDocuments
Helper.CreateTravelJournal

Additional Ideas

In addition to the functionality described so far, there are many other ways you can leverage Azure AI to further enhance your intelligent travel journal and learn more advanced scenarios. We encourage you to explore some of the following ideas to enrich your app:

Add real time voice transcription and store transcriptions in an Azure managed database, to correlate voice transcription with images in context.
Include travel tickets and receipts as images for OCR-based image analysis (Form Recognizer) and include them as journal artifacts.
Use multiple data sources for a given search index. We have simplified and only included text files to index in this sample, but you can include the tagged photos from a different data source for the same search index.
Add custom skills and data extraction for search indexer. Extract metadata from images and include as search content.
Extract metadata from video and audio content using Video Indexer.
Experiment with Language Understanding and generate more elaborate and relevant search content based on top-scoring intents and entities. Sample keywords and questions related to the current sample data are included in the Objectives.docx solution item.
Build a consumer front-end app that stitches all of this together and displays the journal in a UI.