Latest Discussions
How to process multiple receipts on one scan
Dear Community, I am building a simple receipt-recognition solution, and I have run into an issue: users tend to fill a full A4 page with receipts. As far as I understand, the built-in receipt model does not work with multiple items on one page. I have tried a few Python libraries (such as cv2) to process these scans, but in general edge detection does not work, because often there is no visible edge, and receipts are rotated or partly pushed under each other. Is there any (AI) solution that can help me first extract the distinct receipts before feeding them to Document Intelligence? I don't need the text at this point, only the receipts as images. Thanks, vmvlm
vmvlm · Jan 10, 2025 · Copper Contributor · 21 Views · 0 likes · 0 Comments
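For context, the contour-based cv2 approach described above usually looks something like this minimal sketch (thresholds, kernel sizes, and the area cutoff are illustrative assumptions); it is exactly the approach that breaks down when receipts overlap or lack visible edges, which is why segmentation or object-detection models are the usual next step:

```python
# Minimal sketch of contour-based receipt cropping with OpenCV.
# All thresholds are illustrative; this fails on touching/overlapping receipts.
import cv2

def extract_receipt_crops(scan_path):
    image = cv2.imread(scan_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Receipts are usually lighter than the scanner background
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Close small gaps so each receipt becomes one connected blob
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 25))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    crops = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h > 10_000:  # skip specks; the cutoff is arbitrary
            crops.append(image[y:y + h, x:x + w])
    return crops
```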
Principal Does Not Have Access to API/Operation
Hi all, I am trying to connect the Azure OpenAI service to the Azure AI Search service and an Azure Gen2 Data Lake. In the Azure AI Foundry Chat Playground I am able to add my data source, a .csv file in the data lake that has been indexed successfully, using the system-assigned managed identity. The following RBAC has been applied:
- The AI Search service has Cognitive Services OpenAI Contributor on the Azure OpenAI service
- The Azure OpenAI service has Search Index Data Reader on the AI Search service
- The Azure OpenAI service has Search Service Contributor on the AI Search service
- The AI Search service has Storage Blob Data Reader on the storage account (data lake)
As mentioned, the data source passes validation when added, but when I try to ask a question, I get the error "We couldn't connect your data. Principal does not have access to API/Operation."
fingers3775 · Jan 09, 2025 · Copper Contributor · 17 Views · 0 likes · 0 Comments
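Role assignments like the ones listed are typically created with the Azure CLI. A minimal sketch, assuming placeholder principal IDs and scopes (none of these values come from the post):

```
# Sketch only: grant the search service's managed identity read access to blobs.
# <search-principal-id> and the scope segments are placeholders to look up.
az role assignment create \
  --assignee "<search-principal-id>" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"

# And the Azure OpenAI identity's access to the search index:
az role assignment create \
  --assignee "<openai-principal-id>" \
  --role "Search Index Data Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Search/searchServices/<search-service>"
```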
My subscription disabled
Hello everyone, I am an MSc student and I am still not familiar with Azure. I need a VM, and my plan was a pay-as-you-go subscription. I tried several times to configure this, and ultimately Microsoft disabled all my subscriptions. How do I re-enable them, and how do I subscribe correctly? I need a powerful GPU and plenty of RAM for my MSc thesis on 3D image segmentation.
Riham · Jan 08, 2025 · Copper Contributor · 23 Views · 0 likes · 1 Comment
GraphQL API: Unlimited flexibility for your AI applications
Building a Modern Speech-to-Text Solution with GraphQL and Azure AI Speech

Intro
Have you ever wondered how to build a modern AI-enhanced web application that handles audio transcription while keeping your codebase clean and maintainable? In this post, I'll walk you through how we combined the power of GraphQL with Azure's AI services to create a seamless audio transcription solution. Let's dive in!

The Challenge
In today's world, converting speech to text is becoming increasingly important for accessibility, content creation, and data processing. But building a robust solution that handles file uploads, processes audio, and manages transcriptions can be complex. Traditional REST APIs often require multiple endpoints, leading to increased complexity and potential maintenance headaches. That's where GraphQL comes in.

What is GraphQL?
GraphQL is an open-source data query and manipulation language for APIs, and a runtime for executing those queries with your existing data. It was developed by Facebook in 2012 and publicly released in 2015. To break that down formally:
- It's a query language specification that allows clients to request exactly the data they need
- It's a type system that helps describe your API's data model and capabilities
- It's a runtime engine that processes and validates queries against your schema
- It provides a single endpoint to interact with multiple data sources and services
In technical documentation, GraphQL is officially described as: "A query language for your API and a server-side runtime for executing queries by using a type system you define for your data."

Why GraphQL?
GraphQL has revolutionized how we think about API design. Instead of dealing with multiple endpoints for different operations, we get a single, powerful endpoint that handles everything. This is particularly valuable when dealing with complex workflows like audio file processing and transcription. Here's what makes GraphQL perfect for our use case:
- Single endpoint for all operations (uploads, queries, mutations)
- Type-safe API contracts
- Flexible data fetching
- Real-time updates through subscriptions
- Built-in documentation and introspection

Solution Architecture
Our solution architecture centers around a modern web application built with a powerful combination of technologies. On the frontend, we utilize React to create a dynamic and responsive user interface, enhanced by Apollo Client for seamless GraphQL integration and Fluent UI for a polished and visually appealing design. The backend is powered by Apollo Server, providing our GraphQL API. To handle the core functionality of audio processing, we leverage Azure Speech-to-Text for AI-driven transcription. File management is streamlined with Azure Blob Storage, while data persistence is ensured through Azure Cosmos DB. Finally, we prioritize security by using Azure Key Vault for the secure management of sensitive information. This architecture allows us to deliver a robust and efficient application for audio processing and transcription.
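To see that flexible data fetching in practice, here is a minimal sketch of how the React frontend could query the API through Apollo Client. It assumes the listTranscriptions query defined in the schema later in this post; the component itself is illustrative rather than taken from the repository:

```javascript
import { gql, useQuery } from "@apollo/client";

// Ask for exactly the fields this view needs -- nothing more.
const LIST_TRANSCRIPTIONS = gql`
  query ListTranscriptions {
    listTranscriptions {
      id
      filename
    }
  }
`;

// Illustrative component; the real UI would use Fluent UI widgets.
function TranscriptionList() {
  const { loading, error, data } = useQuery(LIST_TRANSCRIPTIONS);
  if (loading) return <p>Loading…</p>;
  if (error) return <p>Error: {error.message}</p>;
  return (
    <ul>
      {data.listTranscriptions.map(({ id, filename }) => (
        <li key={id}>{filename}</li>
      ))}
    </ul>
  );
}
```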
[Figure: Flow Chart]

Key Technologies

Frontend Stack
- React for a dynamic user interface
- Apollo Client for GraphQL integration
- Fluent UI for a polished look and feel

Backend Stack
- Apollo Server for our GraphQL API
- Azure Speech-to-Text for AI-powered transcription
- Azure Blob Storage for file management
- Azure Cosmos DB for data persistence
- Azure Key Vault for secure secret management

[Figure: Architecture Diagram]

GraphQL Schema and Resolvers: The Foundation
At its core, GraphQL requires two fundamental components to function: a Schema Definition Language (SDL) and resolvers.

Schema Definition Language (SDL)
The schema is your API's contract: it defines the types, queries, and mutations available. Here's an example:

```javascript
import { gql } from "apollo-server";

const typeDefs = gql`
  scalar Upload

  type UploadResponse {
    success: Boolean!
    message: String!
  }

  type Transcription {
    id: ID!
    filename: String!
    transcription: String!
    fileUrl: String!
  }

  type Query {
    hello: String
    listTranscriptions: [Transcription!]
    getTranscription(id: ID!): Transcription
  }

  type Mutation {
    uploadFile(file: Upload!): UploadResponse!
  }
`;

export default typeDefs;
```

Resolvers
Resolvers are functions that determine how the data for each field in your schema is fetched or computed. They're the implementation behind your schema. Here's a typical resolver structure:

```javascript
import axios from "axios";
import { BlobServiceClient } from "@azure/storage-blob";
import { SecretClient } from "@azure/keyvault-secrets";
import { DefaultAzureCredential } from "@azure/identity";
import * as fs from "fs";
import FormData from "form-data";
import { GraphQLUpload } from "graphql-upload";
import { CosmosClient } from "@azure/cosmos";
import { v4 as uuidv4 } from "uuid";
import { pipeline } from "stream";
import { promisify } from "util";

const pipelineAsync = promisify(pipeline);

// Key Vault setup
const vaultName = process.env.AZURE_KEY_VAULT_NAME;
const vaultUrl = `https://${vaultName}.vault.azure.net`;
const credential = new DefaultAzureCredential({
  managedIdentityClientId: process.env.MANAGED_IDENTITY_CLIENT_ID,
});
const secretClient = new SecretClient(vaultUrl, credential);

async function getSecret(secretName) {
  try {
    const secret = await secretClient.getSecret(secretName);
    console.log(`Successfully retrieved secret: ${secretName}`);
    return secret.value;
  } catch (error) {
    console.error(`Error fetching secret "${secretName}":`, error.message);
    throw new Error(`Failed to fetch secret: ${secretName}`);
  }
}

// Cosmos DB setup
const databaseName = "TranscriptionDB";
const containerName = "Transcriptions";
let cosmosContainer;

async function initCosmosDb() {
  const connectionString = await getSecret("COSMOSCONNECTIONSTRING");
  const client = new CosmosClient(connectionString);
  const database = client.database(databaseName);
  cosmosContainer = database.container(containerName);
  console.log(`Connected to Cosmos DB: ${databaseName}/${containerName}`);
}

// Initialize Cosmos DB connection
initCosmosDb();

const resolvers = {
  Upload: GraphQLUpload,

  Query: {
    hello: () => "Hello from Azure Backend!",

    // List all stored transcriptions
    listTranscriptions: async () => {
      try {
        const { resources } = await cosmosContainer.items
          .query("SELECT c.id, c.filename FROM c")
          .fetchAll();
        return resources;
      } catch (error) {
        console.error("Error fetching transcriptions:", error.message);
        throw new Error("Could not fetch transcriptions.");
      }
    },

    // Fetch transcription details by ID
    getTranscription: async (parent, { id }) => {
      try {
        const { resource } = await cosmosContainer.item(id, id).read();
        return resource;
      } catch (error) {
        console.error(`Error fetching transcription with ID ${id}:`, error.message);
        throw new Error(`Could not fetch transcription with ID ${id}.`);
      }
    },
  },

  Mutation: {
    uploadFile: async (parent, { file }) => {
      const { createReadStream, filename } = await file;
      const id = uuidv4();
      const filePath = `/tmp/${id}-${filename}`;

      try {
        console.log("---- STARTING FILE UPLOAD ----");
        console.log(`Original filename: ${filename}`);
        console.log(`Temporary file path: ${filePath}`);

        // Save the uploaded file to /tmp
        const stream = createReadStream();
        const writeStream = fs.createWriteStream(filePath);
        await pipelineAsync(stream, writeStream);
        console.log("File saved successfully to temporary storage.");

        // Fetch secrets from Azure Key Vault
        console.log("Fetching secrets from Azure Key Vault...");
        const subscriptionKey = await getSecret("AZURESUBSCRIPTIONKEY");
        const endpoint = await getSecret("AZUREENDPOINT");
        const storageAccountUrl = await getSecret("AZURESTORAGEACCOUNTURL");
        const sasToken = await getSecret("AZURESASTOKEN");
        console.log("Storage Account URL and SAS token retrieved.");

        // Upload the WAV file to Azure Blob Storage
        console.log("Uploading file to Azure Blob Storage...");
        const blobServiceClient = new BlobServiceClient(`${storageAccountUrl}?${sasToken}`);
        const containerClient = blobServiceClient.getContainerClient("wav-files");
        const blockBlobClient = containerClient.getBlockBlobClient(`${id}-${filename}`);
        await blockBlobClient.uploadFile(filePath);
        console.log("File uploaded to Azure Blob Storage successfully.");

        const fileUrl = `${storageAccountUrl}/wav-files/${id}-${filename}`;
        console.log(`File URL: ${fileUrl}`);

        // Send transcription request to Azure
        console.log("Sending transcription request...");
        const form = new FormData();
        form.append("audio", fs.createReadStream(filePath));
        form.append(
          "definition",
          JSON.stringify({
            locales: ["en-US"],
            profanityFilterMode: "Masked",
            channels: [0, 1],
          })
        );

        const response = await axios.post(
          `${endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-05-15-preview`,
          form,
          {
            headers: {
              ...form.getHeaders(),
              "Ocp-Apim-Subscription-Key": subscriptionKey,
            },
          }
        );

        console.log("Azure Speech API response received.");
        console.log("Response Data:", JSON.stringify(response.data, null, 2));

        // Extract transcription
        const combinedPhrases = response.data?.combinedPhrases;
        if (!combinedPhrases || combinedPhrases.length === 0) {
          throw new Error("Transcription result not available in the response.");
        }
        const transcription = combinedPhrases.map((phrase) => phrase.text).join(" ");
        console.log("Transcription completed successfully.");

        // Store transcription in Cosmos DB
        await cosmosContainer.items.create({
          id,
          filename,
          transcription,
          fileUrl,
          createdAt: new Date().toISOString(),
        });
        console.log(`Transcription stored in Cosmos DB with ID: ${id}`);

        return {
          success: true,
          message: `Transcription: ${transcription}`,
        };
      } catch (error) {
        console.error("Error during transcription process:", error.response?.data || error.message);
        return {
          success: false,
          message: `Transcription failed: ${error.message}`,
        };
      } finally {
        try {
          fs.unlinkSync(filePath);
          console.log(`Temporary file deleted: ${filePath}`);
        } catch (cleanupError) {
          console.error(`Error cleaning up temporary file: ${cleanupError.message}`);
        }
        console.log("---- FILE UPLOAD PROCESS COMPLETED ----");
      }
    },
  },
};

export default resolvers;
```

Server – Apollo
Finally, the power behind it all: Apollo Server. We need to install the server with npm and, in addition, add Apollo Client to the frontend; a sketch of the likely install commands follows.
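The post doesn't list the exact packages; judging from the imports shown above, the install step presumably looks something like this sketch:

```
# Backend (Apollo Server plus the Azure SDKs used by the resolvers)
npm install apollo-server-express express cors graphql graphql-upload \
  axios form-data uuid @azure/storage-blob @azure/keyvault-secrets \
  @azure/identity @azure/cosmos

# React frontend
npm install @apollo/client graphql
```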
The server integrates easily with our JavaScript Express app:

```javascript
import { ApolloServer } from "apollo-server-express";
import express from "express";
import cors from "cors"; // Add CORS middleware
import { graphqlUploadExpress } from "graphql-upload";
import typeDefs from "./schema.js";
import resolvers from "./resolvers.js";

const startServer = async () => {
  const app = express();

  // Add graphql-upload middleware
  app.use(graphqlUploadExpress());

  // Configure CORS middleware
  app.use(
    cors({
      origin: "https://<frontend>.azurewebsites.net", // Allow only the frontend origin
      credentials: true, // Allow cookies and authentication headers
    })
  );

  const server = new ApolloServer({
    typeDefs,
    resolvers,
    csrfPrevention: true,
  });

  await server.start();
  server.applyMiddleware({ app, cors: false }); // Disable Apollo's CORS to rely on Express

  const PORT = process.env.PORT || 4000;
  app.listen(PORT, "0.0.0.0", () =>
    console.log(`🚀 Server ready at http://0.0.0.0:${PORT}${server.graphqlPath}`)
  );
};

startServer();
```

Getting Started
Want to try it yourself? Check out our GitHub repository for:
- Complete source code
- Deployment instructions
- Configuration guides
- API documentation

Conclusion
This project demonstrates the powerful combination of GraphQL and Azure AI services, showcasing how modern web applications can handle complex audio processing workflows with elegance and efficiency. By leveraging GraphQL's flexible data fetching capabilities alongside Azure's robust cloud infrastructure, we've created a scalable solution that streamlines the audio transcription process from upload to delivery. The integration of Apollo Server provides a clean, type-safe API layer that simplifies client-server communication, while Azure AI Speech Services ensures accurate transcription results. This architecture not only delivers a superior developer experience but also provides end users with a seamless, professional-grade audio transcription service.

References
- GraphQL – Language API
- What is GraphQL for Azure?
- Azure Key Vault secrets in JavaScript
- Azure AI Speech Fast transcription API
- CloudBlogger – MultiAgent Speech

KonstantinosPassadis · Jan 02, 2025 · Learn Expert · 76 Views · 2 likes · 0 Comments
Azure AI Speech Studio - synthesis failed
Hi, in my TTS project, all files created so far cause a failure when I hit the Play button. I get the following error message: Response status code does not indicate success: 400 (Synthesis failed. StatusCode: FailedPrecondition, Details: '=' is an unexpected token. The expected token is ';'. Line 1, position 535.). Connection ID: c2e319c0-c447-11ef-8937-33bd13f92760. Changing voices does not solve it. The location of the Speech service is Germany West Central.
kobaje · Dec 27, 2024 · Copper Contributor · 19 Views · 0 likes · 0 Comments
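One hedged hypothesis for this error: in XML-based SSML input, a raw & starts an entity reference and the parser then expects a closing ;, so an unescaped ampersand near position 535 (for example in a URL query string) would produce exactly this message. A minimal illustration, with placeholder voice name and text:

```xml
<!-- Hypothetical example: a raw "&" in SSML text breaks XML parsing.
     Writing "?a=1&b=2" unescaped fails; "&amp;" is the escaped form. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    Visit https://example.com/?a=1&amp;b=2 for details.
  </voice>
</speak>
```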
Hello MS team, I am taking AI-900 on Coursera. The course guides me to try the AI demos at https://aidemos.microsoft.com/, but the site seems to have been broken for weeks. According to the error message, it could be a backend issue. Could the MS team fix it, please? Best Regards, Dale
Dale_Cui · Dec 20, 2024 · Copper Contributor · 3.7K Views · 1 like · 11 Comments
Where can I talk with someone about Azure AI products for businesses?
Telecomboy · Dec 17, 2024 · Copper Contributor · 77 Views · 2 likes · 2 Comments
Azure inside Power BI/Fabric
Hello! I work for a human performance research group that delivers reports built in Fabric/Power BI, and we wish to add a chatbot/LLM element to the reports using Azure. This would let readers examine report data and ask immediate questions without having to contact us first. My first question is feasibility: is this possible? I assume yes? Second, we want the LLM to learn from the internal information and data our organization has collected. If we have some amount of data on 'recovery', we want it to use only that information and then be able to make generalizations or answer questions. Is this also possible through Azure? Thank you!
Jider11 · Nov 26, 2024 · Copper Contributor · 43 Views · 0 likes · 0 Comments
AI Foundry vs GitHub Marketplace
I was introduced to AI Foundry and the GitHub Model Marketplace at Ignite, and there seems to be some overlap when evaluating models. Can anybody give me use cases for each, how they might work in concert, whether AI Foundry can be used in lieu of the GitHub Marketplace, or anything else useful about the Venn diagram of these two products?
cjgall · Nov 20, 2024 · Copper Contributor · 114 Views · 0 likes · 1 Comment
Get Rewarded for Sharing Your Experience with Microsoft Azure AI
We invite our valued Microsoft Azure AI customers to share your firsthand experience developing with Azure AI by writing a review on Gartner Peer Insights. Your review will not only assist other developers and technical decision-makers but also help shape the future of our AI products. Thank you for your time and contribution; we are excited to hear your thoughts! To write a review and claim your reward, read our blog for next steps. You will receive a $25 gift card, a 3-month subscription to Gartner research, or a donation to a charitable cause as a token of our appreciation.
Carina_Bustos · Nov 15, 2024 · Microsoft · 6.4K Views · 7 likes · 2 Comments
Tags
- AMA (74 Topics)
- AI Platform (54 Topics)
- TTS (50 Topics)
- azure ai services (9 Topics)
- azure ai (6 Topics)
- Community (5 Topics)
- ai (5 Topics)
- azure openai (3 Topics)
- azure machine learning (3 Topics)
- AI Search (3 Topics)