Mar 05 2023 04:44 AM
Hi,
I am hoping to run my Python code, which uses large AI models, over large videos. Can someone please advise me on the best Azure technology to use?
Currently I host my FastAPI Python application on a web app (CPU only), and I am facing challenges since requests need to complete within 230 seconds. We have a microservices architecture and can easily deploy different APIs (i.e. run long-running tasks on different infrastructure).
However, the one task that takes extremely long can't really be batched or broken down effectively, meaning Durable Functions are not an option. In addition, the code must run on a GPU.
Naturally, this makes me think AKS could be an option. However, I see online that you can use an NVIDIA Triton server via Azure ML for real-time inference.
The requirements are:
- Users can upload a video (1080p, up to 1 hour long) from our website.
- A GPU processes the video, extracting information.
- Processing may take longer than 230 seconds; there shouldn't be a timeout for long requests.
- Scalable: handle multiple requests from multiple users. Right now a 1-hour video can be processed in about 3 minutes max with a large model; I'm guessing that depending on location, internet speed, and the number of concurrent requests, this could extend.
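To make the "no timeout for long requests" requirement concrete, here is a minimal, stdlib-only sketch of the submit-then-poll pattern that sidesteps a per-request timeout: the upload request only enqueues a job and returns an id immediately, a worker (which would live on GPU infrastructure) processes it, and the client polls for status. The function and field names (`process_video`, `submit`, `status`) are illustrative placeholders, not any Azure or FastAPI API; in a real service `submit` and `status` would sit behind FastAPI endpoints and the queue would be an external service such as Azure Storage Queues or Service Bus.

```python
import queue
import threading
import time
import uuid

jobs = {}                 # job_id -> {"status": ..., "result": ...}
work_queue = queue.Queue()

def process_video(path):
    # Placeholder for the long-running GPU inference over the video.
    time.sleep(0.1)
    return f"extracted info from {path}"

def worker():
    # Long-running consumer; in production this runs on a GPU node,
    # decoupled from the web tier, so no HTTP request waits on it.
    while True:
        job_id, path = work_queue.get()
        jobs[job_id]["status"] = "running"
        jobs[job_id]["result"] = process_video(path)
        jobs[job_id]["status"] = "done"
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(path):
    # What a POST /videos endpoint would do: enqueue and return fast.
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    work_queue.put((job_id, path))
    return job_id

def status(job_id):
    # What a GET /videos/{job_id} endpoint would return for polling.
    return jobs[job_id]["status"], jobs[job_id]["result"]

job = submit("match.mp4")
work_queue.join()         # in real use the client polls instead of blocking
print(status(job))
```

With this shape, the 230-second web-app limit stops mattering: the only synchronous HTTP work is enqueueing and polling, both of which return in milliseconds regardless of how long the GPU job runs.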
Sometimes I am surprised that companies can open up something like ChatGPT to the public and have millions of users make requests at once. How are they building these real-time inference systems?
Thanks,