I was able to successfully build the container using the Dockerfiles after removing the parallel flag (a sketch of that change follows the build output below). It took about 30 minutes.
docker build --tag phi3_vision .
[+] Building 1963.8s (8/8) FINISHED docker:default
=> [internal] load build definition from dockerfile 0.0s
=> => transferring dockerfile: 134B 0.0s
=> [internal] load metadata for docker.io/dustynv/onnxruntime:r36.2.0 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 856B 0.0s
=> CACHED [1/3] FROM docker.io/dustynv/onnxruntime:r36.2.0 0.0s
=> [2/3] COPY build_genai.sh /tmp/genai/ 0.0s
=> [3/3] RUN /tmp/genai/build_genai.sh 1922.4s
=> exporting to image 41.1s
=> => exporting layers 41.0s
=> => writing image sha256:82ebcdac2fc6a77810c1a10c8af3465107b53fe1ad2df 0.0s
=> => naming to docker.io/library/phi3_vision 0.0s
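For anyone hitting the same thing, the change was along these lines; I'm sketching it with a plain CMake invocation, so the exact command in build_genai.sh may differ from this:

# before: parallel build across all cores
#   cmake --build build --config Release --parallel
# after: without the flag the build runs single-job, which is slower
# (hence the ~30 minutes above) but completes
cmake --build build --config Release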
The .whl file also installed correctly into Dusty's onnxruntime image.
However, I think the build_genai.sh file forgets to include
pip3 install /ort/*.whl
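For context, the tail of the script could then look something like this (just a sketch; /ort is where the wheel ends up in my build, so adjust the path if yours differs):

# end of build_genai.sh: install the wheel the build just produced,
# so onnxruntime_genai is importable inside the image
ls /ort/*.whl
pip3 install /ort/*.whl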
Before installing the wheel manually, I got:
Traceback (most recent call last):
File "/home/phi3v.py", line 9, in <module>
import onnxruntime_genai as og
ModuleNotFoundError: No module named 'onnxruntime_genai'
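After installing the wheel, a quick sanity check confirms the module resolves:

python3 -c "import onnxruntime_genai as og; print('onnxruntime_genai imported OK')"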
Unfortunately, after installing the .whl file and running the example script, I got a different error than the one you posted:
$ python3 phi3v.py -m cuda-int4-rtn-block-32
Loading model...
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what(): /opt/onnxruntime/onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/cudaDriverWrapper.cc:42 onnxruntime::contrib::cuda::CUDADriverWrapper::CUDADriverWrapper() handle != nullptr was false.
Aborted (core dumped)
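Reading the assert, CUDADriverWrapper is failing to get a handle to the CUDA driver library (libcuda), so my guess is the container can't see it. A quick way to verify is something like the following; on Jetson, the NVIDIA runtime is what mounts the host driver libraries into the container:

# check that the CUDA driver library is visible inside the container
docker run --rm --runtime nvidia phi3_vision \
    /bin/bash -c 'ldconfig -p | grep libcuda'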
I went ahead and installed cuDNN 9 but received the same error.
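To rule out the cuDNN side, a similar check shows which cuDNN libraries the container actually resolves:

# list the cuDNN libraries on the loader path inside the container
docker run --rm --runtime nvidia phi3_vision \
    /bin/bash -c 'ldconfig -p | grep libcudnn'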