I was able to successfully build the container using the Dockerfiles after removing the parallel flag (a sketch of that change follows the build output below). It took about 30 minutes.
docker build --tag phi3_vision .
[+] Building 1963.8s (8/8) FINISHED docker:default
=> [internal] load build definition from dockerfile 0.0s
=> => transferring dockerfile: 134B 0.0s
=> [internal] load metadata for docker.io/dustynv/onnxruntime:r36.2.0 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 856B 0.0s
=> CACHED [1/3] FROM docker.io/dustynv/onnxruntime:r36.2.0 0.0s
=> [2/3] COPY build_genai.sh /tmp/genai/ 0.0s
=> [3/3] RUN /tmp/genai/build_genai.sh 1922.4s
=> exporting to image 41.1s
=> => exporting layers 41.0s
=> => writing image sha256:82ebcdac2fc6a77810c1a10c8af3465107b53fe1ad2df 0.0s
=> => naming to docker.io/library/phi3_vision 0.0s
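For anyone hitting the same thing, the change was along these lines; I'm sketching it with a plain CMake invocation, so the exact command in build_genai.sh may differ from this:

# before: parallel build across all cores
#   cmake --build build --config Release --parallel
# after: without the flag the build runs single-job, which is slower
# (hence the ~30 minutes above) but completes
cmake --build build --config Release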
The .whl file also installed correctly into Dusty's onnxruntime image.
However, I think the build_genai.sh file forgets to include
pip3 install /ort/*.whl
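For context, the tail of the script could then look something like this (just a sketch; /ort is where the wheel ends up in my build, so adjust the path if yours differs):

# end of build_genai.sh: install the wheel the build just produced,
# so onnxruntime_genai is importable inside the image
ls /ort/*.whl
pip3 install /ort/*.whl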
Before installing the wheel manually, I got:
Traceback (most recent call last):
File "/home/phi3v.py", line 9, in <module>
import onnxruntime_genai as og
ModuleNotFoundError: No module named 'onnxruntime_genai'
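After installing the wheel, a quick sanity check confirms the module resolves:

python3 -c "import onnxruntime_genai as og; print('onnxruntime_genai imported OK')"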
Unfortunately, after installing the .whl file and running the example script, I got a different error than the one you posted:
$ python3 phi3v.py -m cuda-int4-rtn-block-32
Loading model...
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what(): /opt/onnxruntime/onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/cudaDriverWrapper.cc:42 onnxruntime::contrib::cuda::CUDADriverWrapper::CUDADriverWrapper() handle != nullptr was false.
Aborted (core dumped)
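Reading the assert, CUDADriverWrapper is failing to get a handle to the CUDA driver library (libcuda), so my guess is the container can't see it. A quick way to verify is something like the following; on Jetson, the NVIDIA runtime is what mounts the host driver libraries into the container:

# check that the CUDA driver library is visible inside the container
docker run --rm --runtime nvidia phi3_vision \
    /bin/bash -c 'ldconfig -p | grep libcuda'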
I went ahead and installed cuDNN 9 but received the same error.
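To rule out the cuDNN side, a similar check shows which cuDNN libraries the container actually resolves:

# list the cuDNN libraries on the loader path inside the container
docker run --rm --runtime nvidia phi3_vision \
    /bin/bash -c 'ldconfig -p | grep libcudnn'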