It seems the GPU is slower than the CPU. What could be the reason?
(base) PS C:\Projects\Virtualization-Documentation\windows-container-samples\directx> docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 winml-runner
Created LearningModelDevice with CPU device
Created LearningModelDevice with GPU: Intel(R) UHD Graphics 630
Loading model (path = C:\App\tiny_yolov2\model.onnx)...
=================================================================
Name: Example Model
Author: OnnxMLTools
Version: 0
Domain: onnxconverter-common
Description: The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242
Path: C:\App\tiny_yolov2\model.onnx
Support FP16: false
Input Feature Info:
Name: image
Feature Kind: Image (Height: 416, Width: 416)
Output Feature Info:
Name: grid
Feature Kind: Float
=================================================================
Binding (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Binding and Evaluating 999 more times...
Results (device = CPU, numIterations = 1000, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):
First Iteration Performance (load, bind, session creation, and evaluate):
Load: 86.8923 ms
Bind: 0.1449 ms
Session Creation: 40.6742 ms
Evaluate: 34.9017 ms
Working Set Memory usage (evaluate): 27.8672 MB
Working Set Memory usage (load, bind, session creation, and evaluate): 155.789 MB
Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 187.664 MB
Dedicated Memory usage (evaluate): 0 MB
Dedicated Memory usage (load, bind, session creation, and evaluate): 0 MB
Shared Memory usage (evaluate): 0 MB
Shared Memory usage (load, bind, session creation, and evaluate): 0 MB
Average Performance excluding first iteration. Iterations 2 to 1000. (Iterations greater than 1 only bind and evaluate)
Average Bind: 0.0815504 ms
Average Evaluate: 26.6584 ms
Average Working Set Memory usage (bind): 3.91016e-06 MB
Average Working Set Memory usage (evaluate): 0.00594735 MB
Average Dedicated Memory usage (bind): 0 MB
Average Dedicated Memory usage (evaluate): 0 MB
Average Shared Memory usage (bind): 0 MB
Average Shared Memory usage (evaluate): 0 MB
Binding (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Binding and Evaluating 999 more times...
Results (device = GPU, numIterations = 1000, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):
First Iteration Performance (load, bind, session creation, and evaluate):
Load: 86.8923 ms
Bind: 1.3861 ms
Session Creation: 4486.82 ms
Evaluate: 38.5659 ms
Working Set Memory usage (evaluate): 65.3438 MB
Working Set Memory usage (load, bind, session creation, and evaluate): 245.676 MB
Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 186.244 MB
Dedicated Memory usage (evaluate): 0 MB
Dedicated Memory usage (load, bind, session creation, and evaluate): 0 MB
Shared Memory usage (evaluate): 0 MB
Shared Memory usage (load, bind, session creation, and evaluate): 0 MB
Average Performance excluding first iteration. Iterations 2 to 1000. (Iterations greater than 1 only bind and evaluate)
Average Bind: 0.260335 ms
Average Evaluate: 27.0888 ms
Average Working Set Memory usage (bind): 0 MB
Average Working Set Memory usage (evaluate): -0.0607131 MB
Average Dedicated Memory usage (bind): 0 MB
Average Dedicated Memory usage (evaluate): 0 MB
Average Shared Memory usage (bind): 0 MB
Average Shared Memory usage (evaluate): 0 MB
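To compare the two runs as a whole, here is a minimal sketch that totals the timings reported above. It assumes the per-iteration averages hold for all 999 follow-up iterations, so the totals are estimates rather than measured wall-clock times. The dictionary keys and helper names are made up for illustration.

```python
# Timings in ms, copied from the winml-runner output above.
# Key names are illustrative, not winml-runner terminology.
RESULTS = {
    "CPU": {"load": 86.8923, "bind": 0.1449, "session": 40.6742, "evaluate": 34.9017,
            "avg_bind": 0.0815504, "avg_evaluate": 26.6584},
    "GPU": {"load": 86.8923, "bind": 1.3861, "session": 4486.82, "evaluate": 38.5659,
            "avg_bind": 0.260335, "avg_evaluate": 27.0888},
}
NUM_ITERATIONS = 1000


def first_iteration_ms(r):
    # First iteration includes load, bind, session creation, and evaluate.
    return r["load"] + r["bind"] + r["session"] + r["evaluate"]


def per_iteration_ms(r):
    # Iterations 2..N only bind and evaluate.
    return r["avg_bind"] + r["avg_evaluate"]


def steady_state_ms(r, n=NUM_ITERATIONS):
    # Estimated time for iterations 2..n, assuming the reported averages hold.
    return (n - 1) * per_iteration_ms(r)


def total_ms(r):
    return first_iteration_ms(r) + steady_state_ms(r)


if __name__ == "__main__":
    for device, r in RESULTS.items():
        print(f"{device}: first={first_iteration_ms(r):.1f} ms, "
              f"per-iteration={per_iteration_ms(r):.3f} ms, "
              f"steady={steady_state_ms(r):.1f} ms, total={total_ms(r):.1f} ms")
```

With these numbers, the one-time GPU session creation (about 4.5 s versus about 41 ms on the CPU) outweighs the roughly 0.6 s difference accumulated over the remaining 999 iterations. Per iteration, the two devices are close: about 26.7 ms on the CPU and 27.3 ms on the GPU.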