Accelerate PyTorch transformer model training with ONNX Runtime – a deep dive