This isn't a real-world test, because the machine you selected has 1200 MiB/s of throughput. That is all well and good for a company that can afford a 64-core machine and the SQL license for it, but in real life the most commonly deployed server is going to be an 8 to 16 core machine. Striping 2 P30s to get 10K provisioned IOPS (PIOPS) sounds great, but let's look at the machine you chose. The E64_v3 has a max uncached disk throughput of 80000 IOPS / 1200 MBps for 64 cores. A more realistic DS13_V2 has 25600 IOPS / 384 MBps.
That 384 number is what I am concerned about. A single P30 disk can do 5000 PIOPS and has a provisioned throughput of 200 MiB/s. So if you choose a server with fewer cores, such as the DS13_V2, and stripe 2 P30s, your combined disk throughput (2 x 200 = 400) already exceeds what the VM allows (384). Since you had room to go, you could add more P30s. It is very interesting that "doubling" your P30s still only got you 200 MiB/s of throughput, which means you needed 4 P30s to reach the published provisioned throughput of a single P30. That raises the question: with 4 P30s, why did you cap just under the published throughput limit of a single P30 when you should have seen more?
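To make the arithmetic concrete, here is a rough sketch of how I would expect the ceilings to work out, assuming the effective limit is simply the lower of the VM cap and the sum of the striped disks. The function name `effective_limits` is my own, and the model ignores caching, bursting, and stripe overhead. The figures are the ones quoted above, with throughput units taken as published.

```python
def effective_limits(vm_iops, vm_mbps, disk_iops, disk_mbps, disk_count):
    """Return (iops, mbps) ceilings for a stripe of identical disks on one VM.

    Assumption: the ceiling is the smaller of the VM's uncached cap and the
    summed provisioned limits of the disks. Real results can be lower.
    """
    striped_iops = disk_iops * disk_count
    striped_mbps = disk_mbps * disk_count
    return min(vm_iops, striped_iops), min(vm_mbps, striped_mbps)


# Published figures quoted in this comment.
P30 = {"disk_iops": 5000, "disk_mbps": 200}
VMS = {
    "E64_v3": {"vm_iops": 80000, "vm_mbps": 1200},
    "DS13_V2": {"vm_iops": 25600, "vm_mbps": 384},
}

for name, vm in VMS.items():
    for count in (1, 2, 4):
        iops, mbps = effective_limits(disk_count=count, **vm, **P30)
        print(f"{name} with {count} x P30: {iops} IOPS, {mbps} throughput")
```

Under this model 2 P30s on the DS13_V2 are already held to 384 by the VM, while 4 P30s on the E64_v3 should be able to reach 800, which is why a result of just under 200 looks odd to me.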