JamesKehr Yeah, it's a common misconception that a bigger number means better. In iperf's case, the number indicates different development teams with different goals. Iperf 3 is a misnomer, as it's not a follow-on to the original iperf. Iperf 2 is based on the original code.
I'd suggest trying iperf 2 on Windows. We're doing some work on 64-bit binaries for 2.2.1, which should be released within a few weeks. Some engineers at Meta are helping. It might be worth evaluating once it's out.
Also, something you might want to mention is that only iperf 2 has statistical verification. We test across lots of test beds and over multiple days before doing a release. The tool needs to come up with the same numbers; otherwise it's broken. We use https://www.itl.nist.gov/div898/handbook/mpc/section2/mpc221.htm to make sure the tool performs the same. My guess is most other tools don't do this level of qualification of the tool itself.
Finally, it might be good if Windows systems added support for latency testing. This is nontrivial but very important. All the features can be found in the man page at https://iperf2.sourceforge.io/iperf-manpage.html. You can reach out to me on SourceForge if Windows developers want to help take this on.