Hi hongsod, thanks for your reply.
It's not just about bandwidth, but also about low-pass cutoffs that are almost always present to reduce the load on audio codecs. The factors influencing digital audio quality are:
- bitrate
- Variable bitrate constraints
- actual codec used (modern codecs such as Opus should be near CD quality at 128 kbit/s, but older ones such as MP3, Vorbis and SILK aren't)
- VBR vs. CBR (the former is harder to manage in unreliable video streams)
- Bandpass filters optimized for voice (=narrowband) vs. music (=wideband)
Stop by HydrogenAudio for some excellent discussions of digital audio quality, or Opus Codec (opus-codec.org) for something more specific.
Of course the audio I/O hardware plays an important role. But our clients typically use premium Surface devices or external mics designed for ASMR.
As mentioned, "impulse" or "noise" type sounds need very high accuracy to sound ... real. "S" and "T" sounds are classic examples, as are cymbals and other percussion. Breathing sounds are also hard to reproduce accurately with typical filters.
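To make the cutoff point concrete, here is a rough sketch of which parts of a sound survive each Opus bandwidth mode. The mode cutoffs are the audio bandwidths listed in the Opus specification (RFC 6716). The example frequencies are my own ballpark assumptions, not measurements, and a real filter rolls off gradually instead of cutting hard like this.

```python
# Highest reproduced frequency (Hz) for each Opus bandwidth mode (RFC 6716).
OPUS_BANDWIDTH_HZ = {
    "narrowband": 4000,
    "mediumband": 6000,
    "wideband": 8000,
    "super-wideband": 12000,
    "fullband": 20000,
}

# ASSUMPTION: rough, illustrative frequencies only (hypothetical values).
EXAMPLE_COMPONENTS_HZ = {
    "voice fundamental": 200,
    "'S' sibilance": 7000,
    "cymbal shimmer": 14000,
}


def surviving_components(cutoff_hz, components):
    """Return the names of components at or below an idealized hard cutoff."""
    return sorted(name for name, freq in components.items() if freq <= cutoff_hz)


def report():
    """Map each bandwidth mode to the example components it would keep."""
    return {
        mode: surviving_components(cutoff, EXAMPLE_COMPONENTS_HZ)
        for mode, cutoff in OPUS_BANDWIDTH_HZ.items()
    }


if __name__ == "__main__":
    for mode, kept in report().items():
        print(f"{mode:>15}: {', '.join(kept)}")
```

Under these assumptions, only the fullband mode keeps all three components, which is the whole point of asking for a music-grade setting.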
Also: AI is often designed to ELIMINATE a lot of the extraneous sounds we make while speaking in an attempt to make speech more understandable. 90% of the time, this is super useful. But in certain use cases, you WANT it to sound as if someone is whispering in your ear. Good benchmark: Norah Jones - "Nightingale". If you can make that transmit over Teams and still sound great - you're awesome.
We realize that UHQ audio is a fringe use case compared to "normal" Teams customers. But bear in mind that Zoom supports audibly better quality by default - without a performance or resource penalty. Modern devices can handle the minute extra CPU (NPU?) load easily and internet bandwidth shouldn't be an issue either. We're talking a few extra kbit/s, or 1-2 Mbit/s at most. And re-allocating some bits from video to audio shouldn't be hard either.
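A quick back-of-the-envelope sketch of the bandwidth argument, in case it helps. The 128 kbit/s target is the Opus figure mentioned earlier. The 32 kbit/s speech bitrate and the 1500 kbit/s video bitrate are purely my assumptions for illustration, not actual Teams values.

```python
def extra_audio_cost(old_audio_kbps, new_audio_kbps, video_kbps):
    """Return the extra audio bitrate and its size relative to the video bitrate."""
    extra_kbps = new_audio_kbps - old_audio_kbps
    return extra_kbps, extra_kbps / video_kbps


if __name__ == "__main__":
    # ASSUMPTION: hypothetical bitrates, not values taken from Teams.
    speech_kbps = 32    # assumed speech-optimized audio bitrate
    music_kbps = 128    # near-CD-quality Opus target from this post
    video_kbps = 1500   # assumed video bitrate in the same call

    extra, fraction = extra_audio_cost(speech_kbps, music_kbps, video_kbps)
    print(f"extra audio: {extra} kbit/s ({fraction:.1%} of the video bitrate)")
```

With these assumed numbers, the extra audio comes to 96 kbit/s, about 6.4% of the video bitrate, so taking it out of the video budget would barely be noticeable.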
You can assume that customers who care about HQ audio will also have matching hardware for it. 
Others would like to broadcast their school's music and band performances and have no other option because Teams is the only permitted calling tool.
Please consider this, along with an option to enable it by default - unilaterally, please! (= the meeting organizer determines the setting for everyone on the call by default; participants can optionally disable it)