To create these immersive audio experiences, the IAMF system transforms the input channel audio into an ordered set of [=Channel Groups=]. The channels that are not obtained by de-mixing are arranged in order of their channel layouts (CLs), from lowest to highest, to provide scalability.
For example, if CL #i = 7.1.4ch and CL #i-1 = 5.1.2ch, the de-mixed channels are D_Lrs7, D_Rrs7, D_Ltb4, and D_Rtb4: D_Lrs7 and D_Rrs7 are de-mixed using Lss7 and Rss7 of the i-th [=Channel Group=] together with the corresponding reconstructed channels of CL #i-1, and D_Ltb4 and D_Rtb4 are de-mixed using Ltf4 and Rtf4 of the i-th [=Channel Group=] in the same way.
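The sketch below is a non-normative illustration of this de-mixing step for the 7.1.4ch / 5.1.2ch example. It assumes the down-mix relations Ls5 = alpha*Lss7 + beta*Lrs7 and Ltf2 = Ltf4 + gamma*Ltb4 and simply inverts them; the parameter names `alpha`, `beta`, and `gamma` and the dictionary-based frame layout are illustrative assumptions, not definitions taken from the specification.

```python
import numpy as np

def demix_7_1_4_from_5_1_2(cg_i, recon_prev, alpha, beta, gamma):
    """Illustrative de-mixer for the CL #i = 7.1.4ch / CL #i-1 = 5.1.2ch example.

    cg_i       : dict of frames carried in the i-th Channel Group (Lss7, Rss7, Ltf4, Rtf4)
    recon_prev : dict of reconstructed 5.1.2 frames from CL #i-1 (Ls5, Rs5, Ltf2, Rtf2)
    alpha, beta, gamma : de-mix weights taken from the demixing information
                         (hypothetical names; the real parameters are signaled per frame)

    Assumes the down-mix relations Ls5 = alpha*Lss7 + beta*Lrs7 and
    Ltf2 = Ltf4 + gamma*Ltb4, so the de-mix is their inversion.
    """
    d = {}
    d["D_Lrs7"] = (recon_prev["Ls5"] - alpha * cg_i["Lss7"]) / beta
    d["D_Rrs7"] = (recon_prev["Rs5"] - alpha * cg_i["Rss7"]) / beta
    d["D_Ltb4"] = (recon_prev["Ltf2"] - cg_i["Ltf4"]) / gamma
    d["D_Rtb4"] = (recon_prev["Rtf2"] - cg_i["Rtf4"]) / gamma
    return d
```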
Recon_Gain(k, i) is calculated from signal powers for frame #k of de-mixed channel #i: the power of the original channel O_k is compared with that of the de-mixed channel D_k, which is derived from the corresponding down-mixed channel of the previous CL (M_k). If the power ratio 10*log10(power of O_k / power of D_k) is less than a threshold (e.g. -6 dB), then Recon_Gain(k, i) is set to the value that makes power of O_k = (Recon_Gain(k, i))^2 * power of D_k. Otherwise, Recon_Gain(k, i) is set to 1. The actual value delivered is [=recon_gain=].
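A minimal sketch of this computation follows. It assumes that the signal power is the mean square of the frame samples and that [=recon_gain=] is the Recon_Gain value quantized to 8 bits (scaled by 255); both assumptions are illustrative rather than normative.

```python
import numpy as np

def compute_recon_gain(o_k: np.ndarray, d_k: np.ndarray,
                       threshold_db: float = -6.0):
    """Illustrative Recon_Gain computation for frame #k of de-mixed channel #i.

    o_k : samples of the original channel for this frame
    d_k : samples of the de-mixed channel for this frame
    Returns (Recon_Gain(k, i), recon_gain), where recon_gain is assumed here
    to be the gain quantized to an 8-bit value (scaled by 255).
    """
    eps = 1e-12                                    # avoid log/division by zero
    power_o = float(np.mean(o_k.astype(np.float64) ** 2))
    power_d = float(np.mean(d_k.astype(np.float64) ** 2))

    ratio_db = 10.0 * np.log10((power_o + eps) / (power_d + eps))
    if ratio_db < threshold_db:
        # Choose the gain that makes power(O_k) == Recon_Gain^2 * power(D_k).
        recon_gain_f = float(np.sqrt(power_o / (power_d + eps)))
    else:
        recon_gain_f = 1.0

    recon_gain_f = min(recon_gain_f, 1.0)          # keep the quantized value in range
    return recon_gain_f, int(round(recon_gain_f * 255))
```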
In our experiment, we used the IAMF system to merge two audio elements with different channel layouts: one in 5.1 surround sound and one in stereo. The merged element carried a total of 8 channels (6 for the 5.1 element plus 2 for the stereo element). We then applied the demixing information from the [=Parameter Block OBU=] to separate the individual channels, which produced greater spatial separation between sounds and a more immersive result.
Our results show that using IAMF with demixing information significantly improved lip-sync accuracy compared to traditional methods (see Table V), likely because IAMF provides better channel separation, which helps keep audio and video synchronized. In addition, applying the mix gain parameters from the [=Parameter Block OBU=] improved overall realism, since they let us adjust the volume of individual channels according to their importance in the scene (see Table V).
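As a rough illustration of this last step, the sketch below applies a per-frame mix gain to each element before combining them into the 8-channel result described above. The dB-to-linear conversion (10^(dB/20)), the function names, and the channel ordering are assumptions made for the example, not definitions from the IAMF specification.

```python
import numpy as np

def apply_mix_gain(frames: np.ndarray, mix_gain_db: float) -> np.ndarray:
    """Scale all channels of one element's frame by a mix gain given in dB.

    frames      : array of shape (num_channels, num_samples)
    mix_gain_db : gain for this frame (assumed here to be signaled in dB)
    """
    linear_gain = 10.0 ** (mix_gain_db / 20.0)     # dB -> linear amplitude
    return frames * linear_gain

def merge_elements(surround_5_1: np.ndarray, stereo: np.ndarray,
                   gain_db_5_1: float, gain_db_stereo: float) -> np.ndarray:
    """Toy merge of a 5.1 element (6 ch) and a stereo element (2 ch) into a
    single 8-channel result, with each element scaled by its own mix gain.

    Channel ordering within each element is assumed, not normative.
    """
    return np.concatenate([
        apply_mix_gain(surround_5_1, gain_db_5_1),
        apply_mix_gain(stereo, gain_db_stereo),
    ], axis=0)
```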
In summary, merging the 5.1 and stereo elements into a single 8-channel presentation and applying the demixing and mix gain information from the [=Parameter Block OBU=]s provided greater spatial separation between sounds and improved lip-sync accuracy compared to traditional methods.