Suggestions for full xHE-AAC compliance (IPFs, IFs, MPEG-4 ISO file format)
Hi Christian,
I've been playing around with the xHE-AAC format lately.
My suggestion for Exhale development is to achieve full xHE-AAC MPEG-4 ISO compliance:
- Exhale should write at least one IPF (Immediate Playout Frame) at the beginning of the file for native gapless xHE-AAC playback (gapless without an edit list)
- Only IPFs are true random-access points, so IPFs (and only IPFs!) should be written to the stss atom
- IFs should be listed in the sbgp atom as "AudioPreRollEntries", with the correct "RollDistance" for 'prol' (standard delay [encoder + decoder delay] / frame length) in the sgpd atom
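
To make the stss and roll-distance points a bit more concrete, here is a rough Python sketch (not Exhale code, just an illustration). It serializes a SyncSampleBox from a list of IPF sample numbers and computes the roll distance as standard delay divided by frame length. The function names and the example delay values are made up, rounding up to whole frames is my assumption, and the sign convention and the sbgp/sgpd serialization are left out.

```python
import math
import struct

def build_stss(ipf_sample_numbers):
    """Serialize a SyncSampleBox ('stss') from 1-based sample numbers of IPFs."""
    entries = sorted(set(ipf_sample_numbers))
    payload = struct.pack(">I", 0)              # version (1 byte) + flags (3 bytes), all zero
    payload += struct.pack(">I", len(entries))  # entry_count
    for number in entries:
        payload += struct.pack(">I", number)    # sample_number of one sync sample
    # Box header: 32-bit size (including the 8-byte header) followed by the type
    return struct.pack(">I4s", 8 + len(payload), b"stss") + payload

def roll_distance_frames(standard_delay, frame_length):
    """Magnitude of the roll distance in frames (assumption: round up)."""
    return math.ceil(standard_delay / frame_length)

# Hypothetical example: IPFs at samples 1 and 25
box = build_stss([1, 25])
print(box.hex())
# Hypothetical delay of 3000 samples with a frame length of 2048
print(roll_distance_frames(3000, 2048))
```

Only the IPF sample numbers go into the list here; the IFs would be signaled separately through the sample group boxes.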
https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/wp/FraunhoferIIS_xHE-AAC_Whitepaper.pdf
"File-format (ISOBMFF/MP4FF) Storage of xHE-AAC in the ISO base media file format (ISOBMFF)[ISOBMFF] follows the same principles as AAC-LC and HE-AAC, i.e. the MP4 file format [MP4FF] is used. All IPFs are signaled by means of the “SyncSampleBox”. IPFs allow the decoder to fully reconstruct the signal without any previous AUs, which enables true random access at any sync sample. This is particularly useful when a flat MP4 file is used as input to a streaming system for subsequent fragmentation. Signaling of IPF is mandatory for xHE-AAC. Since the xHE-AAC encoder works on a fixed “granule” of e.g. 2048 audio samples, the last AU of an MP4 file usually represents only the last few samples of the original WAV file. In order to restore this original file length, an edit-list can be used to trim the MP4 file accordingly. It is recommended that an xHE-AAC file starts with an IPF, which addresses the “priming” issue (see above) and removes the need for edit lists at the start of the item. In addition to the rather expensive IPFs, all AUs that have the usacIndependencyFlag set to 1 can be used to enable random access, e.g. for seeking operations. While these Independency Frames (IF) can be used to start decoding, a full audio signal is guaranteed only after decoding a certain number of AUs. This is referred to as roll distance in file format terms and can be signaled using the AudioPreRollEntry and the AudioSampleGroupEntry respectively."
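
As a small illustration of the "granule" remark in the quote, this Python sketch computes how many AUs a given input length needs and how many padding samples the last AU carries, which is what an edit list at the end would have to trim. It assumes a granule of 2048 samples and that the file starts with an IPF, so nothing needs trimming at the start; the function name and the sample count are hypothetical.

```python
def trailing_padding(total_samples, granule=2048):
    """Return (au_count, padding_samples) for an input of total_samples."""
    au_count = -(-total_samples // granule)       # ceiling division
    padding = au_count * granule - total_samples  # surplus samples in the last AU
    return au_count, padding

# Hypothetical input of 100000 samples
au_count, padding = trailing_padding(100000)
print(au_count, padding)  # 49 352
```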
Otherwise, looks good, sounds great. I am excited to see/hear your implementation of eSBR for the low bitrate modes.
Jukka Poikolainen Poikosoft https://www.poikosoft.com