The MP4 mux now supports re-ordered frames using a CTTS table.

Timestamps in an MP4 file are derived from a list of durations. The time for each sample is simply the sum of the durations of all samples that come before it. This works fine for most types of data in MP4 files, but it is not sufficient when frames must be decoded in a different order from the one in which they are displayed.
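As a small illustration of the sum-of-durations rule, here is a Python sketch. The function name and the sample values are made up for the example and are not taken from the mux source.

```python
from itertools import accumulate

def decode_times(durations):
    """Return the start time of each sample: the sum of all earlier durations."""
    # initial=0 places the first sample at time zero. The last running total
    # is the end of the final sample, not a sample start, so it is dropped.
    return list(accumulate(durations, initial=0))[:-1]

# Three frames of 40 units each (25 fps with a timescale of 1000).
print(decode_times([40, 40, 40]))
```

Only the durations are stored in the file; the times themselves are recomputed by whoever reads it.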

Many video compression schemes (such as MPEG's IPB mechanism) allow frames to refer to reference frames that have not yet been displayed. For example, a typical MPEG-2 video sequence has I and P reference frames bracketing several B frames; the B frames contain only the changes from the two reference frames. This means that the decoder must decode the reference frame at the end of the group before it can decode the B frames in the middle. To allow this, the compressed frames are stored in decode order rather than display order, and each frame has two timestamps: one indicating the time to decode the frame, and the other indicating the time to present it. This frame re-ordering is supported in MPEG-4 video, but is very rarely used. It is, however, sometimes used with H.264 video.
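The re-ordering can be sketched in a few lines of Python. This is a simplified model that assumes each B frame depends only on the nearest I or P frame on either side; the frame labels and the function name are illustrative only.

```python
def decode_order(display_frames):
    """Reorder frame labels from display order into decode order.

    Each B frame is held back until the next I or P frame has been emitted,
    because the decoder needs that later reference frame first.
    """
    out, pending_b = [], []
    for frame in display_frames:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            out.append(frame)      # reference frame goes first
            out.extend(pending_b)  # then the B frames that precede it on screen
            pending_b.clear()
    out.extend(pending_b)  # trailing B frames with no later reference frame
    return out

print(decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
```

In this model, the frame labelled P3 is decoded second but displayed fourth, which is why a single timestamp per frame is no longer enough.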

The MP4 sum-of-durations time gives the decode time. If the frames are not re-ordered, then this is also the presentation time. For re-ordered frames, however, each frame needs an offset that converts its decode time into its presentation time; in an MP4 file, this offset is called the composition time offset and is stored in the CTTS table. The MP4 demux has always supported this table; until now, however, the mux was unable to create it.
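A rough Python sketch of how such a table could be built follows. It assumes the version 0 form of the table, in which offsets are unsigned, so all offsets are shifted by a constant to keep them non-negative. The function name is hypothetical and the real mux may handle the shift differently.

```python
from itertools import groupby

def ctts_entries(decode_times, presentation_times):
    """Build run-length (sample_count, sample_offset) pairs for a CTTS table."""
    # Composition offset: presentation time minus decode time, per sample.
    offsets = [p - d for p, d in zip(presentation_times, decode_times)]
    # Assumption: offsets must not be negative, so delay every presentation
    # time by the size of the most negative offset.
    shift = max(0, -min(offsets, default=0))
    shifted = (o + shift for o in offsets)
    # Consecutive samples sharing an offset collapse into a single entry.
    return [(len(list(run)), value) for value, run in groupby(shifted)]

# Frames stored as I0 P3 B1 B2, each lasting 40 units.
dts = [0, 40, 80, 120]   # sum of durations, in stored order
pts = [0, 120, 40, 80]   # display times of the same four frames
print(ctts_entries(dts, pts))
```

The raw offsets here are 0, 80, -40 and -40, so the sketch shifts them all by 40 before run-length encoding them.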

In DirectShow, the mux filter does not receive decode times. The timestamps attached to the IMediaSample objects are presentation times. Each sample has both a start and end presentation time, but in many multiplexing graphs, these are not very useful: the end time is either wrong (typically start+1) or it is simply derived from the start of the next sample. For this reason, the mux ignores the sample end times and creates the duration table just from the sample start times. However, in the case of frame re-ordering, it is possible to recreate the decode time from the timestamps if both start and end times are valid.
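One plausible way to recover decode times from valid start and end times is sketched below in Python. It assumes samples arrive in decode order and that each sample's duration is its end time minus its start time. This is an illustration of the idea, not the mux's actual code, and the function name is invented.

```python
def rebuild_decode_times(samples):
    """Recreate decode times from (start, stop) presentation times.

    `samples` is a list of (start, stop) pairs in arrival order, which is
    assumed here to be the decode order.
    """
    decode_times, clock = [], 0
    for start, stop in samples:
        decode_times.append(clock)
        clock += stop - start  # a valid stop time gives the true duration
    return decode_times

# Frames arriving as I0 P3 B1 B2 with presentation start and stop times.
arrivals = [(0, 40), (120, 160), (40, 80), (80, 120)]
print(rebuild_decode_times(arrivals))
```

If the end times were the "start+1" placeholders, every duration would come out as 1 and the result would be useless, which is why this approach only works when both times are valid.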

The latest version of the mux looks at the first few sample timestamps and decides which mode to use. If both start and end times look reasonable and some of the start times are out of order, the mux will create a CTTS table. Otherwise the mux will behave as before, using just the sample start times to create only the duration table.
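The decision could look something like the Python sketch below. The number of samples probed and the test for a "reasonable" end time are guesses made for the example; the mux's real thresholds are not stated here, and the function name is hypothetical.

```python
def needs_ctts(samples, probe=8):
    """Decide whether to write a CTTS table from the first few samples.

    `samples` is a list of (start, stop) presentation times in arrival order.
    `probe` is an assumed number of leading samples to inspect.
    """
    head = samples[:probe]
    # Assumption: an end time is "reasonable" if it is more than the
    # start+1 placeholder that many upstream filters supply.
    ends_valid = all(stop - start > 1 for start, stop in head)
    # Any start time earlier than its predecessor means frames are re-ordered.
    reordered = any(b[0] < a[0] for a, b in zip(head, head[1:]))
    return ends_valid and reordered

print(needs_ctts([(0, 40), (120, 160), (40, 80), (80, 120)]))  # re-ordered, valid ends
print(needs_ctts([(0, 1), (120, 121), (40, 41)]))              # placeholder end times
```

When the check fails, the fallback is the original behaviour: durations are taken from the differences between successive start times and no CTTS table is written.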

25 March 2009