About how the segment duration and each track's duration are calculated
Looking at the new track statistics tags generated by mkvmerge you may notice that none of the tracks' durations matches the segment duration in the global segment info block.
Calculating the segment duration
mkvmerge has always calculated the segment's duration (the file's duration) with a simple metric: it's the difference between the highest end timecode of all the packets in the segment and the lowest start timecode of all packets in the segment. The end timecode of a packet is calculated as its start timecode + its duration.
Note that the very last packet in the segment doesn't have to be the one with the highest end timecode. Due to the packet ordering of B frames (packets are ordered in decode order but timestamped in display order, meaning you can get sequences such as I@0ms P@120ms B@40ms B@80ms…), earlier packets may have higher end timecodes than the last packet.
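The calculation described above can be sketched in a few lines of Python. This is only an illustration, not mkvmerge's actual code: the `Packet` type and the `segment_duration` function are made-up names, and the 40ms duration per packet is an assumption added to the I/P/B example from the text.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet model: start timecode and duration in milliseconds.
    start_ms: int
    duration_ms: int


def segment_duration(packets):
    """Highest end timecode minus lowest start timecode over all packets."""
    lowest_start = min(p.start_ms for p in packets)
    highest_end = max(p.start_ms + p.duration_ms for p in packets)
    return highest_end - lowest_start


# Packets in decode order: I@0ms P@120ms B@40ms B@80ms, each assumed 40ms long.
packets = [Packet(0, 40), Packet(120, 40), Packet(40, 40), Packet(80, 40)]

last_end = packets[-1].start_ms + packets[-1].duration_ms
print(segment_duration(packets), last_end)
```

With these values the P frame, which is stored second, has the highest end timecode, so looking only at the last stored packet would underestimate the duration.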
Note also that the packet's duration is often not immediately obvious from looking at e.g. mkvinfo's output. A packet's duration can be derived from multiple factors:
an explicit duration element present in the packet's block structure,
the track's default duration,
the implicit difference between the current and the following packet in display order.
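As a rough sketch of how these three sources could be combined, the following Python function picks the first one that is available. The function name, its parameters and in particular the order of precedence are assumptions made for this illustration; the text above only lists the possible sources and does not say which one wins.

```python
from typing import Optional


def effective_duration(
    start_ms: int,
    explicit_ms: Optional[int] = None,
    default_ms: Optional[int] = None,
    next_start_ms: Optional[int] = None,
) -> Optional[int]:
    """Return a packet's duration in ms, or None if it cannot be derived.

    Assumed precedence (not confirmed by the text): explicit duration,
    then the track's default duration, then the gap to the following
    packet in display order.
    """
    if explicit_ms is not None:
        return explicit_ms
    if default_ms is not None:
        return default_ms
    if next_start_ms is not None:
        return next_start_ms - start_ms
    return None


def next_start_in_display_order(starts, start_ms):
    """Smallest start timecode greater than start_ms, or None for the last one."""
    later = [s for s in starts if s > start_ms]
    return min(later) if later else None


# Start timecodes in decode order, as in the B frame example.
starts = [0, 120, 40, 80]
gap = effective_duration(40, next_start_ms=next_start_in_display_order(starts, 40))
print(gap)
```

Note that the following packet is looked up in display order, not in the order in which the packets are stored.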
Calculating each track's duration
The method for calculating a track's duration matches the one for calculating the segment's duration: it's the difference between the highest end timecode of all the packets from this track in the segment and the lowest start timecode of all packets from this track in the segment. The only difference is that only packets for this track are considered during the calculation while the calculation for the segment's duration looks at all packets regardless of which track they belong to.
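To make the comparison concrete, here is a small Python sketch that computes both kinds of duration in one pass. The function name, the tuple layout and all sample values are invented for this example and are not taken from mkvmerge.

```python
def durations(packets):
    """Compute per-track durations and the segment duration.

    packets: iterable of (track_number, start_ms, duration_ms) tuples.
    Returns a dict keyed by track number, plus the key "segment".
    """
    spans = {}  # key -> (lowest start, highest end)
    for track, start, duration in packets:
        end = start + duration
        # Each packet counts for its own track and for the whole segment.
        for key in (track, "segment"):
            lowest, highest = spans.get(key, (start, end))
            spans[key] = (min(lowest, start), max(highest, end))
    return {key: highest - lowest for key, (lowest, highest) in spans.items()}


# Made-up sample: track 1 = video, track 2 = audio, track 3 = subtitles.
packets = [
    (1, 0, 40), (1, 40, 40),   # video covers 0ms to 80ms
    (2, 8, 40), (2, 48, 40),   # audio covers 8ms to 88ms
    (3, 30, 20),               # subtitles cover 30ms to 50ms
]
result = durations(packets)
print(result)
```

In this sample the segment spans 0ms to 88ms, while the video and audio tracks each span 80ms and the subtitle track only 20ms.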
Putting it all together
Most of the time no track will have exactly the same duration as the segment. The video and audio tracks may come close, but the audio often starts a bit later than the video and extends a bit further.
Subtitle tracks are another factor. Often their very first packet occurs only after quite some time has elapsed (e.g. after the first couple of minutes), and their last packet usually ends long before the video as a whole does (e.g. before the credits). Therefore subtitle tracks are often much shorter than the whole segment.