DirectShow and Beyond

When I wrote my first filter in the autumn of 1994, I really did not think that I would still be creating DirectShow filters 18 years later. DirectShow has not just lasted well, it has thrived. Present on hundreds of millions of computers, it is used by a wide range of developers, in cars, in lecture rooms and in brain surgery as well as more traditional entertainment. It has scaled well, despite the fact that, for most of the initial development cycle, we had no multiprocessor systems and no hardware acceleration to test it with.

But in recent months, most of my development work has been on other platforms. Not because DirectShow is less capable, but simply because it is not present on the platforms that count.

Media Foundation

Microsoft began its move away from DirectShow with the release of Windows Vista in 2007, which included an alternative multimedia platform: Media Foundation. This borrows heavily from the architecture of DirectShow, but is less flexible. Instead of the open graph, it aims to provide a more centrally-controlled model in which the components are more plug-ins than primary agents. This provides a more secure environment for playback of DRM-protected media.

However, the initial release of MF had far fewer features than DirectShow, and proved harder to use for many simple tasks — developers who’ve long complained about DirectShow’s steep learning curve will appreciate the irony of that. The result was that MF was little used and, for the life of Vista and Windows 7, DirectShow continued to be the architecture of choice for third-party developers.

With the benefit of hindsight, it looks as though Microsoft bet the wrong way. During the development of Vista, Microsoft tried to placate content owners by offering a secure system that would protect their rights. The effect this had on third-party developers was ignored: Microsoft assumed that they would have no choice but to follow its edicts (or perhaps thought that third-party developers no longer mattered). Five years later, DRM seems to be of much less importance, while platforms stand or fall on the strength of third-party software support.

Windows 8

In Vista and Windows 7, MF was an alternative to DirectShow. Windows 8 goes one step further. DirectShow is not available on Windows RT (the ARM version) or to any Windows Store app (formerly Metro) on either ARM or Intel systems. These apps have access to a cut-down MF API. Only desktop apps, on Intel systems, can still use DirectShow.

Developers writing apps for Windows 8 will have a choice between the Windows Store environment and the traditional desktop. It’s far from clear which will see the bulk of development effort. To me, as a user, the desktop environment feels like a legacy platform; users who drop into the desktop environment will feel as though they have been exposed to Windows’ raw underbelly. But on the other hand, I don’t really want to run all my apps full-screen on my 27-inch display, and the gestures designed for touch-screen tablets do not work on a desktop.

It’s too early to say how much impact this will have on developers’ priorities. Upgrade rates for new versions of Windows are often quite slow: by June 2012, nearly three years after release, Windows 7 was installed on 600 million systems, compared to about 675 million systems still running Windows XP. And sales of Windows 8 are reported to be slower still. So it’s likely that, for some considerable time to come, most Windows systems will be running XP or Windows 7, and DirectShow will remain a useful tool for years, even on Windows 8. But, ultimately, the writing is on the wall.

Mixing MF and DirectShow

There is a lot of similarity between MF and DirectShow, and it is certainly possible to mix components, given some additional wrapper code. I’ve not had the need to run DirectShow filters in MF, but I’ve written wrapper filters for MF objects and it is not that difficult.

Apple

Of course, one of the most significant reasons for the decline in importance of DirectShow is the return to prominence of Apple. Apple’s importance as a computer manufacturer, and the success of iOS, mean that many developers are writing for Apple platforms as a first choice.

Even in the darkest days of the previous century, Apple computers were the preferred tool of most people dealing with computer-based video and audio, and the QuickTime framework long predates anything created by Microsoft. However, the move to 64-bit has not been executed flawlessly, and the divergence and subsequent realignment of OS X and iOS means that there are multiple choices of API frameworks for digital video software and, surprisingly, none is ideal.

There are three options: QuickTime, QTKit and AV Foundation.

QuickTime

The QuickTime API has been around since 1991 and supports a wide range of formats, including third-party container format support and third-party video and audio codecs. However, it is limited to 32-bit applications: no 64-bit API is (or will be) provided for QuickTime.

QuickTime X is available in both 32-bit and 64-bit versions, and includes support for MPEG PS and TS containers and MPEG-2 video decoding. However, there is no API to access QuickTime X directly. The C API provides access only to the QuickTime 7 framework.

QTKit

QTKit is an Objective-C framework that wraps QuickTime functionality. It looks as if Apple intended QTKit to develop into a replacement for QuickTime as an API, but has since abandoned that in favour of AV Foundation.

QTKit supports both 64-bit and 32-bit apps. In a 64-bit process, older 32-bit QuickTime codecs are still available: QTKit launches a 32-bit proxy process and opens the clip in that process. It will use both QuickTime 7 and QuickTime X, but QuickTime X and 64-bit functionality are available only for playback, not for any sort of editing, transcoding or processing.

AV Foundation

Both QTKit and QuickTime are only available on OS X. AV Foundation is the media framework for iOS, and has been brought to OS X starting with 10.7 (Lion). It is an Objective-C framework that supports a range of playback, editing and transcoding features, for 32-bit and 64-bit applications.

However, it is limited in the containers and codecs it supports, and there is no option for third-party extensions to support other formats. It supports MPEG-2 PS and TS files (on input) and MOV/MP4 files with common codec formats (H.264/MPEG-4, AAC/MP3), but there is no way to open MKV files or decode DNxHD streams, for example.

You also cannot access the encoders and decoders on their own in order to combine them with third-party container support. So, for example, if you want to encode video and write to a TS file, you must write to an MP4 file (which will take advantage of hardware-accelerated encoding), and then read the elementary stream data back from the MP4 file to feed to your own TS multiplexor. It is possible, with a little work, to do this during encoding before the file is finalised, but it’s certainly a little fiddly.
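
Reading the elementary stream back out of the MP4 file has one wrinkle worth spelling out: H.264 samples in an MP4 file are normally stored with a length prefix in front of each NAL unit, whereas a transport stream carries the start-code (Annex B) byte-stream form. The following is a minimal Python sketch of that conversion, not code from AV Foundation or any other framework discussed here. The function name avcc_to_annex_b is made up for illustration. It assumes the length-prefix size (usually 4 bytes) has already been read from the file's avcC configuration record, and it leaves out the SPS/PPS parameter sets that a TS multiplexor will also need.

```python
START_CODE = b"\x00\x00\x00\x01"  # Annex B start code written before each NAL unit


def avcc_to_annex_b(sample: bytes, length_size: int = 4) -> bytes:
    """Convert one length-prefixed H.264 sample to Annex B form.

    Hypothetical helper for illustration only. `length_size` is the
    number of bytes in each NAL length prefix (assumed known from avcC).
    """
    if length_size not in (1, 2, 4):
        raise ValueError("length_size must be 1, 2 or 4")

    out = bytearray()
    pos = 0
    while pos < len(sample):
        # Read the big-endian length of the next NAL unit.
        if pos + length_size > len(sample):
            raise ValueError("truncated NAL length field")
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size

        # Copy the NAL payload, replacing the length prefix with a start code.
        if pos + nal_len > len(sample):
            raise ValueError("NAL unit runs past end of sample")
        out += START_CODE
        out += sample[pos:pos + nal_len]
        pos += nal_len
    return bytes(out)
```

Each sample read from the MP4 video track would be passed through a function like this before being handed to the multiplexor.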

If you need support for third-party containers or codecs, you will need to use QuickTime 7, either in a 32-bit C application or from Objective-C in a 32-bit process (or a 64-bit one, with a 32-bit proxy process for the decoding). Otherwise, you will probably choose AV Foundation (from Objective-C).
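
That rule of thumb can be written down as a small decision function. This is only an illustrative Python sketch of the choices described in this section: the function and parameter names are hypothetical, and the returned strings are just labels, not API identifiers.

```python
def choose_apple_media_api(needs_third_party_formats: bool,
                           is_64_bit_process: bool) -> str:
    """Return a label for the framework suggested by the rule of thumb above."""
    if not needs_third_party_formats:
        # Built-in containers and codecs are enough.
        return "AV Foundation"
    if is_64_bit_process:
        # Third-party QuickTime components are 32-bit, so a 64-bit app
        # reaches them through QTKit's 32-bit proxy process.
        return "QTKit (QuickTime 7 via 32-bit proxy process)"
    return "QuickTime 7 (C API or QTKit, 32-bit)"
```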