Introduction
This article discusses, from the end-user perspective, how to play HD video most efficiently on Windows-based desktops and laptops. I am strictly an HD video consumer: I don't do video editing, and I am neither a programmer nor a subject matter expert in the area. This documents what I believe is correct based on wiki and Google searches, as well as on practical experiments with the computers I have access to. However, I am quite sure I have misunderstood things here and there; if the gurus out there can point out the mistakes, I am happy to update this going forward.
This document only addresses Windows based computers. It does not apply to Apple OSX or Linux, for reasons that will be made clearer further down.
Why we bother at all
Playing high definition video (1080i or 1080p) can be taxing even on high-end dual or quad core computers, especially when the player is poorly configured. On the new wave of netbooks and CULV laptops, smooth, stutter-free playback may even seem impossible.
Most computer and GPU (video card) vendors make claims about various video acceleration capabilities, but without exception these claims are extremely vague. There is also no easy way for the user in front of the computer to verify whether hardware acceleration is being used during video playback. Troubleshooting and support from the vendors is non-existent.
Disclaimer
Everything I write from here on, I only learnt in the last two weeks; I was a complete noob in the area. A major reason for writing this is to organize all the threads of thought I have had, and to have somewhere to re-read this after I've completely forgotten about it in a few years' time. To repeat: I do not claim to be a guru on the subject, only that applying what I have learnt has led to repeatable and predictable results in my video playback experience.
The Objective
My objective from the start was to use the GPU as much as possible during video playback, so that CPU utilization stays as low as possible. A partially busy GPU and a partially busy CPU are much preferable to an idle GPU and a maxed-out CPU.
I plan to do this in two parts: the first discusses the technology involved in video playback configurations, and the second discusses some practical applications of the theory using my favourite media players, GOMPlayer, Media Player Classic and Windows Media Player.
The Easiest Way Forward for the Disinclined
This article will be quite long. If you are already bored, know that the simplest and most foolproof way to achieve the objective is to use the built-in Windows Media Player! Its feature set is severely lacking, most notably its very poor support for keyboard controls, but in terms of tight integration between the O/S, the video player and the GPU, nothing I know of beats WMP.
All you have to do is make sure DirectX Video Acceleration is enabled. It is just a checkbox in Options -> Performance -> Advanced.
Of course it does not guarantee smooth playback, as it is possible that your computer in its current hardware configuration is simply not up to the task. To find out for sure, please read on.
The Real Journey Begins Here
First it is important to understand what video playback involves. For the end users, the simplest way to describe playback is:
Start from a video file (*.wmv, *.mp4, *.avi etc), end with moving picture on the computer screen and sound from the speakers.
The end user can be oblivious to the processes that go from start to the end, but the computer must know exactly how it gets from A to B. The fun starts here.
Video Playback Frameworks
Different operating systems have different pathways for taking a video file and turning it into video and audio output. That's why I can only address Windows-based computers here. What I write here is most likely not applicable to OS X and Linux, but I don't know enough about them to make a judgment.
Windows currently has two distinctly different frameworks (the technical term for pathways) for video playback. The first is called DirectShow (DS below), which long predates Windows XP and continues in Windows Vista and 7. The second is called Media Foundation (MF below), which was introduced in Windows Vista and is also used in Windows 7. Microsoft introduced MF with the express intent of eventually replacing DS. We are right in the middle of the transition period, so both DS and MF are in use.
To summarize, XP only uses DS, while Windows Vista and 7 can use both DS and MF.
In terms of players, as MF is still quite new, the only MF-based media player is Windows Media Player on Vista and 7. It is also backward compatible with DS. Most other (free) players such as Media Player Classic (MPC), GOMPlayer and KMPlayer are purely DS-based. I am not sure whether MF support is forthcoming for those players.
As Windows Media Player has next to no configuration options, there is not a lot to learn by dissecting the MF framework. Let's now concentrate on the DS framework.
DirectShow (DS) Video Playback Framework
DS breaks video processing down into distinct steps, which can be summarized as split, decode and render.
Split
A video file (think *.avi, *.wmv, *.mp4, *.mkv etc.) is actually a container with multiple streams of data in it. Have you ever noticed the video and audio being slightly out of sync, a split second ahead or behind? That is because the video and the audio are stored in the file as separate data streams, and it is up to the media player to present the two streams to the end user in sync. When the player does a poor job of that, the output is out of sync.
A typical video container has a video stream and an audio stream. Some newer formats, like mkv, can also include a subtitle stream; that's why you can toggle subtitles on and off: they are not actually part of the video stream.
The video and the audio need to be split from one another so they can be fed to the decoders in the next phase. This is typically done by a piece of software usually referred to as the source filter (a.k.a. splitter). Different video formats require different source filters. Windows has a built-in wmv splitter that is used by Windows Media Player, and Microsoft has also opened up its internal splitters for other media players to use; they are known as the WM ASF Reader and the Windows Media Source Filter. For other file types you need other splitters. One very popular splitter is the open source Haali Media Splitter, which is used to split mp4 and mkv files, among many others.
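To make the sync problem concrete, here is a minimal sketch of one common approach, assuming the audio stream acts as the master clock and every video frame carries a presentation timestamp. The function name, the millisecond units and the 40 ms tolerance are hypothetical choices for illustration; this is not how any particular player, or DirectShow itself, is guaranteed to do it.

```python
def schedule_frame(video_pts_ms: float, audio_clock_ms: float,
                   tolerance_ms: float = 40.0) -> str:
    """Decide what to do with the next video frame, using audio as the master clock.

    Hypothetical helper: real players use more elaborate clock handling.
    """
    drift_ms = video_pts_ms - audio_clock_ms
    if drift_ms > tolerance_ms:
        return "wait"   # frame is early: hold it until the audio catches up
    if drift_ms < -tolerance_ms:
        return "drop"   # frame is late: skip it so the video catches up with the audio
    return "show"       # within tolerance: present it now


if __name__ == "__main__":
    audio_clock_ms = 1000.0  # how much audio has been played so far
    for pts in (1000.0, 1100.0, 900.0):
        print(pts, schedule_frame(pts, audio_clock_ms))
```

When a player gets this logic (or its clock) wrong, you see exactly the lip-sync drift described above.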
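A splitter's core job can be sketched as routing interleaved packets into one queue per stream. This is a toy model: the `Packet` class and the stream-type table are hypothetical stand-ins for what a real splitter (Haali's, for example) reads from the avi/mkv/mp4 headers, and no real file format is parsed here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    stream_id: int   # which stream in the container this chunk belongs to
    pts_ms: int      # presentation timestamp, used later for A/V sync
    payload: bytes   # still-compressed data; the splitter does not decode it


def split(packets, stream_types):
    """Route interleaved packets into one queue per stream type.

    stream_types maps stream_id -> "video", "audio" or "subtitle",
    as a real splitter would learn from the container header.
    """
    queues = defaultdict(list)
    for pkt in packets:
        queues[stream_types[pkt.stream_id]].append(pkt)
    return dict(queues)


if __name__ == "__main__":
    types = {0: "video", 1: "audio", 2: "subtitle"}
    interleaved = [Packet(0, 0, b"v0"), Packet(1, 0, b"a0"), Packet(0, 33, b"v1"),
                   Packet(2, 40, b"s0"), Packet(1, 21, b"a1")]
    for kind, queue in split(interleaved, types).items():
        print(kind, [p.payload for p in queue])
```

Each queue would then be handed to its own decoder, which is why audio and video codecs are chosen independently.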
Decode
All video files are highly compressed. How high? Take a HD 1920x1080p, 30 frames per second video for example. The number of pixels displayed per second is:
1920x1080x30 = 62,208,000 (62 mega) pixels per second. At 32-bit color depth, that's 62,208,000 x 32 = 1,990,656,000 bits, or about 249 MB (237 MiB) per second! Every minute of uncompressed video would occupy roughly 14 GB of your hard drive (14.9 GB, or 13.9 GiB).
So the GPU must paint 62 megapixels on the computer screen every second during full-screen HD playback. Incredibly, even the oldest and lowest-end GPU has no problem achieving that. A GPU's fill rate indicates the number of pixels it can draw per second; on the 4-year-old Intel integrated GMA950 the fill rate is 1.6 gigapixels per second, so the 62 megapixels per second required by 1080p video uses only about 4% of that capacity. The bottleneck is instead the sheer size of the data, which makes transmitting and storing the video impractical. The file has to be made a lot smaller.
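The arithmetic above can be checked with a few lines of Python. This sketch keeps the article's assumption of 32 bits per pixel, and the helper name is hypothetical. It also shows why the megabyte figure depends on whether 1 MB means 1,000,000 bytes or 1 MiB means 1,048,576 bytes.

```python
def raw_video_rate(width: int, height: int, fps: int, bits_per_pixel: int = 32):
    """Return (pixels/s, bits/s, bytes/s) for uncompressed video."""
    pixels_per_s = width * height * fps
    bits_per_s = pixels_per_s * bits_per_pixel
    return pixels_per_s, bits_per_s, bits_per_s // 8


if __name__ == "__main__":
    px, bits, bytes_per_s = raw_video_rate(1920, 1080, 30)
    print(f"{px:,} pixels/s, {bits:,} bits/s")
    print(f"{bytes_per_s / 1e6:.1f} MB/s = {bytes_per_s / 2**20:.1f} MiB/s")
    per_minute = bytes_per_s * 60
    print(f"per minute: {per_minute / 1e9:.1f} GB = {per_minute / 2**30:.1f} GiB")
```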
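As a quick sanity check of the fill-rate claim, this sketch divides the pixels needed per second by the GMA950 fill rate quoted above. It only counts painting each video pixel once, ignoring scaling, subtitles and anything else on screen, so treat it as a lower bound; the function name is hypothetical.

```python
def fill_rate_share(width: int, height: int, fps: float, fill_rate: float) -> float:
    """Fraction of a GPU's fill rate (pixels/s) needed just to paint the video frames."""
    return width * height * fps / fill_rate


if __name__ == "__main__":
    GMA950_FILL_RATE = 1.6e9  # pixels per second, the figure quoted above
    print(f"{fill_rate_share(1920, 1080, 30, GMA950_FILL_RATE):.1%}")
```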
A codec is the algorithm used to compress and decompress the raw video and audio. Note that video and audio are processed by separate codecs (see Split above). For the moment, let's concentrate on the video codec.
The common codecs include H.264 (in newer mp4 files and mkv files), DivX and Xvid (in avi files), MPEG1 and MPEG2. The codec used for wmv is also called wmv, but there are now three generations of wmv codecs (called wmv1, wmv2 and wmv3) that offer progressively better compression.
In terms of compression performance, H.264 and wmv3 are head and shoulders above the others, hence most of the high definition video you can find on the internet is in one of those two formats. DivX and Xvid are older codecs from a time when video resolutions were smaller, and are now rarely used. MPEG2 is rarely used on the internet as the file sizes would be huge, but it is used in over-the-air digital TV.
A one-hour H.264 full HD file is about 1.5GB. An equivalent wmv3 file is about 1.8GB. If uncompressed, the same hour would be well over 800GB! An MPEG2 full HD file, in comparison, would be over 4GB if I remember correctly. It is not hard to see why H.264 and wmv3 are dominating the codec war.
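To put those file sizes in perspective, this sketch converts them into average bitrates and compression ratios against the uncompressed 1080p30 figure. It assumes decimal gigabytes (1 GB = 10^9 bytes) and 32-bit color, and the file sizes are the rough ones quoted above (the MPEG2 one is from memory), so treat the output as ballpark only. The function name is hypothetical.

```python
RAW_BYTES_PER_SECOND = 1920 * 1080 * 30 * 32 // 8   # 1080p30 at 32-bit color


def summarize(size_gb: float, seconds: int = 3600):
    """Return (average Mbit/s, compression ratio vs. raw) for a file of size_gb."""
    size_bytes = size_gb * 1e9                      # decimal gigabytes (assumption)
    mbit_per_s = size_bytes * 8 / seconds / 1e6
    ratio = RAW_BYTES_PER_SECOND * seconds / size_bytes
    return mbit_per_s, ratio


if __name__ == "__main__":
    for name, gb in [("H.264", 1.5), ("wmv3", 1.8), ("MPEG2", 4.0)]:
        rate, ratio = summarize(gb)
        print(f"{name}: {rate:.1f} Mbit/s, roughly {ratio:.0f}:1")
```

With these inputs, the one-hour H.264 file averages about 3.3 Mbit/s, a compression ratio of roughly 600:1.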
Note that each decoder is tied to the framework it is designed for, so Media Foundation (MF) uses decoders distinct from those of DirectShow (DS). Because DS has been around a lot longer, there is a lot more choice of decoders for it. As MF is only used in WMP for Vista and Windows 7, MF-based decoders are only used by WMP. That may change once third-party MF-based players start to emerge.
Even in the DS world there are multiple decoders that do the same thing. For example, GOMPlayer comes with its own set of internal decoders for the common video formats, but also allows you to use a decoder of your choice. One common DS-based decoder is FFDShow, which has a decoder (called a filter) for just about every video format known to man. You can also install a codec pack that bundles splitters and decoders together; one good example is the Combined Community Codec Pack (CCCP).
As there are multiple decoders that do the same thing, your computer needs to know which one to choose when given a video stream. In DirectShow, that is done by decoder merit, an 8-digit hexadecimal (32-bit) number; the decoder with the higher merit gets chosen first. The problem is that the decoder developer gets to specify the merit himself! So inevitably people started using ever higher merits, and the system was blown to bits. Thankfully, most media players let you override the merit system and choose your own preferred decoder.
Compressed video must be decompressed before it can be shown on the computer screen, and therein lies the catch. A high-performance codec that produces a small file also requires a lot of processing power to decompress. All the bottlenecks in modern HD video playback center around this, and this is what GPU hardware video acceleration aims to overcome.
But before we look at decoding further, let's briefly cover the last part of video processing: rendering.
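Merit-based selection with a user override can be sketched like this. The MERIT_* constants are the standard DirectShow values, but the decoder list and the merit given to each entry are made up for illustration. Real DirectShow graph building is also more involved: it tries candidates in merit order until one actually connects.

```python
MERIT_PREFERRED = 0x00800000
MERIT_NORMAL = 0x00600000
MERIT_UNLIKELY = 0x00400000

# (name, merit, formats accepted): the merits here are illustrative, not real values
DECODERS = [
    ("WMVideo Decoder DMO", MERIT_NORMAL + 0x800, {"WMV3"}),
    ("ffdshow Video Decoder", MERIT_PREFERRED + 4, {"H264", "WMV3", "XVID"}),
    ("ffdshow DXVA Video Decoder", MERIT_UNLIKELY, {"H264"}),
]


def pick_decoder(fmt, overrides=None):
    """A user override wins; otherwise the highest-merit decoder accepting fmt."""
    if overrides and fmt in overrides:
        return overrides[fmt]
    candidates = [d for d in DECODERS if fmt in d[2]]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[1])[0]


if __name__ == "__main__":
    for name, merit, _ in DECODERS:
        print(f"{merit:#010x} {name}")          # merit as an 8-digit hex number
    print(pick_decoder("H264"))                  # the merit system's choice
    print(pick_decoder("H264", {"H264": "ffdshow DXVA Video Decoder"}))  # player override
```

This is why a player's override setting matters: without it, a DXVA-capable decoder registered with a low merit would never be picked.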
Render
After the decoder has decompressed the video stream, it's time to put the picture on the screen. But wait a minute. What if the video is 1920x1080, but the laptop screen is only 1024x768? What if there are also subtitles and special effects to be put on top? Massaging the raw video data into screen output is the job of the renderer.
As time has gone by, the requirements for renderers have evolved rapidly. Microsoft's renderers have been shipped as part of DirectX. They have gone from the Overlay Mixer in DirectX 6, to Video Mixing Renderer 7 (VMR7) at the beginning of the XP era in 2001, to Video Mixing Renderer 9 (VMR9) in DirectX 9, to the Enhanced Video Renderer (EVR) in Windows Vista and Windows 7. EVR supports both the newer Media Foundation (MF) and the DirectShow (DS) frameworks, whereas the older renderers only support DS. This is an important distinction, and it will become clearer further down.
GPU Hardware Video Acceleration: What Is It
As discussed above, the video playback constraint lies in the decoding. Splitting and rendering are relatively stress-free tasks.
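One of the renderer's simpler jobs, fitting the picture onto a screen of a different shape, can be sketched as below. This is a generic aspect-ratio calculation assuming square pixels; it says nothing about how VMR9 or EVR actually implement scaling, and the function name is hypothetical.

```python
def fit_to_screen(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Largest size with the source aspect ratio that fits the screen, plus bar sizes.

    Returns (out_w, out_h, side_bar, top_bar): black bars are added on both sides.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    out_w, out_h = round(src_w * scale), round(src_h * scale)
    return out_w, out_h, (dst_w - out_w) // 2, (dst_h - out_h) // 2


if __name__ == "__main__":
    # 1080p video on a 1024x768 laptop screen: scaled down and letterboxed
    print(fit_to_screen(1920, 1080, 1024, 768))
```

For the 1024x768 example above, the video is shrunk to 1024x576 with 96-pixel black bars above and below.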
The decoder is typically a piece of software, so it consumes CPU cycles when at work. Two problems arise: (1) Most current software decoders are single-threaded, so the amount they can process is limited by the speed of just one core. A CULV CPU does not have enough raw power on one core to decode full 1080p H.264 video; just take that from me as fact. (2) Even if your CPU can muster enough power to decode the video, your CPU utilization goes through the roof, generating a lot of heat and dramatically reducing battery life.
GPU to the rescue. Intel, Nvidia and ATI have all started implementing hardware decoders in their integrated and discrete graphics solutions. This is a hardware component on the video card (GPU) specialized in decoding video. It has nothing to do with 3D processing, so it has zero impact on games; it is there just to process video streams. Generally, the hardware video accelerator is the same on low-end and high-end graphics cards from the same vendor. There can be small differences (usually around post-processing, i.e. prettying up the video output), but they are not important for the purpose of this article.
Therefore, as a general rule, you get the same performance in video playback on a $50 GPU and a $200+ GPU!
Different vendors have different implementations: Nvidia calls its version PureVideo and ATI calls its version Avivo. Even Intel integrated GPUs have had hardware acceleration components for years, including the ancient GMA950 found on 4+ year old laptops and made popular again by Atom-based netbooks.
As the hardware acceleration component is a specialized piece of hardware, it can only process the video formats it was designed to process. The oldest hardware decoders only decode MPEG1/2, whereas newer ones also decode WMV3 and/or H.264.
At my disposal I have laptops with an Intel GMA950, an Intel X3100 and an Nvidia 9400M, and desktops with an Nvidia 8600GTS and an ATI 4350. I downloaded a freeware tool called DXVA Checker, which can tell me which hardware decoders are implemented in the video card. These are the results:
Intel GMA950: MPEG1/2 decoding only
Intel X3100: MPEG1/2 (meh) plus WMV3, but not H.264
Nvidia 8600GTS and 9400M, ATI 4350: MPEG1/2, WMV, H.264 (that's everything I want)
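The results above can be written down as a small lookup table, which makes it easy to ask whether a given card can take a given codec off the CPU. This is only a restatement of the DXVA Checker list; "WMV" for the Nvidia and ATI cards is assumed here to mean WMV3, and the names are hypothetical.

```python
# DXVA Checker results from the list above. "WMV" for the Nvidia/ATI cards is
# assumed to mean WMV3, the same profile the X3100 supports.
GPU_DECODERS = {
    "Intel GMA950": {"MPEG1", "MPEG2"},
    "Intel X3100": {"MPEG1", "MPEG2", "WMV3"},
    "Nvidia 8600GTS": {"MPEG1", "MPEG2", "WMV3", "H264"},
    "Nvidia 9400M": {"MPEG1", "MPEG2", "WMV3", "H264"},
    "ATI 4350": {"MPEG1", "MPEG2", "WMV3", "H264"},
}


def can_offload(gpu: str, codec: str) -> bool:
    """True if the card has a hardware decoder for this codec (unknown cards: False)."""
    return codec in GPU_DECODERS.get(gpu, set())


if __name__ == "__main__":
    for gpu in GPU_DECODERS:
        print(gpu, "H.264 in hardware:", can_offload(gpu, "H264"))
```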
As the Nvidia and ATI cards I have are all low-end cards, I am confident that any card from either company that is 2 years old or newer will have all the hardware decoding you'll need. I also have very good reasons to believe that the Intel X4500 has the same video decoding support as Nvidia and ATI, as the X4500 claims full support for hardware-accelerated Adobe Flash 10.1. But I can't test it myself.
But hardware is not the be all and end all.
Obviously, a piece of hardware cannot do anything by itself unless it's given instructions. That communication is performed by the GPU driver. At the same time, the software decoder needs to know (1) that help is available in the form of GPU video decoding, and (2) how to talk to the GPU driver to use the video decoder. To achieve this, Microsoft introduced a set of application programming interfaces (APIs) called DirectX Video Acceleration (DXVA). You have probably seen DXVA mentioned in many other places already.
DirectX Video Acceleration (DXVA)
DXVA is the bridge between the software decoder and the GPU drivers. Intel, Nvidia and ATI graphics drivers all have support for DXVA.
Remember the two Microsoft video frameworks discussed above? There are two distinct DXVA standards, DXVA1 and DXVA2. DXVA1 only supports the DirectShow (DS) framework, whereas DXVA2 supports both DS and the Media Foundation (MF) framework.
DXVA2 is capable of offloading a lot more video decoding tasks from the CPU to the GPU than DXVA1. In other words, DXVA2 results in much lower CPU utilization compared to DXVA1.
Also remember that the two DXVA versions require different renderers: DXVA1 only works with VMR7 and VMR9 (Video Mixing Renderers), whereas DXVA2 only works with EVR (Enhanced Video Renderer)!
Another thing to note is that each combination of DXVA version and framework has different video decoding capabilities! I don't know why it is like that, but here is a table of all the available combinations:
DirectShow + DXVA1: WMV1,2,3 Support, No H.264 Support (Needs VMR7/VMR9 Renderer)
DirectShow + DXVA2: No WMV support, H.264 Support (Needs EVR Renderer)
Media Foundation + DXVA1: Not Implemented
Media Foundation + DXVA2: WMV1,2,3 Support, H.264 Support
Like any other interface, DXVA has to be supported on both sides: both the software decoder and the GPU driver need to be DXVA-aware. The latest GPU drivers from Intel, Nvidia and ATI already support both DXVA1 and DXVA2. What about software decoders?
Put simply, most existing software decoders do not support DXVA. But that's fine; all we need is one that does for each video format.
For example, FFDShow does not support DXVA, but the team has released a new, separate ffdshow DXVA Video Decoder that does what the name suggests. It is included in the base FFDShow package. I have added the software decoders that I know work for each combination:
DirectShow + DXVA1: WMV1,2,3 Support, No H.264 Support (Needs VMR7/VMR9 Renderer) (Requires WMVideo Decoder DMO, the built-in Microsoft DirectShow decoder for WMV)
DirectShow + DXVA2: No WMV support, H.264 Support (Needs EVR Renderer) (I suggest ffdshow DXVA Video Decoder)
Media Foundation + DXVA1: Not Implemented
Media Foundation + DXVA2: WMV1,2,3 Support, H.264 Support
(MF requires Windows Media Player for Vista and 7, no other configuration needed)
When applying the above configurations, end users also need to consider the hardware capabilities of their video card, which I listed earlier. Remember that both the hardware and the software need to support hardware video decoding for end-to-end acceleration to work.
This is the end of the theory. For the inquisitive, this is all you need to know to have a play at configuring DXVA in your favorite DirectShow-based media player. I shall go into some practical configurations in part 2, as I think I have already seriously stretched everyone's attention. Thank you if you have read this far; your feedback and corrections will be very welcome!
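The end-to-end rule can be sketched as a lookup over the combination table: a path only helps if the framework/DXVA combination accelerates the codec and the card has a hardware decoder for it. The table restates the article (MF + DXVA1 is not implemented, so it has no entry, and the MF path uses EVR since DXVA2 only works with EVR); the names and the codec spellings are hypothetical, and MPEG formats are left out because the table does not cover them.

```python
# (framework, DXVA version) -> (codecs accelerated, renderer needed, decoder to use)
PATHS = {
    ("DirectShow", "DXVA1"): ({"WMV1", "WMV2", "WMV3"}, "VMR7/VMR9", "WMVideo Decoder DMO"),
    ("DirectShow", "DXVA2"): ({"H264"}, "EVR", "ffdshow DXVA Video Decoder"),
    ("Media Foundation", "DXVA2"): ({"WMV1", "WMV2", "WMV3", "H264"}, "EVR",
                                    "Windows Media Player (Vista/7)"),
}


def accelerated_paths(codec: str, gpu_codecs: set):
    """Every playback path that can hardware-decode `codec` on a card decoding gpu_codecs."""
    if codec not in gpu_codecs:
        return []  # no hardware decoder on the card: no software setting can help
    return [(fw, dxva, renderer, decoder)
            for (fw, dxva), (codecs, renderer, decoder) in PATHS.items()
            if codec in codecs]


if __name__ == "__main__":
    x3100 = {"MPEG1", "MPEG2", "WMV3"}  # from the DXVA Checker results above
    print(accelerated_paths("WMV3", x3100))
    print(accelerated_paths("H264", x3100))  # empty: the card cannot decode H.264
```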
-
Great work.
BTW, can you explain why a 1080p 30FPS WMV file starts sucking up my free memory when I try to fast forward? (WMP and MPC both have this issue.)
Every time I drag the slider forward, WMP hangs and eats my memory: "Don't have enough memory, please close other applications..."
E.g. if the movie is 2GB, WMP will eat 2GB of system memory before closing the video.
System spec : ATI HD4200 IGP With 256MB DDR2 With Latest 10.3 driver /AMD Turion II M500/4GB RAM/Win7 x64 -
bigspin,
Thanks. I don't have experience with ATI IGPs, so I am guessing here. As I wrote earlier, 1080p video takes over 200MB of data per second when uncompressed. That's data that needs to reside in video card memory to be pushed to the video renderer. As the IGP uses system memory, that has to come out of your system RAM. Fast forwarding would drain the available RAM very quickly if the video driver does not dump old data fast enough, and virtual memory has no chance at all of keeping pace.
If there is an option to limit the amount of RAM the IGP can use, that should help with your problem. It would also mean that fast forward becomes choppy, but that's still better than crashing. -
If you suspect it's an encoding issue, there is a very old freeware tool called "ASF Tools" that can sometimes fix problematic WMV files. Over the years I have had roughly a 50% success rate fixing broken (non-seekable, or with bad blocks) WMVs. For free software that's not bad. Here is a link:
http://www.softpedia.com/get/Multimedia/Audio/Other-AUDIO-Tools/AsfTools.shtml -
ViciousXUSMC
I have never used GPU-accelerated playback and never had issues, even while multitasking. It used to be that the codecs that supported it were bad for quality, cost money, or were limited to players that I did not like.
It has come a long way since, but I wonder if it really has become equal to standard ffdshow.
I do use EVR now instead of VMR7/9 so looks like I am half way there. -
ViciousXUSMC, I think you are right; I did read elsewhere that Atom handles 720p fine. Of course, 1080p is more than double the bit rate, so it completely blows it away.
Also, not all 720p is created equal. H.264, for example, requires more CPU than WMV. Also, most codecs are single-threaded (one exception is WMP, which offloads some processing to DWM.exe, a sort of pseudo multi-threading). Multi-core does not help there if a single core cannot handle it.
If your netbook uses a Z-series Atom instead of an N-series, the GMA500 on it has some hardware acceleration capability that is automatically activated when you use WMP on Vista/7. In that case it may even play 1080p okay for some video formats. I have never had one to test on, though.
I thought like you for years, in fact. I only started looking into this a few weeks ago, and only got serious in the last 2 weeks. I also cried tears of joy when I was playing 1080p WMV and H.264 with my CPU (P8600) running at 8%. The temperature dropped significantly as a result. -
A nice article. Thanks for the writeup. Waiting for the next part
-
ViciousXUSMC
But yeah, when I said 720p on the Atom I meant H.264; it's the only codec I would EVER use for HD stuff. Well, it's the only codec I ever use, period, but all the HD content I have ever downloaded was in H.264.
If somebody did a bad job encoding a video, it may have way more bitrate than it needs for perceivable quality; that's common with new encoders. So you can find 1080p files that just do not play well. It's not even just the bitrate, it's the profile used for H.264: things like the number of B-frames, for example, can make a file a lot harder to play, and that's something you won't know; only the encoder can (well, if they know; they probably used a push-one-button encoder and picked the highest preset profile).
So while I won't say the argument about not every file being playable on just the CPU is void, do bear that in mind. I can definitely produce a video file that you can't play even with your GPU acceleration, just to show how much it can vary depending on encoding settings, while also producing an identical-looking file that will play on an Athlon 64 without breaking a sweat. -
The only way my computer can play full HD without stuttering is using CoreAVC with Overlay Mixer as the renderer in MPC-HC. All of the Windows options (EVR, DirectX 7,9) make the video stutter.
Even with CoreAVC, the CPU usage is 80-100%, which I wouldn't expect for a T7700. Of course, I guess that chip is sort of obsolete now.. I mean, it did come out a WHOLE ENTIRE 2.5 years ago.. -
An introduction to HD video playback using hardware acceleration - Part 1
Discussion in 'Hardware Components and Aftermarket Upgrades' started by eyclai, Feb 26, 2010.