libvpx 1.0.0 Duclair: VP8 Video Encoding Benchmarked

Google's WebM project team has recently released version 1.0.0 "Duclair" of the VP8 codec SDK (libvpx). The release adds some new features for streaming and promises another small performance improvement. So it's time for a quick look at the current state of VP8 video encoding performance.

The benchmarks were done with the Oil Rush gameplay video (1280x720, 30 fps). If you want to compare your own system, you can download it.

The video was re-encoded with ffmpeg 0.7.8 using single-pass VP8 encoding and a target video bitrate of 5 Mbit/s. The following command was used, where x is the number of encoder threads:

    time ffmpeg -benchmark -i oilrush10_mphd.webm -vcodec libvpx -threads x -b 5M -acodec copy -f webm -y foo.webm
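To repeat the command for every thread count, a small loop is enough. The following is a minimal sketch, not the script used for this article: the function name `sweep_threads`, the `FFMPEG` variable and the per-run output names are my own assumptions, and the ffmpeg options are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun the article's ffmpeg command for 1..N threads.
# Assumption: FFMPEG may name a specific ffmpeg build; defaults to "ffmpeg" on PATH.
sweep_threads() {
  local input="$1" max_threads="${2:-8}" t
  for t in $(seq 1 "$max_threads"); do
    echo "== threads: $t"
    # Same options as in the article; only -threads and the output name change.
    time "${FFMPEG:-ffmpeg}" -benchmark -i "$input" -vcodec libvpx -threads "$t" \
      -b 5M -acodec copy -f webm -y "foo_t${t}.webm"
  done
}
```

After sourcing the script, `sweep_threads oilrush10_mphd.webm 8` runs eight encodes in a row and prints the `time` result after each one.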

Afterwards, the total number of frames was divided by the total run time to get the average number of frames per second. Note that the run time also includes the time spent on demuxing, decoding, copying, muxing and file operations.
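The calculation itself is a single division. A minimal sketch, assuming you already know the frame count of the input video and the "real" time reported by `time`; the function name `avg_fps` is hypothetical:

```shell
#!/usr/bin/env bash
# Average encoding speed: total frames divided by total run time in seconds.
# Both values are supplied by the user (frame count of the input, "real" time).
avg_fps() {
  local frames="$1" seconds="$2"
  awk -v f="$frames" -v t="$seconds" 'BEGIN {
    if (t <= 0) { print "run time must be positive" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", f / t
  }'
}
```

For example, `avg_fps 3000 100` prints `30.00`. Because the measured time covers the whole ffmpeg run, this figure is slightly lower than the speed of the VP8 encoder alone.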

Two different systems were used:

    * Intel Core i7 920, 2667 MHz, 4 cores, 8 threads, 8 GiB DDR3 RAM, unknown memory speed, MSI X58 Pro-E (MS-7522) BIOS 8.6, 3.2.1-gentoo-r2 x86_64, preemptive, 1000 Hz, GCC 4.5.3

    * AMD Phenom II X4 955 BE, 3200 MHz, 4 cores, 4 threads, 8 GiB DDR3 10600, ASRock M3N78D BIOS 1.61, 3.1.0-gentoo x86_64, preemptive, 1000 Hz, GCC 4.5.3
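If you want to compare your own machine against these two, a sketch like the following collects comparable details on Linux. It is an assumption-laden helper, not part of the original setup: it relies on the x86 `model name` field in `/proc/cpuinfo` and does not report mainboard, BIOS or memory speed (those usually need `dmidecode` and root rights).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CPU, thread count, kernel and GCC version.
system_info() {
  local cpu
  # "model name" is the x86 field in /proc/cpuinfo; other architectures may lack it.
  cpu="$(awk -F': ' '/^model name/ { print $2; exit }' /proc/cpuinfo 2>/dev/null)"
  echo "CPU:     ${cpu:-unknown}"
  echo "Threads: $(nproc 2>/dev/null || echo unknown)"
  echo "Kernel:  $(uname -r) $(uname -m)"
  if command -v gcc >/dev/null 2>&1; then
    echo "GCC:     $(gcc -dumpversion)"
  else
    echo "GCC:     not installed"
  fi
}
```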

The results are summarized in the following graph.

As you can see, the Intel and AMD processors deliver similar performance, but keep in mind that the AMD processor needs a clock speed more than 500 MHz higher (3200 vs. 2667 MHz) to keep up.
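One way to make this comparison explicit is to normalize the frame rate by the clock frequency. The sketch below is illustrative: the function names are hypothetical, and since the exact frame rates are only shown in the graph, you have to plug in your own measured values.

```shell
#!/usr/bin/env bash
# Frames per second per GHz: equal fps at a higher clock means
# less work done per clock cycle.
fps_per_ghz() {
  local fps="$1" mhz="$2"
  awk -v f="$fps" -v m="$mhz" 'BEGIN { printf "%.2f\n", f / (m / 1000) }'
}

# Clock advantage of CPU A (first argument, MHz) over CPU B, in percent.
clock_advantage_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}
```

`clock_advantage_pct 3200 2667` prints `20.0`: the Phenom II runs at about 20 % (533 MHz) higher clock, so similar frame rates mean roughly 20 % more throughput per clock on the i7.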

The VP8 encoder uses multi-threading, but the speedup from 1 to 4 threads is far from the theoretical optimum of a 4x faster encode, which shows that the encoder still has a lot of room for future improvement. On the other hand, every new build delivers a steady performance gain.
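The gap to the optimum can be expressed as speedup and parallel efficiency. A minimal sketch with a hypothetical helper name; an efficiency of 1.00 corresponds to the ideal "4 times faster" with 4 threads:

```shell
#!/usr/bin/env bash
# speedup    = fps with n threads / fps with 1 thread
# efficiency = speedup / n   (1.00 means perfectly linear scaling)
scaling() {
  local fps1="$1" fpsn="$2" n="$3"
  awk -v a="$fps1" -v b="$fpsn" -v n="$n" 'BEGIN {
    s = b / a
    printf "speedup %.2f, efficiency %.2f\n", s, s / n
  }'
}
```

For example, going from 10 fps with 1 thread to 20 fps with 4 threads, `scaling 10 20 4` prints `speedup 2.00, efficiency 0.50`, i.e. only half of the ideal scaling.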

Note: the result of version 0.9.6 with 8 threads on the Phenom processor is surprisingly low, but reproducible; a bug in that build caused a huge system overhead. The result of version 0.9.7 with 3 threads on the i7 is also a little lower than expected. I had forgotten this setting during the benchmark series, and after downgrading back to 0.9.7 to run it, the encoder ran a little slower. No reason for this behavior has been found yet.
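To tell a real regression from run-to-run noise, one option is to repeat each configuration a few times and look at the spread. This is a sketch of that idea, not the procedure used for the graph; the function name `run_stats` is hypothetical and the fps values are passed as arguments.

```shell
#!/usr/bin/env bash
# Mean and sample standard deviation of repeated fps measurements
# (at least two runs are needed for the deviation).
run_stats() {
  printf '%s\n' "$@" | awk '
    { v[++n] = $1; sum += $1 }
    END {
      if (n < 2) { print "need at least two runs" > "/dev/stderr"; exit 1 }
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
      printf "mean %.2f, stddev %.2f, runs %d\n", mean, sqrt(ss / (n - 1)), n
    }'
}
```

A result that is lower than the mean by several standard deviations is unlikely to be plain measurement noise.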