April 18, 2018

measureBlock: How Does Performance Testing Work In iOS?

I was working on a mini project at Square involving performance unit testing for iOS: looking into how we could introduce performance unit tests, what our options are, and how they would scale on our CI. In this post I’ll focus on the one tool Apple provides as part of its unit testing suite, the magical measureBlock method. The questions were: how does it work, and will it work for us and our CI (continuous integration) process?

What’s measureBlock?

For those who don’t know measureBlock, here’s a little background. When you write a unit test with XCTest, you can measure how long a block of code takes to execute by passing it to measureBlock. It looks like this in Objective-C:

// Inside an XCTestCase subclass
- (void)testMyCodesPerformance
{
    [self measureBlock:^{
        // Only the code inside this block is timed
        [someClass doSomethingFancy];
    }];
}

Or in Swift, where the same method is imported as measure (it was measureBlock before Swift 3):

func testMyCodesPerformance() {
    measure {
        someClass.doSomethingFancy()
    }
}

XCTest runs this block ten times and uses the average duration as a baseline. If the average on a later run is too far off from that baseline, the test fails. Additionally, Xcode provides a nifty popover that shows the durations of the individual runs and lets you pick your own baseline and settings.

How Does It Work?

The Basics

Unfortunately, everything I found on Google was pretty shallow; however, I stumbled upon some old slides from a 2014 WWDC session.

The slides explain that measureBlock runs your block 10 times and calculates the average time it takes to run. That average is what gets compared to a baseline. The very first time you run your test there is no baseline yet, so there is nothing to compare against; you can set one from the average of a run, or enter your own value by hand. On subsequent test runs, measureBlock again runs your block 10 times and compares the new average to the baseline. If it is more than 10% off, either up or down, the test fails. It also computes the standard deviation of the 10 runs, and by default the test can also fail when the runs are too spread out (a relative standard deviation above 10%, shown as Max STDDEV in Xcode’s popover). All these settings can be changed manually too.
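The “run it 10 times and average” step can be sketched in Swift as follows. This is only a model, not XCTest’s implementation: `sampleDurations` and `average` are hypothetical helpers, and timing with `DispatchTime` is my own choice of clock.

```swift
import Foundation

/// Hypothetical helper: runs `block` `iterations` times and returns each
/// run's wall-clock duration in seconds. It only mimics the "run it
/// 10 times" part of measureBlock; it is not XCTest's code.
func sampleDurations(iterations: Int = 10, _ block: () -> Void) -> [Double] {
    var samples: [Double] = []
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        block()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000_000)
    }
    return samples
}

/// Arithmetic mean of the per-run durations.
func average(_ samples: [Double]) -> Double {
    return samples.reduce(0, +) / Double(samples.count)
}

// Stand-in workload for `someClass.doSomethingFancy()`.
let samples = sampleDurations {
    var total = 0
    for i in 0..<100_000 { total = total &+ i }
    _ = total
}
print("runs: \(samples.count), average: \(average(samples)) s")
```

Averaging ten runs smooths out one-off hiccups, but as the next sections show, the average alone does not describe how consistent the runs were.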

Baseline vs. Average

Xcode’s popover displays both a baseline and an average. The average is the actual average time it took to run your block of code the last time you ran your test. The baseline is a fixed value of your choosing: you can set it from the latest average with the popover’s Set Baseline button, or type in your own value. Each run’s average is compared to the baseline; the average shown in the popover is just the latest result and does not change the baseline.
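A small worked example of the difference, as a Swift sketch: the baseline is a fixed number, while each run produces a fresh average that gets compared to it. The 10% tolerance matches the default described above; `percentChange` and `passesBaseline` are hypothetical names, and treating a faster run as a failure too is an assumption taken from the “either up or down” description above.

```swift
/// Percentage by which a run's average differs from the baseline
/// (positive means slower than the baseline).
func percentChange(average: Double, baseline: Double) -> Double {
    return (average - baseline) / baseline * 100
}

/// Hypothetical pass/fail check. Assumption: following the post's
/// "either up or down" description, drifting more than `maxPercent`
/// in either direction fails.
func passesBaseline(average: Double, baseline: Double, maxPercent: Double = 10) -> Bool {
    return abs(percentChange(average: average, baseline: baseline)) <= maxPercent
}

// The baseline stays fixed while every run produces a new average.
let baseline = 1.0  // seconds
for runAverage in [0.97, 1.05, 1.18] {
    let change = (percentChange(average: runAverage, baseline: baseline) * 10).rounded() / 10
    let verdict = passesBaseline(average: runAverage, baseline: baseline) ? "pass" : "fail"
    print("average \(runAverage) s, change \(change)%: \(verdict)")
}
```

With these numbers, the 0.97 s and 1.05 s runs pass and the 1.18 s run fails, while the baseline itself never moves.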

Why Use Standard Deviation

Here is a graph that shows the run times for 10 runs of a given code block:

[Graph: run times for 10 runs of a code block, averaging 1 second (from Apple WWDC slides)]

Now here’s a second graph where the average time is also 1 second:

[Graph: a second set of 10 runs, also averaging 1 second but with a different spread (from Apple WWDC slides)]

Clearly the average doesn’t tell the whole story. This is why measureBlock also looks at the standard deviation, which tells us about the spread of the measurements: a run whose times are scattered too widely can fail even if its average is close to the baseline.
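Here is the same idea in numbers, as a Swift sketch with made-up run times (not the data behind Apple’s graphs): two sets of 10 runs that both average exactly 1 second but have very different standard deviations. It uses the population standard deviation (dividing by n); whether XCTest divides by n or n-1 is not something I have verified.

```swift
/// Arithmetic mean of the run times.
func mean(_ xs: [Double]) -> Double {
    return xs.reduce(0, +) / Double(xs.count)
}

/// Population standard deviation (divides by n).
func standardDeviation(_ xs: [Double]) -> Double {
    let m = mean(xs)
    let variance = xs.map { ($0 - m) * ($0 - m) }.reduce(0, +) / Double(xs.count)
    return variance.squareRoot()
}

// Made-up run times in seconds: both sets average exactly 1 second.
let steady: [Double] = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
let noisy: [Double] = [0.5, 1.5, 0.5, 1.5, 0.5, 1.5, 0.5, 1.5, 0.5, 1.5]

for (name, runs) in [("steady", steady), ("noisy", noisy)] {
    let relative = standardDeviation(runs) / mean(runs) * 100
    print("\(name): mean \(mean(runs)) s, stddev \(standardDeviation(runs)) s, relative \(relative)%")
}
```

Both means come out as 1.0, but the standard deviations are 0.0 s and 0.5 s (a relative standard deviation of 0% versus 50%).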

Where Is The Baseline Stored?

Now the big question for those who work at big companies that run CI on multiple machines: where does the baseline get stored? I figured this out simply by adding a performance unit test and looking at the file diff in git.

Xcode stores your baselines within the project file package, under project.xcodeproj/xcshareddata/xcbaselines/.... This folder contains one .plist per host machine + run target combination, listing all the performance test settings for that combination, and an Info.plist that lists all the combinations. Baselines are specific to both the host machine running the tests and the targeted device (e.g. the iPhone 7 simulator). Xcode generates a UUID to identify each combo (machine + target) and ties all the performance settings to it. The combo is defined by the specs of the machine, so if you run your performance test on a different machine with the exact same specs, the same baseline will be pulled.
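The way a combo is matched can be modelled as an exact-match lookup on the full set of specs. The Swift sketch below only models the behavior described above; it is not Xcode’s code. `RunDestination`, `BaselineStore`, the `destination` helper and the spec fields are hypothetical, and all spec values are made up.

```swift
import Foundation

/// Hypothetical model of a host machine + run target "combo".
/// The fields are illustrative, not Xcode's actual list of specs.
struct RunDestination: Hashable {
    let cpuKind: String
    let cpuSpeedInMHz: Int
    let logicalCores: Int
    let modelCode: String
    let targetDevice: String
}

/// Hypothetical baseline index: a combo is found only on an exact match
/// of every field; otherwise a fresh UUID is minted for it.
struct BaselineStore {
    var uuidByDestination: [RunDestination: UUID] = [:]

    mutating func uuid(for destination: RunDestination) -> UUID {
        if let existing = uuidByDestination[destination] {
            return existing
        }
        let fresh = UUID()
        uuidByDestination[destination] = fresh
        return fresh
    }
}

/// Builds a destination with made-up specs, varying only CPU speed and target.
func destination(cpuSpeedInMHz: Int, target: String) -> RunDestination {
    return RunDestination(cpuKind: "Intel Core i7", cpuSpeedInMHz: cpuSpeedInMHz,
                          logicalCores: 8, modelCode: "MacBookPro11,3", targetDevice: target)
}

var store = BaselineStore()
let laptop = store.uuid(for: destination(cpuSpeedInMHz: 2800, target: "iPhone 7 simulator"))
let twin = store.uuid(for: destination(cpuSpeedInMHz: 2800, target: "iPhone 7 simulator"))  // identical specs
let fasterCPU = store.uuid(for: destination(cpuSpeedInMHz: 3100, target: "iPhone 7 simulator"))
let otherSim = store.uuid(for: destination(cpuSpeedInMHz: 2800, target: "iPhone 8 simulator"))
print(laptop == twin, laptop == fasterCPU, laptop == otherSim)
```

Two machines with identical specs share a UUID (and therefore baselines), while changing a single field, such as the CPU speed or the simulator, produces a new combo with no baselines yet.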

The Info.plist indexes all host machine and target combinations, recording the specs of each one under its UUID.

Each UUID-named .plist then holds one combo’s performance test settings: the test names and their associated baselines.

So while these files get checked into your code repository, every machine configuration needs its own baselines. This is reasonable, given that performance varies from one host machine to another and from one simulator to another. However, it may get tricky if you have hundreds of virtualized machines at a large company.

Q & A

What version of Xcode is this? Xcode 9.2
How did you figure out the plists? I made changes to the baselines using Xcode and used git to detect file changes. Then I opened the plists, tried modifying them by hand and looked at the results. I also ran my test on another machine to see which baselines got pulled.
Can I generate these plists with a script? Yes (as long as Apple doesn’t change things). You can generate a random UUID to name a combo and plug in all the specs you want; just make sure not to forget any fields. The Info.plist file needs to contain an entry for the combo (machine+target) with its specs. You also need a file named after the UUID with a .plist extension that contains all the test names and associated baselines.
Can I remove fields from the plist to make it more general and reuse a “combo” for multiple types of machines or targets? No. If a combo is not a perfect match for a machine’s specs, Xcode will generate a new UUID and fill it in with the machine’s exact specs. This requires you to add new baselines tied to that UUID.
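Following the Q&A above, here is a rough Swift sketch of such a script using Foundation’s `PropertyListSerialization`. The key names (`runDestinationsByUUID`, `localComputer`, `targetDevice`, `classNames`, `baselineAverage` and so on) are assumptions based on files written by Xcode 9-era projects, and the list of spec fields may be incomplete, so diff the output against a file Xcode generated on your own machine before relying on it. The test class name, test name and all spec values are hypothetical, and the sketch writes to a temporary folder rather than into your project.

```swift
import Foundation

// ASSUMPTION: the key names below mirror Xcode 9-era xcbaselines files as I
// understand them; compare with a file Xcode generated before relying on them.
let comboUUID = UUID().uuidString

// Specs of the host machine (all values are made up for illustration).
let localComputer: [String: Any] = [
    "busSpeedInMHz": 100,
    "cpuCount": 1,
    "cpuKind": "Intel Core i7",
    "cpuSpeedInMHz": 2800,
    "logicalCPUCoresPerPackage": 8,
    "modelCode": "MacBookPro11,3",
    "physicalCPUCoresPerPackage": 4,
    "platformIdentifier": "com.apple.platform.macosx",
]

// Specs of the run target (here an iPhone 7 simulator).
let targetDevice: [String: Any] = [
    "modelCode": "iPhone9,1",
    "platformIdentifier": "com.apple.platform.iphonesimulator",
]

let runDestination: [String: Any] = [
    "localComputer": localComputer,
    "targetArchitecture": "x86_64",
    "targetDevice": targetDevice,
]

// Info.plist: indexes every combo (host machine + run target) by its UUID.
let info: [String: Any] = ["runDestinationsByUUID": [comboUUID: runDestination]]

// <UUID>.plist: one baseline per test class, test method and metric.
// The class and test names are hypothetical.
let wallClockBaseline: [String: Any] = [
    "baselineAverage": 1.0,
    "baselineIntegrationDisplayName": "Local Baseline",
]
let baselines: [String: Any] = [
    "classNames": [
        "MyPerformanceTests": [
            "testMyCodesPerformance()": [
                "com.apple.XCTPerformanceMetric_WallClockTime": wallClockBaseline,
            ],
        ],
    ],
]

/// Serializes a property list as XML and writes it to `url`.
func writePlist(_ object: Any, to url: URL) throws {
    let data = try PropertyListSerialization.data(fromPropertyList: object, format: .xml, options: 0)
    try data.write(to: url)
}

// Writes to a temporary folder; copy the files into the project's
// xcbaselines folder yourself once they look right.
let dir = FileManager.default.temporaryDirectory.appendingPathComponent("xcbaseline-sketch")
try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
try writePlist(info, to: dir.appendingPathComponent("Info.plist"))
try writePlist(baselines, to: dir.appendingPathComponent("\(comboUUID).plist"))
print("Wrote Info.plist and \(comboUUID).plist to \(dir.path)")
```

Generating the UUID once and using it both as the Info.plist key and as the second file’s name keeps the two files pointing at the same combo.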

Conclusion

The key things I figured out about measureBlock:

measureBlock runs your code block 10 times.
It compares the average of those runs to a baseline, and also checks their spread (standard deviation).
The baseline comes from the average of a run but can be set by hand.
Baselines for your tests are stored in the .xcodeproj package, but are specific to both the host machine and the target device.

My conclusion is that Apple has given iOS developers a great, simple performance unit testing tool, but without a ton of extra tooling and scripting it may not scale for companies that run automated tests on hundreds of machines.

References

Testing in Xcode 6, WWDC 2014
