measureBlock: How Does Performance Testing Work In iOS?

I’ve been working on a mini project at Square involving performance unit testing for iOS. Essentially, I’m looking into how we could introduce performance unit testing, what our options are, and how it would scale on our CI (continuous integration). In this post I’ll focus on the one tool Apple provides as part of its unit testing suite: the magical measureBlock method. The questions are: how does it work, and will it work for us and our CI process?

 What’s measureBlock?

For those who don’t know measureBlock, here’s a little background. XCTest includes a feature that lets a unit test measure how long a block of code takes to execute. It looks like this in Objective-C:

- (void)testMyCodesPerformance
{
    [self measureBlock:^{
        [someClass doSomethingFancy];
    }];
}

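Or in Swift, where the method is exposed as measure(_:) from Swift 3 onward:

func testMyCodesPerformance() {
    // Everything inside the closure is timed.
    measure {
        someClass.doSomethingFancy()
    }
}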

Xcode runs the block a number of times and establishes a baseline. If, on a subsequent test run, the standard deviation is too far off from that baseline, the test fails. Xcode also provides a nifty popover that shows the durations of the individual runs and lets you pick your own baseline and settings:
[Screenshot: Xcode popover showing run durations, baseline, and settings]

 How Does It Work?

 The Basics

Unfortunately, everything I found on Google was pretty shallow. However, I stumbled upon some old slides from a 2014 WWDC session.

The slides explain that measureBlock runs your block 10 times and calculates the average run time. This average is then used as the baseline. The very first time you run your test it will fail, because no baseline exists yet; it is calculated on that first run. You may also set the baseline manually. On subsequent test runs, measureBlock still runs your block 10 times, but this time it compares the standard deviation of the run times to the baseline. If it is more than 10% off, either up or down, your test fails. All of these settings can be changed manually too.
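
To make the 10% rule concrete, here is a minimal sketch in plain Swift. It averages ten made-up durations and checks how far that average sits from a baseline. This is one reading of the rule for illustration only, not XCTest’s implementation: comparing the average (rather than some other statistic) is an assumption, and BaselineCheck and checkAgainstBaseline are hypothetical names.

```swift
/// Result of comparing one performance run against a stored baseline.
/// Hypothetical type for illustration; XCTest does not expose it.
struct BaselineCheck {
    let average: Double
    let percentOffBaseline: Double
    let passes: Bool
}

/// Averages the sample durations and checks how far the average is
/// from the baseline, expressed as a percentage of the baseline.
/// Assumption: the 10% tolerance applies in both directions.
func checkAgainstBaseline(samples: [Double],
                          baseline: Double,
                          tolerancePercent: Double = 10.0) -> BaselineCheck {
    precondition(!samples.isEmpty && baseline > 0)
    let average = samples.reduce(0, +) / Double(samples.count)
    let percentOff = abs(average - baseline) / baseline * 100.0
    return BaselineCheck(average: average,
                         percentOffBaseline: percentOff,
                         passes: percentOff <= tolerancePercent)
}

// Ten made-up durations in seconds, standing in for the ten block runs.
let durations = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.00, 1.01, 0.99]
let result = checkAgainstBaseline(samples: durations, baseline: 1.0)
print(result.average, result.percentOffBaseline, result.passes)
```

With a baseline of 1 second, a run averaging 1.2 seconds would be 20% off and would not pass under this reading.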

 Baseline vs. Average

Xcode shows a little popover that displays both a baseline and an average. The difference between the two is that the average is the actual average time your block of code took the last time you ran your test, while the baseline is a fixed setting of your choosing (set automatically by Xcode if you don’t set it yourself). The standard deviation is compared to the baseline; the average displayed in the popover has no effect on your test.

 Why Use Standard Deviation

Here is a graph that shows the run times for 10 runs of a given code block:
[Graph: run times for 10 runs; the average time is 1 second. From Apple WWDC slides.]

Now here’s a second graph where the average time is also 1 second:
[Graph: a second set of run times, also averaging 1 second. From Apple WWDC slides.]

Clearly the average doesn’t tell the whole story. This is why measureBlock compares the standard deviation to the baseline: the standard deviation tells us about the spread of the measurements.
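
The same point can be checked with a few lines of Swift. The sketch below uses two made-up sets of ten durations that share a 1-second average but differ in spread. It computes the population standard deviation; whether XCTest uses the population or the sample formula is not something the slides settle, so treat that choice as an assumption.

```swift
/// Arithmetic mean of the samples.
func mean(_ values: [Double]) -> Double {
    return values.reduce(0, +) / Double(values.count)
}

/// Population standard deviation: the square root of the average
/// squared distance from the mean.
func standardDeviation(_ values: [Double]) -> Double {
    let m = mean(values)
    let squaredDistances = values.map { ($0 - m) * ($0 - m) }
    let variance = squaredDistances.reduce(0, +) / Double(values.count)
    return variance.squareRoot()
}

// Two made-up sets of ten durations (seconds), both averaging 1 second.
let steady  = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
let erratic = [0.5, 1.5, 0.5, 1.5, 0.5, 1.5, 0.5, 1.5, 0.5, 1.5]

print(mean(steady), standardDeviation(steady))    // 1.0 0.0
print(mean(erratic), standardDeviation(erratic))  // 1.0 0.5
```

Both sets report the same average, yet the second one swings half a second around it on every run, which only the standard deviation reveals.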

 Where Is The Baseline Stored?

Now the big question for those who work at big companies that run CI on multiple machines: where does the baseline get stored? I figured this out simply by using git: I added a performance unit test and looked at the file diff.

Xcode stores your baselines within the project file package, under project.xcodeproj/xcshareddata/xcbaselines/.... This folder contains one .plist listing all performance test settings for a given combination of host machine and run target, plus an Info.plist with a list of all host machines. Baselines are specific to both the host machine running the tests and the targeted device (e.g. iPhone 7 simulator). Xcode generates a UUID to identify the combination (machine + target) and ties all the performance settings to it. The combination is defined by the specs of the machine, so if you run your performance test on a different machine with exactly the same specs, the same baseline will be used (see the screenshot below for the specs that define the combination).

The Info.plist that indexes all host machine and target combinations looks like this:
[Screenshot: Info.plist listing the run destinations]

And this is an example of a .plist for a given host machine’s performance test settings:
[Screenshot: .plist with the baselines for one run destination]
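
If you want to look at these files outside of Xcode, a small script can walk the baselines folder and print the top-level keys of each .plist. The sketch below assumes only what is described above: a folder at project.xcodeproj/xcshareddata/xcbaselines that contains .plist files somewhere beneath it. The helper names plistFiles and readPlist are hypothetical, and the search is recursive so that it makes no assumption about the exact folder layout.

```swift
import Foundation

/// Returns every .plist file found anywhere under the given directory.
/// The layout below xcbaselines is not assumed; the search is recursive.
func plistFiles(under directory: URL) -> [URL] {
    guard let enumerator = FileManager.default.enumerator(
        at: directory, includingPropertiesForKeys: nil) else {
        return []
    }
    var found: [URL] = []
    for case let url as URL in enumerator where url.pathExtension == "plist" {
        found.append(url)
    }
    return found
}

/// Parses a property list file into a dictionary, or returns nil if the
/// file cannot be read or its root object is not a dictionary.
func readPlist(at url: URL) -> [String: Any]? {
    guard let data = try? Data(contentsOf: url),
          let object = try? PropertyListSerialization.propertyList(
              from: data, options: [], format: nil) else {
        return nil
    }
    return object as? [String: Any]
}

// "project.xcodeproj" is the placeholder project name used in this post.
let baselines = URL(fileURLWithPath: "project.xcodeproj/xcshareddata/xcbaselines")
for file in plistFiles(under: baselines) {
    let keys = readPlist(at: file)?.keys.sorted() ?? []
    print(file.lastPathComponent, keys)
}
```

Run from the directory that contains the project, this prints one line per .plist, which is a quick way to see which machine and target combinations already have baselines checked in.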

So while these files get checked into your code repository, every machine needs its own settings. This is reasonable, given that performance varies from one host machine to another and from one simulator to another. However, it may get tricky if you have hundreds of virtualized machines at a large company.

 Q & A

 Conclusion

The key things I figured out about measureBlock:

Essentially, my conclusion is that Apple has given iOS developers a great, simple performance unit testing tool, but without a lot of extra tooling and scripting it may not scale for companies that run automated tests on hundreds of machines.
