Marionette, Act II: Harnessing automation to test the browser

22 Aug 2016

Welcome back to my post series on the Marionette project! In Act I, we looked into Marionette’s automation framework for Gecko, the engine behind the Firefox browser. Here in Act II, we’ll take a look at a complementary side of the Marionette project: the testing framework that helps us run tests using our Marionette-animated browser, aka the Marionette test harness. If – like me at the start of my Outreachy internship – you’re clueless about test harnesses, or the Marionette harness in particular, and want to fix that, you’re in the right place!

Wait, what’s Marionette again?

Quick recap from Act I:

Marionette refers to a suite of tools for automated testing of Mozilla browsers.

In that post, we saw how the Marionette automation framework lets us control the Gecko browser engine (our “puppet”), thanks to a server component built into Gecko (the puppet’s “strings”) and a client component (a “handle” for the puppeteer) that gives us a simple Python API to talk to the server and thus control the browser. But why do we need to automate the browser in the first place? What good does it do us?

Well, one thing it’s great for is testing. Indulge me in a brief return to my puppet metaphor from last time, won’t you? If the automation side of Marionette gives us strings and a handle that turn the browser into our puppet, the testing side of Marionette gives that puppet a reason for being, by letting it perform: it sets up a stage for the puppet to dance on, tell it to carry out a given performance, write a review of that performance, and tear down the stage again.

OK, OK, metaphor-indulgence over; let’s get real.

Wait, why do we need automated browser testing again?

As Firefox¹ contributors, we don’t want to have to manually open up Firefox, click around, and check that everything works every time we change a line of code. We’re developers, we’re lazy!

"Gosh, you say it like it's a bad thing" (GIF from the film "Clueless")

Clueless via POPSUGAR

But we can’t not do it, because then we might not realize that we’ve broken the entire internet (or, you know, introduced a bug that makes Firefox crash, which is just as bad).

So instead of testing manually, we do the same thing we always do: make the computer do it for us!

The type of program that can magically do this stuff for us is called a test harness. And there’s even a special version specific to testing Gecko-based browsers, called – can you guess? – the Marionette test harness, also known as the Marionette test runner.

(Lazy) Firefox contributors, rejoice!

"Two enthusiastic thumbs up!" (GIF from the film "Clueless")

Clueless via tenor

So, what exactly is this magical “test harness” thing? And what do we need to know about the Marionette-specific one?

What’s a test harness?

First of all, let’s not get hung up on the name “test harness” – the names people use to refer to these things can be a bit ambiguous and confusing, as we saw with other parts of the Marionette suite in Act I. So let’s set aside the name of the thing for now, and focus on what the thing does.

Assuming we have a framework like the Marionette client/server that lets us automatically control the browser, the other thing we need for automatically testing the browser is something that lets us:

Properly set up & launch the browser, and any other related components we might need
Define tests we want to perform and their expected results
Discover tests defined in a file or directory
Run those tests, using the automation framework to do the stuff we want to do in the browser
Keep track of what we actually saw, and how it compares to what we expected to see
Report the results in human- and/or machine-readable logs
Clean up all of that stuff we set up in the beginning

Take out the browser-specific parts, and you’ve got the basic outline of what a test harness for any kind of software should do.

Ever write tests using Python’s unittest, JavaScript’s mocha, Java’s JUnit, or a similar tool? If you’re like me, you might have been perfectly happy writing unit tests with one of these, thinking not:

Yeah, I know unittest! It’s a test harness.

but rather:

Yeah, I know unittest! It’s, you know, a, like, thing for writing tests that lets you make assertions and write setup/teardown methods and stuff and, like, print out stuff about the test results, or whatever.

Turns out, they’re the same thing; one is just shorter (and less, like, full of “like”s, and stuff).

"Whatever" (GIF from the film "Clueless")

Clueless via POPSUGAR

So that’s the general idea of a test harness. But we’re not concerned with just any test harness; we want to know more about the Marionette test harness.

What’s special about the Marionette test harness?

Um, like, duh, it’s made for tests using Marionette!

What I mean is that unlike an all-purpose test harness, the Marionette harness already knows that you’re a Mozillian specifically interested in is running Gecko-based browser tests using Marionette. So instead of making you write code in for setup/teardown/logging/etc. that talks to Marionette and uses other features of the Mozilla ecosystem, it does that legwork for you.

You still have control, though; it makes it easy for you to make decisions about certain Mozilla-/Gecko-specific properties that could affect your tests, like:

Need to use a specific Firefox binary? Or a particular Firefox instance running on a device somewhere?
Got a special profile or set of preferences you want the browser to run with?
Want Electrolysis enabled, or not?

As well as some more general decisions, like:

Need to run an individual test module? A directory full of tests? Tests listed in a manifest file?
Want the tests run multiple times? Or in chunks?
How and where should the results be logged?
Care to drop into a debugger if something goes wrong?

But how does it do all this? What does it look like on the inside? Let’s dive into the code to find out.

How does the Marionette harness work?

Inside Marionette, in the file harness/marionette/runtests.py, we find the MarionetteHarness class. MarionetteHarness itself is quite simple: it takes in a set of arguments that specify the desired preferences with respect to the type of decisions we just mentioned, uses an argument parser to parse and process those arguments, and then passes them along to a test runner, which runs the tests accordingly.

So actually, it’s the “test runner” that does the brunt of the work of a test harness here. Perhaps for that reason, the names “Marionette Test Harness” and “Marionette Test Runner” sometimes seem to be used interchangeably, which I for one found quite confusing at first.

Anyway, the test runner that MarionetteHarness makes use of is the MarionetteTestRunner class defined in runtests.py, but that’s really just a little wrapper around BaseMarionetteTestRunner from harness/marionette/runner/base.py, which is where the magic happens – and also where I’ve spent most of my time for my Outreachy internship, but more on that later. For now let’s check out the runner!

How does Marionette’s test runner work?

The beating heart of the Marionette test runner is the method run_tests. By combining some methods that take care of general test-harness functionality and some methods that let us set up and keep tabs on a Marionette client-server session, run_tests gives us the Marionette-centric test harness we never knew we always wanted. Thanks, run_tests!

To get an idea of how the test runner works, let’s take a walk through the run_tests method and see what it does.²

First of all, it simply initializes some things, e.g. timers and counters for passed/failed tests. So far, so boring.

Next, we get to the part that puts the “Marionette” in “Marionette test runner”. The run_tests method starts up Marionette, by creating a Marionette object – passing in the appropriate arguments based on the runner’s settings – which gives us the client-server session we need to automate the browser in the tests we’re about to run (we know how that all works from Act I).

Adding the tests we want to the runner’s to-run list (self.tests) is the next step. This means finding the appropriate tests from test modules, a directory containing test modules, or a manifest file listing tests and the conditions under which they should be run.

Given those tests, after gathering and logging some info about the settings we’re using, we’re ready to run! (Perhaps multiple times, if repeated runs were requested.)

To actually run the tests, the runner calls run_test_sets, which runs the tests we added earlier, possibly dividing them into several sets (or chunks) that will be run separately (thus enabling parallelization). This in turn calls run_test_set, which basically just calls run_test, which is the final turtle.³

Glancing at run_test, we can see how the Marionette harness is based on Python’s unittest, which is why the tests we run with this harness basically look like unittest tests (we’ll say a bit more about that below). Using unittest to discover our test cases in the modules we provided, run_test runs each test using a MarionetteTextTestRunner and gets back a MarionetteTestResult. These are basically Marionette-specific versions of classes from moztest, which helps us store the test results in a format that’s compatible with other Mozilla automation tools, like Treeherder. Once we’ve got the test result, run_test simply adds it to the runner’s tally of test successes/failures.

So, that’s how run_tests (and its helper functions) execute the tests. Once all the tests have been run, our main run_tests method basically just logs some info about how things went, and which tests passed. After that, the runner cleans up by shutting down Marionette and the browser, even if something went wrong during the running or logging, or if the user interrupted the tests.

So there we have it: our very own Marionette-centric test-runner! It runs our tests with Marionette and Firefox set up however we want, and also gives us control over more general things like logging and test chunking. In the next section, we’ll take a look at how we can interact with and customize the runner, and tell it how we want our tests run.

What do the tests look like?

As for the tests themselves, since the Marionette harness is an extension of Python’s unittest, tests are mostly written as a custom flavor of unittest test cases. Tests extend MarionetteTestCase, which is an extension of unittest.TestCase. So if you need to write a new test using Marionette, it’s as simple as writing a new test module named test_super_awesome_things.py which extends that class with whatever test_* methods you want – just like with vanilla unittest.

Let’s take a look at a simple example, test_checkbox.py:

from marionette import MarionetteTestCase
from marionette_driver.by import By

class TestCheckbox(MarionetteTestCase):
    def test_selected(self):
        test_html = self.marionette.absolute_url("test.html")
        self.marionette.navigate(test_html)
        box = self.marionette.find_element(By.NAME, "myCheckBox")
        self.assertFalse(box.is_selected())
        box.click()
        self.assertTrue(box.is_selected())

This and the other Marionette unit tests can be found in the directory testing/marionette/harness/marionette/tests/unit/, so have a peek there for some more examples.

Once we’ve got our super awesome new test, we can run it (with whatever super awesome settings we want) using the harness’s command-line interface. Let’s take a look at how that interface works.

What is the interface to the harness like?

Let’s peek at the constructor method for the BaseMarionetteTestRunner class:

class BaseMarionetteTestRunner(object):
    # ...
    def __init__(self, address=None,
        app=None, app_args=None, binary=None, profile=None,
        logger=None, logdir=None,
        repeat=0, testvars=None,
        symbols_path=None, timeout=None,
        shuffle=False, shuffle_seed=random.randint(0, sys.maxint), this_chunk=1, total_chunks=1, sources=None,
        server_root=None, gecko_log=None, result_callbacks=None,
        prefs=None, test_tags=None,
        socket_timeout=BaseMarionetteArguments.socket_timeout_default,
        startup_timeout=None, addons=None, workspace=None,
        verbose=0, e10s=True, emulator=False, **kwargs):

Our first thought might be, “Wow, that’s a lot of arguments”. Indeed! This is how the runner knows how you want the tests to be run. For example, binary is the path to the specific Firefox application binary you want to use, and e10s conveys whether or not you want to run Firefox with multiple processes.

Where do all these arguments come from? They’re passed to the runner by MarionetteHarness, which gets them from the argument parser we mentioned earlier, MarionetteArguments.

Analogous to MarionetteTestRunner/BaseMarionetteTestRunner, MarionetteArgument is just a small wrapper around BaseMarionetteArguments from runner/base.py, which in turn is just an extension of Python’s argparse.ArgumentParser. BaseMarionetteArguments defines which arguments can be passed in to the harness’s command-line interface to configure its settings. It also verifies that whatever arguments the user passed in make sense and don’t contract each other.

To actually use the harness, we can simply call the runtests.py script with: python runtests.py [whole bunch of awesome arguments]. Alternatively, we can use the Mach command marionette-test (which just calls runtests.py), as described here.

To see all of the available command-line options (there are a lot!), you can run python runtests.py --help or ./mach marionette-test --help, which just spits out the arguments and their descriptions as defined in BaseMarionetteArguments.

So, with the simple command mach marionette-test [super fancy arguments] test_super_fancy_things.py, you can get the harness to run your Marionette tests with whatever fancy options you desire to fit your specific fancy scenario.

"I'm so fancy" (GIF from "Fancy" by Iggy Azalea)

Iggy Azalea's "Fancy" via tenor

But what if you’re extra fancy, and have testing needs that exceed the limits of what’s possible with the (copious) command-line options you can pass to the Marionette runner? Worry not! You can customize the runner even further by extending the base classes and making your own super-fancy harness. In the next section, we’ll see how and why you might do that.

How is the Marionette test harness used at Mozilla?

Other than enabling people to write and run their own tests using the Marionette client, what is the Marionette harness for? How does Mozilla use it internally?

Well, first and foremost, the harness is used to run the Marionette Python unit tests we described earlier, which check that Marionette is functioning as expected (e.g. if Marionette tells the browser to check that box, then by golly that box better get checked!). Those are the tests that will get run if you just run mach marionette-test without specifying any test(s) in particular.

But that’s not all! I mentioned above that there might be special cases where the runner’s functionality needs to be extended, and indeed Mozilla has already encountered this scenario a couple of times.

One example is the Firefox UI tests, and in particular the UI update tests. These test the functionality of e.g. clicking the “Update Firefox” button in the UI, which means they need to do things like compare the old version of the application to the updated one to make sure that the update worked. Since this involves binary-managing superpowers that the base Marionette harness doesn’t have, the UI tests have their own runner, FirefoxUITestRunner, which extends BaseMarionetteTestRunner with those superpowers.

Another test suite that makes use of a superpowered harness is the External Media Tests, which tests video playback in Firefox and need some extra resources – namely a list of video URLs to make available to the tests. Since there’s no easy way to make such resources available to tests using the base Marionette harness, the external media tests have their own test harness which uses the custom MediaTestRunner and MediaTestArguments (extensions of BaseMarionetteTestRunner and BaseMarionetteArguments, respectively), to allow the user to e.g. specify the video resources to use via the command line.

So the Marionette harness is used in at least three test suites at Mozilla, and more surely can and will be added as the need arises! Since the harness is designed with automation in mind, suites like marionette-test and the Firefox UI tests can be (and are!) run automatically to make sure that developers aren’t breaking Firefox or Marionette as they make changes to the Mozilla codebase. This all makes the Marionette harness a rather indispensable development tool.

Which brings us to a final thought…

How do we know that the harness itself is running tests properly?

The Marionette harness, like any test harness, is just another piece of software. It was written by humans, which means that bugs and breakage are always a possibility. Since breakage or bugs in the test harness could prevent us from running tests properly, and we need those tests to work on Firefox and other Mozilla tools, we need to make sure that they get caught!

Do you see where I’m going with this? We need to… wait for it…

Test the thing that runs the tests

Yup, that’s right: Meta-testing. Test-ception. Tests all the way down.

"Oh my god" (GIF from the film "Clueless")

Clueless via POPSUGAR

And that’s what I’ve been doing this summer for my Outreachy project: working on the tests for the Marionette test harness, otherwise known as the Marionette harness (unit) tests. I wrote a bit about what I’ve been up to in my previous post, but in my next and final Outreachy post, I’ll explain in more detail what the harness tests do, how we run them in automation, and what improvements I’ve made to them during my time as a Mozilla contributor.

Stay tuned!

"I'm outty" (GIF from the film "Clueless")

Clueless via POPSUGAR

Notes

1 Or Fennec, or B2G, or [insert Gecko-based project here]… ↩

2 If you scroll through it and think, “Wow, that’s long and ugly”, well, you should’ve seen it before I refactored it!

Clueless via POPSUGAR

↩

3 If you think distinguishing run_tests, run_test_sets, run_test_set, and run_test is confusing, I wholeheartedly agree with you! But best get used to it; working on the Marionette test harness involves developing an eagle-eye for plurals in method names (we’ve also got _add_tests and add_test). ↩

Anjana Sofia Vakil

Marionette, Act II: Harnessing automation to test the browser

Wait, what’s Marionette again?

Wait, why do we need automated browser testing again?

What’s a test harness?

What’s special about the Marionette test harness?

How does the Marionette harness work?

How does Marionette’s test runner work?

What do the tests look like?

What is the interface to the harness like?

How is the Marionette test harness used at Mozilla?

How do we know that the harness itself is running tests properly?

Notes