Keeping your Swift App Reliable with UI Testing
Open sourcing our UI automation library, how it works, and how you can use it to test your own Swift applications on Windows.
In this piece, Tristan and Jeff, leading our developer infrastructure team, present our UI testing library, swift-webdriver, which helps keep Swift apps reliable throughout their development and keeps regressions at bay.
Today, we are excited to announce that The Browser Company is releasing swift-webdriver as an open source UI automation library for Swift applications on Windows. This is the project that we have been using internally to write UI tests for the Arc web browser, so it combines perfectly with WinUI. As an implementation of the WebDriver protocol, it can also support automating Web navigation via Selenium or browser-specific WebDriver implementations. This post will explain the need for UI testing, provide background on available technologies, and present our solution for Swift applications on Windows: swift-webdriver.
UI Testing
UI testing is an important part of developing reliable graphical applications. It enables testing interactions as they would be performed by the users, and can find high-level issues that would not be detected by unit tests, such as dialogs not dismissing correctly or navigation issues. It is also useful for smoke testing applications and having a Continuous Integration (CI) pipeline answer the question: Can I even launch this build?
At The Browser Company, we run automated UI tests as part of the CI pipeline for every pull request on the Arc repository. These tests ensure that our unboxing experience and browser import are robust, that we can move tabs between different spaces, that we can drag-and-drop or copy and paste content, and that we can open the bug reporter, to give a few examples.
UI testing rests on UI automation. A UI automation technology supports simulating user-like interactions with applications such as finding elements, clicking on them, typing text, using keyboard shortcuts, and retrieving text. A UI testing framework may add constructs specifically useful for testing such as assertions and screenshot comparisons.
A Survey of UI Automation Technologies
UI Automation and Microsoft Active Accessibility
Microsoft’s UI Automation technology is, unsurprisingly, heavily based on COM and OLE, which we discussed in a previous blog entry. At the root of everything is IDispatch, which provides a late-bound, introspectable interface for interacting with program elements. This is useful for scripting, accessibility purposes, and UI automation. The earliest implementations of automation used IAccessible, designed to enable both accessibility and automation for Win32 applications. It provided basic information about UI elements on top of IDispatch-based scripting. IAccessible was part of a collection of interfaces and technologies known as Microsoft Active Accessibility (MSAA). This was a great starting point, but it had many limitations, especially when it came to web content.
MSAA was later superseded by Microsoft UI Automation, specifically designed to handle more complex native interactions, as well as dynamic and extensible web-based applications. UI Automation exposes a considerable amount of functionality via the IUIAutomation family of interfaces. Many applications and tech stacks support both technologies, as these form the core of Microsoft's native UI automation technologies.
Selenium and WebDriver
Meanwhile, as Web applications became more complex, Selenium appeared as a solution to automate browser interactions and became widely adopted for UI testing. Through Selenium, a test could navigate to pages, click on buttons, fill in text fields and find the text on specific HTML elements. This was achieved through a REST API supporting straightforward requests such as:
GET /session/5087ecc8/element/searchTextBox/text HTTP/1.1
As Selenium gained in popularity, there was an effort to standardize its protocol through the World Wide Web Consortium (W3C). This led to the WebDriver protocol, which differs from the original Selenium protocol only in minor ways. Selenium would eventually deprecate its legacy protocol and adopt WebDriver.
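To make the shape of such requests concrete, here is a minimal sketch in Swift of how a client might model them. The `HTTPMethod` and `WebDriverRequest` types are hypothetical names made up for this illustration and are not part of swift-webdriver; only the two endpoint paths (element text and element click) come from the WebDriver protocol.

```swift
// Illustrative model only: these types are hypothetical, not the swift-webdriver API.
enum HTTPMethod: String {
    case get = "GET"
    case post = "POST"
}

struct WebDriverRequest {
    let method: HTTPMethod
    let pathComponents: [String]

    /// The HTTP request line that would be sent to the WebDriver endpoint.
    var requestLine: String {
        let path = pathComponents.joined(separator: "/")
        return "\(method.rawValue) /\(path) HTTP/1.1"
    }

    /// Retrieves the text of an element: GET /session/{id}/element/{id}/text
    static func elementText(sessionId: String, elementId: String) -> WebDriverRequest {
        WebDriverRequest(
            method: .get,
            pathComponents: ["session", sessionId, "element", elementId, "text"])
    }

    /// Clicks an element: POST /session/{id}/element/{id}/click
    static func elementClick(sessionId: String, elementId: String) -> WebDriverRequest {
        WebDriverRequest(
            method: .post,
            pathComponents: ["session", sessionId, "element", elementId, "click"])
    }
}

let request = WebDriverRequest.elementText(sessionId: "5087ecc8", elementId: "searchTextBox")
print(request.requestLine)
// prints "GET /session/5087ecc8/element/searchTextBox/text HTTP/1.1"
```

Because every interaction reduces to a method plus a path (and, for some commands, a JSON body), the same client code can in principle talk to any server that speaks the protocol.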
Appium and Windows Application Driver
As the Web was growing in complexity, so were mobile apps, and the success of Selenium and its REST API approach made it a tempting model to adopt for UI testing on mobile platforms. Moreover, most of the concepts of Web navigation applied directly to mobile app navigation. Thus was born Appium, which provides an implementation of the Selenium/WebDriver protocol that leverages Mobile OS APIs to simulate user interactions.
But there was one problem: Appium did not support Windows applications. Not to be left out, Microsoft decided to develop and contribute Windows Application Driver, its own implementation of the Selenium/WebDriver protocol, which allows automating Windows applications, from UWP applications through legacy Win32 programs. Windows Application Driver runs as a standalone console program that manages the lifetime of the UI application under test and internally implements operations using the aforementioned IUIAutomation APIs. Windows Application Driver now comes bundled with Appium, where it serves as a proxy, but it can also be used on its own.
swift-webdriver
When we succeeded in creating our first prototype of a WinUI application using Swift, we knew we would rapidly need a UI testing solution as good as the one we have on macOS via XCTest. We surveyed our options, and Windows Application Driver appeared to be the best supported technology for UI automation of modern apps. There were no Windows-compatible Swift libraries for it, so we rolled up our sleeves and wrote swift-webdriver. What’s more, while our focus was on Windows applications, swift-webdriver takes care to implement an endpoint-agnostic WebDriver client, which means it could also be easily adapted for web automation.
Using swift-webdriver is a 3-step process:
Obtain a WebDriver implementation. For Windows UI testing, this is achieved by creating a WinAppDriver object, which will start and manage the lifetime of the Windows Application Driver executable.
let webDriver = WinAppDriver.start()
Create a Session for the application to be automated. With WinAppDriver, this corresponds to launching a new instance of an application executable, or attaching to an existing one.
let session = Session(
    webDriver: webDriver,
    desiredCapabilities: WinAppDriver.Capabilities.startApp(name: "notepad.exe"))
Find and manipulate Elements of the application:
session.findElement(byName: "close")?.click()
swift-webdriver integrates nicely with XCTest-based unit tests. A typical pattern would be to create a WebDriver instance in the static setUp method, and then create a session in each individual test method. The XCTAssert family of functions can be used to verify that a textbox contains the expected text, for example, and XCTUnwrap combines with Session.findElement to encode expectations about which UI elements exist at a given time. For test reliability, a Session object can also be configured to implicitly retry looking up UI elements up to a specified timeout value.
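To illustrate the idea behind retrying element lookups, here is a minimal, library-independent sketch of polling until a deadline. The `retryLookup` function, its parameters, and the simulated element name are hypothetical and written for this post only; this is not how swift-webdriver implements the feature, just one plausible way to get the same effect.

```swift
import Foundation

/// Hypothetical helper (not part of swift-webdriver): calls `lookup` repeatedly
/// until it returns a non-nil value or `timeout` seconds have elapsed.
func retryLookup<T>(
    timeout: TimeInterval,
    pollInterval: TimeInterval = 0.05,
    _ lookup: () throws -> T?
) rethrows -> T? {
    let deadline = Date().addingTimeInterval(timeout)
    while true {
        // Always attempt at least one lookup, even with a zero timeout.
        if let found = try lookup() { return found }
        // Give up once the deadline has passed; otherwise wait and try again.
        if Date() >= deadline { return nil }
        Thread.sleep(forTimeInterval: pollInterval)
    }
}

// Usage: simulate an element that only becomes available on the third lookup,
// as might happen while a window is still being laid out.
var attempts = 0
let element = retryLookup(timeout: 1.0) { () -> String? in
    attempts += 1
    return attempts >= 3 ? "closeButton" : nil
}
```

Polling like this trades a small delay for robustness: a test no longer fails just because the UI needed a few extra milliseconds to present an element, while a genuinely missing element still fails once the timeout expires.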
The initial release of swift-webdriver supports the navigation, windowing, mouse, touch and keyboard interactions that we needed to test Arc, and we welcome pull requests for any additional functionality we may be missing.
Future Directions
The functionality provided by the swift-webdriver API closely resembles that of the Xcode UI testing API on macOS, XCUIApplication. This opens the possibility of providing an XCUIApplication implementation on Windows in terms of swift-webdriver, which could allow cross-platform UI testing, provided that developers are careful about UI variations between platforms and use the same UI element identifiers. This would also provide a layer for implementing other UI testing functionality not directly handled by the WebDriver protocol, such as DPI support, window enumeration and screenshot comparison utilities.
While swift-webdriver is maturing for the specific scenario of communicating with the Windows Application Driver, it remains mostly untested on other platforms, with other WebDriver endpoints, and along the spectrum of protocol versions that includes the legacy Selenium protocol, WebDriver 1 and the newer WebDriver 2 standard. We would appreciate community contributions in this area!
Wrapping up
With swift-webdriver, we have another core component for building high-craft UI applications with Swift on Windows. This is possible thanks to the WebDriver protocol and the Windows Application Driver project, which we’ve covered in some detail. Already, swift-webdriver has allowed us to detect regressions in internal builds of Arc on Windows, which means that we can add new features with more confidence.
By open-sourcing swift-webdriver, we hope to empower others to choose Swift for their next UI application on Windows, and we are excited to see the project grow from community contributions. So head over to the GitHub project home page and give it a try.
We’ll be sharing more about how this all comes together in building high-craft Swift applications on Windows. Stay tuned!
- Tristan & Jeff