How to develop cross-platform augmented reality applications

Trigger warning: this article contains several AR-related puns.

For the first six months of release, the IKEA Place app developed by TWNKLS and SPACE10 was available only on iOS. This was due to its reliance on Apple’s augmented reality API, ARKit, for which it was promoted as a launch app in 2017.

At this point, Google and Android were lagging behind in the ARms race. Although AR experiences had been available for some time on Android using its Tango platform, this required specialized hardware such as infrared cameras and motion trackers. As a result, and despite its rather advanced feature set, Tango received little attention from developers and consumers.

With the release in early 2018 of ARCore, Google’s answer to ARKit, augmented reality finally became accessible to mainstream non-iOS users. The new API required no special hardware other than a device running Android 7 or higher. TWNKLS was keen to bring IKEA Place to that mARket (yes, the puns are mostly going to be like this, I’m afraid), so we set about porting the app to ARCore.

Build the app in a platform-agnostic way

Fortunately, ARKit and ARCore each provide an easy-to-use interface for Unity, the engine we use to develop the app. Less fortunately, there is no unified point of entry covering both APIs. And least fortunately, the two APIs are somewhat different in their design: a function call that performs a given task in ARKit does not necessarily have an exact equivalent in ARCore, so porting code between the two platforms isn’t merely a matter of replacing function names (though there was a lot of that too).

It would, of course, not be feasible to maintain parallel iOS and Android versions of the project: the Place app is constantly being updated, and introducing new features to two completely separate apps simultaneously would be logistically unmanageable. Instead, we developed a wrapper library which would allow us to build the app in a platform-agnostic way. Rather than invoking AR functionality directly in our code, we route all AR-related actions through a new, generic interface, which relays the instruction to ARKit or ARCore depending on the operating system. We christened this library WrapparW, a witty portmanteau of “wrapper” and “AR”, with another W on the end because palindromes are nice.
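To make the idea concrete, here’s a minimal sketch of what such a wrapper can look like in Unity C#. The interface and class names are purely illustrative, not WrapparW’s actual API, and the per-platform implementations are stubs standing in for real ARKit and ARCore calls.

```csharp
// Illustrative sketch of a platform-agnostic AR wrapper.
// IArBackend, ArKitBackend and ArCoreBackend are hypothetical names.
public interface IArBackend
{
    void StartSession();
    float GetLightIntensity();   // normalized 0..1 across platforms
}

// Per-platform implementations; each would call into ARKit or ARCore.
public class ArKitBackend : IArBackend
{
    public void StartSession() { /* ARKit session setup would go here */ }
    public float GetLightIntensity() { return 1f; /* read from ARKit */ }
}

public class ArCoreBackend : IArBackend
{
    public void StartSession() { /* ARCore session setup would go here */ }
    public float GetLightIntensity() { return 1f; /* read from ARCore */ }
}

// The rest of the app only ever talks to this generic surface.
public static class ArWrapper
{
    static readonly IArBackend backend =
#if UNITY_IOS
        new ArKitBackend();
#else
        new ArCoreBackend();
#endif

    public static void StartSession() => backend.StartSession();
    public static float GetLightIntensity() => backend.GetLightIntensity();
}
```

The important point is that application code only ever references ArWrapper; the compile-time platform check decides which backend actually does the work.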

As I mentioned, there are some significant structural differences between the ARCore and ARKit APIs which complicated the development of WrapparW a little. At this point, I hope you’ll forgive a quick digression into software design. If you already know this stuff, or just don’t care about it and want to get back to the AR, skip the next three pARagraphs.

At TWNKLS we internally use a framework which enforces a modified version of the model-view-controller (MVC) design pattern. This is a programming paradigm in which a project is divided into discrete entities which govern input, output, and the processing of the former into the latter. Such a separation of concerns ensures that modifications to one area of the project are less likely to have unforeseen implications for another, thus expediting the development of medium-to-large projects.

Of course, in order to constitute a working application, these individual parts must be able to affect, and to react to, each other. In our MVC implementation, inter-entity communication is handled by way of events: alerts which can be raised by one part of a system and reacted to by another. For a basic non-AR example, imagine a word processor designed using an event-based MVC: when a key is pressed, the controller entity fires an event, to which the model entity reacts by inserting a character into the text, advancing the cursor position, and sending the resulting string to the view to be displayed.
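Here’s a toy version of that word processor in C#, just to illustrate the event mechanism; it’s deliberately simplified and is not our actual framework.

```csharp
using System;

// Toy event-based MVC: the controller raises an event for input,
// the model reacts and pushes the result to the view.
public class Controller
{
    public event Action<char> KeyPressed;
    public void OnKey(char c) => KeyPressed?.Invoke(c);
}

public class View
{
    public void Display(string s) => Console.WriteLine(s);
}

public class Model
{
    string text = "";
    readonly View view;

    public Model(Controller controller, View view)
    {
        this.view = view;
        controller.KeyPressed += ch =>
        {
            text += ch;          // insert the character and advance the cursor
            this.view.Display(text);  // push the updated string to the view
        };
    }
}
```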

The point is that all our new codebases are obliged to use an event-based design, and WrapparW was no exception. This was easy in regard to the ARKit path: that API is also event-based, with events being fired whenever the tracking state changes, anchors are created or moved, or the camera is updated; all we had to do was map these to genericized WrapparW events. A little more inventiveness was required, however, to fit the square peg of ARCore into the round hole of an event-based system.
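In WrapparW terms, the genericized events might look something along these lines. The names and signatures here are illustrative rather than the real ones; the ARKit path simply re-raises them from the plugin’s native events, while the ARCore path has to synthesize them, as described below.

```csharp
using System;
using UnityEngine;

// Sketch of a genericized event surface. Event names and signatures
// are illustrative, not WrapparW's actual ones.
public static class ArEvents
{
    public static event Action TrackingStateChanged;      // tracking gained or lost
    public static event Action AnchorsUpdated;            // anchors added, moved or removed
    public static event Action<Matrix4x4> CameraUpdated;  // new camera pose available

    // The ARKit path re-raises these from ARKit's own events; the ARCore
    // path has to synthesize them itself (more on that below).
    public static void RaiseTrackingStateChanged() => TrackingStateChanged?.Invoke();
    public static void RaiseAnchorsUpdated() => AnchorsUpdated?.Invoke();
    public static void RaiseCameraUpdated(Matrix4x4 pose) => CameraUpdated?.Invoke(pose);
}
```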

My condolences if you just died of boredom. As it happens, a lot of the development of WrapparW was tedious grunt work: initializing AR sessions, or mapping enumerations from each API to a generic type. These tasks were laborious but not particularly interesting from a technical point of view (or indeed any other), so let’s move right to the fun stuff: what the two APIs can actually do to augment reality.

AR point cloud creation, horizontal plane detection, and light estimation

At the time of writing, the Place app primarily makes use of three fundamental AR features: point cloud creation, horizontal plane detection, and light estimation. Let’s look at these in turn. In each case, I’ll describe the feature in layman-speak, followed by slightly more technical details for those who are interested in how we dealt with this in WrapparW. These bits will assume rudimentary knowledge of object-oriented programming in general and Unity in particular. They’re in italics for easy ignoring.

Point cloud creation in augmented reality

Point clouds are augmented reality’s backbone (I almost said “cornARstone”, but generously relented). The image obtained from the device’s camera is analyzed by a computer vision algorithm which identifies features in that image: points in 3D space at which depth, or distance from the camera, can be recognized. This is essential for AR running on a device with no specialized infrared depth camera – which is to say, the vast majority of consumer-level devices available today. Together, the points form a cloud which represents the app’s knowledge of the world. The more points in the cloud, the more comprehensive that knowledge.

Both ARKit and ARCore provide direct read access to the point cloud. In ARKit, every time the camera obtains a new image, the event ARFrameUpdatedEvent is fired. After that event completes, the latest points (represented as UnityEngine’s Vector3 type) can be found in the pointCloudData array attached to the AR camera. ARCore is similar: calling PointCloud.GetPoint(int index) on the AR frame will return the point at that index.

The only notable difference between the APIs is that, in the non-event-based ARCore, the timing of the frame update is opaque, so we can never be sure if the points are correct for the current frame or the previous one. But for most usages a one-frame margin of error is unlikely to matter.

To integrate the two paths into WrapparW, we create our own event for handling frame updates. On ARKit, we chain this with ARFrameUpdatedEvent by invoking the former in a handler subscribed to the latter. On ARCore, we fire our event in the Update() function of the WrapparW class (which derives from Unity’s MonoBehaviour). Unity calls this function automatically once per frame; we can’t be sure that camera updates occur at exactly the same frequency, but it’s close enough.
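As a rough sketch, the two paths might be wired up like this. The ARKit identifiers follow the Unity ARKit plugin as described above, but exact namespaces and signatures vary between plugin versions, so treat this as an outline rather than copy-paste code.

```csharp
using System;
using UnityEngine;

// Sketch of exposing one generic frame-update event on both platforms.
public class ArFrameEvents : MonoBehaviour
{
    public static event Action FrameUpdated;

#if UNITY_IOS
    void OnEnable()
    {
        // ARKit path: chain our event to the plugin's native one.
        // Namespace and delegate signature vary by plugin version.
        UnityEngine.XR.iOS.UnityARSessionNativeInterface.ARFrameUpdatedEvent +=
            camera => FrameUpdated?.Invoke();
    }
#else
    void Update()
    {
        // ARCore path: no native per-frame callback, so fire once per
        // rendered frame; the camera image may be at most one frame behind.
        FrameUpdated?.Invoke();
    }
#endif
}
```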

Plane detection in augmented reality

So what’s the point of points? Well, for one thing, by connecting them together, both ARKit and ARCore are capable of reconstructing flat surfaces – planes – from the real world.

In its initial release, ARCore supported the detection of horizontal planes only. Luckily for us, that was all we needed for the Place app, whose core functionality centers on placing objects on the ground or floor – which is, in most cases, approximately horizontal and planar.

Planes in AR are closely related to the concept of anchors. These are essentially data structures which store a position and orientation (or rotation) relative to the real world. As the camera moves, each anchor’s pose is continually corrected so that it maintains the same apparent position in world space. To maintain the illusion that an AR object is “in” the real world, it must be attached to an anchor.

All you geometricians out there will notice that these two properties – position and orientation – also happen to be the ones which are required to express a plane mathematically. This means that objects can trivially be anchored to an arbitrary point on a plane.
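A generic plane type therefore needs little more than a position and a rotation, and placing an object on it is a one-liner. The names below are illustrative, not WrapparW’s actual types.

```csharp
using UnityEngine;

// Minimal generic plane/anchor representation: a position plus an
// orientation is all we need to place content on it.
public struct ArPlane
{
    public Vector3 position;     // a point on the plane (e.g. its center)
    public Quaternion rotation;  // orientation; the up vector is the plane normal
}

public static class Placement
{
    // Anchor a Unity object to an arbitrary point on a detected plane.
    public static void PlaceOn(Transform obj, ArPlane plane, Vector3 offsetOnPlane)
    {
        obj.rotation = plane.rotation;
        obj.position = plane.position + plane.rotation * offsetOnPlane;
    }
}
```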

ARCore planes can be obtained by calling GetTrackables(List<TrackedPlane> trackables) on the AR session. This outputs a list of TrackedPlanes (DetectedPlanes in ARCore 1.2+), each of which contains a CenterPose instance, which in turn has rotation and position fields.

Plane generation in ARKit, meanwhile, requires a component to be included in the scene – the aptly named UnityARGeneratePlane. ARKit’s naming conventions explicitly treat planes as a type of anchor – ARPlaneAnchor – a list of which can be obtained by calling unityARAnchorManager.GetCurrentPlaneAnchors() on the UnityARGeneratePlane instance.

In WrapparW, we convert the planes obtained from either API to our own generic type and return them to the app.
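The ARCore side of that conversion might look roughly like this, reusing the generic ArPlane type from the sketch above. Identifiers follow the calls described above, and exact names differ between SDK versions (TrackedPlane became DetectedPlane in 1.2). The ARKit path is analogous, iterating over the plane anchors returned by GetCurrentPlaneAnchors().

```csharp
#if UNITY_ANDROID
using System.Collections.Generic;
using GoogleARCore;

// Sketch of normalizing ARCore planes into the generic ArPlane type.
public static class PlaneConversion
{
    public static List<ArPlane> GetPlanes()
    {
        var tracked = new List<TrackedPlane>();
        Session.GetTrackables(tracked);           // all planes known to the session
        var planes = new List<ArPlane>(tracked.Count);
        foreach (var p in tracked)
            planes.Add(new ArPlane
            {
                position = p.CenterPose.position, // pose of the plane's center
                rotation = p.CenterPose.rotation
            });
        return planes;
    }
}
#endif
```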

The respective handling of planes/anchors is another example of how ARKit’s event-based design makes it more versatile than ARCore. When the status of any anchor changes, ARKit internally fires ARAnchorUpdatedEvent. We wanted to expose this functionality in WrapparW, but it wasn’t clear how to approach it on ARCore. Initially, we tried comparing the anchor list every frame to the list from the previous one, and triggering an event when a change was detected. However, for our purposes it turned out to be more performant simply to fire the event every frame (again, in the Update() function) without doing the test. This is really an application-dependent choice: if we had to perform especially expensive operations when the anchors change, the initial approach might have been more efficient.
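For illustration, the two ARCore-side strategies could be sketched like this, again with the generic ArPlane type from earlier; the names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

// Two ARCore-side strategies for an "anchors updated" event.
public class AnchorEvents : MonoBehaviour
{
    public static event Action AnchorsUpdated;

    // What we shipped: fire unconditionally every frame and let the
    // handlers decide how much work to do.
    void Update() => AnchorsUpdated?.Invoke();

    // What we tried first: diff against the previous frame's planes and
    // fire only on a change - worthwhile only if handlers are expensive.
    List<ArPlane> previous = new List<ArPlane>();
    void FireIfChanged(List<ArPlane> current)
    {
        bool changed = current.Count != previous.Count;
        for (int i = 0; !changed && i < current.Count; i++)
            changed = current[i].position != previous[i].position ||
                      current[i].rotation != previous[i].rotation;
        if (changed) AnchorsUpdated?.Invoke();
        previous = current;
    }
}
```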

Lighting in augmented reality

It is important for AR content to take account of the lighting conditions in the real world. Imagine placing an AR chair in a dimly lit room: if all AR content were lit at a constant intensity, the chair would be much brighter than all the real-world content on the screen and look completely out of place.

Lighting in both APIs is still at a pretty basic level: a single numerical value is provided which represents a global lighting intensity, approximated from the brightness of the pixels in the camera image. This is applied to all AR objects uniformly. While this is vastly better than having no dynamic lighting at all, it obviously doesn’t take into account light position or direction: an object sitting directly in front of a light source will be no more brightly lit than one in shadow, and parts of the object which face towards the light and those which face away will be equally bright. However, research in this area is rapidly progressing, with techniques being developed to determine more refined information about lighting conditions from the camera image.

Because both APIs express the scene lighting as a single scalar, wrapping this was trivial. ARKit’s estimate is obtained by calling lightData.arLightEstimate.ambientIntensity on the AR camera, and ARCore’s by calling LightEstimate.PixelIntensity on the frame. We also rescale ARKit’s value from the 0-2000 range to 0-1. The only other issue we encountered is that the intensity value tends to be significantly higher on ARCore than ARKit (in the same conditions), so the ARCore path in WrapparW applies a formula (devised through brute trial and error) to attenuate the value before returning it. In the app, we assume lighting will usually come from above, so we apply the intensity to a directional light facing roughly downwards to try to make everything look a little less uniform.
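A simplified version of that normalization might look like the following. The ARCore attenuation constant here is a placeholder; the real value was found by trial and error and isn’t reproduced here.

```csharp
using UnityEngine;

// Sketch of normalizing the light estimate to a common 0..1 scale.
public static class LightWrapper
{
    const float ArCoreAttenuation = 0.8f;   // hypothetical tuning constant

    public static float GetNormalizedIntensity(float rawEstimate)
    {
#if UNITY_IOS
        // ARKit: ambientIntensity is roughly 0..2000, so rescale to 0..1.
        return Mathf.Clamp01(rawEstimate / 2000f);
#else
        // ARCore: PixelIntensity is already 0..1 but reads brighter than
        // ARKit in the same conditions, so attenuate before returning.
        return Mathf.Clamp01(rawEstimate * ArCoreAttenuation);
#endif
    }

    // In the app we apply the result to a roughly downward-facing
    // directional light so the shading isn't completely uniform.
    public static void Apply(Light directionalLight, float rawEstimate)
    {
        directionalLight.intensity = GetNormalizedIntensity(rawEstimate);
    }
}
```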

ARKit also computes a color temperature – lightData.arLightEstimate.ambientColorTemperature – which provides an additional, subtler harmonization between the real world and augmented content. As this has no equivalent in ARCore, on Android WrapparW simply sets this to a fairly neutral constant (6500K, the color temperature of an average overcast day).
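The color temperature path then reduces to a pass-through on iOS and a constant on Android, roughly like this:

```csharp
// Sketch of the color temperature path: ARKit supplies an estimate,
// ARCore does not, so the Android path falls back to a neutral constant.
public static class ColorTemperatureWrapper
{
    const float NeutralKelvin = 6500f;   // average overcast daylight

    public static float GetColorTemperature(float arKitEstimateKelvin)
    {
#if UNITY_IOS
        return arKitEstimateKelvin;      // ARKit's ambientColorTemperature
#else
        return NeutralKelvin;            // no ARCore equivalent
#endif
    }
}
```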

While these are the main API features we use in IKEA Place, it’s by no means a comprehensive list of everything ARKit and ARCore can do, especially in recent versions. ARKit 1.5 and ARCore 1.2, for example, introduced image tracking: the ability to scan “markers” in the real world and attach anchors to them. Because this is not used in IKEA Place, we didn’t include it in WrapparW. The recently-released ARKit 2.0 contains a number of very interesting additions, such as full reflection mapping.

Conclusion

Of course, the ARKit and ARCore APIs will fall out of widespread use as higher-level alternatives become mainstream. Unity’s own answer to WrapparW, ARFoundation, is already in preview. However, at TWNKLS we’ll stick with WrapparW for now as a) its event-based structure aligns with our framework, b) we have access to the source and c) it’s in there now, so why give ourselves a bunch of extra work? Plus, in engineering it’s always helpful to know what’s going on under the hood, and I hope this article helps you with that even as technology marches on.

I feel all melancholy now. I guess that’s a wrap! That was another pun, by the way. Feel free to kill me, no jury would convict you.