Augmented Reality on Windows Phone and Windows 8 Metro style apps
I’ve lately been playing a lot with augmented reality on my Windows Phone and my Windows 8 Tablet. Both of them feature an accelerometer, compass and gyro as well as a back facing camera. This makes it possible to overlay various XAML elements on top of the camera feed, and have them move with the camera as you rotate and orient the device. As an example of this, you can try the AR view in my GuidePost app for Windows Phone (and while you’re at it, make sure you throw some good reviews in there ;-). The app uses the exact approach described here.
To do augmented reality in XAML, I built a custom panel control that automatically handles the placement of all its child elements based on the device’s orientation. I used an attached property to define what ‘direction’ an element is supposed to be placed in. You already know this from Grid and Canvas, where for instance Grid.Row="1" places the element in the second row of the grid, or Canvas.Top="100" places the element 100 pixels from the top of the canvas. In my case I use ARPanel.Direction to define a Point where the first value is the elevation angle (i.e. up/down from the horizon) and the second value is the compass heading (the azimuth).
Note: Since you probably know the Grid panel very well, to help explain how this custom panel works, you’ll sometimes see a “How Grid uses this” section, to explain how the grid control would do something similar.
Here’s what the four cardinal directions and up and down would look like in such a custom panel that we’ll be building:
<ar:ARPanel>
    <TextBlock Text="North" ar:ARPanel.Direction="0,0" />
    <TextBlock Text="East" ar:ARPanel.Direction="0,90" />
    <TextBlock Text="South" ar:ARPanel.Direction="0,180" />
    <TextBlock Text="West" ar:ARPanel.Direction="0,270" />
    <TextBlock Text="Up" ar:ARPanel.Direction="90,0" />
    <TextBlock Text="Down" ar:ARPanel.Direction="-90,0" />
</ar:ARPanel>
Neat huh? This makes it VERY simple to spit out a bunch of different AR-type of apps. So how did I build this?
The custom panel
When building a custom panel, there are really just three things you need to do:
- Inherit from Panel
- Override MeasureOverride
- Override ArrangeOverride
Or to put that in code:
public class ARPanel : Panel
{
    protected override Size MeasureOverride(Size availableSize)
    {
        //TODO
    }

    protected override Size ArrangeOverride(Size finalSize)
    {
        //TODO
    }
}
The two methods here have the following purposes:
MeasureOverride is called first, and it should go through all the child elements and tell them “Hey I got this much space available - how much of that do you need?”.
How Grid uses this: If a grid is 150x100 and has two rows and two columns, it would look at the row/column of the element and then say “I’ve got 75x50 of space in this row/column - how much do you want?”. If all the row/column sizes are auto, it would send the entire size of the grid itself to the child.
ArrangeOverride is called second. This goes through all the child elements and tells them “I heard you want this much space (as indicated by child.DesiredSize), but I can only give you this much, and this is the area where you get to place yourself”.
How Grid uses this: After going through the measure step, Grid would now determine what the size of each row and column needs to be based on the children’s desired sizes (unless the cell sizes are fixed), and then calculate the rectangle for each cell, distributed based on the row/column definitions. These rectangles are then used to place the child elements in the row/column they are set to be placed in.
Now first let’s implement MeasureOverride, since this is very simple in this scenario. Each child element can render itself anywhere around you in a 360 degree space, so we don’t need to limit the child elements to any size other than the size of the panel itself. So we’ll ask each child “Given all of my space, how much would you want to use?”. This is simply done by going through each child element and calling Measure on it with the full size available (after each Measure call, you will notice that child.DesiredSize has been assigned). We end the call by returning the size that the panel needs to its parent - in this case we want to use the entire space available to us:
protected override Size MeasureOverride(Size availableSize)
{
    foreach (var child in Children)
        child.Measure(availableSize);
    return availableSize;
}
MeasureOverride is called whenever the panel’s size is changing or anything else that would affect the size of the child elements. You can trigger this yourself by calling panel.InvalidateMeasure(). In our specific case we don’t ever need to call this, and the measure will only get invalidated when the panel’s size changes - for instance when page orientation changes, and luckily this is all done for us. (Grid would call InvalidateMeasure if the row/column definitions change)
Next up is ArrangeOverride. Now that we have measured all the elements, we need to determine where to place them on the screen, and how much space we need to give each of them. We’ll basically go through each element and call Arrange(rect) on it, where ‘rect’ is the rectangle it gets to render itself in, i.e. the top-left corner is where the area starts, and the width/height is the size it gets. It’s then up to the child to make use of that area. In our ARPanel case, the top-left corner would constantly change as the orientation of the device changes, but we’ll keep the width/height at the desired size of the element. So the method will look something like this:
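protected override Size ArrangeOverride(Size finalSize)
{
    foreach (var child in Children)
    {
        double centerX = 0; // TODO: screen X of the child's direction
        double centerY = 0; // TODO: screen Y of the child's direction
        // Offset by half the desired size so the element is centered on the point
        double left = centerX - child.DesiredSize.Width * .5;
        double top = centerY - child.DesiredSize.Height * .5;
        Rect rect = new Rect(new Point(left, top), child.DesiredSize);
        child.Arrange(rect);
    }
    return finalSize; // the panel uses all the space it was given
}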
I subtract half the width/height of the size to center the element on top of ‘centerX’ and ‘centerY’, which we will still have to determine.
So this is basically our panel implementation. The tricky part is figuring out the “TODO” part of the above code based on the motion sensor. So basically, given a device orientation, what is the screen location of a direction?
Warning: I spent quite a lot of time on getting this part right. It’s a lot of vector math, and some of it I came to using pure trial and error :-). Therefore I might not even completely get all of it myself, but it works, and it works great. I’m pretty sure the math is right, but if I don’t go too much into detail in some of the math, just think of it as magic that works and use the code as is :-) For those of you who are used to XNA 3D Game programming, I’m sure this will all make sense to you though, since this is very similar to how you control a camera in a 3D environment.
The motion sensor
The Motion Sensor is what will be driving the placement of the child elements. Motion is a combination of Compass, Accelerometer and optionally Gyro. The Gyro is not required, but it makes the result A LOT better - especially when you rotate the device fast. Future certified Windows 8 tablets will require all 3 sensors, but for Windows Phone, only some of the newer Mango phones have a gyro, and I believe the Dell Venue Pro compass sensor doesn’t work. Future cheaper Windows Phone devices might not come with any sensors available, so beware that your app might not work on all devices, and you should check for the capability and notify the user if it’s missing.
On Windows Phone you can check whether the motion sensor is supported using ‘Microsoft.Devices.Sensors.Motion.IsSupported’. I didn’t find an equivalent property in Windows Runtime though; there, OrientationSensor.GetDefault() returns null when no sensor is available, so a null check serves the same purpose. The following code starts reading from the sensor for both WinPhone and Windows Runtime:
#if WINDOWS_PHONE
if (Microsoft.Devices.Sensors.Motion.IsSupported)
{
    motion = new Microsoft.Devices.Sensors.Motion();
    motion.CurrentValueChanged += motion_CurrentValueChanged;
    motion.Start();
#elif WINRT
// GetDefault() returns null if the device has no orientation sensor
motion = Windows.Devices.Sensors.OrientationSensor.GetDefault();
if (motion != null)
{
    motion.ReadingChanged += motion_CurrentValueChanged;
#endif
}
else
{
    throw new InvalidOperationException("Motion sensor not supported on this device");
}
When we get a new reading, all we have to do is tell the panel that the current child placements are invalid. We don’t want to do too much work here, because the motion sensor can fire much more frequently than the layout cycle runs, so all we do here is flag the panel for arrange and clear any parameters that could be affected by the orientation change (like the attitude, which I will get back to):
private Matrix? _attitude;

private void motion_CurrentValueChanged(object sender, EventArgs e)
{
    _attitude = null;
#if WINDOWS_PHONE
    Dispatcher.BeginInvoke(() => InvalidateArrange());
#elif WINRT
    Dispatcher.Invoke(Windows.UI.Core.CoreDispatcherPriority.Normal,
        (a, b) => InvalidateArrange(), this, null);
#endif
}
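The "clear it in the event handler, recompute it on the next read" idea above is the same pattern that is later used for the viewport and the camera projection. As a minimal, framework-free sketch of that pattern (the `LazyCache` class and its members are hypothetical names for illustration and are not part of the control library; the sketch is not thread-safe and assumes reads happen on a single thread):

```csharp
using System;

// Hypothetical helper: caches a computed value until Invalidate() is called,
// mirroring the nullable-field pattern used for _attitude in the panel.
public sealed class LazyCache<T> where T : struct
{
    private readonly Func<T> _compute;
    private T? _value;

    public LazyCache(Func<T> compute)
    {
        if (compute == null) throw new ArgumentNullException("compute");
        _compute = compute;
    }

    // Cheap enough to call from a high-frequency sensor event:
    // it only drops the cached value and does no real work.
    public void Invalidate()
    {
        _value = null;
    }

    // The expensive computation runs only if the cache was
    // invalidated since the last read.
    public T Value
    {
        get
        {
            if (!_value.HasValue)
                _value = _compute();
            return _value.Value;
        }
    }
}
```

With this shape, ten sensor readings between two layout passes cost ten cheap `Invalidate()` calls but only one recomputation.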
Handling sensor and screen coordinate systems
The Motion sensor gives us a Matrix that defines the rotation of the device relative to the up and north directions. The screen has a different coordinate system that is relative to the upper left corner of the screen. When the screen changes between landscape and portrait mode, this coordinate system changes relative to the motion sensor’s coordinate system, and it’s important to take this into account. Lastly we need to define a field of view for the screen. If you are overlaying elements on top of the camera, the field of view must match the field of view of the camera. Think of it as how ‘narrow’ the view is. A lens that is zoomed in far has a small field of view, and only a few elements will be visible on the screen, whereas a wide angle lens will have more child elements on the same screen. I found that on my phones a FOV of roughly 35 degrees seems appropriate. So to sum up we need 3 things: the orientation of the sensor, the orientation of the view (screen), and the parameters for the camera (the projection).
First the view, which is slightly different between phone and Windows Runtime (probably because the ‘natural’ orientation on a phone is portrait mode, whereas on a PC it’s landscape mode, so the sensors are mounted differently).
Matrix view;
if (orientation == LandscapeLeft)
{
    view = Microsoft.Xna.Framework.Matrix.CreateLookAt(
        new Microsoft.Xna.Framework.Vector3(0, 0, 1),
        Vector3.Zero,
#if WINDOWS_PHONE
        Vector3.Right);
#elif WINRT
        Vector3.Up);
#endif
}
else if (orientation == LandscapeRight)
{
    view = Microsoft.Xna.Framework.Matrix.CreateLookAt(
        new Vector3(0, 0, 1),
        Vector3.Zero,
#if WINDOWS_PHONE
        Vector3.Left);
#elif WINRT
        Vector3.Down);
#endif
}
else //portrait mode
{
    view = Microsoft.Xna.Framework.Matrix.CreateLookAt(
        new Vector3(0, 0, 1),
        Vector3.Zero,
#if WINDOWS_PHONE
        Vector3.Up);
#elif WINRT
        Vector3.Left);
#endif
}
Next we define a viewport based on this view. Note that this depends on the size of the screen, so on size changed we need to remember to reset this value.
private Microsoft.Xna.Framework.Graphics.Viewport? _viewport;

private Microsoft.Xna.Framework.Graphics.Viewport Viewport
{
    get
    {
        if (!_viewport.HasValue)
        {
            _viewport = new Microsoft.Xna.Framework.Graphics.Viewport(0, 0, (int)ActualWidth, (int)ActualHeight);
            _cameraProjection = null; //camera projection depends on viewport - force a reset
        }
        return _viewport.Value;
    }
}
And from this we can now define the projection of the camera / field of view.
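private Matrix? _cameraProjection; // cached projection; reset whenever the viewport changes

private Matrix CameraProjection
{
    get
    {
        if (!_cameraProjection.HasValue)
        {
            // near plane at 1 unit, far plane at 12 units
            _cameraProjection = Matrix.CreatePerspectiveFieldOfView(
                MathHelper.ToRadians((float)FieldOfView), Viewport.AspectRatio, 1f, 12f);
        }
        return _cameraProjection.Value;
    }
}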
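One detail worth keeping in mind: XNA documents the first parameter of Matrix.CreatePerspectiveFieldOfView as the field of view in the y direction, so the FieldOfView value is the vertical angle of whatever orientation the viewport currently has, and the horizontal angle follows from the aspect ratio. The following is a small framework-free sketch of that relationship (the `FieldOfViewMath` class is a hypothetical helper for illustration, not part of the control library):

```csharp
using System;

// Hypothetical helper: relates vertical and horizontal field of view
// for a perspective projection with a given aspect ratio.
public static class FieldOfViewMath
{
    // verticalFovDegrees: full vertical field of view in degrees.
    // aspectRatio: viewport width divided by viewport height.
    // Returns the full horizontal field of view in degrees.
    public static double HorizontalFromVertical(double verticalFovDegrees, double aspectRatio)
    {
        double halfVertical = verticalFovDegrees * Math.PI / 180.0 / 2.0;
        // tan(halfHorizontal) = aspectRatio * tan(halfVertical)
        double halfHorizontal = Math.Atan(Math.Tan(halfVertical) * aspectRatio);
        return 2.0 * halfHorizontal * 180.0 / Math.PI;
    }
}
```

For example, a 35 degree vertical field of view on an 800x480 landscape viewport corresponds to a horizontal field of view of a little over 55 degrees, while the same 35 degrees on a 480x800 portrait viewport gives only about 21 degrees horizontally. If the overlay does not line up with the camera feed after rotating the device, this is one place to look.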
Now we need the attitude of the device based on the current sensor reading (this was the value we reset in the reading-changed event handler). We rotate this value so it matches the rotation of the XNA coordinate system as well.
private Matrix Attitude
{
    get
    {
        if (!_attitude.HasValue)
        {
            if (motion != null
#if WINDOWS_PHONE
                && motion.IsDataValid
#endif
                )
            {
                _attitude = Matrix.CreateRotationX(MathHelper.PiOver2) * CurrentReading;
            }
            else
                return Matrix.Identity;
        }
        return _attitude.Value;
    }
}
#if WINDOWS_PHONE
private Matrix CurrentReading
#elif WINRT
private SensorRotationMatrix CurrentReading
#endif
{
    get
    {
        return
#if WINDOWS_PHONE
            motion.CurrentValue.Attitude.RotationMatrix;
#elif WINRT
            motion.GetCurrentReading().RotationMatrix;
#endif
    }
}
Ok that’s a lot of funky stuff, but now we are pretty set to project from vectors to screen coordinates using this simple method:
Matrix world = Matrix.CreateWorld(Vector3.Zero, new Vector3(0, 0, -1), new Vector3(0, 1, 0));

// Projects the point from 3D space into screen coordinates.
private Vector3 Project(Vector3 vector)
{
    return Viewport.Project(vector, CameraProjection, view, world * Attitude);
}
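To make the last step of that projection less magical: once the point has been transformed by the world, view and projection matrices and divided by w, it ends up in normalized device coordinates, where x and y run from -1 to 1 and y points up. Mapping that to pixels is just a scale and a flip of the y axis. The following is a rough, framework-free sketch of that final mapping, assuming a viewport whose top-left corner is at (0, 0); the `ScreenMapping` class is a hypothetical name for illustration and not the XNA implementation itself:

```csharp
using System;

// Hypothetical helper: maps normalized device coordinates to screen pixels.
public static class ScreenMapping
{
    // ndcX, ndcY: coordinates in the range [-1, 1], with y pointing up.
    // width, height: viewport size in pixels, origin in the top-left corner.
    public static void NdcToScreen(double ndcX, double ndcY, double width, double height,
                                   out double screenX, out double screenY)
    {
        screenX = (ndcX + 1.0) * 0.5 * width;
        screenY = (1.0 - ndcY) * 0.5 * height; // flip: screen y grows downwards
    }
}
```

A point straight ahead of the camera (0, 0) lands in the middle of the screen, (-1, 1) lands in the top-left corner and (1, -1) in the bottom-right corner. Anything outside the [-1, 1] range is off screen, which is what the frustum check later in the article filters out.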
The only thing is that we have a direction expressed as two angles (a ray), and not a vector. A little extra method for converting it to a vector is needed too:
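// Degrees-to-radians conversion factor
private const double PI_OVER_180 = Math.PI / 180d;

// px is the elevation and py the compass heading, both in degrees
private static Vector3 PolarToVector(double px, double py, double radius)
{
    var O = (py - 90) * PI_OVER_180; // heading, rotated so that north ends up along -Z
    var W = (90 - px) * PI_OVER_180; // angle measured down from straight up
    var x = (float)((Math.Cos(O) * Math.Sin(W)) * radius);
    var y = (float)((Math.Cos(W)) * radius);
    var z = (float)((Math.Sin(O) * Math.Sin(W)) * radius);
    return new Vector3(x, y, z);
}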
We don’t have any radius, but I found always using a value of ‘10’ (which is 10 units down the ray) works pretty well. So… we are now pretty set to implement the ArrangeOverride method.
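As a sanity check of the conversion, here is the same formula as a framework-free sketch that returns plain doubles instead of an XNA Vector3 (the `DirectionMath` class name and the array return type are assumptions made for illustration only):

```csharp
using System;

// Hypothetical stand-alone version of PolarToVector for experimenting
// outside of XNA. Returns { x, y, z }.
public static class DirectionMath
{
    private const double PiOver180 = Math.PI / 180.0;

    // elevation: degrees above (+) or below (-) the horizon.
    // heading: compass heading in degrees (0 = north, 90 = east).
    public static double[] PolarToVector(double elevation, double heading, double radius)
    {
        double o = (heading - 90) * PiOver180;
        double w = (90 - elevation) * PiOver180;
        return new[]
        {
            Math.Cos(o) * Math.Sin(w) * radius, // x: east is positive
            Math.Cos(w) * radius,               // y: up is positive
            Math.Sin(o) * Math.Sin(w) * radius  // z: north is negative
        };
    }
}
```

With a radius of 10, north ("0,0") comes out as roughly (0, 0, -10), east ("0,90") as (10, 0, 0) and up ("90,0") as (0, 10, 0), give or take floating point noise. That matches XNA's right-handed convention where forward is -Z, right is +X and up is +Y.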
Arranging the child elements
When arranging the children, we first need to check whether each one is inside the screen view, and next where on the screen that is. I’m using the BoundingFrustum to do this. Think of the bounding frustum as the space the camera can see: a cone-like volume that expands out from the camera. We can set this up using:
BoundingFrustum viewFrustum = new BoundingFrustum(Attitude * view * CameraProjection);
If we now build a BoundingSphere around each element, we can use the .Contains method to see if that element is within the frustum. So we first grab the Point object from the element, and build a bounding sphere. If it is inside, all we have to do is call the Project method above to get the screen location, and lastly call Arrange on the element.
So our entire ArrangeOverride method ends up looking like this:
protected override Size ArrangeOverride(Size finalSize)
{
    if (ActualWidth > 0 && ActualHeight > 0 && motion != null
#if WINDOWS_PHONE
        && motion.IsDataValid
#endif
        )
    {
        BoundingFrustum viewFrustum = new BoundingFrustum(Attitude * view * CameraProjection);
        foreach (var child in Children)
        {
            object posObj = child.GetValue(DirectionProperty);
            if (posObj is Point && !double.IsNaN(((Point)posObj).X))
            {
                Point p = (Point)posObj;
                Vector3 direction = PolarToVector(p.X, p.Y, 10);
                var size = child.DesiredSize;
                //Create a bounding sphere around the element for hittesting against the current frustum
                //This size is not entirely right... size we have is screen size but we use the world size.
                //*.008 seems to roughly fit as conversion factor for now
                var box = new BoundingSphere(direction, (float)Math.Max(size.Width, size.Height) * .008f);
                if (viewFrustum.Contains(box) != ContainmentType.Disjoint) //partially or fully inside camera frustum
                {
                    Vector3 projected = Project(direction);
                    if (!float.IsNaN(projected.X) && !float.IsNaN(projected.Y))
                    {
                        //Arrange element centered on projected coordinate
                        double x = projected.X - size.Width * .5;
                        double y = projected.Y - size.Height * .5;
                        child.Arrange(new Rect(x, y, size.Width, size.Height));
                        continue;
                    }
                }
            }
            //if we fall through to here, it's because the element is outside the view,
            //or placement can't be calculated
            child.Arrange(new Rect(0, 0, 0, 0));
        }
        return finalSize;
    }
    else
        return base.ArrangeOverride(finalSize);
}
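A possible explanation for the *.008 conversion factor in the code above: with a perspective projection, the height of the visible area at a given distance from the camera is 2 * distance * tan(fov / 2), so dividing that by the viewport height in pixels gives the number of world units per pixel. The sketch below works that out; the `WorldScale` class is a hypothetical helper for illustration, and the idea that this is where the factor comes from is an assumption, not something the original code states:

```csharp
using System;

// Hypothetical helper: estimates how many world units one screen pixel
// covers at a given distance from the camera.
public static class WorldScale
{
    // verticalFovDegrees: full vertical field of view in degrees.
    // distance: distance from the camera in world units.
    // viewportHeightPixels: height of the viewport in pixels.
    public static double UnitsPerPixel(double verticalFovDegrees, double distance,
                                       double viewportHeightPixels)
    {
        double halfFov = verticalFovDegrees * Math.PI / 360.0;
        double visibleHeight = 2.0 * distance * Math.Tan(halfFov);
        return visibleHeight / viewportHeightPixels;
    }
}
```

For a 35 degree field of view, a distance of 10 units and a viewport that is 800 pixels high, this comes out just under 0.008, which is very close to the trial-and-error value. Two caveats: the value changes with the viewport height and field of view, so a hard-coded factor will be off on other screens, and the sphere radius in the code is based on the element's full width or height rather than half of it, so the hit test is on the generous side.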
Congrats on making it this far - because we are actually pretty much done! All that’s left to do is put it all together, reset the used orientation parameters when needed and define the attached property for use on the child elements. And I already did all that for you!
You can download the control library together with a sample app here. There’s both a Windows Phone and Windows 8 Metro version included. You can use the code as you like, but if you improve on it, I would LOVE to hear about it. I didn’t include a camera background in the samples, so I’ll let that bit be up to you to add (you can use a VideoBrush as the background for that).
Note that for Windows Runtime, I have only tested this on the Samsung Build Tablet. I have no idea how this behaves on a device without a motion sensor (probably crashes, but let me know in the comments below). Also the Build tablets have buggy fusion sensor firmware, which means that readings are fired VERY infrequently, making it almost useless for anything but rough testing.
Also note that the compass sensor can get very confused when you are inside - especially if the building has a metal frame construction. If directions seem off, try going outside away from anything that confuses the magnetic field.