February 26, 2010

Engineering Messenger for real relationships

Web-based social networks have grown enormously lately, and they're a great way to share and stay informed about what everyone you know is up to. They let you reach out to large groups of people, including friends and family, but your social network usually also includes a broad group of acquaintances, co-workers, and people from your past that you may no longer be very close to. Messenger, on the other hand, is about the people that matter most to you – the ones you talk with frequently and share more personal moments with.

For several years now, this has been our North Star in building Messenger – helping people stay connected to their closest friends and express themselves in meaningful ways. I'd like to tell you a little more about how we're doing that.

First, a personal anecdote:

Texting in the Triassic era

Picture of UNIX ntalk conversation

My first year in college I discovered the power of UNIX-based ntalk. It was a simple text-based program that allowed two users to send text messages to each other in real time. I remember loving how much faster and more conversational it felt than e-mail (go PINE!). I spent many late nights ntalking with friends and eating cold pizza. But there were downsides too. Because ntalk split the screen between the two users, it was hard to tell in what order the messages had arrived. And worse, ntalk was so “real time” that it would show recipients each character as it was typed. Needless to say, this led to some horribly embarrassing situations as I would start typing something, then change my mind and erase it – live. Plus, there was no friends list, or any way to know if a friend was online.

A more meaningful conversation

Instant messaging has grown up a lot since then, with major IM clients offering a rich set of features. For Windows Live Messenger, this includes a contact list with display pictures, status messages, and a presence indicator (to show when your friends are online); offline messaging; interoperability with mobile and SMS; and even the ability to chat with people on other networks.

Even though sending text remains the backbone of Messenger, over the years it has gone beyond text in ways that make communication more natural and intimate.

Some key stats

Our goal is to allow you to have deeper, more meaningful conversations in Messenger. So how do we measure our success? One key metric we track is conversation length – having a longer conversation means you're generally going beyond lightweight exchanges of messages like “Are you coming?”

We also look at voice and video (our closest approximation to a true conversation), and how people interact with each other through photos. Some numbers:

  • The average length of a Messenger session (a conversation) is 9-11 minutes
  • About 59% of sessions are over 5 minutes, with about 10% over 20 minutes long
  • 1.6 billion sessions per month are over 30 minutes long
  • People use Messenger to exchange over 380 million photos a month, or over 4.5 billion photos a year (this is in addition to the tens of billions of photos people also share each month via Windows Live SkyDrive and Hotmail)
  • Messenger users have 230 million voice & video conversations per month
  • Average voice session length is about 18.2 minutes
  • Average video session length is about 13.3 minutes
  • 9% of video calls and 13% of voice calls are longer than 1 hour
  • Usage of international voice and video sessions differs geographically. Brazilians use voice & video only 13% of the time for international connections, while users in other countries, such as Spain or Germany, use voice and video 50-75% of the time for long distance chats with friends and family.

Following are a few examples of Messenger features that we’ve built to help people connect with their close friends in more meaningful ways. Some of these have been around for years, and some were introduced in the most recent version of Messenger.

Typing indicator

This is a very simple feature, but it is really important for keeping conversations flowing naturally.

In the real world, you typically wait until your friend is done talking before you answer. It’s a very basic social contract. Without a typing indicator, IM conversations feel unnatural, and you and your conversation partner are constantly “typing over each other.” But too much real time information loses the advantages of a written conversation, where people can choose their words carefully, or start a thought and then change their minds and cancel it.

Rather than show you every character I type as I type it, Messenger tells you "Piero is writing…" so that you can wait for me to finish my thought before you jump on to the next topic. You cannot tell what I am typing until I press Enter. Through extensive user testing, we found that this strikes the right balance between privacy and keeping people engaged with the conversation.
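To make the idea concrete, here is a minimal sketch of the behavior described above. It is an illustration only, not Messenger's actual code: the ChatSession class and its on_keystroke and on_enter methods are hypothetical names invented for this example. The point it shows is that keystrokes produce only a "is writing" notice, while the text itself leaves the sender's machine only on Enter.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Hypothetical model of the sender's side of an IM conversation."""
    sender: str
    draft: str = ""                      # text typed so far, kept local
    delivered: list = field(default_factory=list)

    def on_keystroke(self, text):
        """Update the local draft and return only a presence notice.

        The draft text is never part of the return value, so the other
        side learns that typing is happening but not what is being typed.
        """
        self.draft = text
        return f"{self.sender} is writing..." if text else None

    def on_enter(self):
        """Deliver the finished message and clear the draft."""
        message, self.draft = self.draft, ""
        if message:
            self.delivered.append(message)
        return message
```

If the sender erases everything, on_keystroke returns None and the notice would simply go away; nothing that was typed and deleted is ever delivered.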

Emoticons, winks & nudges

Conveying emotions with text was challenging in the old days, especially with IM, where speed is of the essence. So users evolved a terse IM lingo with expressions like LOL, BRB, TTYL, etc. Messenger extends this concept with a set of animated emoticons, some of which have become almost iconic:

Picture of Windows Live emoticons

(Here’s a handy reference table with keyboard shortcuts for you power users)

In 2005 we added “winks” – little animations you can share with your friends. Messenger users exchange about 240 million winks a month.

Picture of a wink sent in Messenger

Finally, PCs are great for multi-tasking, which means sometimes the person you’re talking to is not staring at the chat window. But in more recent versions of Messenger, you’ve been able to get the attention of your distracted friends using the "nudge" feature. Clicking the little nudge button (Picture of nudge button) shakes the IM window and plays a little “wake up buddy!” sound that is sure to get attention. To protect you from overzealous nudgers, we’ve limited use of the nudge to once every 15 seconds or so.
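A limit like this can be sketched as a simple throttle. This is an assumption about how such a limit might be built, not a description of Messenger's implementation; the NudgeThrottle class, its try_nudge method, and the exact 15.0 second default are hypothetical. The clock is injectable so the behavior can be checked without waiting.

```python
import time


class NudgeThrottle:
    """Hypothetical rate limiter: allow at most one nudge per interval."""

    def __init__(self, min_interval=15.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between allowed nudges
        self.clock = clock                # injectable for testing
        self._last = None                 # time of the last accepted nudge

    def try_nudge(self):
        """Return True if the nudge may be sent now, False if it is too soon."""
        now = self.clock()
        if self._last is not None and now - self._last < self.min_interval:
            return False
        self._last = now
        return True
```

Rejected nudges do not reset the timer in this sketch, so an impatient sender who keeps clicking is not locked out for longer than the interval.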

Sharing photos

For years now, you’ve been able to exchange files while you IM. But when we looked at the data, we found that the vast majority of files that people exchanged were photos. So in 2009 we introduced simpler, more fun ways to share photos. Now you can just drag your photos into a conversation, and the chat window transforms itself into a shared slide show. You and your friend see the same photo at the same time – if you advance the slide show, it advances for your friend, and vice-versa.

When designing this, the team wrestled with whether to enable automatic acceptance of incoming photos. Sharing files in Messenger had always required you to first “accept” the transfer, and then pick a location to save the file. But this felt too cumbersome. Our goal was to mimic what people do when sitting together and looking at a photo album.

So we tested this extensively in our usability labs and with our Beta users. What we found is that there is an implicit social contract between people engaged in an IM conversation. We found that during IM the frame of the conversation window virtually disappears, as people connect emotionally with their friend on the other side. Unlike e-mail, only your actual friend can send you things, and if a friend is annoying or inappropriate, you can simply block or delete them. So we decided it was worth the risk to automatically accept incoming photos and immediately display them, like so:

Picture of a Messenger slide show

Of course we provide an option for you to opt out of this behavior:
Picture of the option to automatically accept photo invitations

But since we launched this feature in 2009 we’ve found very few people opt out, and the feedback we have heard on this feature has been super positive. Expect more live sharing features like this in our upcoming releases.

Personal expression

In a similar spirit, we wanted to offer a way for you to express your real personality and your shifting moods. Internally, our team distinguishes between customization (customizing something for yourself) and personal expression (customizing how other people see you). In the past, like many other IM programs, Messenger had focused on creating “skins” – a quick way to customize the appearance of the program. But our data showed very few people took advantage of it. So in 2009 we shifted our focus towards personal expression.

We did this by making it very easy for you to pick a background scene, user tile, and status message that your friends will see when they interact with you. For example:

Picture of changing my Messenger scene, user tile, and status message

The key difference between this and a “skin” is that my Valentine’s Day scene, user tile, and status message are visible to others when they see me in their friends list or chat with me:

Picture of a chat window with my Valentine's Day scene

Voice & video chat

Of course, the ultimate representation of an actual conversation is to see and hear the person on the other end. Messenger users are addicted to voice and video—they use this feature 230 million times a month, for everything from socializing to staying in touch with relatives and friends abroad. Over the years, we’ve seen a switch away from a simple voice channel towards full-blown video IM. Today, 81% of the voice calls initiated in Messenger also include video.

For Messenger 2009 we decided to focus on improving video setup time and video quality.

Setting up a video call can be tricky. The two computers involved in the call need to negotiate on how to best complete the call, what quality settings to use, how to connect through firewalls and NATs, etc. But when you start a video chat with a friend, you want to see them immediately. There’s nothing worse than sitting there waiting for the call to connect. In Messenger 2009 we reduced average call setup time by 50%, from about 24 seconds to 12. We did this by reducing and consolidating unnecessary data round-trips, and switching the code to use our peer-to-peer IM channel instead of a custom channel.

Picture of a video conversation in Messenger

To improve quality, we introduced VGA resolution support (640x480) for crisper full-screen conversations. This resolution is in line with the broader web camera ecosystem, though over time we expect this to go even higher, as more people get HD cameras. We also updated our compression algorithms to improve the image quality. We put special focus on algorithms that adapt to fluctuating network conditions. For example:

  • Typically, people sit mostly still in front of their webcam. Where motion occurs, like around the eyes and face, or in gestures, it's important that the video feed can keep up with it. To do this, our code looks for parts of the image that are moving rapidly, and focuses more quality and bandwidth on those. If you wave your hand on screen, we send more information about your hand than the rest of your face, or the wall behind you.
  • Because most people are sitting in front of a static background, we run an algorithm that detects the outline of the person, and then send more information about that border than we do for the background. This makes hair, clothes and other edge features look more crisp, while saving bandwidth overall.
  • Web cameras are often set in poorly lit environments. They’re rarely somewhere with professional lighting like a TV studio. So we developed special algorithms to compensate for poor color and white balance and make skin tones look more natural.
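As a rough illustration of the first idea in the list above (spending more bandwidth where the picture is moving), here is a small sketch. It is not our encoder: the functions block_motion and allocate_bits, the block size, and the floor parameter are all hypothetical, and a real codec works on far larger frames with much more sophisticated motion estimation. Frames here are plain lists of grayscale pixel rows.

```python
def block_motion(prev, curr, block=2):
    """Mean absolute pixel difference per block between two grayscale frames.

    Returns a dict mapping (block_row, block_col) to a motion score.
    """
    rows, cols = len(curr), len(curr[0])
    scores = {}
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            diffs = [abs(curr[y][x] - prev[y][x])
                     for y in range(r, min(r + block, rows))
                     for x in range(c, min(c + block, cols))]
            scores[(r // block, c // block)] = sum(diffs) / len(diffs)
    return scores


def allocate_bits(scores, budget, floor=1.0):
    """Split a bit budget across blocks in proportion to their motion.

    The floor is added to every score so that static blocks (the wall
    behind you) still receive a small share instead of none.
    """
    weights = {key: score + floor for key, score in scores.items()}
    total = sum(weights.values())
    return {key: budget * w / total for key, w in weights.items()}
```

With a 4x4 frame where only the top-left 2x2 block changes, nearly all of the budget goes to that block while the three static blocks share the remainder equally.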

These are just a few of the ways we've been working to improve the voice and video experience so that connecting with friends and family in Messenger feels more like they're in the room with you.

Stay tuned for more posts on how we’re continuing to evolve the Messenger experience.
