DAVID LEDGERWOOD: James, it's great to have you. Thank you for joining us.
JAMES COLGAN: My absolute pleasure, Ledge. It's great being here.
LEDGE: Could you give a little background story of yourself and your work, so the audience can get to know what you're doing?
JAMES: I've been in product management for a very long time now, but had a winding story.
I started out as an embedded software developer working on embedded systems. I've done product management and led programs for embedded microprocessors, cloud computing, SaaS applications, and most recently with Outlook Mobile.
Essentially, I've gone from the backend of the supply chain all the way up to the frontend of the supply chain, which has been an awesome journey for me.
Constantly, the common theme is finding problems, solving problems and iteratively learning. That's really what drives me, is where am I going to learn most? And most recently it's, how can I have the greatest impact and share all of the learnings that I've gathered over these years.
That's why I love these types of podcasts where I can share some of the learnings that I've gathered.
Most recently, four years ago I was recruited into Microsoft to be the first product manager to lead Outlook Mobile as it came in from an acquisition.
It was an awesome task that was put in front of us. We had virtually no monthly active users and we had a very audacious goal. And, within two years, we got it to just under 60 million monthly active users.
Learned a lot along the way. Made a ton of mistakes, as you do when you're scaling something out so rapidly. And then for the last couple of years, what I've been doing is really focusing on scale. How do we engage with the largest, most complicated, and most sophisticated enterprises on the planet, and make sure that not only are they happy with the products that we put out, but that their users, their employees, are not only productive, getting things done and organized, but also super happy using the product? And really trying to get my head around how we do that at such a massive scale.
That's kind of where I came from and what I'm doing right now.
LEDGE: Yeah, wow. Lessons learned. I mean you said 120 million active users now. Outlook is the cornerstone, certainly of corporate email on the desktop and now pushing to mobile. That's an extreme number of users.
How do you even begin to keep track of and think about and iterate something that has such a massive footprint when you make a change? You see all kinds of things online where someone changed their logo and everybody goes berserk. You have a lot higher stakes now in the seat you're in.
So, please, just talk through some of those stories.
JAMES: As we often say within the team, we're very much user-driven and data-informed.
A lot of companies talk about being data-driven. Data helps you understand the what of what is going on, but it doesn't really help you understand the why. And it's really getting to understand and having a visceral sense of your users and their context, is really where the conversation starts.
As a product lead, when we started to focus more seriously on the enterprise… When we first came into the company we were very much being used by consumers. And the vast majority of our users are actually consumers in the consumer space.
Then we got a tremendous amount of growth from there. But once we had reached a level of compliance – which is very important in this day of GDPR – and had met a minimum bar of the features that enterprises were needing, that's when we were able to really say that we were enterprise ready, and to engage with these customers more deeply.
To start off that whole process, what I did was look across the various different segments that we serve – and Microsoft serves pretty much every segment on the planet – and see if we could find some canonical examples of customers and enterprises that represent their industry.
More importantly, not so much the industry – as in, they are in retail and therefore this is what's important to them; they're in manufacturing, this is what's important to them – it's to have that as an entry into: how do they run their company? How do they organize themselves? How do they communicate and how do they collaborate?
When we had these meetings, these were daylong meetings where I would ask to talk to an executive within the company, a mid-line manager within the company, and then an individual contributor within the company. And then we'd have a conversation, pretty much like the conversation we're having right now.
It is, "Tell me about your day, tell me about your life, what do you do when you get up in the morning? How do you get to work?" And then we start to get into things like, "How big is your organization? How many teams do you lead? How many people are on those teams? How geographically distributed are they? Are you having meetings online and conference calls? Are you having them live and face to face? How many emails?"
Then we started getting into the emails aspect of the conversation and how to get into how to manage your calendar.
Everything was truly grounded in, how do you organize yourself? How do you think about your day? How do you work and cooperate with various different people in your organization, both inside your company and outside the company?
It was absolutely fascinating to see that, as you move through the organization, the challenges that each of those individuals had varied dramatically. All of those challenges and opportunities that they presented to us were all equally important, and therefore it came down to, how do we prioritize and how do we move forward with particular features and how do we roll those features out?
As a product leader, really what you're doing is getting that visceral sense of the user within their context. Being able to synthesize that across multiple customers and multiple different user types or personas, and then be able to bring that back into really the triad as many people call it. Which is your team of product management, developer and designer.
What I've done most recently as I've been having these conversations is adding a fourth person into that team, and that is your data scientist.
What you do is you bring into that team the challenge or the opportunity that you discovered as you've been synthesizing this data, this information, these interactions across all of these different customers, and you articulate that as accurately and as honestly as you can.
It's very key to be objective in this. It's often very difficult for people to separate themselves, because we all have our own lives, we all have our own opinions and challenges and opportunities in front of us. But how do you honestly and objectively represent those to the team? This is what we're trying to tackle for this set of particular users.
And then it is being able to articulate that and brainstorm and work through the classical design sprint. You've got five days where you work through exactly the challenge you're looking to tackle. You brainstorm particular ideas, you mock them up, you test them, you iterate, you learn.
As a product leader, you're really looking to help the team and give them the space to innovate. We've put out in front of the team exactly what we're looking to solve, and we've all agreed that this is a problem that is important to us and our users, so how do you innovate?
That's where engineers, developers and designers can really come up with some magical ideas. Again, as a product leader, what you've done is you've engaged with these enterprises.
What I love to do is then take some of the ideas that we've come up with and put them in front of customers, in front of users, and have deep conversations about what a particular workflow looks like. This is something that designers and PMs do across the globe, but it's great to see, when your users have come with a particular problem and you put a solution in their hands, how they interact with it. Because what you thought was obvious absolutely is not, and your users can point out an aspect that you didn't pay much attention to but that happens to be their favorite part of the feature. And so, it's getting that feedback.
You're framing it in a way to set expectations. This isn't your users and your customers designing your product for you, it's them providing feedback that, again, you've got to synthesize across a whole bunch of customers. And then you're taking that back in and saying, "Okay, what have we learned?"
It's this constant learning and iterative process where you're refining down as you go through the various different levels of fidelity of the mockup or the dev. version of the product and then you're starting to roll that out to select portions of the user base, gauging feedback again and then slowly cranking that up and rolling it to 120 million people worldwide.
LEDGE: One thing that strikes me early in that process and having been through some of this, certainly not at the scale, how do you take all that qualitative information? Literally, what is that synthesis process?
How do you actually take all the notes, or transcriptions or whatever it is that gets that initial sort of feedback, what are the tactical steps and tools to turn that into something that ends up in a design sprint and developer backlog?
JAMES: That's a really good question, and it's the classical combination of art meets science. Let me give you an example of how we did that in the early days where our main point of feedback from users was through the app store reviews.
So, a lot of companies and teams maybe don't have the resources or the bandwidth to go out there and interview a whole bunch of customers and users in the same way that we do. I absolutely appreciate that we've got a tremendous amount of reach that is unique in the industry and not all teams have that.
And a very good proxy for that is reviews on the app stores. You've got the full spectrum, and you'll get particular app reviews on iOS and Android – and there were very different, almost personality types between the operating systems, which we found absolutely fascinating.
The first thing you need to be able to do is aggregate all of those reviews that you're getting and then bucketize them across your one stars, two stars, three stars, four stars, and five stars.
Your five stars are really interesting in that they're giving you the pat on the back, and it's nice to see that you're making somebody's life better, but where you're really going to get the value is in looking at your one star reviews.
Then what we did within those one star reviews is categorize each of them in terms of the type of feedback that we're getting. And it could be relative to performance, it could be relative to reliability, it could be the onboarding process and log ins. It could be a particular feature within a certain component of your apps, so in Outlook Mobile maybe it's email or maybe it's in Calendar. It could be a competitive feature that they're looking for.
When you're starting out as an insurgent – which Outlook Mobile definitely was on these platforms – that's a great source for how you prioritize the features that competitive solutions have and your user base is asking for.
Once you've categorized each of those one star reviews and prioritized them, you've taken a massive amount of data and you've got a framework around which to reason over it.
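The triage James describes – aggregate, bucketize by star rating, categorize the one-stars, prioritize – can be sketched in a few lines. This is a rough illustration, not Outlook's actual pipeline; the category names and keyword lists are made up for the example:

```python
from collections import Counter, defaultdict

# Hypothetical keyword buckets -- the real categories (performance,
# reliability, onboarding, specific features) come from the discussion above.
CATEGORIES = {
    "performance": ["slow", "lag", "battery"],
    "reliability": ["crash", "freeze", "bug"],
    "onboarding": ["login", "log in", "sign in", "account"],
    "calendar": ["calendar", "invite"],
}

def categorize(review_text):
    """Assign a one-star review to the first matching category."""
    text = review_text.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "other"

def triage(reviews):
    """Bucket reviews by star rating, then categorize and rank the one-stars."""
    by_stars = defaultdict(list)
    for stars, text in reviews:
        by_stars[stars].append(text)
    counts = Counter(categorize(t) for t in by_stars[1])
    # Prioritize: most common one-star complaint first.
    return counts.most_common()

reviews = [
    (1, "App crashes every time I open it"),
    (1, "Cannot log in with my account"),
    (1, "So slow, drains my battery"),
    (1, "Crashes after the update"),
    (5, "Love it!"),
]
print(triage(reviews))  # reliability complaints surface first
```

A real system would use sentiment models and human review rather than keyword matching, but even this crude version turns a pile of reviews into a ranked list the team can reason over.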
And then what we did as a team was – and this is very important: you need to do this as a team, because everybody needs to take responsibility for the resolution of challenges and really bringing value to the user and customer. With engineering and design and PM in the room, we looked at the top five of these one-star categories and we said, this is what this one star review means. This is the why behind that particular sentiment.
And you'd be able to back that up with choice statements that you've cherry-picked – or rather, selected. "Cherry-picked" implies you've got a subjective perspective on it, and you're trying to be as objective as possible. You select out some choice statements that are emblematic of the sentiment of the population.
Once everyone has an understanding of the why, then you have the people in the room that can start to take ownership of the problem or the opportunity.
So you've prioritized, you've sorted, you've categorized, you've created a framework that everyone agrees with. You've then prioritized and come to an agreement on how a problem could be solved.
Then those issues go into the design sprint and you move forward with the classical process of working through a design sprint, coming out with a solution, breaking it down, and it goes onto your backlog.
Now, you can do that for an entire product when the team is small and the product is just starting out. We started out very small with about 20-25 people on the entire team. Now we're much, much larger than that.
You need different tools and different ways to get that feedback at scale but, again, as you're starting out small or you've got a small portion of a larger product or a larger solution, that's something you can do as a whole team or a good selection of the team. Does that make sense?
LEDGE: Yeah, absolutely. One thing that struck me when you were just saying that, that in fact the one stars are so valuable.
My gut instinct would have been – we've all read the reviews – it's like, "Well, I would have given this a five if it just had X, Y, or Z," or, "Three stars because it was missing these two things." And a lot of times you might think that a one star is your sort of troll, right?
But you found out that that's not the case. I wonder, are those almost-five-star reviews more like the siren song that you actually should ignore? Did you have any thoughts and experiences in there about resisting that pull? "Hey, if we just did this we'd get a five. We want more fives."
That seems like it would call to you. Maybe that's incorrect.
JAMES: No, it's a very reasonable assumption to make. We made lots of changes and course corrections as we learned more from the data.
With, “I would have given a five star if it had this one feature,” that is interesting. You're absolutely right.
Every great product is built on a solid foundation and fundamentals. So there's that one feature that could take somebody from a four to a five, but the equivalent one star person is really unhappy.
The one that's given you a four star, they're working through their day, you're providing value to that user, you're helping them already. You could take them from a happy to a joyous state.
Again, if you really want to focus on your fundamentals and the foundations of your product, then that's really where you're going to get the one stars from. Absolutely, you're going to get the trolls in there and you need to be able to filter those out but you do have people in there that are just desperately unhappy.
It's very tempting to look at a one star review and say, "Well, that's just somebody just trolling you, let's ignore them," but that's also the place where you're going to get issues around reliability and performance and user experience and simplicity and elegance of the design.
If they're complaining about a feature that is there but they think that it isn't, that's a huge signal. It's a huge signal to say that our user experience really needs to be revisited. “We've had this feature for a long time but, you know what, it's three steps down. It turns out that it needs to be higher up on the informational hierarchy. Let's bubble that up to the top or let's just revisit it at least.”
So I completely understand that thinking about taking four stars to a five stars. It is, like you say, a siren song. But what we found is, by focusing on the one stars we're able to improve the performance and the value that we're delivering to those people that are unhappiest. And because you're focused there, those four stars will naturally turn into five stars.
We proved that over and over again. When we first started out and released to the app store we had, on average, 2.2 stars and we were very unhappy. By the time we got through – it took us a while, especially on Android, because you've got this hangover of one stars that you have to work through – right now I think we're at 4.6 or 4.7 stars.
It’s something that we still look at today. Even though we've got other metrics and channels and signals that we get for user sentiment, we still look at the star reviews as kind of our guiding light to ensure that we're heading in the right direction.
LEDGE: I don't think that I would have guessed that. That's really interesting. You'd go, "Microsoft, they're beyond looking at the star ratings.'' That's amazing. It's good to hear that. That will be comforting for a lot of people that only have access to that.
So, shift to the mega-scale, data-driven side. You talked about how the data scientist becomes important because you're collecting so much data. How do you conceptualize that, turn it into more learnings, and then merge the two? What does that process look like?
JAMES: We talked about app reviews – that's the first thing. The next level of abstraction that I talk about is really the Net Promoter Score, or NPS.
We needed additional signals, and we needed to really look at our existing user base objectively. Because, you're absolutely right, the app store reviews can be somewhat subjective at times. And so we needed to really look at Net Promoter Score and get a touch on our user base more holistically across the user base.
Once we've looked at the app store reviews, the next thing that we looked at was NPS scores. Back in the day we used a tool called Delighted, and that was where we were able to really engage our users and find very quickly and very simply, okay, what do you think of our app and the experience that you're having right now, by asking one particular question.
For those people that aren't familiar with it, essentially you ask one question – "Would you recommend this product to your friends and colleagues?" – and they give you a score from 0 to 10. That response is extremely indicative of their sentiment about the product.
If it's a 0 then they really hate your product. If it's a 9 or a 10 then they're a promoter, hence Net Promoter Score. Somewhere in between, they're a passive, and that's not very useful feedback for us.
What the equation essentially does is take all of that feedback and distill it into a single score: the percentage of promoters minus the percentage of detractors.
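For reference, the conventional NPS formula counts scores of 9-10 as promoters and 0-6 as detractors; a minimal sketch of the computation:

```python
def net_promoter_score(scores):
    """Conventional NPS: % promoters (9-10) minus % detractors (0-6).

    Scores of 7-8 are 'passives' and count only toward the total,
    which is why mid-range responses carry little signal.
    """
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# 4 promoters, 2 passives, 4 detractors out of 10 responses -> NPS of 0
print(net_promoter_score([10, 10, 9, 9, 8, 7, 6, 3, 1, 0]))  # 0.0
```

The resulting score ranges from -100 (all detractors) to +100 (all promoters).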
The next thing that comes after that, once a user has submitted their response from 0 to 10, then they ask, "Why did you give us that score?" And there we're able to extract some really deep insights into our users and really get an understanding across multiple different geographies as well.
That's the other challenge that we had. When you're operating at a global scale, how do you get a sentiment or understanding of your users from cultures that are not your own? That are on the other side of the world?
What we did there was – again, back to more of the subjective – a research project that went on over several months, and we went out to different geographies. We went out to Asia and South America, and focused a great deal on those two regions. We realized within our own team we didn't have a good representation of those cultures and we needed to learn. Quite frankly, we just needed to learn.
What we would do is sit down with those users. We'd be in their homes and our researchers would ask them questions about how they ran their lives. Very similar to what we did within the enterprise space, but this is in the consumer space. How did they run their lives? How did they communicate with their friends? How did they set up a date night or how did they organize their family?
We'd see the Post-its on the fridges. We'd see the paper calendars hung up on the wall. We got a good sense of the messaging apps that people use and why they use them. And from there, we got that visceral sense and subjective information that we could bring back into the mix and use to inform the data that we're getting from other channels.
The next channel that we leaned into as well on the data side of things was UserVoice. Very, very early on we integrated UserVoice into the app, and today when you go into Outlook Mobile you can, in there, not only ask for in-app support – which we can talk about, because that was an innovation within Microsoft that has spread across multiple different apps now – but you can suggest a feature.
So you can go in there and you can suggest a feature that you think is missing from the app, or if somebody has already suggested that feature you can up-vote that. That's another signal that we had.
The other key piece, going back into the data science aspect of it, when we first released the app we had very little telemetry. In fact, virtually none. What we did was instrument almost the entire app because we needed to understand what the user journey was as much as possible from the app store all the way through to becoming what we call an engaged user.
What that meant was that, as somebody is going through the onboarding process, we're looking at that as a funnel. As users are adding an account, that's another stage of the funnel. And then once they land in their inbox, that first screen, what are they then doing?
Downloading the app is one thing. Adding an account and then opening up the app is another. But are they truly engaged? Meaning, are they using your app and getting value out of it?
That's when you need to think about, regardless of the application, what would we consider an engaged user – a user that is getting value out of our app, or out of our SaaS application? So it could be: they purchase an item. That is engaged, but that's kind of at the end of another funnel.
So, prior to that, what would it be? They are searching for something. They are reading information. They are sharing articles. They are liking things. You've got various different metrics and you need to look at those and say, users that are performing X, Y, and Z, we consider those as truly engaged. And that becomes the end of your funnel.
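The funnel analysis described above can be sketched as a distinct-user count per stage, with drop-off measured against the previous stage. The event names and log format here are hypothetical stand-ins for real telemetry:

```python
# Illustrative funnel stages, modeled on the install -> add account ->
# reach inbox -> engaged progression described above.
FUNNEL = ["installed", "account_added", "inbox_reached", "engaged"]

def funnel_conversion(events):
    """For each stage, count distinct users and the conversion rate
    versus the prior stage. `events` is a list of (user_id, event) pairs."""
    users_at = {stage: set() for stage in FUNNEL}
    for user, event in events:
        if event in users_at:
            users_at[event].add(user)
    report, prev = [], None
    for stage in FUNNEL:
        n = len(users_at[stage])
        rate = (n / prev) if prev else 1.0
        report.append((stage, n, round(rate, 2)))
        prev = n
    return report

events = [
    ("u1", "installed"), ("u1", "account_added"), ("u1", "inbox_reached"), ("u1", "engaged"),
    ("u2", "installed"), ("u2", "account_added"), ("u2", "inbox_reached"),
    ("u3", "installed"), ("u3", "account_added"),
    ("u4", "installed"),
]
for stage, n, rate in funnel_conversion(events):
    print(stage, n, rate)
```

The stage with the sharpest drop in rate is where users are struggling – in the Outlook Mobile story, that turned out to be onboarding.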
What we did was instrument all of that, and the first thing that we found was, counterintuitively, that our onboarding process – where we're trying to educate people as they're coming on and about to use the product, engaging users on the features – ended up being a net negative on our funnel.
So we're very proud of our product and we're saying, "These are all the features that we have that differentiate us. Get ready! Prepare yourself because this experience is going to be awesome!” And so we spent a lot of time and energy developing these cute little animations and gifs and things like that.
Turns out that everyone, as they're downloading the app, the first thing that they do is skip through all of that educational information. They just want to get into the inbox and start using the app. That blew our minds. It's like, okay, that's where we need to stop investing. And where we need to invest is, how do we make that onboarding process as seamless as possible?
That’s where we started looking at, people have email from Gmail, from Yahoo, from AOL, from their own server that they've stood up. There are lots of people out there that are still using POP3, which unfortunately we still don't support.
We started looking at the challenges that users have in setting up those email accounts, and we started to find various different ways – using what you could call machine learning, very rudimentary machine learning – and we started pre-populating a lot of those fields that people often struggled to fill in themselves. And we just kept on finding ways to get smarter and smarter about that onboarding process, to narrow it down as quickly as possible.
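The kind of pre-population James mentions can be as simple as a lookup keyed on the address's domain, with a conventional fallback the user can correct. A rough sketch – the provider table is illustrative, and the server settings shouldn't be treated as authoritative:

```python
# Hypothetical provider table -- real server settings vary by provider
# and these entries are for illustration only.
KNOWN_PROVIDERS = {
    "gmail.com": {"server": "imap.gmail.com", "port": 993, "ssl": True},
    "yahoo.com": {"server": "imap.mail.yahoo.com", "port": 993, "ssl": True},
    "aol.com": {"server": "imap.aol.com", "port": 993, "ssl": True},
}

def prefill_settings(email_address):
    """Guess server settings from the address's domain; fall back to a
    conventional imap.<domain> default that the user can override."""
    domain = email_address.rsplit("@", 1)[-1].lower()
    if domain in KNOWN_PROVIDERS:
        return dict(KNOWN_PROVIDERS[domain])
    return {"server": f"imap.{domain}", "port": 993, "ssl": True}

print(prefill_settings("user@gmail.com")["server"])    # imap.gmail.com
print(prefill_settings("user@example.org")["server"])  # imap.example.org
```

A production version would add autodiscovery protocols and learned corrections on top, but even this removes most of the typing that users struggled with.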
That was where we really were bringing together the subjective data, which was, ‘I can't log in.’ ‘This app sucks.’ ‘I can't add my account.’ Things of that nature that you were getting through the one stars.
Then through the telemetry we were looking at where exactly in this process we were getting the drop-off on the funnel. Where are people really struggling? Are they confused about what an email address is versus the SMTP ID?
There's a lot of under-the-hood minutia around email, because it's a protocol that’s been around for such a long time, that just confuses people.
So again, you've got to take that subjective feedback as an indicator of where people are having challenges, marry it up with the data that you're seeing on the actual funnel, and then focus the ingenuity and the creativity of the engineering team, of the designers, and of the data scientists that are still trying to extract some of these insights from the data, to really get at how you can add value and get people to become engaged with your product. Not just because you want them engaged, but because that is really your true measure of you delivering value, and sustainable value, to … .
LEDGE: Fascinating process.
JAMES: There was another one that I just want to share as well which is maybe a little bit counterintuitive.
This was before we had a lot of our telemetry in place. We were still trying to build that out and get … .
LEDGE: Describe telemetry there if you would. When you say telemetry, what does that mean in the context?
JAMES: Basically, what that means is you put within the code a little callback – a signal that this particular user has tapped on the calendar icon, for example. Or they have tapped on the email, or they have…
LEDGE: So it's like activity-based analytics with things that they did.
JAMES: Yes, exactly. From there you get a mapping, an understanding.
Another key piece of telemetry is session length. A session is defined as: I open up the app, I do a whole bunch of actions, and then I shut the app down or I background the app. The duration of that is the session length.
What you need to understand is, what exactly is going on within those sessions? You're doing this at an aggregate level. At an individual level you're not really tracking an individual person and what they actually do, but what you want to see is how many of your users are doing X, or how many of your users are doing Y after they've done X. So, do they start with X and then go on to Z? You see what I mean? It's all done in the aggregate.
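Sessionization from raw event timestamps is typically done with a gap heuristic: a quiet period longer than some threshold ends the session. A minimal sketch, with an illustrative five-minute threshold and made-up timestamps:

```python
# A common heuristic: a gap of more than SESSION_GAP seconds between
# consecutive events starts a new session. The threshold is illustrative.
SESSION_GAP = 300  # seconds

def session_lengths(timestamps):
    """Split a sorted list of event timestamps (seconds) into sessions and
    return each session's length (last event minus first event)."""
    if not timestamps:
        return []
    lengths, start, prev = [], timestamps[0], timestamps[0]
    for t in timestamps[1:]:
        if t - prev > SESSION_GAP:
            lengths.append(prev - start)  # close the previous session
            start = t
        prev = t
    lengths.append(prev - start)
    return lengths

# Two bursts of activity separated by an hour -> two short sessions.
events = [0, 10, 22, 3600, 3615]
print(session_lengths(events))  # [22, 15]
```

Averaging these lengths across millions of users, in aggregate, is how a figure like the 22-second mean session James mentions later would surface.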
LEDGE: I imagine there's all kind of stuff. Like distance between push notification and action. There's all kinds of things there. What am I prompting? Am I prompting for better engagement? Is it useful to the user? Maybe that comes down the line but there's so many ways that apps now interact with us. They don't just sit there waiting for you.
JAMES: Exactly. Some of those insights that you'll get from the telemetry will be counterintuitive.
We were aiming to increase our monthly active users – that's one of our key metrics. So we're thinking, if we want to increase our monthly active users we need to increase retention. To increase retention we need to increase engagement. To increase engagement means we want a longer session length so people will write emails. We want to make it easier and more enjoyable and engaging to compose emails.
That was the path that we were going down, purely based off of intuition because we didn't have the data yet. But then, as we were going through this process, our telemetry started to come online and we started getting more of our data. We looked at it, and to our surprise our average session length within Outlook Mobile was 22 seconds long.
We thought that people were going to the mobile device and showing similar behaviors as you would have on the desktop, and they're writing these long emails. Contrary to that, our users, as they're waiting in line at Starbucks or whatever, they're whipping out their phone. They're opening up Outlook Mobile. They're scanning through their inbox to see if they've got an email from their boss. They realize that they haven't and they’re closing the app down.
Or they are going through and they're reading emails and then they'll get through a couple of emails and maybe will send a one word response, "Okay, got it," and then they're shutting the app down.
Meaning that really what our emphasis needed to be on was, rather than the creation of emails or the creation of calendar invites et cetera, consumption. We needed to make the consumption of information – whether it's on email or on calendar, with search as an optimizer of that – the emphasis moving forward.
So it completely turned our strategy on its head in terms of how we were approaching the entire app.
Just from that one data point and being curious about that data point and asking the question, "Okay, why is it 22 seconds long? What is behind that? What are people doing and what's the context of where they are?" And then once we've dug behind that then we can say, this is how we need to be approaching pretty much everything that we're doing within the product.
The entire strategy and roadmap changed.
LEDGE: If I could summarize then, you had that 22 seconds, and that would have led you to ask, "What specific things, in aggregate, are people doing in those 22 seconds?"
So you were able to track: well, they scroll up and down, they read some emails and delete some. There's obviously some kind of fingerprint of a scanning behavior and quick responses, so you were able to see those types of things.
It really comes down to tracking everything because you don't know what the next stage is then. How someone uses 22 seconds is an interesting way to think about the world.
JAMES: It's a process of successive revelation. What happens is, you have a hypothesis. You use an initial insight from telemetry and from data to either prove or disprove that hypothesis. But really what you end up doing is creating more questions. So, “Wow. It's 22 seconds long. People are really… What are they doing? Let's find out what they're doing.”
LEDGE: What they're doing. Not just assuming, "Oh, Jesus, 22 seconds. That’s horrible."
JAMES: Exactly. You're absolutely right, you could have approached it in a very different direction. Like, "22 seconds. That sucks. Let's make it 23, then 24." And that becomes your guiding principle and your key performance indicator.
That can lead you into a very… That would be a disaster because then what you're doing is you're developing based upon your needs and your desires as opposed to your users' needs and users' desires.
You've got to have this curiosity, like, "What are they doing?" And then you understand through a little bit of added telemetry…
Adding telemetry is often very expensive. It takes time. Think of it as a feature. It's got to be reliable, et cetera, et cetera.
Then you find out a little bit, “This is what they're doing," and then you've got to ask yourselves, why? There's all these other things that they could be doing, why are they doing that?
That's where you start to bring in the subjective aspects of it. That's when you need to be talking to actual users and observing actual users as they're using the application, and having that conversation.
That’s, like I said at the top of the conversation, this marrying of art and science and being able to bring them both together. It's the science of data science and then it's the art of user research. They need to go hand in hand.
LEDGE: Well I bet we could both sit and do this all day, so I'll ask you the last question. One thing that I would want to know is, what do you wish for and what do you wish it could do through all this process that maybe technology can't do? Put on your futurist hat and go, "Man, I really wish, I hope that it goes this direction in five years for people who are solving problems like this."
JAMES: That's a really good question. For me, I'm looking for the products and services that we're building to be more proactive and predictive.
What I mean by that is, the onus is still on the user to initiate the transaction. Meaning, "I think that I want to go on vacation and so I'm going to go to Kayak," or, "I think I need to buy something, therefore I'm going to go to Amazon."
Neither of those services, and no real services, are really anticipating a need. So, how do you get a true understanding of the user and their context, because you're not looking to solve everything for everyone.
How do you get a sense of what they need and be able to reach out proactively in a non-creepy manner – there's a lot of empathy that needs to be involved here – in an elegant way, and then offer up suggestions or paths that solve either long-term or short-term needs?
To boil it down: how do we start being more proactive and predictive in delivering value and satisfying the needs that the user may not have even known that they had?
LEDGE: Well, I look forward to when Outlook can start telling me, "Hey dude, you look like hell and you need a vacation."
JAMES: Exactly. I mean, that's the direction that Outlook Mobile can go in, right? We know a lot about the individual – what's on their calendar, how they're organizing their day – so we do have some data points that we can work off.
We're constantly building intelligence into the app. You're seeing that in some very incremental ways but you'll see more moving forward.
Again it's really, what are the problems and the challenges that we're looking to solve, and can we be a little bit more proactive in that? Freeing up people's time. Helping them to be a little bit more organized and maybe a little bit more focused. We're all extremely distracted, right? How do we bring focus and calm to the individual so that they can not only be more productive at work but also be happier in their lives?
More and more demands are placed upon us human beings, and the boundary between work and play is getting finer and finer. There are many cultures in this world in which you're switching between work and personal modes every few minutes. How do you lower the stress level? How do you bring calm? How do you bring focus? How do you enable a user to focus on everything that's outside of the app? That's when you're truly delivering value.
When your app can make the user happy when they're not using it, then you've succeeded. The user is focused on the conversation. They're focused on their children. They are focused in that meeting. They are focused as they’re being creative. They're being creative, they're adding value, they're having a more fulfilling life because they're not using their app but it's enabled by your app. That's true success.
LEDGE: That's a big vision. I can't do better than that. We'll finish with that one.
James, thank you so much. This has been super enlightening. I know the audience is going to love it.
JAMES: Awesome. Thank you so much. It's been a real pleasure talking to you, Ledge.