Currently valued at $4bn, Clubhouse is a tech unicorn (a start-up company with a value of more than $1bn) at the forefront of the new social audio era sweeping the internet.
After a meteoric rise involving more than 10 million downloads, the app is now facing stiff competition from almost every existing social media platform looking to emulate its success and attract new users.
As it fights to retain users and empower creators, Clubhouse has secured partnerships with the likes of TED and the NFL, and enhanced several features to attract new users.
We spoke to Justin Uberti, Clubhouse’s head of streaming — and the creator of WebRTC and Google Duo — to find out about the “spatial audio feature” and what the future has in store for social audio.
BP: Justin, thank you for joining all the way from the other side of the world.
JU: Literally, yes! I’m excited to be here. Great to talk about Clubhouse and any sort of thing you want to talk about in this space.
BP: Before we dive into spatial audio — which you guys have just recently announced — could you clarify what your position as head of streaming entails?
JU: Clubhouse is a service that’s built around voice and audio and that means things that are related to voice and audio fall on my desk. There’s a lot of things I’ve talked to Rohan [Seth] and Paul [Davison, the founders of Clubhouse] about what we might be doing for the future in terms of making the service feel more natural, a great place for listeners and a great place for creators.
Spatial audio is one of the first things we worked on. I think it’s just a really interesting place to take a service where voices are the central feature. We have a million ideas, so I spend a lot of my time just thinking about what we can do. What are things that our users really appreciate? We try to think about having a service that’s not about text or images, it’s about voice. Where should you go? That’s kind of what my charter is.
BP: Why do you think that live audio has taken off in the way that it has; is it purely because of the pandemic or are there other factors?
JU: I think there’s multiple factors; the pandemic certainly kick-started things, but I think that there are a few other interesting trends that only became possible in the past few years.
One is the notion of just consuming audio. For a while, if your phone started making a sound, it was an annoyance that you had to go and silence. Consuming audio just wasn’t easy. But now you see a lot of people with earbuds or headphones and that’s helped make consuming audio a lot easier. I think that’s a pretty positive trend that has kind of helped move this whole sort of notion of voice and interactive services forward because it’s become a lot easier to get a really high quality audio experience just by popping in your buds.
I feel like that’s something we’ll see continue to advance as it becomes more possible to get a good quality set of earbuds and consume audio. That’s one trend that Clubhouse benefits from.
I think that another interesting trend is that typing is still hard on a smartphone. Not hard in the sense that you can’t do it, but rather that you don’t want to sit there for an hour typing. That sort of thing has not really been solved. So, with people gravitating away from desktop and towards spending their time with their phone readily available, being able to engage with each other at length using just their voice is a more natural interaction method. I think that’s something that Clubhouse also gets a bit of a tailwind from.
Thirdly, I think that just getting used to interacting with people in online room environments, that’s something that everyone was forced to get a crash course in because of the pandemic. And now you’re kind of experienced with moderation, managing your mic, and all these types of things. Now you’re engaging and can actually participate in these casual spaces very naturally. So I think those are a few different things that have all worked together to benefit Clubhouse.
BP: The observation about typing is an interesting one because many people that I know prefer audio when it comes to quick bursts of communication because it carries the nuances of what you’re saying a lot better than text. It’s also more convenient to do something like send a quick voice note while I’m busy with something else than it is to send a text message.
JU: Definitely, there’s not a huge place for text. Don’t get me wrong: if you’re dashing off a quick message on something like WhatsApp then that’s perfect if that’s the way you want to communicate. But if you’ve made the decision to engage for a while, I think that voice is much more natural than text.
BP: This brings me to the topic of spatial audio, which is a new feature that you’ve just launched. While audio is a great means of communicating, social audio can still be a bit flat because you’re not physically in the same space as the other person, or people, that you’re communicating with, so you’re not getting that same type of aural experience that you’d normally get in real life.
JU: Yeah, if I digress for a moment here, when I was working at Google and on things like WebRTC, we were excited about the fact that you could have these online meetings with video and voice and people connecting from whatever location they were in. We’re all quite proud of that work, but I think that until we really were forced to use it all the time we didn’t notice where there may still have been some gaps. Personally speaking, we put an enormous investment into making video work — getting HD resolution and all types of connection — and in some ways didn’t give the same amount of attention to audio because we kind of felt like it worked. There weren’t a lot of frontiers to conquer there.
When we look at Clubhouse, which is a voice-only service as you know, in some ways it forces you to focus because you look at the problem space in a different sort of way.
I spent time just listening to a large number of rooms and getting the feel of the interaction of just hearing music or comedy and hearing people kind of go back and forth. And as I was looking and kicking around some ideas of what might be an interesting thing to develop as a feature for the service, some experimentation with spatial audio hit me. It’s something that really feels very visceral, especially in this sort of voice only service. I feel like it’s creating a qualitatively different experience. And so that notion that voice limits what you can do, it forces you to develop within those limits.
We did some demos, built some prototypes, and everyone really liked how it made the service just feel that much more immersive and natural. And, coming back to what I mentioned before about the ethos of Clubhouse, we thought it was very true to that. So that gave us a lot of energy to go on and try to figure out how we turned this into something that goes from being a tech demo to something that really works well for our entire audience.
BP: In a recent Twitter thread you spoke about the benefits of spatial audio for human communications. Could you expand on a few of those points?
JU: So I talked about the increased immersion and more human-like feel. That was the primary goal of what the feature was intending to do and I feel like it does it nicely. But, as we got into it and looked into the literature around what this means for better intelligibility and cognitive ease, the overall feeling was that spatial might be useful in other ways in terms of more ease and less fatigue. We’ve all seen articles about the sort of fatigue that people get from being in online meetings for hours a day.
It can feel like a really bad experience when you tell a joke and hear nothing, but when laughter [from the audience] comes in from all directions around you, that really feels like a qualitatively different experience because that mimics the real life experience. We don’t really have much in the way of this tooling right now, but it’s easy to understand how this could be expanded through some sort of creator controls. And I think there’s a lot of potential here.
One thing Paul mentioned was ghost stories where the sound could be heard behind you. I think it just makes the canvas a little richer and there are more tools that the creators can pull upon.
BP: Who does spatial audio work for?
JU: The feature right now is turned on for iOS users and we can’t wait to bring this to Android. One of the things that was interesting about this feature was the fact that with a lot of features on Clubhouse, we turn them on and people have to kind of stumble across them. I wanted to be really careful when we rolled this out because we’re changing the default experience so we want to ensure that we got it right. We had a really careful rollout plan. If it felt like we were going to have to go back and do some reworking of the experience on iOS, we didn’t want to have to go do that rework twice on both iOS and Android. So we had iOS to make sure we felt good about the experience and how our community was receiving it, then we would go in and bring it to Android. Since that has gone off well, we’re full speed ahead on the Android experience. We’re hoping that in a month or two we’ll be able to go and deliver this because we’re quite excited to get this to our Android users.
BP: Clubhouse adoption in Africa has been slower than other regions, which makes me wonder about retention. You guys are very good at listening to your users, but at the end of the day you have the likes of Twitter who are building similar services and platforms into their apps and platforms. Why would a user want to download another app that only does one thing?
JU: I don’t want to spend too much time talking about competitors, but I’ll give an anecdote from my previous life. At Google, going into any sort of thing we were working on there was a general belief that the Google ecosystem would give us a tailwind in any new sort of venture that we moved into. And I think when you talk about what these other companies are doing, they hope that they can use their prowess to move into this voice space, but in my experience that is far from guaranteed. The more different the new behaviour is from the existing behaviour, the more it’s like you’re almost starting from the beginning and in some ways you’re redefining what your product is in people’s minds. There’s a lot of human psychology there. Clubhouse sort of sidesteps some of those issues, but that being said, we’ve got to go out there and execute and do things to really capitalise on this opportunity.
I think that in many ways we’re just less constrained and we don’t need to fit into an existing shape. We have the most ability to understand what it is that people who are coming to the service actually want.
It’s early days, but one of the reasons I joined Clubhouse was to try to figure out what a voice-first service should be. In many ways it reminds me of when I was at AOL. One of the things that people loved was that AOL had a sense of community from things like AOL chat rooms. In some ways, it’s almost like everything that’s old is new again. This is kind of like a communal place for people with voice instead of text.
BP: What is the future of audio?
JU: There’s a lot wrapped up in that. You start asking if there’s a way where voice can become an opportunity to interact not just with the participants but also maybe with an application itself. I don’t want to give the indication that we’re going to go down any specific path or anything like that, but it’s an interesting question. I’ll leave it at that question of what else can be done with voice and I think it’s something that has not been really deeply explored.
The power that people have now within their phones or home computers for doing quality audio production has come an incredibly long way. The processing you can get from the app on your phone is unbelievable and you have to wonder: what does that mean? What kinds of experiences can a musician create by just plugging into their phone, especially if you have a social sort of application as part of that?