The following Q&A is part of a series of interviews conducted with speakers at the 2017 ODSC East conference in Boston. The interview has been condensed and edited for clarity. This interview is with Kishore Aradya, whose talk was entitled “From the Trenches: Managing and Building Data Science Teams?”
What sort of questions are you hoping to hear from some of the participants?
One of the things I would like to hear is, “Oh, by the way, we’re in the same boat as you are. We really don’t understand how this is supposed to work. We don’t really know what the real composition of a data science team is supposed to be. Is it the scientists, is it the engineers, or what is it that you really want?”
“What do we call ourselves? Data scientists, data analytics engineers, or what is it?” There are many aspects to that. When you look across the board, the names themselves get muddied, and many people use BI to mean the same thing. I would like to learn more from others, and maybe people will get some ideas from my own journey.
In your talk description, you mention “Stories from the trenches.” Would you care to share an example?
Sure. One of the things that I encountered when I was first hired, not necessarily in my current role but in the previous role that I was hired into, was people saying, “Hey, you know what? We’re building a data analytics product. We don’t really know what it means.”
And I was like, “Okay, this is interesting, because you’re hiring me to run a data analytics team but you don’t know what it’s supposed to do, what it’s meant for.” One of the things that I like to do is take those things up as a challenge, to try to understand what it means, and so I took on the challenge.
It was a journey for me. When I had these conversations with executives at different levels of the company, each of them was looking at it through their own lens. It’s almost like there’s a big elephant with a lot of blindfolded people touching different parts of it, and each of them thinks that it’s something completely different.
That’s the impression that I got out of my journey as I went through that, right? Then you start to tease things apart. What is it that you want from it? What business insights are you trying to get out of this thing? Are you actually looking to sell this product to an outside entity?
Or are you planning on using the output from this internally to make business decisions? Do you then want to take the business decisions that you make from it and later turn them into a product that you could go off and sell to an outsider? Is it a revenue generator?
Those are the questions you’re going to be asking. Not only that, when you talk to different people from Finance, Customer Service, or Sales, each of them comes at it from their own perspective, their own purview.
A lot of the time, for a data analytics or data science team, it’s an opportunity to pull data together into a 360-degree view of how the company is really operating, and of whether there really is synergy between the different teams. That’s one of the challenges that I realized as I went through this, but those are the kinds of questions that you do get answers for.
Exactly. Why are data science teams different from these other software engineering teams? What are the unique challenges that they have that other software engineers don’t have?
That’s a great question, and that’s the reason why I’m having this conversation, right? I have been managing teams for over a decade now, and my background is in software engineering. I have a Master’s in Computer Science, and then after a period I went out and got an MBA.
So what I wanted to do was to really understand the nexus between technology and business and how those things work together, right? That’s at least what drew me there. Now, when you look at typical software development, what you’re doing is creating products. That’s straight-laced software development.
Code, get the product out there, make sure that the systems don’t fall down, make sure that there’s continuity, so it’s a very straight-laced activity. There’s very little in terms of interfacing, analyzing, or working with other teams or other ways of looking at it.
The entire organization is built around delivery. Build this, deliver this, and get it out there in front of the customer. Whereas, when you take a step back and look at data science or data analytics as a whole, here you have people coming in from various backgrounds.
It’s not just somebody coming straight from an engineering master’s, where the path, the title, and the trajectory that go with that are pretty straightforward. Here it’s not straightforward at all. The titles and the roles are being defined as we speak.
In my own company, as in other companies, it’s a journey. Everybody is trying to figure out what means what. You hire people from different backgrounds: people with a more mathematical background, a more statistical background, or a more computational background, where they’re more into software engineering and data engineering.
People who are able to go and create databases, DBAs, also come into those roles, as do people with other backgrounds. People with PhDs in Physics are some of the best data scientists that we’ve found, and there’s a reason for that. They look at data in a very different way, and they look at problems in very different ways, not as a defined thing.
It’s not a defined complexity; it’s open-ended. In an open-ended context, you shouldn’t have fear. First and foremost, you should be fearless enough to go and try to figure those things out, so those are the different kinds of variations we looked at.
My next question touches on the soft and intangible skills needed for data analytics. Of course, you can hire someone with a PhD, the best data engineers and so on. But what exactly are the soft skills needed for a successful team?
I just want to contrast that with typical software development, right? I have seen, and I’m sure all of us here and in the audience have seen, cases where you could actually have a person work in isolation, not interacting with any other person, and still produce very good, high-quality code, right?
Meaning it is an isolated activity to a certain extent. I’m not saying it is entirely isolated, since you will still work with the code that’s already been built, but it’s a relatively isolated activity when you compare that with data science. Because now you have to interact with the business side.
Now you have to interact with folks directly, and there has to be a feedback loop. That takes the ability to present information, the ability to portray it, and the ability to elicit feedback in a very constructive way, without necessarily having to take it in stride as things go through.
People are going to come back to you and rework those questions. They’re going to be like, “Well, I didn’t know what that meant. Can you go back and figure this out for me? Can you go back and look at these numbers?” Your mindset has to be very flexible; it’s a very different approach to doing it.
That’s why the soft skills matter: the ability to work with somebody and the ability to understand what a person is trying to do, or trying to say. And also, visualization is a very different beast. It’s not a purely logical concept; there’s an art to that, and so again, some people call it a soft skill.
These are not necessarily computational skills, but they are skills that are oriented to an aesthetic way of looking at it, being able to present information in a visually appealing way, and so these are all elements that go into that. You need to have all those variations.
What advice would you give to a company executive starting to build a data science team?
I think the first hires matter. I mean, hiring plays such a key role. Especially if you’re building a data science team, the first thing you want to do is look for a jack of all trades. What I mean by that is you don’t want to get somebody who says, “All I want to do is machine learning.”
“All I want to do is just build this item, build this code. This is really cool stuff. I want to do that,” right? People who are enamored of technology for its own sake are really not the kind of people you want to hire. At least not initially, because you need to build the skill set up in varied ways.
Not only that, you also want somebody who understands your domain, who understands the business: “What is the business context? What is it that you’re trying to do? What are you going to do with the data? What are you going to do with the analysis? What kind of actions will you take from that?”
If you don’t understand the domain, if you don’t understand what the output you are producing is going to be used for, then you’re just doing it for the sake of doing it. Then you’re just crunching the numbers and producing data. So you really need a person who understands the domain, understands the business output, and is able to present the information.
In order to break into and succeed in data science, does someone need a graduate degree from a prestigious institution?
In my experience, I have worked in many different companies, and I have had the privilege of working with some of the smartest people, who have had master’s degrees and PhDs from Ivy League schools and so on and so forth.
I have also worked with folks who don’t have a degree, who haven’t gone to college, who are just really good at coding, with a real ability to produce very high-quality code, right? The point I’m trying to make here is that it depends upon what it is that you want to achieve.
Meaning, if you are in a larger organization, a larger enterprise, for example a big financial company like Capital One, they hire hundreds and hundreds of data scientists. They’re looking for a specific background in the people that do this, right?
So they’re looking for depth. They are really looking for somebody who has spent years researching, doing this piece of work. They really want you to know modeling, regression, whatever it is, the different models that you need to use. That does not come, and cannot possibly come, from taking a few courses here or there.
There’s no sweat equity in that, so unless you put some effort in there, you can’t really say, “I can do that.” That’s one piece of it, but on the other end of the spectrum, you may be just looking for, “You know what? I just have this startup. I just have this product out there. I’m collecting a gargantuan amount of data. I just want to make sense of it.”
“I want to have a system where I can suck in all the data, put it into a system, make it available, create some dashboards.” If that’s the kind of mode of thinking that you have, then, yes, great. You can hire anybody who shows an inclination toward that kind of work and then grow the person, meaning it’s an investment.
Once you get the person in there to look into it, you invest in them to make it work. And then you hire some more, mix them up with other folks with more depth, so then you can start to train the team and build the team up to go after it. As for PhDs, everybody’s trying to get hold of that kind of person.
It’s just not going to work, and a lot of the PhDs are very much interested in creating research, research papers, and patents. That’s the direction that they’re going to take.
If you’re lucky enough to get hold of somebody who’s practically oriented as well, great. Hold on to those people. They have done all that and are doing all that, but the reality is you need to train. The company needs to. It’s an investment.