Can you tell us a little about your background and where you’re from?
I’m from New Delhi in India, where I did my bachelor’s in Electronics & Communication Engineering (B.E.) at Netaji Subhas Institute of Technology. I did my undergraduate education in India before coming to the United States for my graduate studies. I did my master’s and PhD in Electrical/Computer Engineering at the University of Wisconsin-Madison. Then I did a short postdoc at Princeton before coming here.
How did you get into computer science?
I got into computer science after I got this job, basically. Or to say it another way, my PhD is actually in electrical engineering, as is my whole graduate and undergraduate education. But the interface between these different disciplines is becoming so blurry now that my PhD work can fall in either signal processing or machine learning. Especially towards the end of my PhD, I started working on topics closer to machine learning. That’s why when I was applying for jobs, I opened up my search to computer science departments though I did not apply to too many CS places (mainly because I don’t think I am fully qualified to teach undergraduate CS courses as I haven’t taken many). So really I’ve been a computer scientist for the past 5 years.
And Carnegie Mellon just happened to be one of the places.
Yes, especially given that they have a unique machine learning department, like no other place has. Machine learning seemed like a good choice, very aligned with my interests, and that’s how I ended up here.
So all of your background is in ECE. How much overlap is there between ECE and machine learning, since you mentioned a little about signal processing?
There are sub-fields that are quite different, like power systems or nanotechnology in electrical engineering. Computer engineering is somewhat of an overlap, where people design chips; in general, any kind of computer hardware spans the two departments. But more recently, the boundaries between signal processing and machine learning, at least, are becoming blurry. When you’re trying to do signal processing using a large number of devices, and really you’re talking about scaling up, you get into big data issues such as caring about the runtime of your algorithms, which is central to machine learning.
Why did you decide to go into research rather than work in industry?
I think that was a personal choice. I like this lifestyle better in the sense of having more flexibility to decide what I want to work on, even when I want to work on it. I’ve always been in the academic environment. I sort of liked it much more than industry, where you have to commit to a certain amount of time that you will go and work in an office and to a certain project that somebody else is driving. So yeah, I think I really like the flexibility and sense of ownership that the research environment in academia provides.
You mentioned you didn’t go into CS until coming to Carnegie Mellon. What led to your interest in machine learning specifically inside ECE?
Initially I was working on signal processing, but more the networking aspect of it. If you have to send signals between different wireless devices, how can you devise protocols to exchange information between different devices? I slowly got more into the math behind it. That’s where I found myself designing and analyzing sensor networks, how different sensors in a network should interact with each other and move along different paths to cover an area to estimate some field of interest. This was all approached using signal processing, but if you think of it, it’s really about collecting lots of data and making sense of it, and that’s machine learning. Especially towards the end of my PhD, I found myself working on problems and using ideas that were very closely related to machine learning.
How would you break down what machine learning is to someone with no knowledge of what it is?
Machine learning is essentially the design of algorithms that learn with experience to perform better on a task. As you show the machine more and more examples, it learns to perform a task better and better, just like humans. We can look at, say, images of buildings versus some nature scenes, and we can differentiate them very easily once we have seen a few examples. Similarly, we can design a machine learning algorithm that can differentiate between these types of images. The more examples the algorithm processes, the better it gets at automatically classifying images into natural scenes versus man-made buildings. Of course, there’s the math that goes behind the algorithms. Machine learning is about designing and analyzing algorithms that learn to do things automatically. And classification is just one task. Another example is designing algorithms for learning to map a field: you have some sensors taking readings at some locations, and the algorithm should figure out what the readings are at the other places using appropriate models.
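To make the "more examples, better performance" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anything described in the interview: the two "image classes" are stand-in synthetic 2-D feature points, and the learner is a simple nearest-centroid classifier. All function names are hypothetical.

```python
import random

def make_data(rng, n):
    """Return n labeled points (n >= 2), alternating labels so both classes appear.

    Assumption: class 0 ("nature") is centred at (0, 0) and class 1
    ("building") at (1, 1), each with unit Gaussian noise.
    """
    data = []
    for i in range(n):
        label = i % 2
        point = (rng.gauss(label, 1.0), rng.gauss(label, 1.0))
        data.append((point, label))
    return data

def train(examples):
    """Learn one centroid (mean point) per class from labeled examples."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for (x, y), label in examples:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def predict(centroids, point):
    """Assign the point to the class whose centroid is closest."""
    def sq_dist(c):
        return (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, test_set):
    correct = sum(predict(centroids, p) == label for p, label in test_set)
    return correct / len(test_set)

def learning_curve(sizes, trials=50, seed=0):
    """Average test accuracy for each training-set size, over several trials."""
    rng = random.Random(seed)
    test_set = make_data(rng, 2000)
    curve = {}
    for n in sizes:
        scores = [accuracy(train(make_data(rng, n)), test_set) for _ in range(trials)]
        curve[n] = sum(scores) / trials
    return curve

if __name__ == "__main__":
    for n, acc in learning_curve([2, 10, 50, 500]).items():
        print(f"{n:4d} training examples -> average accuracy {acc:.3f}")
```

Averaging over several trials matters here: a single small training set can get lucky, but on average the classifier trained on more examples is the more accurate one.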
Could you explain more about your own research, like what you’re doing right now?
I work at the intersection of statistics, machine learning, and signal processing. Basically, it’s about analyzing algorithms and fundamentally talking about how many examples does an algorithm need to see to guarantee a certain accuracy on a task. In particular, I’m looking at large data sets and how we can extract meaningful structure from the data. The structure could be, say in a social network’s data set, some underlying groups or communities that you may not know exist beforehand and you want to learn it from data about who talks to whom and how often. Or you want to look at brain activity and identify which regions of the brain are talking to each other. So that’s like learning a graph of brain connectivity. Another problem we are looking at is clustering of asthma patients according to the severity of their symptoms so that you can identify target groups for specific treatments.
These are all examples of different types of structures, clusters or graphs, that you want to infer from your data. I am studying how you can extract such structure when, in modern data sets, a lot of the time you don’t get to see everything (a lot of the information is missing) or you can specifically query for certain information. Maybe certain information is cheaper to obtain, and other information is more expensive. So how can we use interactive queries to decide when, and when not, to query certain information to learn these types of structures in a much faster way?
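The benefit of choosing queries interactively can be seen in a toy problem. The sketch below is a deliberately simplified stand-in, not the clustering or graph-learning methods discussed above: the "structure" is a single hidden threshold on a sorted list, and each label query is assumed to have a cost. The names `make_oracle`, `learn_passive`, and `learn_active` are hypothetical.

```python
def make_oracle(threshold):
    """Return a label-query function plus a counter of how often it was called."""
    calls = {"n": 0}

    def query(x):
        calls["n"] += 1          # every query is assumed to cost something
        return x >= threshold    # hidden rule the learner is trying to recover

    return query, calls

def learn_passive(points, query):
    """Label every point, then report the smallest positive one."""
    labeled = [(x, query(x)) for x in points]
    positives = [x for x, is_positive in labeled if is_positive]
    return min(positives) if positives else None

def learn_active(points, query):
    """Binary search over sorted points: each answer decides the next query."""
    lo, hi = 0, len(points)
    while lo < hi:
        mid = (lo + hi) // 2
        if query(points[mid]):
            hi = mid             # first positive is at mid or earlier
        else:
            lo = mid + 1         # first positive is after mid
    return points[lo] if lo < len(points) else None

if __name__ == "__main__":
    points = list(range(1000))
    for learner in (learn_passive, learn_active):
        query, calls = make_oracle(threshold=637)
        print(learner.__name__, "found", learner(points, query), "using", calls["n"], "queries")
```

Both learners recover the same threshold, but the active one needs about log2(n) queries instead of n, because it skips queries whose answers it can already infer.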
What would you say is the most interesting thing you’re working on, or your favorite part of research?
All of it is my favorite! It’s hard to work on something if you’re not driven by it. I think the part I like best is being able to work on fundamental math and at the same time carry it forward so that it has real meaning in some application. Trying to bridge that gap: not just running a heuristic algorithm that works well, but having understood the fundamental principles behind why this algorithm is working well. And also taking real challenges and designing principled backend algorithms that can meet those challenges.
Have any of these algorithms that you’ve designed been applied in the real world?
I’m collaborating with people, especially in the department of psychology and CNBC, and also at the Lane Center for Computational Biology. Those are the two projects I mentioned: one where we are trying to learn about brain networks and the other on clustering asthma patients. So not commercially, but at least in research, the algorithms we are developing are being applied to explore the potential of some very rich data sets. Neuroimaging techniques are now pretty modern; they can map very high-resolution connections between brain regions. And currently neuroscientists don’t have the methods to be able to learn from that rich data set. They tend to ignore a lot of the information they are collecting. The algorithms we are coming up with will help them model the complex data in a much finer way and learn from it much better than they would with simple algorithms that only work with some summaries of the data instead of the full-resolution data. So yeah, we’re working towards applying the algorithms we develop to real-world tasks.
How much freedom do you have in choosing what projects to work on?
I have complete freedom! I can pick literally whatever I want to work on. Of course I talk to collaborators and find what would be good real challenges to solve. So I do listen to other people. But ultimately the choice is mine for what to work on. Sometimes people think that the research might be driven by funding, but I think it’s that I am driven by something, and then I write a proposal on it. So it’s very rarely that I look for, “Oh, here’s an opportunity, and I should try to fit my research to it.” I do what I do and then I learn to sell it.
How much of your time doing research is divided between writing proposals versus doing actual research?
I think most of it is about doing either actual research or at least supervising students to do it, and thinking about the higher-level picture. The grant-writing part is actually fairly small. I don’t spend too much time on grant writing. I write maybe two main proposals a year. There might be a few other ones that I’m co-PI on, but as PI, I never write more than two. And I commit maybe three weeks to each, not completely full time, but the grant is my primary focus during those three weeks. So that’s about it. I don’t write 10 proposals a year (laughs).
Could you tell us more about what classes you’re teaching or what you’ve taught in the past?
I’ve been teaching mainly the graduate machine learning courses here. We used to have two, 601 and 701. 601 was geared more towards master’s students; 701 was for PhD students. This year we have actually divided it into three, because the enrollment has grown so much, and we have split 601 into two sections. So we actually have close to 120 students in each section of 601 and close to 140 students in 701, and we’ve created 715, which is the advanced version of 701. I don’t have the exact numbers, but maybe close to 50 students there. The enrollment has really exploded in these courses.
In one year, I teach either 601 or 701, mostly 701, and I might now teach 715. The other semester I try to do either 702, which is statistical machine learning, where we go more into the analysis of the algorithms and the theoretical part, or a new course I’ve designed: 704, Information Processing and Learning. This course brings together concepts from information theory, signal processing and machine learning. Because I use a lot of signal processing and information theory in my research, I thought it would be good to try to teach that, and make students aware of the connections between these fields. Although it’s a plus to have taken 601 or 701, I try not to make it a prereq; we pretty much start from the basics. This will only be the second time I’m offering it.
You mentioned 601, 701 and 715. What’s the difference between them? Is it just that they get harder and harder?
Yes, they get harder and harder. But do we expect someone to take all three? No, definitely not. We recommend taking one of them.
Fundamentally, the way we designed 601, it is slower paced and more aimed at masters students, and 701 is aimed at PhD students whose main line of research is not machine learning. And 715 is geared more towards doing research in machine learning. So that’s the difference we’re trying to aim for, but I don’t know how successful that is.
As you see machine learning becoming more and more popular, do you have any thoughts on that? About why that’s happening?
It’s definitely a very exciting time for machine learning, and computer science in general, but in particular for machine learning. Part of the reason is definitely the advent of big data. There’s lots and lots of data in almost all fields that’s being collected now that wasn’t collected earlier, or even generated. There’s the fact that people are active on social networks; there’s all sorts of financial transactions going on online; we have much higher-throughput technologies. So I think that has definitely led to a big push in machine learning algorithms and research on them. And also the fact that data sets are getting much more complicated. Earlier, you could get by using some simpler approaches, which you’re not able to do now, both because the data sets are complex and because their size is large. So you have to care not only that you can make inferences accurately, but also that they are done efficiently: short run time, or a small amount of memory, and so on. So that’s where, I think, machine learning as a fusion of statistics and computation really took off, and that’s what’s leading to its growth. The applicability of machine learning is very wide, and it’s only increasing. I think this will continue to be the main strength of machine learning and help it grow much more.
Do you prefer teaching or research?
I like both…yeah, that’s a hard question. Do you really want me to pick one?
I think that they are both satisfying in their own ways. That’s one of the reasons why I took on this position. In a purely research role, I’d miss the interaction with a large class of students and talking about what you do research on to a broader audience that can learn from you. If I did only teaching, that wouldn’t be very satisfying because then I wouldn’t be exploring new things as much. So they complement each other.
We’ve noticed you’ve been doing some outreach. Could you tell us more about that?
I have been serving as a panelist at local-area high-school STEM events. Also, I have been active with OurCS, the Opportunities for undergraduate research in Computer Science. It’s a workshop held every two years at CMU that brings together undergraduate women from all over the country, and even from countries outside the US. I think we had participants from Qatar, some European countries, even some Asian countries. These are all women undergraduates who come here and get experience on different projects, and also try their hand at a small research problem. I served as one of the research team leaders, who take a small group of students and supervise them on a short research problem they can hope to address in two days. That was a lot of fun, getting to know them, and hopefully motivating some of them to pursue graduate studies, which I hope you will do as well.
For someone who wants to get into graduate studies, what can they do right now as an undergraduate?
Just be active in terms of personal development and professional development. Start to get a sense of what areas you’re more excited about, and try to do some projects in those areas. I think that can really give you hands-on experience with what it’s like to work in a specific field, and what kind of contributions you can make. You realize the impact of a particular field only after you’ve done some projects in that area.
So what advice would you have for someone who might be looking to go into grad school, or looking into research in machine learning?
I think the most important thing is to figure out what you want to do. Do you really have a strong desire to go into industry and join the workforce? That’s a different lifestyle you’re committing to. Or are you more motivated by trying to further extend what’s already been done? Research is all about taking things from where they are to where they can be. So do you want to contribute more in that direction, or do you want to go into the workforce and make whatever exists better by implementing it? To be able to decide what’s better for you, the best thing is to try it out a little bit. So do internships, which should give you a sense of whether going straight into industry is the right thing, and also work with professors on a small research project, which will give you a sense of what it is like to be a grad student. And then you will be in a better position to make the decision of what’s better for you. There is no wrong answer. But at least try your hand at both those things.
Did you try working in industry at some point?
No, that’s something I wish I had done more of. I did not do many internships; I just didn’t try for them. I was really attracted by research. Working in a company just didn’t attract me. But in retrospect, I think that experience is important.
I think it is good to know what it is like to be on the other side. Sometimes you just don’t know what to expect unless you try it.
Were there any things you were surprised about or didn’t anticipate when you got into research and grad school?
It’s a great feeling when things work, but realizing that things don’t always work is something that you sort of know but hope doesn’t happen to you, and it almost always does. Research is not about just sitting down and getting to a solution; it’s about working towards it and sometimes realizing that you are heading down the wrong path. That was really the learning curve: getting intuition about when you can make the decision that something is working or not working. And learning from your mistakes, improving as you go ahead. You’re always very excited when you’ve made up your mind to do cool research, but you have to realize that you work towards it slowly.
Within computer science (or even outside of computer science), do any new innovations or discoveries excite you right now?
There are several. As I said, it’s a great time to be a computer scientist and a machine learning researcher, especially. We are extending the boundaries of what machine learning can do, not only in terms of its applicability to lots of different things that we never thought of before, but also in terms of developing systems that are learning forever or are learning by interaction with humans. I think those are all great avenues for machine learning to grow in: not just, “Here’s a data set, analyze it,” but rather, “How can I keep learning as I go, and then also interact with external knowledge to make the algorithm better and better?”
Another direction machine learning is going in is this: so far, we have been studying things in isolation. We care about how many samples we need or how much runtime it takes to train our algorithm to a desired accuracy, but we haven’t tried to optimize things jointly. Moreover, not only do we care about accuracy, samples and runtime, but also memory and storage: what part of the data to load into the memory on the computer, versus storing the rest on a hard disk or in some cluster somewhere (in a distributed manner versus a centralized manner). What are the right tradeoffs between all of these quantities together, and how can we design algorithms that jointly optimize all of these objectives? I think this is something that machine learning is barely starting to do, and I think that is a very exciting place to be, because all of these factors are really important in modern data sets. We do care about not only speed and accuracy, but also where data is stored, and how we can access it. Even an issue like privacy is an objective that is important to address, and algorithms are just barely starting to address it. Even if they address some of these issues, they are very much handled in pairs: normally you would think about privacy versus accuracy, but not care about runtime. Jointly optimizing all of these is going to be a very exciting and challenging problem for machine learning.
So what’s going on in the joint optimization area right now? Is it just being researched at CMU, or is it being addressed everywhere?
It’s happening all over, including CMU. People have started accepting the fact that all of these are objectives that need to be optimized simultaneously. For example, consider how many measurements you need to take and runtime jointly. How can I equate one measurement to one flop? We don’t understand this, so people are trying to think of ways to compare these two and say, “How can I give you more flops but take away some number of measurements?” What’s the right tradeoff? But again, people are still thinking of these in groups of two or three — measurements, accuracy and runtime, or communication constraints with accuracy and privacy, but not all jointly.
Outside of research, what do you like to do in your free time?
If you consider free time as time outside of work, I have lots of it now that I have a 1-year-old. For the last year, my time out of work has been spent playing with him, taking care of him, and enjoying my time with him. Whenever I’m not working nowadays, I’m with my baby, enjoying our time together. Before that, there wasn’t much free time you get as an assistant professor, but whatever time I used to get, I would do different things. I’m a jack of all trades, master of none: I would read books, paint or do craftwork, go swimming, go to the gym, watch TV (while working), try out new dishes. There was a lot of stuff, but no single thing that I can call a main hobby.
How do you balance your work and personal life?
That’s one of the things about academic life — it’s as flexible as you want it to be. I work no later than 5:30 PM, and I try to get home by 6:00. From then until he goes to bed, I’m playing with him and spending time with him. Even in the mornings, he wakes up at 7:30 or 8:00 AM, and I’m with him until I drop him off at 10:00 AM at daycare. I spend a lot of time with him, since I’m not going to get this time again, and he’s going to grow up soon. It’s not easy to balance work and your personal life. But having a balance is really the key, and a baby teaches you a lot about that, and how to prioritize things. There are certain things you should focus on, and forget about the rest. If your bed is not made every morning, it’s okay — you learn that you don’t have to be a perfectionist, because that’s less important than spending time with your kid or working on that grant or paper deadline that you have. Really focusing your efforts helps, and you have to give up your idea of perfection on everything.
What advice would you give to girls in high school who are interested in majoring in STEM fields but might not have that many technical opportunities in high school?
That’s actually something I feel quite passionately about. I have been going to STEM nights that high schools organize, especially for women. One thing I’ve realized after talking to them is that very few are motivated to go to graduate school, and even if they are, they are more interested in bio and health-related fields, not computer science or math. I can’t really blame them, since they are not really exposed to what it means to be a computer scientist. They think it’s mostly about games and coding, two things that never excited me. Truth is — as a computer scientist, you get to work with neuroscientists, computational biologists, healthcare professionals, etc. Even if you are more motivated by biological applications, computer science is a great way to help out these fields, since every field has large data sets that need to be analyzed. A human cannot think of all the possibilities; for example, we might not be able to think of the best route to get to a tumor, with the intricate neural connections, but algorithms are able to do that for you. Computer science is a lot more powerful than what high school students are being exposed to. Somehow we need to make them more aware of all of this. How can we do it? Many universities run programs like CMU’s TechNights program. There are many opportunities like this for high school students to be involved in, and I think we need to advertise them better, at least to local high schools. That’s a role we can play and are playing at CMU.
Do you have any general life advice or words of wisdom?
Being flexible and learning from circumstances is important, because very often we have our minds set on something, but when we go through some experience or listen to others, we realize that things might actually be different. Taking input and actually using it is important, instead of just thinking that whatever you have in mind is right. Being open to other ideas and experiences is very important.
It’s funny that you describe it that way, because that feels like what the computer is doing when it is learning.
Exactly, that’s right. We are thinking of it nowadays as not training a computer, but more like raising a child (as Tom, our Department head says). We should give computers the same experiences as we get, so eventually, they will be able to learn human intelligence.