Please tell us about your internship
Keerti Agrawal interned this summer at Spotify, the music-streaming behemoth.
The company gave her an important task: to design a machine-learning model that forecasts user growth by tracking streaming habits. The streaming data was immense, covering around 100 million users, and her model had to predict the retention rate of monthly users.
It was hard work, but Agrawal enjoyed the internship. She worked at Spotify’s downtown office and had a view of the 9/11 Memorial. The culture in the office was casual and youthful — couches rather than cubicles, t-shirts rather than shirt collars — and the kitchen brimmed with delicious free food. On Friday afternoons, independent rock bands played in the conference room.
“Working at Spotify provided me an amazing opportunity to learn with some of the most talented people in the world,” says Agrawal. “I found my work to be meaningful and impactful. Most of the motivation for my work came from the highly supportive and energetic work culture at Spotify.”
How did you end up in such an offbeat, unconventional and interesting career?
Agrawal, a bright student with impressive academic and work credentials, is in her last semester in the data-science master’s program. She graduated from the Indian Institute of Technology, Roorkee, with a degree in applied math. And before enrolling in the master’s program in Data Science at Columbia University, she had internships at IBM and Deutsche Bank and worked as a business operations analyst for Groupon. She was also a product analyst for a startup in Mumbai called Housing.com.
In this interview, Agrawal talks about what she learned at Spotify and how the work she did there relates to what she’s learning in her data-science classes.
Can you discuss the project you did for Spotify?
I was a data-science intern on the Finance Analytics team. My project was to formulate and implement a probabilistic model that uses streaming behavior to forecast user growth. I focused on forecasting growth for the next two months across 60 markets. The model provided a probability of whether a user will be streaming on the 60th day from today. The model also needed to update those probabilities by incorporating real-time changes in user streaming behavior.
And how did you do that?
In simple terms, consider the users who have played a song on Spotify for at least 30 seconds in the past 30 days (monthly-active users). Out of those, we needed to know how many would drop out over time and how many would still be with Spotify after 60 days. To do this, we looked at each user’s recent streaming behavior to predict his or her retention after 60 days.
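The definitions in this answer can be made concrete with a small sketch. It is illustrative only and is not the model Agrawal built: the interview says her model was written in Scala and run on Spark, while this sketch uses plain Python. The `(user_id, play_date, seconds_played)` record layout and the function names are assumptions, and "retained" is assumed here to mean that a user still counts as monthly-active when measured 60 days later.

```python
from datetime import date, timedelta

MIN_PLAY_SECONDS = 30   # a play counts only if it lasted at least 30 seconds
WINDOW_DAYS = 30        # "monthly-active" looks back 30 days

def monthly_active_users(plays, today):
    """Return the set of users with a qualifying play in the 30 days ending today.

    `plays` is an iterable of (user_id, play_date, seconds_played) tuples,
    a hypothetical layout chosen for this sketch.
    """
    start = today - timedelta(days=WINDOW_DAYS)
    return {
        user_id
        for user_id, play_date, seconds in plays
        if start < play_date <= today and seconds >= MIN_PLAY_SECONDS
    }

def retention_rate(plays, today, horizon_days=60):
    """Share of today's monthly-active users who are still monthly-active
    `horizon_days` later (the assumed meaning of "retained")."""
    base = monthly_active_users(plays, today)
    if not base:
        return 0.0
    later = monthly_active_users(plays, today + timedelta(days=horizon_days))
    return len(base & later) / len(base)
```

This computes retention after the fact from observed plays. The forecasting problem described in the interview is the harder one of estimating the same quantity before the 60 days have passed.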
How’d you deal with the deluge of data?
I had data on around 100 million users (their daily music-streaming activity) along with their geographical information. The idea was to use features such as the number of days since a user last streamed a song on Spotify; how many times he or she streamed in the last week and before that; and which market segment the user belongs to. We then used the data to formulate a probabilistic model to estimate the chances of him or her streaming a song again.
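The features listed in this answer might be derived from a user's play dates roughly as follows. This is a minimal Python sketch under stated assumptions, not the code used at Spotify: the function name, the feature names and the input layout (a list of play dates plus a market label) are all hypothetical. It stops at feature extraction because the interview does not describe the model itself.

```python
from datetime import date, timedelta

def user_features(play_dates, market, today):
    """Build a feature dict for one user from the dates on which they streamed.

    `play_dates` is a list of dates (one entry per stream) and `market` is a
    label such as a country code; both are assumed inputs for this sketch.
    Returns None if the user has no streams on or before `today`.
    """
    past = [d for d in play_dates if d <= today]
    if not past:
        return None
    week_start = today - timedelta(days=7)
    return {
        # Recency: days since the user last streamed a song
        "days_since_last_stream": (today - max(past)).days,
        # Frequency in the most recent 7 days
        "streams_last_7_days": sum(1 for d in past if d > week_start),
        # Frequency before that
        "streams_before_last_7_days": sum(1 for d in past if d <= week_start),
        # Market segment the user belongs to
        "market": market,
    }
```

A probabilistic model would then map a feature dict like this to a probability that the user streams again. Which model to use is left open here.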
What form was the data in?
The data was in BigQuery tables, with columns for detailed user info such as country, registration date and activity history on Spotify. The way it works is that when a Spotify user plays a song, that play becomes data tracked by user ID. Another column shows the date and time the user listened to the song. We track active users and know when users become inactive.
What programs did you use to analyze the data?
My machine-learning model was built in Scala and run on a Spark cluster. I had never used either before, so I had to learn them from scratch. My model returned good results, and it can be made more robust by adding a few other features.
Was the internship fun?
Yes, it was really fun. The awesome part was that my office at 4 World Trade Center had a great view of New York’s financial district. On Fridays after work, Spotify brought in indie bands who were amazing artists. The kitchen was always filled with Mexican, Thai, Japanese and other food.
Did your work for Spotify relate to your classes at DSI?
Since my project was related to statistics and machine learning, I realized that the classes I had taken in statistical inference, machine learning, applied machine learning and computer systems were extremely practical for use in industry. There was so much overlap between what I learned in my classes and the work I did in my internship.
How did your internship end?
I presented my project to all the teams at Spotify during an intern demo fair. They loved my project and said I produced something important for them. It was great to get such positive feedback. And I will definitely consider working full time for Spotify.
Why did you enroll in the data-science master’s program?
I was working in Mumbai for the startup Housing.com. I was doing data analysis and I quite enjoyed it. I studied applied math, and data science requires strong math skills. I wanted to do a master’s in data science, so I researched programs and thought Columbia had the best one for me.
Now that you are finishing your degree, are you happy with your decision to study here?
Yes. As I mentioned, my work at Spotify made me realize how much I’ve learned in my classes. Also, being in New York gives me the chance to attend many conferences, meetups and hackathons related to data science. And during this semester, my last, I’ll do a capstone project with an industry partner. I’m really looking forward to that.
What are your plans after you graduate in December?
The internship at Spotify solidified my interest in data science, so I’d like to work in the data-science field for a top tech company. I want to get more exposure to America’s tech culture and learn from some of the smartest people in the world.