I regularly receive questions from candidates interested in doing data science at consumer Internet companies: “What do I need to get hired?”
I stay in regular contact with other Heads and Leads of Data Science, which gives me a good overview of the industry and its needs. In addition, I have led teams in Asia and Europe tackling data science problems.
Our teams are constantly looking for top people willing to join our ranks. Moreover, I receive more than 5 resumes every day for data science positions, and I have interviewed several hundred candidates.
Therefore, in this post I will summarize different opinions from people in the industry. This should be helpful for candidates trying to break into the field. I will touch on the following points:
- What do we seek in a data scientist?
- What are we expecting during the interview?
- Tips and pitfalls for any candidate
Remember that your application is the first impression and determines whether you will be invited to an interview. Always include:
- 1 page CV: Even if you have 2 PhDs and 20 years of experience, one page is sufficient.
- 1 page cover letter: Tell me briefly, who are you? Why should I interview you? What would you bring to the table? What do you expect?
- Links to your portfolio (GitHub): I will highlight this. You must have a portfolio of software if you want to be a data scientist. This is so important that it deserves its own point.
Everybody can apply; data science is a very diverse field. That said, it is true that individuals with a quantitative background (MINT, known as STEM in English) earn extra points in the eyes of the recruiter. Therefore, if you have a business or social sciences degree, highlight your quantitative abilities and software skills (courses you took, internships, jobs and your portfolio).
Attention PhDs: There is indeed a bias that you might be too academic and have the wrong mindset for industry. Dispel this through internships, competitions or projects of your own. Most likely, you will make it to the interview. Therefore, be prepared to show your hands-on mindset there (more below).
A note on MOOCs, data science boot camps, etc.: They are fine, and we all need exposure to new concepts; I am constantly taking 1-2 MOOCs to learn new things and I like them. Boot camps and academies also bring valuable experience and help you build a portfolio (see below). Go ahead and mention all this in your CV. That said, do not expect anyone to hire you based on this alone. They are the cherry on the cake, but nothing more.
A portfolio is crucial. We would rather hire a candidate with just a bachelor's degree from a non-MINT program and a great portfolio than someone from a MINT program without hands-on experience of any type. In data science, practical projects can help us overlook many other flaws such as low grades, the wrong degree, the wrong school, etc.
Remember that in this field there is a lot of coding involved, at least in our teams. We build data products that are used by different stakeholders, and that means production-ready code. Most of my teams prefer to use Python or Scala. Other teams are looking for R, Julia or Clojure skills. Therefore, a GitHub account helps verify the following:
- That you can actually code.
- That your code is clean.
- That your code is idiomatic.
- That you are familiar with the scientific stack of the language in question.
- That you are a curious and/or interesting person seeking answers to interesting things.
- Whether you have had exposure to machine learning, statistical learning, or statistics in general.
- Whether you are more of a software engineer or more of a data analyst.
Instead of going to summer school or interning in consulting, banking or anything else that will only help you marginally, invest the summer in either a) working on pet projects or b) doing data science at a company.
Normally, the first contact with a candidate occurs through Skype. Here we touch on multiple topics such as the team, the company and our challenges. We also expect to hear the candidate’s story and gauge their knowledge. If you make it this far, it means that you are already better than 50% of all the other applicants, which is great news. Nevertheless, do not ruin this chance; be prepared. Some words of advice:
- Be able to tell a story. A data scientist is a storyteller; if you bore me, then you are probably not right for this job.
- Be brief and concise. The first interview lasts 45 minutes to one hour and often even less than that. Therefore, you have 10 minutes to tell your story.
- Brush up on your statistics, ML, general CS and coding skills. You have to be prepared to answer questions (more below).
- Statistics: You cannot claim that you are, or want to be, a data scientist if you are not able to answer general questions on probability and inference. This is essential, and failing here can destroy all your credibility.
- Machine Learning: What has been your exposure to the field? Is there anything that you especially like? Can you explain it in layman's terms? What are the pros and cons?
- Computer Science: You must have knowledge of algorithms and data structures. Topics related to Big O notation are relevant. Be prepared to explain them in very simple terms.
- Coding skills: You should be familiar with the theoretical concepts behind the programming language the company uses. It is striking how many people claim to be Python developers but cannot explain what duck typing means (major no-no if you fail here) or how mutability and references in Python work.
- Working in teams: Do you work with version control? What do you use? What do your development cycles look like? Any agile methodology? What has been your exposure to working in a larger setting? All these topics are covered at a shallow level, but if you are not capable of answering the basics, it will cost you a lot of points.
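As a rough illustration of the two Python concepts mentioned above (duck typing, and mutability with references), here is a minimal sketch. The class and function names are made up for this example; it is only meant to show the kind of explanation an interviewer might expect, not a model answer.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # Duck typing: no isinstance check. Any object with a speak() method works,
    # regardless of its class.
    return thing.speak()


def append_in_place(items, value):
    # Mutates the very list object the caller passed in (same reference).
    items.append(value)
    return items


def append_copy(items, value):
    # Builds a new list; the caller's list is left untouched.
    return items + [value]


sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]

original = [1, 2]
alias = original             # a second name for the same list object
append_in_place(alias, 3)    # the change is visible through both names

copied = append_copy(original, 4)  # original is not modified here
```

Being able to say in one or two sentences why `original` changes after the call through `alias`, but not after `append_copy`, is the level of understanding meant here.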
After the interview, you will be asked to complete a technical test covering different topics. This will give you a feeling for the type of problems being solved. Here you can rebound and shine. We like candidates who put in the effort and take some time to answer this. On an objective level, it helps us confirm their coding, problem-solving and technical abilities. On a subjective one, we can see their resilience, their attention to detail and their interest in working with us.
After this, we usually take some time to review the submitted material and, based on its quality, you will have interviews with other team members. We, in the data science industry, are very democratic and want to work with people that we like on a personal level. Other members of the team will ask similar questions related to the topics discussed above. In the end, a successful hire will have gone through 3-4 interviews (phone and on-site) and a technical test. We try to be fast with our decision-making process and are always honest with the candidates.
Last words of advice
- Do not underestimate the interviews. You only have one chance to impress and it is very easy to ruin it.
- It is not the end of the world if you do not succeed the first time. You can always apply again later.
- Do not be a jack-of-all-trades, master of none. The key is to develop a core expertise. It is true that data science is a multidisciplinary field, but you should have a specialty that helps us overlook other flaws. For example, if you are an impressive Python backend developer without exposure to machine learning, that is fine.
- Be curious and get exposure to other “non data science” topics, for example web development. It is always great to have somebody on the team who can do front-end development (for dashboards and web apps, for example).