So you want to build a data science team?

Internet companies looking to start a data science team in areas such as recommendation systems, discovery, advertising technology, product or analytics often get overwhelmed by the specific challenges of hiring, building and growing such a team. They can become confused by all the terms, hype and buzzwords around certain technologies, algorithms and skills. Starting a team of this kind is also not the same as starting an average software development team. Profiles are more specific, terminology is more exotic and there is little consensus in the market regarding best practices and the state of the art.

One major international retailer recently approached me for advice on how to build an in-house team from scratch for their e-commerce business, and I would like to share with you the elements that I believe every company should clarify before getting started in this endeavour.

In this first part I will touch on three topics: accountability, resources and team composition.

Accountability

It should be very clear to everyone from the beginning where in the organisational chart the team will be located and who the main stakeholders will be. There are multiple approaches that can be used. Some organisations put the data science team under the CTO, others under the CFO or even the CMO. Others prefer a federated system with specialists distributed across departments and supervised by a project manager, while others go the R&D route, where the team does not have a specific agenda or stakeholder and has a free hand to decide. The choice depends on the company's organisation, culture and resources, as well as on the team’s mission. Not deciding this from the beginning can lead to confusion in the daily activities of the team. As it is a sexy topic, more than one person in management would be happy, or even expects, to have the team under her command. These expectations can lead to friction and confusion that can seriously affect the performance of the newly formed team.

Resources

Anyone familiar with the current state of the job market must be aware that technical talent in this area does not come at a low price, yet it is surprising how often budgets are not properly planned. For an Internet company with 300 or more employees trying to create a centralised team with a specific mission (e.g. recommendation engines, customer reactivation, etc.) a good start is a team of 5 to 8 people, where 1 is the technical project manager, 1-2 are the hardcore data scientists responsible for modelling and 3-5 are the data engineers deploying the production code. Over time, teams can become larger and similar teams with different missions can emerge. A quality team therefore represents a significant commitment, and this should be clear to every stakeholder.

Team Composition

After determining the resources available and the expected team size, the next big topic is who to hire. For the regular HR department this very quickly becomes an impossible task. Mailboxes are soon flooded with CVs containing all types of exotic qualifications and never-before-heard terms. Here it is also very easy to be influenced by the media or by technology vendors. Hence, it should be defined which hard skills and technologies are relevant, whether education weighs more than experience, whether big names on a CV carry extra weight, and whether it is really necessary to hire super senior engineers or long-experienced post-docs. This is easier said than done, as in the seed stage of the team there are still many unanswered questions. My advice here is therefore to start with solid basics and not look for the über-exotic. After all, the objective in the first year or two of the team's existence is to lay a foundation and justify that existence through quick gains and low-hanging fruit.

Taking the above example of a team of up to 8 individuals, and considering that the company might not be able to compete with the Googles and Facebooks of this world in prestige, remuneration and perks, a good initial composition could look as follows:

Technical Project Manager

This person has 3 to 5 years of experience managing similar teams dealing with quantitative subjects. Preferably, she has a solid technical background and, although she is not expected to code, she is capable of doing so. She not only has the skills expected of a project manager, but also understands the algorithms and techniques used by the team. It is a bonus if she can also do code reviews.

Data Scientist

Someone with a solid quantitative background. Ideally, she holds a PhD in Physics, Mathematics, Computer Science, Biology or a related discipline. This person should be judged by the quality of her research, where she has published and what she has contributed. It is entirely possible to be an expert in Machine Learning and be really bad at software development. Hence, it is very important not to assume anything and to double-check her coding skills. Unless you want to develop a more academic R&D team, somebody who cannot code will not be very helpful, especially in the early days of the team. Additionally, it is important to verify how hands-on the individual is, as candidates from academia sometimes have mistaken expectations of what industry requires of them.

Data Engineer

This person does not need to be very academic. She can be a solid software developer with an interest in quantitative topics. She must have a very solid understanding of algorithms, data structures and software engineering in general. Double-check the algorithms part (especially computational complexity): many engineers have a poor understanding of the subject, yet it is essential for every robust data team. Overall, her code must be excellent. Try to look for individuals who actively contribute to open source projects. Ideally, this person uses the same technology stack as your data scientists (e.g. Python, Scala, etc.).

Seniority for each of these positions depends on the company and budget. However, I do not recommend hiring very senior individuals in the beginning. They often have very specific expectations, whereas in the early days of the team its scope and nature can change dramatically. In addition, data teams have to create their own platforms in the beginning, as the data they need might not be there or might not be in the format they want. This means doing non-glamorous tasks and getting their hands dirty. Therefore, it is preferable to have ambitious and adaptable individuals, even if they are not very experienced.

These are some points that are worth considering. The list is not comprehensive, though. In future articles I will address the following topics:

  • Technology Stack
  • Recruiting
  • External Support
  • First MVP
  • Long-term Perspective

Which topics do you consider relevant?