What is data science and who is a data scientist?
By definition, data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from raw or unstructured data. Data science is closely related to data mining and big data, two other terms in vogue over the last decade, but it is not the same as either.
Based on the applications of data science, the type of people hired as data scientists, and the output they deliver, data science in business corporations can broadly be split into two categories: one, providing decision-making insights for humans, and two, designing systems which help make decisions.
1. Decision making insights for humans
This is an application of data science which some would argue has been in existence for a while. Its deliverables are insights that help executives make business, product, strategy or marketing decisions. The data scientists who specialize in this field can be called Decision Scientists.
These data scientists define and implement metrics, run experiments, create dashboards, draw causal inferences, and generate recommendations from these. They draw conclusions from data in order to make decisions such as which design layout on an eCommerce page will lead to more sales, what type of sales leads are more likely to close, or what type of employee profile will lead to better productivity.
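As an illustration of the kind of experiment a decision scientist might run, here is a minimal sketch of comparing two page layouts with a two-proportion z-test. The function name, the visitor counts and the conversion counts are all hypothetical, and a real analysis would also need to consider sample size planning and how the experiment was randomized.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of layout A and layout B.

    conv_a, conv_b: number of visitors who bought (hypothetical counts)
    n_a, n_b: number of visitors shown each layout
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 1,000 visitors per layout
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value suggests the difference in sales between the two layouts is unlikely to be due to chance alone; the threshold for acting on it is a business decision.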
2. Designing Systems which help make decisions
This is an application of data science where the deliverables are models and algorithms which are consumed by computers. The data scientists who specialize in this field can be called Modeling Scientists.
These data scientists design models, training data sets and algorithms which are consumed by computers to give a desired output. Modeling scientists most often work alongside software professionals to design, deploy, implement and scale the models they have created. Examples of such models are a recommendation system on an eCommerce website which tells you which book to read next, or a model that optimizes a digital marketing campaign for more clicks.
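To make the recommendation example concrete, here is a toy sketch of an item co-occurrence recommender: it suggests the books most often bought together with a given book. The function names and the order data are hypothetical, and production recommendation systems are typically far more sophisticated than this.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same order."""
    co = {}
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per order
        for a, b in combinations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(co, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


# Made-up order history: each inner list is one customer's order
orders = [
    ["Book A", "Book B"],
    ["Book A", "Book B", "Book C"],
    ["Book B", "Book D"],
]
co = build_cooccurrence(orders)
print(recommend(co, "Book A", k=1))
```

The output of such a model is consumed by the website rather than read by a person, which is the distinction between modeling scientists and decision scientists drawn above.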
Data Operations in an organization
Organizations which deal with a considerable amount of data need special personnel, systems, hardware and team structures to get the most out of their data operations. Data operations can be broadly divided into the following five streams, based on the expertise of the personnel handling them and the output they produce:
Data infrastructure: data ingestion, availability, operations, access, and running environments to support workflows of data scientists, e.g. running Kafka and a Hadoop cluster.
Data engineering: determination of data schemas needed to support measurement and modeling needs, and data cleansing, aggregation, ETL, dataset management.
Data quality and data governance: tools, processes, guidelines to ensure data is correct, gated and monitored, documented, standardized. This includes tools for data lineage and data security.
Data analytics engineering: enabling data scientists focused on analytics to scale via analytics applications for internal use, e.g. analytics software libraries, productizing workflows, and analytic microservices.
Data-product product manager: creating products for internal customers to use within their workflow, to enable incorporation of measurements created by data scientists. Examples include a portal to read out results of A/B tests, a failure analysis tool, or a dashboard that enables self-serve data access and root-cause diagnosis of changes to metrics or model performance.
Which type of data scientist to hire depends on how mature the data science function is in the organization. Larger organizations with established data teams go for data scientists with niche skillsets, while smaller organizations which are just adopting data science as a way of achieving competitive advantage need to hire full-stack data scientists who can span as many of the above streams as possible.
As for organizational structure, it’s important that the data science function reports to someone who understands its importance and can stay invested in it for the long term. Most often this is the CEO of the company or an important decision-maker like the CMO. For individual contributor teams, a hybrid model can also be implemented, with data scientists working in cross-functional teams but reporting into a centralized data science team.
In conclusion, data science answers key business questions by translating data into answers. But to get the most out of data science it is important to hire the right people, set up efficient operations, provide infrastructure and establish a structure to integrate the insights into business decisions.