We look at recent survey findings from Anaconda that highlight the growing need for data science and some of the problems within the industry that need to be solved.
As part of Data Science Week here at Siliconrepublic.com, we’re taking a look at the findings of Anaconda’s latest investigation into the sector, The State of Data Science 2020: Moving from Hype Toward Maturity.
Published at the end of June, Anaconda’s annual survey examines how data science is maturing in commercial environments and how academic institutions are preparing the next generation of data scientists.
The survey was open from February to April 2020, with 2,360 participants from 100 countries providing responses. Among the participants were students, academics and people working in commercial environments. Each cohort was asked some unique questions, while other questions were presented universally.
Almost half (49pc) of survey respondents fell into the millennial age cohort, while 9pc were classified as Gen Z. Another 28pc were aged between 39 and 54, while the remaining 14pc were over the age of 55.
Data science in a commercial setting
Of the survey’s respondents, 59pc work in commercial environments. One in five of these people work in a variety of departments, but 28pc are stationed in a centralised data team.
Anaconda said: “As data science continues its ascent to a strategic discipline in many organisations, we expect larger organisations to establish a data science centre of excellence to maximise the business impact from data science and provide professionals an opportunity to cross-train in various departments.”
The survey found that organisations with more than 10,000 employees were most likely to have deployed this model already.
Anaconda’s report noted that the data scientist of today is often a jack of all trades, with a good handle on all components that play a role in their analysis and work, from mathematics and modelling to data preparation, visualisation, model training and a degree of DevOps knowledge.
The survey found that Python is the most popular language among respondents, while JavaScript and C++ are less widely used but still fairly common.
According to the survey, 75pc of the surveyed data scientists use Python as their primary language at work, making it an almost essential skill for anybody considering working in the field.
Managing security in the field
Anaconda quizzed respondents on the inherent security management challenges that arise in data science.
While many companies are now working with open source software, which allows contributors and maintainers to catch and patch vulnerabilities, security issues remain a “fact of life” that will always consume resources.
Across Anaconda’s sample, people in different roles had different attitudes to open source software and security. Respondents who cited their profession as professor, instructor or researcher held the lowest levels of concern about open source vulnerability management.
In the report, Anaconda wrote: “On the one hand, this may be because this respondent set is closest to efforts to correct vulnerabilities in open source tools. On the other, it may reflect a gap in university data science curricula, in which students do not gain sufficient understanding about security and vulnerability management to prepare them for commercial environments.”
The cohorts that reported the highest levels of concern about managing security vulnerabilities were system administrators and line of business (LOB) managers. The survey found that, for system administrators, meeting security standards can be a key production roadblock.
Job satisfaction
According to Anaconda’s research, data professionals in research and development organisations report the longest planned tenure with their current employers, followed by those working in an LOB.
In contrast, data professionals working in IT organisations report frustrations in demonstrating their business impacts and only 34pc of those surveyed plan a lengthy tenure with their current employers.
Across all departments, Anaconda said that there is potential for a high rate of employee churn within the first two years of work.
The organisation said: “Given the well-understood talent shortage in this profession and the need for data scientists to develop a strong understanding of the environments in which they work to add value, organisations should identify and invest in high-impact programmes to drive retention among data professionals.”
Entering the sector from college
The survey found that there are gaps between what enterprises are seeking in data scientists and what higher education institutions are teaching students.
Two of the most frequently cited skills gaps among respondents – big data management (38pc of respondents) and engineering skills (26pc) – do not rank in the top 10 skills offered in university programmes.
The top five skills learned by students are Python, machine learning (ML), data visualisation, probability and statistics, and deep learning, while enterprises are lacking big data management, advanced mathematics, deep learning, engineering skills and ML.
The largest share of students surveyed (40pc) believe that the biggest obstacle to obtaining their dream job within the field of data science is experience. A further 26pc believe that the biggest obstacle is technical skills, while 18pc believe it is soft skills. Only 7pc said that finding a job that provides a sense of purpose is an obstacle to obtaining their dream job in data science.
Anaconda suggested that strong internship and practicum programmes could address these gaps and recommended that universities go beyond providing résumé enhancement and hands-on-keyboard technical skills.
Anaconda wrote: “Good internships also prepare students for the nuanced challenges faced by a data professional in an enterprise: serving as a ‘data translator’, demonstrating business impact from their work, and influencing colleagues cross-functionally to address production roadblocks and secure access to resources.”
Concerns within the industry
Anaconda asked respondents to name the biggest problems in artificial intelligence (AI) and ML that need to be tackled urgently.
The top five problems listed were the social impacts from bias in data and models; impacts to individual privacy; advanced information warfare; a reduction in job opportunities caused by automation; and lack of diversity and inclusion in the profession.
Anaconda said: “Important and complex questions of ethics, responsibility and fairness should be on the minds of every data scientist, business leader and academic. There are no simple answers to these questions; rather, their consideration should be a constant thread informing data science work.”
Anaconda recommended that enterprises regard ethics, explainability and fairness as strategic risk vectors and treat them with commensurate attention and care. Despite this, only 15pc of instructors who responded to the survey are teaching AI and ML ethics to students, and only 18pc of students said that they are learning AI and ML ethics.
Only 15pc of respondents said that their organisation has implemented a fairness solution and only 19pc said that they have an explainability solution in place. Of the organisations surveyed, 35pc plan to implement explainability tools, while just 23pc plan to implement fairness tools.
How the future looks
Anaconda said that the journey to maturity is an ongoing process for the data science discipline, and that within the next three years the discipline will continue its trajectory towards becoming a strategic business function across a wider range of industries. The organisation said that continued growing pains are to be expected.
“With the new-found prominence of epidemiology and other data sciences in the wake of the Covid-19 pandemic, and the use of data analysis and visualisation in studies of racial injustice and police violence, the value of data analysis has become clear to a wider audience than ever before,” the Anaconda report said.
“This may continue to raise the profile of the discipline and its importance in a wide range of industries.”
In the conclusion of the report, Anaconda suggested that data scientists could challenge existing security processes through demand for innovative tools and greater use of open source libraries, as developers did in the past. The report recommends that organisations take a proactive approach to support the integration of open source technologies.
The report also recommends that employers look beyond compensation to design holistic talent retention strategies that are focused on helping employees gain experience articulating the value of their work, while providing opportunities to continue to grow their skills.
Anaconda also concluded: “Of all the trends we identified in our study, we find the slow progress to address bias and fairness, and to make machine learning explainable the most concerning. While these two issues are distinct, they are interrelated and pose important questions for society, industry and academia.”