Resources

How was the data generated? What are the original sources?

Our data reports median salaries for people working in the data science field in 2025. It was reported by 6,165 individuals working in artificial intelligence, machine learning, or data science, across the globe and at all levels of experience. Respondents submitted their full-time equivalent gross base salary: the amount before deductions such as social security and taxes, and excluding additional compensation such as bonuses, equity, stock, or other benefits, which keeps the reports consistent. Part-time salaries were extrapolated to their full-time equivalent value. All compensation figures were converted to US dollars, using the exchange rate at the time of submission, so that salaries from different regions can be compared.

The dataset was collected by ai-jobs.net, a platform dedicated to tracking salaries and job trends in the data and AI industries. Data collection occurs in two primary ways: individuals submit their salary information anonymously through the platform’s survey, and the platform aggregates salary data from job listings with publicly available compensation figures. The original sources are therefore self-reported salaries from anonymous individuals, together with published job listings. The reporting period for this dataset covers January to June 2025.

Job titles in the dataset range from entry-level data analysts to senior machine learning engineers and data science managers, allowing a comprehensive view across career stages. Respondents come from various industries, including technology, healthcare, finance, and academia, offering a broad perspective on how data science compensation compares across sectors.
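The normalization described above (full-time equivalent, conversion to USD, then a median) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the source does not say how ai-jobs.net extrapolates part-time salaries, so linear scaling by weekly hours is a guess, and the exchange rates and submissions are made-up values.

```python
from statistics import median

# Hypothetical exchange rates to USD (illustrative only; the dataset is
# described as using the rate in effect at the time of each submission).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.125, "GBP": 1.25}


def full_time_equivalent(salary, hours_per_week, full_time_hours=40):
    """Scale a salary linearly to a full-time schedule (assumed method)."""
    return salary * full_time_hours / hours_per_week


def to_usd(amount, currency):
    """Convert an amount to USD using the hypothetical rate table."""
    return amount * RATES_TO_USD[currency]


# Made-up submissions: (gross base salary, currency, hours worked per week)
submissions = [
    (100_000, "USD", 40),
    (40_000, "EUR", 20),  # part-time, so it is scaled up before conversion
    (60_000, "GBP", 40),
]

normalized = [
    to_usd(full_time_equivalent(salary, hours), currency)
    for salary, currency, hours in submissions
]
median_salary = median(normalized)
```

The median is used here, rather than the mean, because it is less sensitive to a few very high or very low self-reported salaries.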

Who funded the creation of the dataset? 

The Data Scientist Job Salaries data was aggregated by ai-jobs.net, a platform that hosts data science and AI job listings. The raw data was collected, compiled, and shared as part of the organization’s mission to inform the industry. The platform collects this information in two ways: anonymously from individuals all over the world via a survey, and by gathering data from job listings with open salaries. This kind of crowdsourced data is valuable for spotting broad patterns, such as the types of companies paying higher salaries, average compensation by region, or how salaries trend with years of experience. However, it is important to acknowledge its limitations. The website does not mention any formal vetting process to verify the accuracy of salary survey responses, so self-reported figures can introduce inconsistencies. Differences in job titles, company size, industry, and cost of living across regions can create further inconsistencies. For example, someone reporting a ‘Data Scientist’ salary in San Francisco might have a significantly different scope of responsibilities or benefits package than someone with the same title in a different country, making direct comparisons challenging.

What information is left out of the spreadsheet?

Qualitative factors such as job satisfaction, organizational culture, diversity, and individual career motivations, which are crucial for a comprehensive understanding of career experiences, are left out. Additional compensation such as bonuses or stock options, as well as specific industry or company contexts, is also largely missing. The way these datasets categorize information creates an ideological effect by framing careers mainly in terms of economic value and market competition. Evaluating careers solely through numbers like salary and experience risks overlooking important qualitative aspects such as personal growth potential and job fulfillment. If these datasets were the only sources of information, our understanding of data science careers would be severely limited. Focusing only on salary and experience makes it difficult to grasp the complex realities and varied personal experiences within the field. Therefore, interpreting such data requires careful consideration of the qualitative context behind the numbers. Additionally, it’s important to recognize that reducing careers to salary data alone can shape how we view success in the field, pushing people to prioritize high-paying roles over positions that might offer better alignment with their interests, values, or long-term goals. For example, someone might choose a role in a smaller research-focused team with lower pay but greater opportunities for creativity and learning, which a dataset focused only on salaries would mark as “less desirable.” For a more complete and ethical understanding of data science careers, we need to interpret salary datasets as just one piece of a much larger puzzle that includes qualitative and personal dimensions of work.

Annotated Bibliography

This publicly available dataset compiles entries from data science professionals around the world from 2020 to 2025. Built from anonymous, self-reported salary submissions, it is a transparent reference for current compensation trends in the tech industry. Every entry includes variables such as work year, experience level, employment type, job title, employee residence, company location, company size, salary in the individual’s currency, and salary converted to USD. It provides a large amount of data on our research topic, and because it covers employees worldwide, it is uniquely suited to analyzing pay disparities and mobility patterns in data science careers. This data is the backbone of our research. It allowed us to test multiple variables against salary levels, which links directly to our thesis.
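As a minimal sketch of how variables like these can be compared against salary, the Python below groups a few made-up records by one variable and takes the median salary in each group. The field names (`experience_level`, `company_size`, `salary_in_usd`) are assumptions based on the variables listed above, not confirmed column names, and the records are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Made-up records; field names are assumed, not taken from the real file.
rows = [
    {"experience_level": "entry", "company_size": "S", "salary_in_usd": 60_000},
    {"experience_level": "entry", "company_size": "L", "salary_in_usd": 80_000},
    {"experience_level": "senior", "company_size": "M", "salary_in_usd": 150_000},
    {"experience_level": "senior", "company_size": "L", "salary_in_usd": 170_000},
    {"experience_level": "senior", "company_size": "L", "salary_in_usd": 190_000},
]


def median_by(records, key, value="salary_in_usd"):
    """Return the median of `value` for each distinct value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {group: median(values) for group, values in groups.items()}


medians_by_level = median_by(rows, "experience_level")
medians_by_size = median_by(rows, "company_size")
```

The same helper can be pointed at any other categorical variable, such as employment type or company location, to compare median pay across groups.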

Anderson and Rainie compile expert forecasts about the long-term effects of digital technologies on democratic institutions. The resource presents qualitative survey data from hundreds of technologists, scholars, and futurists who warn that increasing reliance on AI, automation, and surveillance may widen inequality, erode civic trust, and concentrate power in unaccountable systems. Though not peer-reviewed, the report is significant due to the breadth of expertise and topical urgency. The work helps frame my thesis by drawing a direct connection between technological shifts and democratic backsliding, reinforcing the idea that inequality is not only economic but institutional and informational.

  • Bown, Jonathan. “Economics of Data Careers,” Kaggle, 2020. https://www.kaggle.com/code/jonbown/economics-of-data-careers.

Jonathan Bown analyzes the dataset from ai-jobs.net, focusing on compensation patterns across job titles, experience levels, company sizes, and geographical locations. Bown argues that higher salaries are attributable to seniority and specialization, and that certain roles are consistent top earners because of their strategic value and technical complexity. He uses data visualizations such as bar graphs, scatterplots, and matrices to depict how these variables interact with salary outcomes. This resource is vital because it gives us a benchmark for how industry professionals interpret real-world compensation trends. Bown himself is a machine learning engineer, and his analysis supports our claim that salary differences reflect hierarchies based on employment structure.

  • Chen, Jiayong, Zhengbin Song, and Ching Hoi Lam. “Unraveling the Trajectory of Data Science Salaries in the United States: A Comprehensive Analysis from 2020 to 2023 with Future Salary Projections,” Guangdong University of Finance, Management of Information Systems Department, 2023.

This article takes a deeper look at how salaries in data science changed between 2020 and 2023 and offers predictions for the next few years. It argues that salary differences are driven mainly by job title and experience level, and that these differences have been growing. Chen, Song, and Lam provide breakdowns of each role and point out that higher-level roles not only earn more but also experience faster salary growth. They back up their findings with a linear regression model, a bar graph, and a histogram. This source is pivotal for understanding broader trends across data science jobs, especially since salaries for some job titles are growing faster than others. For our project, it gives concrete evidence that salary varies with workplace hierarchy and shows how that hierarchical structure is changing as the field grows.

  • Corak, Miles. “‘Inequality is the Root of Social Evil,’ or Maybe Not?: Two Stories about Inequality and Public Policy,” Canadian Public Policy / Analyse de Politiques 42, no. 4 (2016): 367–414.

Corak explores two dominant narratives about inequality—one framing it as a fundamental threat to social cohesion and opportunity, the other treating it as a secondary issue if absolute poverty is reduced and growth maintained. He analyzes how these narratives shape public policy debates and attitudes toward redistribution, social mobility, and labor market reforms. Corak uses empirical data on income distribution, intergenerational mobility, and education access across countries to highlight how inequality can either reinforce disadvantage or be seen as a byproduct of a dynamic economy. The article is important for distinguishing between moral and utilitarian policy justifications, allowing a more precise critique of policy intent and outcome. For my thesis on structural inequality and social mobility, Corak offers a clear map of ideological divides in policy thinking and underscores how the framing of inequality directly affects proposed interventions.

  • De Fraja, Gianni, Jesse Matheson, Paul Mizen, James Rockey, Shivani Taneja, and Gregory Thwaites. “Remote Work and Compensation Inequality,” SSRN, 2024. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4962603.

The paper argues that while remote work (RW) is a valuable benefit highly concentrated among higher earners, it does not increase overall compensation inequality due to general equilibrium effects, including wage adjustments that offset the in-kind RW benefits. It uses novel UK survey data (SWAA-UK) and Labour Force Survey wage data, alongside a conceptual general equilibrium model, to analyze wage growth and inequality before and after the pandemic. The resource is important because it challenges the commonly held view that RW exacerbates labor market inequality, providing a nuanced approach that includes workers’ valuation of RW and complementary labor market effects. For our thesis specifically, it offers empirical evidence and theoretical modeling to show that RW’s rise leads to higher total compensation without increasing inequality, highlighting the interplay between in-kind benefits and wage adjustments in shaping labor market outcomes.

  • Einav, Liran, and Jonathan Levin. “The Data Revolution and Economic Analysis,” 2014. www.journals.uchicago.edu/doi/full/10.1086/674019.

The resource argues that big data will fundamentally transform economic research and policy by enabling more precise measurement, new research designs, and personalized policy instruments. It uses examples from private sector practices, academic studies, and technological advancements to illustrate the potential and challenges of harnessing large datasets. This resource is important because it provides a comprehensive overview of how the data revolution is shaping economics, highlighting both opportunities and obstacles. For our thesis specifically, the resource offers insights into the evolving methodologies and data management strategies necessary for modern economic analysis, illustrating how big data can lead to more effective and targeted policy interventions.

  • Landon-Murray, Michael. “Big Data and Intelligence: Applications, Human Capital, and Education,” Journal of Strategic Security 9, no. 2 (2016): 92–121. http://www.jstor.org/stable/26466778.

The resource argues that data science has become a significant tool in the United States federal intelligence sector. The author suggests that such applications could extend beyond traditional topics into broader areas, which raises the demand for well-trained data scientists. The article references various studies and expert opinions, including the McKinsey Global Institute’s observation of a shortage of individuals skilled in big data analytics, most likely due to its evolving application across different industries. The findings reveal the critical role that data science plays and highlight the need for a specialized workforce to navigate the challenges of the modern technological world. Although demand for data science careers is high, salaries may be low in some industries, since the private sector draws talent away and makes it harder for other employers to compete on pay.

  • Lewis, Alfred. “CAREER OUTLOOK: TECH CAREERS,” Hispanic Engineer and Information Technology 33, no. 2 (2018): 27–35. https://www.jstor.org/stable/26573744.

The resource argues that technology careers are critical because of the increasing importance of data security and the evolution of technology. The article points to many sources, including Forbes and Business Insider, to showcase the most in-demand tech careers and their average salaries. Its findings offer insight for job seekers by showing that tech jobs have been spreading across industries, leading to job growth and competitive salaries, which paints an attractive picture for recent graduates. The resource explains how the growing demand for data science roles is raising both salaries and the number of jobs, reshaping the job market and salary structures.

  • McCoy, Frank. “Career Outlook: Spotlight on Information Technology,” Hispanic Engineer and Information Technology 30, no. 1 (2015): 31–44. http://www.jstor.org/stable/43757403.

This resource argues that there is a significant need for skilled professionals in IT jobs due to rising digital dependence across multiple sectors. The author draws on industry leaders and research firms to provide insight into salary ranges for IT roles, aiming to attract STEM graduates to data careers by pointing to the financial benefits of this growing industry. These findings underscore the critical role of IT professionals and present a great opportunity for career seekers. The author emphasizes that as technology continues to advance, so will the demand for these skilled workers, which makes IT a favorable career choice.

  • Parsa, Ali, and Amin Sadr. “An Exploration of Data Scientist Salaries,” Asian Review of Computer and Data Analysis 8, no. 3 (2022): 15–29. https://www.akademiabaru.com/submit/index.php/arca/article/download/5282/4110/26

This source examines how salaries vary among data scientists depending on their employment type: full-time, part-time, freelance, or contract. Parsa and Sadr argue that compensation in these jobs is tied not only to job responsibilities and skills but also to employment structure. The authors use salary data collected from different public databases, as well as Kaggle, and compare them using statistical models to reveal any disparities. This source sheds light on the role that employment type plays in reinforcing workplace hierarchies in the data science field. It also shows how differences in employment status translate into pay gaps, which is important for understanding structural inequality and shifting norms within the digital workforce.

  • Quan, Tee Zhen, and Mafas Raheem. “Human Resource Analytics on Data Science Employment Based on Specialized Skill Sets with Salary Prediction,” International Journal of Data Science, May 20, 2023. ijods.org/index.php/ds/article/view/64.

The resource argues that specialized technical and soft skills significantly influence employment opportunities and salary levels in the data science field. It uses empirical data from job listings, statistical analyses, and salary prediction models as evidence to support its claims. This resource is important because it provides detailed insights into the key skills that drive higher salaries and better job prospects, which are essential for understanding workforce demands in data science. For our thesis specifically, it offers a framework for analyzing how specific skills impact salary levels and employment trends, enabling the development of targeted HR analytics and predictive models in the data science domain.

  • Sommer, Teresa Eckrich, et al. “A Two-Generation Human Capital Approach to Anti-Poverty Policy,” RSF: The Russell Sage Foundation Journal of the Social Sciences 4, no. 3 (2018): 118–143.

Sommer and colleagues propose a policy model addressing poverty through simultaneous investment in parents and children. Drawing from longitudinal program evaluations and policy pilot data, the authors show how integrated services—like early childhood education, job training, and mental health support—can disrupt intergenerational poverty cycles. Their focus is not just on individual behavior but on structural limitations that inhibit mobility. This article matters for my thesis by offering a framework that targets root causes of persistent inequality through system-wide coordination. It reinforces the argument that solutions must operate at both familial and institutional levels, and not treat economic hardship in isolation. Sommer et al. argue that traditional anti-poverty programs often fall short because they treat parents and children as separate policy targets, ignoring the compounded effects of disadvantage across generations. By designing policies that integrate early childhood education with workforce development, parenting support, and health services, they demonstrate that dual investment creates measurable gains in both child development outcomes and parental economic stability. The authors present data from the CareerAdvance program in Tulsa, Oklahoma, showing improvements in parental employment and children’s school readiness, suggesting the model’s scalability. They also stress the importance of program delivery through trusted institutions and community partnerships to build long-term engagement. This work is crucial for understanding how structural inequality embeds itself across family units, and it challenges narrow, individualistic policy approaches. For my thesis, this article provides a practical example of systemic intervention and reinforces the claim that effective solutions must simultaneously address institutional barriers and household-level constraints.

  • Wu, Liman, and Waruni Hewage. “Investigating Equity in Remote Salaries in the Data Science Field Using Data Analysis Techniques,” Otago Polytechnic Auckland International Campus, 2024.

Wu and Hewage explore the effect that remote data science roles have on salary, focusing specifically on how experience level and employment type affect pay. They argue that while remote work has provided more flexibility within the field, it has not necessarily resulted in fairer compensation. They use data gathered from remote job listings and professional salary surveys, and break it down by the employee’s years of experience, the type of contract, and the role itself. They compare salaries across these groups through visualizations such as bar charts and scatterplots. This source challenges the idea that remote work results in a more equitable workplace and shows how these hierarchies still persist. It also supports the claim that the shift to remote work has made salary inequalities more difficult to identify.