Skip to main content

The Rise of AI Will Actually Add to the Data Scientist's Plate

Sijie Guo
CEO
StreamNative

The data scientist was coined "the sexiest job of the 21st century" not that long ago. Harvard Business Review reported in 2012 that, "capitalizing on big data depends on hiring scarce data scientist." Fast forward to 2024, and we're in the era of generative Artificial Intelligence (AI) and large language models (LLMs) where one might assume that the role of data scientists would simplify or even diminish. Yet, the reality is quite the opposite. As AI becomes more prevalent across all industries, it's expanding the scope and responsibilities of data scientists, particularly in terms of building and managing real-time AI infrastructure.

Traditionally, data scientists focused primarily on analyzing existing datasets, deriving insights, and building predictive models. This included a unique skill set of communicating those findings to leaders within the organization and identifying strategic business recommendations based on their findings. Their toolbox typically included programming languages like Python and R, along with various statistical and machine learning (ML) techniques. The rise of AI is dramatically reshaping this landscape.

Today's data scientists are increasingly required to step beyond their traditional analytical roles. They're now tasked with designing and implementing the very infrastructure that powers AI systems. This shift is driven by the need for real-time data processing and analysis, which is critical for many AI applications.

Real-Time AI Infrastructure: A New Challenge

The demand for real-time AI capabilities is pushing data scientists to develop and manage infrastructure that can handle massive volumes of data in motion. This includes streaming data pipelines, edge computing, scalable cloud architecture, and data quality and governance. These new responsibilities require data scientists to expand their skill sets significantly; They now need to be well-versed in cloud technologies, distributed systems, and data engineering principles.

Organizations are increasingly recognizing the competitive advantage that real-time AI can provide. This is resulting in pressure on data science teams to deliver insights and predictions at unprecedented speeds. The ability to make split-second decisions based on current data is becoming crucial in many industries, from finance and healthcare to retail and manufacturing.

This shift towards real-time AI is not just about speed; it's about relevance and accuracy. By processing data as it's generated, organizations can respond to changes in their environment more quickly and make more informed decisions.

As data scientists take on these new challenges, they're no longer siloed in analytics departments, but instead are becoming integral to various aspects of business operations. This expansion of responsibilities includes:

1. Collaboration with IT and DevOps: Working closely with infrastructure teams to ensure AI systems are robust, scalable, and integrated with existing IT ecosystems.

2. Product Development: Embedding AI capabilities directly into products and services, requiring data scientists to work alongside product teams.

3. Ethical Considerations: Addressing the ethical implications of AI systems, including bias detection and mitigation in real-time environments.

The Emergence of DataOps Engineers

As the complexity of data ecosystems grows, a role has emerged to support data scientists: the DataOps Engineer. This role parallels the DevOps evolution in software development, focusing on creating and maintaining the infrastructure necessary for efficient data operations. DataOps Engineers bridge the gap between data engineering and data science, ensuring that data pipelines are robust, scalable, and capable of supporting advanced AI and analytics initiatives. Their emergence is a direct response to the increasing demands placed on data infrastructure by AI applications.

The rise of DataOps has significant implications for data scientists. In large enterprises, organizations with the resources to employ dedicated DataOps teams can significantly streamline their data pipelines. This allows data scientists to focus more on developing advanced models and extracting actionable insights, rather than getting bogged down in infrastructure management. Smaller companies, which may not have the budget for dedicated DataOps teams, often require data scientists to take on dual roles. This can naturally lead to bottlenecks, with data scientists dividing their time between infrastructure management and actual data analysis.

As a result of these changes, data scientists are now expected to have a broader skill set that includes proficiency in cloud infrastructure (AWS, Azure, GCP), an understanding of modern analytics tools, familiarity with data pipeline tools like Apache Spark and Hadoop, and knowledge of containerization and orchestration platforms like Kubernetes. While not all data scientists need to be experts in these areas, a basic understanding is becoming increasingly important for effective collaboration with DataOps teams and for navigating complex data ecosystems.

The Opportunity Ahead for Data Scientists

While AI is undoubtedly making certain aspects of data analysis more efficient, it's simultaneously expanding the role of data scientists in profound ways. The rise of AI is adding complexity to the data scientist's plate, requiring them to become architects of real-time AI infrastructure in addition to their traditional analytical roles.

This evolution presents both challenges and opportunities. Data scientists who can successfully navigate this changing landscape will be invaluable to their organizations, driving innovation and competitive advantage in the AI-driven future. The rise of AI isn't simplifying the role of data scientists — it's elevating it to new heights of importance and complexity, while also fostering the growth of supporting roles and teams.

Sijie Guo is CEO at StreamNative

Hot Topics

The Latest

In MEAN TIME TO INSIGHT Episode 24, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses network observability tool sprawl ... 

In cloud-native systems, scaling is often as simple as moving a slider. For on-premise databases, the stakes are different. Over-provisioning hardware is expensive. Under-provisioning leads to performance bottlenecks that are difficult to fix once the equipment is in the rack ...

When most people think about cybersecurity, they picture firewalls, encryption, and access controls — technical tools designed to protect systems and data. But beneath the technology lies a deeper set of principles about trust, decision-making, and resilience ... The best leaders don't eliminate risk. They manage it intelligently. And in many ways, cybersecurity offers a surprisingly useful playbook for doing exactly that ...

Many organizations assumed their infrastructure strategy was settled. It had been implemented, optimized and built into long-term plans. Recent changes in technology and vendor consolidation are forcing a second look. Cloud outages and licensing changes have exposed how much dependency exists on a small number of platforms. As a result, organizations are reevaluating whether those decisions still hold up under current conditions ...

Edge AI is strategically embedded in core IT and infrastructure spending across industries, according to the 2026 Edge AI Survey from ZEDEDA. The research shows that 83% of C-suite and IT executive respondents say edge AI is important to their core business strategy ...

As AI adoption accelerates, operational complexity — not model intelligence — is becoming the primary barrier to reliable AI at scale, according to the State of AI Engineering 2026 from Datadog ... The report highlights a compounding complexity challenge as AI systems scale ... Around 5% of AI model requests fail in production, with nearly 60% of those failures caused by capacity limits ...

For years, production operations teams have treated alert fatigue as a quality-of-life problem: something that makes on-call rotations miserable but isn't considered a direct contributor to outages. That framing doesn't capture how these systems fail, and we now have data to show why. More importantly, it's now clear alert fatigue is a symptom of a deeper issue: production systems have outgrown the current operational approaches ...

I was on a customer call last fall when an enterprise architect said something I haven't been able to shake. Her team had just spent four months trying to swap one AI vendor for another. The original plan said three weeks. "We didn't switch vendors," she told me. "We rebuilt half our integrations and discovered what we'd actually been depending on." Most enterprise leaders don't expect that to be the experience ...

Ask any senior SRE or platform engineer what keeps them up at night, and the answer probably isn't the monitoring tool — it's the data feeding it. The proliferation of APM, observability, and AIOps platforms has created a telemetry sprawl problem that most teams manage reactively rather than architect proactively. Metrics are going to one platform. Traces routed somewhere else. Logs duplicated across multiple backends because nobody wants to be caught without them when something breaks. Every redundant stream costs money ...

80% of respondents agree that the IT role is shifting from operators to orchestrators, according to the 2026 IT Trends Report: The Human Side of Autonomous IT from SolarWinds ...

The Rise of AI Will Actually Add to the Data Scientist's Plate

Sijie Guo
CEO
StreamNative

The data scientist was coined "the sexiest job of the 21st century" not that long ago. Harvard Business Review reported in 2012 that, "capitalizing on big data depends on hiring scarce data scientist." Fast forward to 2024, and we're in the era of generative Artificial Intelligence (AI) and large language models (LLMs) where one might assume that the role of data scientists would simplify or even diminish. Yet, the reality is quite the opposite. As AI becomes more prevalent across all industries, it's expanding the scope and responsibilities of data scientists, particularly in terms of building and managing real-time AI infrastructure.

Traditionally, data scientists focused primarily on analyzing existing datasets, deriving insights, and building predictive models. This included a unique skill set of communicating those findings to leaders within the organization and identifying strategic business recommendations based on their findings. Their toolbox typically included programming languages like Python and R, along with various statistical and machine learning (ML) techniques. The rise of AI is dramatically reshaping this landscape.

Today's data scientists are increasingly required to step beyond their traditional analytical roles. They're now tasked with designing and implementing the very infrastructure that powers AI systems. This shift is driven by the need for real-time data processing and analysis, which is critical for many AI applications.

Real-Time AI Infrastructure: A New Challenge

The demand for real-time AI capabilities is pushing data scientists to develop and manage infrastructure that can handle massive volumes of data in motion. This includes streaming data pipelines, edge computing, scalable cloud architecture, and data quality and governance. These new responsibilities require data scientists to expand their skill sets significantly; They now need to be well-versed in cloud technologies, distributed systems, and data engineering principles.

Organizations are increasingly recognizing the competitive advantage that real-time AI can provide. This is resulting in pressure on data science teams to deliver insights and predictions at unprecedented speeds. The ability to make split-second decisions based on current data is becoming crucial in many industries, from finance and healthcare to retail and manufacturing.

This shift towards real-time AI is not just about speed; it's about relevance and accuracy. By processing data as it's generated, organizations can respond to changes in their environment more quickly and make more informed decisions.

As data scientists take on these new challenges, they're no longer siloed in analytics departments, but instead are becoming integral to various aspects of business operations. This expansion of responsibilities includes:

1. Collaboration with IT and DevOps: Working closely with infrastructure teams to ensure AI systems are robust, scalable, and integrated with existing IT ecosystems.

2. Product Development: Embedding AI capabilities directly into products and services, requiring data scientists to work alongside product teams.

3. Ethical Considerations: Addressing the ethical implications of AI systems, including bias detection and mitigation in real-time environments.

The Emergence of DataOps Engineers

As the complexity of data ecosystems grows, a role has emerged to support data scientists: the DataOps Engineer. This role parallels the DevOps evolution in software development, focusing on creating and maintaining the infrastructure necessary for efficient data operations. DataOps Engineers bridge the gap between data engineering and data science, ensuring that data pipelines are robust, scalable, and capable of supporting advanced AI and analytics initiatives. Their emergence is a direct response to the increasing demands placed on data infrastructure by AI applications.

The rise of DataOps has significant implications for data scientists. In large enterprises, organizations with the resources to employ dedicated DataOps teams can significantly streamline their data pipelines. This allows data scientists to focus more on developing advanced models and extracting actionable insights, rather than getting bogged down in infrastructure management. Smaller companies, which may not have the budget for dedicated DataOps teams, often require data scientists to take on dual roles. This can naturally lead to bottlenecks, with data scientists dividing their time between infrastructure management and actual data analysis.

As a result of these changes, data scientists are now expected to have a broader skill set that includes proficiency in cloud infrastructure (AWS, Azure, GCP), an understanding of modern analytics tools, familiarity with data pipeline tools like Apache Spark and Hadoop, and knowledge of containerization and orchestration platforms like Kubernetes. While not all data scientists need to be experts in these areas, a basic understanding is becoming increasingly important for effective collaboration with DataOps teams and for navigating complex data ecosystems.

The Opportunity Ahead for Data Scientists

While AI is undoubtedly making certain aspects of data analysis more efficient, it's simultaneously expanding the role of data scientists in profound ways. The rise of AI is adding complexity to the data scientist's plate, requiring them to become architects of real-time AI infrastructure in addition to their traditional analytical roles.

This evolution presents both challenges and opportunities. Data scientists who can successfully navigate this changing landscape will be invaluable to their organizations, driving innovation and competitive advantage in the AI-driven future. The rise of AI isn't simplifying the role of data scientists — it's elevating it to new heights of importance and complexity, while also fostering the growth of supporting roles and teams.

Sijie Guo is CEO at StreamNative

Hot Topics

The Latest

In MEAN TIME TO INSIGHT Episode 24, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses network observability tool sprawl ... 

In cloud-native systems, scaling is often as simple as moving a slider. For on-premise databases, the stakes are different. Over-provisioning hardware is expensive. Under-provisioning leads to performance bottlenecks that are difficult to fix once the equipment is in the rack ...

When most people think about cybersecurity, they picture firewalls, encryption, and access controls — technical tools designed to protect systems and data. But beneath the technology lies a deeper set of principles about trust, decision-making, and resilience ... The best leaders don't eliminate risk. They manage it intelligently. And in many ways, cybersecurity offers a surprisingly useful playbook for doing exactly that ...

Many organizations assumed their infrastructure strategy was settled. It had been implemented, optimized and built into long-term plans. Recent changes in technology and vendor consolidation are forcing a second look. Cloud outages and licensing changes have exposed how much dependency exists on a small number of platforms. As a result, organizations are reevaluating whether those decisions still hold up under current conditions ...

Edge AI is strategically embedded in core IT and infrastructure spending across industries, according to the 2026 Edge AI Survey from ZEDEDA. The research shows that 83% of C-suite and IT executive respondents say edge AI is important to their core business strategy ...

As AI adoption accelerates, operational complexity — not model intelligence — is becoming the primary barrier to reliable AI at scale, according to the State of AI Engineering 2026 from Datadog ... The report highlights a compounding complexity challenge as AI systems scale ... Around 5% of AI model requests fail in production, with nearly 60% of those failures caused by capacity limits ...

For years, production operations teams have treated alert fatigue as a quality-of-life problem: something that makes on-call rotations miserable but isn't considered a direct contributor to outages. That framing doesn't capture how these systems fail, and we now have data to show why. More importantly, it's now clear alert fatigue is a symptom of a deeper issue: production systems have outgrown the current operational approaches ...

I was on a customer call last fall when an enterprise architect said something I haven't been able to shake. Her team had just spent four months trying to swap one AI vendor for another. The original plan said three weeks. "We didn't switch vendors," she told me. "We rebuilt half our integrations and discovered what we'd actually been depending on." Most enterprise leaders don't expect that to be the experience ...

Ask any senior SRE or platform engineer what keeps them up at night, and the answer probably isn't the monitoring tool — it's the data feeding it. The proliferation of APM, observability, and AIOps platforms has created a telemetry sprawl problem that most teams manage reactively rather than architect proactively. Metrics are going to one platform. Traces routed somewhere else. Logs duplicated across multiple backends because nobody wants to be caught without them when something breaks. Every redundant stream costs money ...

80% of respondents agree that the IT role is shifting from operators to orchestrators, according to the 2026 IT Trends Report: The Human Side of Autonomous IT from SolarWinds ...