Independent Researcher, USA.
World Journal of Advanced Engineering Technology and Sciences, 2025, 14(01), 241-253
Article DOI: 10.30574/wjaets.2025.14.1.0020
Received on 16 December 2024; revised on 23 January 2025; accepted on 26 January 2025
The combination of Data Engineering and MLOps has become a foundational practice for constructing efficient and secure ML processes. Data Engineering provides the solutions for ingesting, transforming, and storing data, while MLOps provides the solutions for deploying, monitoring, and managing models. Together, these fields address the growing challenges of handling massive amounts of data and of training and deploying ML models for real-time use. This paper discusses the possibilities and trends in integrating data engineering and MLOps, examining the architectural patterns and toolchains most commonly used to optimize machine learning pipelines. Key issues addressed include data management problems arising from tools with limited data-processing functionality; slowdowns or interruptions in automated CI/CD workflows; and data-use licensing, including contested ethical questions of data utilization and data fairness. Non-trivial techniques that enable a scalable and robust application architecture, including pipeline design, service redundancy, and automatic coordination, are discussed along with example applications. Novel approaches to MLOps, namely serverless architectures, federated learning, and AI toolkits for managing pipelines, are presented to illustrate likely future developments. As a synthesis of current literature and best practices in the field of ML, this paper offers practical advice on constructing resilient, high-performing systems. We hope this work contributes to the existing machine learning literature and serves as a best-practice guide for organizations seeking operational effectiveness and advancement in this new era of data-based decision-making.
Data Engineering; MLOps; Machine Learning Pipelines; Scalability and Resilience; AI-driven Automation
Souratn Jain and Jyotipriya Das. Integrating data engineering and MLOps for scalable and resilient machine learning pipelines: frameworks, challenges, and future trends. World Journal of Advanced Engineering Technology and Sciences, 2025, 14(01), 241-253. Article DOI: https://doi.org/10.30574/wjaets.2025.14.1.0020.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.