Abstract: Serverless has captured the interest of researchers and practitioners alike and is often considered the next step in the evolution of the cloud. Existing research, however, indicates that it is ill-suited to data analytics due to the limitations of commercial platforms. This has led researchers to design data analytics systems that work around the limitations of serverless platforms, to propose alternative serverless platforms, or to do both. In this paper, we demonstrate that there is a third option: to transparently provide the functionality needed to run off-the-shelf distributed data processing systems on top of existing serverless platforms (e.g., AWS Lambda). We discuss how this can be done and present initial experimental results from the TPC-H benchmark running on unmodified Apache Spark and Apache Drill deployed on AWS Lambda. These results enable research in serverless data analytics that goes beyond patching the shortcomings of existing commercial solutions, and they can serve as the basis for turning serverless into a general-purpose computing platform.
@InProceedings{odas24,
     title = {Off-the-shelf Data Analytics on Serverless},
     author = {Michael Wawrzoniak and Gianluca Moro and Rodrigo Bruno and Ana Klimovic and Gustavo Alonso},
     booktitle = {Conference on Innovative Data Systems Research (CIDR '24)},
     year = {2024}
}