Push versus Pull Telemetry at Cloud Scale: Performance Benchmarking in Kubernetes-Based OpenTelemetry Environments

Authors

  • Akhil Reddy, Independent Researcher, Indonesia

DOI:

https://doi.org/10.56127/ijst.v2i1.2795

Keywords:

Cloud Observability, Kubernetes, OpenTelemetry, Prometheus, Telemetry Systems, Push Architecture, Pull Architecture, Cloud Benchmarking, AI/ML Infrastructure, Distributed Systems

Abstract

In cloud-native environments, telemetry pipelines are increasingly vital for maintaining visibility, distributed traceability, automated monitoring, and intelligent incident response across large-scale containerized systems. Most observability ecosystems today are either push-based or pull-based. In push-based architectures, agents and exporters actively publish metrics and traces to central collectors; in pull-based architectures, monitoring systems actively poll observability endpoints exposed by services and infrastructure nodes. Although both paradigms are widely adopted in industry, few reproducible empirical comparisons assess their performance characteristics under operational cloud conditions. This paper presents a comprehensive benchmark study of the performance differences between push-based and pull-based telemetry architectures, with an emphasis on push-based telemetry, in Kubernetes cluster environments equipped with OpenTelemetry collectors, a Prometheus-based monitoring system, and synthetic workload generators that simulate production-like workloads. The study assesses end-to-end latency distributions, telemetry throughput, operational overhead, cost per event, system behavior under sustained workloads, and recovery behavior after induced failures. Experimental evidence shows that the push architecture can be more adaptable under burst-intensive loads because of its flexible buffering mechanisms and asynchronous event delivery, whereas the pull architecture offers more consistent query results, easier operational visibility, and more predictable monitoring semantics under steady load.
This paper also presents a hybrid telemetry architecture designed for AI/machine learning infrastructure environments, where burst resilience and scalable query semantics are required at the same time. The proposed framework combines push-side buffering and ingestion flexibility with pull-side observability and analytics. The study contributes a reproducible benchmark methodology and an experimentally validated hybrid telemetry design that can enhance the observability performance of modern distributed cloud-native infrastructure.
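
The contrast between the two paradigms described above can be illustrated with a small in-process simulation. This is only a minimal sketch under stated assumptions, not the paper's benchmark harness: the class names `PushExporter` and `PullTarget`, the function `simulate`, and all buffer, batch, and scrape-interval values are hypothetical and chosen for illustration. It does not use the OpenTelemetry or Prometheus APIs.

```python
from collections import deque


class PushExporter:
    """Toy push-side agent (hypothetical): buffers events, flushes in batches."""

    def __init__(self, collector, max_buffer=50, batch_size=20):
        self.collector = collector               # stands in for a central collector
        self.buffer = deque(maxlen=max_buffer)   # bounded buffer absorbs bursts
        self.batch_size = batch_size
        self.dropped = 0

    def emit(self, event):
        # When the buffer is full, the deque evicts the oldest event.
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1
        self.buffer.append(event)

    def flush(self):
        # Deliver at most one batch; return how many events were sent.
        sent = 0
        while self.buffer and sent < self.batch_size:
            self.collector.append(self.buffer.popleft())
            sent += 1
        return sent


class PullTarget:
    """Toy pull-side endpoint (hypothetical): exposes a cumulative counter."""

    def __init__(self):
        self.events_total = 0

    def record(self):
        self.events_total += 1

    def scrape(self):
        # A scraper sees only the current counter value, not individual events.
        return self.events_total


def simulate(burst_sizes, scrape_every=2):
    """Feed the same per-tick workload to both models and compare what arrives."""
    collector = []
    exporter = PushExporter(collector)
    target = PullTarget()
    samples = []
    for tick, n in enumerate(burst_sizes):
        for i in range(n):
            exporter.emit((tick, i))
            target.record()
        exporter.flush()                          # one batch per tick
        if tick % scrape_every == 0:
            samples.append((tick, target.scrape()))
    while exporter.flush():                       # drain what is left
        pass
    return len(collector), exporter.dropped, samples


if __name__ == "__main__":
    delivered, dropped, samples = simulate([5, 60, 5, 0])
    print("push delivered:", delivered, "dropped:", dropped)
    print("pull samples (tick, events_total):", samples)
```

In this toy model the push side retains per-event detail and smooths a burst through its buffer, at the cost of drops once the buffer overflows, while the pull side never drops its cumulative count but observes it only at scrape times. A hybrid design of the kind the abstract proposes would combine both behaviors.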

Published

2023-04-27

How to Cite

Reddy, A. (2023). Push versus Pull Telemetry at Cloud Scale: Performance Benchmarking in Kubernetes-Based OpenTelemetry Environments. International Journal Science and Technology, 2(1), 95–107. https://doi.org/10.56127/ijst.v2i1.2795