Efficient TinyML Architectures for On-Device Small Language Models: Privacy-Preserving Inference at the Edge

Authors

  • Mangesh Pujari, Independent Researcher
  • Anil Kumar Pakina, Independent Researcher
  • Anshul Goel, Independent Researcher

DOI:

https://doi.org/10.56127/ijst.v3i3.1958

Keywords:

TinyML, Edge AI, Small Language Models, On-Device Inference, Privacy-Preserving AI, Neural Network Compression, Quantization, Microcontrollers

Abstract

Deploying small language models (SLMs) on ultra-low-power edge devices requires careful optimization to meet strict memory, latency, and energy constraints while preserving privacy. This paper presents a systematic approach to adapting SLMs for TinyML, focusing on model compression, hardware-aware quantization, and lightweight privacy mechanisms. We introduce a sparse ternary quantization technique that reduces model size by 5.8× with minimal accuracy loss, together with an efficient federated fine-tuning method for edge deployment. To address privacy concerns, we implement on-device differential noise injection during text preprocessing, adding negligible computational overhead. Evaluations on constrained devices (Cortex-M7 and ESP32) show that our optimized models achieve 92% of the accuracy of full-precision baselines while operating within 256 KB of RAM and reducing inference latency by 4.3×. The proposed techniques enable new applications for SLMs in always-on edge scenarios where both efficiency and data protection are critical.
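Two of the abstract's mechanisms can be made concrete with short sketches. The first is a minimal, hypothetical rendering of sparse ternary quantization; the paper's exact algorithm is not reproduced here, so this follows the common threshold-based recipe in which the smallest-magnitude weights are pruned to zero and the survivors collapse to ±1 under a single per-tensor scale. The function names and the sparsity parameter are illustrative, not taken from the paper.

    import numpy as np

    def sparse_ternary_quantize(weights, sparsity=0.5):
        # Hypothetical sketch, not the paper's published algorithm.
        # Prune the smallest-magnitude weights to reach the target sparsity.
        threshold = np.quantile(np.abs(weights), sparsity)
        mask = np.abs(weights) > threshold
        # Survivors collapse to {-1, +1}; one scale preserves mean magnitude.
        scale = float(np.abs(weights[mask]).mean()) if mask.any() else 0.0
        ternary = (np.sign(weights) * mask).astype(np.int8)
        return ternary, np.float32(scale)

    def dequantize(ternary, scale):
        # Approximate reconstruction, e.g. for measuring accuracy loss.
        return ternary.astype(np.float32) * scale

    # Example: quantize a random 256x256 layer and inspect the error.
    w = np.random.randn(256, 256).astype(np.float32)
    t, s = sparse_ternary_quantize(w, sparsity=0.5)
    err = np.abs(w - dequantize(t, s)).mean()
    print(f"zeros={np.mean(t == 0):.2f}, mean abs error={err:.4f}")

Note that an int8 container alone yields only 4× over float32; a figure like the reported 5.8× implies a denser storage format, for example 2-bit packing of the ternary values combined with sparse encoding of the zeros.

The second sketch, equally hypothetical, reads "on-device differential noise injection during text preprocessing" as a standard Laplace mechanism applied to token embeddings before inference; the paper's actual mechanism, sensitivity bound, and privacy budget are not public.

    import numpy as np

    def privatize_embedding(embedding, epsilon, sensitivity=1.0, rng=None):
        # Illustrative Laplace mechanism: noise scale b = sensitivity / epsilon.
        rng = rng or np.random.default_rng()
        noise = rng.laplace(0.0, sensitivity / epsilon, size=embedding.shape)
        return (embedding + noise).astype(embedding.dtype)

    # Example: perturb a 128-d token embedding with a budget of epsilon = 2.
    e = np.random.randn(128).astype(np.float32)
    print(privatize_embedding(e, epsilon=2.0)[:4])

The cost is one Laplace draw and one addition per embedding dimension, which is consistent with the abstract's claim of negligible preprocessing overhead.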

References

1. Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.

2. Anand, R. (2024). Empowering edge AI with small language models: Architectures, challenges, and transformative enterprise applications. LinkedIn.

3. Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

4. Howard, A. G., et al. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

5. Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

6. Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning (ICML).

7. Prabhu, K., et al. (2021). Privacy-preserving inference on the edge: Mitigating a new threat model. OpenReview.

8. Roveri, M. (2023). Hardware architectures for embedded and edge AI (from ML to HW and back). Workshop on Widening Access to TinyML Network by Establishing Best Practices in Education.

9. Sandler, M., et al. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

10. Sifre, L., & Mallat, S. (2014). Rigid-motion scattering for image classification. Ph.D. thesis, École Polytechnique.

11. Soro, S. (2021). TinyML for ubiquitous edge AI. arXiv preprint arXiv:2102.01255.

12. Tan, M., et al. (2019). MnasNet: Platform-aware neural architecture search for mobile. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

13. Warden, P. (2018). Speech commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209.

14. Wightman, R. (2024). MobileNet-V4 (now in timm). GitHub Repository.

15. Yang, T.-J., et al. (2018). NetAdapt: Platform-aware neural network adaptation for mobile applications. Proceedings of the European Conference on Computer Vision (ECCV).

16. Zhang, X., et al. (2018). ShuffleNet: An extremely efficient convolutional neural network for mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

17. Sun, Z., et al. (2020). ShadowNet: A secure and efficient on-device model inference system for convolutional neural networks. arXiv preprint arXiv:2011.05905.

18. Wang, Y., et al. (2023). PrivateLoRA for efficient privacy-preserving LLM. arXiv preprint arXiv:2311.14030.

19. Zhu, L., et al. (2023). PockEngine: Sparse and efficient fine-tuning in a pocket. Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO).

20. Qu, G., et al. (2024). Mobile edge intelligence for large language models: A contemporary survey. arXiv preprint arXiv:2407.18921.

21. Xu, M., Bi, J., & Li, H. (2022). Privacy-preserving edge intelligence: Opportunities and challenges. IEEE Internet of Things Journal, 9(14), 12023–12038. https://doi.org/10.1109/JIOT.2021.3106425

22. Reddi, V. J., et al. (2020). MLPerf inference benchmark. IEEE Micro, 40(2), 8–16. https://doi.org/10.1109/MM.2020.2972518

23. Kandala, S. V., Medaranga, P., & Varshney, A. (2024). TinyLLM: A framework for training and deploying language models at the edge computers. IEEE Systems Journal. (in press)

24. PrivateLoRA Contributors. (2024). Fine-tuning language models on-device with privacy guarantees. IEEE Secure Systems Magazine. (in press)

25. MobileNet Research Team. (2024). Open-source efficient models for mobile NLP. Google Research Newsletter.

26. Alajlan, N. N., & Ibrahim, D. M. (2022). TinyML: Enabling of inference deep learning models on ultra-low-power IoT edge devices for AI applications. Micromachines, 13(6), 851.

27. Duddu, V., Boutet, A., & Shejwalkar, V. (2020). GECKO: Reconciling privacy, accuracy and efficiency in embedded deep learning. arXiv preprint arXiv:2010.00912.

28. Ray, P. P. (2022). A review on TinyML: State-of-the-art and prospects. Journal of King Saud University - Computer and Information Sciences, 34(4), 1595–1623.

29. Huckelberry, J., Zhang, Y., Sansone, A., Mickens, J., Beerel, P. A., & Reddi, V. J. (2024). TinyML security: Exploring vulnerabilities in resource-constrained machine learning systems. arXiv preprint arXiv:2411.07114.

Published

2024-11-28

How to Cite

Mangesh Pujari, Anil Kumar Pakina, & Anshul Goel. (2024). Efficient TinyML Architectures for On-Device Small Language Models: Privacy-Preserving Inference at the Edge. International Journal Science and Technology, 3(3). https://doi.org/10.56127/ijst.v3i3.1958
