Efficient TinyML Architectures for On-Device Small Language Models: Privacy-Preserving Inference at the Edge
DOI: https://doi.org/10.56127/ijst.v3i3.1958

Keywords: TinyML, Edge AI, Small Language Models, On-Device Inference, Privacy-Preserving AI, Neural Network Compression, Quantization, Microcontrollers

Abstract
Deploying small language models (SLMs) on ultra-low-power edge devices requires careful optimization to meet strict memory, latency, and energy constraints while preserving privacy. This paper presents a systematic approach to adapting SLMs for TinyML, focusing on model compression, hardware-aware quantization, and lightweight privacy mechanisms. We introduce a sparse ternary quantization technique that reduces model size by 5.8× with minimal accuracy loss, together with an efficient federated fine-tuning method for edge deployment. To address privacy concerns, we implement on-device differential noise injection during text preprocessing, adding negligible computational overhead. Evaluations on constrained devices (Cortex-M7 and ESP32) show that our optimized models achieve 92% of the accuracy of full-precision baselines while operating within 256 KB of RAM and reducing inference latency by 4.3×. The proposed techniques enable new applications for SLMs in always-on edge scenarios where both efficiency and data protection are critical.
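The sparse ternary quantization mentioned above can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so the details here are assumptions: a per-tensor magnitude threshold chosen by a `sparsity` quantile prunes small weights to zero, the survivors are mapped to ±1, and a single per-tensor scale is fit to the surviving magnitudes.

```python
import numpy as np

def sparse_ternary_quantize(w, sparsity=0.5):
    """Map weights to {-s, 0, +s} (hypothetical sketch, not the paper's
    exact algorithm): prune the smallest-magnitude fraction `sparsity`
    to zero, keep the sign of the rest, and fit one scale s."""
    # Magnitude threshold: weights at or below it are pruned to zero.
    t = np.quantile(np.abs(w).ravel(), sparsity)
    mask = np.abs(w) > t
    # Per-tensor scale: mean magnitude of the surviving weights.
    s = float(np.abs(w[mask]).mean()) if mask.any() else 0.0
    # Ternary codes in {-1, 0, +1}, storable in int8 (or 2 bits packed).
    q = (np.sign(w) * mask).astype(np.int8)
    return q, s

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = sparse_ternary_quantize(w, sparsity=0.5)
w_hat = q.astype(np.float32) * s  # dequantized approximation
```

Packing the ternary codes at 2 bits per weight plus one float scale per tensor is what yields the large compression ratios reported for this family of methods; the 5.8× figure in the abstract presumably also reflects the sparsity structure.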