Energy Efficiency in LLM Inference: Comparing Inference Libraries in a Unified Docker Framework
Reinier Schep, Razvan Loghin, Maosheng Jiang, Alex Zheng.
Group 2.

The increasing adoption of Large Language Models (LLMs) has raised concerns about their computational efficiency and energy consumption. This study presents a comparative analysis of four popular LLM inference libraries (Ollama, MLC, vLLM, and TensorRT), evaluating their energy efficiency in a standardized Dockerized environment. For each library, we measure energy consumption in joules per generated token as well as throughput in tokens per second, providing a comprehensive assessment of both efficiency and performance. All experiments are conducted on identical hardware configurations to ensure a fair comparison, and the results reveal significant variations in energy efficiency among the frameworks. This work aims to guide researchers and practitioners in selecting the most energy-efficient and performant LLM inference library for their deployment needs.
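
To make the headline metrics concrete, the short sketch below shows one way the two reported quantities could be derived from raw measurements. The function name, variable names, and the assumption that total energy is sampled over the full generation window are illustrative and not taken from the paper's actual measurement harness.

    # Minimal sketch (illustrative, not the study's measurement code):
    # derive joules per token and tokens per second from raw readings.

    def efficiency_metrics(total_energy_joules: float,
                           tokens_generated: int,
                           elapsed_seconds: float) -> dict:
        """Compute the two metrics reported in this study."""
        return {
            "joules_per_token": total_energy_joules / tokens_generated,
            "tokens_per_second": tokens_generated / elapsed_seconds,
        }

    # Example: 1200 J consumed while generating 512 tokens in 8.5 s.
    print(efficiency_metrics(1200.0, 512, 8.5))
    # -> {'joules_per_token': 2.34, 'tokens_per_second': 60.2} (approx.)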