Comparing Energy Consumption of Popular RegEx Engines in Text Editors and IDEs

Marina Escribano Esteban, Kevin Hoxha, Inaesh Joshi, Todor Mladenović.

Group 1.

This study evaluates the energy efficiency of RegEx engines across JavaScript, Java, .NET, and C++, analysing their performance at three levels of pattern complexity. Results show that JavaScript consistently consumes the least energy, Java exhibits the highest consumption for low and medium complexity patterns, and C++ and .NET consume the most energy on highly complex patterns. Statistical analysis confirms that these differences are significant, which emphasises the importance of optimising RegEx usage. Given the increasing focus on sustainable software engineering, developers should consider energy-efficient engines and optimisation strategies to mitigate negative environmental impact.

Introduction

Regular Expressions (RegEx) are a powerful tool in software development, allowing for fast pattern matching across text files and code. However, despite their speed, RegEx searches are CPU-intensive and can produce unintended matches because quantifiers are greedy by default [1]. As the scale of code repositories grows [2] and the demand for energy-efficient computing increases [3], understanding the energy consumption of RegEx engines across different libraries is an important area of study.

RegEx functionality varies across libraries, each with its own engine and energy demands. This experiment explores energy usage across popular RegEx libraries when using the pattern matching functionality CTRL+F in an IDE (Integrated Development Environment) or text editor. The selected libraries are based on their use in popular IDEs [4], which commonly rely on RegEx engines from Java [5, 6, 7, 8], JavaScript [9], .NET [10] and C++ [11]. This experiment compares their energy usage and impact in controlled searches.

In this blog, we explore the question: How energy efficient is CTRL+F in different RegEx engines? We outline the experiment’s motivation, methodology, implementation, and hardware setup. Our findings aim to guide developers in understanding the energy impact of different RegEx engines and help them make informed choices when using tools for large-scale code searches.

Motivation

Pattern matching is a powerful, yet computationally expensive tool. The complexity varies depending on the pattern, engine and input size, with worst-case scenarios reaching exponential time O(2^n) [12], particularly with patterns prone to “catastrophic backtracking” [13]. This occurs when the engine explores an exponential number of paths while trying to find a match. Therefore, evaluating RegEx engines across pattern complexities is crucial to identifying inefficiencies and optimising real-world searches.
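To make the exponential blow-up concrete, the following minimal sketch times Python's built-in backtracking `re` engine on a textbook nested-quantifier pattern. The pattern `(a+)+$`, the input shape, and the helper name `time_failed_match` are illustrative assumptions; they are not the patterns or engines used in our experiment.

```python
import re
import time

# Nested quantifiers: a textbook pattern prone to catastrophic backtracking.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n: int) -> float:
    """Return the seconds needed to fail matching 'a' * n followed by 'b'."""
    text = "a" * n + "b"  # the trailing 'b' guarantees that the match fails
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the engine gives up only after exhausting its options
    return elapsed


if __name__ == "__main__":
    # On a backtracking engine, each extra character roughly doubles the work.
    for n in (12, 14, 16, 18):
        print(f"n={n:2d}: {time_failed_match(n):.4f} s")
```

The input sizes are kept small on purpose, since the running time grows quickly with every additional character.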

The need for such optimisation is amplified by the rapid growth of code repositories, where research highlights the challenge of navigating repositories expanding in size and complexity [14]. Tools like CTRL+F, powered by RegEx engines, are crucial in enabling developers to efficiently search and navigate these vast code bases.

Finally, optimising RegEx search aligns with the broader goal of creating sustainable software. Research suggests that the ICT (Information and Communication Technology) sector could account for 14% of the global carbon footprint by 2040 [15]. Improving the efficiency of RegEx search can therefore help reduce unnecessary computational overhead and contribute towards greener development.

Methodology

Experiment Procedure

We conduct a controlled experiment to isolate RegEx processing and systematically measure energy consumption across libraries and complexity levels. This allows us to evaluate how efficiently different implementations execute RegEx searches. The procedure is outlined below.

1. Corpus Generation

To standardise the RegEx test environment, we use the largest Python file we found in the NumPy repository, as Python is the most widely used programming language [16]. To ensure a significant computational load, we expand the file to 100MB by repeating its content. The final .txt file ensures uniform processing across different RegEx engines.
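The expansion step can be sketched as follows. This is a minimal illustration rather than the script from our repository: the function name `build_corpus`, the file paths, and the reading of 100MB as 100 MiB are assumptions.

```python
from pathlib import Path

TARGET_BYTES = 100 * 1024 * 1024  # 100MB, read here as 100 MiB (assumption)


def build_corpus(source: Path, destination: Path, target_bytes: int = TARGET_BYTES) -> int:
    """Repeat the content of `source` until `destination` holds `target_bytes` bytes."""
    content = source.read_bytes()
    if not content:
        raise ValueError("source file is empty")
    written = 0
    with destination.open("wb") as out:
        while written < target_bytes:
            # Truncate the last repetition so the file has exactly the target size.
            chunk = content[: target_bytes - written]
            out.write(chunk)
            written += len(chunk)
    return written
```

A call such as `build_corpus(Path("source.py"), Path("corpus.txt"))` (hypothetical file names) would then produce the search corpus.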

2. Setting the Laptop to Zen Mode

Before conducting experiments, we ensure a controlled environment by setting the device to Zen Mode. This minimises interruptions for consistent measurements. We define Zen Mode as follows:

3. System Warm-up

The laptop warms up by running Fibonacci computations for 300 seconds before any measurement, helping the CPU reach a stable thermal state and reducing fluctuations.
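A warm-up of this kind can be sketched as below. The function names and the size of each Fibonacci computation (`n`) are assumptions chosen for illustration; only the 300-second duration comes from our procedure.

```python
import time


def fibonacci(n: int) -> int:
    """Iteratively compute the n-th Fibonacci number (fibonacci(0) == 0)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def warm_up(duration_s: float = 300.0, n: int = 5_000) -> int:
    """Keep the CPU busy with Fibonacci computations for `duration_s` seconds."""
    deadline = time.monotonic() + duration_s
    rounds = 0
    while time.monotonic() < deadline:
        fibonacci(n)
        rounds += 1
    return rounds
```

Using a deadline instead of a fixed number of iterations keeps the warm-up duration the same on fast and slow machines.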

4. Execution of RegEx Search Experiments

Each RegEx engine undergoes the same testing conditions, ensuring a fair comparison. Four RegEx libraries are tested, the choice of which is explained in the Introduction:

For each RegEx library, we test three levels of RegEx complexity:

Each experiment involves a distinct combination of a RegEx engine and complexity level. To minimise bias from external factors, the execution order of experiments is randomly shuffled. For each combination, 30 runs are conducted. The procedure for each experiment is as follows:

  1. Initialise the RegEx engine for the current library.
  2. Serialise and save the engine state to a temporary pickle file to prevent differences in engine startup times from affecting measurements.
  3. Load the engine state from the pickle file and perform RegEx search.
  4. Measure energy consumption (Joules) and time (seconds) for step 3 using the EnergiBridge [17] tool, executed in a Python subprocess. We measure energy (Joules) rather than power (Watts) because a RegEx search is a single, finite task rather than a continuous process. Power is the rate of energy consumption (Joules per second), which is more relevant for continuous tasks, so the total energy consumed during execution is the more appropriate metric here.
  5. Log energy consumption in a CSV format.
  6. After each run, the system is put to rest for 60 seconds to revert to a steady-state operating temperature, limiting thermal tail effects.
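The scheduling, logging, and rest steps above can be sketched as follows. The `measure` callable is a hypothetical hook standing in for the EnergiBridge subprocess call, whose command line is deliberately not reproduced here. Shuffling at the level of individual runs is also an assumption about how the random ordering is applied.

```python
import csv
import random
import time
from typing import Callable, List, Optional, Tuple

ENGINES = ("JavaScript", "Java", ".NET", "C++")
COMPLEXITIES = ("low", "medium", "high")
RUNS_PER_COMBINATION = 30

Run = Tuple[str, str]


def build_schedule(seed: Optional[int] = None) -> List[Run]:
    """Return every (engine, complexity) run, shuffled to spread out external bias."""
    schedule = [
        (engine, complexity)
        for engine in ENGINES
        for complexity in COMPLEXITIES
        for _ in range(RUNS_PER_COMBINATION)
    ]
    random.Random(seed).shuffle(schedule)
    return schedule


def run_experiments(
    schedule: List[Run],
    measure: Callable[[str, str], Tuple[float, float]],  # hypothetical measurement hook
    csv_path: str,
    rest_s: float = 60.0,
) -> None:
    """Measure each scheduled run, log it to CSV, then let the system rest."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["engine", "complexity", "energy_joules", "time_seconds"])
        for engine, complexity in schedule:
            energy_j, time_s = measure(engine, complexity)
            writer.writerow([engine, complexity, energy_j, time_s])
            handle.flush()  # keep completed runs on disk if a later run fails
            time.sleep(rest_s)  # cool-down between runs
```

With 4 engines, 3 complexity levels, and 30 runs each, the schedule contains 360 runs in total.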

5. Statistical Analysis

Once the experiments are completed, statistical analysis is performed to evaluate RegEx engine performance. The chosen statistical tests let us check whether the observed differences are statistically significant.

Normality Testing

The Shapiro-Wilk test checks data normality (a p-value ≥ 0.05 means normality cannot be rejected), guiding test selection. We hypothesise that the data will be normal and therefore use parametric statistical tests.

Outlier Detection

The IQR (Inter-Quartile Range) method is used to identify and remove outliers, preventing extreme values from skewing the results. We tested both the IQR and the Z-score methods, and the former was more effective at identifying outliers. A data point is considered an outlier if it lies below the lower bound Q1 - 1.5 * IQR or above the upper bound Q3 + 1.5 * IQR, where IQR = Q3 - Q1.
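A minimal sketch of this filter using only the Python standard library is shown below. The function name is hypothetical. The "inclusive" quartile method is an assumption, as different tools compute quartiles slightly differently.

```python
import statistics
from typing import List


def remove_outliers_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Drop values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]
```

For the made-up sample `[10, 11, 12, 13, 14, 100]`, the bounds are 7.5 and 17.5, so only the value 100 is removed.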

Statistical Significance Tests

Even if one RegEx engine has lower average energy consumption than another, the difference might be due to random variation. We use two-sided Welch’s t-tests, which are parametric and assume normally distributed data, to determine whether the differences in energy consumption are statistically significant. Welch’s t-test identifies whether two distributions have significantly different means without assuming equal variances. If the p-value is less than 0.05, we consider the difference significant.
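For reference, the test statistic and its degrees of freedom (Welch-Satterthwaite approximation) can be computed with the standard library, as in the sketch below. The function name is hypothetical. Converting the statistic into a p-value requires the CDF of the t-distribution; a statistics package can do this, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics
from typing import List, Tuple


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two sample means (sample variance / n).
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a**2 / (len(a) - 1) + se_b**2 / (len(b) - 1))
    return t, df
```

Unlike Student's t-test, the variances of the two samples are never pooled, which is why the degrees of freedom are generally not an integer.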

Effect Size Analysis

Effect size metrics are used to quantify the practical significance of differences between RegEx engines:

Hardware/Software Details

We run the experiments on a machine with the following hardware specifications:

Furthermore, we run the experiments with the following software versions:

The remaining software requirements for the code libraries we use can be found in requirements.txt (or environment.yml if you are using Conda) in our GitHub repository, which is linked in the Replication Package section.

Results

The results from the experiment are presented in the violin plots below (with outliers removed). They illustrate the energy consumption (in Joules) and execution time (in seconds) of RegEx searches across different levels of complexity. Furthermore, this section reports the Shapiro-Wilk tests for normality, the significance tests, and the effect sizes between distributions.

Low Complexity Pattern

[Figures: Low Complexity Energy (violin_energy_low) and Low Complexity Time (violin_time_low)]

For the low complexity pattern, the energy consumption and processing time distributions show differences across engines. Java is the most energy-intensive (~60J, ~1.8s), while C++ is moderate (~45J, ~1.6s). Following these, .NET consumes less (~20J, ~0.8s), and JavaScript shows the most efficient handling of simple RegEx tasks (~10J, ~0.4–0.5s).

Medium Complexity Pattern

[Figures: Medium Complexity Energy (violin_energy_medium) and Medium Complexity Time (violin_time_medium)]

For the medium complexity pattern, the engines rank in the same order as for the low complexity pattern in both energy consumption and processing time. Java still uses the most energy and time (~70J, ~2s), exceeding its usage for the low complexity pattern. For the other three engines, energy consumption and processing time stay similar: C++ (~40J, ~1.6s), .NET (~20J, ~0.7s), and JavaScript, which remains the most efficient (~10J, ~0.4s).

High Complexity Pattern

[Figures: High Complexity Energy (violin_energy_high) and High Complexity Time (violin_time_high)]

For the high complexity pattern, the results show large increases in both execution time and energy consumption. C++ and .NET experience sharp spikes, reaching (~400J, ~14s) and (~250J, ~9s), respectively. This shows that these two engines struggle to process complex patterns efficiently. Java and JavaScript have energy consumptions and processing times that are more consistent with the other two pattern complexities, standing at (~70-80J, ~2s) and (~10J, ~0.4s), respectively. This demonstrates that certain engines manage complexity significantly better than others.

Statistical Tests

Shapiro-Wilk Normality Tests

Initially, the Shapiro-Wilk test showed non-normal distributions for both energy and time, which we attributed to the presence of outliers after visually inspecting the plots. After removing outliers using the 1.5 IQR method, we reran the tests. The table below summarises the Shapiro-Wilk p-values for energy consumption after outlier removal. The exclusion of time results is justified in the Discussion section.

Complexity   Engine   Shapiro-Wilk p-value
Low          C++      0.989373
Low          .NET     0.013857
Low          Java     0.723034
Low          JS       0.138883
Medium       C++      0.238653
Medium       .NET     0.299012
Medium       Java     0.806078
Medium       JS       0.156827
High         C++      0.633164
High         .NET     0.755118
High         Java     0.012379
High         JS       0.333185

Energy distributions for most engine and complexity combinations are consistent with normality (p > 0.05). The exceptions are Java at high complexity (p ≈ 0.012) and .NET at low complexity (p ≈ 0.014). Since 10 of the 12 combinations pass the test, we proceed with parametric tests that assume normal data.

Parametric Significance Test - Welch’s t-test

The following table summarises the p-values from Welch’s t-tests comparing pairs of RegEx engines (C++, .NET, Java, JavaScript) for energy metrics.

Complexity   Comparison             Energy p-value
Low          C++ vs .NET            4.55e-58
Low          C++ vs Java            5.79e-65
Low          C++ vs JavaScript      4.47e-64
Low          .NET vs Java           6.72e-71
Low          .NET vs JavaScript     1.13e-31
Low          Java vs JavaScript     3.84e-76
Medium       C++ vs .NET            1.35e-52
Medium       C++ vs Java            2.78e-44
Medium       C++ vs JavaScript      6.57e-70
Medium       .NET vs Java           4.90e-71
Medium       .NET vs JavaScript     1.09e-36
Medium       Java vs JavaScript     2.18e-67
High         C++ vs .NET            3.37e-07
High         C++ vs Java            7.24e-08
High         C++ vs JavaScript      1.74e-07
High         .NET vs Java           2.64e-08
High         .NET vs JavaScript     7.06e-08
High         Java vs JavaScript     1.05e-08

As the table shows, all p-values are far below 0.05, which strongly indicates statistically significant differences in energy consumption between every pair of engines. The effect sizes are examined in the next subsection to understand the magnitude of these differences.

Effect Sizes

The following table summarises the effect sizes for energy (in joules) from comparisons between pairs of RegEx engines across different complexity levels.

Complexity   Comparison             Mean Difference (Joules)   Percent Change (%)   Cohen’s d
Low          C++ vs .NET            23.1619                    129.2737             29.3839
Low          C++ vs Java            -23.9762                   -36.8553             -44.5968
Low          C++ vs JavaScript      29.4681                    253.8029             55.4820
Low          .NET vs Java           -47.1380                   -72.4588             -60.8094
Low          .NET vs JavaScript     6.3063                     54.3147              8.0046
Low          Java vs JavaScript     53.4443                    460.3046             103.6916
Medium       C++ vs .NET            22.8214                    109.7879             27.1405
Medium       C++ vs Java            -14.7042                   -25.2163             -19.0978
Medium       C++ vs JavaScript      31.8477                    270.8041             58.6233
Medium       .NET vs Java           -37.5256                   -64.3527             -37.6765
Medium       .NET vs JavaScript     9.0264                     76.7519              10.4847
Medium       Java vs JavaScript     46.5520                    395.8356             58.8379
High         C++ vs .NET            129.8053                   52.5452              33.9700
High         C++ vs Java            300.3934                   392.9411             105.4453
High         C++ vs JavaScript      357.2846                   1826.9605            121.7050
High         .NET vs Java           170.5881                   223.1443             72.2337
High         .NET vs JavaScript     227.4793                   1163.2064            94.0288
High         Java vs JavaScript     56.8912                    290.9109             57.2439

The effect size analysis reveals large differences in energy consumption among RegEx engines, which increase with complexity. For high complexity patterns, C++ consumes the most energy, followed by .NET, Java, and JavaScript, with Cohen’s d values indicating extremely large effect sizes. The most striking difference is C++ consuming 1826.96% more energy than JavaScript. This highlights how widely efficiency varies between RegEx engines and underscores the importance of engine selection.

For low and medium pattern complexity, Java consistently consumes the most energy, while JavaScript is the most energy-efficient. The negative mean differences for C++ vs Java and .NET vs Java indicate that Java is the least efficient engine for lower complexity RegEx tasks. These findings suggest that developers should prioritise JavaScript or .NET when dealing with simpler RegEx patterns.
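The three effect size columns can be computed as in the sketch below. The percent change is taken relative to the mean of the second engine in each comparison, which is consistent with the values in the table. The pooled standard deviation form of Cohen’s d is an assumption, and the function name is hypothetical.

```python
import math
import statistics
from typing import Dict, List


def effect_sizes(a: List[float], b: List[float]) -> Dict[str, float]:
    """Compare sample `a` against sample `b` (for example, C++ vs .NET)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    diff = mean_a - mean_b
    # Pooled standard deviation, weighting each sample variance by its degrees of freedom.
    pooled_var = (
        (len(a) - 1) * statistics.variance(a) + (len(b) - 1) * statistics.variance(b)
    ) / (len(a) + len(b) - 2)
    return {
        "mean_difference": diff,
        "percent_change": diff / mean_b * 100,  # relative to the second sample
        "cohens_d": diff / math.sqrt(pooled_var),
    }
```

A negative result means that the first engine in the comparison consumed less energy than the second.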

Discussion

The experimental results indicate that the energy consumption of RegEx search is highly dependent on the underlying RegEx engine. Additionally, the complexity of the pattern being matched plays a crucial role, with different engines exhibiting varying relationships to it.

The energy consumption closely tracked the time taken for the engine to complete a task. This suggests that power draw stayed roughly constant during the experiments, with minimal power spikes.
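As a rough check of this relationship, average power can be derived as energy divided by time. The sketch below uses the approximate low complexity readings quoted in the Results section, so the outputs are only indicative. The JavaScript time is the midpoint of the quoted 0.4 to 0.5 second range, which is an assumption.

```python
from typing import Dict, Tuple

# Approximate low complexity readings from the Results section: (joules, seconds).
LOW_COMPLEXITY: Dict[str, Tuple[float, float]] = {
    "Java": (60.0, 1.8),
    "C++": (45.0, 1.6),
    ".NET": (20.0, 0.8),
    "JavaScript": (10.0, 0.45),  # midpoint of the quoted 0.4-0.5 s range (assumption)
}


def average_power_watts(energy_j: float, time_s: float) -> float:
    """Average power is the total energy divided by the elapsed time."""
    return energy_j / time_s


if __name__ == "__main__":
    for engine, (energy_j, time_s) in LOW_COMPLEXITY.items():
        print(f"{engine:<10} {average_power_watts(energy_j, time_s):5.1f} W")
```

With these readings, average power stays within a few tens of watts for every engine, while energy differs by a factor of about six, so most of the energy difference comes from execution time.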

We attribute the non-normality occasionally observed in the results to the RegEx engines relying on non-deterministic finite automaton (NFA) algorithms. With backtracking, the number of execution paths an engine explores can vary widely, so the resulting measurements do not necessarily follow the assumptions of normality. Thus, we only consider energy measurements in our statistical analysis.

JavaScript

The JavaScript (JS) RegEx engine consistently performed the best across all pattern complexities. Moreover, its energy consumption showed little to no variation across levels of pattern complexity.

These findings suggest that JS’s RegEx implementation, through its integration with Irregexp, is highly optimised for the type of queries evaluated. Part of what makes this integration so effective at reducing energy consumption across all evaluated levels of complexity is possibly the use of the highly optimised RegExp interpreter for infrequent or simple patterns.

Despite the impressive results, JS’s RegEx implements the more conservative ECMAScript specifications. It sacrifices a few common features of Regular Expressions such as Atomic Groups, Conditional-Patterns and more. This decision to omit certain features may contribute to its speed and efficiency in pattern matching.

Java

Java RegEx results indicate the engine is optimised for high complexity queries, where it performed second only to JS. It is likely that Java employs sophisticated methods to address the catastrophic backtracking problem. For simpler patterns, Java’s relatively poor performance could be a trade-off necessary for efficient backtracking. However, as simpler patterns are likely the majority use case, we find it questionable that Java optimised specifically for high complexity patterns at the cost of simpler ones.

C++ (Boost) and C# (.NET)

Both of these engines exhibit a similar relationship with pattern complexity. .NET, clearly the more efficient of the two, shows particularly impressive results for low and medium complexity patterns. Its RegEx engine benefits from its deep integration within the .NET framework and the continuous iterative improvements made over many years.

When it comes to high complexity patterns, both RegEx engines behave more unpredictably, as reflected in the distributions of time and energy. The leap in energy consumption was expected in high complexity patterns as both search time and memory usage (for storing expanded states) increase due to the inclusion of backtracking.

Implications

The findings of this study have important implications in the domain of software sustainability. Given that JavaScript’s RegExp engine consistently outperformed other engines across all complexity levels, developers should consider leveraging JavaScript-based tools or libraries for RegEx processing, especially in energy-constrained environments.

For developers working in Java, C++, or .NET environments, optimisation strategies should be explored to mitigate excessive energy consumption. Techniques such as pre-compiling RegEx patterns, using non-backtracking RegEx approaches, and leveraging alternative search methods like indexed text searching can help improve efficiency.
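Two of these techniques are sketched below using Python's `re` module purely for illustration; the same ideas apply to the engines studied here. The patterns and function names are hypothetical examples and are not taken from our experiment.

```python
import re
from typing import Iterable

# Pre-compile once and reuse the pattern object for every line searched.
IMPORT_WORD = re.compile(r"\bimport\b")


def count_matching_lines(lines: Iterable[str], pattern: "re.Pattern[str]" = IMPORT_WORD) -> int:
    """Count the lines that contain at least one match of the compiled pattern."""
    return sum(1 for line in lines if pattern.search(line))


# A nested quantifier and a flat rewrite that accepts exactly the same strings.
# The flat form leaves a backtracking engine with only one way to split the input.
NESTED = re.compile(r"^(a+)+$")
FLAT = re.compile(r"^a+$")


def same_verdict(text: str) -> bool:
    """Check that both patterns agree on whether `text` matches."""
    return bool(NESTED.match(text)) == bool(FLAT.match(text))
```

Rewriting a pattern in this way changes only how much work the engine performs, not which strings are accepted.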

Implications on Sustainability

At first glance, the energy consumed by a single RegEx operation (10-400J) appears minuscule compared to daily activities, such as boiling water in a kettle, which takes 165,000 joules [18], or charging your phone, which could take around 29,000 joules [19]. However, in large-scale systems where RegEx operations are executed millions of times daily, these small amounts accumulate significantly. For instance:

JavaScript Engine: 1 million executions x 10 joules = 10 million joules (10 MJ)

C++ Engine: 1 million executions x 400 joules = 400 million joules (400 MJ)

This difference of 390 million joules (390 MJ) is equivalent to boiling roughly 2,360 kettles, or charging around 13,450 smartphones. Thus, we highlight the need for developers to make informed choices about the tools they use, as even minor efficiency improvements can have a meaningful impact at scale.
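The arithmetic behind this comparison can be reproduced with the short sketch below. The constant names are illustrative; the figures are the ones quoted in this section.

```python
EXECUTIONS = 1_000_000       # RegEx searches per day in the example above
JAVASCRIPT_J = 10            # approximate joules per search (JavaScript)
CPP_J = 400                  # approximate joules per search (C++, high complexity)
KETTLE_J = 165_000           # joules to boil a kettle [18]
PHONE_CHARGE_J = 29_000      # joules to charge a smartphone [19]

difference_j = EXECUTIONS * (CPP_J - JAVASCRIPT_J)
kettles = difference_j / KETTLE_J
phone_charges = difference_j / PHONE_CHARGE_J

if __name__ == "__main__":
    print(f"Difference:    {difference_j / 1e6:.0f} MJ")
    print(f"Kettles:       {kettles:.0f}")
    print(f"Phone charges: {phone_charges:.0f}")
```

Changing `EXECUTIONS` or the per-search figures shows how quickly the gap grows with usage.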

Limitations and Future Work

This study provides valuable insights into RegEx library energy consumption, but several limitations need addressing:

  1. Pattern Coverage: Testing was limited to only three patterns. Future work should conduct a comprehensive study to select the most commonly used RegEx patterns and evaluate them for better real-world representation.
  2. Single-File Testing: Experiments were conducted on a single file, whereas real-world searches often span multiple files, which adds disk I/O and memory overhead. Future studies should consider multi-file scenarios.
  3. Device & OS Variability: Results were based on a single machine, with energy usage potentially varying across different hardware and operating systems. Cross-platform testing on diverse devices is needed.
  4. Thermal Fluctuations: Despite system cooldowns, CPU temperature variations could affect energy measurements. Future research could account for these fluctuations more rigorously.

Conclusion

This study highlighted significant variations in the energy efficiency of different RegEx engines, and demonstrated that JavaScript consistently outperforms Java, .NET, and C++ in both execution time and energy consumption. While the differences may seem minor in isolated cases, their impact is substantial when scaled across large applications. Optimising RegEx usage is crucial for reducing computational overhead, minimising energy costs, and promoting sustainable software development. Developers should consider leveraging more efficient engines or alternative search techniques to mitigate excessive energy consumption.

Replication Package

For the sake of reproducibility, the code for the experiment and data analysis can be found here.

References

  1. Winslow, R. (2021, February 18). Regex basics. Canonical Ltd. Retrieved February 27, 2025, from https://ubuntu.com/blog/regex-basics 

  2. GitHub. (2024, November 22). Octoverse: AI leads Python to top language as the number of global developers surges. GitHub. Retrieved February 27, 2025, from https://github.blog/news-insights/octoverse/octoverse-2024/#the-state-of-open-source 

  3. Atadoga, A., Umoga, U., Lottu, O., & Sodiya, E. (2024, February). Tools, techniques, and trends in sustainable software engineering: A critical review of current practices and future directions. World Journal of Advanced Engineering Technology and Sciences, 11, 231-239. https://doi.org/10.30574/wjaets.2024.11.1.0051 

  4. PYPL (2025, February). Top IDE Index. Retrieved February 27, 2025, from https://pypl.github.io/IDE.html 

  5. JetBrains. (2024, October 11). Finding and replacing text using regular expressions in IntelliJ IDEA. JetBrains. Retrieved February 27, 2025, from https://www.jetbrains.com/help/idea/tutorial-finding-and-replacing-text-using-regular-expressions.html 

  6. JetBrains. (2024, October 11). Finding and replacing text using regular expressions in PyCharm. JetBrains. Retrieved February 27, 2025, from https://www.jetbrains.com/help/pycharm/tutorial-finding-and-replacing-text-using-regular-expressions.html 

  7. JetBrains. (2024, October 11). Finding and replacing text using regular expressions in Webstorm. JetBrains. Retrieved February 27, 2025, from https://www.jetbrains.com/help/webstorm/tutorial-finding-and-replacing-text-using-regular-expressions.html 

  8. Google. (n.d.). Pattern. Android Developers. Retrieved February 27, 2025, from https://developer.android.com/reference/java/util/regex/Pattern 

  9. Lourens, R. (2024, November 13). VSCode: Search issues. GitHub. Retrieved February 27, 2025, from https://github.com/microsoft/vscode/wiki/Search-Issues#notes-on-regular-expression-support 

  10. Microsoft (2024, September). Use regular expressions in Visual Studio. Retrieved February 27, 2025, from https://learn.microsoft.com/en-us/visualstudio/ide/using-regular-expressions-in-visual-studio?view=vs-2022 

  11. Ho, D. (2003, November 25). Notepad++ [Computer software]. GitHub. https://github.com/notepad-plus-plus/notepad-plus-plus 

  12. Wikipedia. (n.d.). Regular expression: Implementations and running times. Retrieved February 27, 2025, from https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times 

  13. Regular-Expressions.info. (n.d.). Catastrophic backtracking. Retrieved February 28, 2025, from https://www.regular-expressions.info/catastrophic.html 

  14. Ma, Y., Yang, Q., Cao, R., Li, B., Huang, F., & Li, Y. (2024). How to understand whole software repository? arXiv. https://arxiv.org/abs/2406.01422 

  15. Belkhir, L., & Elmeligi, A. (2018). Assessing ICT global emissions footprint: Trends to 2040 & recommendations. Journal of Cleaner Production, 177, 448-463. https://doi.org/10.1016/j.jclepro.2017.12.239 

  16. Carbonnelle, P. (2025, February). PYPL Popularity of Programming Language Index. Retrieved February 27, 2025, from https://pypl.github.io/PYPL.html#google_vignette 

  17. Sallou, J., Cruz, L., & Durieux, T. (2023, December 21). EnergiBridge: Empowering software sustainability through cross-platform energy measurement (Version 1.0.0). GitHub. https://doi.org/10.48550/arXiv.2312.13897 

  18. SEE Sustainability. (n.d.). Boiling water - how much energy? SEE Sustainability Blog. Retrieved February 28, 2025, from https://seesustainability.co.uk/blog/f/boiling-water—how-much-energy#:~:text=A%20kettle%20or%20a%20boiling,in%20the%20form%20of%20heat 

  19. Warden, P. (2015, October 8). Smartphone energy consumption. Pete Warden’s Blog. Retrieved February 28, 2025, from https://petewarden.com/2015/10/08/smartphone-energy-consumption/