
Energy Implications of Language and Runtime Choices in Data Ingestion Pipelines

Yuvraj Singh Pathania, Viktor Seršiḱ, Emils Dzintars, Madhav Chawla.

Group 8.

Compare the energy consumption of Python, Java, Go, and Rust when performing a data ingestion task.

Introduction

Energy consumption is becoming an important concern in software engineering, yet developers rarely have concrete guidance on how implementation choices influence energy efficiency. In this study, we investigate how programming language selection affects energy usage in a common data-intensive workload. We design and implement an identical data ingestion pipeline that reads large JSONL datasets, performs schema validation and lightweight transformations, aggregates records, and writes the results to Parquet format. The pipeline is implemented across multiple languages representing different runtime and compilation models, namely Python, Go, Java, and Rust.

Using EnergiBridge, we systematically measure execution time, energy consumption, and resource utilization under controlled experimental conditions. By analyzing both total energy usage and average power draw, we examine whether observed differences are primarily driven by runtime performance or by variations in power behavior. Our results provide practical insights into the energy-performance trade-offs of language and runtime choices for data processing systems. We further provide a replication package containing implementations, automation scripts, and measurement procedures to support reproducibility and future comparative studies.

Why compare language energy for ingestion pipelines?

Data ingestion is a “hidden” but ubiquitous workload: ETL jobs, observability pipelines, log processing, feature generation, and data lake compaction. The same pipeline can be implemented in many languages, and the choice often comes down to ecosystem or developer experience. However, ingestion is also compute- and IO-intensive, so small differences in runtime behavior (parsing, allocations, GC, compression libraries, Parquet writers) can translate into non-trivial energy differences at scale.

This project investigates a focused question: for the same data ingestion task, how do Python, Java, Go, and Rust compare in energy consumption, execution time, and power behavior?

Our goal is not to declare a universal “winner”, but to provide actionable intuition: are the differences mainly about doing the work faster (lower time), or about drawing less power while doing it?


Workload definition (same task across languages)

All four implementations perform the same high-level pipeline:

JSONL(.gz) → validate → transform → aggregate → Parquet + checksum

For each JSON record we validate it against the expected schema, apply lightweight field transformations, and update running aggregates, before finally writing the aggregated results to Parquet together with a checksum.

Dataset slice

The single-hour GH Archive slice is small enough to run many repetitions, while still being representative of messy real-world JSON lines with nested fields.
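The per-record loop described above can be sketched in Python. This is a minimal illustration, not the actual implementation: the field names (`"type"`, `"actor"`) and the skip-on-invalid policy are assumptions for the example, and the Parquet/checksum step is elided.

```python
import gzip
import json
from collections import Counter

def ingest(path: str) -> Counter:
    """Validate, transform, and aggregate one JSONL(.gz) slice.

    The field names ("type", "actor") are illustrative; the real
    pipeline follows the GH Archive event schema.
    """
    counts = Counter()
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # strict validation may instead abort the run
            # Schema validation: required fields must be present.
            if "type" not in record or "actor" not in record:
                continue
            # Lightweight transformation: normalize the event type.
            event_type = str(record["type"]).strip().lower()
            # Aggregation: count events per type.
            counts[event_type] += 1
    # Writing `counts` to Parquet plus a checksum is omitted here;
    # each real implementation uses its ecosystem's Parquet writer.
    return counts
```

The same loop structure maps directly onto the Go, Java, and Rust versions, which is what makes the workload comparable across languages.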


Implementations

We implemented the same pipeline in four languages: Python, Go, Java, and Rust, chosen to cover different runtime and compilation models (an interpreter, a garbage-collected AOT-compiled runtime, a JIT-compiled VM, and AOT-compiled native code, respectively).

All implementations download from the same GH Archive endpoint and write outputs into a shared data/ directory when run under Docker.


Measurement methodology

What we measured

Per run, we report total package energy (J), wall-clock execution time (s), average power (W, computed as energy divided by time), and the energy-delay product (EDP, J·s, computed as energy multiplied by time).

What we did not measure (and why it matters)

It is important to distinguish where energy is being measured: package (RAPL) energy captures CPU and uncore activity, while whole-system energy would additionally include storage devices, network interfaces, peripherals, and power-supply losses that a wall-plug meter would see.

This distinction matters most when the workload is IO-heavy (storage/network dominated). In our experiment we deliberately used cpu/strict validation to keep the workload more compute-heavy and make package energy a meaningful proxy.

Tooling

We used EnergiBridge (invoked as energibridge) to sample energy counters at a high frequency and output time-series CSV files. Each per-run CSV contains cumulative energy readings; we compute per-run energy as:

\[E_{run} = E_{last} - E_{first}\]

We also compute per-run time from the same CSV:

\[T_{run} = (t_{last} - t_{first}) / 1000\]

where timestamps are in milliseconds.
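The two formulas above can be applied to each per-run CSV as follows. The column names (`Time` as a millisecond timestamp, `PACKAGE_ENERGY (J)` as a cumulative counter) match the counters mentioned in the environment notes, but treat them as assumptions to check against your EnergiBridge version.

```python
import csv

def per_run_metrics(csv_path: str,
                    time_col: str = "Time",
                    energy_col: str = "PACKAGE_ENERGY (J)"):
    """Compute (E_run in J, T_run in s) from one EnergiBridge time series.

    Assumes `time_col` holds millisecond timestamps and `energy_col`
    a cumulative energy counter, per the formulas in the text.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    e_run = float(rows[-1][energy_col]) - float(rows[0][energy_col])
    t_run = (float(rows[-1][time_col]) - float(rows[0][time_col])) / 1000.0
    return e_run, t_run  # average power is then e_run / t_run
```

Average power per run follows as `e_run / t_run`, and EDP as `e_run * t_run`.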

Protocol

We performed 20 measured repetitions per language and randomized the run order across languages, so that background load and thermal drift are spread evenly rather than biasing any single implementation.
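A measurement protocol of this shape can be scripted roughly as below. The `energibridge` flags and the per-language entry points (`run_python.sh`, etc.) are hypothetical placeholders to adapt to your setup; the randomized-order logic is the point of the sketch.

```python
import random
import subprocess

LANGS = ["python", "go", "java", "rust"]
REPS = 20  # measured repetitions per language

def build_run_order(langs=LANGS, reps=REPS, seed=42):
    """Return a shuffled list of (language, repetition) pairs so that
    background load and thermal drift are spread across languages."""
    order = [(lang, r) for lang in langs for r in range(reps)]
    random.Random(seed).shuffle(order)
    return order

def run_all(dry_run=True):
    for lang, rep in build_run_order():
        out_csv = f"results/{lang}_run{rep}.csv"
        # Hypothetical invocation: adjust the energibridge flags and
        # the per-language entry point to match your environment.
        cmd = ["energibridge", "--output", out_csv, "--", f"./run_{lang}.sh"]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Fixing the shuffle seed makes the (randomized) run order itself reproducible across replications.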

Environment notes

The recorded CSVs include energy counters like PACKAGE_ENERGY (J) and DRAM_ENERGY (J), which typically come from Intel RAPL domains. We report package energy, which is closer to “CPU + uncore” energy than whole-system energy (i.e., it does not include everything the PSU would see).

From the measurement CSV metadata, the system had 8 logical CPUs (per-core frequency/utilization columns) and ~33.36 GB total memory (reported as TOTAL_MEMORY=33360306176 bytes).


Results

Summary table (package energy)

Below we summarize mean ± 95% CI across 20 measured runs per language:

Language Mean energy (J) Mean time (s) Mean power (W) Mean EDP (J·s)
Rust 58.69 ± 1.86 3.692 ± 0.048 15.90 ± 1.03 216.83 ± 17.50
Python 62.42 ± 2.23 3.902 ± 0.064 16.00 ± 1.15 243.77 ± 23.43
Go 101.86 ± 3.46 8.574 ± 0.200 11.94 ± 1.36 871.10 ± 42.41
Java 114.94 ± 1.73 5.883 ± 0.056 19.54 ± 0.70 676.24 ± 27.35

Ranking by mean energy (lower is better): Rust < Python ≪ Go < Java.
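The mean ± 95% CI entries in the table can be computed from the 20 per-run values with a Student-t interval; a minimal sketch (the critical value is hardcoded for n = 20 runs):

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with df = 20 - 1 = 19.
T_CRIT_DF19 = 2.093

def mean_ci95(samples):
    """Return (mean, CI half-width) for a t-based 95% confidence
    interval; assumes len(samples) == 20 to match T_CRIT_DF19."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error
    return mean, T_CRIT_DF19 * sem
```

Applied to the per-run energy, time, power, and EDP values, this yields the "mean ± CI" cells reported above.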

Relative differences (practical effect size)

Sometimes “statistically different” is less important than “how big is the difference in practice?”. Using Rust as the lowest-energy baseline, Python consumed about 6% more energy, Go about 74% more, and Java about 96% more.

Even with just a one-hour slice, these gaps are large enough to matter if this pipeline is executed frequently (e.g., hourly ingestion jobs across many datasets).
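These relative overheads follow directly from the mean energies in the summary table:

```python
# Mean package energy (J) per language, from the summary table.
MEAN_ENERGY_J = {"rust": 58.69, "python": 62.42, "go": 101.86, "java": 114.94}

def relative_overhead(baseline="rust"):
    """Percent extra energy versus the baseline language."""
    base = MEAN_ENERGY_J[baseline]
    return {lang: 100.0 * (e / base - 1.0)
            for lang, e in MEAN_ENERGY_J.items() if lang != baseline}

# python ≈ +6.4%, go ≈ +73.6%, java ≈ +95.8%
```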

Stability across repetitions

Energy and time naturally vary due to OS scheduling, background load, and thermal state. Across 20 runs, the 95% confidence intervals stayed within a few percent of the means for energy and time (see the summary table), with somewhat wider relative intervals for power.

This is one reason we ran 20 measured repetitions and randomized run order: it makes the averages more robust to run-to-run noise.

Visualizations

Mean energy with 95% CI:

This bar chart shows the mean package energy per language with 95% confidence intervals. The error bars represent the uncertainty in our mean estimates based on 20 measured runs per language. Rust and Python cluster together at the low end (~58–62 J), while Go and Java show substantially higher mean energy consumption.

Energy vs time per run (each dot is one measured run):

This scatter plot shows the relationship between execution time and package energy for each individual measured run. Points in the lower-left quadrant represent the most efficient runs (low energy, low time). The clustering pattern reveals that Rust and Python runs sit together in the lower left, Go runs sit far to the right (long runtimes at low power), and Java runs sit above at moderate times but higher energy.

Energy distribution (violin plot):

Violin plots combine the information of a box plot (median, quartiles, whiskers) with a kernel density estimate showing the full distribution shape of energy values across all 20 measured runs per language. The width of each “violin” at any energy level indicates how many runs had values near that level—wider sections mean more runs clustered there.

Key insights from the energy violin plot: the distributions are consistent with the mean ranking, with Rust and Python concentrated at the low end and Go and Java at substantially higher energy levels.

The median line (thick horizontal line) shows the central tendency, while the quartile box shows where 50% of runs fall. The whiskers extend to the full range of observed values.

Time distribution (violin plot):

The time violin plot reveals the distribution of execution times across measured runs. This complements the energy analysis by showing whether time variability contributes to energy differences.

Observations: Rust and Python times cluster tightly around 3.7–3.9 s, Java around 5.9 s, and Go around 8.6 s, mirroring the mean-time ranking in the summary table.

The violin plots help us understand that variability matters: even if two languages have similar mean energy, differences in distribution shape (symmetry, tails, clustering) can affect real-world predictability and worst-case behavior.
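A violin plot of this kind can be produced with matplotlib's `violinplot`; a minimal sketch, assuming per-run energies are already collected into a dict:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt

def plot_energy_violins(per_lang_energies, out_path="energy_violin.png"):
    """per_lang_energies: dict mapping language -> list of per-run
    package energies (J). Saves the figure and returns its path."""
    langs = list(per_lang_energies)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.violinplot([per_lang_energies[l] for l in langs],
                  showmedians=True, showextrema=True)
    ax.set_xticks(range(1, len(langs) + 1))
    ax.set_xticklabels(langs)
    ax.set_ylabel("Package energy (J)")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return out_path
```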


Discussion: where do the differences come from?

1) Fast can be energy efficient (but not always)

Rust and Python finish the workload in ~3.7–3.9 seconds on average and also show the lowest package energy. This suggests the dominant driver here is time-to-complete: finishing earlier reduces the time the CPU package stays in a high-activity state.

However, Java is an instructive counterexample: it completes faster than Go (5.9 s vs 8.6 s) but still consumes more package energy (114.9 J vs 101.9 J). The scatter plot shows Java points at higher power and moderate time: Java tends to “go harder” (higher average W), whereas Go tends to run longer at lower average W.

2) Power behavior differs significantly by runtime

The mean power numbers show two clusters: Go sits alone at the low end (~11.9 W), while Rust and Python (~16 W) and especially Java (~19.5 W) draw noticeably more.

So Go’s energy is not high because it draws high power; it’s high because it runs longer. Java’s energy is high because even though it’s not the slowest, it runs at higher power.
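The decomposition energy = average power × time can be sanity-checked directly against the summary table (up to rounding and the usual caveat that a mean of products is not exactly a product of means):

```python
# Mean (power W, time s, energy J) per language, from the summary table.
MEASURED = {
    "rust":   (15.90, 3.692,  58.69),
    "python": (16.00, 3.902,  62.42),
    "go":     (11.94, 8.574, 101.86),
    "java":   (19.54, 5.883, 114.94),
}

for lang, (watts, secs, joules) in MEASURED.items():
    # Power times time should reproduce the measured energy within ~1%.
    assert abs(watts * secs - joules) / joules < 0.01, lang
```

This makes the Go-vs-Java contrast concrete: low watts over a long run (Go) and high watts over a shorter run (Java) both multiply out to high energy.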

3) EDP favors Rust/Python strongly

If your decision metric is “save energy and finish quickly” (e.g., latency-sensitive pipelines), EDP highlights the same story: Rust (~217 J·s) and Python (~244 J·s) are roughly 3–4× better than Java (~676 J·s) and Go (~871 J·s).

4) Likely causes (hypotheses)

We did not instrument the pipeline into fine-grained phases in this iteration, but based on typical ingestion behavior we expect JSON parsing and allocation churn, garbage-collection activity (Go, Java), JIT warm-up (Java), and the efficiency of the compression and Parquet-writing libraries to be the dominant cost drivers.

5) Connecting the hypotheses to our observed ranking (Rust ≈ Python ≪ Go < Java)

Rust: lowest energy and lowest EDP

Rust’s result is consistent with a typical ingestion profile for an AOT-compiled language: native code with no interpreter or JIT warm-up, no garbage collector, low allocation overhead, and a mature Arrow/Parquet writer.

Even if some of the benefit comes from library implementation quality (e.g., Arrow/Parquet writer), the practitioner takeaway remains: in this pipeline, Rust delivered the best combined energy+latency outcome.

Python: surprisingly close to Rust (but with caveats)

In many CPU-bound tasks, Python is slower due to interpreter overhead. Our results show Python close to Rust in both energy and time, which suggests the workload may not be dominated by pure Python execution: JSON parsing, compression, and Parquet writing in Python are typically delegated to native extensions, so much of the hot path runs as compiled code outside the interpreter.

However, this closeness should not be over-generalized: a pipeline with heavier per-record transformations (e.g., complex schema normalization, joins, enrichment) would likely magnify Python’s interpreter overhead and shift the ranking.

Go: lower power, but much longer runtime

Go stands out with the lowest average power (~11.9 W) but the second-worst energy because it runs much longer: at ~8.6 s versus ~3.7–3.9 s for Rust and Python, the longer runtime outweighs the lower power draw.

This is a concrete example of why “lower average watts” does not automatically imply “lower energy” for batch jobs.

Java: not the slowest, but the highest energy (power-heavy execution)

Java uses the most energy despite completing faster than Go. The reason is visible in the power figures: Java draws the highest average power (~19.5 W). Likely contributing factors include JIT compilation activity during the short run, garbage-collection work, and aggressive multi-core utilization that keeps the package in a higher power state.

In other words, Java may be “fast” in a wall-clock sense while still being “expensive” in energy because it keeps the package in a higher power state for most of the run.

Cross-cutting factor: ecosystem/libraries are part of the reality

Even if all four implementations follow the same logical pipeline, the actual work performed differs subtly via the choice of JSON parser, allocation and buffering strategies, compression codec implementations, and the maturity of each ecosystem’s Parquet writer.

From an engineering standpoint, this is not a bug in the comparison—it reflects real decisions teams make. But it does mean our results should be interpreted as language + typical ecosystem for this ingestion task, not “language runtime in isolation”.


Threats to validity (what could bias these results?)

All measurements come from a single 8-logical-CPU machine, which limits hardware generality. Package energy is a proxy that excludes storage, network, and whole-system power, which matters most for IO-heavy variants of this workload. We used a single one-hour dataset slice and one implementation per language, so library choices are entangled with the language itself. Finally, OS scheduling, background load, and thermal state add run-to-run noise that randomization and repetition mitigate but do not eliminate.


Reproducibility checklist (what to report in your own replication)

If you replicate or extend our experiment, we recommend recording: hardware details (CPU model, logical core count, total memory), OS version and background-load conditions, the energy domains measured and the sampling tool with its version, the number of repetitions and run ordering, the exact dataset slice and download date, and the versions of all parsing, compression, and Parquet libraries.


Replication package

We provide a replication package with Dockerfiles, scripts, and the measurement outputs used in this report:

At a high level, you can reproduce the measurement matrix by building the per-language Docker images, running each implementation under energibridge for the full set of randomized repetitions, and post-processing the resulting per-run CSVs into the summary statistics reported above.

Minimal reproduction steps

From the repo root:

To ensure you use the same dataset slice as this report, set the hardcoded download date in each language to the date used here, 2026-02-19.


Conclusion

For a strict-validation ingestion pipeline on a one-hour GH Archive slice (2026-02-19), we observed large differences in package energy across languages: Rust (~58.7 J) and Python (~62.4 J) were markedly more efficient than Go (~101.9 J) and Java (~114.9 J), with the gaps driven by runtime length for Go and by high power draw for Java.

The practical takeaway is that language choice affects both how quickly the pipeline finishes and how much power it draws while doing so. For teams operating large-scale ingestion workloads, these differences can compound, making language/runtime decisions relevant not only for performance but also for energy and sustainability goals.
