Discussion of: Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters

 
** "For instance, training the GPT-3 model consumes 1,287 megawatt-hour(MWh), which is equivalent to 120 years of electricity consumption for an average U.S. household."
 
** "For instance, training the GPT-3 model consumes 1,287 megawatt-hour(MWh), which is equivalent to 120 years of electricity consumption for an average U.S. household."
 
** "NVIDIA estimates data centers could save a whopping 19 terawatt-hours of electricity a year if all AI, HPC and networking offloads were run on GPU and DPU accelerators (see the charts below). That’s the equivalent of the energy consumption of 2.9 million passenger cars driven for a year."
 
** "NVIDIA estimates data centers could save a whopping 19 terawatt-hours of electricity a year if all AI, HPC and networking offloads were run on GPU and DPU accelerators (see the charts below). That’s the equivalent of the energy consumption of 2.9 million passenger cars driven for a year."
* Number from the TOP500 list (https://www.top500.org/lists/top500/list/2022/11/): Frontier draws 21,100 kW, i.e. roughly 185 million kWh per year (21,100 * 24 * 365); at a unit cost of, say, 1 ct/kWh (which is extremely low) that comes to about $1.85 million every year. A quick back-of-the-envelope check is sketched below.
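As a sanity check on the figures quoted above, the sketch below redoes the arithmetic. The average U.S. household consumption of roughly 10,700 kWh/year is an assumed value (a typical EIA-style figure), not taken from the paper; the other numbers come directly from the bullets.

<pre>
# Back-of-the-envelope check of the energy numbers quoted above.
# Assumption: an average U.S. household uses roughly 10,700 kWh of electricity per year.

GPT3_TRAINING_MWH = 1287            # quoted GPT-3 training energy
HOUSEHOLD_KWH_PER_YEAR = 10_700     # assumed average U.S. household consumption

household_years = GPT3_TRAINING_MWH * 1000 / HOUSEHOLD_KWH_PER_YEAR
print(f"GPT-3 training ~= {household_years:.0f} household-years of electricity")  # ~120

FRONTIER_KW = 21_100                # Frontier power draw from the TOP500 list
HOURS_PER_YEAR = 24 * 365
COST_PER_KWH_USD = 0.01             # the deliberately low 1 ct/kWh unit cost from the note

kwh_per_year = FRONTIER_KW * HOURS_PER_YEAR       # ~185 million kWh
cost_per_year = kwh_per_year * COST_PER_KWH_USD   # ~1.85 million USD
print(f"Frontier: {kwh_per_year / 1e6:.0f} million kWh/year, "
      f"~${cost_per_year / 1e6:.2f}M/year at 1 ct/kWh")
</pre>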


David

  • pg. 4089: should "energy-prior" read "energy-priority", and "deadline-prior" read "deadline-priority"?
  • "theta describes how much we can sacrifice the runtime energy for a shorter make-span and less occupied servers."
    • "We also introduce the theta parameter to discard the minimum E run for reducing E idle when appropriate."
  • Figure 4: the 36% energy efficiency is only achieved in the simulation where the GPU frequency-scaling range is unrealistically wide
    • Do we know why this range is limited? (by Cissie) My answer is no, but the allowed range is becoming even stricter from generation to generation.
    • Does the efficiency improvement need the full range, or does it always go to the edge?
  • No plots showing the optimal frequency and voltage as determined by the algorithm -- (by Cissie) Fig. 4 shows the optimized normalized frequency/voltage setting for a single task without scheduling, if that answers the question.
    • (e.g. did it always push these to a limit?) -- (by Cissie) Most of them do. I have an early publication at http://www.comp.hkbu.edu.hk/~chxw/papers/hotpower_2013.pdf that discusses real data, though only on an old Fermi architecture. The best strategy in general is to use the lowest core voltage together with the maximum core frequency allowed at that voltage (see the selection sketch after this list).
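The theta quotes above can be illustrated with a small, purely hypothetical sketch: it shows one way a tolerance parameter could let a scheduler accept a larger E_run in exchange for a shorter runtime and therefore less base energy spent while the server stays occupied. This is not the paper's algorithm; all names and numbers below are invented for illustration.

<pre>
# Hypothetical illustration of a theta-style tolerance (NOT the paper's actual scheme).
# Among candidate frequency settings, E_run may exceed its minimum by a factor of
# (1 + theta) if that buys a shorter runtime and hence less base energy drawn while
# the server is occupied by the task.

from dataclasses import dataclass

@dataclass
class Candidate:
    freq_ghz: float   # candidate core frequency
    e_run_j: float    # task runtime energy at this setting
    runtime_s: float  # task runtime at this setting

def pick_setting(candidates, theta, base_power_w):
    """Minimize E_run plus base energy drawn while the server is occupied,
    while keeping E_run within (1 + theta) of the minimum achievable E_run."""
    e_run_min = min(c.e_run_j for c in candidates)
    feasible = [c for c in candidates if c.e_run_j <= (1 + theta) * e_run_min]
    return min(feasible, key=lambda c: c.e_run_j + base_power_w * c.runtime_s)

# Toy usage: the faster setting costs more E_run but occupies the server for less time.
cands = [Candidate(1.0, 100.0, 10.0), Candidate(1.4, 115.0, 7.0)]
print(pick_setting(cands, theta=0.2, base_power_w=10.0))  # -> the 1.4 GHz candidate
</pre>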
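Cissie's rule of thumb in the last bullet (lowest core voltage, maximum core frequency allowed at that voltage) can also be written down directly, as below. The voltage-to-frequency table is invented for illustration; real GPUs expose their supported voltage/frequency pairs through vendor-specific management interfaces.

<pre>
# Sketch of the rule of thumb above: among the supported operating points, choose the
# lowest core voltage that can still deliver the required frequency, and run at the
# maximum frequency that this voltage allows (never lower).
# The table below is hypothetical, not real hardware data.

# core voltage (V) -> maximum stable core frequency (MHz) at that voltage
VF_TABLE = {
    0.80: 1100,
    0.85: 1250,
    0.90: 1380,
    1.00: 1530,
}

def pick_operating_point(vf_table, min_freq_mhz):
    """Return (voltage, frequency): the lowest voltage whose maximum frequency
    meets the requirement, paired with that maximum frequency."""
    for v in sorted(vf_table):
        if vf_table[v] >= min_freq_mhz:
            return v, vf_table[v]
    raise ValueError("required frequency not reachable at any supported voltage")

# Example: a deadline that needs at least 1200 MHz.
print(pick_operating_point(VF_TABLE, min_freq_mhz=1200))  # -> (0.85, 1250)
</pre>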

Cissie