Difference between revisions of "Discussion of: Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters"

* Presentation slides: [https://jeffersonlab-my.sharepoint.com/:p:/r/personal/xmei_jlab_org/Documents/DVFS%20paper.pptx?d=w274fc7b784fb4043a67e544e321dda58&csf=1&web=1&e=gXXNCe DVFS on GPU clusters]
* Related pubs from other institutions:
** Work by Google DeepMind in 2016: https://www.deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-by-40, https://www.datacenterknowledge.com/archives/2016/07/19/google-cuts-its-giant-electricity-bill-with-deepmind-powered-ai
** Zeus, how to save 13-75% energy with real AI workloads: https://ml.energy/zeus/
** NVIDIA's DPU technique (on network): https://blogs.nvidia.com/blog/2022/11/03/bluefield-dpus-energy-efficiency/
* Some numbers quoted from the resources above:
** “Google said it used 4,402,836 MWh of electricity in 2014, equivalent to the average yearly consumption of about 366,903 US family homes. Saving a few percentage points of electricity usage means major financial gains for Google. Typical electricity prices companies pay in the US range from about $25 to $40 per MWh.”
** "For instance, training the GPT-3 model consumes 1,287 megawatt-hour(MWh), which is equivalent to 120 years of electricity consumption for an average U.S. household."
** "NVIDIA estimates data centers could save a whopping 19 terawatt-hours of electricity a year if all AI, HPC and networking offloads were run on GPU and DPU accelerators (see the charts below). That’s the equivalent of the energy consumption of 2.9 million passenger cars driven for a year."
* Number from TOP500 list (https://www.top500.org/lists/top500/list/2022/11/): Frontier is running at 21,100 kW, i.e. ~185 million kWh per year (21,100 kW * 24 h * 365 days). At a unit cost of, say, 1 ct/kWh (which is extremely low), that is about 1.85 million $ every year.
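
The Frontier estimate can be re-checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, the power draw is assumed constant all year, and the 1 ct/kWh price is the deliberately low placeholder used in the bullet above.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h, ignoring leap years


def annual_energy_kwh(power_kw):
    """Energy used in one year at a constant power draw (kW -> kWh)."""
    return power_kw * HOURS_PER_YEAR


def annual_cost_usd(power_kw, usd_per_kwh):
    """Yearly electricity bill for a constant power draw."""
    return annual_energy_kwh(power_kw) * usd_per_kwh


frontier_kw = 21_100  # TOP500 power figure quoted above
energy = annual_energy_kwh(frontier_kw)
cost = annual_cost_usd(frontier_kw, 0.01)  # assumed price: 1 ct/kWh

print(f"{energy:,} kWh per year")   # 184,836,000 kWh per year
print(f"${cost:,.0f} per year")     # $1,848,360 per year
```

With the $25 to $40 per MWh range from the quote above (2.5 to 4 ct/kWh), the same function gives roughly 4.6 to 7.4 million $ per year.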

Revision as of 20:36, 18 January 2023

David

  • pg. 4089: does "energy-prior" mean "energy-priority", and "deadline-prior" mean "deadline-priority"?
  • "theta describes how much we can sacrifice the runtime energy for a shorter make-span and less occupied servers."
    • "We also introduce the theta parameter to discard the minimum E run for reducing E idle when appropriate."
  • Figure 4: the 36% energy efficiency is only achieved in the simulation where the GPU frequency scaling range is unrealistically wide
    • Do we know why this is limited? (by Cissie) My answer is no, but the limit is becoming even stricter from generation to generation.
    • Does the efficiency improvement need the full range, or does it always go to the edge?
  • No plots showing the optimal frequency and voltage as determined by the algorithm -- (by Cissie) Fig. 4 shows the optimized normalized frequency/voltage setting for a single task without scheduling, if I understood the question correctly.
    • (e.g. did it always push these to a limit?) -- (by Cissie) Most of them do. I have an early publication at http://www.comp.hkbu.edu.hk/~chxw/papers/hotpower_2013.pdf which reports real data, but only on an old Fermi architecture. The best strategy in general is to use the lowest core voltage and the maximum core frequency allowed at that voltage.
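
A toy calculation can illustrate why "lowest voltage, highest frequency allowed at that voltage" tends to win. It is a sketch only: it assumes the textbook CMOS-style model P = P_static + c * V^2 * f and a compute-bound task whose runtime scales as 1/f, which is not necessarily the model used in the paper. Every constant and the voltage-to-frequency table are hypothetical, not measured GPU data.

```python
# Textbook CMOS-style model; all constants are hypothetical, not measured.
P_STATIC = 20.0   # W, frequency-independent power (assumed)
C_EFF = 100.0     # W / (V^2 * GHz), effective switching coefficient (assumed)
WORK = 1000.0     # giga-cycles needed by one compute-bound task (assumed)

# Hypothetical table: core voltage (V) -> maximum stable core frequency (GHz).
F_MAX = {0.8: 0.6, 0.9: 0.8, 1.0: 1.0, 1.1: 1.15}


def task_energy(volt, freq_ghz):
    """Energy in joules for one task at the given voltage and frequency."""
    power = P_STATIC + C_EFF * volt ** 2 * freq_ghz   # watts
    runtime = WORK / freq_ghz                         # seconds
    return power * runtime


def best_setting():
    """Return (voltage, frequency, energy) with the lowest energy, always
    using the maximum frequency that each voltage allows."""
    candidates = ((v, f, task_energy(v, f)) for v, f in F_MAX.items())
    return min(candidates, key=lambda s: s[2])


if __name__ == "__main__":
    for v, f in F_MAX.items():
        print(f"V={v:.2f} V, f={f:.2f} GHz -> {task_energy(v, f):,.0f} J")
    print("best:", best_setting())
```

In this model the energy is P_static * WORK / f + c * V^2 * WORK, so at a fixed voltage a higher frequency always lowers energy, and lowering the voltage shrinks the dynamic term quadratically. With a much larger static power the optimum could move to a higher voltage, so the outcome depends on the assumed constants.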

Cissie