Solving AI’s growing carbon emissions

How to make your AI model much greener

Aldrich Alvarez
7 min read · Feb 10, 2023

AI has become an integral part of our lives, but its growth in recent years has come at a cost. As AI systems become increasingly complex, their training and usage can generate large amounts of carbon emissions, contributing to global warming and climate change.

Problem

  • A study last year found that training a standard AI language-processing system produced 1,400 pounds of emissions, roughly the same as flying one person round trip between New York and San Francisco. If the full suite of experiments needed to build and train the system from scratch were run, this number could grow to 78,000 pounds.
  • Hugging Face estimated that training its large language model BLOOM produced 25 metric tons of carbon dioxide emissions. When the emissions from manufacturing the computing equipment and the energy required to actually run the model were taken into account, that figure doubled.
  • In June 2020, OpenAI unveiled GPT-3, then the largest language model in the world. The model can write creative fiction, translate legalese into plain English, and answer obscure trivia questions, but training it at its colossal size was estimated to cost $4.6 million and 355 years of single-GPU computing time.

The relationship between model accuracy and complexity is roughly logarithmic: an exponential increase in model size and training compute yields only linear improvements in performance. Despite this, far less priority is given to developing methods with improved efficiency. We need to be aware of the trade-off between accuracy, efficiency, and the model's carbon footprint, both during training and at inference time.
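
To see what this trade-off means in practice, consider a stylized model of the scaling curve (the constants a and b here are illustrative, not taken from any particular study):

```latex
% Diminishing returns under a logarithmic scaling curve (illustrative constants)
\mathrm{accuracy}(C) \;\approx\; a + b\,\log C
\qquad\Longrightarrow\qquad
\mathrm{accuracy}(2C) - \mathrm{accuracy}(C) \;\approx\; b \log 2
```

Every doubling of compute C, and with it roughly a doubling of energy and emissions, buys only the same fixed gain of b log 2. That is exactly why efficiency work pays off more and more as models grow.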

Carbon-aware training

Researchers from the University of Michigan created Zeus, a system designed to reduce the energy consumed and carbon emitted while training artificial intelligence (AI) systems.

Zeus adjusts the power consumption of the GPU in real time: when clean energy is available, it raises the power limit so training makes rapid progress, and when it is not, it slows down and consumes less energy. This can cut energy consumption and carbon emissions by up to 78%, without buying any additional hardware or compromising on performance.
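
To make the idea concrete, here is a minimal sketch of carbon-aware power capping. This is not the Zeus implementation, just the core loop described above, written against NVIDIA's management library. The get_carbon_intensity() helper and its threshold are hypothetical stand-ins for a real grid-carbon data feed.

```python
# Minimal sketch of carbon-aware GPU power capping (NOT the actual Zeus code).
# Requires the nvidia-ml-py package and admin rights to change power limits.

import time
import pynvml

CLEAN_THRESHOLD = 200.0  # gCO2/kWh below which we treat the grid as "clean" (assumed)

def get_carbon_intensity() -> float:
    """Hypothetical stand-in for a grid-carbon API (e.g., a regional
    electricity-data feed); returns carbon intensity in gCO2/kWh."""
    return 150.0  # placeholder value: replace with a real query

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
# Hardware-supported power-limit range, in milliwatts.
min_limit, max_limit = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(gpu)

try:
    while True:  # in practice, loop only for the duration of the training job
        if get_carbon_intensity() < CLEAN_THRESHOLD:
            # Clean energy available: raise the cap and make fast progress.
            pynvml.nvmlDeviceSetPowerManagementLimit(gpu, max_limit)
        else:
            # Dirty grid: lower the cap so training continues, but frugally.
            pynvml.nvmlDeviceSetPowerManagementLimit(gpu, min_limit)
        time.sleep(300)  # re-check the grid every five minutes
finally:
    pynvml.nvmlShutdown()
```

The real system goes further, co-tuning training parameters alongside the power limit, but the sketch shows why no new hardware is needed: the knob being turned already exists on every modern GPU.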

Cherry-picking relevant data

Researchers at the MIT-IBM Watson AI Lab have developed an approach to make AI models more efficient when recognizing actions in videos. They found that the human perceptual system is good at picking out the most important information, so they applied this concept to their models.

As an example, if a video clip showed someone making a sandwich, a policy network would selectively target certain frames, such as the knife slicing the roast beef or the meat being stacked on a slice of bread, to be represented in high resolution, while less significant frames would be skipped or represented at lower resolution. This abbreviated version of the video is then labeled by a second model as "making a sandwich".

According to the researchers, this approach classifies videos faster at roughly half the computational cost of the next-best model: by "cherry-picking" only the most relevant information, the models do far less redundant work.
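
The sketch below illustrates the selection step only, with a toy scoring network; it is not the MIT-IBM model, whose policy is trained jointly with the recognizer. All layer sizes and the thumbnail resolution are assumptions.

```python
# Toy sketch of "cherry-picking" frames for video classification.
# A lightweight policy network scores cheap low-resolution thumbnails;
# only the top-k frames are kept at full resolution for the main classifier.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FramePolicy(nn.Module):
    """Cheap per-frame importance scorer operating on downsampled frames."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.score = nn.Linear(16, 1)

    def forward(self, frames):                        # frames: (T, 3, H, W)
        small = F.interpolate(frames, size=(64, 64))  # look at thumbnails only
        feats = self.features(small).flatten(1)       # (T, 16)
        return self.score(feats).squeeze(-1)          # (T,) importance scores

def select_frames(frames, k=8):
    """Keep the k most important frames at full resolution."""
    policy = FramePolicy()  # in the real system this is trained, not random
    with torch.no_grad():
        scores = policy(frames)
    top = scores.topk(k).indices.sort().values  # preserve temporal order
    return frames[top]

video = torch.randn(64, 3, 224, 224)    # 64 frames of a toy "video"
key_frames = select_frames(video, k=8)  # only these go to the big recognizer
print(key_frames.shape)                 # torch.Size([8, 3, 224, 224])
```

The efficiency win comes from the asymmetry: the policy looks at every frame, but only at thumbnails, while the expensive recognizer sees just a handful of full-resolution frames.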

Using AI to design models

Researchers are using AI to create more efficient models through a process known as “neural architecture search”.

Diagram depicting neural architecture search

Neural Architecture Search (NAS) seeks to identify the optimal architecture for a neural network for a particular task. Because the architecture determines how much compute a model needs, a well-chosen design can save computing power both in training and in deployment. NAS essentially automates the manual trial-and-error of network design, learning what works best and allowing the discovery of architectures that human designers would be unlikely to find. A toy sketch of the search loop appears after the examples below.

  • For example, Song Han and his team used this method to create models that can understand language and recognize scenes, such as a driverless car recognizing road signs and pedestrians.
  • In one study, they tested 1,000 different model designs for recognizing three-dimensional scenes, and eventually settled on a design that is three times faster and uses eight times less computing power than the previous best method.
  • In another study, they applied a similar process to find the most efficient model design for machine translation, the task of converting text from one language to another. By searching for the best design first and only then training the model, they found a significant reduction in the amount of computing power needed.
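
Real NAS systems use reinforcement-learning controllers, evolutionary search, or gradient-based relaxations; as a minimal illustration of the loop itself, here is a random-search sketch over a small configuration space. The search space, the cost proxy, and the train_and_evaluate() stub are all assumptions for the example.

```python
# Minimal sketch of neural architecture search via random search.
# train_and_evaluate() is a hypothetical stub standing in for a short
# training run that returns validation accuracy.

import random

SEARCH_SPACE = {
    "depth":       [2, 4, 8],
    "width":       [64, 128, 256],
    "kernel_size": [3, 5, 7],
}

def sample_architecture():
    return {name: random.choice(opts) for name, opts in SEARCH_SPACE.items()}

def estimate_cost(arch):
    """Crude proxy for compute cost; real NAS uses FLOPs or measured latency."""
    return arch["depth"] * arch["width"] * arch["kernel_size"] ** 2

def train_and_evaluate(arch):
    """Hypothetical stub: briefly train the candidate, return val accuracy."""
    return random.random()  # replace with an actual short training run

best, best_score = None, float("-inf")
for _ in range(100):  # budgeted search: 100 candidate architectures
    arch = sample_architecture()
    acc = train_and_evaluate(arch)
    score = acc - 1e-7 * estimate_cost(arch)  # reward accuracy, penalize cost
    if score > best_score:
        best, best_score = arch, score

print("best architecture found:", best)
```

The key design choice is the score: by explicitly penalizing compute cost, the search is steered toward efficient architectures rather than just accurate ones.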

Lottery ticket hypothesis

The "lottery ticket hypothesis" suggests that every large model contains a small subnetwork that can be trained on its own, with far fewer weights, and still match the full model's performance. An algorithm can help find these subnetworks, known as "winning tickets", in models big and small.

What is the lottery ticket hypothesis?

The idea is that parts of a network can be pruned (removed) while achieving the same result as the unpruned network. Winning tickets are found through a process called iterative magnitude pruning: train the network, remove the weights with the smallest magnitudes, rewind the surviving weights to their initial values, and repeat. The resulting network is smaller, faster, and uses fewer resources, and the ticket can be reused, saving large amounts of computing power. The challenge is finding these tickets quickly: right now, the whole over-parameterized network must be trained even though only a small part of it is needed.
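
Here is a compressed sketch of a single round of iterative magnitude pruning in PyTorch. It is illustrative only: the layer sizes and 80% pruning rate are assumptions, train() is a stub for an ordinary training loop, and real experiments repeat the train-prune-rewind cycle several times.

```python
# One round of iterative magnitude pruning: train, prune the smallest
# weights, rewind survivors to their initial values, then retrain.

import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
initial_state = copy.deepcopy(model.state_dict())  # the "ticket" initialization

def train(model, steps=1000):
    """Hypothetical stub for a standard training loop on your dataset."""
    pass

train(model)  # 1. train the full, over-parameterized network

# 2. prune: mask the 80% of weights with the smallest magnitudes, per layer
masks = {}
for name, param in model.named_parameters():
    if "weight" in name:
        k = int(0.8 * param.numel())
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()

# 3. rewind surviving weights to their original initialization
model.load_state_dict(initial_state)
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks:
            param.mul_(masks[name])

# 4. retrain the sparse subnetwork (a full implementation would re-zero
#    pruned weights after every optimizer step to keep the mask in force)
train(model)
```

Step 3 is the crux of the hypothesis: the sparse subnetwork only trains well from its original initialization, which is why it is called a winning "ticket" rather than just a pruned model.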

The idea has gained wide acceptance, accumulating 400 citations in the last two years. Researchers have shown that winning tickets exist in language and reinforcement learning models, and that they can be transferred from one task to another. Finding a fast way to locate these tickets could save computing power and speed up the training of AI models.

“Once-For-All” system

Researchers created a new AI system called "Once-For-All" (OFA) to make training and deploying machine learning models more efficient. Instead of training a separate model for each device, the OFA system trains one "mother" network containing many smaller subnetworks. This mother network adapts to different devices, from small IoT devices to smartphones, by changing the size and structure of its subnetworks to fit each device's power and speed limits.

Comparison between OFA and state-of-the-art CNN models on ImageNet

The OFA network is trained using a “progressive shrinking” algorithm. It starts big and then shrinks the network to include smaller subnetworks. This helps each subnetwork learn from the larger ones, and supports multiple devices with no extra cost.

Progressive shrinking algorithm

In the end, the OFA network can contain more than 10 quintillion (about 10^19) subnetwork configurations, covering all the platforms needed. Training it once and searching it for subnetworks is much faster than training a separate network for each platform, and it delivers state-of-the-art accuracy on image-processing tasks, beating existing industry-leading models.
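
As a toy sketch of the deployment side of this idea (not MIT's actual code), the snippet below picks, for each device, the most accurate subnetwork that fits its latency budget. The elastic dimensions, predict_accuracy(), and predict_latency_ms() are hypothetical stand-ins for OFA's trained supernet and its learned accuracy and latency predictors.

```python
# Toy sketch of Once-For-All-style deployment: one trained "mother" network
# exposes many subnetwork configurations; each device gets the best one
# that satisfies its latency budget.

import itertools

# Elastic dimensions of the supernet: every combination is a deployable subnet.
DEPTHS, WIDTHS, KERNELS = [2, 3, 4], [0.75, 1.0, 1.25], [3, 5, 7]

def predict_accuracy(d, w, k):
    """Hypothetical proxy: bigger subnetworks score higher."""
    return 0.70 + 0.02 * d + 0.05 * w + 0.005 * k

def predict_latency_ms(d, w, k, device_speed):
    """Hypothetical proxy: cost grows with depth, width, and kernel size."""
    return d * w * k * 10.0 / device_speed

def pick_subnetwork(latency_budget_ms, device_speed):
    candidates = itertools.product(DEPTHS, WIDTHS, KERNELS)
    feasible = [c for c in candidates
                if predict_latency_ms(*c, device_speed) <= latency_budget_ms]
    # Most accurate subnetwork that still meets the budget (None if none fit).
    return max(feasible, key=lambda c: predict_accuracy(*c), default=None)

# The same supernet serves both a slow IoT device and a fast smartphone.
print("IoT device: ", pick_subnetwork(latency_budget_ms=50, device_speed=1.0))
print("smartphone: ", pick_subnetwork(latency_budget_ms=50, device_speed=4.0))
```

Because the subnetworks were all trained inside one mother network, this per-device selection is a cheap search rather than a fresh training run, which is where the energy savings come from.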

Implications and Outlook

The implications of this research are clear: AI models can be made more efficient and produce fewer carbon emissions without sacrificing accuracy or performance. Methods such as carbon-aware training, cherry-picking relevant data, using AI to design models, and the lottery ticket hypothesis let AI models do the same work with far less energy.

It is also important to note that the researchers have achieved these results with existing technology. As innovation in the computer hardware field continues, it may be possible to further reduce AI’s carbon footprint. For example, with the introduction of more powerful GPUs and more efficient algorithms, AI models could be trained more quickly with less energy.

The research presented in this article is still in its early stages, and there are still many questions to be answered. For example:

  • What are the best practices for reducing the carbon emissions of AI models?
  • How can we optimize the training process to reduce energy requirements?
  • How will the introduction of new hardware affect the efficiency and performance of AI models?

The future of AI is likely to be one in which the energy and carbon footprints of AI models are minimized, while still achieving the desired results. It is up to us to ensure that this future is realized and that AI is used in a way that is beneficial to both the environment and society.

Accelerating the development of AGI

At the same time, research into AI should also focus on improving its overall efficiency, which could lead to the development of AGI (Artificial General Intelligence). Such an AI system would be able to complete complex tasks and learn new skills quickly, without the need for massive amounts of computing power and energy. This could be achieved through methods such as transfer learning and meta-learning, which allow a system to learn from previous tasks and transfer that knowledge to new ones. Additionally, research into new computing paradigms, such as neuromorphic computing, could enable AI systems to learn more efficiently. Ultimately, by improving the efficiency of AI systems, we can reduce their carbon footprint and accelerate the development of AGI.

Conclusion

AI systems are increasingly complex, and their training and usage can generate large amounts of carbon emissions. A number of methods can reduce the carbon footprint of AI models, including carbon-aware training, cherry-picking relevant data, using AI to design models, and the lottery ticket hypothesis. Additionally, research into AI should focus on improving its overall efficiency, which would both shrink its carbon footprint and accelerate the development of AGI.


Aldrich Alvarez

Passionate about using technology to solve our biggest problems and make the world a better place.