Training a machine learning model to efficiently perform a task, such as image classification, involves showing the model thousands, millions, or even billions of example images. Collecting such huge datasets can be especially difficult when privacy is a concern, such as with medical images. Researchers at MIT and MIT-born startup DynamoFL have now taken a popular solution to this problem, known as federated learning, and made it faster and more accurate.
Federated learning is a collaborative method of training a machine learning model that keeps sensitive user data private. Hundreds or thousands of users each train their own model using their own data on their own device. Then users upload their models to a central server, which combines them into a better model and sends it back to all users.
A collection of hospitals around the world, for example, could use this method to train a machine learning model that identifies brain tumors in medical images, while securing patient data on their local servers.
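The combining step on the central server can be illustrated with a minimal sketch. It assumes each user's model is a dictionary of parameter lists and that the server weights each user by the size of their local dataset, a common convention that the article does not specify. The names `federated_average`, `hospital_a`, and `hospital_b` are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client model")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical hospitals; only model parameters leave each site, never the images.
hospital_a = {"layer1": [1.0, 2.0]}
hospital_b = {"layer1": [3.0, 6.0]}
global_model = federated_average([hospital_a, hospital_b], [100, 300])
```

Because the server sees only parameters, the raw patient data stays on each hospital's own servers.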
But federated learning has some drawbacks. Transferring a large machine learning model to and from a central server involves moving a lot of data, which has high communication costs, especially since the model has to be sent tens or even hundreds of times. Additionally, each user gathers their own data, so this data does not necessarily follow the same statistical patterns, which hampers the performance of the combined model. And because this combined model is made by taking an average, it is not personalized for each user.
The researchers developed a technique that can simultaneously address these three federated learning problems. Their method improves the accuracy of the combined machine learning model while dramatically reducing its size, which speeds up communication between users and the central server. It also ensures that each user receives a model that is more personalized for their environment, which improves performance.
The researchers were able to reduce the size of the model by almost an order of magnitude compared to other techniques, resulting in communication costs between four and six times lower for individual users. Their technique also increased the overall accuracy of the model by about 10%.
“Many papers have addressed one of the problems with federated learning, but the challenge was to put it all together. Algorithms that focus solely on personalization or communication efficiency do not provide a sufficient solution. We wanted to be sure that we could optimize for everything, so that this technique could be used in the real world,” says Vaikkunth Mugunthan PhD ’22, lead author of a paper that introduces this technique.
Mugunthan wrote the paper with his adviser, senior author Lalana Kagal, a senior researcher at the Computer Science and Artificial Intelligence Laboratory (CSAIL). The work will be presented at the European Conference on Computer Vision.
Cutting a model down to size
The system the researchers developed, called FedLTN, is based on an idea in machine learning known as the lottery ticket hypothesis. This hypothesis says that within very large neural network models, there are much smaller subnetworks that can achieve the same performance. Finding one of these subnetworks is akin to finding a winning lottery ticket. (LTN stands for “Lottery Ticket Network.”)
Neural networks, loosely based on the human brain, are machine learning models that learn to solve problems using interconnected layers of nodes, or neurons.
Finding a winning lottery ticket network is more complicated than a simple scratch-off. The researchers must use a process called iterative pruning. If the model’s accuracy is above a set threshold, they remove nodes and the connections between them (much like pruning branches off a bush), then test the leaner neural network to see if the accuracy remains above the threshold.
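The prune-and-test loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' implementation: it removes individual weights with the smallest magnitudes (the article does not say which criterion is used), it does no retraining between rounds, and both `iterative_prune` and the toy `evaluate` function are hypothetical stand-ins.

```python
from typing import Callable, List


def iterative_prune(
    weights: List[float],
    evaluate: Callable[[List[float], List[bool]], float],
    threshold: float,
    prune_fraction: float = 0.2,
) -> List[bool]:
    """Return a keep-mask: prune in rounds while accuracy stays at or above threshold."""
    mask = [True] * len(weights)
    while True:
        alive = [i for i, keep in enumerate(mask) if keep]
        n_remove = int(len(alive) * prune_fraction)
        if n_remove == 0:
            break  # nothing left to remove at this pruning rate
        # Assumption: drop the smallest-magnitude surviving weights first.
        alive.sort(key=lambda i: abs(weights[i]))
        candidate = list(mask)
        for i in alive[:n_remove]:
            candidate[i] = False
        if evaluate(weights, candidate) < threshold:
            break  # the leaner network fell below the threshold; keep the last good mask
        mask = candidate
    return mask


def evaluate(weights: List[float], mask: List[bool]) -> float:
    """Toy accuracy proxy (hypothetical): share of total weight magnitude still kept."""
    total = sum(abs(w) for w in weights)
    kept = sum(abs(w) for w, keep in zip(weights, mask) if keep)
    return kept / total


weights = [0.05, -0.9, 0.02, 0.7, -0.01, 0.6, 0.03, -0.8, 0.04, 0.5]
mask = iterative_prune(weights, evaluate, threshold=0.95)
```

In a real system `evaluate` would measure accuracy on held-out data, and the pruned network would typically be trained further between rounds.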
Other methods have used this pruning technique for federated learning to create smaller machine learning models that could be transferred more efficiently. But while these methods can speed things up, model performance suffers.
Mugunthan and Kagal applied a few new techniques to speed up the pruning process while making the new, smaller models more accurate and personalized for each user.
They sped up the pruning by avoiding a step where the remaining parts of the pruned neural network are “rewound” to their original values. They also trained the model before pruning it, which makes it more accurate so it can be pruned at a faster rate, says Mugunthan.
To make each model more personalized for the user’s environment, they were careful not to remove network layers that capture important statistical information about that user’s specific data. Moreover, when the models were all combined, they used information stored in the central server so that they did not start from scratch with each communication cycle.
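One way to picture pruning that spares certain layers is the sketch below. It is a simplified illustration, not FedLTN itself: the layer names (`conv1`, `norm1`), the choice of which layer to protect, and the function `prune_unprotected` are all assumptions made for the example.

```python
from typing import Dict, List, Set


def prune_unprotected(
    layers: Dict[str, List[float]], protected: Set[str], fraction: float
) -> Dict[str, List[float]]:
    """Zero out the smallest-magnitude weights in every layer except protected ones."""
    pruned: Dict[str, List[float]] = {}
    for name, values in layers.items():
        if name in protected:
            pruned[name] = list(values)  # left untouched: holds user-specific statistics
            continue
        n_remove = int(len(values) * fraction)
        order = sorted(range(len(values)), key=lambda i: abs(values[i]))
        drop = set(order[:n_remove])
        pruned[name] = [0.0 if i in drop else v for i, v in enumerate(values)]
    return pruned


# Hypothetical two-layer model; "norm1" stands in for a layer worth protecting.
model = {"conv1": [0.9, -0.01, 0.4, 0.02], "norm1": [0.001, 0.002]}
result = prune_unprotected(model, protected={"norm1"}, fraction=0.5)
```

Note that the protected layer survives even though its values are tiny, which a purely magnitude-based pruning pass would not guarantee.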
They also developed a technique to reduce the number of communication cycles for users with resource-limited devices, such as a smartphone on a slow network. These users start the federated learning process with a lightweight model that has already been optimized by a subset of other users.
Win big with lottery ticket networks
When they put FedLTN to the test in simulations, it led to better performance and lower communication costs across the board. In one experiment, a traditional federated learning approach produced a model that was 45 megabytes in size, while their technique generated a model with the same accuracy that was only 5 megabytes. In another test, a state-of-the-art technique required 12,000 megabytes of communication between users and the server to train a model, while FedLTN required only 4,500 megabytes.
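The reduction factors implied by these two experiments can be worked out directly. The snippet below uses only the figures quoted above; the variable names are illustrative, and the ratios apply to these two tests alone.

```python
# Model size in the first experiment (megabytes).
baseline_model_mb = 45
fedltn_model_mb = 5
size_ratio = baseline_model_mb / fedltn_model_mb

# Total communication in the second experiment (megabytes).
baseline_comm_mb = 12_000
fedltn_comm_mb = 4_500
comm_ratio = baseline_comm_mb / fedltn_comm_mb

print(f"Model size reduced {size_ratio:.0f}x")
print(f"Communication reduced {comm_ratio:.2f}x")
```

A ninefold reduction in model size is what is meant by "almost an order of magnitude."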
With FedLTN, the worst-performing clients still saw their performance increase by more than 10 percent. And the model’s overall accuracy beat the state-of-the-art personalization algorithm by nearly 10 percent, Mugunthan adds.
Now that they have developed and refined FedLTN, Mugunthan is working to integrate the technique into a federated learning startup he recently founded, DynamoFL.
In the future, he hopes to continue improving this method. For example, the researchers have had success using datasets that had labels, but a bigger challenge would be applying the same techniques to unlabeled data, he says.
Mugunthan hopes this work will inspire other researchers to rethink their approach to federated learning.
“This work shows the importance of thinking about these issues from a holistic perspective, not just individual metrics that need to be improved. Sometimes improving one metric can actually cause other metrics to deteriorate. Instead, we should focus on how we can improve a bunch of things together, which is really important if this is going to roll out to the real world,” he says.