I Tried Federated Learning, and Why You Should Too!

Duong Mai

9/25/2025 · 2 min read


What is Federated Learning?

According to Google and the Finnish Center for Artificial Intelligence, Federated Learning (FL) is a decentralized machine learning technique that allows local models to be trained without exchanging raw data. Instead of centralizing all sensitive data in one place, each participant or node, such as a hospital, smartphone user, or data center, trains a model on its own data and sends only the resulting parameters to a shared global model.
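
To make that flow concrete, here is a minimal sketch of one federated round using simple parameter averaging (often called FedAvg). This is a generic illustration of the idea, not the setup used in my project; the linear model and the helper names are my own assumptions.

```python
# A minimal, generic sketch of the federated idea: each node fits a model
# on its own data and shares only the parameters, never the raw data.
import numpy as np

def local_train(X, y):
    # Hypothetical local step: fit a linear model w via least squares.
    # Only w leaves the node; the raw (X, y) stays private.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def federated_round(nodes):
    # Server-side aggregation: average the parameters (FedAvg-style),
    # weighted by each node's sample count.
    params = [local_train(X, y) for X, y in nodes]
    sizes = np.array([len(y) for _, y in nodes], dtype=float)
    return np.average(params, axis=0, weights=sizes)

# Two simulated nodes, each holding private data.
rng = np.random.default_rng(0)
nodes = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
global_w = federated_round(nodes)
```

The key point is in the aggregation step: the server only ever sees parameter vectors, so collaboration happens without any node revealing its data.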

What makes Federated Learning special?

Acquiring data is undeniably expensive, particularly personal and sensitive information. Traditional approaches often require data to be collected, stored, and processed centrally; however, this comes with some caveats:

  • Scalability issues: moving large volumes of data across servers is costly and time-consuming

  • Regulatory barriers: centralizing and sharing data is harder under data-protection regulations such as the GDPR

Federated Learning can change the game by training models locally and sharing only model updates or parameters, combining the power of collaboration with privacy compliance.

My application of Federated Learning & findings summary

I applied this to the problem of predicting problematic internet use (PIU). The data comes from the Child Mind Institute and was obtained from Kaggle*. The dataset suits a federated learning approach because it represents personal data; however, due to limited labeled measurements, I had to use K-Means clustering to simulate nodes acting as personal devices.
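
As a rough sketch of that node simulation (using synthetic stand-in arrays, since the real features come from the Kaggle dataset), each K-Means cluster plays the role of one "personal device" that holds only its own slice of the data:

```python
# Simulating federated nodes with K-Means when the data is not naturally
# split by device. Shapes and values here are hypothetical stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))   # stand-in for participant features
y = rng.normal(size=500)        # stand-in for the PIU target

# Cluster participants into k groups; each cluster becomes one node.
k = 5
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
nodes = [(X[labels == i], y[labels == i]) for i in range(k)]
```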

I established a full FL network in which each node trains a DecisionTreeRegressor() locally and exchanges updates via FedRelax(), then compared the mean squared error (MSE). The findings showed an edge for the federated learning model: it reduced the loss by 20% on the validation set, indicating a more robust model that benefits from individual training and "message" exchange between nodes.
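
For flavor, below is a loose reconstruction of what one FedRelax-style round could look like for tree models. This is a sketch under my own assumptions (synthetic data, a shared unlabeled feature set, and an agreement weight alpha), not the actual FedRelax() implementation used in the project. Since decision trees have no parameter vectors to average, the nodes here are coupled through their predictions on the shared set instead.

```python
# A loose, simplified sketch of one FedRelax-style round (hedged: the
# real FedRelax() may differ in its details). Nodes exchange "messages"
# in the form of predictions on a shared unlabeled feature set.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-ins: a few simulated nodes and a shared unlabeled set.
nodes = [(rng.normal(size=(80, 6)), rng.normal(size=80)) for _ in range(5)]
X_shared = rng.normal(size=(100, 6))

def fedrelax_round(nodes, models, X_shared, alpha=0.5):
    # Each node's current "message": its predictions on the shared set.
    preds = [m.predict(X_shared) for m in models]
    new_models = []
    for i, (X, y) in enumerate(nodes):
        # Pseudo-labels from the other nodes' averaged predictions.
        pseudo_y = np.mean([p for j, p in enumerate(preds) if j != i], axis=0)
        # Retrain on local data plus down-weighted pseudo-labeled data,
        # which nudges neighboring models toward agreement.
        X_aug = np.vstack([X, X_shared])
        y_aug = np.concatenate([y, pseudo_y])
        w = np.concatenate([np.ones(len(y)), alpha * np.ones(len(pseudo_y))])
        tree = DecisionTreeRegressor(max_depth=4, random_state=0)
        new_models.append(tree.fit(X_aug, y_aug, sample_weight=w))
    return new_models

# Initialize each node on its own data, then run a few rounds.
models = [DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
          for X, y in nodes]
for _ in range(3):
    models = fedrelax_round(nodes, models, X_shared)
```

Coupling through predictions rather than parameters is one common way to federate non-parametric models such as decision trees, since there is no weight vector to average.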

The results demonstrated that a privacy-preserving model does not have to hurt accuracy; it can even deliver competitive performance. The project was, however, conducted at a conceptual scale, and it still offers valuable insights into the potential of federated learning for handling well-being data. With a small dataset, the model tended to overfit, which more data and greater computational capacity would help alleviate.

Read the full report:

* The data is part of a Child Mind Institute competition that had ended before the project's start date. The project is dedicated to educational and research purposes.