Speakers
Speaker information will continue to be updated as we get closer to the workshop.
Adam Smith, Boston University
- Differentially Private Covariance-Adaptive Mean Estimation
- Adam Smith is a professor of computer science at Boston University. From 2007 to 2017, he served on the faculty of the Computer Science and Engineering Department at Penn State. His research interests lie in data privacy and cryptography, and their connections to machine learning, statistics, information theory, and quantum computing. He obtained his Ph.D. from MIT in 2004 and has held postdoc and visiting positions at the Weizmann Institute of Science, UCLA, Boston University and Harvard. He received a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2009; a Theory of Cryptography Test of Time award in 2016; the Eurocrypt 2019 Test of Time award; and the 2017 Gödel Prize.
- Covariance-adaptive mean estimation is a fundamental problem in statistics, where we are given n i.i.d. samples from a d-dimensional distribution with mean $\mu$ and covariance $\Sigma$ and the goal is to find an estimator $\hat\mu$ with small error $\|\hat\mu-\mu\|_{\Sigma}\leq \alpha$, where $\|\cdot\|_{\Sigma}$ denotes the Mahalanobis distance. (We call this "covariance-adaptive" since the accuracy metric depends on the data distribution.)
It is known that the empirical mean of the dataset achieves this guarantee if we are given at least $n=\Omega(d/\alpha^2)$ samples. Unfortunately, the empirical mean and other statistical estimators can reveal sensitive information about the samples of the training dataset. To protect the privacy of the individuals who participate in the dataset, we study statistical estimators which satisfy differential privacy, a condition that has become a standard criterion for individual privacy in statistics and machine learning.
We present two new differentially private mean estimators for d-dimensional (sub)Gaussian distributions with unknown covariance whose sample complexity is optimal up to logarithmic factors and matches the non-private one in many parameter regimes. Previous estimators with the same guarantee either require strong a priori bounds on the covariance matrix or require $\Omega(d^{3/2})$ samples.
Based on the paper https://arxiv.org/pdf/2106.13329.pdf, which will appear as a spotlight paper at NeurIPS 2021 and is joint work with Gavin Brown, Marco Gaboardi, Jonathan Ullman, and Lydia Zakynthinou.
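To make the non-private baseline above concrete, here is a small numerical sketch (illustrative only, and not the paper's private estimator) checking that the empirical mean's Mahalanobis error concentrates around $\sqrt{d/n}$, consistent with the $n=\Omega(d/\alpha^2)$ sample complexity:

```python
# Numerical sketch of the non-private baseline from the abstract: the
# empirical mean's Mahalanobis error scales like sqrt(d/n). Illustrative
# only; this is not the paper's differentially private estimator.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 5000
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)          # a random positive-definite covariance

samples = rng.multivariate_normal(mu, Sigma, size=n)
mu_hat = samples.mean(axis=0)        # empirical mean

# Mahalanobis error ||mu_hat - mu||_Sigma = sqrt((mu_hat-mu)^T Sigma^{-1} (mu_hat-mu))
diff = mu_hat - mu
err = np.sqrt(diff @ np.linalg.solve(Sigma, diff))
print(f"Mahalanobis error: {err:.4f}  (compare sqrt(d/n) = {np.sqrt(d/n):.4f})")
```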
Martin Jaggi, EPFL
- Federated Learning with Strange Gradients
- Martin Jaggi is a Tenure Track Assistant Professor at EPFL, heading the Machine Learning and Optimization Laboratory. Before that, he was a postdoctoral researcher at ETH Zurich, at the Simons Institute in Berkeley, and at École Polytechnique in Paris. He earned his PhD in Machine Learning and Optimization from ETH Zurich in 2011, and an MSc in Mathematics also from ETH Zurich.
- Collaborative learning methods such as federated learning are enabling many promising new applications for machine learning while respecting users' privacy. In this talk, we discuss recent gradient-based methods, specifically in cases where the exchanged gradients violate the common unbiasedness assumption and differ from those of our target objective. We address three applications: 1) federated learning in the realistic setting of heterogeneous data, 2) personalization of collaboratively learned models to each participant, and 3) learning with malicious or unreliable participants, in the sense of Byzantine-robust training. For these applications, we demonstrate that algorithms with rigorous convergence guarantees can still be obtained and are practically feasible.
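As one concrete instance of the Byzantine-robust setting mentioned above, a server can replace mean aggregation with a robust rule such as the coordinate-wise median; the sketch below is a generic textbook illustration, not one of the talk's specific algorithms:

```python
# Coordinate-wise median aggregation: a standard robust rule for the
# Byzantine setting. A minority of arbitrarily corrupted gradients cannot
# drag the aggregate arbitrarily far, unlike with a plain mean.
# (Generic illustration, not the talk's algorithms.)
import numpy as np

def robust_aggregate(client_grads: np.ndarray) -> np.ndarray:
    """client_grads: (num_clients, dim) array of possibly corrupted gradients."""
    return np.median(client_grads, axis=0)

honest = np.random.default_rng(1).normal(loc=1.0, size=(8, 4))
byzantine = np.full((2, 4), 1e6)       # two malicious clients send garbage
grads = np.vstack([honest, byzantine])
print(robust_aggregate(grads))         # stays near the honest mean of ~1.0
```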
Mosharaf Chowdhury, University of Michigan- Systems Support for Federated Computation
- Mosharaf Chowdhury is a Morris Wellman assistant professor of CSE at the University of Michigan, Ann Arbor, where he leads the SymbioticLab on application-infrastructure co-design for federated learning, resource disaggregation, and systems for AI and Big Data. In the past, Mosharaf invented coflows and was a co-creator of Apache Spark. Artifacts from his research are widely used in cloud datacenters. He has received many individual honors and awards as well as best-of-conference awards thanks to his amazing students and collaborators. He received his Ph.D. from the AMPLab at UC Berkeley in 2015.
- Although theoretical federated learning research is growing exponentially, we are far from putting those theories into practice. In this talk, I will share our ventures into building practical systems for two extremes of federated learning and analytics. Sol is a cross-silo federated computation system that tackles network latency and bandwidth challenges faced by distributed computation between far-apart data sites. Oort, in contrast, is a cross-device federated learning system that enables training and testing on representative data distributions despite unpredictable device availability. Both deal with systems and network characteristics in the wild that are hard to account for in analytical models. I'll then share the challenges in systematically evaluating federated learning systems that have led to a disconnect between theoretical conclusions and performance in the wild. I'll conclude this talk by introducing FedScale, which is an extensible framework for evaluation and benchmarking in realistic settings to democratize practical federated learning for researchers and practitioners alike. All these systems are open-source and available at https://github.com/symbioticlab.
Rachel Cummings, Columbia University- Mean Estimation with User-level Privacy under Data Heterogeneity (joint work with Vitaly Feldman, Audra McMillan, and Kunal Talwar)
- Dr. Rachel Cummings is an Assistant Professor of Industrial Engineering and Operations Research at Columbia University. Before joining Columbia, she was an Assistant Professor of Industrial and Systems Engineering and (by courtesy) Computer Science at the Georgia Institute of Technology. Her research interests lie primarily in data privacy, with connections to machine learning, algorithmic economics, optimization, statistics, and public policy. Her work has focused on problems such as strategic aspects of data generation, incentivizing truthful reporting of data, privacy-preserving algorithm design, impacts of privacy policy, and human decision-making. Dr. Cummings received her Ph.D. in Computing and Mathematical Sciences from the California Institute of Technology, her M.S. in Computer Science from Northwestern University, and her B.A. in Mathematics and Economics from the University of Southern California. She is the recipient of an NSF CAREER award, a DARPA Young Faculty Award, an Apple Privacy-Preserving Machine Learning Award, a JP Morgan Chase Faculty Award, a Google Research Fellowship for the Simons Institute program on Data Privacy, a Mozilla Research Grant, the ACM SIGecom Doctoral Dissertation Honorable Mention, the Amori Doctoral Prize in Computing and Mathematical Sciences, a Caltech Leadership Award, a Simons Award for Graduate Students in Theoretical Computer Science, and the Best Paper Award at the 2014 International Symposium on Distributed Computing. Dr. Cummings also serves on the ACM U.S. Public Policy Council's Privacy Committee and the Future of Privacy Forum's Advisory Board.
- A key challenge for data analysis in the federated setting is that user data is heterogeneous, i.e., it cannot be assumed to be sampled from the same distribution. Further, in practice, different users may possess vastly different numbers of samples. In this work we propose a simple model of heterogeneous user data that differs in both distribution and quantity of data, and we provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator within a natural class of private estimators and also prove general lower bounds on the error achievable in our problem. We will conclude with a discussion of future challenges and possible extensions for learning from heterogeneous populations in the federated setting.
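For context, a common baseline for user-level privacy (not the estimator proposed in this work, whose point is to improve on such baselines under heterogeneity) clips each user's local mean and adds Gaussian noise calibrated to the user-level sensitivity of the average. A minimal sketch, with illustrative names and parameters:

```python
# Textbook user-level DP baseline (not the paper's estimator): clip each
# user's local mean to norm C, average the clipped means, and add Gaussian
# noise scaled to the replace-one-user sensitivity of that average.
import numpy as np

def user_level_dp_mean(user_samples, clip=1.0, noise_mult=1.0, rng=None):
    """user_samples: list of (n_i, d) arrays, one per user (n_i may vary)."""
    rng = rng or np.random.default_rng()
    local_means = []
    for x in user_samples:
        m = x.mean(axis=0)
        m = m * min(1.0, clip / (np.linalg.norm(m) + 1e-12))  # per-user clipping
        local_means.append(m)
    avg = np.mean(local_means, axis=0)
    sensitivity = 2 * clip / len(user_samples)  # replacing one user moves avg <= 2C/m
    return avg + rng.normal(scale=noise_mult * sensitivity, size=avg.shape)
```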
Brendan McMahan, Google
- Brendan McMahan has worked in the fields of online learning, large-scale convex optimization, and reinforcement learning. He received his Ph.D. in computer science from Carnegie Mellon University. Brendan is currently a researcher at Google, focusing on decentralized and privacy-preserving machine learning. Brendan's team pioneered the concept of federated learning and continues to push the boundaries of what is possible when working with decentralized data using privacy-preserving techniques.
Daniel Ramage, Google
- Daniel Ramage has worked in the fields of natural language processing, machine intelligence, and mobile systems. He received his Ph.D. from Stanford University. Daniel is currently a researcher at Google, focusing on decentralized and privacy-preserving machine learning. Daniel's team pioneered the concept of federated learning and continues to push the boundaries of what is possible when working with decentralized data using privacy-preserving techniques.
Françoise Beaufays, Google
- Françoise Beaufays is a Distinguished Scientist at Google, where she leads a team of engineers and researchers working on speech recognition and mobile keyboard input. Her area of expertise covers deep learning, language modeling and other technologies related to natural language processing, with a recent focus on privacy-preserving on-device learning. Françoise studied Mechanical and Electrical Engineering in Brussels, Belgium. She holds a PhD in Electrical Engineering and a PhD minor in Italian Literature, both from Stanford University.
Hubert Eichner, Google
- Hubert Eichner received his PhD in Theoretical Neuroscience from the Max Planck Institute of Neurobiology, then joined Microsoft to work on embedded speech recognition. He is currently serving as overall tech lead for Google's production Federated Learning platform.
Kallista Bonawitz, Google
- Kallista Bonawitz previously led the planning, simulation, and control team for Project Loon at Alphabet's X and co-founded Navia Systems (a probabilistic computing startup later acquired by Salesforce as Prior Knowledge). She received her Ph.D. in computer science from the Massachusetts Institute of Technology. Kallista is currently a researcher at Google, focusing on decentralized and privacy-preserving machine learning. Kallista's team pioneered the concept of federated learning and continues to push the boundaries of what is possible when working with decentralized data using privacy-preserving techniques.
Ravi Kumar, Google
- Ravi Kumar has been a research scientist at Google since 2012. Prior to this, he was at the IBM Almaden Research Center and at Yahoo! Research. His interests include algorithms for massive data, privacy, and the theory of computation.
Moderators: Adrian Gascon (Privacy & Security), Shanshan Wu (Federated Optimization & Analytics)
Albert Cheu, Georgetown University
- Shuffle Private Vector Summation
- Albert Cheu earned his Ph.D. at Northeastern University and currently works as a postdoctoral fellow at Georgetown University. He is interested in distributed models of differential privacy.
- In this talk, I will present the building block of "Shuffle Private Stochastic Convex Optimization" (joint work with Joseph, Mao, and Peng). Each party has a d-dimensional value and the goal is to privately estimate their mean, minimizing error in the L2 norm. We first describe a scalar sum protocol whose privacy guarantee strengthens when given less-sensitive inputs. Then we use a generalization of the advanced composition theorem to account for privacy leakage across d executions of the scalar sum protocol. The asymptotic error of the final vector sum protocol is close to that of the Gaussian mechanism.
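For reference, the central-model Gaussian mechanism that the final protocol's error is compared against can be sketched as follows (a standard baseline assuming unit-L2-norm inputs, not the shuffle protocol itself):

```python
# Central-model Gaussian mechanism for summing n vectors of L2 norm <= 1:
# the reference point the abstract compares against.
import numpy as np

def gaussian_mechanism_sum(vectors, eps, delta, rng=None):
    """vectors: (n, d) array, each row assumed to have L2 norm <= 1."""
    rng = rng or np.random.default_rng()
    # Classic analytic calibration for (eps, delta)-DP with L2 sensitivity 1
    # (valid for eps <= 1).
    sigma = np.sqrt(2 * np.log(1.25 / delta)) / eps
    return vectors.sum(axis=0) + rng.normal(scale=sigma, size=vectors.shape[1])
```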
Amir Houmansadr, UMass Amherst
- A Critical Evaluation of Poisoning Attacks on Federated Learning
- Amir Houmansadr is an associate professor of computer science at UMass Amherst. He received his Ph.D. from the University of Illinois at Urbana-Champaign in 2012, and spent two years at the University of Texas at Austin as a postdoctoral scholar. Amir is broadly interested in the security and privacy of networked systems. To that end, he designs and deploys privacy-enhancing technologies, analyzes network protocols and services (e.g., messaging apps and machine learning APIs) for privacy leakage, and performs theoretical analysis to derive bounds on privacy (e.g., using game theory and information theory). Amir has received several awards including an NSF CAREER Award in 2016, a Google Faculty Research Award in 2015, and the 2013 IEEE S&P Best Practical Paper Award.
- Federated learning (FL) is increasingly adopted by various distributed platforms; in particular, Google's Gboard and Apple's Siri use FL to train next-word prediction models, and WeBank uses FL for credit risk prediction. A key feature that makes FL highly attractive in practice is that it allows training models in collaboration among mutually untrusted clients, e.g., Android users or competing banks. Unfortunately, this makes FL susceptible to a threat known as poisoning: a small fraction of (malicious) FL clients, who are either owned or controlled by an adversary, may act maliciously during the FL training process in order to corrupt the jointly trained global model. In this talk, I will take a critical look at the existing literature on (mainly untargeted) poisoning attacks under practical production FL environments, by carefully characterizing the set of realistic threat models and adversarial capabilities. I will discuss some rather surprising findings: contrary to established belief, we show that FL, even without any defenses, is highly robust in practice. I will conclude with several recommendations to the community on the future of research on FL poisoning.
Andrew Hard, Google
- Mixing Federated and Centralized Training
- I've worked with FL for the past 4 years at Google, first as part of the Gboard team and now with the Federated Assistant team. Prior to joining Google, I earned a PhD in high-energy particle physics while working at CERN on the discovery of the Higgs boson and searches for dark matter.
- Standalone Federated Learning is an incredibly powerful tool for learning models on privacy-sensitive, distributed datasets. However, there are many applications in which important inference data domains are missing from the training data cached on federated clients. In such cases, federated learning must be supplemented with additional sources of information, including centrally-trained models and server-hosted datasets. In this talk, we present multiple approaches to the problem of mixing centralized and federated training. Experimental results are provided for both simulation and production FL settings.
Ayfer Ozgur, Stanford University
- From worst-case to pointwise bounds for distributed estimation under communication constraints
- Ayfer Ozgur is an Associate Professor in the Electrical Engineering Department at Stanford University, where she is the Chambers Faculty Scholar in the School of Engineering. Her interests lie in information theory, wireless communication, statistics, and machine learning. Dr. Ozgur received the EPFL Best Ph.D. Thesis Award in 2010, the NSF CAREER award in 2013, the Okawa Foundation Research Grant, Faculty Research Awards from Google and Facebook, and the IEEE Communication Theory Technical Committee (CTTC) Early Achievement Award in 2018, and was selected as the inaugural Goldsmith Lecturer of the IEEE ITSoc in 2020.
- We consider the problem of estimating a d-dimensional discrete distribution from its samples observed under a b-bit communication constraint. In contrast to previous results that largely focus on the global minimax error, we study the local behavior of the estimation error. We develop optimal schemes that adapt to the difficulty of the underlying problem and provide pointwise bounds that depend on the target distribution p. Our results show that the correct measure of the local communication complexity at p is given by its Rényi entropy.
Bo Li, University of Illinois at Urbana-Champaign
- Certifiably Robust Federated Learning against Poisoning Attacks
- Dr. Bo Li is an assistant professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign, and the recipient of the Symantec Research Labs Fellowship, Rising Stars, the MIT Technology Review TR-35 award, the Intel Rising Star award, the NSF CAREER Award, research awards from tech companies such as Amazon, Facebook, Google, and IBM, and best paper awards at several machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, security, privacy, and game theory. She has designed several scalable frameworks for robust machine learning and privacy-preserving data publishing systems. Her work has been featured by major publications and media outlets such as Nature, Wired, Fortune, and the New York Times.
- Advances in machine learning have led to rapid and widespread deployment of learning-based inference and decision making for safety-critical applications, such as autonomous driving and security diagnostics. Current machine learning systems, however, assume that training and test data follow the same, or similar, distributions, and do not consider active adversaries manipulating either distribution. Recent work has demonstrated that motivated adversaries can circumvent anomaly detection or other machine learning models at test time through evasion attacks, or can inject well-crafted malicious instances into training data to induce errors at inference time through poisoning attacks. In this talk, I will describe my recent research on security and privacy problems in federated learning systems, and provide corresponding guarantees.
Borja Balle, DeepMind
- Reconstructing Training Data with Informed Adversaries
- Borja Balle is a research scientist at DeepMind working on privacy-preserving ML and the foundations of privacy-preserving data analysis.
- Given access to a machine learning model, can an adversary reconstruct the model's training data? This work proposes a formal threat model to study this question, shows that reconstruction attacks are feasible in theory and in practice, and presents preliminary results assessing how different factors of standard machine learning pipelines affect the success of reconstruction. Finally, we empirically evaluate what levels of differential privacy suffice to prevent these reconstruction attacks.
Gauri Joshi, Carnegie Mellon University
- Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation
- Gauri Joshi is an assistant professor in the ECE department at Carnegie Mellon University. Gauri received her Ph.D. from MIT EECS in 2016, and her B.Tech and M.Tech in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay in 2010. Her awards and honors include the NSF CAREER Award (2021), the ACM SIGMETRICS Best Paper Award (2020), the Best Thesis Prize in Computer Science at MIT (2012), and the Institute Gold Medal of IIT Bombay (2010).
- We study the problem of estimating at a central server the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and it may be imperative for the nodes to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteristics of the data vectors, in many practical applications such as federated learning, there may be spatial correlations (similarities in the vectors sent by different nodes) or temporal correlations (similarities in the data sent by a single node over different iterations of the algorithm) in the data vectors. We leverage these correlations by simply modifying the decoding method used by the server to estimate the mean. We provide an analysis of the resulting estimation error as well as experiments for PCA, K-Means, and Logistic Regression, which show that our estimators consistently outperform more sophisticated and expensive sparsification methods.
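A minimal sketch of the decoding idea (one plausible instantiation with simplifications of my own, not the paper's exact estimators): clients send only their top-k coordinates, and the server fills the unsent coordinates from its previous-round estimate rather than zeroing them, exploiting temporal correlation:

```python
# Sparsified mean estimation with temporal side information (illustrative).
# Clients encode with Top-k; the server decodes by reusing its previous
# estimate for coordinates a client did not send, instead of assuming zero.
import numpy as np

def top_k_encode(v, k):
    idx = np.argsort(np.abs(v))[-k:]        # k largest-magnitude coordinates
    return idx, v[idx]

def decode_with_history(messages, d, prev_estimate):
    """messages: list of (idx, vals) pairs, one per node."""
    est = np.zeros(d)
    for idx, vals in messages:
        filled = prev_estimate.copy()       # temporal correlation: reuse history
        filled[idx] = vals
        est += filled
    return est / len(messages)
```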
Hamed Haddadi, Imperial College London
- PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments
- Hamed is a Reader in Human-Centred Systems and the Director of Postgraduate Studies at the Dyson School of Design Engineering at the Faculty of Engineering, Imperial College London. In his industrial role, he is a Visiting Professor at Brave Software, where he works on developing privacy-preserving analytics protocols. He enjoys designing and building systems that enable better use of our digital footprint, while respecting users' privacy.
- We propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems to limit privacy leakages in federated learning. Leveraging the widespread presence of Trusted Execution Environments (TEEs) in high-end and mobile devices, we utilize TEEs on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. Challenged by the limited memory size of current TEEs, we leverage greedy layer-wise training to train each model layer inside the trusted area until its convergence. The performance evaluation of our implementation shows that PPFL can significantly improve privacy while incurring small system overheads on the client side. In particular, PPFL can successfully defend the trained model against data reconstruction, property inference, and membership inference attacks. Furthermore, it can achieve comparable model utility with fewer communication rounds (0.54×) and a similar amount of network traffic (1.002×) compared to standard federated learning of a complete model, while only introducing up to ~15% CPU time, ~18% memory usage, and ~21% energy consumption overhead on PPFL's client side.
Nicolas D. Lane, University of Cambridge
- Scaling and Accelerating Federated Learning Research with Flower
- Nic Lane (http://niclane.org) is an Associate Professor in the Department of Computer Science and Technology at the University of Cambridge, where he leads the Machine Learning Systems Lab (CaMLSys -- http://mlsys.cst.cam.ac.uk/). Alongside his academic role, he is also a Director (On-Device and Distributed Machine Learning) at the Samsung AI Center in Cambridge.
- Despite the rapid progress made in federated learning (FL) in recent years, it remains far too difficult to evaluate FL algorithms under a full range of realistic system constraints (viz. compute, memory, energy, wired/wireless networking) and scale (thousands of federated devices and larger). As a consequence, our understanding of how these factors influence FL performance, and how they should shape the future evolution of FL algorithms, remains in a very underdeveloped state. In this talk, I will describe how we have begun to address this situation through recent new features of the Flower open-source framework (http://flower.dev). Not only does Flower make it relatively simple to measure the impact of common real-world FL situations (e.g., compute and memory heterogeneity in clients), it also now makes it possible to scale experiments to non-trivial client numbers using only a handful of desktop GPUs. I will highlight early empirical observations, made using Flower, as to what the implications are for existing algorithms under the types of heterogeneous large-scale FL systems we anticipate will increasingly appear.
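For readers new to Flower, a client sketch looks roughly like the following (based on the 0.x-era flwr API; exact signatures vary across versions, so treat the names here as assumptions rather than canonical usage):

```python
# Minimal Flower sketch: a client exposes NumPy weights to the framework,
# and the server runs FedAvg over rounds. (Hedged: flwr 0.x-era API.)
import numpy as np
import flwr as fl

class ToyClient(fl.client.NumPyClient):
    def __init__(self):
        self.weights = [np.zeros(10)]

    def get_parameters(self):                # current local weights
        return self.weights

    def fit(self, parameters, config):       # one round of "local training"
        self.weights = [w + 0.1 for w in parameters]
        return self.weights, 10, {}          # weights, num_examples, metrics

    def evaluate(self, parameters, config):
        return 0.0, 10, {}                   # loss, num_examples, metrics

# In one process:      fl.server.start_server(config={"num_rounds": 3})
# In other processes:  fl.client.start_numpy_client("localhost:8080", ToyClient())
```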
Peter Richtarik, King Abdullah University of Science and Technology
- EF21: A new, simpler, theoretically better, and practically faster error feedback
- Peter Richtarik is a professor of Computer Science at the King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia, where he leads the Optimization and Machine Learning Lab. At KAUST, he has a courtesy affiliation with the Applied Mathematics and Computational Sciences program and the Statistics program, and is a member of the Visual Computing Center and the Extreme Computing Research Center. Prof. Richtarik is a founding member and a Fellow of the Alan Turing Institute (UK National Institute for Data Science and Artificial Intelligence), and an EPSRC Fellow in Mathematical Sciences. During 2017-2019, he was a Visiting Professor at the Moscow Institute of Physics and Technology. Prior to joining KAUST, he was an Associate Professor of Mathematics at the University of Edinburgh, and held postdoctoral and visiting positions at Université Catholique de Louvain, Belgium, and the University of California, Berkeley, USA, respectively. He received his PhD in 2007 from Cornell University, USA. Prof. Richtarik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, and high-performance computing. Through his work on randomized and distributed optimization algorithms, he has contributed to the foundations of machine learning, optimization, and randomized numerical linear algebra. He is one of the original developers of Federated Learning, a subfield of artificial intelligence whose goal is to train machine learning models over private data stored across a large number of heterogeneous devices, such as mobile phones or hospitals, in an efficient manner and without compromising user privacy. In an October 2020 Forbes article, alongside self-supervised learning and transformers, Federated Learning was listed as one of three emerging areas that will shape the next generation of Artificial Intelligence technologies. Prof. Richtarik's work has attracted international awards, including a Best Paper Award at the NeurIPS 2020 Workshop on Scalability, Privacy, and Security in Federated Learning (joint with S. Horvath), the Distinguished Speaker Award at the 2019 International Conference on Continuous Optimization, the SIAM SIGEST Best Paper Award (joint with O. Fercoq), and the IMA Leslie Fox Prize (second prize, three times, awarded to two of his students and a postdoc). Several of his works are among the most read papers published by the SIAM Journal on Optimization and the SIAM Journal on Matrix Analysis and Applications. Prof. Richtarik serves as an Area Chair for leading machine learning conferences, including NeurIPS, ICML, and ICLR, and is an Area Editor of the Journal of Optimization Theory and Applications, an Associate Editor of Optimization Methods and Software, and a Handling Editor of the Journal of Nonsmooth Analysis and Optimization.
- Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication compression mechanisms, such as Top-k. First proposed by Seide et al (2014) as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018; Alistarh et al., 2018]. However, all existing analyses either i) apply to the single-node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients or iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost. In this work we fix all these deficiencies by proposing and analyzing a new EF mechanism, EF21, which consistently and substantially outperforms EF in practice. Our theoretical analysis relies on standard assumptions only, works in the distributed heterogeneous data setting, and leads to better and more meaningful rates. In particular, we prove that EF21 enjoys a fast O(1/T) convergence rate for smooth nonconvex problems, beating the previous bound of O(1/T^{2/3}), which was shown under a bounded-gradients assumption. We further improve this to a fast linear rate for PL functions, which is the first linear convergence result for an EF-type method not relying on unbiased compressors. Since EF has a large number of applications where it reigns supreme, we believe that our 2021 variant, EF21, can have a large impact on the practice of communication-efficient distributed learning. This talk is based on the paper "EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback" (joint work with Igor Sokolov, KAUST, and Ilyas Fatkhullin, KAUST & TUM), NeurIPS 2021 (oral). Time permitting, I may also briefly outline the follow-up work "EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback" (joint work with Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, and Zhize Li), https://arxiv.org/abs/2110.03294 (2021).
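The EF21 update rule itself is compact. The sketch below simulates it on a single machine with a Top-k compressor (an illustrative rendering with variable names of my own): the server steps with the average of the nodes' gradient trackers $g_i$, after which each node communicates only the compressed correction $c_i = C(\nabla f_i(x^{t+1}) - g_i^t)$ and both sides update $g_i^{t+1} = g_i^t + c_i$:

```python
# Single-machine simulation of one EF21 round with a Top-k compressor
# (illustrative rendering of the rule; names are mine).
import numpy as np

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]   # k largest-magnitude coordinates
    out[idx] = v[idx]
    return out

def ef21_round(x, g_states, grad_fns, lr, k):
    """One round: server step x^{t+1} = x^t - lr * mean_i g_i^t; then each
    node i sends c_i = top_k(grad_i(x^{t+1}) - g_i^t) and both sides update
    g_i^{t+1} = g_i^t + c_i. Only c_i is communicated."""
    x = x - lr * np.mean(g_states, axis=0)
    for i, grad_fn in enumerate(grad_fns):
        c_i = top_k(grad_fn(x) - g_states[i], k)
        g_states[i] = g_states[i] + c_i
    return x, g_states
```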
Phillip Gibbons, Carnegie Mellon University
- Federated Learning under Distributed Concept Drift
Suhas Diggavi, UCLA
- Privacy-performance trade-offs in the Shuffled Model of Federated Learning
- Suhas Diggavi is currently a Professor of Electrical and Computer Engineering at UCLA. His undergraduate education is from IIT, Delhi and his PhD is from Stanford University. He has worked as a principal member research staff at AT&T Shannon Laboratories and directed the Laboratory for Information and Communication Systems (LICOS) at EPFL. At UCLA, he directs the Information Theory and Systems Laboratory. His research interests include information theory and its applications to several areas including learning, security and privacy, data compression, wireless networks, cyber-physical systems, genomics, and neuroscience; more information can be found at http://licos.ee.ucla.edu. He has received several recognitions for his research from IEEE and ACM, including the 2013 IEEE Information Theory Society & Communications Society Joint Paper Award, the 2006 IEEE Donald Fink prize paper award, the 2019 Google Faculty Research Award, and the 2020 Amazon faculty research award, and is a Fellow of the IEEE. He was selected as a Guggenheim fellow in 2021. He has also organized several IEEE and ACM conferences.
- In this talk we will briefly describe some of our recent work on trade-offs between privacy and learning performance for federated learning in the context of the shuffled privacy model. The challenges include accounting for (client) sampling, obtaining better compositional bounds (using Rényi DP), as well as communication efficiency. We will briefly present our theoretical results along with numerics. This work has appeared/will appear in AISTATS 2021, ACM CCS 2021, and NeurIPS 2021.
Walid Saad, Virginia Tech
- Distributed Learning and Wireless Networks: A Closer Union
- Walid Saad received his Ph.D. degree from the University of Oslo in 2010. Currently, he is a Professor in the Department of Electrical and Computer Engineering at Virginia Tech, where he leads the Network sciEnce, Wireless, and Security (NEWS) laboratory. His research interests are at the intersection of wireless networks, machine learning, and game theory. Dr. Saad is the author/co-author of papers that received ten conference best paper awards, the 2015 IEEE ComSoc Fred W. Ellersick Prize, and the 2019 and 2021 IEEE ComSoc Young Author Best Paper awards. He is also a Fellow of the IEEE.
- In this talk, we provide an overview of research at the intersection of distributed (federated) learning and wireless networks. In particular, we focus on two areas: a) the at-scale deployment of distributed learning solutions over real-world wireless networks such as 5G and 6G systems, and b) the use of distributed learning to design self-organizing and autonomous wireless systems. For each area, we discuss a select research problem, and we articulate some of the key future challenges. We conclude the talk with some perspectives on this closer union between learning and networking.
Zach Charles, Google
- On Large-Cohort Training for Federated Learning
- Zachary Charles is a research scientist at Google, working on the theory and practice of federated optimization. Before joining Google, he received a Ph.D. from the University of Wisconsin-Madison in applied mathematics.
- In this talk, we explore how the number of clients sampled at each round (the cohort size) impacts the quality of the learned model and the training dynamics of federated learning algorithms. We pose three fundamental questions. First, what challenges arise when trying to scale federated learning to larger cohorts? Second, what parallels exist between cohort sizes in federated learning and batch sizes in centralized learning? Last, how can we design federated learning methods that effectively utilize larger cohort sizes? We give partial answers to these questions based on extensive empirical evaluation. Our work highlights a number of challenges stemming from the use of larger cohorts. While some of these (such as generalization issues and diminishing returns) are analogs of large-batch training challenges, others (including training failures and fairness concerns) are unique to federated learning.
Zheng Xu, Google
- Practical and Private Federated Learning without Sampling or Shuffling
- Zheng Xu is a research scientist working on federated learning at Google. He got his Ph.D. in optimization and machine learning from the University of Maryland, College Park. Before that, he got his master's and bachelor's degrees from the University of Science and Technology of China.
- We consider training models with differential privacy (DP) using mini-batch gradients. The existing state of the art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in important practical scenarios, particularly federated learning (FL). We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification.
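The key primitive behind DP-FTRL is tree aggregation: release noisy prefix sums of gradients, where each prefix sum is covered by O(log T) dyadic intervals carrying independent noise, so no sampling or shuffling of the data order is required. The sketch below renders that primitive only (a simplification of my own; noise calibration and the full algorithm are omitted):

```python
# Tree aggregation sketch: each node (level, i) of a binary tree over T
# rounds covers [i * 2**level, (i + 1) * 2**level) and carries independent
# Gaussian noise. The noisy prefix sum at step t adds the O(log t) node
# noises whose intervals tile [0, t). The model would then be obtained
# roughly as x_t = x_0 - lr * noisy_prefix_sum(grads, noise, t).
import numpy as np

def make_tree_noise(T, d, sigma, rng):
    """One noise vector per binary-tree node over T leaves."""
    noise = {}
    level = 0
    while (1 << level) <= T:
        for i in range((T + (1 << level) - 1) >> level):
            noise[(level, i)] = rng.normal(scale=sigma, size=d)
        level += 1
    return noise

def noisy_prefix_sum(grads, noise, t):
    """Sum of the first t gradient vectors plus the tree noises covering [0, t)."""
    total = np.sum(grads[:t], axis=0)
    pos, rest = 0, t
    while rest > 0:
        level = rest.bit_length() - 1        # largest power of two <= rest
        total = total + noise[(level, pos >> level)]
        pos += 1 << level
        rest -= 1 << level
    return total
```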
Albert Cheu, Georgetown University
- Shuffle Private Vector Summation - Albert Cheu earned his PhD. at Northeastern University and currently works as a postdoctoral fellow at Georgetown University. He is interested in distributed models of differential privacy. - In this talk, I will present the building block of "Shuffle Private Stochastic Convex Optimization" (joint work with Joseph, Mao, and Peng). Each party has a d-dimensional value and the goal is to privately estimate their mean, minimizing error in L2 norm. We first describe a scalar sum protocol whose privacy guarantee strengthens when given less-sensitive inputs. Then we use a generalization of the advanced composition theorem to account for privacy leakage across d executions of the scalar sum protocol. The asymptotic error of the final vector sum protocol is close to that of the Gaussian mechanism.
Amir Houmansadr, UMass Amherst
- A Critical Evaluation of Poisoning Attacks on Federated Learning - Amir Houmansadr is an associate professor of computer science at UMass Amherst. He received his Ph.D. from the University of Illinois at Urbana-Champaign in 2012, and spent two years at the University of Texas at Austin as a postdoctoral scholar. Amir is broadly interested in the security and privacy of networked systems. To that end, he designs and deploys privacy-enhancing technologies, analyzes network protocols and services (e.g., messaging apps and machine learning APIs) for privacy leakage, and performs theoretical analysis to derive bounds on privacy (e.g., using game theory and information theory). Amir has received several awards including an NSF CAREER Award in 2016, a Google Faculty Research Award in 2015, and the 2013 IEEE S&P Best Practical Paper Award. - Federated learning (FL) is increasingly adopted by various distributed platforms, in particular Google's Gboard and Apple's Siri use FL to train next word prediction models, and WeBank uses FL for credit risk predictions. A key feature that makes FL highly attractive in practice is that it allows training models in collaboration among mutually untrusted clients, e.g., Android users or competing banks. Unfortunately, this makes FL susceptible to a threat known as poisoning: a small fraction of (malicious) FL clients, who are either owned or controlled by an adversary, may act maliciously during the FL training process in order to corrupt the jointly trained global model. In this talk, I will take a critical look at the existing literature on (mainly, untargeted) poisoning attacks under practical production FL environments, by carefully characterizing the set of realistic threat models and adversarial capabilities. I will discuss some rather surprising findings: contrary to the established belief, we show that FL, even without any defenses, is highly robust in practice. I will conclude with several recommendations to the community on the future of research on FL poisoning.
Andrew Hard, Google- Mixing Federated and Centralized Training - I've worked with FL for the past 4 years at Google, first as part of the Gboard team and now with the Federated Assistant team. Prior to joining Google, I earned a PhD in high-energy particle physics while working at CERN on the discovery of the Higgs boson and searches for dark matter. - Standalone Federated Learning is an incredibly powerful tool for learning models on privacy-sensitive, distributed datasets. However, there are many applications in which important inference data domains are missing from the training data cached on federated clients. In such cases, federated learning must be supplemented with additional sources of information, including centrally-trained models and server-hosted datasets. In this talk, we present multiple approaches to the problem of mixing centralized and federated training. Experimental results are provided for both simulation and production FL settings.
Ayfer Ozgur, Stanford University- From worst-case to pointwise bounds for distributed estimation under communication constraints - Ayfer Ozgur is an Associate Professor in the Electrical Engineering Department at Stanford University where she is the Chambers Faculty Scholar in the School of Engineering. Her interests lie in information theory, wireless communication, statistics, and machine learning. Dr. Ozgur received the EPFL Best Ph.D. Thesis Award in 2010, the NSF CAREER award in 2013, the Okawa Foundation Research Grant, Faculty Research Awards from Google and Facebook, the IEEE Communication Theory Technical Committee (CTTC) Early Achievement Award in 2018 and was selected as the inaugural Goldsmith Lecturer of the IEEE ITSoc in 2020. - We consider the problem of estimating a d-dimensional discrete distribution from its samples observed under a b-bit communication constraint. In contrast to previous results that largely focus on the global minimax error, we study the local behavior of the estimation error. We develop optimal schemes that adopt to the difficulty of the underlying problem and provide pointwise bounds that depend on the target distribution p. Our results show that the correct measure of the local communication complexity at p is given by its Rényi entropy.
Bo Li, University of Illinois at Urbana-Champaign- Certifiably Robust Federated Learning against Poisoning Attacks - Dr. Bo Li is an assistant professor in the department of Computer Science at University of Illinois at Urbana–Champaign, and the recipient of the Symantec Research Labs Fellowship, Rising Stars, MIT Technology Review TR-35 award, Intel Rising Star award, NSF CAREER Award, Research Awards from Tech companies such as Amazon, Facebook, Google, and IBM, and best paper awards in several machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, security, machine learning, privacy, and game theory. She has designed several scalable frameworks for robust machine learning and privacy preserving data publishing systems. Her work have been featured by major publications and media outlets such as Nature, Wired, Fortune, and New York Times. - Advances in machine learning have led to rapid and widespread deployment of learning based inference and decision making for safety-critical applications, such as autonomous driving and security diagnostics. Current machine learning systems, however, assume that training and test data follow the same, or similar, distributions, and do not consider active adversaries manipulating either distribution. Recent work has demonstrated that motivated adversaries can circumvent anomaly detection or other machine learning models at test time through evasion attacks, or can inject well-crafted malicious instances into training data to induce errors in inference time through poisoning attacks. In this talk, I will describe my recent research about security and privacy problems in federated learning systems, and provide corresponding guarantees.
Borja Balle, DeepMind- Reconstructing Training Data with Informed Adversaries - Borja Balle is a research scientist at DeepMind working on privacy-preserving ML and the foundations of privacy-preserving data analysis. - Given access to a machine learning model, can an adversary reconstruct the model's training data? This work proposes a formal threat model to study this question, shows that reconstruction attacks are feasible in theory and in practice, and presents preliminary results assessing how different factors of standard machine learning pipelines affect the success of reconstruction. Finally, we empirically evaluate what levels of differential privacy suffice to prevent these reconstruction attacks.
Gauri Joshi, Carnegie Mellon University- Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation - Gauri Joshi is an assistant professor in the ECE department at Carnegie Mellon University . Gauri received her Ph.D. from MIT EECS in 2016, and her B.Tech and M.Tech in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay in 2010. Her awards and honors include the NSF CAREER Award (2021), ACM SIGMETRICS Best Paper Award (2020), Best Thesis Prize in Computer science at MIT (2012), and Institute Gold Medal of IIT Bombay (2010). - We study the problem of estimating at a central server the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and it may be imperative for them to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteristics of the data vectors, in many practical applications such as federated learning, there may be spatial correlations (similarities in the vectors sent by different nodes) or temporal correlations (similarities in the data sent by a single node over different iterations of the algorithm) in the data vectors. We leverage these correlations by simply modifying the decoding method used by the server to estimate the mean. We provide an analysis of the resulting estimation error as well as experiments for PCA, K-Means and Logistic Regression, which show that our estimators consistently outperform more sophisticated and expensive sparsification methods.
Hamed Haddadi, Imperial College London- PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments -Hamed is a Reader in Human-Centred Systems and the Director of Postgraduate Studies at the Dyson School of Design Engineering at The Faculty of Engineering, Imperial College London. In his industrial role, he is a Visiting Professor at Brave Software where he works on developing privacy-preserving analytics protocols. He enjoys designing and building systems that enable better use of our digital footprint, while respecting users' privacy. - We propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems to limit privacy leakages in federated learning. Leveraging the widespread presence of Trusted Execution Environments (TEEs) in high-end and mobile devices, we utilize TEEs on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. Challenged by the limited memory size of current TEEs, we leverage greedy layer-wise training to train each model's layer inside the trusted area until its convergence. The performance evaluation of our implementation shows that PPFL can significantly improve privacy while incurring small system overheads at the client-side. In particular, PPFL can successfully defend the trained model against data reconstruction, property inference, and membership inference attacks. Furthermore, it can achieve comparable model utility with fewer communication rounds (0.54×) and a similar amount of network traffic (1.002×) compared to the standard federated learning of a complete model. This is achieved while only introducing up to ~15% CPU time, ~18% memory usage, and ~21% energy consumption overhead in PPFL's client-side.
Nicolas D. Lane, University of Cambridge- Scaling and Accelerating Federated Learning Research with Flower - Nic Lane (http://niclane.org) is an Associate Professor in the department of Computer Science and Technology at the University of Cambridge where he leads the Machine Learning Systems Lab (CaMLSys -- http://http://mlsys.cst.cam.ac.uk/). Alongside his academic role, he is also a Director (On-Device and Distributed Machine Learning) at the Samsung AI Center in Cambridge. - Despite the rapid progress made in federated learning (FL) in recent years, it still remains far too difficult to evaluate FL algorithms under a full range of realistic system constraints (viz. compute, memory, energy, wired/wireless networking) and scale (thousands of federated devices and larger). As a consequence, our understanding of how these factors influence FL performance and should shape the future evolution of FL algorithms remains in a very underdeveloped state. In this talk, I will describe how we have begun to address this situation through recent new features of the Flower open-source framework (http://flower.dev). Not only does Flower make it relatively simple to measure the impact of common real-world FL situations (e.g., compute and memory heterogeneity in clients) it also now is possible to scale experiments to non-trivial client numbers using only a handful of desktop GPUs. I will highlight early empirical observations, made using Flower, as to what the implications are for existing algorithms under the types of heterogeneous large-scale FL systems we anticipate will increasingly appear.
Peter Richtarik, King Abdullah University of Science and Technology- EF21: A new, simpler, theoretically better, and practically faster error feedback - Peter Richtárik is a professor of Computer Science at the King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia, where he leads the Optimization and Machine Learning Lab. At KAUST, he has a courtesy affiliation with the Applied Mathematics and Computational Sciences program and the Statistics program, and is a member of the Visual Computing Center and the Extreme Computing Research Center. Prof Richtárik is a founding member and a Fellow of the Alan Turing Institute (UK National Institute for Data Science and Artificial Intelligence), and an EPSRC Fellow in Mathematical Sciences. During 2017-2019, he was a Visiting Professor at the Moscow Institute of Physics and Technology. Prior to joining KAUST, he was an Associate Professor of Mathematics at the University of Edinburgh, and held postdoctoral and visiting positions at Université Catholique de Louvain, Belgium, and the University of California, Berkeley, USA, respectively. He received his PhD in 2007 from Cornell University, USA. Prof Richtárik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, and high-performance computing. Through his work on randomized and distributed optimization algorithms, he has contributed to the foundations of machine learning, optimization, and randomized numerical linear algebra. He is one of the original developers of Federated Learning, a subfield of artificial intelligence whose goal is to train machine learning models over private data stored across a large number of heterogeneous devices, such as mobile phones or hospitals, in an efficient manner and without compromising user privacy. In an October 2020 Forbes article, alongside self-supervised learning and transformers, Federated Learning was listed as one of three emerging areas that will shape the next generation of Artificial Intelligence technologies. Prof Richtárik's work has attracted international awards, including a Best Paper Award at the NeurIPS 2020 Workshop on Scalability, Privacy, and Security in Federated Learning (joint with S. Horvath), the Distinguished Speaker Award at the 2019 International Conference on Continuous Optimization, the SIAM SIGEST Best Paper Award (joint with O. Fercoq), and the IMA Leslie Fox Prize (second prize, three times, awarded to two of his students and a postdoc). Several of his works are among the most read papers published by the SIAM Journal on Optimization and the SIAM Journal on Matrix Analysis and Applications. Prof Richtárik serves as an Area Chair for leading machine learning conferences, including NeurIPS, ICML and ICLR, and is an Area Editor of the Journal of Optimization Theory and Applications, an Associate Editor of Optimization Methods and Software, and a Handling Editor of the Journal of Nonsmooth Analysis and Optimization. - Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication compression mechanisms, such as Top-k. First proposed by Seide et al. (2014) as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018; Alistarh et al., 2018].
However, all existing analyses either i) apply to the single-node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients or iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost. In this work we fix all these deficiencies by proposing and analyzing a new EF mechanism, called EF21, which consistently and substantially outperforms EF in practice. Our theoretical analysis relies on standard assumptions only, works in the distributed heterogeneous data setting, and leads to better and more meaningful rates. In particular, we prove that EF21 enjoys a fast O(1/T) convergence rate for smooth nonconvex problems, beating the previous bound of O(1/T^{2/3}), which was shown under a bounded-gradients assumption. We further improve this to a fast linear rate for PL functions, which is the first linear convergence result for an EF-type method not relying on unbiased compressors. Since EF has a large number of applications where it reigns supreme, we believe that our 2021 variant, EF21, can have a large impact on the practice of communication-efficient distributed learning. This talk is based on the paper "EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback" - joint work with Igor Sokolov (KAUST) and Ilyas Fatkhullin (KAUST & TUM), NeurIPS 2021 (oral). Time permitting, I may also briefly outline the follow-up work "EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback" - joint work with Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov and Zhize Li, https://arxiv.org/abs/2110.03294 (2021).
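For readers who want the mechanics, here is a minimal NumPy sketch of the EF21 update rule as stated in the paper: each node i maintains a gradient estimate g_i and communicates only the compressed difference C(∇f_i(x) − g_i). The Top-k compressor, step size, and full-gradient warm-up shown here are illustrative choices, not prescriptions.

```python
import numpy as np

def top_k(v, k):
    """Contractive Top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21(grads, x0, lr=0.1, k=5, T=200):
    """EF21 sketch for n nodes minimizing (1/n) sum_i f_i(x).

    grads: list of callables; grads[i](x) returns the gradient of f_i at x.
    Only the compressed shift c_i is communicated each round.
    """
    n = len(grads)
    x = x0.copy()
    g = [grads[i](x) for i in range(n)]   # assumed uncompressed warm-up round
    g_bar = sum(g) / n
    for _ in range(T):
        x = x - lr * g_bar                        # server step with average estimate
        for i in range(n):
            c = top_k(grads[i](x) - g[i], k)      # compressed shift sent to server
            g[i] = g[i] + c
        g_bar = sum(g) / n                        # server can also update incrementally
    return x
```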
Phillip Gibbons, Carnegie Mellon University- Federated Learning under Distributed Concept Drift
Suhas Diggavi, UCLA- Privacy-performance trade-offs in the Shuffled Model of Federated Learning - Suhas Diggavi is currently a Professor of Electrical and Computer Engineering at UCLA. His undergraduate education is from IIT, Delhi and his PhD is from Stanford University. He has worked as a principal member research staff at AT&T Shannon Laboratories and directed the Laboratory for Information and Communication Systems (LICOS) at EPFL. At UCLA, he directs the Information Theory and Systems Laboratory. His research interests include information theory and its applications to several areas including learning, security and privacy, data compression, wireless networks, cyber-physical systems, genomics and neuroscience; more information can be found at http://licos.ee.ucla.edu. He has received several recognitions for his research from IEEE and ACM, including the 2013 IEEE Information Theory Society & Communications Society Joint Paper Award, the 2006 IEEE Donald Fink prize paper award, the 2019 Google Faculty Research Award, and the 2020 Amazon faculty research award, and is a Fellow of the IEEE. He was selected as a Guggenheim fellow in 2021. He has also organized several IEEE and ACM conferences. - In this talk we will briefly describe some of our recent work on trade-offs between privacy and learning performance for federated learning in the context of the shuffled privacy model. The challenges include accounting for (client) sampling, obtaining better compositional bounds (using Renyi DP), as well as communication efficiency. We will briefly present our theoretical results along with numerics. This work has appeared or will appear in AISTATS 2021, ACM CCS 2021, and NeurIPS 2021.
Walid Saad, Virginia Tech- Distributed Learning and Wireless Networks: A Closer Union - Walid Saad received his Ph.D. degree from the University of Oslo in 2010. Currently, he is a Professor in the Department of Electrical and Computer Engineering at Virginia Tech, where he leads the Network sciEnce, Wireless, and Security (NEWS) laboratory. His research interests are at the intersection of wireless networks, machine learning, and game theory. Dr. Saad is the author or co-author of papers that received ten conference best paper awards, the 2015 IEEE ComSoc Fred W. Ellersick Prize, and the 2019 and 2021 IEEE ComSoc Young Author Best Paper Awards. He is also a Fellow of the IEEE. - In this talk, we provide an overview of research at the intersection of distributed (federated) learning and wireless networks. In particular, we focus on two areas: a) the at-scale deployment of distributed learning solutions over real-world wireless networks such as 5G and 6G systems, and b) the use of distributed learning to design self-organizing and autonomous wireless systems. For each area, we discuss a selected research problem, and we articulate some of the key future challenges. We conclude the talk with some perspectives on this closer union between learning and networking.
Zach Charles, Google- On Large-Cohort Training for Federated Learning - Zachary Charles is a research scientist at Google, working on the theory and practice of federated optimization. Before joining Google, he received a Ph.D. from the University of Wisconsin-Madison in applied mathematics. - In this talk, we explore how the number of clients sampled at each round (the cohort size) impacts the quality of the learned model and the training dynamics of federated learning algorithms. We pose three fundamental questions. First, what challenges arise when trying to scale federated learning to larger cohorts? Second, what parallels exist between cohort sizes in federated learning and batch sizes in centralized learning? Last, how can we design federated learning methods that effectively utilize larger cohort sizes? We give partial answers to these questions based on extensive empirical evaluation. Our work highlights a number of challenges stemming from the use of larger cohorts. While some of these (such as generalization issues and diminishing returns) are analogs of large-batch training challenges, others (including training failures and fairness concerns) are unique to federated learning.
Zheng Xu, Google- Practical and Private Federated Learning without Sampling or Shuffling - Zheng Xu is a research scientist working on federated learning at Google. He received his Ph.D. in optimization and machine learning from the University of Maryland, College Park. Before that, he received his master's and bachelor's degrees from the University of Science and Technology of China. - We consider training models with differential privacy (DP) using mini-batch gradients. The existing state of the art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to meet in important practical scenarios, particularly federated learning (FL). We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification.
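The building block that lets DP-FTRL avoid amplification is the private release of gradient prefix sums via the tree (binary) mechanism. Below is a hedged, self-contained NumPy sketch of that mechanism only; clipping, the FTRL step itself, and the formal privacy accounting are omitted, and the noise scale sigma is left as a free parameter.

```python
import numpy as np

def noisy_prefix_sums(grads, sigma, rng=np.random.default_rng(0)):
    """Tree-mechanism sketch: DP estimates of all gradient prefix sums.

    Each gradient touches at most ~log2(T) tree nodes, and each node is
    perturbed once with Gaussian noise of std sigma, so the error in any
    prefix sum grows only polylogarithmically in T.
    grads: list of 1-D arrays, assumed pre-clipped to bounded norm.
    """
    d = len(grads[0])
    cache = {}  # (level, index) -> noisy sum over a dyadic block

    def node(level, index):
        if (level, index) not in cache:
            lo, hi = index << level, (index + 1) << level
            true_sum = np.sum(grads[lo:hi], axis=0)
            cache[(level, index)] = true_sum + rng.normal(0.0, sigma, d)
        return cache[(level, index)]

    prefixes = []
    for t in range(1, len(grads) + 1):
        total, pos, rem = np.zeros(d), 0, t
        while rem > 0:                     # decompose [0, t) into dyadic blocks
            level = rem.bit_length() - 1   # largest aligned block that fits
            total = total + node(level, pos >> level)
            pos += 1 << level
            rem -= 1 << level
        prefixes.append(total)
    return prefixes
```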
Moderators:
Zach Garret (Privacy & Security), Sean Augenstein (Federated Optimization & Analytics)
Andreas Haeberlen, University of Pennsylvania- Privacy-Preserving Federated Analytics with Billions of Users - I am a professor at the University of Pennsylvania. My research interests include distributed systems, security, and privacy. Recently I have been working on differential privacy and on a new data-center architecture. - In my talk, I will give a quick overview of our work on massive-scale federated analytics with strong privacy guarantees. We have been developing ways to answer queries about data that is distributed across millions or even billions of user devices, without involving trusted parties (such as a central aggregator), and while guaranteeing differential privacy. Our solutions can efficiently support a variety of machine-learning tasks; the latest system (Mycelium - SOSP'21) adds support for distributed graph data.
Ankit Rawat, Google- FedLite: A Scalable Approach for Federated Learning on Resource-constrained Clients - Ankit Singh Rawat is a Research Scientist at Google Research, New York City. Previously, he held post-doctoral appointments at the Massachusetts Institute of Technology, University of Massachusetts Amherst, and Carnegie Mellon University. Ankit received his Ph.D. from the University of Texas at Austin. His research interests include large-scale machine learning, coding theory, and information theory. Ankit is a recipient of the 2020 EURASIP JASP Best Paper Award and the Microelectronics and Computer Development Fellowship from the University of Texas at Austin. - In classical federated learning, the clients contribute to the overall training by communicating local updates for the underlying model on their private data to a coordinating server. However, updating and communicating the entire model becomes prohibitively expensive when resource-constrained clients collectively aim to train a large machine learning model. Split learning provides a natural solution in such a setting, where only a (small) part of the model is stored and trained on clients while the remaining (large) part of the model only stays at the servers. Unfortunately, the model partitioning employed in split learning significantly increases the communication cost compared to the classical federated learning algorithms. We address this issue by proposing an end-to-end training framework that relies on a novel vector quantization scheme accompanied by a gradient correction method to reduce the additional communication cost associated with split learning.
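As a rough illustration of the kind of cut-layer compression the abstract refers to, here is a toy product-style vector quantizer in NumPy. The codebook, shapes, and the absence of the paper's gradient-correction step are all simplifications, not FedLite's actual scheme.

```python
import numpy as np

def quantize_activations(h, codebook):
    """Toy vector quantizer for split-learning activations.

    h: (batch, dim) activations at the cut layer; dim must be a multiple
    of the codeword length. Each sub-vector is replaced by the index of
    its nearest codeword, so the client transmits small integer codes
    instead of floats.
    """
    num_words, word_len = codebook.shape
    blocks = h.reshape(-1, word_len)                       # split into sub-vectors
    d2 = ((blocks[:, None, :] - codebook[None]) ** 2).sum(-1)
    codes = d2.argmin(axis=1)                              # indices to transmit
    dequant = codebook[codes].reshape(h.shape)             # server-side reconstruction
    return codes, dequant
```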
Athina Markopoulou, UC Irvine- Location Leakage in Federated Signal Maps - Athina Markopoulou is a Professor and Chair of the EECS Department at UC Irvine. Her research is in the area of computer networks, with a current focus on privacy and data transparency for mobile networks and the Internet-of-Things. More info can be found here: https://athinagroup.eng.uci.edu/athina/ - In this work, we focus on federated signal maps, where a number of mobile devices collaborate to train a model that predicts the signal strength of cellular networks. We consider an honest-but-curious server, which launches a deep leakage attack (DLG) in order to infer important locations and the mobility patterns of individual users. We evaluate how various parameters of the federated learning framework provide different privacy-utility tradeoffs in this setting, and we provide recommendations. This is joint work with Evita Bakopoulou at UCI, and K. Psounis and J. Zhang at USC.
Dawn Song, UC Berkeley- Federated frequency moments estimation and its application in feature selection - Dawn Song is a Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley. Her research interest lies in AI and deep learning, security and privacy. She is the recipient of various awards including the MacArthur Fellowship, the Guggenheim Fellowship, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the MIT Technology Review TR-35 Award, and Best Paper Awards from top conferences in Computer Security and Deep Learning. She is an ACM Fellow and an IEEE Fellow. She is ranked the most cited scholar in computer security (AMiner Award). She obtained her Ph.D. degree from UC Berkeley. Prior to joining UC Berkeley as a faculty member, she was on the faculty at Carnegie Mellon University from 2002 to 2007. She is also a serial entrepreneur and has been named on the Female Founder 100 List by Inc. and the Wired25 List of Innovators. Lun is a 4th-year Ph.D. candidate at UC Berkeley, advised by Prof. Dawn Song. His research focuses on differential privacy and its application in federated learning. - Frequency moments are a family of non-linear statistics with numerous applications such as hypothesis testing or entropy estimation. How to securely calculate frequency moments in a federated setting is an unsolved challenge. We propose a federated protocol to approximate frequency moments with secure aggregation. We also discuss the possibility of intrinsic DP in the protocol and several remaining challenges.
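One reason linear sketches pair naturally with secure aggregation, as the abstract hints: per-client sketches can be summed obliviously, and the non-linearity is applied only to the aggregate. Below is a hedged AMS-style toy for the second frequency moment F2; the hashing scheme, counter count, and bucket size are illustrative and not the talk's protocol.

```python
import numpy as np

NUM_COUNTERS, NUM_BUCKETS = 64, 4096
rng = np.random.default_rng(0)              # shared public hashing randomness
SIGNS = rng.choice([-1.0, 1.0], size=(NUM_COUNTERS, NUM_BUCKETS))

def client_sketch(items):
    """AMS-style linear sketch of a client's items. Linearity means the
    server can combine sketches via secure aggregation without ever
    seeing an individual client's sketch."""
    sketch = np.zeros(NUM_COUNTERS)
    for x in items:
        # toy bucketing hash; bucket collisions bias the estimate upward
        sketch += SIGNS[:, hash(x) % NUM_BUCKETS]
    return sketch

# server side: secure-aggregate the sketches, then estimate F2 = sum_x count(x)^2
clients = [[1, 2, 2, 3], [2, 3, 3, 3], [1, 1, 4]]
aggregate = sum(client_sketch(c) for c in clients)
f2_estimate = float(np.mean(aggregate ** 2))
print(f2_estimate)   # true F2 here: 3^2 + 3^2 + 4^2 + 1^2 = 35
```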
Eugene Bagdasaryan, Cornell Tech- Federated Analytics: Building Location Heatmaps under Distributed Differential Privacy with Secure Aggregation - Eugene is a PhD Candidate at Cornell Tech advised by Deborah Estrin and Vitaly Shmatikov. He is an Apple AI/ML Scholar and works on privacy and security in machine learning. - We design a scalable algorithm to privately generate location heatmaps over decentralized data from millions of user devices. It aims to ensure differential privacy before data becomes visible to a service provider while maintaining high data accuracy and minimizing resource consumption on users' devices. To achieve this, we revisit the distributed differential privacy concept based on recent results in the secure multiparty computation field and design a scalable and adaptive distributed differential privacy approach for location analytics. Evaluation on public location datasets shows that this approach successfully generates metropolitan-scale heatmaps from millions of user samples with a worst-case client communication overhead that is significantly smaller than existing state-of-the-art private protocols of similar accuracy.
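For intuition about "differential privacy before data becomes visible to the provider", here is a toy NumPy sketch in which each client perturbs its one-hot location vector with symmetric integer noise, so that only the securely aggregated sum is meaningful. The binomial-difference noise and all constants below are stand-ins for the paper's carefully calibrated distributed mechanism.

```python
import numpy as np

def client_heatmap_contribution(location_bin, num_bins, trials=8, p=0.5,
                                rng=np.random.default_rng(0)):
    """One-hot encode the client's location, then add symmetric integer
    noise locally. Summed over many clients via secure aggregation, the
    aggregate noise yields a DP guarantee even though no single party
    adds all of it. Negative entries in one contribution are expected."""
    v = np.zeros(num_bins, dtype=np.int64)
    v[location_bin] = 1
    noise = rng.binomial(trials, p, num_bins) - rng.binomial(trials, p, num_bins)
    return v + noise

# server learns only the sum of contributions (the noisy heatmap)
heatmap = sum(client_heatmap_contribution(b, num_bins=16) for b in [3, 3, 7, 12])
```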
Florian Tramer, Google- Better Membership Inference Attacks - Florian Tramer is a visiting researcher at Google Brain and an incoming assistant professor at ETH Zurich. His research interests are in the security and privacy of machine learning, computer security and applied cryptography. - We argue that a successful membership inference attack should be able to identify training examples with high confidence at low false positive rates. We show that existing attacks fail to do this. We then introduce a new attack that carefully combines a number of ideas from the literature to achieve high true-positive rates at low false-positive rates.
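A small evaluation helper showing the metric the talk argues for, true-positive rate at a fixed low false-positive rate, rather than average accuracy. The score distributions below are synthetic, purely to illustrate that a seemingly decent attack can do far worse at low FPR than its accuracy suggests.

```python
import numpy as np

def tpr_at_fpr(scores_members, scores_nonmembers, target_fpr=0.001):
    """TPR of a membership inference attack at a fixed low FPR.

    scores_*: attack confidence scores, higher = 'more likely a member'.
    """
    threshold = np.quantile(scores_nonmembers, 1.0 - target_fpr)
    return float(np.mean(scores_members > threshold))

rng = np.random.default_rng(1)
members = rng.normal(0.5, 1.0, 100_000)      # hypothetical score distributions
nonmembers = rng.normal(0.0, 1.0, 100_000)
print(tpr_at_fpr(members, nonmembers, 0.001))  # small, despite ~60% accuracy
```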
Giulia Fanti, Carnegie Mellon University - Reducing the Communication Cost of Federated Learning through Multistage Optimization - Giulia Fanti is an Assistant Professor of Electrical and Computer Engineering at Carnegie Mellon University. Her research interests regard the security, privacy, and efficiency of distributed systems. She is a two-time fellow for the World Economic Forum’s Global Future Council on Cybersecurity, a member of NIST's Security and Privacy Advisory Board, and a recipient of multiple best paper awards and faculty research awards. She obtained her Ph.D. in EECS from U.C. Berkeley and her B.S. in ECE from Olin College of Engineering. - A central question in federated learning (FL) is how to design optimization algorithms that minimize the communication cost of training a model over heterogeneous data distributed across many clients. A popular technique for reducing communication is the use of local steps, where clients take multiple optimization steps over local data before communicating with the server (e.g., FedAvg, SCAFFOLD). This contrasts with centralized methods, where clients take one optimization step per communication round (e.g., minibatch SGD). A recent lower bound on the communication complexity of first-order methods shows that centralized methods are optimal over highly heterogeneous data, whereas local methods are optimal over purely homogeneous data. For intermediate heterogeneity levels, no algorithm is known to match the lower bound. In this work, we propose a multistage optimization scheme that nearly matches the lower bound across all heterogeneity levels. The idea is to first run a local method up to a heterogeneity-induced error floor; next, we switch to a centralized method for the remaining steps. Our analysis may help explain empirically-successful stepsize decay methods in FL. We demonstrate the scheme’s practical utility in image classification tasks.
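A hedged sketch of the two-stage schedule described above, with hypothetical client-gradient callables; the switch point, which the analysis ties to the heterogeneity-induced error floor, is treated here as a tunable input rather than computed.

```python
import numpy as np

def multistage_fl(clients, x0, switch_round, rounds, local_steps=10, lr=0.05):
    """Stage 1: local-update method (FedAvg-style) until the error floor.
    Stage 2: centralized method (minibatch-SGD-style, one step per round).

    clients: list of callables; clients[i](x) returns a stochastic gradient.
    """
    x = x0.copy()
    for r in range(rounds):
        if r < switch_round:                      # stage 1: local steps
            updates = []
            for g in clients:
                y = x.copy()
                for _ in range(local_steps):
                    y -= lr * g(y)
                updates.append(y)
            x = np.mean(updates, axis=0)          # FedAvg aggregation
        else:                                     # stage 2: centralized step
            x -= lr * np.mean([g(x) for g in clients], axis=0)
    return x
```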
Jae Hun Ro, Google- FedJAX: Federated Learning Simulation with JAX - Jae Hun is a Software Engineer for the Google Research team. - In this talk, we will introduce FedJAX [https://github.com/google/fedjax], a lightweight Python- and JAX-based library for federated learning simulation that emphasizes ease-of-use and is mainly intended for research purposes. We’ll cover the basics of using FedJAX, including its efficient and easy-to-use primitives for federated learning and standardized collection of datasets, models, and algorithms, as well as various performance benchmarks on GPUs and TPUs. Finally, we’ll cover the existing use cases of FedJAX and our future development plans.
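Since FedJAX builds on JAX, a flavor of what a simulated federated-averaging round looks like in plain JAX may help. Note that this is deliberately NOT the FedJAX API (see the repository for its actual primitives); it is an assumed-minimal sketch with a toy linear-regression loss.

```python
import jax
import jax.numpy as jnp

def loss(params, batch):
    x, y = batch
    return jnp.mean((x @ params - y) ** 2)       # toy linear regression

@jax.jit
def client_update(params, batch, lr=0.1):
    grads = jax.grad(loss)(params, batch)        # one local gradient step
    return params - lr * grads

def federated_round(server_params, client_batches):
    new_params = [client_update(server_params, b) for b in client_batches]
    return jnp.mean(jnp.stack(new_params), axis=0)   # FedAvg aggregation
```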
Li Xiong, Emory University- Federated Learning with Heterogeneous Data and Heterogeneous Differential Privacy - Li Xiong is a Professor of Computer Science and Biomedical Informatics at Emory University. She held a Winship Distinguished Research Professorship from 2015-2018. She has a Ph.D. from Georgia Institute of Technology, an MS from Johns Hopkins University, and a BS from the University of Science and Technology of China, all in Computer Science. She and her research lab, Assured Information Management and Sharing (AIMS), conduct research at the intersection of data management, machine learning, and data privacy and security, with a recent focus on machine learning with differential privacy and certified robustness, both in centralized and federated settings. - One important problem in federated learning (FL) is heterogeneity in data distribution, since the decentralized data are highly likely to follow non-identical distributions. Focusing on the less studied FL for graph data, our graph clustered FL (GCFL) framework dynamically finds clusters of local systems based on the gradients of Graph Neural Networks, and theoretically justifies that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Another source of heterogeneity in FL is the privacy requirements of different sites, due to varying privacy policies or preferences of data subjects. Existing efforts on FL with differential privacy (DP) typically assume a uniform privacy level. To leverage the heterogeneous privacy and optimize utility for the joint model, we propose Projected Federated Averaging (PFA), which extracts the top singular subspace of model updates from the "public" clients with less restrictive privacy and then utilizes it to project model updates of "private" clients before aggregating them. I will discuss open research directions that address heterogeneity in both data and privacy requirements.
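A hedged NumPy sketch of the PFA projection step as described in the abstract; the choice of k, the equal weighting, and where DP noise enters are simplifications of the actual algorithm.

```python
import numpy as np

def projected_federated_averaging(public_updates, private_updates, k):
    """PFA idea: use updates from 'public' clients (weaker privacy
    constraints) to find a top-k singular subspace, project the noisier
    'private' client updates onto it, then aggregate everything.

    Each update is a 1-D array of length model_dim; k is assumed to be
    at most min(len(public_updates), model_dim).
    """
    # top-k right singular subspace of the stacked public updates
    _, _, vt = np.linalg.svd(np.stack(public_updates), full_matrices=False)
    basis = vt[:k]                                 # (k, model_dim)
    projected = [basis.T @ (basis @ u) for u in private_updates]
    return np.mean(list(public_updates) + projected, axis=0)
```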
Marco Canini, KAUST- Resource-Efficient Federated Learning - Marco does not know what the next big thing will be. But he's sure that our next-gen computing and networking infrastructure must be a viable platform for it. Marco's research spans a number of areas in computer systems, including distributed systems, large-scale/cloud computing and computer networking with emphasis on programmable networks. His current focus is on designing better systems support for AI/ML and providing practical implementations deployable in the real world. Marco is an associate professor in Computer Science at KAUST. Marco obtained his Ph.D. in computer science and engineering from the University of Genoa in 2009 after spending the last year as a visiting student at the University of Cambridge. He was a postdoctoral researcher at EPFL and a senior research scientist at Deutsche Telekom Innovation Labs & TU Berlin. Before joining KAUST, he was an assistant professor at UCLouvain. He also held positions at Intel, Microsoft and Google. - Federated Learning (FL) enables distributed training by learners using local data, thereby enhancing privacy and reducing communication. However, it presents numerous challenges relating to the heterogeneity of the data distribution, device capabilities, and participant availability as deployments scale, which can impact both model convergence and bias. Existing FL schemes use random participant selection to improve fairness; however, this can result in inefficient use of resources and lower quality training. In this work, we systematically address the question of resource efficiency in FL, showing the benefits of intelligent participant selection and incorporation of updates from straggling participants. We demonstrate how these factors enable resource efficiency while also improving trained model quality.
Satyen Kale, Google- Learning with user-level differential privacy - Satyen Kale is a research scientist at Google Research working in the New York office. His current research focuses on the design of efficient and practical algorithms for fundamental problems in Machine Learning and Optimization. More specifically, he is interested in decision making under uncertainty, statistical learning theory, combinatorial optimization, and convex optimization techniques such as linear and semidefinite programming. His research has been recognized with several awards: a best paper award at ICML 2015, a best paper award at ICLR 2018, and a best student paper award at COLT 2018. He was a program chair of COLT 2017 and ALT 2019. - The classical setting of differential privacy assumes each user contributes a single sample to the dataset and preserves privacy by noising the output in a way that is commensurate with the maximum contribution of a single example. However, in many practical applications such as federated learning, each user can contribute multiple samples. In these applications, the goal is to provide user-level differential privacy, which protects the privacy of all the samples of the user. In this talk, we present algorithms and information-theoretic lower bounds for the problems of discrete distribution estimation, high-dimensional mean estimation, and empirical risk minimization under user-level differential privacy.
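To make the user-level notion concrete, here is a hedged NumPy sketch of user-level DP mean estimation: average each user's samples first, clip the per-user mean (bounding any single user's total contribution, however many samples they hold), then add Gaussian noise calibrated to that bound. The clipping radius and noise multiplier are illustrative parameters, not the talk's algorithm.

```python
import numpy as np

def user_level_dp_mean(user_samples, clip, sigma, rng=np.random.default_rng(0)):
    """user_samples: list of (n_i, d) arrays, one per user."""
    means = []
    for s in user_samples:
        m = np.mean(s, axis=0)          # collapse each user to one vector
        norm = np.linalg.norm(m)
        if norm > clip:                 # per-user clipping
            m = m * (clip / norm)
        means.append(m)
    agg = np.mean(means, axis=0)
    # replacing one user's data moves the average by at most 2*clip/num_users
    noise = rng.normal(0.0, sigma * 2 * clip / len(means), size=agg.shape)
    return agg + noise
```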
Shuang Song, Google- Public Data-Assisted Mirror Descent for Private Model Training - We revisit the problem of using public data to improve the privacy/utility trade-offs for differentially private (DP) model training. Here, public data refers to auxiliary data sets that have no privacy concerns. We consider public training data sets that are from the same distribution as the private training data. For convex losses, we show that a variant of Mirror Descent provides population risk guarantees which are independent of the dimension of the model ($p$). Specifically, we apply Mirror Descent with the loss generated by the public data as the mirror map, and use DP gradients of the loss generated by the private (sensitive) data. To obtain dimension independence, we require $G_Q^2 \leq p$ public data samples, where $G_Q$ is the Gaussian width of the smallest convex set $Q$ such that the public loss functions are 1-strongly convex with respect to $\|\cdot\|_Q$. We further show that our algorithm has a natural "noise stability" property: if, in a bounded region around the current iterate, the public loss satisfies $\alpha_{\mathbf{v}}$-strong convexity in a direction $\mathbf{v}$, then using noisy gradients instead of the exact gradients shifts our next iterate in the direction $\mathbf{v}$ by an amount proportional to $1/\alpha_{\mathbf{v}}$ (in contrast with DP stochastic gradient descent (DP-SGD), where the shift is isotropic). Analogous results in prior works had to explicitly learn the geometry using the public data in the form of preconditioner matrices. Our method is also applicable to non-convex losses, as it does not rely on convexity assumptions to ensure DP guarantees. We demonstrate the empirical efficacy of our algorithm by showing privacy/utility trade-offs on linear regression and deep learning benchmark datasets (WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but also improves over DP-SGD on models that have been pretrained with the public data to begin with.
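For concreteness, writing $\psi$ for the public-data loss used as the mirror map and $\hat g_t$ for a DP (noisy, clipped) gradient of the private loss, a generic mirror-descent step of the kind described above reads as follows. This is a textbook sketch, not the paper's exact algorithm:

$$x_{t+1} = \arg\min_{x} \left\{ \eta \,\langle \hat g_t, x\rangle + D_{\psi}(x, x_t) \right\}, \qquad D_{\psi}(x,y) = \psi(x) - \psi(y) - \langle \nabla \psi(y),\, x - y\rangle,$$

where $D_{\psi}$ is the Bregman divergence of $\psi$; strong convexity of $\psi$ in a direction damps the noise's shift in that direction, which is the "noise stability" property above.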
Steven Wu, Carnegie Mellon University- Private Multi-Task Learning: Formulation and Applications to Federated Learning - Dr. Steven Wu is an Assistant Professor in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science from the University of Pennsylvania in 2017 and was a post-doctoral researcher at Microsoft Research-NYC from 2017 to 2018. His recent work focuses on (1) how to make machine learning better aligned with societal values, especially privacy and fairness, and (2) how to make machine learning more reliable and robust when algorithms interact with social and economic dynamics. - Many problems in machine learning rely on multi-task learning (MTL), in which the goal is to solve multiple related machine learning tasks simultaneously. MTL is particularly relevant for privacy-sensitive applications in areas such as healthcare, finance, and IoT computing, where sensitive data from multiple, varied sources are shared for the purpose of learning. In this work, we formalize notions of task-level privacy for MTL via joint differential privacy (JDP), a relaxation of differential privacy for mechanism design and distributed optimization. We then propose an algorithm for mean-regularized MTL, an objective commonly used for applications in personalized federated learning, subject to JDP. We analyze our objective and solver, providing certifiable guarantees on both privacy and utility. Empirically, we find that our method allows for improved privacy/utility trade-offs relative to global baselines across common federated learning benchmarks.
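For reference, mean-regularized MTL as commonly written in the personalized-FL literature couples each per-task model $w_k$ to the models' mean $\bar w$; the talk's objective follows this standard form (constants here are illustrative):

$$\min_{w_1,\dots,w_K} \; \sum_{k=1}^{K} F_k(w_k) + \frac{\lambda}{2} \sum_{k=1}^{K} \left\| w_k - \bar w \right\|_2^2, \qquad \bar w = \frac{1}{K}\sum_{k=1}^{K} w_k,$$

where $F_k$ is task $k$'s loss and $\lambda$ trades off personalization against agreement across tasks.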
Yang Liu, Tsinghua University - Federated Learning of Larger Server Models via Selective Knowledge Fusion - Yang Liu is an Associate Professor at the Institute for AI Industry Research, Tsinghua University. Before joining Tsinghua, she was a Principal Researcher and Research Lead at WeBank, where she co-founded FedAI.org, the first Federated Learning ecosystem in China. She holds over 20 patents and over 100 patent applications. She also co-authored the book "Federated Learning". She serves as an associate editor for ACM TIST and a guest editor for IEEE Intelligent Systems and IEEE BigData, and has co-chaired multiple workshops at IJCAI and NeurIPS. Her research work has been recognized with multiple awards, such as the AAAI Innovation Award. - In this work, we investigate a novel paradigm that takes advantage of a powerful server model to break through the model capacity limit in Federated Learning (FL). By selectively learning from multiple teacher clients and itself, a server model develops in-depth knowledge and transfers its knowledge back to clients in return to boost their respective performance. Our proposed framework achieves superior performance on both server and client models and provides several advantages in a unified framework, including flexibility for heterogeneous client architectures, robustness to poisoning attacks, and communication efficiency between clients and server. By bridging FL effectively with larger server model training, our proposed paradigm paves the way for robust and continual knowledge accumulation from distributed and private data.