We made strong headway in ML foundations, with extensive work on algorithms, efficiency, data, and privacy. We improved ML efficiency through pioneering techniques that reduce the inference times of LLMs, which were implemented across Google products and adopted throughout the industry. Our research on cascades presents a method for leveraging smaller models for “easy” outputs, while our novel speculative decoding algorithm computes several tokens in parallel, speeding up the generation of outputs by ~2x–3x without affecting quality. As a result, LLMs powering conversational products can generate responses significantly faster, which translates into a greatly improved user experience and makes AI more compute- and energy-efficient. We’re building on this work with draft refinement and block verification. We also examined new ways of improving the reasoning capabilities of LLMs via pause tokens; increased reasoning power could make smaller models more capable, resulting in significant efficiency gains. We explored the algorithmic efficiency of transformers and designed three novel attention mechanisms, PolySketchFormer, HyperAttention, and Selective Attention, to address computational challenges and bottlenecks in the deployment of language models and to improve model quality.
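To make the speed-up mechanism concrete, here is a minimal, greedy-verification sketch of the speculative decoding idea. The `draft_model` and `target_model` callables are hypothetical stand-ins for a small drafter and a large target LLM, and the published algorithm uses a probabilistic acceptance rule rather than the simple greedy check shown here.

```python
# Minimal sketch of speculative decoding (greedy-verification variant).
# `draft_model` and `target_model` are hypothetical callables mapping a token
# sequence to array-like next-token scores (e.g., NumPy arrays).

def speculative_decode_step(prefix, draft_model, target_model, k=4):
    """Propose k tokens with the cheap draft model, then verify them
    with a single batched call to the expensive target model."""
    # 1. Draft: the small model generates k candidate tokens autoregressively.
    draft = list(prefix)
    proposed = []
    for _ in range(k):
        token = draft_model(draft).argmax()
        proposed.append(token)
        draft.append(token)

    # 2. Verify: one call to the large model scores all k positions in
    #    parallel -- this batching is where the wall-clock speedup comes from.
    target_scores = target_model(prefix, proposed)  # shape: [k+1, vocab]

    # 3. Accept the longest prefix of proposed tokens that the target model
    #    would also have chosen, then emit one corrected or extra token.
    accepted = []
    for i, token in enumerate(proposed):
        if target_scores[i].argmax() == token:
            accepted.append(token)
        else:
            accepted.append(target_scores[i].argmax())
            return accepted
    accepted.append(target_scores[k].argmax())
    return accepted
```

Each step thus emits between one and k+1 tokens while paying for only one forward pass of the large model, which is why output quality is unchanged when the acceptance rule is exact.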
Our teams have made considerable additional progress, including research on principled deferral algorithms with multiple experts and a deferral algorithm for the general two-stage setting (see the sketch below). Our RL imitation learning algorithm for compiler optimization led to significant savings and reduced binary file sizes; our research on multi-objective reinforcement learning from human feedback, the Conditional Language Policy framework, provided a principled solution that navigates the key quality-factuality tradeoff while delivering significant compute savings; and our work on in-context learning provided a mechanism for sample-efficient learning on sparse retrieval tasks.
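As a rough illustration of what deferral with multiple experts means at inference time, the sketch below routes an example to the base model or to one of several experts using a cost-adjusted score. The scoring functions and costs are invented for illustration; the published work derives such rules from principled surrogate losses with guarantees rather than this ad hoc heuristic.

```python
# Illustrative sketch of a score-based deferral rule with multiple experts.
# `base_model`, `experts`, `deferral_scorers`, and `costs` are hypothetical.

def route_example(x, base_model, experts, deferral_scorers, costs):
    """Return the base model's prediction, or defer to the expert whose
    cost-adjusted deferral score offers the largest expected gain."""
    base_pred, base_conf = base_model(x)  # prediction and confidence in [0, 1]
    best_expert, best_gain = None, 0.0
    for expert, scorer, cost in zip(experts, deferral_scorers, costs):
        # Expected accuracy gain from deferring, minus the cost of the expert.
        gain = scorer(x) - base_conf - cost
        if gain > best_gain:
            best_expert, best_gain = expert, gain
    return best_expert(x) if best_expert is not None else base_pred
```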
Data is another critical building block for ML. To support ML research across the ecosystem, we released and contributed to various datasets. Croissant, for example, is a metadata format built for the specific needs of ML data, which we developed in collaboration with industry and academia. We developed sensitivity sampling, a data sampling technique for foundation models, and proved that it is an optimal data sampling strategy for classic clustering problems such as k-means. We advanced our research in scalable clustering algorithms and open-sourced a parallel graph clustering library that provides state-of-the-art results on billion-edge graphs on a single machine. The rapid proliferation of domain-specific machine learning models highlights a key challenge: while these models excel within their respective domains, their performance often varies significantly across diverse applications. To address this, we developed a principled algorithm by framing the problem as a multiple-source domain adaptation task.
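For intuition on sensitivity sampling in the k-means setting, here is a simplified sketch: points that are expensive with respect to a cheap proxy solution are kept with higher probability and reweighted. The proxy centers, the exact sampling probabilities, and the mixing with a uniform term are simplifying assumptions; the published analysis specifies the precise distribution and its optimality guarantees.

```python
# Simplified sketch of sensitivity sampling for k-means-style data selection.
import numpy as np

def sensitivity_sample(points, proxy_centers, sample_size, rng=None):
    """Sample points with probability proportional to an approximate
    sensitivity score and return them with unbiasing weights."""
    rng = rng or np.random.default_rng(0)
    # Squared distance from each point to its closest proxy center.
    d2 = np.min(
        ((points[:, None, :] - proxy_centers[None, :, :]) ** 2).sum(-1), axis=1
    )
    # Sensitivity proxy: each point's share of the total cost, mixed with a
    # uniform term so no point has probability exactly zero.
    probs = 0.5 * d2 / d2.sum() + 0.5 / len(points)
    idx = rng.choice(len(points), size=sample_size, replace=True, p=probs)
    # Reweight so the sample is an unbiased estimate of the full-data cost.
    weights = 1.0 / (probs[idx] * sample_size)
    return points[idx], weights
```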
Google Research is deeply committed to privacy research and has made significant contributions to the field. Our work on differentially private model training highlights the importance of rigorous analysis and implementation of privacy-preserving ML algorithms to ensure robust protection of user data. We complemented these analyses with more efficient training algorithms and new methods for auditing implementations, which we open-sourced for the community. In our research on learning from aggregate data, we introduced a novel approach for constructing aggregation datasets and explored various algorithmic aspects of model learning from aggregated data, achieving optimistic sample complexity rates in this setting. We also designed new methods for generating differentially private synthetic data: artificial data that offers strong privacy protection while still having the characteristics required for training predictive models.
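A schematic sketch of the core mechanism behind differentially private model training (in the DP-SGD style) is shown below: clip each example's gradient, then add calibrated Gaussian noise. The `clip_norm` and `noise_multiplier` values are illustrative placeholders; real deployments choose them with a privacy accountant to meet a target (epsilon, delta) guarantee.

```python
# Schematic sketch of one differentially private gradient step.
import numpy as np

def dp_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                     rng=None):
    """Average clipped per-example gradients and add Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Clip each example's gradient so no single user dominates the update.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise scaled to the clipping norm masks any individual contribution.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return mean_grad + noise
```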
As we push the boundaries of what can be achieved in computational optimization, there are meaningful implications for the global economy. Take linear programming (LP), a foundational computer science method that informs data-driven decision making and has many applications across fields such as manufacturing and transportation. We introduced PDLP, which requires less memory, is more compatible with modern computational techniques, and significantly scales up LP solving capabilities. It was awarded the prestigious Beale–Orchard-Hays Prize and is now available as part of Google’s open-sourced OR-Tools. We announced our Shipping Network Design API, a great example use case of PDLP, for optimizing cargo shipping. This enables more environmentally friendly and cost-effective solutions to supply chain challenges, with the potential for shipping networks to deliver 13% more containers with 15% fewer vessels. We also introduced TimesFM for more accurate time-series forecasting, a widespread type of forecasting used in domains such as retail, manufacturing, and finance. This decoder-only foundation model was pre-trained on 100 billion real-world time points, largely using data from Google Trends and Wikipedia pageviews, and outperformed even powerful deep-learning models that were trained on the target time series.
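To show what solving an LP with OR-Tools looks like in practice, here is a toy example that requests the PDLP backend. The shipping-style numbers and variable names are invented for illustration, and the availability of PDLP through CreateSolver("PDLP") depends on the OR-Tools build being used.

```python
# A tiny LP solved with OR-Tools, requesting the PDLP backend.
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("PDLP")
if solver is None:
    raise RuntimeError("PDLP backend not available in this OR-Tools build")

# Decision variables: containers shipped on two hypothetical routes.
x = solver.NumVar(0, solver.infinity(), "route_a_containers")
y = solver.NumVar(0, solver.infinity(), "route_b_containers")

# Illustrative capacity constraints (vessel slots and port handling budget).
solver.Add(2 * x + 3 * y <= 1200)
solver.Add(4 * x + 2 * y <= 1600)

# Maximize total containers delivered.
solver.Maximize(x + y)

status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
    print("containers delivered:", solver.Objective().Value())
    print("route A:", x.solution_value(), "route B:", y.solution_value())
```

PDLP's first-order (matrix-free) approach is what lets the same modeling code scale to problems far larger than classical simplex or interior-point solvers can hold in memory.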