24 episodes

Machine learning is driving exciting changes and progress in computing. What does the ubiquity of machine learning mean for how people build and deploy systems and applications? What challenges does industry face when deploying machine learning systems in the real world, and how can academia rise to meet those challenges?

Updates every Monday and Friday - old episodes on Mondays, new episodes on Fridays!

Check out our website and our YouTube channel for full videos!
https://mlsys.stanford.edu/
https://www.youtube.com/channel/UCzz6ructab1U44QPI3HpZEQ

Stanford MLSys Seminar - Dan Fu, Karan Goel, Fiodar Kazhamiaka, Piero Molino, Matei Zaharia, Chris Ré

    • Technology
    • 5.0 • 7 Ratings

    #62 Dan Fu - Improving Transfer and Robustness of Supervised Contrastive Learning

    Dan Fu - An ideal learned representation should display transferability and robustness. Supervised contrastive learning is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. In this talk, we discuss how to alleviate these problems to improve the geometry of supervised contrastive learning. We identify two key principles: balancing the right amount of geometric "spread" in the embedding space, and inducing an inductive bias towards subclass clustering. We introduce two mechanisms for achieving these aims in supervised contrastive learning, and show that doing so improves transfer learning and worst-group robustness. Next, we show how we can apply these insights to improve entity retrieval in open-domain NLP tasks (e.g., QA, search). We present a new method, TABi, that trains bi-encoders with a type-aware supervised contrastive loss and improves long-tailed entity retrieval.
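    For intuition, here is a minimal sketch of a supervised contrastive loss in which the temperature controls how much geometric "spread" the embeddings keep; all names and defaults are illustrative assumptions, not the method from the talk.

```python
# Illustrative supervised contrastive loss (PyTorch). The temperature tau
# controls the amount of "spread": very low temperatures push all points in a
# class together (class collapse), higher ones keep more intra-class geometry.
# Names and defaults are assumptions, not the speaker's implementation.
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """embeddings: (N, d) features; labels: (N,) integer class ids."""
    z = F.normalize(embeddings, dim=1)                      # project onto the unit sphere
    sim = z @ z.t() / tau                                   # temperature-scaled cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # never contrast a point with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # average log-likelihood of same-class (positive) pairs for each anchor
    per_anchor = torch.where(pos, log_prob, torch.zeros_like(log_prob)).sum(1)
    per_anchor = per_anchor / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()

# usage sketch
feats = torch.randn(8, 128, requires_grad=True)
labels = torch.randint(0, 3, (8,))
sup_con_loss(feats, labels).backward()
```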

    • 56 min
    #61 Kexin Rong - Big Data Analytics

    Kexin Rong - Learned Indexing and Sampling for Improving Query Performance in Big-Data Analytics

    Traditional data analytics systems improve query efficiency via fine-grained, row-level indexing and sampling techniques. However, to keep up with growing data volumes, increasingly many systems store and process datasets in large partitions containing hundreds of thousands of rows. These analytics systems must therefore adapt traditional techniques to use coarse-grained data partitions as the basic unit for processing queries efficiently. In this talk, I will discuss two related ideas that combine learning techniques with partitioning designs to improve query efficiency in these analytics systems. First, I will describe PS3, the first approximate query processing system that supports non-uniform, partition-level samples. PS3 reduces the number of partitions accessed by 3-70x compared to a uniform sample of the partitions, at the same error. Next, I will present OLO, an online learning framework that dynamically adapts data organization to changes in the query workload to minimize overall data access and movement. We show that dynamic reorganization outperforms a single, optimized partitioning scheme by up to 30% in end-to-end runtime. I will conclude by discussing additional open problems in this area.
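    For intuition, here is a heavily simplified sketch of non-uniform, partition-level sampling for an approximate SUM query, with a Horvitz-Thompson-style correction; the scoring function and all names are illustrative assumptions, not the actual PS3 system.

```python
# Heavily simplified sketch of partition-level, non-uniform sampling for an
# approximate SUM aggregate. Scores, estimator, and names are illustrative
# assumptions, not the PS3 system from the talk.
import random

def approx_sum(partitions, score, read_partition, budget):
    """partitions: partition ids; score(p): importance weight for partition p;
    read_partition(p): exact partial SUM of partition p; budget: expected
    number of partitions we can afford to scan."""
    weights = {p: score(p) for p in partitions}
    total_w = sum(weights.values())
    estimate = 0.0
    for p in partitions:
        # inclusion probability proportional to the partition's importance
        pi = min(1.0, budget * weights[p] / total_w)
        if random.random() < pi:
            # Horvitz-Thompson correction: up-weight each sampled partition
            estimate += read_partition(p) / pi
    return estimate

# usage sketch: 100 partitions whose partial sums we pretend to already know
partial_sums = {p: float(p) for p in range(100)}
est = approx_sum(
    partitions=list(partial_sums),
    score=lambda p: 1.0 + partial_sums[p],   # skew sampling toward "heavy" partitions
    read_partition=lambda p: partial_sums[p],
    budget=20,
)
print(f"estimate = {est:.1f}, exact = {sum(partial_sums.values()):.1f}")
```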

    • 59 min
    #60 Igor Markov - Looper: An End-to-End ML Platform for Product Decisions

    Igor Markov - Looper: an end-to-end ML platform for product decisions

    Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure, and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support fine-grained product-metric evaluation, and (iii) optimize for product goals. To address the shortcomings of prior platforms, we introduce general principles for, and the architecture of, an ML platform, Looper, with simple APIs for decision-making and feedback collection.

    Looper covers the end-to-end ML lifecycle from collecting training data and model training to deployment and inference, and extends support to personalization, causal evaluation with heterogeneous treatment effects, and Bayesian tuning for product goals. During its 2021 production deployment, Looper simultaneously hosted 440-1,000 ML models that made 4-6 million real-time decisions per second. We sum up the experiences of platform adopters and describe their learning curve.
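    As a rough illustration of what "simple APIs for decision-making and feedback collection" might look like from a product engineer's point of view, here is a hypothetical client sketch; the class and method names are invented for illustration and are not Looper's actual interface.

```python
# Hypothetical sketch of a decision/feedback client in the spirit of the APIs
# described in the talk. All names are invented for illustration; this is not
# Looper's actual interface.
import random
from dataclasses import dataclass, field

@dataclass
class DecisionClient:
    policy_id: str
    log: list = field(default_factory=list)

    def decide(self, context: dict, choices: list):
        """Return one choice for this context (random stand-in for a served model)."""
        choice = random.choice(choices)
        decision_id = len(self.log)
        self.log.append({"id": decision_id, "context": context, "choice": choice})
        return decision_id, choice

    def feedback(self, decision_id: int, metric: str, value: float):
        """Attach a product-metric observation to an earlier decision."""
        self.log[decision_id][metric] = value

# usage sketch: decide, observe the user, feed the outcome back
client = DecisionClient(policy_id="notif_ranking")
did, variant = client.decide({"user_tier": "free"}, choices=["compact", "detailed"])
client.feedback(did, metric="click", value=1.0)
```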

    • 1 hr
    #59 Zhuohan Li - Alpa: Automated Model-Parallel Deep Learning

    Zhuohan Li - Alpa: Automated Model-Parallel Deep Learning

    Alpa (https://github.com/alpa-projects/alpa) automates model-parallel training of large deep learning models by generating execution plans that unify data, operator, and pipeline parallelism. Alpa distributes training by viewing parallelism at two hierarchical levels: inter-operator and intra-operator parallelism. Building on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. It designs a number of compilation passes to automatically derive the optimal parallel execution plan at each parallelism level and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on the models those systems were designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually designed plans.
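    The hierarchical plan space can be pictured with a toy search: choose a pipeline split (inter-operator) and a per-layer sharding (intra-operator), then keep the cheapest combination. The cost numbers below are made-up placeholders, not Alpa's real compilation passes or cost model.

```python
# Toy illustration of a two-level plan space: an inter-operator choice (where
# to cut the model into pipeline stages) combined with an intra-operator
# choice (a sharding per layer). Costs are made-up placeholders.
from itertools import product

layers = ["embed", "attn", "mlp", "head"]
shardings = ["replicate", "shard_rows", "shard_cols"]

def intra_op_cost(layer: str, sharding: str) -> float:
    # placeholder: pretend sharding big layers helps, replicating them hurts
    base = 2.0 if layer in ("attn", "mlp") else 1.0
    return base if sharding == "replicate" else 0.6 * base

def inter_op_cost(stages) -> float:
    # placeholder: a fixed pipeline-bubble penalty plus an imbalance penalty
    return 0.1 * (len(stages) - 1) + 0.05 * abs(len(stages[0]) - len(stages[-1]))

best = None
for cut in range(1, len(layers)):                                # inter-operator choice
    stages = (layers[:cut], layers[cut:])
    for assignment in product(shardings, repeat=len(layers)):    # intra-operator choice
        cost = inter_op_cost(stages) + sum(
            intra_op_cost(layer, s) for layer, s in zip(layers, assignment))
        if best is None or cost < best[0]:
            best = (cost, stages, assignment)

print("cheapest toy plan:", best)
```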

    • 55 min
    3/10/22 #58 Shruti Bhosale - Multilingual Machine Translation

    Shruti Bhosale - Scaling Multilingual Machine Translation to Thousands of Language Directions

    Existing work in translation has demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-centric, trained only on data that was translated from or to English. While English-centric training is supported by large sources of training data, it does not reflect translation needs worldwide. In this talk, I will describe how we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open-source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high-quality models. Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems from WMT.
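    As a toy sketch of combining shared dense capacity with language-specific sparse parameters, here is a layer that routes every example through a shared projection plus one per-language feed-forward block; the sizes and names are assumptions, not the production system described in the talk.

```python
# Illustrative "dense + language-specific sparse" layer in PyTorch: a shared
# projection used by every language pair, plus a per-language feed-forward
# block selected at run time. Sizes and names are assumptions, not the system
# from the talk.
import torch
import torch.nn as nn

class DensePlusLanguageFFN(nn.Module):
    def __init__(self, d_model: int, languages: list):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)          # dense, shared by all pairs
        self.lang_ffn = nn.ModuleDict({                    # sparse, one block per language
            lang: nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
            for lang in languages
        })

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        h = torch.relu(self.shared(x))
        return h + self.lang_ffn[lang](h)    # only one language's parameters are active

# usage sketch
layer = DensePlusLanguageFFN(d_model=64, languages=["fr", "sw", "hi"])
out = layer(torch.randn(2, 10, 64), lang="sw")
print(out.shape)  # torch.Size([2, 10, 64])
```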

    • 57 min
    3/3/22 #57 Vijay Janapa Reddi - TinyML, Harvard Style

    Vijay Janapa Reddi - Tiny Machine Learning

    Tiny machine learning (TinyML) is a fast-growing field at the intersection of ML algorithms and low-cost embedded systems. TinyML enables on-device analysis of sensor data (vision, audio, IMU, etc.) at ultra-low power consumption (less than 1 mW). Processing data close to the sensor allows for an expansive new variety of always-on ML use cases that save bandwidth, latency, and energy while improving responsiveness and maintaining privacy. This talk introduces the vision behind TinyML and showcases some of the interesting applications that TinyML is enabling in the field, from wildlife conservation to supporting public health initiatives. Yet numerous technical hardware and software challenges remain: tight memory and storage constraints, MCU heterogeneity, software fragmentation, and a lack of relevant large-scale datasets pose a substantial barrier to developing TinyML applications. To this end, the talk touches upon some of the research opportunities for unlocking the full potential of TinyML.
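    As a small illustration of the model shrinking that TinyML depends on, here is a sketch of post-training quantization with the TensorFlow Lite converter; the toy model is a placeholder, and a real deployment would also need an on-device runtime such as TensorFlow Lite for Microcontrollers.

```python
# Minimal sketch of shrinking a Keras model with post-training quantization
# (TensorFlow Lite). The tiny model below is a stand-in architecture, not a
# model from the talk; a real TinyML deployment also needs a C/C++ runtime
# (e.g. TFLite Micro) on the microcontroller itself.
import tensorflow as tf

# a deliberately small keyword-spotting-sized model (placeholder architecture)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 40, 1)),          # e.g. a log-mel spectrogram
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),     # a handful of keywords
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]    # enables weight quantization
tflite_model = converter.convert()

print(f"flatbuffer size: {len(tflite_model) / 1024:.1f} KiB")  # must fit in MCU flash
```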

    • 57 min

Customer Reviews

5.0 out of 5
7 Ratings

av_2021,

Love this podcast!

Shows are interesting, informative and always feature an amazing guest.

Jenny_balloon,

Great podcast

Love it!
