Spotify Engineering

Choosing a Sequential Testing Framework: Comparisons and Discussions

1. Introduction
- The post discusses the pros and cons of different sequential testing frameworks for experimentation.
- The choice of framework affects both statistical power and the false positive rate.
2. Always Valid Inference
- Allows continuous testing during data collection without deciding on a stopping rule or the number of analyses in advance.
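- The false positive rate can also be bounded with Bonferroni corrections, a simpler alternative that requires capping the number of analyses in advance and becomes more conservative as that number grows.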
- A good fit for experiments that run for a few weeks and receive data in batches.
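
To make the idea of an always valid p-value concrete, here is a minimal sketch of one member of the family, the mSPRT, for the simplest possible setting. It assumes one-sample normal data with known variance `sigma2` and a normal mixing distribution centered on the null value with variance `tau2`. The function name and all parameter names are hypothetical, and this is an illustration under those assumptions rather than the implementation discussed in the post.

```python
import math


def msprt_p_values(xs, theta0=0.0, sigma2=1.0, tau2=1.0):
    """Always valid p-values for H0: mean == theta0 (illustrative sketch).

    Assumes observations are N(theta, sigma2) with known sigma2 and uses a
    N(theta0, tau2) mixing distribution over the alternative.
    """
    p_values = []
    running_sum = 0.0
    p = 1.0
    for n, x in enumerate(xs, start=1):
        running_sum += x
        mean_n = running_sum / n
        # Log of the mixture likelihood ratio after n observations.
        log_lr = 0.5 * math.log(sigma2 / (sigma2 + n * tau2)) + (
            n * n * tau2 * (mean_n - theta0) ** 2
        ) / (2.0 * sigma2 * (sigma2 + n * tau2))
        # The p-value is the running minimum of 1 / likelihood ratio, capped at 1,
        # so it can only decrease as more data arrives.
        p = min(p, 1.0, math.exp(-log_lr))
        p_values.append(p)
    return p_values


# Usage: the experiment may be stopped the first time p drops below alpha.
p_sequence = msprt_p_values([0.4, 0.6, 0.5, 0.7, 0.3] * 40)
stopped_at = next((i + 1 for i, p in enumerate(p_sequence) if p < 0.05), None)
```

Because the p-value sequence is non-increasing, it can be inspected after every observation or only occasionally. The choice of `tau2` matters in practice, since it controls which effect sizes the test detects fastest.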
3. Evaluating Sequential Tests
- Two important properties: bounded false positive rate and statistical power.
- A false positive rate simulation was conducted for group sequential tests (GST) with correctly assumed, underestimated, and overestimated sample sizes.
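- Always valid tests (GAVI and mSPRT) are conservative when the analysis is not performed after each observation, for example when data arrives in batches.
- Always valid inference guarantees a correctly bounded false positive rate, regardless of how often the results are analyzed.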
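
A false positive rate simulation of this general kind can be sketched in a few lines. The example below is not the simulation from the post: it assumes A/A data (no true effect) from a standard normal distribution, a two-sided z-test repeated at each interim analysis, and it compares an unadjusted threshold against a Bonferroni-adjusted one. The function name, batch size, and number of looks are all hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_false_positive_rate(n_looks, alpha_per_look, n_sims=5000,
                                 batch_size=100, seed=1):
    """Share of simulated A/A experiments that reject H0 at any interim look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha_per_look / 2.0)
    batch_sd = math.sqrt(batch_size)
    rejections = 0
    for _sim in range(n_sims):
        total, n = 0.0, 0
        for _look in range(n_looks):
            # The sum of batch_size independent N(0, 1) draws is N(0, batch_size),
            # so one draw per batch is enough.
            total += rng.gauss(0.0, batch_sd)
            n += batch_size
            z = total / math.sqrt(n)
            if abs(z) > z_crit:
                rejections += 1
                break  # stop at the first "significant" look
    return rejections / n_sims


alpha = 0.05
looks = 14  # e.g. one analysis per day for two weeks (an assumed setup)
naive_rate = simulate_false_positive_rate(looks, alpha)
bonferroni_rate = simulate_false_positive_rate(looks, alpha / looks)
print(f"unadjusted peeking: {naive_rate:.3f}")
print(f"Bonferroni-adjusted: {bonferroni_rate:.3f}")
```

Under these assumptions the unadjusted rate should land well above the nominal 5%, while the Bonferroni-adjusted rate should stay below it. The same harness could be pointed at any sequential test by swapping the rejection rule inside the inner loop.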
4. GST vs. AVI
- GSTs are preferable when the expected sample size can be estimated accurately.
- The AVI family of tests is a good choice when data is streamed and the sample size cannot be estimated accurately.
- The probability of detecting an effect is higher with GST when streamed data is analyzed in batches.
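
One way to see why GST depends on an accurate sample size estimate is through alpha spending, a common way of implementing group sequential tests. The summary above does not say which GST variant is meant, so the sketch below is an assumption: it uses the O'Brien-Fleming-type spending function of Lan and DeMets, where the cumulative alpha available at an interim analysis depends on the fraction of the expected sample size observed so far. The function name and parameters are hypothetical, and the sketch computes only the alpha spent, not the critical boundaries.

```python
from statistics import NormalDist


def obf_alpha_spent(n_observed, n_expected, alpha=0.05):
    """Cumulative two-sided alpha spent at the current information fraction.

    Uses the O'Brien-Fleming-type spending function
    alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t))), with t = n_observed / n_expected
    capped at 1.
    """
    t = min(n_observed / n_expected, 1.0)
    if t <= 0.0:
        return 0.0
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - std_normal.cdf(z / t ** 0.5))


# Usage: alpha spent at daily looks when 1,000 observations are expected in total.
expected_total = 1000
for observed in (250, 500, 750, 1000):
    spent = obf_alpha_spent(observed, expected_total)
    print(f"n={observed:>4}  cumulative alpha spent: {spent:.4f}")
```

If the expected sample size is overestimated, the experiment ends with much of the alpha unspent and the test is conservative. If it is underestimated, all of the alpha is spent before data collection ends, and any further analysis is no longer covered by the false positive rate guarantee. AVI avoids this dependence at the cost of lower power on batched data.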