In this thesis project we propose to apply the scientific method to machine learning, pursuing two lines of research. In the first, we will build on recent work that applies modern experimental design to algorithm selection and hyperparameter tuning. The main thrust of this sub-project is the multi-problem approach: we will study the interaction between methods (and their hyperparameters) and data sets to determine whether, and to what extent, experience gained on some data sets generalizes to others. This sub-project will produce two outputs: a toolbox for practitioners and a body of knowledge about which algorithms work on which (kinds of) data sets. The second output will feed into the second line of research, in which we ask \emph{why} certain methods work on certain data sets. We will study algorithms as natural phenomena: we will form hypotheses, design and evaluate experiments, and carry out measurements that can confirm or refute those hypotheses.