“Traditional systems performance engineering makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. For modern systems, we can choose the metrics, and can choose ones we need to support new holistic performance analysis methodologies. These methodologies provide faster, more accurate, and more complete analysis, and can provide a starting point for unfamiliar systems. Methodologies are especially helpful for modern applications and their workloads, which can pose extremely complex problems with no obvious starting point. There are also continuous deployment environments such as the Netflix cloud, where these problems must be solved in shorter time frames.
Fortunately, with advances in system observability and tracers, we have virtually endless custom metrics to aid performance analysis. The problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems. System methodologies provide a starting point for analysis, as well as guidance for quickly moving through the metrics to root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. The focus is on single systems (any operating system), including single cloud instances, and quickly locating performance issues or exonerating the system. Many methodologies will be discussed, along with recommendations for their implementation, which may be as documented checklists of tools, or custom dashboards of supporting metrics.
In general, you will learn to think differently about your systems, and how to ask better questions.”
Brendan Gregg is a senior performance architect at Netflix, where he does large-scale computer performance design, evaluation, analysis, and tuning. He is the author of Systems Performance, published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration. He has previously worked as a performance and kernel engineer, and has created performance analysis tools included in multiple operating systems, as well as visualizations and methodologies.