WHY, WHAT, and HOW
Learning to identify good research problems and build successful projects around them is the core skill you will learn in graduate school. Until you master this skill, the distinction between good research topics and bad can seem mysterious and arbitrary. A first step in learning to identify good research is to realize that most successful systems research projects answer three questions: WHY is it interesting? WHAT does it contribute? and HOW does it do it?
Learning to articulate answers to these questions will help you identify good research questions, refine your approach to solving them, and present your results in a compelling way. It will also provide a framework for critiquing other research.
WHY
WHY is the first and most important question to ask about a research project. Why did you do all this work? Why is the problem you are solving important? Why does it need a(nother) solution? Why should I read this paper? You cannot have a good research project without a compelling WHY.
The importance of WHY and its acceptable answers set research apart from other reasons to build systems. In industry, you might build a system because your competitor built something similar. A hobbyist might build something because it’d be cool to build. You might also build a system to learn new skills or just for fun. Or, you might build a system because you think other people would find it useful. All these reasons can make for great projects, but they do not lead to good research.
For research, the answer to WHY must be that the project will extend the reach of human knowledge in a useful and novel way. The more important and far-reaching the WHY, the better the research project.
Here are some examples of good WHY:
- Code for library bindings in high-level languages (e.g., to call C from JavaScript) is a source of severe security problems that existing techniques cannot detect or prevent.
- In a conventional out-of-order superscalar microprocessor, many resources go unused because single threads lack sufficient internal parallelism.
- Non-volatile memories offer compelling performance advantages, but existing file systems prevent applications from exploiting them.
The WHY does not need to be novel or unique, and many projects reuse an existing WHY. There have been a huge number of papers written about processor branch prediction because the underlying WHY (i.e., “Mispredicted branches prevent CPUs from achieving peak performance”) is extremely important.
The best WHYs can spawn an entire area of research. In fact, the main contribution of some computer science papers is identifying a new WHY whose answers have far-reaching implications.
A good example is “The Case for Energy-Proportional Computing” with its WHY: “Computer system performance does not increase proportionally with system power consumption, leading to inefficiency at low utilization.” It has been cited over 2100 times since 2007, spawned a decade’s worth of work, and led to significant energy savings in data centers.
WHAT
The next question a research project must answer is WHAT: WHAT interesting and novel thing did you do to solve the problem? For most projects, the WHAT is the primary contribution.
A good rule of thumb for identifying a project’s WHAT is that it includes everything interesting an expert would learn from the project. For this purpose, the expert is assumed to have read all papers in your field and related fields. For example:
- Solutions to common problems are not WHAT, because an expert system builder already knows how to solve those problems. However, novel solutions to existing problems can be WHAT, if the solutions are better than previous solutions.
- The high-level design of your system, its components, and how they interact can all be WHAT, if the organization is novel. If the organization is basically similar to some existing systems, it is not WHAT.
- Being hard does not make something WHAT. Implementing a lock-free queue is a difficult programming challenge. It may be a source of great frustration while it is not working and of great pride and satisfaction when it finally does, but it is not WHAT: An expert would know (or could figure out) how to do it, because lock-free data structures are well-understood.
- Applying a lock-free queue in a clever way to solve a new and interesting problem that arose in your system could be WHAT, since the expert would learn about both the problem and its interesting solution.
Your WHAT also includes evidence to support the notion that an expert would learn something interesting from your work. This evidence comes in two forms: measurements (usually of performance) and related work.
If your system performs poorly, it is probably not interesting. To demonstrate good performance, you need to make measurements of your system operating in realistic conditions. Measurements can range from microbenchmarks measuring some particular aspect of system performance to real-world data from deployed systems.
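As a rough illustration of the microbenchmark end of that range, the sketch below times a single operation repeatedly and reports summary statistics. The function name `microbenchmark` and its parameters are hypothetical, chosen only for this example, and a real evaluation would need far more care about workload realism, noise, and system state.

```python
import statistics
import time


def microbenchmark(operation, repeats=30, warmup=5):
    """Time one operation several times and summarize the samples.

    `operation` is any zero-argument callable (an assumption of this sketch).
    """
    # Warm-up runs are discarded so that one-time costs (caches, lazy
    # initialization) do not dominate the reported numbers.
    for _ in range(warmup):
        operation()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)

    # Report the spread, not just a single number: the median is less
    # sensitive to outliers than the mean.
    return {
        "samples": len(samples),
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


if __name__ == "__main__":
    print(microbenchmark(lambda: sum(range(10_000))))
```

A number like this supports a claim about one narrow aspect of a system; it does not replace measurements under realistic conditions.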
If your system is not novel, it is not interesting. To convince an expert reviewer that your work is novel, you need to discuss how it relates to other work that solved similar problems or solved a different problem in a similar way.
HOW
The final question a research paper must answer is HOW: HOW did you implement your system? HOW is not part of the contribution, but it can be the difference between a project that changes the world and one that goes unnoticed.
There are many different levels of HOW, ranging from a high-level design description to the source code. It includes all the design decisions you made — which compiler you used, how you tested the system, the names of functions or data structures in your source code, etc. Everything about your project that is not WHAT or WHY is HOW.
Even though it is not part of a project’s motivation or contribution, HOW is critically important because it affects the project’s credibility and impact in (at least) two ways.
First, it demonstrates that the system was built skillfully, thoughtfully, and expertly.
For instance, using per-CPU data structures can significantly improve performance in some systems by eliminating lock contention. Per-CPU data structures are not WHAT (they are well-known). Using them appropriately, however, demonstrates that you have thought carefully about building your system and are aware of well-known techniques that might affect your results.
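To make the idea concrete, here is a minimal sketch of a sharded counter. It assumes one shard per worker thread as a stand-in for one shard per CPU, since Python cannot pin a data structure to a core; the names `ShardedCounter` and `run_workers` are hypothetical. In a kernel or a C/C++ system the shards would typically be indexed by CPU ID and padded to avoid false sharing.

```python
import threading


class ShardedCounter:
    """Counter split into independent shards, loosely analogous to a per-CPU counter."""

    def __init__(self, num_shards):
        # Each shard has its own value and its own lock, so updates to
        # different shards never contend with one another.
        self._values = [0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def add(self, shard, amount=1):
        # Fast path: a worker only touches its own shard, so this lock is
        # effectively uncontended.
        with self._locks[shard]:
            self._values[shard] += amount

    def total(self):
        # Slow path: a reader must visit every shard and sum the values.
        result = 0
        for index, lock in enumerate(self._locks):
            with lock:
                result += self._values[index]
        return result


def run_workers(num_workers, increments_per_worker):
    """Start one thread per shard, increment concurrently, return the total."""
    counter = ShardedCounter(num_workers)

    def work(shard):
        for _ in range(increments_per_worker):
            counter.add(shard)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.total()


if __name__ == "__main__":
    print(run_workers(4, 1000))
```

The trade-off is the usual one for this technique: updates become cheap and independent, while reading the total becomes more expensive because it has to visit every shard.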
Second, your HOW demonstrates how thoroughly you have addressed the challenges in your WHY and WHAT.
For instance, whether the project built/used a simulator or built a real system is a critical piece of HOW. In simulation, it is easy to accidentally simplify away important problems. Building a real system, however, requires that you really solve all the problems your project presents in all their complicated glory. There are also different levels of “real.” Building a working prototype is good, but deploying the system in the real world to do real work is far better.
Pursuing good HOW without a good WHY and WHAT is an easy trap for new systems researchers to fall into. The basic misunderstanding is that if something is hard, it must be research. Linux is a great example of a non-research project with impeccable HOW: It is an industrial-strength operating system deployed on millions of systems worldwide. But Linux is actually a pretty old-fashioned operating system and did not address an interesting WHY or provide a novel WHAT.