There are 30 years of detailed field research on this. Again, see Suchman's "Plans and Situated Actions," Dourish's "Where the Action Is," etc., or visit the ACM Digital Library and look at usability research (i.e., research involving observation of real people in real settings) in CSCW, HCI, and related venues.
You have one basic fact wrong: they *do* have to think about what it's "time" to do.
Users in computer-at-desk contexts do not have a detailed roadmap for what to do on a click-by-click basis, either from their boss or inside their heads. They have a general set of goals for, say, the quarter ("Get this project launched"), perhaps the week ("Make sure everyone is on-task and progress is being made; keep the CTO apprised of any roadblocks"), and the day ("Put together charts and graphs for Wednesday's meeting to detail progress").
But it is *these* tasks that are "theoretical" quantities. They translate into dozens and dozens of clicks, mouse movements, UI interactions, and so on, many of them interdependent (or, in Suchman/Dourish terms, indexical—that is to say, order-important and constitutive of an evolving informational and UI flow context).
The user may have "Tell Bob about tomorrow's meeting" already decided, but they are imagining Bob, and imagining Bob *at* the meeting. From there, activity is practical and adaptive. They emphatically do *not* have this in their heads:
- Take mouse in right hand
- Flick mouse to lower-left to establish known position
- Move mouse 5 inches toward right, 0.5 inches toward top of desk to precise location of email icon
- Click email icon
- Wait 0.4 seconds for email window to appear
- Move mouse 7.2 inches toward top of desk, 2 inches toward left to precise location of To: field
- Click to focus on field
- Type "Bob"
- Wait 0.1 seconds for drop-down with completions to appear
- Hit down arrow three times to select correct Bob
- Press enter ...
You laugh, but in fact this is precisely what you're suggesting: that users already have a roadmap. They don't. That's why we invented the GUI: to provide a visual catalogue of available computing resources and an indication of how to access them on an as-needed basis. The user then decides, in the moment, what is needed. Every attempt to make things more "simple" or more "efficient" by presenting *only* the one thing designers imagined would be needed at a given time (the "obvious" next step) has produced users who feel the system is useless, who fight it to get it to do what they want, or who simply go around it ("I'll just do this task offline, on a pad of paper"). You can measurably change users' workflows and levels of productivity just by changing the ordering or location of icons. Marketers know this very well on the web, too (google "page hotspots" for the research on ad positioning and how deeply it affects CPC and other factors in online marketing).
At a less granular level, something like "Get this project launched" is also not available to the user as a detailed roadmap. Go ahead, ask them to elaborate on the precise set of tasks involved in their big quarterly responsibility. They'll come up with 20, 30, maybe even 80, split into four or five sub-areas. But getting the project launched for an average middle manager over the course of a quarter involves tens of thousands or even hundreds of thousands of discrete actions, gestures, etc., some computing-based, some not, with the computing-based ones split across dozens of applications and contexts.
It cannot be mapped out because it is contingently assembled; it has to be done on an as-we-go basis. So the tasks in the "to-do list" (and, in fact, in cognitive behavior) are theorized ("Create a new instance of the platform on the test VPN, set up credentials for the team") rather than existing as a detailed, moment-by-moment list of actions. This is why documentation writers actually have to sit down and use the system, and interact with designers, to write good docs. When anybody tries to make a complete mental map of even a 10- or 15-step software task and write it out *without the software actually in front of them*, they leave stuff out. If it were a test, most people (even people who use a piece of software or a feature *every single day*) would not score a hundred percent.
The detailed, moment-by-moment behavior is assembled by observing context: what's going on in the office, what's going on on the screen (where the mouse pointer is, where the files are, etc.). Users don't place their files randomly. Even people here don't. Why do we put them in folder hierarchies, and in some cases even tag them? Why do some people use Evernote and others Bento, and so on? All of these things exist to let us recall where things are, because we cannot track all of our files and filenames (or, in fact, all of our information) mentally. We just don't have that kind of cognitive architecture.
So people put things into folders: "Bryce Account" and so on. But ask them, even a week later, to list all of the files in "Bryce Account" and what they contain. They can't do it. That's why they created the folder in the first place: they literally cannot keep track of the information; the mind won't do it. The "Bryce Account" folder is a practical tool: collect all resources related to the Bryce account in a folder called "Bryce Account" and keep accumulating. Then, whenever something needs to be done in relation to the Bryce account, open the "Bryce Account" folder, which is where all possible resources for the account live, and survey the resources to see what's needed from what's available (i.e., what's stored there).
The file icons on the desktop are essentially a folder/resource understood as "What I'm working on these days." The set of overlapping windows on the screen is essentially a folder/resource understood as "What I'm working on this minute." The dock/start menu/application menus are essentially a folder/resource understood as "What actions are available to me." They exist because *most users cannot keep track of all of this with any level of fidelity.* Humans are just bad at it. But what we're fabulous at is problem-solving: okay, given this list of "current tasks" and this list of "current enablements," what can I do next that will move me closer to the "big goal"?
But for this to work, they have to have some other system (i.e., the desktop) that shows them these things, that collects their past actions and current options for them, so that "deciding on and taking the next step" draws on a ready resource rather than on cognitive overhead.
This really isn't novel stuff. This has been around for a long, long time. Once again, it's why we invented:
- GUIs
- The desktop metaphor in particular
- Hierarchical file systems
- and so on
It's because we are practical, adaptive actors as a species. Our minds are oriented toward immediate problem-solving. We do not have deep, accurate recall or chess-computer-style planning capabilities. What we have is cybernetic in nature: conceive of a general goal, take a plausible initial action, observe the resulting state and relative progress toward the goal, take a plausible next action, observe again, and so on.
I believe the colloquial truism is "One step at a time."
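To make the shape of that loop concrete, here's a minimal sketch in Python. It's my own toy illustration, not a model from Suchman, Dourish, or anyone else, and all of the names (`pursue`, `goal_reached`, `candidate_actions`, `progress`) are made up. The point is only the structure: no plan up front, just observe, take the most plausible available step, and observe again.

```python
def pursue(state, goal_reached, candidate_actions, progress, max_steps=100):
    """One step at a time: observe, pick a plausible next action, repeat."""
    for _ in range(max_steps):
        if goal_reached(state):          # close enough? stop
            return state
        # Survey what's available *right now* and take whichever step
        # looks like it makes the most progress toward the goal.
        action = max(candidate_actions(state), key=lambda a: progress(a(state)))
        state = action(state)            # act, then re-observe on the next pass
    return state

# Toy usage: the "goal" is reaching 10; the only visible actions are +1 and -1.
final = pursue(
    state=0,
    goal_reached=lambda s: s == 10,
    candidate_actions=lambda s: [lambda x: x + 1, lambda x: x - 1],
    progress=lambda s: -abs(10 - s),     # higher means closer to the goal
)
print(final)  # 10
```

The analogue of the desktop in this sketch is `candidate_actions(state)`: nobody enumerates every step in advance; the environment surfaces the currently available moves, and the person just picks the next one.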
Removing useful metaphors and a UI paradigm in which what's on the screen represents a broad cross-section of the user's current work context (as distinct from the immediate computing task at hand) forces users to try to be what empirical research says they are not: detailed behavioral planners able to plot out click-by-click and motion-by-motion actions, acting on detailed maps of information, well ahead of time.
Again, check the research, academic and industry alike. This is not theory; it has been well understood for decades now. It's why we bothered to invent most of what we bothered to invent in technology. It goes back to pre-PARC stuff. The unfounded theorizing, if any was done at all, was done by the GNOME people, IMHO.