Explore / Exploit

To build skill from scratch, you need to know How to Climb A Ladder.

But how do you know which ladder to climb?

There are millions of skills you could acquire, and billions of subtle variations. Which set of those skills and variations is the optimal path to getting what you want? How do you determine those things in advance, before you invest in learning them?

And what if "getting what you want" changes over time? What then?

Barring some form of omniscience, is there a better strategy than taking a wild guess and hoping for the best?

Yes, there is.

The Bandit Problem

Figuring out which skills will give you the best outcomes is very similar to a venerable and important problem in probability theory: the multi-armed bandit problem.

Here's a short version of the problem: imagine walking into a casino and deciding to play the slot machines. [1]

There's a row of machines, each of which has a different probability of paying a reward when you pull the lever. Some machines pay more – some much more – than the other machines, but you're not sure which machine has the highest return.

If you knew the best machine in advance, you'd just pull that lever all day long, but you don't have a clue, and no one is going to tell you. The only way to find out is to start pulling levers, pay close attention, keep track of what works and what doesn't, and do the math.

There's a tradeoff to be made, however: when you choose to pull a lever you haven't pulled before, you get new information about that option, and that information is valuable in finding the best overall machine. But pulling the less-tested lever has an opportunity cost: you're not pulling the lever you currently think will give you the best return. There's a risk that the lever you pull will return less than what you would've brought in by pulling the lever that currently looks best, and that's a very real cost.
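To put rough numbers on that cost, here's a small sketch in Python. The payout rates are made up for illustration, and the function name is just a label for this example: if your best lever seems to pay out half the time and an untested lever turns out to pay a quarter of the time, every exploratory pull gives up the difference on average.

```python
def exploration_cost(best_estimate, other_estimate, pulls):
    """Expected reward given up by spending `pulls` on the other lever.

    Both estimates are payout rates per pull (hypothetical values).
    """
    return (best_estimate - other_estimate) * pulls


# 20 exploratory pulls on a lever that pays 0.25 instead of 0.5 per pull.
cost = exploration_cost(best_estimate=0.5, other_estimate=0.25, pulls=20)
print(cost)  # 5.0
```

Of course, you don't know the other lever's true rate until you try it. If it turns out to pay better than your current favorite, that "cost" becomes a gain, which is exactly why the information is worth paying for.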

Information is valuable, but it comes at a price: experimentation is sometimes a form of malinvestment. That insight is the key to solving the bandit problem.

Exploration and Exploitation

Without going too much into the math, a good strategy for the bandit problem is easy to understand: start with a period of exploration, where you pull levers at random and gather information. When you have more information about what works and what doesn't, you shift to spending the majority of your time pulling the best lever (exploitation), but you keep exploring the other options in case your current best option isn't the very best that exists.
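Here's a minimal sketch of that idea in Python, assuming a simple "epsilon-greedy" approach with a random warm-up period. It's one common way to balance exploring and exploiting, not the only one. The payout probabilities, the warm-up length, and the 10% exploration rate are all made-up values for illustration.

```python
import random


def run_bandit(payout_probs, pulls=10_000, warmup=100, epsilon=0.1, seed=0):
    """Simulate a row of slot machines with an explore/exploit strategy.

    payout_probs: true chance each machine pays out (hidden from the player).
    warmup: number of purely random pulls at the start (exploration).
    epsilon: fraction of later pulls still spent exploring at random.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms    # how many times each lever was pulled
    totals = [0.0] * n_arms  # total reward collected from each lever

    def average(arm):
        # Observed payout rate so far; 0.0 for a lever never pulled.
        return totals[arm] / counts[arm] if counts[arm] else 0.0

    for t in range(pulls):
        if t < warmup or rng.random() < epsilon:
            arm = rng.randrange(n_arms)         # explore: random lever
        else:
            arm = max(range(n_arms), key=average)  # exploit: best so far

        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    return counts, totals


counts, totals = run_bandit([0.2, 0.5, 0.8])
print("pulls per machine:", counts)
print("total reward:", sum(totals))
```

Notice that the random pulls never drop to zero: even late in the run, about one pull in ten goes to a lever other than the current favorite.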

Here's the thing: the exploration phase never stops. Even if, in your heart of hearts, you're positively certain you've found the best possible option, you never stop experimenting, because the information you gather by experimenting is still valuable.

The only way to beat the bandit is to keep trying new things.

Life Is A Bandit Problem

There's a set of skills you can pick up that will help you get what you want out of life. You have no guarantee that anything you choose to learn will produce the intended outcome, and you start with very little information on what's best for you as an individual.

You do have a major advantage, though: other people are playing the same game, and you can watch what they do to gather information about what works and what doesn't without having to do the exact same things they did.

The optimal strategy remains roughly the same: experiment as much as possible, with as much variation as you can, and pay close attention to the experiments other people are doing. [2] As you find things that appear to produce the outcomes you want, spend more time and energy doing them. As your efforts produce results and your certainty in that option increases, increase your investment in that option accordingly.

But never stop experimenting: trying new options, discovering new opportunities, exploring new things. The master key to a satisfying life is experimentation. The more you experiment, the more you learn, the more information and options you'll have at your disposal, and the better the chance you'll discover the things that will produce the best outcomes for you.

You can't make positive discoveries that make your life better if you never try anything new. Start experimenting, and never stop.

Special thanks to Jon Kameen and Andre Davoodi for their thoughts on How to Climb A Ladder, which informed this post.

  1. Slot machines are sometimes referred to as "one-armed bandits," which is how the "bandit" problem got its name. I don't recommend playing slot machines, by the way: there's a reason casinos are profitable. Under standard odds and given a long enough period of time, the house always wins, so the only way to win is to refuse to play. Let's assume, for the sake of this thought experiment, that you're not paying to play: each game only costs you the time it takes to pull the lever and see the result.

  2. Of course, try to be sure the people you're observing have the same general goals, aspirations, and values that you have. Otherwise, you risk optimizing for something you don't really want. 

