Automatic difficulty adjustment in Match3 levels

Highlights Journal 112 Anton Heorhiiev May 3

Hello! My name is Anton, and I am a game designer and match3 level designer.

I’ve been working on match3 games for over 15 years (am I considered a veteran yet? :)), participated in the creation of new games, and joined projects at various stages of their life cycle.

I’ve accumulated a lot of experience and have some free time, so I want to start a series of articles where I will share useful information.

The first topic I’d like to cover is the use of automatic difficulty adjustment in match3 levels.

When we play cool (successful) match3 projects, we often run into a level that we can’t pass for a long time. Everyone remembers level 65 in Candy Crush, right?

[Image: Candy Crush level 65 (old version)]

But then, after a few days or sessions, something changes. For example, you spent 5 lives, closed the game, came back the next day, and passed the level on the first try. Wow! How did that happen?!

You immediately start thinking and analyzing how other levels were passed. Is there a pattern? It seems there’s something to it: developers have come up with some algorithm that makes the game easier or harder, depending on certain conditions. Of course, we cannot claim that these algorithms definitely exist in any specific game, and I certainly won’t claim to know how it worked in Candy Crush back then or how it works now, but let’s imagine that such algorithms exist and delve into them a bit.

There are many ways to change level difficulty. They can be divided into two types: unnoticeable to the player and noticeable. Noticeable changes include altering the number of colors, reducing the number of obstacles, or even changing the level topology. With these, each level launch would give the player a “different” level. Let’s agree right away that this is bad, disrespectful to the player, and generally not our choice. But with unnoticeable methods, things are much more interesting:

Changing the chance of a certain color dropping on a certain move;
Giving out pre-made combinations or auto-matching bonuses;
Using pre-calculated level generations by fixing the seed.

It’s important to note that each of these approaches can become noticeable in unskilled hands.

Changing the chance of a certain color dropping on a certain move

New tiles on the level are generated using a drop chance for each color. For example, we have 3 colors on the level: blue 100%, yellow 100%, and red 50%. These values work as relative weights rather than probabilities that sum to 100%, so there will be approximately half as many red tiles as tiles of each other color. With these settings, an observant player might notice that there are fewer red tiles, but if we are working with 4 or even 5 colors, or changing the chance by only 20-30%, the player is very unlikely to notice. Why reduce the drop of one or two colors? This is one of the most important tools for a level designer: with fewer tiles of one color, the remaining colors match more often, so the level becomes somewhat easier and the player collects more bonuses, without the difficulty dropping drastically.
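To make the weights from the example concrete, here is a minimal Python sketch. It assumes the drop chances are relative weights; the names DROP_WEIGHTS, spawn_color, and color_shares are made up for illustration and do not come from any real engine.

```python
import random
from collections import Counter

# Hypothetical relative drop weights from the example (not probabilities).
DROP_WEIGHTS = {"blue": 100, "yellow": 100, "red": 50}

def spawn_color(weights, rng):
    """Pick the color of one new tile, proportionally to its weight."""
    colors = list(weights)
    return rng.choices(colors, weights=[weights[c] for c in colors], k=1)[0]

def color_shares(weights, n, rng):
    """Spawn n tiles and return the observed share of each color."""
    counts = Counter(spawn_color(weights, rng) for _ in range(n))
    return {c: counts[c] / n for c in weights}

rng = random.Random(42)  # fixed seed only to make the demo repeatable
shares = color_shares(DROP_WEIGHTS, 20_000, rng)
print(shares)  # red should land near 50 / 250 = 0.2, the other two near 0.4
```

With weights of 100, 100, and 50, red gets 50 out of 250, which is half the share of blue or yellow.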

That covers the theory. How can we use it for automatic difficulty adjustment? We can come up with an algorithm that dynamically changes the drop chance of one color, depending on the conditions. For example, if we have a hard level, the player has already spent 15 out of 25 moves, and less than half of the goals are completed, we can change the drop chance for one color from 100 to 50, thereby increasing the probability of creating a bonus and winning the level. This does not guarantee the result, and on some levels the impact will be minimal, but it increases the player’s chances.
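The rule from this example could be sketched roughly as follows. The function name, its parameters, and the exact thresholds are assumptions for illustration; a real project would tune them per level.

```python
def adjusted_weights(base_weights, moves_used, moves_total,
                     goals_done, goals_total, color_to_reduce,
                     reduced_weight=50, moves_fraction=0.6, goals_fraction=0.5):
    """Return drop weights, lowering one color when the player is falling behind.

    Hypothetical rule: at least `moves_fraction` of the moves are spent
    (15 of 25 in the example) while less than `goals_fraction` of the goals
    are completed.
    """
    weights = dict(base_weights)  # copy, so the level's base config is untouched
    behind = (moves_used / moves_total >= moves_fraction
              and goals_done / goals_total < goals_fraction)
    if behind:
        weights[color_to_reduce] = min(weights[color_to_reduce], reduced_weight)
    return weights

base = {"blue": 100, "yellow": 100, "red": 100, "green": 100}
struggling = adjusted_weights(base, 15, 25, 4, 10, "red")  # red drops to 50
on_track = adjusted_weights(base, 5, 25, 1, 10, "red")     # weights unchanged
```

The function returns a new dictionary each time, so the adjustment can be recomputed before every move without corrupting the level settings.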

Giving out pre-made combinations or auto-matching bonuses

This approach raises the fewest questions. Under certain conditions, we can drop chips that automatically form bonuses when they land on the field or, conversely, chips that only form a plain match of 3. It sounds very simple, but the result is not always predictable. Still, in this case we get more control over the situation and can define rules for when combinations should be generated and which bonuses will result.
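As an illustration only, here is one very simplified way to pick the color of a falling chip so that it does or does not complete a line. The board layout, the color codes, and both function names are hypothetical, and a real game would also have to handle cascades and special pieces.

```python
def run_length(board, row, col, color):
    """Longest horizontal or vertical run through (row, col) if `color` lands there."""
    def count(dr, dc):
        # Walk from the landing cell in one direction while the color matches.
        r, c, n = row + dr, col + dc, 0
        while 0 <= r < len(board) and 0 <= c < len(board[0]) and board[r][c] == color:
            n += 1
            r, c = r + dr, c + dc
        return n
    horizontal = 1 + count(0, -1) + count(0, 1)
    vertical = 1 + count(-1, 0) + count(1, 0)
    return max(horizontal, vertical)

def choose_drop_color(board, row, col, colors, help_player):
    """Pick the color with the longest run (help) or the shortest run (no help)."""
    ranked = sorted(colors, key=lambda c: run_length(board, row, col, c))
    return ranked[-1] if help_player else ranked[0]

COLORS = ["R", "G", "B", "Y"]
board = [
    [None, None, None, None, None],
    ["R",  "R",  None, "R",  "B"],   # the new chip will land in the gap at (1, 2)
    ["B",  "G",  "Y",  "G",  "Y"],
]
helpful = choose_drop_color(board, 1, 2, COLORS, help_player=True)   # completes a line of 4
neutral = choose_drop_color(board, 1, 2, COLORS, help_player=False)  # forms no match here
```

In this toy board the helpful choice is red, which closes a horizontal line of 4, while the neutral choice leaves the field without an automatic match.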

Both of these approaches have significant drawbacks:

No guarantee of the result;
Difficult to configure.

Neither approach guarantees the result. We can give the player bonuses or create automatic matches, but we only increase or decrease the chance of winning. Furthermore, the impact will be different on each level, because it depends on the state of the level at the moment the algorithm starts to interfere.

Okay, we’ve come up with some system of automatic assistance for the player. How do we now evaluate its results? The only way is to conduct an A/B test. We divide players into groups and look at our key metrics: ARPU, revenue, churn. If they got better, you are great. If nothing changed, think and adjust further, or move on to other algorithms.
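One common way to check whether a difference in a rate metric such as churn is more than noise is a two-proportion z-test. This is a generic statistical technique, not something specific to this article, and the player counts below are invented for the example. It fits rate metrics only; ARPU and revenue are not proportions and need a different test.

```python
import math

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for the difference between two observed rates."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical data: control group 600 churned of 5000, test group 520 of 5000.
z, p_value = two_proportion_z_test(600, 5000, 520, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the churn difference between groups is unlikely to be pure chance, but it says nothing about whether the change is large enough to matter for the business.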

Generating chips on the level using seeds

The principle of operation is very simple. We take a number (this is our seed), and based on it, we generate the initial chips on the level and all the chips that will appear later. Thus, if we restart the level with the same seed and make the same moves, the result will always be the same. An important note: if your team didn’t plan for determinism in the project’s code architecture beforehand, I have bad news for you.
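A minimal sketch of the idea in Python, assuming a private random generator per level. The class TileStream, the palette, and the board size are hypothetical, and the sketch ignores details such as avoiding matches on the initial board.

```python
import random

COLORS = ["red", "green", "blue", "yellow", "purple"]  # hypothetical palette

class TileStream:
    """Deterministic source of chips: same seed, same board, same refills."""

    def __init__(self, seed):
        # A private generator, so nothing else in the game can disturb the sequence.
        self._rng = random.Random(seed)

    def initial_board(self, rows, cols):
        return [[self._rng.choice(COLORS) for _ in range(cols)] for _ in range(rows)]

    def next_tile(self):
        return self._rng.choice(COLORS)

def generate(seed, refills=10):
    """Build the initial board and the first few refill chips for a seed."""
    stream = TileStream(seed)
    board = stream.initial_board(8, 8)
    return board, [stream.next_tile() for _ in range(refills)]

same_seed_matches = generate(1234) == generate(1234)  # True: fully reproducible
```

Determinism only holds if every chip is requested from this stream in the same order, which is why it has to be planned for in the architecture rather than bolted on later.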

Okay, we’ve taught our game to work with seeds. How do we now achieve the desired difficulty? You need an algorithm (a bot) that will play your level. There are several options here.

Exhaustive approach: you can calculate all possible moves on the selected level and count how many solutions lead to a win and how many to a loss; this gives you the level difficulty. Sounds cool, but it’s very time-consuming.

Random approach: making random moves on the level. This is bad; it’s suitable for finding bugs, but it says very little about the real characteristics of the level.

Empirical approach: we can analyze all available moves on the level and assign a weight to each move. One move uses 3 chips, another uses 4 chips, and another removes 5 level goals. The move for 5 goals seems the most correct (or is it the move for 4 chips?), so it should be assigned the highest weight. We make moves according to this evaluation of the current state of the field, run 50, 100, 200, or 1k iterations, evaluate the level difficulty, and collect other level metrics (we’ll talk about this later). Cool? It seems so, but the devil is in the details.
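The empirical approach can be sketched as a scoring function over the available moves plus a loop that replays the level many times. Everything here is an assumption for illustration: the weights, the move fields, the explore parameter, and the play_level callback, which stands in for a real game simulation.

```python
import random

# Hypothetical weights: how much the bot values each effect of a move.
WEIGHTS = {"tiles_matched": 1.0, "goals_removed": 3.0}

def score_move(move, weights=WEIGHTS):
    """Weighted sum of the move's effects; missing effects count as zero."""
    return sum(w * move.get(effect, 0) for effect, w in weights.items())

def pick_move(moves, rng, weights=WEIGHTS, explore=0.0):
    """Pick the best-scoring move; with probability `explore` pick a random one,
    a crude stand-in for players who do not always see the best move."""
    if rng.random() < explore:
        return rng.choice(moves)
    return max(moves, key=lambda m: score_move(m, weights))

def estimate_win_rate(play_level, iterations, rng):
    """Play the level `iterations` times and return the share of wins."""
    wins = sum(1 for _ in range(iterations) if play_level(rng))
    return wins / iterations

moves = [
    {"name": "match of 3", "tiles_matched": 3},
    {"name": "match of 4", "tiles_matched": 4},
    {"name": "removes 5 goals", "goals_removed": 5},
]
rng = random.Random(1)
best = pick_move(moves, rng)  # the move that removes 5 goals scores highest here

# Toy level that the bot wins about 30% of the time, in place of a real simulation.
win_rate = estimate_win_rate(lambda r: r.random() < 0.3, 1000, rng)
```

Changing the weights changes which move the bot prefers, which is exactly the knob that has to be calibrated against real player data.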

We need to teach our algorithm to play the way our players play. The empirical approach won’t give precise results; you’ll get something like an “average temperature across the hospital”. What if we are just developing the game and want to predict level difficulty? I have bad news for you: you won’t be able to configure your algorithm accurately. Approximately? Yes, that’s possible.

We need data from our audience. But there’s a level designer who created the level; isn’t their data suitable? No, because their skill differs from that of your players. I’ll surprise you even more: if you drastically change your user acquisition approach, for example, start buying misleading advertising, you need to adjust the algorithm again.

How much data is needed? From my experience, 1k level completions were enough for us to consider the difficulty reliable and base the algorithm configuration on it (in fact, 1k is the minimum threshold, while at 100k the results are quite accurate). And then we can delve into mathematics: Monte Carlo, reinforcement learning, and other scary words (maybe even wonderful ones :)). I’ll leave a few links:

Playtesting in Match3 Game Using Strategic Plays Reinforcement Learning

Statistical Modelling of Level Difficulty in Puzzle Games
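As a rough sanity check on the 1k and 100k figures mentioned earlier, the margin of error of a measured win rate can be approximated with the standard binomial formula. This assumes attempts are independent, which real player data only approximates; the function name is made up for the example.

```python
import math

def win_rate_margin(p, n, z=1.96):
    """Approximate 95% margin of error for a win rate p measured over n plays."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 100_000):
    margin = win_rate_margin(0.5, n)
    print(f"{n} plays: +/- {margin * 100:.2f} percentage points")
```

For a level with a win rate near 50%, this gives roughly 3 percentage points of uncertainty at 1k plays and roughly 0.3 at 100k, which is consistent with treating 1k as a minimum.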

Can you limit yourself to manually adjusting weights? Yes. If you have enough statistics from players, you can predict the difficulty of levels quite accurately, with an error of about 10% on difficult levels and almost none on easy ones.

Okay, you’re not scared yet, full of enthusiasm, and ready to continue. I mentioned seeds; now it’s their turn.

Let’s assume that we have learned to predict the difficulty of a level. Until now the levels were random, meaning each level launch was unique. Now we make a difficult level, fix the seed, and check its difficulty 1k times. We repeat this with new seeds until the resulting difficulty matches our planned difficulty, and we also pick several seeds that are easier. We then check the result on real players and see that the level is indeed difficult and that the easier seeds give the expected difficulty.
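The search over seeds could look roughly like this. The bot is represented by a play_level callback, and toy_play_level is a made-up stand-in whose win chance depends only on the seed; in a real project it would run the bot on the actual level. Difficulty is defined here as the share of lost runs, which is one possible definition among several.

```python
import random

def estimate_difficulty(seed, play_level, iterations, rng):
    """Share of bot runs on this seed that end in a loss (0 = trivial, 1 = unbeatable)."""
    losses = sum(1 for _ in range(iterations) if not play_level(seed, rng))
    return losses / iterations

def find_seeds(candidates, play_level, target, tolerance, iterations=1000, rng=None):
    """Return (seed, difficulty) pairs whose estimated difficulty is close to target."""
    rng = rng or random.Random(0)
    hits = []
    for seed in candidates:
        difficulty = estimate_difficulty(seed, play_level, iterations, rng)
        if abs(difficulty - target) <= tolerance:
            hits.append((seed, difficulty))
    return hits

def toy_play_level(seed, rng):
    """Toy stand-in for a bot run: seeds 0..4 have win chances 0.2..0.6."""
    win_chance = 0.2 + 0.1 * (seed % 5)
    return rng.random() < win_chance

# 2000 runs per seed here, only to keep the toy estimates stable.
hard = find_seeds(range(5), toy_play_level, target=0.8, tolerance=0.05, iterations=2000)
easier = find_seeds(range(5), toy_play_level, target=0.5, tolerance=0.05, iterations=2000)
```

The result is a small pool of seeds per difficulty tier, which can then be verified on real players before being used.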

We’ve come a long way. We can fix seeds and get a predetermined difficulty for a level. Moreover, we’ve come up with an algorithm for when to give the player a difficult seed and when to give an easy one (we did, right? That’s why you’re reading this article?). It’s time to launch an A/B test and check our work.
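The article does not specify the rule for choosing between a hard and an easy seed, so the following is purely a hypothetical example of what such a rule might look like: every few failed attempts, the player is moved one tier easier. The seed values and the step size are invented.

```python
def choose_seed(failed_attempts, seeds_by_tier, fails_per_step=3):
    """Hypothetical policy: every `fails_per_step` failed attempts, serve the
    next easier seed, stopping at the easiest one."""
    tier = min(failed_attempts // fails_per_step, len(seeds_by_tier) - 1)
    return seeds_by_tier[tier]

SEEDS = [1111, 2222, 3333]  # made-up seeds, ordered from hardest to easiest

first_try = choose_seed(0, SEEDS)        # hardest seed
after_3_fails = choose_seed(3, SEEDS)    # one tier easier
after_20_fails = choose_seed(20, SEEDS)  # capped at the easiest seed
```

A real rule would probably also consider sessions, days since the last attempt, and spending, and it is exactly this rule that the A/B test evaluates.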

Some levels show great results, while others don’t.

What metrics should we look at? I’m sure your company has its own set of metrics that you constantly monitor, but for analyzing match3 level A/B tests, the following are sufficient:

Monetization – the amount of coins (in real-money equivalent or in in-game currency) spent to pass the level. We count not only purchases made with real money but all bonuses and coins spent on this level, and divide the total by the number of players. This gives us a characteristic of the level’s monetization.

Difficulty. Each company has its own approach; this can be attempts per level completion, win rate, or fail rate. Note that you need to measure difficulty both without bonuses/extra moves and with them.

Churn (7-day) – the percentage of players who started the level and did not return to the game within 7 days (you can use 3 or 5 days).
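These three metrics can be computed from per-player records along the following lines. The record layout and field names are assumptions, the sample data is invented, and win rate is taken as wins divided by attempts, which is only one of the definitions mentioned above.

```python
from typing import NamedTuple

class PlayerLevelStats(NamedTuple):
    attempts: int             # how many times the player started the level
    won: bool                 # whether the player eventually passed it
    coins_spent: int          # coins and bonuses (in coin equivalent) spent on the level
    returned_within_7d: bool  # whether the player came back within 7 days

def level_metrics(players):
    """Aggregate monetization, difficulty, and churn for one level."""
    n = len(players)
    total_attempts = sum(p.attempts for p in players)
    wins = sum(1 for p in players if p.won)
    return {
        "monetization": sum(p.coins_spent for p in players) / n,
        "win_rate": wins / total_attempts,
        "attempts_per_completion": total_attempts / wins if wins else float("inf"),
        "churn_7d": sum(1 for p in players if not p.returned_within_7d) / n,
    }

# Invented sample: four players on one level.
players = [
    PlayerLevelStats(1, True, 0, True),
    PlayerLevelStats(4, True, 90, True),
    PlayerLevelStats(3, False, 30, False),
    PlayerLevelStats(2, True, 0, True),
]
metrics = level_metrics(players)
```

Running the same function separately for the control and test groups gives the numbers to compare in the A/B test.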

What conclusion can we draw from this data? Earnings on the level dropped significantly, and churn increased significantly. In this situation, we can conclude that the level became worse. (You most likely understand that you cannot evaluate a specific level in isolation from the other levels; you should look at metrics for a range of levels.) Why could this happen? The first possible reason is the moment at which our algorithm started serving an easier version of the level. Maybe it happened too early, when the player wasn’t yet ready to spend bonuses or coins on passing the level? Or maybe you were unlucky, and the seed you fixed is bad: yes, it is difficult, but players lose with a lot of level goals remaining?

Every company has its own set of metrics that characterize a level. Some look only at monetization and churn, while others additionally analyze a lot of other things, for example:

Number of reshuffles on the level. We know that reshuffles are unpleasant, players don’t like them, and we shouldn’t have many. Here, we can confidently say that anything that results in 2 or more reshuffles on a level is a reason to re-evaluate the level.
Number of bonuses collected. Players love levels with lots of bonuses; they are enjoyable to play.
Number of remaining goals upon losing. This can be presented and analyzed in different ways, but the point is to control the percentage of level completion upon losing. Fuuu factor.
Number of remaining moves upon winning. Average or median value.
Level time. Yes, our level is limited by the number of moves, but time is spent thinking about a move and on bonus animations.
Distribution of collected goals throughout the level.

How do these metrics help? These are additional characteristics of the level. Applied correctly, they let you collect statistics from existing levels and improve the levels where you find deviations. At the same time, some levels will deviate in these characteristics but still show great monetization or low churn. This is normal, and you should not limit yourself to dry statistics here. Accept exceptions, try to draw conclusions, and share your findings with the team.

We need to realize that the ‘fuuu factor’ and similar metrics are not the main indicators of success. Every level provides a unique player experience – it’s built on emotions. And can you really measure emotions?

A huge scope for analysis and improvement opens up before you. This should be a continuous process. You must analyze player metrics, analyze new levels, and create them according to the information that is relevant to your project, to your audience.

Okay, we have an algorithm, we have information on thousands of levels from real players. What about new elements? The algorithm doesn’t know how to work with them. Yes, that’s a problem, and when a new element appears, the algorithm will make mistakes, but the more data you have, the smaller the deviation will be.

We have smoothly transitioned to the main question. Is it mandatory to use automatic difficulty adjustment algorithms during the creation stage of a new project? My opinion is that you should not spend time on this. You have a huge number of questions that you need to solve. Focus your efforts on pleasant gameplay, beautiful effects and animations, think through the style and rules of level design, set up live ops. And only after you are sure that your project can be profitable, you can increase the team and work on improvements. At the initial stage, you need level designers who understand how to create quality and interesting levels. At the same time, if you have problems in other parts of the game, even high-quality levels will not be able to take the project to a completely different level.
