Finding the Best Movement Using MinMax with Alpha-Beta Trimming

I am working on an AI for the game, and I want to use the MinMax algorithm with Alpha-Beta cropping.

I have a general idea of ​​how this works, but I still can’t write code from scratch, so I spent the last two days looking for some kind of pseudo-code on the Internet.

My problem is that every pseudo-code I found on the Internet seems to be based on finding the value for the best move, while I need to return the best move, not the number.

My current code is based on this pseudo code ( source )

minimax(level, player, alpha, beta){ // player may be "computer" or "opponent" if (gameover || level == 0) return score children = all valid moves for this "player" if (player is computer, ie, max turn){ // Find max and store in alpha for each child { score = minimax(level - 1, opponent, alpha, beta) if (score > alpha) alpha = score if (alpha >= beta) break; // beta cut-off } return alpha } else (player is opponent, ie, min turn) // Find min and store in beta for each child { score = minimax(level - 1, computer, alpha, beta) if (score < beta) beta = score if (alpha >= beta) break; // alpha cut-off } return beta } } // Initial call with alpha=-inf and beta=inf minimax(2, computer, -inf, +inf) 

As you can see, this code returns a number, and I think it is necessary for everything to work (since the return number is used during recursion).

So I thought that I could use an external variable to store the best movement, and that’s how I changed the previous code:

 minimax(level, player, alpha, beta){ // player may be "computer" or "opponent" if (gameover || level == 0) return score children = all valid moves for this "player" if (player is computer, ie, max turn){ // Find max and store in alpha for each child { score = minimax(level - 1, opponent, alpha, beta) if (score > alpha) { alpha = score bestMove = current child // ROW THAT I ADDED TO UPDATE THE BEST MOVE } if (alpha >= beta) break; // beta cut-off } return alpha } else (player is opponent, ie, min turn) // Find min and store in beta for each child { score = minimax(level - 1, computer, alpha, beta) if (score < beta) beta = score if (alpha >= beta) break; // alpha cut-off } return beta } } // Initial call with alpha=-inf and beta=inf minimax(2, computer, -inf, +inf) 

Now, this is how it makes sense to me, because we need to update the best move only if it turns and if the move is better than the previous one.

So, although I believe this one is correct (even if I’m not 100% sure), the source also has a java implementation that updates bestMove even in the case of score < beta , and I don’t understand why.

An attempt with this implementation led my code to decide to move as best as possible from the opposite player, which does not seem to be correct (assuming that I am a black player, I am looking for the best move that I can make so that I expect a “back” move , not "white").

I don’t know if my pseudo-code (second) is the right way to find the best move using MinMax with alpha-beta trimming or if I need to update the best move even in points <beta.

Please do not hesitate to offer any new and better pseudo codes, if you prefer, I am not attached to anything, and I do not mind rewriting some code if it is better than mine.

EDIT:

Since I cannot understand the answers, I think that maybe the question does not ask what I want to know, so I am trying to write better here.

Provided that I want to get the best move for only one player and that this player , who is a maximizer , is passed to the MinMax function every time I need a new move (so minmax(2, black, a, b) returns the best move for black player, and minmax(2, white, a ,b) returns the best for the white player), how would you change the first pseudocode (or java implementation in the source) to save this given best move somewhere?

EDIT 2:

Let's see if we can do this.

This is my implementation, can you tell me if this is correct?

 //PlayerType is an enum with just White and Black values, opponent() returns the opposite player type protected int minMax(int alpha, int beta, int maxDepth, PlayerType player) { if (!canContinue()) { return 0; } ArrayList<Move> moves = sortMoves(generateLegalMoves(player)); Iterator<Move> movesIterator = moves.iterator(); int value = 0; boolean isMaximizer = (player.equals(playerType)); // playerType is the player used by the AI if (maxDepth == 0 || board.isGameOver()) { value = evaluateBoard(); return value; } while (movesIterator.hasNext()) { Move currentMove = movesIterator.next(); board.applyMove(currentMove); value = minMax(alpha, beta, maxDepth - 1, player.opponent()); board.undoLastMove(); if (isMaximizer) { if (value > alpha) { selectedMove = currentMove; alpha = value; } } else { if (value < beta) { beta = value; } } if (alpha >= beta) { break; } } return (isMaximizer) ? alpha : beta; } 

EDIT 3:

New implementation based on @Codor answer / comments

 private class MoveValue { public Move move; public int value; public MoveValue() { move = null; value = 0; } public MoveValue(Move move, int value) { this.move = move; this.value = value; } @Override public String toString() { return "MoveValue{" + "move=" + move + ", value=" + value + '}'; } } protected MoveValue minMax(int alpha, int beta, int maxDepth, PlayerType player) { if (!canContinue()) { return new MoveValue(); } ArrayList<Move> moves = sortMoves(generateLegalMoves(player)); Iterator<Move> movesIterator = moves.iterator(); MoveValue moveValue = new MoveValue(); boolean isMaximizer = (player.equals(playerType)); if (maxDepth == 0 || board.isGameOver()) { moveValue.value = evaluateBoard(); return moveValue; } while (movesIterator.hasNext()) { Move currentMove = movesIterator.next(); board.applyMove(currentMove); moveValue = minMax(alpha, beta, maxDepth - 1, player.opponent()); board.undoLastMove(); if (isMaximizer) { if (moveValue.value > alpha) { selectedMove = currentMove; alpha = moveValue.value; } } else { if (moveValue.value < beta) { beta = moveValue.value; selectedMove = currentMove; } } if (alpha >= beta) { break; } } return (isMaximizer) ? new MoveValue(selectedMove, alpha) : new MoveValue(selectedMove, beta); } 

I do not know if I am right or something is wrong, but I returned to the problem that I had when I asked the question:

calling minMax(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, PlayerType.Black) returns a movement that only a white player can perform, and this is not what I need.

I need a better move for this player, not a better move for the whole board.

+5
source share
2 answers

After some research and a lot of time spent solving this problem, I came up with this solution that seems to work.

 private class MoveValue { public double returnValue; public Move returnMove; public MoveValue() { returnValue = 0; } public MoveValue(double returnValue) { this.returnValue = returnValue; } public MoveValue(double returnValue, Move returnMove) { this.returnValue = returnValue; this.returnMove = returnMove; } } protected MoveValue minMax(double alpha, double beta, int maxDepth, MarbleType player) { if (!canContinue()) { return new MoveValue(); } ArrayList<Move> moves = sortMoves(generateLegalMoves(player)); Iterator<Move> movesIterator = moves.iterator(); double value = 0; boolean isMaximizer = (player.equals(playerType)); if (maxDepth == 0 || board.isGameOver()) { value = evaluateBoard(); return new MoveValue(value); } MoveValue returnMove; MoveValue bestMove = null; if (isMaximizer) { while (movesIterator.hasNext()) { Move currentMove = movesIterator.next(); board.applyMove(currentMove); returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent()); board.undoLastMove(); if ((bestMove == null) || (bestMove.returnValue < returnMove.returnValue)) { bestMove = returnMove; bestMove.returnMove = currentMove; } if (returnMove.returnValue > alpha) { alpha = returnMove.returnValue; bestMove = returnMove; } if (beta <= alpha) { bestMove.returnValue = beta; bestMove.returnMove = null; return bestMove; // pruning } } return bestMove; } else { while (movesIterator.hasNext()) { Move currentMove = movesIterator.next(); board.applyMove(currentMove); returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent()); board.undoLastMove(); if ((bestMove == null) || (bestMove.returnValue > returnMove.returnValue)) { bestMove = returnMove; bestMove.returnMove = currentMove; } if (returnMove.returnValue < beta) { beta = returnMove.returnValue; bestMove = returnMove; } if (beta <= alpha) { bestMove.returnValue = alpha; bestMove.returnMove = null; return bestMove; // pruning } } return bestMove; } } 
+4
source

This is slightly different since this code is not a real Java implementation; To achieve what you want, there must be specific types to represent movement and position in the game tree. Typically, the game tree is not explicitly encoded, but moves in a sparse representation, where the implementation actually performs the move in question, recursively evaluates the resulting resulting resulting problem, and cancels the move using a depth search using the call stack to represent the current path.

To get a real best move, simply return the instance from your method, which maximizes the subsequent evaluation. It might be useful to first implement the Minimax algorithm without alpha-beta cropping , which is added to the next steps after the underlying structure works.

The implementation from the link in the question (section 1.5) actually returns the best move, as indicated in the following comment taken from there.

 /** Recursive minimax at level of depth for either maximizing or minimizing player. Return int[3] of {score, row, col} */ 

Here, the user type is not used to represent the movement, but the method returns three values, which are estimates of the best result and the coordinates that the player will move in order to actually perform the best move (which has already been made to obtain estimates), which represent the actual move.

+1
source

Source: https://habr.com/ru/post/1209359/


All Articles