Description
The task consists of programming a Python implementation of a player that plays Super Nim.
1.1 The Rules for Super-Nim
The game of Super-Nim is played by two players, max and min. The game situation is given by a number of heaps with sticks and a scoreboard for both players
1.1.1 Moves
Given a game state with a number of heaps, each of them containing a non-zero number of sticks, a move consists of one of the following:
- taking a heap and put it on another heap, as long as it does not have the same size; when you exceed 10 sticks on a heap, cap the number at 10. E.g. if you have heaps (2,4,2,5,7), you can put the rst heap onto the second, getting (6,2,5,7).
Or you can put the second heap onto the last instead, getting (2,2,5,10). What you cannot do is to put the rst heap onto the third as they have the same size.
- alternatively, you can perform a Collatz-Ulam step on the heap, i.e.:
- for a heap that has an odd number of sticks n, you can replace it by 3n + 1 sticks. The number is not capped. I.e., if you have heaps (2,4,2,5,7), you can replace the 4th heap by (2,4,2,16,7).
- for a heap that has an even number of sticks n, you can split it into two heaps of half size n=2. E.g. the second heap in (2,4,2,5,7) can be split to give (2,2,2,2,5,7).
If you split a heap of size 2, you get two 1-stick heaps | these are removed and converted into 2 points, for the player that made the move. So, if max played, max gets two points, if min plays, min gets two points, and the 1-stick heaps are removed.
For instance, if min splits the rst heap in (2,4,2,5,7), min gets 2 points, and the remaining heap is (4,2,5,7).
- the game ends when there are no more sticks to move.
- the purpose is for the max player to accumulate as many as possible points, and same for the min player. Both points are accumulated separately, the winner is determined by having the most points.
- note that the game may not necessarily end and cycles are possible. In this case, whoever keeps the most points is considered the winner (we do not require a hard criterion here).
1.1.2 Game Structure
The state of the Super Nim game is de ned by
<player>, <board tuple>, <score max>, <score min>
For instance
"min", (2, 4, 2, 3), 6, 2
would mean that it is min's turn, the board contains a heap with 2, a heap with 4, a heap with 2 and another heap with 3 sticks, the score is 6 points for max and 2 points for min.
Conventions Note that the board does not contain empty heaps (entries with 0 sticks; if you have any, make sure you remove them, as they would lead to unnecessary state duplication), and it is denoted by a tuple. Strictly spoken, you could use a list instead, but the tuple form will permit to use a dictionary with states as keys for the cache (you cannot use a list as a key for a dictionary, as it is mutable). If you wish to use a list, you will have to convert it to a tuple for use in the dictionary.
Examples Starting from the state "min", (2, 4, 2, 3), 6, 2, you can make a number of moves.
Here are a few examples:
- The rst heap can be placed on the second, giving "max", (6, 2, 3), 6, 2
- Alternatively, you could put the rst heap on the last, resulting in "max", (4, 2, 5), 6, 2
- Of course, the current player, min, could just cash in" on the third heap, giving 2 points, i.e. the successor state "max", (2, 4, 3), 6, 4
- min could also split the second heap, giving "max", (2, 2, 2, 3), 6, 2
- or they could apply an Ulam step on the last one: "max", (2, 4, 2, 10), 6, 2
Other moves, not mentioned, might be possible.
Note that we applied no capping. Also remember: Capping applies only when a heap is put onto another, and not on an Ulam step.
1.1.3 Winning/Eternal Play
The game ends if there are no further heaps to move. The player with the larger score wins, else, if the scores are identical, it's a draw.
The games may end up in a cycle. In this case, you can decide whether you run an eternal game" where scores may be accrued continuously, stopping the game after a larger (e.g. n 100) number of moves. Explain your choice in the report.
1.2 Task and Interface
1.2.1 Task
You have to write a Python program that is able to play Super-Nim.
We provide a depth cut-o minimax algorithm, including caching1 and a move selector that uses this cut-o minimax to select a move.
Game Class Write a class Super_Nim for the game which provides the following methods:
super nim.py
terminal(self, state): returns True if state is terminal
U terminal(self, state): returns the value of a terminal state (e.g. +100 or 100, depending how you wish to model that)
U heur(self, state): returns a heuristic value for a non-terminal state to permit early cut-o
player(self, state): returns "max" or "min", depending on whose turn it is in state
succ(self, state): returns a list of all successor states of state2
Random Player Write a random player that uses the succ(.) method of the Super_Nim class to generate random moves. Your player should consist of a function random_move(state) which takes a legal state and returns a legal successor state. You can assume that state is not terminal.
AI Player Compose the minimax and move functions (with cuto ) provided with the Super_Nim game class to provide a function move(game, state, depth). Use this move(.) function with a depth cut-o of your choice to write an AI player that tries to play Super Nim well. The AI player interface is a function ai_move(state) which takes a legal non-terminal state and returns a legal non-terminal state.
Game Manager Write a game manager program that can run a game of Super Nim between two players. They can be random players, AI players or (at your choice) human players.
1.2.2 Other Tasks
You should test your players using your game manager and analyse the games.
- at the very least make sure you have a random player that produces legal (even if not good) moves
- if you have an AI player, test that it makes winning moves if these are possible; you can set up favourable positions for the AI player to check that | if it can win, it should pick the winning move
- use the game manager to run
games between random and random player games between random and AI player
1Make sure your game states are immutable, so that they can be used as keys for caching | in particular, no lists, and tuples only, when you return them (internally, of course, a player can unpack them and make them mutable, but the nal states should be immutable if you want to use caching)
2If you know how to use a generator, i.e. if you understand how to use yield, your implementation of succ(state) can yield the successor states instead, one by one. If you do not know what a generator and/or yield is, just return a list of successor states.
from di erent starting states
- record the scores accumulating during the games, and if the games end, record the end score; if the games do not end, let the game manager cut o the game after 100 runs; create a statistics of how many games are won by the players when both are random and when one is random and the other one is an AI player.
Note whether the AI player loses systematically to the random player, since this indicates a bug.
- project work: experiment with other settings and combinations/solutions, for instance with other AI's, deeper minimax searches or alternative approaches to nd good moves.
Experiment also with slight modi cations of the rules to break loops or get more varied/interesting results. Your AI algorithms should, of course, continue to work.
Write a report where you explain what you implemented and the experiments, including results and graphs (e.g. for eternal games where rewards were accumulated over time, or even only at certain times in the game). Add any discussion of special methods you used to determine optimal moves, e.g. if you used additional algorithms (e.g. MCTS) or other heuristics to select good moves.
Summary At the very least make sure you produce a random player that produces legal moves. Then extend to an AI player. Once this is done, add a game manager and make sure that your report covers your design choices for the AI. Finally, your report should ideally analyse di erent games with di erent starting positions, esp. between the random player and the AI player, and investigate whether the AI player really performs better. Experiment with di erent settings and starting positions and rules.
Submission
2.1 Marking Criteria
Since this is a complex homework, do aim to solve the problem in components. As this is a project-type work, you can deviate from the structure suggested above, so the following marking scheme can only serve as a guideline, and will be adapted correspondingly.
Move Generator: 15 marks
for a move generator that generates the legal successors for a given game state (which includes the player turn). It includes the acceptance and return of the correct data structure (as in Sec. 1.1.2) for the game state.
Basic Player: 15 marks
for a Python player that returns some legal move for a given state. (Note that you do not need to generate all successors for that!)
Limit the moves to reasonable search time. Computation time should not regularly exceed 0.5-1 min.
AI Player: 8 marks
produce an AI player that plays takes a state and returns a legal successor, guided by some heuristic of your choice. You can use the provided minimax/move functions, or write your own functionality for that.
Game Manager: 7 marks Write a game manager that at the very least permits to run games of random player vs. random player or AI player.
Other Extensions (Mini-Project): 8 marks for any successful extension beyond above basic AI model; this includes, but is not limited to implementing and utilizing alphabeta/UCT or other methods in the player; experiments testing your players (random vs. AI) also with modi cations of the rules, and analysis, e.g. of end-game
Report: 7 marks for a report denoting:
- your solution to the AI player | this should not just be rehash of the algorithm, but explain what heuristic you used and how (if at all) you changed the provided algorithms to develop your players. The report should explain those aspects which are not easily seen in the code, especially the high-level aspects of your code where it deviates from the default.
You should explain if you used alpha beta and if you modi ed it to some extent, including Monte Carlo Tree Search, how you implemented it, where applicable.
Explain the results of your mini-projects: If you run a statistics over di erent games, show graphics and/or tables. If you try variations of the game, explain the variations; you can discuss end-game scenarios as per the results of your mini-project.
The report should be limited to 4 pages. Plan your research in such a way that you can cover your explanations in this space.
Mark Deductions: for unexplained and/or intransparent code, for an unclear report or a report that bears no obvious relevance to what has been coded or for failing to mention how to run the player. Specify whether you use Python 2.x or 3.x.
2.2 Submission Requirements
Submit a (g)zipped le with your anonymity number or some other nonidentifying tag (e.g. Super Gib.zip) which contains
- your player AI code as single or multiple Python (.py) les and
- your report (up to 4 pages A4) detailing the rationale of your approach as a .doc or pdf leon StudyNet.
Important Note: Do not submit your Python script code as .doc le. Python code in .doc format will not be marked; it loses indendation and will not work as Python code. The code is expected to run out of the box. Especially test your indentation and whether you return legal moves on .move(state) being called. This is half of the total mark
2.3 How the Work Is To Be Conducted
The work is to be conducted individually.
2.4 General Submission Requirements
The material is to be submitted on StudyNet.
- Hints
3.1 Player Development
Apart from the interface requirements outlined in Sec. 1.2, you are entirely free to design your playing strategies. Your submission will be marked according to the quality of your approach, AI architecture, original ideas, clarity of design (not necessarily how good it plays), and experiments.
In the following, some hints are given for a successful submission.
Move Generator: as highest priority, ensure that you develop a move generator that, for a given non-terminal input state is able to generate at least some legal successor states. Aim for completeness of the found moves only in a second step.
This generator is completely internal to your player and is not seen outside of it. For the basic player to perform, generating one legal successor state is su cient.
Basic Player: a minimal player consists of a player whose move(self, state) method selects one of the moves produced by the move generator above and returns it.
Hints:
- Devise at rst a minimal player that selects and plays a random move. Make sure it runs correctly, before you attempt any smarter extension. Already a random player will attain signi cant marks.
- It is emphatically recommended that you develop a working basic player rst, before attempt-ing any more complicated player strategy.
Minimax: 1. The game is too deep for full computation. You need to limit the depth of the mini-max search by modifying the minimax search by taking into account the depth, see provided software.
- For this, you will need to write a heuristic evaluation method U_heur.
Pruning
- The game is far too deep for full computation. You need to strongly prune the search tree and take into account the depth of the tree.
- For this, you will need to adapt the static evaluation function to handle un nished games.
- The branching factor might require a pruning heuristic which limits the number of moves investigated in a move (to guarantee the 1 min time limit from Sec. 2.1).
A very straightforward solution is to create a priority list for what moves should be de nitively considered, and what moves should be de nitively avoided. Sort the move list according to this criterium and cut o at a suitable point.
3.2 Style and Documentation
The submission should ful l the following requirements:
Coding Style: The code must be written in Python.
Your code needs to be well-structured, readable and thoroughly commented. Rather overcomment than undercomment. Obscure code will lead to mark deduction, especially if it does not work.
The marker will attempt to salvage code that does not work, but appears to be close to working; this is only possible if the code is well-structured, understandable and well documented.
Report: You are required to submit a separate report in pdf or .doc format, up to 4 pages, which details how your player operates. You should clearly explain how your player selects its moves. Explain extensions, and further experiments that you tried.
The report should not be just a single, large block of text, but should be sensibly structured. A good way of thinking about it is as a mini-thesis.
A possible (but not obligatory) structure could be:
General Idea, Architecture: Explain what you had in mind in generating your player, how the player is structured. Essentially, you need to outline how the innards" of move(self, state) look like.
There is of course no need to explain the client/server architecture, as this is given.
Move Generation: if you documented your code well, this section can be short, unless there are special tricks or ideas that you wish to speci cally discuss.
Search Algorithms Used and their Extension: What algorithms did you use for search? If Minimax or Alpha-Beta, explain how you extended them for cutting o depth and pruning.
If you tried and dismissed various ideas before you settled for your submission, explain these ideas in the appropriate section.
Further Extensions: If you used any further ideas to extend the players, whether they are par-ticularly clever, insightful, e cient or other, or if you found structure in the game that allowed you to make a much stronger player than by naive search. Explain variations studied and how well the various AI players performed as compared e.g. to the random player. Are there di erent strength AI players? Does a deeper-searching AI player nd better solutions or not?
Note: Together with your code, it is your responsibility to clearly indicate which le should be run to start the player.
Note that the report should rst and foremost give a high-level picture of what your code is doing
- detailed aspects of your solution should be commented in-code, not in the report unless they embody a major idea. There is no point in repeating code in the report on the same level of detail as in the code itself.
If you implemented various alternative solutions, you should document them and their development in the report. The software submitted should be the most recent (or advanced) one that works.
No comments:
Post a Comment