
GT - Strategies Without Frontiers - Meredith L. Patterson

BSides Las Vegas · 1:21:22 · 118 views · Published 2016-12
About this talk
GT - Strategies Without Frontiers - Meredith L. Patterson Ground Truth BSidesLV 2014 - Tuscany Hotel - August 05, 2014

All right, so this is mostly a talk about game theory, which as a discipline is just about as old as general-purpose computing: it was founded in 1944 by Oskar Morgenstern and this other guy by the name of John von Neumann. As a field, game theory is considered part of economics, which is really the study of decision-making. It's way more than just microeconomics and macroeconomics, where money goes; this is about decision-making about all kinds of resources. It's always seemed strange to me that the study of making better and better decisions is referred to as the dismal science, although, to be fair, the more that you look at the

problem of allocating finite resources, the more hard truths you end up running up against, about both physics and human nature. So what I like about game theory is that it provides a framework for refining our decision-making models as more information about the structure of the data that we encounter comes in. We're going to build up a model of what these kinds of structures can look like throughout the course of the talk, starting very simple and moving up to games as complex as an entire society. So, why am I not actually getting my speaker notes? That's really irritating. Okay, so I don't like

boring problems. (Hold on, that's gonna be a problem... all right.) So, I don't like boring problems, and if you've followed my language-theoretic security work at all, you already know this, because we're taking individual problems, things like SQL injection and cross-site scripting, hoisting them up to the meta level, and saying: what's the structure that's really going on here? How can we turn making sure that the input we care about is actually the input we want into a tractable, decidable problem? I especially hate solving the same damn problem over and over and over again. Again, I like solutions that

generalize. Unfortunately, most of the internet is the same [ __ ] boring problem over and over again, both when it comes to decidable problems like injection and completely undecidable problems like how humans cooperate and avoid getting into giant fights all the time. And this applies both in terms of "I have a bunch of data sitting in the cloud, people access it, how can I tell whether the people who are trying to access it are doing so legitimately or not," and on social media. I mean, this is not my circus, these are not my monkeys; I'm just tired of it, and so I want to solve the problem. Keith Alexander is

consulting for $600,000 a month right now, on the strength of some kind of behavior-analytics secret sauce, so other people are thinking about the problem of how we predict how other people are going to behave. Other things that we will see in this talk include some information theory; keep the Shannon-Weaver model of communication in your head. There will be two endpoints communicating over a possibly noisy channel that has finite bandwidth; they have to serialize their messages to the channel, and they have to parse incoming messages off the channel, and both serialization and parsing can produce errors. There will also be quite a bit of probability theory, although we're going

to be pushing most of that under the hood of a library that cpfr here is responsible for. This is not really a langsec talk, but we're still going to be talking about boundaries of confidence, particularly in signaling games, which involve people communicating with each other and then taking some action. In a signaling game, how much confidence you can have in the signal that you receive being the one that was actually transmitted (again, that Shannon-Weaver problem) is going to depend on how reliably you can receive signals in the language of the channel. If the language is Spanish and you're getting by with what you learned in kindergarten, you're not going to be

receiving messages very reliably. How reliably the sender serializes them also matters. We won't be getting super deep into feedback loops, but if you know how they work, it could be useful to keep them in mind. I also kind of lied about the only math you need being the ability to compare two numbers: it will help later in the talk if you can read first-order logic notation, but it doesn't actually matter; I'll explain it all in English. And also Haskell; that's his

fault. So, when an unknown agent acts, how do you react to what they do? There are a number of things that you can take into account informationally when you're making this decision. First of all, what kind of side effects does it have on the surrounding environment? What kind of signals does it send? This is so important that they named an entire class of games after them. Then there are interactions that have happened in the past, both with you and with other entities. Here the quality of your data is really important: if you're getting fed noisy information about how an agent has interacted with other agents in the past, the extent to which that can

help you is limited, although there are probabilistic methods that can make that a little better. Again, there won't be all that much of an appearance of langsec in this talk, but when all of the agents are machines, this is relevant. I mentioned Keith Alexander's behavior-modeling stuff earlier, but think about the DARPA Cyber Grand Challenge: who do you think is going to be driving all of those automated exploit generators? People? At first, probably, yeah, but not forever. Drones are expensive, and if your drone goes down, you want someone to blame. Servers? You can spin those up any goddamn time,

and in any case, being able to tell where formal language theory matters, which is to say which problems are decidable and which problems aren't, is an important distinction. Decidable problems are priceless; for everything else, there's heuristics, and when those inevitably fail, there's MasterCard. And so game theory is the framework that we will be building this knowledge around, but we'll be pulling from all of these fields that I just mentioned.

So this talk is basically divided into three parts. We're going to start out talking about classical game theory, which is two players, just one interaction, two choices: a very, very simple conceptual framework that we're then going to expand on. We'll start with the mathematical explanation of it and then take a look at how it's been used in psychology. Then we're going to take a look at how games that last longer than one move can change over time. We'll start with the extensive form of iterated games, which basically expands a matrix out into a tree that you're then moving around in, pruning other people's branches

off of. Then we'll also look at multiplayer games and games that run for many, many rounds. And finally, we're going to shift tactics and take a look at: okay, if all we have is data and we have to infer structure, how do we go about doing that? So, on to classical game

theory. All right, so the definition of a game includes four elements. A game has players: actors who take some actions. Those actors have information available to them when they make decisions. They have a set of possible actions at each point where they have to make a decision. And there is some set of payoffs for each outcome. In the classical case, we assume that the players know what the payoffs are, but later on we'll be looking at what happens when you don't actually know what the payoffs are. From these four properties that define a game, we can derive strategies that suggest how players ought to play. A pure strategy is: at each

turn, there is one obvious move you should make. A mixed strategy is: at each move, there's some probability distribution over moves that you could make. There's also a thing called behavior strategies, which are similar to mixed strategies, but we'll look more closely at them later. Another property that falls out of the base structure is an equilibrium. An equilibrium is a state in which all players have made their choices and it would not be rational for any of them to make a different choice; they would be subjecting themselves to a loss of some kind if they were

to change their move. This can sometimes result in situations where there's a better possible outcome than the one that you could get at equilibrium, but it's in nobody's self-interest to make the choice that would lead to that better cooperative outcome, and so nobody actually makes the choice. We'll also see that the structure of games defines what kinds of equilibria they can have. There are refinements on equilibrium where it becomes resistant to things like noise; again, we'll look at that later as well. Okay, so this is what a normal-form game looks like. This is a

generic structure. On the axes we have actions, cooperate and defect; those are just the generic names that people assign to actions in a game, and they can be other things, which we'll see in a moment. "Cooperate" and "defect" sound kind of morally valenced, but actions themselves are not necessarily. The text in red corresponds to the row player; that's you, and the rows are where the red is on the axis. Blue is the column player. And in the cells of the matrix we have the payoffs that each

player gets if those two actions are chosen. So if both players cooperate, red gets the A payoff and blue gets the B payoff; if they both defect, red gets the G payoff and blue gets the H payoff.
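A matrix like this is easy to sketch in code. Here is a minimal Python rendering of a generic 2x2 normal-form game; the off-diagonal letters C through F are my guess at the slide's layout, since the talk only names A, B, G, and H:

```python
# A 2x2 normal-form game as a dict mapping (row_action, col_action)
# to a (row_payoff, col_payoff) pair. Letters stand in for the
# generic payoffs on the slide; red is the row player, blue the column.
game = {
    ("cooperate", "cooperate"): ("A", "B"),
    ("cooperate", "defect"):    ("C", "D"),
    ("defect",    "cooperate"): ("E", "F"),
    ("defect",    "defect"):    ("G", "H"),
}

# Looking up an outcome: both players defect.
red, blue = game[("defect", "defect")]
assert (red, blue) == ("G", "H")
```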

Etc so uh I mentioned strategies before um if there is a pure strategy for the game then that's at every situation there is one obvious move um for a mixed strategy um there's a probability distribution over possible moves and you're basically rolling the dice at every step of uh of the game tree um and then behavioral strategies um assign probabilities at information sets so they're functionally equivalent as long as the player has perfect recall um um behavioral strategies are more you know what happens if players have faulty memories um which is a little more like how people actually act in real life so the prisoners dilemma this was first described in 1950 by Merl flood and

Melvin Dresher, and there are four payoffs total in the game, including the payoff that you get for screwing the other guy (D and E). In this case you'll notice that the best you can do in this game is zero. So the idea here is: you've got two prisoners; they both have information on what the other person did, and the police want them to rat out their buddy. Cooperating is saying "nope, I'm not going to snitch," and defecting is ratting out your buddy. So if you both cooperate and refuse to rat each other out, then the cops find some minor thing to get

you both on, and you both do a year in jail, but that's all. If you turn in your buddy and your buddy doesn't snitch on you, then you go free and your buddy does three years. And if you both snitch on each other, then you both do two years. So the temptation is: go free. The reward is: do just one year. The punishment is: do two years. And the sucker payoff, what you get if you're the cooperator who gets defected on, is three years in jail. A couple of properties fall out of this. First, the reward payoff is greater than the punishment payoff.
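Written out with the conventional letters (temptation, reward, punishment, sucker), the payoffs from the talk look like this; negating the jail terms turns them into utilities where bigger is better, matching the remark that the best you can do is zero:

```python
# Prisoner's dilemma payoffs, as years in jail negated into utilities.
T = 0    # temptation: you defect, your buddy cooperates -> you go free
R = -1   # reward: mutual cooperation -> one year each
P = -2   # punishment: mutual defection -> two years each
S = -3   # sucker: you cooperate, your buddy defects -> three years

# The ordering that defines a prisoner's dilemma: temptation beats
# reward beats punishment beats the sucker payoff.
assert T > R > P > S
```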

This ends up meaning that mutual cooperation is better than mutual defection, which is kind of obvious: it's better to only do one year in jail than to do two. However, because the temptation is greater than the reward (the temptation payoff for screwing your buddy and going free is greater than the reward of only doing a year in jail), and also because the payoff for both of you defecting, the punishment payoff, is greater than the you-got-screwed-over payoff, defecting is the dominant strategy for both agents. So mutual cooperation is useful, but in terms of self-interest, it is in neither agent's self-interest to

cooperate instead of defecting. So this is a dilemma: at the individual level, defection is superior to cooperation, and it's individuals who act. Here's another game that doesn't quite have that moral loading of cooperation and defection. This game is basically rock-paper-scissors, but you only have two options: you have a penny and your opponent has a penny, each of you chooses to turn the penny to either heads or tails, and then you reveal your choices at the same time. The matrix describes the payoffs. There is no pure strategy that is a best response here, because what you always want is the opposite of

what your opponent picks. So if you're playing this over and over again, kind of like playing rock-paper-scissors over and over again, you might be able to infer some pattern in how your opponent makes those choices, but in the one-round case you really can't do better than random. Now, this next game has a different set of properties; it's called deadlock. The idea here is that the outcome that is best for everybody also happens to be the dominant outcome; there's not any conflict between self-interest and mutual benefit. So I mean, this is

really kind of a boring game, right? The obvious choice is just defect-defect, and you go on with the rest of your life. But it's kind of an interesting basis for a signaling game, because there is a little bit of incentive to screw the other

guy. All right, so now, this type of game is a cooperation game. Jean-Jacques Rousseau originally described it long before game theory was even a glimmer in von Neumann's eye. He described the problem of two hunters going out to the woods to hunt: they can either work together to bring down a stag, or they can go off alone and shoot rabbits. So if they agree to go hunt stag, and one of them decides in the middle, "oh, screw it, I'm just going to go get a rabbit instead," then his buddy goes home with nothing, and he goes home with a rabbit. But if

they decide to cooperate, then they both come home with more than they would have had. So there are actually two pure-strategy equilibria here: you're better off either always cooperating or always defecting. We say that cooperating is payoff dominant and defecting is risk dominant.
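A brute-force search over a 2x2 matrix makes those two equilibria visible. This is a sketch, not the library from the talk, and the stag-hunt numbers are a common illustrative choice (stag pays 4, a rabbit is a guaranteed 3):

```python
from itertools import product

# Stag hunt: hunting stag together beats a rabbit, but a rabbit
# is a sure thing whatever your partner does.
stag_hunt = {
    ("stag",   "stag"):   (4, 4),
    ("stag",   "rabbit"): (0, 3),
    ("rabbit", "stag"):   (3, 0),
    ("rabbit", "rabbit"): (3, 3),
}

def pure_nash(game):
    """All action pairs where neither player gains by deviating alone."""
    actions = sorted({row for row, _ in game})
    equilibria = []
    for r, c in product(actions, repeat=2):
        row_ok = all(game[(r, c)][0] >= game[(alt, c)][0] for alt in actions)
        col_ok = all(game[(r, c)][1] >= game[(r, alt)][1] for alt in actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# Two pure-strategy equilibria, just as the talk says:
assert pure_nash(stag_hunt) == [("rabbit", "rabbit"), ("stag", "stag")]
```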

So in the stag hunt, people had to coordinate in order to obtain the best payoff. The game of chicken, where you and your opponent are driving down a road and each of you has to decide whether to swerve at the last minute or keep going, is more of an anti-coordination game, because if you choose the same action, particularly if that action is going straight, you produce the extremely large negative externality of crashing your car and probably dying. And then, of course, if you both swerve, well, you're both equally wimps. What we're seeing, by the way, at the bottom here, with

all of these inequalities, is really just the definition of a partial order on the payoffs: any game that has this kind of ordering of the payoffs is an instance of this class of game. Now, chicken has a really interesting variant called Hawk-Dove, which was proposed by John Maynard Smith and George Price in 1973, in Nature, to describe conflict over resources among animals. So, not even rational actors at all; just, how do animals make choices? Game theory usually assumes that the actors are rational, but when you're modeling animals, basically all you can do is say: yep, they've got

preferences, and they try to satisfy them. So here V is the value of the resource that's in dispute (if you share it, you split it), and C is the cost of getting into a fight. If you both decide to fight, then you basically split the reward and the cost of having gotten into a fight. This is also often examined as a signaling game, because you can include a round where the players get to threaten each other before they choose what their moves are, and the credibility of that signal may influence whether the player takes an action or not.
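The payoff structure just described can be written down directly. This is a sketch following the Maynard Smith and Price formulation, where two hawks split both the prize V and the fight cost C, and two doves just share the prize; the example values are mine:

```python
from fractions import Fraction

def hawk_dove(V, C):
    """Hawk-Dove payoff matrix for resource value V and fight cost C."""
    V, C = Fraction(V), Fraction(C)
    return {
        ("hawk", "hawk"): ((V - C) / 2, (V - C) / 2),  # split prize and cost
        ("hawk", "dove"): (V, Fraction(0)),            # dove backs off
        ("dove", "hawk"): (Fraction(0), V),
        ("dove", "dove"): (V / 2, V / 2),              # share peacefully
    }

# When the fight costs more than the resource is worth, mutual
# aggression leaves both players strictly worse off than sharing:
g = hawk_dove(V=4, C=6)
assert g[("hawk", "hawk")][0] < g[("dove", "dove")][0]
```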

Credibility is usually modeled pretty fuzzily, and I haven't actually gotten to this point in the code that we're going to be presenting later, but I would ultimately like to get to the point where we can make reasonable heuristic estimations of how much I can actually trust this signal that I'm receiving. So, this is another coordination game, the battle of the sexes, also known as conflicting-interest coordination. The idea here is: one partner wants to go to the opera, the other one wants to go to the ball game, but they'd rather be together than go to

different events. Each one has forgotten which thing they're going to go to tonight; each one knows that the other one forgot; and they can't communicate, like they lost their cell phones or something. So where should each of them go? On the pure-strategy level, again we have two equilibria, kind of like the stag hunt: either both people go to the opera, or both people go to the ball game. But that's not really fair, because it means that one person is consistently getting a higher payoff than the other, and usually in a coordination game you want to try to be fair to the other person. There's also a mixed strategy,

which is: go to the event that you like best with 60% probability. What you can see, essentially, is a sum of the payoffs: if you prefer an event, you get a payoff of three out of the total of five, so that's three-fifths, 60%. But that's inefficient, because players end up miscoordinating.
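The 60% figure can be checked directly. A small sketch, assuming the usual battle-of-the-sexes payoffs implied by the talk (your favorite pays 3, the other event pays 2, miscoordinating pays 0): each player mixes so that the opponent is indifferent between their two options.

```python
from fractions import Fraction

# Probability of going to your own favorite event. The opponent must be
# indifferent between their two choices:
#   joining you at your favorite pays them 2, so that's worth 2p;
#   their own favorite pays them 3, reached when you don't show: 3(1-p).
p = Fraction(3, 5)
assert 2 * p == 3 * (1 - p)          # indifference pins p at 3/5 = 60%

# The inefficiency: each goes to their own favorite with probability p,
# so they coordinate (both at the opera, or both at the game) only
# with probability p(1-p) + (1-p)p = 12/25, worse than a coin flip.
coordinate = p * (1 - p) + (1 - p) * p
assert coordinate == Fraction(12, 25)
```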

"I'm always going to go to that thing I

hate." So what have we seen so far? First of all, games can be zero-sum or non-zero-sum. In a zero-sum game, the gains and the losses of all of the players balance out to zero; the matching-pennies game is zero-sum, but the prisoner's dilemma and the stag hunt are not. All zero-sum games are competitive: people are contesting a finite pool of stuff. Non-zero-sum games can be competitive or non-competitive, and really, way more games are non-zero-sum than zero-sum anyway, so they're more interesting. We also talked about conflict and cooperation, and coordination is hard.
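The zero-sum distinction is mechanical to check once the payoff matrices are written down. A sketch, using illustrative numbers for the two games named above (matching pennies as plus or minus one penny; the prisoner's dilemma as negated jail years):

```python
def is_zero_sum(game):
    """True when every cell's payoffs cancel out exactly."""
    return all(a + b == 0 for a, b in game.values())

matching_pennies = {
    ("heads", "heads"): (1, -1), ("heads", "tails"): (-1, 1),
    ("tails", "heads"): (-1, 1), ("tails", "tails"): (1, -1),
}

prisoners_dilemma = {  # years in jail, negated into utilities
    ("cooperate", "cooperate"): (-1, -1), ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"):    (0, -3),  ("defect", "defect"):    (-2, -2),
}

assert is_zero_sum(matching_pennies)
assert not is_zero_sum(prisoners_dilemma)
```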

When your payoff matrix is structured in such a way that it incentivizes doing the self-interested thing at the individual level, trust comes into the picture: you have to figure out how you're going to decide whether to rely on this person actually doing what they signal they're going to do. But an action is just an action; there's nothing inherently good or bad about choosing heads or tails in matching pennies. Furthermore, the morality of snitching in the prisoner's dilemma depends on what your ethical framework says about snitching, and the morality of going off to hunt rabbits in the stag hunt depends on whether you

agreed to hunt a stag first and how seriously you take keeping your word. But as we go on, we'll be looking at more complicated games: ones that go on for longer, ones that have more players, ones where players have uncertain information about each other, and even ones where the game that is being played changes form as the game goes on. Before that, though, I just want to talk about a couple of the different kinds of equilibrium. Game-theoretic equilibrium really has its roots in the idea of Cournot equilibrium, which was established by Antoine Augustin Cournot in 1838. He was talking about businesses, like factories or shoemakers,

people who are producing some product in a market with other actors who are producing that product and other products, and they have to decide how much of what they're going to produce in order to maximize their profit. Nash equilibrium we've already talked about: that's where each actor is making the best decision that they can, given what they know about each other's decisions. This is really a generalization of the Cournot equilibrium, in that in the Nash case you're not necessarily trying to maximize your output; you're trying to maximize something else, and we

ball up all of the possible something-elses into this meta-concept of payoff. But it goes further. A subgame is a subset of the tree of a game, and in a subgame perfect equilibrium, all subgames have a Nash equilibrium. So if you have a game that exhibits subgame perfect equilibrium, you can start at the outcomes and then work backward, pruning off branches that would involve a player making a move that they are not incentivized to make, which is to say a non-optimal move. And it gets better: some games even exhibit a thing called trembling-hand equilibrium, where there's a possibility that a player might just do the wrong thing.
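That start-at-the-outcomes-and-prune procedure is backward induction, and it fits in a few lines. A sketch on a made-up two-move tree; the node encoding and the payoffs are mine, not the talk's:

```python
# A leaf is a payoff tuple (player0, player1); an internal node is
# (player_to_move, {action: subtree}). Backward induction evaluates
# leaves first, then prunes every branch the mover wouldn't take.

def backward_induce(node):
    """Return (payoffs, path) for the subgame-perfect outcome."""
    if isinstance(node[1], dict):              # internal node
        player, branches = node
        best = None
        for action, subtree in sorted(branches.items()):
            payoffs, path = backward_induce(subtree)
            if best is None or payoffs[player] > best[0][player]:
                best = (payoffs, [action] + path)
        return best
    return node, []                            # leaf

# Player 0 moves first, then player 1 responds.
tree = (0, {
    "left":  (1, {"up": (3, 1), "down": (0, 0)}),
    "right": (1, {"up": (1, 2), "down": (2, 1)}),
})
payoffs, path = backward_induce(tree)
assert payoffs == (3, 1) and path == ["left", "up"]
```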

That is, your hand might slip, and you hit the big red button instead of picking up the phone to call the Kremlin and say, "hey, the war is off." So what this ends up meaning in practice is that if you can structure the payoffs of a game you expect to play such that that game exhibits a more refined equilibrium, you have more tools for making decisions. How much you can actually influence the payoffs is a very open question, but that's essentially what these refinements are useful for. So now let's move on to psychology. Traditional game theory assumes that all agents are rational, but in the 1960s, Eric Berne looked at

irrational games: basically, the kinds of social games that people entice each other into for attention, or for sympathy, or for other kinds of psychological payoffs, while hiding their true motives. The ulterior part is the important part; that's what distinguishes Berne's games from the more general game-theory games. So he drops the assumption that players are driven by the most rational aspects of their nature, and he looks at the payoffs of these ulterior-motive social games as ways for players to satisfy unmet emotional needs. In effect, we're now considering players to have two sets of preferences that impact their decision-making. One belongs to System 2, in Daniel Kahneman's Thinking, Fast and

Slow terminology. System 2 is your rational mind, and you use one set of preferences when you're making considered decisions; but you have an entirely different set of preferences that your prerational System 1 uses for making quick heuristic decisions, and these might be decisions driven by emotion or by training that you've had. Humans are social animals: we all have biological drives to interact with other members of our species to some extent or another, and when that drive demands to be satisfied, an argument can serve the exact same purpose as a productive discussion, or even a hug, if what a person is fundamentally looking for is

just external recognition that they exist at all. So payoff basically comes, at a biochemical level here, in the form of neurotransmitter activity. Berne was doing this stuff in the '60s, when they were just figuring out Parkinson's, so we didn't know a whole lot about the role of dopamine in addiction or anything like that, and the imaging equipment that we'd need to investigate this directly is either still really expensive or doesn't exist. But we can black-box it, and what I actually use in my own mental model is behaviorism. The idea is that each player experiences some consequences

from each interaction, either as reinforcement or as punishment; there's positive and negative reinforcement, and positive and negative punishment. Positive reinforcement is a rewarding stimulus, like a candy or a kiss. Negative reinforcement is when an aversive stimulus goes away: they stop beating you because the morale improved. Positive punishment, on the other hand, sounds kind of paradoxical, but it's an aversive stimulus, somebody yelling at you. And then negative punishment is removal of a rewarding stimulus: they take the foosball table away. Berne identified a few primal hungers that people seek to have satisfied by

playing games: stimulus hunger, recognition hunger, and structure hunger. It's probably also useful to think about status hunger, although that's probably a combination of recognition hunger and structure hunger. Berne enumerated about five general types of interactions. First of all, procedures: a series of complementary transactions toward some physical end, like putting a bike together. Operations are similar to procedures; they're a set of transactions that you undertake for some stated purpose. If you ask explicitly for something like reassurance or support and you get it, that's an operation. A ritual is a stereotyped series of simple complementary transactions, one complementing

the other. Berne says they're "programmed by external social forces," but basically what he's saying is that when you say hi to the guy up the street, there's a ritual for how you say hello to each other and how long you go on talking before you both go on your way. A pastime is an iterated ritual, and it carries state. People spend a lot of time on pastimes, which is kind of why they're called that; for most people, Facebook is largely a pastime, and ditto for Twitter. And because pastimes can carry state, they

have the potential to turn into status gaming, as a sort of pecking order gets established. And when different clusters use the same game board for their pastime (they're all on Facebook, or they're all on Twitter, but they're different clusters with their own group norms for how discourse should work), you end up getting fireworks, because pastimes have this ritual quality. They include jargon; you kind of have to speak the lingo. They include signaling of certain beliefs. And if you don't know what pre-existing

state you're walking into, you may very well end up sending a signal that upsets a whole bunch of people, and then suddenly you have the whole internet on your ass. And finally, the games that Berne calls games are specifically a series of ulterior transactions: the person who starts one is hiding their motives, and they're trying to progress to a predictable outcome. They have some goal in mind, but they're not being upfront about what that goal is. So we said that if you ask for reassurance and you get it, that's an operation; if you ask for reassurance but then

you flip that around on the person and use it to make them feel bad somehow, that's a game. Berne's games also have some structure: they have players, the players take actions in response to each other, and there are payoffs. He talks about five kinds of advantage. Existential advantage seems pretty similar to confirmation bias to me; it seems like there's some similar machinery going on there. You believed something about the state of the world, like "all Martians are pigs" or something, and playing the game out to its end confirms that in some way. It's that sense that

events in the world are confirming your beliefs about how the world works, even if you actually manipulated the events toward that end. Internal psychological advantage is just the raw emotional payoff, pretty much analogous to positive reinforcement, and similarly, external psychological advantage is analogous to negative reinforcement: if you win, you're raising the likelihood that you'll behave that way again, because you've reinforced the evidence that playing games works. And then you've also got internal and external social advantage, which are basically about status and limiting other players' moves. If you signal that you're oppressed, people who think that oppression is important are going to limit their own actions on

your behalf and you know this is exploitable so what are some examples of uh these ulterior motive games that that burn describes kick me so the idea behind this is you know somebody really just wants sympathy and so they they they just go around and being as annoying as possible until somebody just can't until somebody just you know takes the the punch and then and and then you complain about it and you know you you're trying to you're you know you're trying to basically one up people and say oh I'm way worse off than you are give me sympathy give me sympathy um another one uh ain't it awful um the way he describes this you

know, it can be a pastime: people complaining about how terrible the state of the world is, and just sort of taking turns describing how great things used to be in the old days and how it isn't like that now. But it can also manifest as a game: if the player displays distress and gets sympathy and help in response to that distress, that can be pretty seductive, and taken to the pathological extreme, that's called Munchausen syndrome, or Munchausen by proxy; actually, that's just one possible extreme of it. Another one is Why Don't You, Yes But. What the initiator really

wants is reassurance that their problem is not their fault, but they get it in a manipulative way, by challenging people to find a solution that they can't find fault with. They don't really want a solution; they just want to be reassured that it's not their fault, and so they can nitpick anything to death. Another one that we see a lot on the internet is Courtroom: pick some victim, basically a scapegoat, and then just verbally tear them apart, in front of a judge if you can. This is what happens when Bobby and Susie are arguing over a toy

and Bobby says "Susie hit me," and mom has to decide between the two. But on the internet we often get this judgeless, all-self-appointed-jury version that I don't think Berne really anticipated. The Badger Game is actually a classic con job. The idea here is: a guy picks up a woman in a bar, she takes him off to a hotel room, and in flagrante delicto, her husband or her boyfriend shows up and threatens to beat the hell out of the guy. The confederate and the

aggressor uh basically nudge the guy to just give up his wallet and and you can leave um so you know it can be for uh you burn called this game now I've got you you son of a [ __ ] um and you can play it two-handed um wi with just a victim and just an aggressor as long as the victim does something that the aggressor can construe as the victim screwing up in some way but you know with the with the Confederate then uh the Confederate and the aggressor um work together the Confederate lures the victim into provoking the aggressor who then accuses the victim of you know hey what are you doing with my wife um the

victim defends himself um and is you know and that's when you get the shift to well maybe you just you we we'll let you go if you just give if you just give us your wallet um you know he tries to defend himself and ends up either getting beat up or giving up the wallet and trolling another one that I don't think burn anticipated but it's basically the schlam game this is it's often about getting the target to embarrass themselves in some way uh typically by overreacting and saying something that they're going to regret later although to be honest I really doubt whether the target ever actually does regret it later but know we'll set that aside for now so the idea

is the troll provokes somebody who you know then has a bunch of resentment to deal with and they have to decide am I going to yell at this person or just you know suck it up and be the better person and take it lather rinse repeat and if B snaps then player a uh appears justified in stepping it up and just accelerating it so but if if they keep their you know the person still just keeps pushing the boundary and pushing the boundary so you know it's it's really it's it's a losing game for the person being trolled either way um you know burn talks about there being a a phase of the game I guess in

situations where people expect to see each other again like you know we're part of the same Social Circle and run into each other at parties where there's this apology and forgiveness phase where the person apologizes for screwing up and then basically just takes that as license to go run around screwing up more and more and more um but trolls aren't really in it for the Forgiveness so this might be better considered a modification of that game so not notice that a trolls actions revolve pretty much entirely around sending signals to some receiver in an attempt to provoke an overreaction so if you engage that's a feedback loop that provides the troll with more material to

feed into its signal generation function so the only winning move is really not to play and on that note let's take a closer look at the class of games that we can use to model interactions involving two-way communication signaling games well but actually yeah I'm I'm actually running out of time so I'm gonna skip this slide uh yeah so this is from a British game show called Golden Balls get your laughter out of the way now um because uh you know you're going to hear in the in the video clips to follow you're going to hear this uh you're going to hear the word balls more often than any other noun in you know in any of the clips

that follow I know I counted so ah hang on I'm gonna have to restart that just a second audio issues that's on okay here I hope this works damn it okay one more try you each have a golden ball with the word split written inside you both have a ball with the word steel written inside you will know which is split and which is steel because you can have a look all right so that was the setup um let's talk about uh what this game actually involves so this is the very beginning of an extensive form game tree for this game uh the empty dot in the middle is the root and that indicates who makes

the first move in this case the first player so traditionally nature makes the first move and that move is is considered to be choosing the type of the player you know in a job interview is the candidate that you're interviewing competent or incompetent when you buy somebody a drink are they interested in you or they not interested in you if you're deciding whether to tell somebody a secret are they trustworthy or are they untrustworthy but since the first player has already decided whether he's going to split or steal he's going to make the first

move if you both pick the split ball you split the 13,600 and you go home with 6,800 each can any of you even hear that okay yeah I I'm also not able to get for some reason I'm not able to get this playing on the uh playing on the on the projector either but we'll try we we'll try to move on um okay so we're building out the uh the game tree here um so hang on there we go um so from the Split Branch um we see that there are two possible signals that the red that player one the red player can send so it branches out to uh signal a or signal B

um if player two also pick split then they split the uh they split the 13 the 13,000 um what happens if they if one of them splits and one of them steals if one of you chooses the steel ball and the other chooses the split ball whoever chooses the steel ball goes home with the whole lot 13,600 so this expands the uh the steel Branch um for both for both sides um both on the right where we now know what happens if the red player steals and the blue player splits and on the left where we know what happens if the red player splits and the blue player steals

oh there we go those choose the steel ball you leave today's game with what you came with nothing okay and uh finishing out the graph uh we now have uh the branches for what happens if both players steal um they all end up going home with nothing so uh this game has a normal form um it's similar to the prisoners dilemma but not quite the same because of the fact that the uh the the payoff for both players stealing is the same as the sucker payoff for if you get defected on um so this is also known as the friender fox [Music] game so at this point uh the players observe uh the actions available to

them
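To make that normal form concrete, here is a small sketch of the split/steal payoff table in Python (my own illustration, not from the talk; the pot is the episode's £13,600):

```python
# Normal form of the Golden Balls split/steal game described above.
# Note that both-steal pays the same as being the sucker: that is what
# separates this game ("Friend or Foe") from the strict prisoner's dilemma.
POT = 13600
PAYOFFS = {  # (row action, column action) -> (row payoff, column payoff)
    ("split", "split"): (POT // 2, POT // 2),
    ("split", "steal"): (0, POT),
    ("steal", "split"): (POT, 0),
    ("steal", "steal"): (0, 0),
}

def best_responses(col_action):
    """Row player's best responses to a fixed column action."""
    best = max(PAYOFFS[(a, col_action)][0] for a in ("split", "steal"))
    return [a for a in ("split", "steal")
            if PAYOFFS[(a, col_action)][0] == best]

print(best_responses("split"))  # ['steal']: stealing strictly beats splitting
print(best_responses("steal"))  # ['split', 'steal']: both pay zero, a tie
```

Because stealing is only weakly dominant (against a stealer you get nothing either way), there is room for the kind of signaling the clip that follows demonstrates.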

Whoops, it apparently doesn't want to play. All right, screw it. Basically, they both take a look at the balls in front of them and recognize that yes, one is split and one is steal. So at this point we've got the signaling part filled in. Nick is either going to signal that he's likely to split or that he's likely to steal, and then Ibrahim, the other guy, is going to have to make his decision based on how credible he believes Nick's signal to be. So now they're going to signal to each other:

"Ibrahim, I want you to trust me. 100%, I'm going to pick the steal ball. Sorry, I'm going to choose the steal ball. I want you to do split, and I promise you that I will split the money with you."
"Well, after you took the steal?"
"Yeah. You're going to take split, I'm going to take steal, so you take the money and I will split it with you after. [Applause] Yeah, I promise you I'll do that. If you do steal, we both walk away with nothing. I'm telling you, 100% I'm going to do it."
"I appreciate that. Right, I'll give you another alternative: why don't we just both pick split?"
"I'm not going to pick split. I'm going to steal. Ibrahim, honestly, 100%, I'm going to steal."
"It's in your nature to steal."
"No, I'm honest, and I'm going to tell you: I am. That's why I'm telling you I'm going to steal. If you do split, then I will split the money."
"I can't see myself doing that."
"Okay, well, I'm going to steal, so we're going to leave with nothing."
"Where's your brains coming from? I can't work it out. I know that I'm a decent guy and I will split the money with you."
"Well, we should just both split then."
"No, I'm going to do steal."
"There is no legal [...] for him to give you the money, of course."
"If I gave you my word... now let me tell you what my word means, okay? My father once said to me, a man who doesn't keep his word is not a man. He's not worth nothing, not worth a dollar."
"I agree."
"So, Ibrahim, I'm going to steal. You've got the choice: you either steal and we both walk away with nothing, because you know I've told you my intention, and I've told you that I will split the money with you. Ibrahim, if I gave my word I was going to split, I would split. And you're going to take steal. The only way you can guarantee to walk away with £6,800 is to guarantee that you both put the split ball in."
"And I do now have to push you for a decision."
"It's a tough one."
"We've lost it. We've lost everything. Okay, we lost. Then we're walking away with no money, because you're an idiot."
"No, that's an idiot. You're an idiot, that's what you are."
"You're an idiot, that's what you are."
"This can go on all night, and these people have got to get up for breakfast. Nick, choose split or steal. Ibrahim, choose split or steal. Now, please choose."
"All right, I'll tell you what, I'm going to go with you, okay? I promise you."

All right, quick audience poll: who thinks that Ibrahim is going to choose steal? Ibrahim is the guy on the left. Who thinks that Ibrahim is going to choose split? All right, who thinks that Nick is going to choose steal? Who thinks that Nick is going to choose split? I just want to point out that Radiolab interviewed both of these guys after the show, and it was pointed out that in the studio the argument went on for 45 minutes, and the audience was booing Nick over and over again. He stuck to his guns the entire freaking time. So in uncompressed time, that signal was pretty unambiguous, but boy, he had to go through a lot of work to send it.

So we now see that Nick has signaled that he is likely to steal, but these dots here mean that Ibrahim doesn't really know what action he actually took, so we have to consider both ends of the graph. But he's signaled that he's likely to steal so convincingly that we can basically ignore this entire upper half of the graph. So now, what actually happens?

"Split or steal? ... Yes! Congratulations, you have both split, and each received £6,800."
"Why did you put me through that?"

As he described it later, Nick conned him into £6,800. So now we have the complete path through the graph, where we see that Nick actually chose split but signaled that he was likely to steal. Sure, it's true that he could have chosen steal and just given him the £13,600 after the show, but the outcome actually works out better if you just follow the rules and lie. We didn't know beforehand whether Nick had actually chosen split or steal, but he had signaled unambiguously that he planned to steal. Nick's signal changed the structure of the game that they were playing: it went from being Friend or Foe to Ultimatum. Ultimatum is the game where there's an amount of money, one player gets to propose how to split it, and the other person gets to choose yes or no. That's literally what just happened here. The risk that Nick is taking is whether Ibrahim is going to decide that the ultimatum is so insulting that he should punish Nick by forcing them both to go home with nothing, or whether the promise of £6,800 after the show is a credible enough incentive that he ought to cooperate. So the takeaway here should be that extensive form helps you see how the structure of a game changes as branches of the decision tree get pruned away.

And with that, let's move on to larger games involving societies and lots more information. We're going to start with iterated games. Robert Axelrod, in 1981, took a look at what happens when you play the same game with another person over and over again and you can remember what happened. Now, instead of just the payoff matrix, your strategy also depends on history. So in 1981 he ran a bunch of tournaments, playing strategies against each other, 200 rounds, totaling up the score at the end, to see how well they performed against each other. He also ran what he called ecological tournaments, where players can abandon bad strategies: every strategy's success in the previous round determines how frequently it appears in the current round, and cooperative strategies end up out-competing non-cooperative ones.

As an aside, it would be really awesome if people in the real world actually did abandon bad strategies as soon as they recognized that a strategy wasn't working, but in practice people are really, really awful at this. People are really invested in the strategies that they choose; there's a bias for that too, called choice-supportive bias. But the interesting thing that turned up in these tournaments is that if all you know is how the person you're playing with interacted with you the last time, the best you can do is essentially an eye for an eye, a tooth for a tooth: if they punch you, punch back; if they cooperate, cooperate. People tried strategies with more complex inferences, but they didn't work very well, because the inferences were usually wrong.

Another thing that they determined was that although tit-for-tat is the best strategy in the long run, it can't score higher than its opponent, which Axelrod refined into an axiom: don't be envious. Don't try to get one over on the other guy. It also turned out that if you're playing against a tit-for-tat player, the best you can do is cooperate with them, because they won't defect on you if you don't defect on them. Axelrod reified this as: don't be too clever. Again, if your inferences are wrong, that's not going to help you.

So what properties do iterated games have? First of all, niceness. Success in iterated prisoner's dilemma, as it turns out, requires you to not defect on somebody who has not defected on you; niceness is pretty much cooperation. Successful strategies are also retaliatory, though: if somebody defects on them, they will in fact defect right back. But they're also forgiving, which is to say that if somebody stops defecting on them, they will also stop defecting.

However, can we do better than this? In the real world, we can get information from other people about how people interact, and we also know that there are plenty of people whose modus operandi is just moving from victim to victim, opportunistically defecting whenever they think they can get away with it. Berne's games are a fairly socially acceptable example of that kind of behavior; there are certainly far more dangerous flavors of it as well. So if we have information from other people, or other sources, are there strategies that can incorporate that information to expose social predators? What Orton Blair showed in 2002 is: yes. If you can take into account all interactions in the past, then you can express strategies in first-order logic.

So in the examples that follow, C is the column player, the person you're playing against; R is the row player, which is you; p is the last round; and b is a predecessor function. Tit-for-tat: defect on the column player (I am the row player) if they defected on me last round. That's it. Tit-for-two-tats, another strategy that turned up in Axelrod's tournament: defect on somebody if they defected on me last round and the round before last. Grim is not a nice strategy: if at any time the column player has defected on it, it will defect on them forever. There's a somewhat more elaborate version of this that introduces logical not. Bully says: if at no time has the column player defected on me, if they have always cooperated with me, then sucker-punch them. There's another one called Spiteful Bully, which is similar, but: Bully will cooperate if you turn around and punch the bully, whereas with Spiteful Bully, if you turn around and punch the bully three times, he'll keep punching you back. There are also strategies that try to keep the peace between predatory strategies like Bully and Spiteful Bully and Vulture; Vigilante is one of them.
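These history-based strategies can be written as predicates over the full game history, in the spirit of the first-order-logic formulation just described. This is a Python toy of my own; the history encoding and names are assumptions, not the original notation:

```python
# Each strategy answers: "should row player r defect on column player c,
# given the history so far?" A round is recorded as a dict mapping each
# player to the set of players they defected on that round.
def tit_for_tat(r, c, history):
    # Defect iff c defected on me last round.
    return bool(history) and r in history[-1].get(c, set())

def tit_for_two_tats(r, c, history):
    # Defect iff c defected on me in each of the last two rounds.
    return (len(history) >= 2
            and r in history[-1].get(c, set())
            and r in history[-2].get(c, set()))

def grim(r, c, history):
    # Defect forever once c has ever defected on me.
    return any(r in rnd.get(c, set()) for rnd in history)

def bully(r, c, history):
    # Sucker-punch anyone who has never defected on me.
    return not any(r in rnd.get(c, set()) for rnd in history)

history = [{"eve": {"carol"}}]  # round 1: eve defected on carol
print(tit_for_tat("bob", "eve", history))  # False: eve never hit bob
print(grim("carol", "eve", history))       # True: carol remembers
print(bully("bob", "carol", history))      # True: carol never hit bob
```

Because each predicate sees the whole history, not just the last round, strategies that use third-party information (like the peacekeepers discussed next) fit the same shape.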

For Vigilante, it's: defect on the column player if the column player defected on anybody else last round. Look at everybody else; this is where information about what's happening in the society comes into play. Similarly, you've got Police. Just like tit-for-tat, if you go up and punch a cop, he'll punch you back; but also, if somebody defects on somebody else who has cooperated with everybody, the policeman says: "okay, you're trying to screw over somebody who always cooperates; I'm going to punish you." So the peacekeeping strategies ignore who somebody defected on; they only care that a person defected.

And from an evolutionary standpoint, it turns out that these predatory strategies... sorry, I went too far. If you start out with a population of 1,050 people, 500 of whom cooperate all the time, 500 of whom play tit-for-tat, and only 50 of whom are Spiteful Bullies, very quickly the Spiteful Bullies will just out-compete the cooperators down to zero. You simply cannot survive in a society with people who are going to punish you like that if you're going to cooperate all the time. However, if you replace the tit-for-tats with Police, you still keep a population of all-cooperators, and the Spiteful Bullies eventually get pushed down to zero; that strategy just does not pay off in the long run.

So we can now talk about the concepts of niceness and loyalty. Niceness is more nuanced in a society: there's both individually nice and communally nice. If you're individually nice, you're not going to defect on somebody who's never defected on you. Similarly, if you're meta-individually nice, you will not defect on a strategy that's individually nice. Whereas if you're communally nice, you won't defect on somebody who has not defected on anybody; Police, for instance, is communally nice. And then you've also got meta-communally nice, which doesn't defect on communally nice strategies. Similarly, there's individually and communally retaliatory and forgiving: if you defect on somebody who defects on you, that's individually retaliatory; if you defect on somebody who defects on anyone, that's communally retaliatory; and so on. A loyal strategy will not defect on somebody else that's playing the same strategy as it: if tit-for-tat plays another tit-for-tat, they will cooperate forever, and ditto for Police, but not for Vigilante; Vigilantes will defect on other Vigilantes. So tit-for-tat is individually nice, retaliatory, and forgiving, but Vigilante is communally nice, retaliatory, and forgiving.

All right, so we were talking about this sort of meta-niceness. What do you do about the problem that peacekeepers don't always agree? The Police and the Vigilantes, for instance, will defect on each other. It also turns out that peacekeepers protect strategies that aren't peacekeepers, at their own expense. In this tournament, you have 400 Police who are letting the tit-for-tats flourish, pretty much at their own expense, even though the population of Spiteful Bullies is kept down, and there are some people who are just able to cooperate all the time.

Still, then we've got Absolutist. This strategy says: defect on the column player if they have ever cooperated with somebody when I defected, or vice versa. This is a loyal strategy: it does not defect on other Absolutists of its own kind. That means that if you put two groups of Absolutists into a population, they will defect on each other; this actually showed up in the tournaments that were run. It is also unforgiving: just like Grim, once it starts defecting on somebody, it is never going to stop. It is not individually nice, and it is not communally nice, because if an all-cooperator cooperates with somebody who has defected on the Absolutist, the Absolutist will defect on that all-cooperator. The Absolutist says: you're basically tarred by the sins of whoever you associated with, and I'm not going to deal with you. And the scary thing is that this actually turns out to be a pretty tenable strategy if you look at the numbers. You start out with more all-cooperators than you have Absolutists, and very, very quickly you end up with a society full of just camps of Absolutists who are defecting on each other. This is essentially the failure case of internet arguments.
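Population dynamics like these can be imitated with a toy replicator loop: score every pair of strategies over a 200-round iterated prisoner's dilemma, then let each strategy's share grow in proportion to its average score against the current mix. This is my own sketch, with a simplified Bully (it defects on anyone who has never defected on it) standing in for the talk's Spiteful Bully:

```python
# Axelrod-style round-robin scoring plus an "ecological" replicator step.
# Standard 3/5/1/0 iterated prisoner's dilemma payoffs.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

STRATEGIES = {
    "ALLC":  lambda mine, theirs: "C",
    "TFT":   lambda mine, theirs: theirs[-1] if theirs else "C",
    "BULLY": lambda mine, theirs: "D" if "D" not in theirs else "C",
}

def match(s1, s2, rounds=200):
    """Total points s1 earns over `rounds` against s2."""
    h1, h2, total1 = [], [], 0
    for _ in range(rounds):
        a, b = STRATEGIES[s1](h1, h2), STRATEGIES[s2](h2, h1)
        total1 += PAYOFF[(a, b)][0]
        h1.append(a); h2.append(b)
    return total1

SCORE = {a: {b: match(a, b) for b in STRATEGIES} for a in STRATEGIES}

def generation(shares):
    # A strategy's share grows in proportion to its average score
    # against the current population mix.
    fit = {a: sum(shares[b] * SCORE[a][b] for b in shares) for a in shares}
    mean = sum(shares[a] * fit[a] for a in shares)
    return {a: shares[a] * fit[a] / mean for a in shares}

# The talk's mix: 500 cooperators, 500 tit-for-tats, 50 bullies.
shares = {"ALLC": 500 / 1050, "TFT": 500 / 1050, "BULLY": 50 / 1050}
for _ in range(50):
    shares = generation(shares)
# The unconditional cooperators are driven toward extinction even
# though they started as roughly 48% of the population.
print({k: round(v, 3) for k, v in shares.items()})
```

Even in this crude model, a 5% predatory minority is enough to wipe out the unconditional cooperators, matching the slide's point about how few defectors it takes.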

And again, it is pretty terrifying just how few Absolutists you have to have in a population to take over. We had 400 in the last slide, but here we've got 50 Spiteful Bullies, 100 tit-for-tats, and 500 all-cooperators; replace those tit-for-tats with Absolutists, and in just over five generations it's an all-Absolutist population. So that should give you a little pause.

All right, does anybody actually have any questions at this point? Because I think I've got about five minutes left, and I'd really love to talk about the probabilistic programming stuff, but I just don't think we have time. Yes?

[inaudible audience question]

Yeah, well, there are a couple of ways to model it. You can always have the Nature player, and that's somebody who plays against everybody and doesn't care about what payoffs happen to it. And that might not be the right model for, say, a prison warden, because a prison warden has vulnerabilities that can be taken advantage of in some way, although the ability of the prisoners to affect the warden is pretty limited, so it might be reasonable to essentially simplify the warden to Nature.

[partially inaudible audience question about security and not even getting the signal]

Yes. All right, [ __ ] it, I'll try to actually do the last section of this talk in five minutes: probabilistic programming. So, we're not frequentists. Absolute probabilities are great, but what happens if it's a one-time event? Estimates are nice, but you'd really like to have probabilities of various outcomes, so we're going to use Bayesian probability. What's Bayesian probability? It's how confident you are that something is going to happen. In Bayesian statistics we reason from relative probabilities. For a large enough number of samples, the Bayesian and frequentist interpretations typically converge, but a lot of the time you just don't have enough samples to choose from. Really big data problems can be solved by frequentist analysis, but for medium-sized data and really small data, this little equation here, Bayes' rule, works. And [ __ ] it, Wikipedia; we've got four minutes.

All right: probability distributions. Probability distribution functions assign probabilities to outcomes. There are ones for discrete values, if you need to assign over an enumeration, and there are also continuous ones. Discrete probability functions are probability mass functions; the Poisson distribution is one example, and that there is the expected value of the Poisson distribution. Then you've got continuous ones, where you're picking arbitrary-precision values within a range; we call that a probability density function, such as the exponential distribution, whose expected outcome is shown there. And you can also mix those. When you plot a distribution, the narrower the peak on your graph, the greater your certainty. So that begins to answer your question: when we have data and we try to fit a model to that data, we look at how confident we are in that fit.

So: you don't know what it is you're unaware of. Game theory is awesome when you know what the payoffs are, but if you don't know what the payoffs are, well, you kind of have to fill in the blanks from what information you do have. So what do you have? You usually have at least some vague idea of who the players are. Not always: in a network security scenario, you probably have far less information about who the players are than you would in a social scenario, where you at least know people's names. But you also have some idea of what you can do, as well as what other people might be capable of doing, and you can make educated guesses about that. You can also look at history, both of people's interactions with you and of people's interactions with others, and make inferences over that data.

All of these can be random variables. They can be deterministic, which is to say that they're controlled by some inputs, and if we know all of those inputs, then we can figure out exactly what the value is. And then there are stochastic ones, where there's still some randomness. So if you are going to simulate a game, first of all you've got to figure out what distribution you're going to use. You need to figure out what you're going to model: what does your data describe, what are you interested in, what does the thing you're interested in look like structurally, and what influences it? Knowing something about how your data is distributed will help you decide what probability distribution to use. Probability distributions have parameters, and you have to figure out which parameters you're going to use. You can actually assign a distribution to a parameter and just have the model update itself.
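As a concrete aside (my own illustration, not a slide): the two distributions named above can be written down directly, and their expected values checked numerically.

```python
import math
import random

def poisson_pmf(k, lam):
    """Probability mass function of the Poisson distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def exponential_pdf(x, lam):
    """Probability density function of the exponential distribution."""
    return lam * math.exp(-lam * x)

lam = 4.0
# The expected value of Poisson(lam) is lam itself...
poisson_mean = sum(k * poisson_pmf(k, lam) for k in range(100))
# ...and the expected value of Exponential(lam) is 1/lam, which we can
# check by averaging samples.
random.seed(0)
exp_mean = sum(random.expovariate(lam) for _ in range(100_000)) / 100_000
print(round(poisson_mean, 6))  # 4.0
print(round(exp_mean, 2))      # close to 1/lam = 0.25
```

The first is a discrete sum over an enumeration; the second is a continuous density, sampled here rather than integrated.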

And you can keep reassigning distributions to parameters as long as it's useful. But if you don't have any strong beliefs about a parameter, then it's probably not useful; in that case, you can just pick an average value and let inference update it for you. So, from the data, you know a few things about your priors; that helps you fix values in your stochastic random variables, and then you just sample the posterior distribution, which is to say the conditional distribution of some outcome given some parameter: the probability distribution of Y when X is known to be a particular value.
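Sampling a posterior that you can only evaluate up to a constant is exactly what Metropolis-style MCMC (which the talk turns to next) is for. Here is a minimal sketch of my own, targeting an unnormalized unit normal rather than any model from the talk:

```python
import math
import random

def f(x):
    """Unnormalized target density: proportional to a unit normal."""
    return math.exp(-x * x / 2)

def metropolis(steps=50_000, step_size=1.0, seed=1):
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(steps):
        proposal = x + rng.uniform(-step_size, step_size)
        # Accept if the new state is more probable (ratio > 1 always
        # passes); otherwise accept with probability f(proposal)/f(x).
        if f(proposal) / f(x) > rng.random():
            x = proposal
        samples.append(x)
    return samples

samples = metropolis()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 1), round(var, 1))  # near the target's mean 0, variance 1
```

Only ratios of f ever appear, so the normalizing constant of the posterior never needs to be computed; that is what makes this practical for model inference.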

So, Markov chain Monte Carlo. Monte Carlo simulation happened to also be discovered by John von Neumann. In normal Monte Carlo, you've got a bunch of independent and identically distributed variables; they don't influence each other, and you basically just sample all of them and average the samples for each one. In Markov chain Monte Carlo, variables condition each other: they can influence each other's values, and that essentially ends up defining the Markov chain. When you combine probabilities, you're basically reducing the effective volume of your search space, but the problem there is that you have to know which regions of your search space to search in; otherwise you're always going to be searching for a sample that isn't useful. MCMC helps you narrow the search to places where you're likely to find values that satisfy the data and the conditions.

So how does Markov chain Monte Carlo work? It's actually remarkably simple. You need a Markov chain, and you need a function that is at least proportional to the probability distribution that you care about. Here's how the algorithm works. You pick an initial state, just an assignment of all the variables. Then your Markov chain has a state transition matrix, and you just apply it to the state; this is literally just matrix multiplication, and you can do it on Wolfram Alpha, I totally did it this morning. You're literally just multiplying your state vector by the state transition matrix over and over again until it converges. The ratio f(s')/f(s) is basically an acceptance function: if that value exceeds one, then your new state is definitely more likely than your previous state, so definitely accept it; if not, then maybe accept it. Just keep doing this until it converges.

So: we have actually modeled a game that has no payoffs at all. The payoffs are completely hidden, and all we assume is that the players consider some actions to be cooperating and others to be defecting. An outcome is the set of actions that happened. We're going to assume here that players are more or less paranoid; that's what this trust value is, that range between zero and one. And so a strategy, starting with the type of the player (i.e., how paranoid they are), their previous action, and the other player's previous action, determines the outcome.

So here's an example of tit-for-tat. Confusingly, in this case, true means defect. The second line of that bottom function is saying (this is pattern matching in Haskell): in the case where I have some trust value, and it doesn't really matter what it is, and the person defected on me last time, no matter what I did, my response is going to be selected from the Bernoulli distribution with p = 0.9. And then, in the opposite case, where the person cooperated last time: well, if we're really paranoid and suspicious, then maybe we're not sure they cooperated, we're not really sure we believe the evidence of our senses, and so we condition that outcome on the paranoia value.

Actually, I'm going to bring both of these up. The play action basically models what the players do on the basis of what was played last round and what their strategies are. And then, in the iterated-game definition down there, a and b are our paranoia values, and all we know about them is that they're not conditioned on anything; they're just randomly chosen on a uniform distribution between zero and one. That's Nature's choice there. We say that a game is 10 rounds, each player is playing the tit-for-tat strategy, and their initial move is cooperate. And then we simulate this over this array of inputs: the left column is player A, the right column is player B; player A never defects, and player B defects sometimes. What's happening here is that we're passing the iterated game that we defined into the Markov chain Monte Carlo simulator and then plotting the expected values of A's and B's paranoia.

You'll note that this is not a particularly tight distribution. A did cooperate all of the time, so we're pretty sure that A is not that much of a whack job. And again, this is not a particularly tight distribution either; more data would certainly be useful, but we can see that there's definitely a cluster around the center, and it's far less likely that the person is completely trusting or completely paranoid. So now we can add some more strategies to the game, such as all-cooperate.
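The model just described is written in a Haskell probabilistic-programming style; here is a rough Python re-sketch of the noisy tit-for-tat idea. The names, and the exact way paranoia scales the defection probability, are my assumptions, not the original code; as in the talk, True means defect.

```python
import random

def noisy_tit_for_tat(paranoia, their_last, rng):
    if their_last:
        # They defected last round: retaliate with probability 0.9.
        return rng.random() < 0.9
    # They cooperated, but a paranoid player may not believe the evidence
    # of its senses and defect anyway (scaling by paranoia is my guess).
    return rng.random() < 0.9 * paranoia

def play_game(paranoia_a, paranoia_b, rounds=10, seed=0):
    rng = random.Random(seed)
    a, b, history = False, False, []  # both players start by cooperating
    for _ in range(rounds):
        a, b = (noisy_tit_for_tat(paranoia_a, b, rng),
                noisy_tit_for_tat(paranoia_b, a, rng))
        history.append((a, b))
    return history

# Two fully trusting players never defect at all...
calm = play_game(paranoia_a=0.0, paranoia_b=0.0)
# ...while a trusting player facing a very paranoid one gets dragged
# into rounds of retaliation.
tense = play_game(paranoia_a=0.05, paranoia_b=0.95)
print(sum(a or b for a, b in calm), sum(a or b for a, b in tense))
```

Feeding many such simulated histories into an MCMC sampler, as the talk does, is what lets you run this model backwards and infer the hidden paranoia values from observed play.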

Again, pattern matching: it doesn't matter how paranoid the player is, it doesn't matter what the other person did last round, it doesn't matter what they themselves did last round; they're most likely just going to cooperate all the time. This distribution can also be taken to count as a sort of fuzz factor for noise on the channel. It's really inexact and I don't like it, but we're building out more useful models, like the reality. This is the reality: people who intend to cooperate all the time screw up and accidentally hurt

people. That's just how it happens. He's right; I mean, yes, even people who intend to cooperate all the time do in fact screw up and hurt people. And similarly, even people who defect all the time sometimes end up screwing somebody out of 6,800 pounds. And here's our buddy Grim Trigger again. In this case we've got a more complicated set of pattern guards. In the case where the other player defected last round but I did not, then I'm almost certainly going to defect on them; if they did not defect and I did not defect, then I almost certainly will not. And so this guard actually

captures all the past part, because in Grim Trigger, if you've defected on somebody before, then you're going to defect on them forever. As long as we defected on them at some point in the past, then we know that they're just a terrible person and we can't trust them, ever. We can also consider strategy to be a random variable, so we make it a data type, a choice among our strategies, and we make each category equally likely. Then we can do just exactly

the same thing. We select the paranoia value as an unconditioned uniform random variable, and we also select a random strategy. What we're trying to infer this time is the strategies that the players are using. Again, we just pass this new game definition, and the data that we had before, and try to infer what A's strategy and B's strategy are, based on the samples that we obtained. And it's pretty obvious; I mean, you could kind of tell from the data that A cooperates all the

time, whereas B is probably playing Tit for Tat. There's a nonzero possibility that they're playing All Defect and are just really bad at it, but given the probability densities, that's not anywhere near as likely as their actually playing Tit for Tat. All right. So, now that I have run way the hell over time: things I would like to do as time goes on. cpfr and I are already building out a societal iterated prisoner's dilemma using probabilistic programming; I was really hoping to have that today, and I'm sorry that I don't. Also, a societal iterated prisoner's dilemma with signaling, because that would be cool. I mean, people

actually talk to each other before they make decisions; in fact, that's how information gets around, people talking. In the real world we don't have perfect information, we don't have this complete history of everything that's ever happened in the world, and so we can restrict our sampling to model the imperfectness of information in the real world. Similarly, it would also be nice to be able to look at noise on the channel, and at whether the channel is one that you can even make decisions about coordination on, I guess. Yeah,

I'm bad at coordination too, and I guess we're done. Thank you, everybody. [Applause]
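The strategy-inference step the talk walks through (strategy as a uniformly random variable, then inferring which strategy explains the observed moves) can be sketched as a toy rejection sampler in Python. This is a crude stand-in for the talk's probabilistic-programming inference; the strategy names, slip probabilities, and helper names are assumptions, and True means defect:

```python
import random
from collections import Counter

def infer_strategy(observed, n_samples=3000):
    """Toy rejection sampling: which strategy best explains one player's moves?

    `observed` is a list of (my_move, their_previous_move) pairs with
    True = defect. Each sample picks a strategy uniformly at random
    (strategy as a random variable), simulates the player's moves, and
    is accepted only if it reproduces the observations exactly. The
    accepted counts approximate a posterior over strategies.
    """
    moves = tuple(m for m, _ in observed)
    opp_prev = [p for _, p in observed]

    def simulate(strategy):
        paranoia = random.random()  # Nature's choice, uniform on [0, 1]
        defect_prob = {
            "all_cooperate": lambda prev: 0.05,  # slips up occasionally
            "all_defect": lambda prev: 0.95,     # cooperates occasionally
            "tit_for_tat": lambda prev: 0.9 if prev else paranoia,
        }[strategy]
        return tuple(random.random() < defect_prob(prev) for prev in opp_prev)

    accepted = Counter()
    for _ in range(n_samples):
        strategy = random.choice(["tit_for_tat", "all_cooperate", "all_defect"])
        if simulate(strategy) == moves:
            accepted[strategy] += 1
    return accepted
```

A player who defects exactly when the opponent defected last round comes out looking like Tit for Tat; an always-cooperator comes out looking like All Cooperate, with a small residual weight on the other strategies being played badly, just as in the talk's posterior plots.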