>>Thank you all for coming to today’s distinguished lecture. It’s a week of attrition, since AAAI is happening at the same time and WSDM is happening at the same time. Eric Horvitz said “Triple A.” Susan Dumais said “Wizdom,” Harry Shum said “Wizdom.” So yeah. It’s awesome that [inaudible] in Building 99 has come on down. Are there any announcements before we get started? Okay. So, it’s a privilege to host Eva Tardos for

the distinguished lecture today. Eva is a professor at Cornell, and she was chair in 2010, I remember for sure. And she’s won so many awards; I noticed the bio is not yet updated. She’s won things like the Fulkerson Prize and the Gödel Prize, but the EATCS Award is not yet up on the website, or it’s not showing up on the bio. She won the EATCS Award last year. And she’s done foundational

work in algorithms, most famous for network flow. I know the first computer science textbook I picked up was Kleinberg and Tardos. That was the first time I actually saw algorithms named after people and things like that. You know, like [inaudible], there was the Lovász number, and there was the Tardos function. And I was like, “No, I’m going to go to AI instead.” And also the first book I picked up in grad school was

Algorithmic Game Theory. So she’s played a foundational

role in that field. Yeah. Her contributions are too numerous to list, but one thing I did realize in reading through her bio is that it’s definitely incomplete, because it does not make any mention of her unwavering efforts at diversity and inclusion and STEM outreach

and things like that. I think the first email I got from you was the Expanding Your Horizons email, encouraging grad students to go and teach in middle schools and things like that. So yeah, it’s an honor to have Eva come here to talk today. I’ll hand it over to her.>>Thank you very

much. Thank you, Hadid. Thank you all for coming, and I hope you’ll find the discussion interesting. I guess one comment on Hadid’s summary: I hope you all do work on improving diversity and outreach. I think there’s a lot we all collectively can do. I do not keep these kinds of things on my CV because I think of my CV as a record of my professional, my science contributions. It’s not because I don’t think these are important. I think they’re very important, and I hope you all do this, in particular, I mean, especially the people who are not minorities and women. Thank you all. This is an effort we should all participate in, especially the non-minorities. And I hope you guys care. Anyway, but I will limit my discussion to

the particular topic. So the thing I want to talk about today is how I think about learning in the context of game theory. And it really comes in two parts. Are people learning in a setting where there are games? And if I can assume that they are learning, what kind of things can I say about what happens in games, if people were nice enough to actually play in this relatively wise manner, where I have to be careful about what I really mean by whether it’s wise to learn or not? And that’s what I’m going to really start with. So the efficiency part is: what can I say about the outcome of a game, or the efficiency of outcomes? And then the first part is learning in games. So what do I mean by games? Unfortunately, I thought

I had a forwarder, but apparently I do not. What I mean is a game of some sort with some players, which, depending on the application, can be either people driving cars or routers routing traffic. Someone, a human or an algorithm, is making decisions one at a time, trying to repeatedly optimize some objective function. To make it more mathematical, I will assume that there are a couple of things that

are written on the slide. The example here is a congestion game. So the assumption is that cars or [inaudible] try to follow a shortest path; they try to figure out how to get to the destination fastest. Of course this is a game: congestion. Your speed depends on who else is going on the same path, on the current traffic. But there is an implicit assumption in here, which is that you’re not trying to hurt other people; your goal isn’t that other people don’t get to their destination. You just want to get to your destination. And then I guess

one part that will certainly come into our results, or that I think is important to think about in the context of games, I think I’m giving up on this, is that the traffic isn’t necessarily as stable as you could hope. Depending on whether the Super Bowl ended, or some news item landed on the New York Times website, or whatever happened, traffic does change. And so even if you route traffic optimally, you’re going to have to learn, or adjust to a changing environment, and that’s going to be an important feature as I get later into the talk. I will alternate between using traffic routing

as my example, or actually using auctions as my example, where by auctions I mostly mean advertisement auctions. I want a setup where learning will be relevant: we are repeating a game, such as the advertisement auction or traffic routing, where the individual outcomes in the game are not very important. In ad auctions, the particular value of one ad is a tiny fraction of a cent. In traffic routing with cars that might not apply, but certainly in packet traffic, if one packet goes bad, the internet drops it and tries again; not that big a deal. If a whole chunk of packets goes bad, that’s kind of a problem.

It’s a big problem. Similarly, in ad auctions, if your whole campaign tanks, that’s a problem; if one ad didn’t work out, oh well, one ad doesn’t matter. So that’s the kind of setup where learning will make a lot of sense. And here again, the advertisers, while not very fast, can leave and join the system, and there is an ecosystem where things can change as we go. So, to make it slightly mathematical, the game will be that there are actions that people can take: put up ads, choose a route to drive. And then something happens, and you repeat this action over time. And the assumption is that players somehow take the past data and figure out from the past data what the hell to do next. So that’s the model of learning. I assume that the goal is that they try to

on average do well. Again, not very sensitive to outliers, but most of the time they should do well. And that’s because of the small value or cost in each period, so exploration is a good way to figure out what the hell to do. And again, the idea would be that in the beginning they don’t know what’s up, and then later on they know. So I guess that’s

the main topic: repeated games, and what the hell happens in repeated games. And I want to start with something which, given that this is the AI colloquium, maybe I shouldn’t spend too much time on: why do we want to do learning, and what might be the goal of learning? I want to start with the observation that learning and games was a topic combination that we weren’t the first to start. Economists started it before AI was really

such a dominant field. So, they were asking this question, and not the question that I might want to ask. They asked the question of why learning is cool: because players might learn to find the Nash equilibrium; learning will just figure out what the equilibrium is. They weren’t actually thinking of the kind of game I’m thinking of, which we naturally have to play with small stakes; instead they were thinking of what they called pre-play. So, they were

thinking of a giant, enormous auction going on; think of the FCC spectrum auction that just happened. But before it happened, for the previous couple of years, there were all kinds of training exercises going on, to train the participants how to bid in this game. So, they were thinking: let’s think of this training as learning. If you run the training before the real auction, then the players might, via the learning, figure out what the Nash equilibrium is. And then, when the real auction comes, they can play the Nash equilibrium. That was the goal. So, they were thinking of learning just the way

we’re thinking of learning, but they were hoping that learning would find the Nash equilibrium, and that’s basically what they hoped. And just to realize how early they started: Robinson in ’51 thought about fictitious play, which means what I said there: you best respond to the past history. You watch what happened in the past, you think of the past history as a sample distribution of what other people are doing, you assume they’re going to keep sampling from that very same distribution, and you just best respond to that. This is a sort of Bayesian-style learning assumption: the previous behavior is a Bayesian sample from something, and you react to that Bayesian sample. And that’s what Robinson was studying, and, by the way, just because of Odette’s comment: she proved this, and Robinson was a woman. The hope was that this pre-play learning leads to a Nash, and what she was studying is when it does lead to a Nash. And what do I mean

by lead to a Nash? I hope that maybe it doesn’t need a definition either, but just in case, oops, what did I do? What it means is that they will find a solution that is stable, stable in the following sense: stable as a one-shot game. There is a slight awkwardness in alternating between routing and auctions as my two examples: one is an example where we are minimizing the delays,

some sort of cost; the other one is an example where everyone is maximizing their utility. When we want to minimize delays, a Nash solution a is a vector with a_i being the component for player i. Then this is what it means to be Nash: for every player i, if they choose to do something else, some x instead of their own component a_i, that would have a higher cost, that is, cost_i(a) ≤ cost_i(x, a_{-i}). They don’t want to do this other thing. That’s what it means to be Nash. And again, the notation here is that a_{-i} is the (n-1)-dimensional vector with the i-th component missing, which got replaced by x. And that’s what it

means to be Nash. And what they hoped is that somehow learning will either achieve this one-shot-game Nash equilibrium, so that would be an equilibrium where they repeatedly play the same thing over and over again, or at least converge to it in some sort of metric distance. And I guess we would call this a no-regret assumption. And the question

was: does fictitious play, that is, best responding to the past history, lead to Nash equilibrium? And the unfortunate answer is mostly not, but Robinson proved that sometimes it does. In ’51 it was proved that if it’s a two-person game with two strategies each, then it does. Kind of nice: a two-by-two game, matching pennies, which is, on top of it, zero-sum. It doesn’t have to be zero-sum: any two-person game, or a generic two-person game, with algebraically independent numbers or something, then it does. And then in ’61, ten years later, rock-paper-scissors, that is, any two-person zero-sum game. And it’s kind of over here; remember, that

line is mostly not, but sometimes it does. There have been other attempts in this line of work, which I will not survey, going on even today, where still the goal is to find the Nash equilibrium, and so they come up with weirder and weirder definitions of learning, where really your goal is to find a Nash equilibrium and not to model the natural behavior. That’s not what I want to do. I thought fictitious play is a pretty reasonable model of

what people might be doing; the more modern things are not. And I disagree with the goal. I also disagree with the goal of finding Nash equilibrium. I don’t know why we would want or expect players to find the Nash equilibrium, and there are a lot of troubles with it. One is that there are multiple Nashes, so why would I expect them to find one of them? If it’s a multiplayer game of the kind we think about on the internet, with many, many players, it’s an incredible amount of knowledge even to know who else is playing, let alone what the hell they’re thinking. And then there are computational difficulties: if something is algorithmically hard to find, the sort of distributed learning algorithm we consider is not going to succeed in finding it either. But further, if you

look at behavior, we know it’s not happening. This is from Microsoft data, and because I don’t work for Microsoft, there are some secrets on the screen of the kind that I don’t know the answer to: the bids are normalized and I don’t know the keyword. But this is advertisers’ bidding over seven days on a keyword on which some six or seven people are advertising, and these are days,

seven consecutive days. And this is what their bids look like. These are advertisers who frequently change their bid. There’s ample evidence that they must be using some sort of online tools, because they change their bid too often for human intervention. Also, these are people who have insanely high budgets, which they practically never hit. So they’re really using their budget as some sort of insurance, and the thing they play is that number that you

see on the screen. To me those numbers, especially the red one, or this guy here, or that guy here, totally look like a gradient-descent-style algorithm. Every now and then they adjust a little bit, like gradient descent would. They’re learning; they’re trying to figure out what to do. So, I don’t want to model these people as playing

a Nash equilibrium, for many, many reasons: because learning doesn’t find the Nash, and because they don’t appear to be finding a Nash; they’re not that stable. Instead, what I want to think of is a repeated play where they repeatedly experiment with the game. Now, if you’re really more econ-backgrounded, then you might tell me: yes, yes, yes, of course they are not stable, because this

is a repeated game. They’re doing something much more complicated than finding a Nash equilibrium of the repeated game, which is a very, very complicated object. But I think the pictures are not suggesting that they’re doing that. That would be sort of threats of the form that, if Hadid and I are advertising for the same keyword, I tell Hadid: “Look, on even days it’s my turn, you bid zero. On odd days it’s your turn,

we’re both paying low. And that would be kind of cool. But there’s no evidence in the data that they’re doing this; those strategies are very, very complicated, and I doubt they’re doing it. Alternately, I could claim that they’re repeatedly playing a one-shot game, in which case you would claim that on this day that red player had a high value, that’s why he was bidding high, and on this day his value changed to low. That doesn’t seem very likely. So none of these seem

very viable assumptions. Instead, I want to assume that they’re learning, doing something like no-regret learning or fictitious play. And there are a couple of slides here, which I’m going to flip through really fast given that this is the AI audience; maybe this is something that all of you know very well. Instead of playing fictitious play, which is a deterministic strategy, that is, you deterministically best respond to what happened, something that works much better, at least in analysis, and maybe more likely something similar to what they’re doing, is to not quite best respond deterministically, but do something similar to it. For example, they

randomize between strategies with relatively similar payoffs. So fictitious play would tell you to choose the best strategy; what’s often called smooth fictitious play puts in a normalization factor and somehow tries to weight similar strategies, increasing the entropy. This is also the same as what is otherwise known as multiplicative weights, and it has this very nice property, what’s called the no-regret or small-regret property. Remember, and here is the awkwardness: we switch

to utilities right now. A Nash equilibrium would require deleting this thing from the slide: that the utility of what they played is at least that of the single best thing in hindsight. This thing here says that’s not true individually at every single time step. So for a Nash equilibrium, you would delete the sums over there. It’s not true at every single time step, but if I add it up over time, then it’s almost true, almost true meaning there’s a small regret error. So let’s try to analyze

this a little bit better. The comparison is to something fixed: I can change my strategies as often as I play, and every player is allowed to do this, but compared to the best fixed strategy in hindsight, your behavior should be reasonably good; that is, the error by which the fixed strategy is better should be kind of small. And in particular, that rule and many other rules get regret error on the order of root T. And maybe, hopefully, many of you know about

this sort of no-regret learning. I guess if you don’t, let me say one more sentence to try to put it in context. What this means is: if there is a really good strategy that’s consistently good, I’m asking you to either find it sooner or later, not too fast, not too slow, that is, after not so much time, or to do something better. That is, if I’m driving to work, and I don’t know the highway numbers here, so let’s go with California-area numbers, maybe you guys will know those: if it turns out that driving on Route 101 is ideal, I don’t have to drive on 101; I can drive on 280 and alternate between things. But if I’m consistently worse than 101, and 101 is so simple that I should just do it, then I should discover it. This is what it’s asking, and it’s giving you a little error because it might take you a while to discover this fact, which is of course natural because you’re learning from past data only. So, that’s what

no-regret learning is. And I guess a couple of things about no-regret learning, again putting it in the context of where I’m heading. The first thing I want to tell you is maybe one slide of game theory on why this might be a nice thing, but more importantly: are people doing this? Does the data support it? Is this a reasonable assumption? Would I recommend people to do no-regret learning? So I’m asking: is no-regret learning a behavioral assumption that I want to assume? Do I want to recommend it to people? How good is this? And then, I want to also tell

you a little bit about what I can say about games if I can make the assumption that people are doing no-regret learning. So, actually, maybe I should cheat and flip through this and come back to it later. So, let’s start with: is no-regret learning an assumption I want to make about the data, and do I like it? Again, on the utility side, I want to abstract away from the exact error term, which was root T; it’s not so important whether it’s root T or something else. If I play for a time horizon of capital T, I want the error to be kind of small compared to T.
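As an aside, one standard no-regret rule, multiplicative weights (Hedge), makes this root-T error concrete. This is a minimal sketch, not anything from the talk: the learning rate and the toy two-action loss sequence below are illustrative assumptions.

```python
import math

def hedge(n_actions, T, loss_fn, eta):
    """Multiplicative weights (Hedge): keep a weight per action, play the
    weighted mixture, and exponentially down-weight actions that incur loss."""
    w = [1.0] * n_actions
    alg_loss = 0.0           # expected loss of the randomized play
    cum = [0.0] * n_actions  # cumulative loss of each fixed action
    for t in range(T):
        s = sum(w)
        probs = [wi / s for wi in w]
        losses = [loss_fn(t, a) for a in range(n_actions)]  # losses in [0, 1]
        alg_loss += sum(p * l for p, l in zip(probs, losses))
        for a in range(n_actions):
            cum[a] += losses[a]
            w[a] *= math.exp(-eta * losses[a])
    return alg_loss - min(cum)  # regret vs. the best fixed action in hindsight

T = 2000
eta = math.sqrt(2 * math.log(2) / T)  # standard tuning for 2 actions
# toy adversary: action 0 always costs 0.1, action 1 always costs 0.9
regret = hedge(2, T, lambda t, a: 0.1 if a == 0 else 0.9, eta)
# regret stays on the order of sqrt(T), so regret / T goes to zero
```

The point of the sketch is exactly the property in the talk: the total regret grows slower than T, so the per-round error vanishes over time.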

So the way I wrote it here is a more generous epsilon times T: a small fraction should come from the error. As time goes, the total utility you’re gaining is growing with time, and the regret, while probably also growing with time, should grow a little bit slower, at least by an epsilon fraction. Now, what are the good things

about this assumption? First, you can do no-regret learning: there are many algorithms; multiplicative weights is one of them, and there are a million of them. And if you don’t know enough of them, go ask Seb or many of the other people in the audience. There are a million great algorithms, very simple ones. And the behavior, the sort of “if there is a good strategy, please notice it,” seems so natural that it’s even believable that

humans can do this, not just algorithms; it’s very natural. And one thing that maybe you’ll see at the end is that it’s a very handy behavioral model. It’s such a nice equation, or inequality, that I can use it; it’s a nice assumption that I can build a whole theory on. The same way, Nash equilibrium was a beautiful assumption, and economists built a beautiful theory on Nash equilibrium as a behavioral assumption, which nowadays they are sometimes questioning whether it is reasonable, but there is a beautiful theory built on it because it is such a nice equation. The same applies to this. But I also should

confront it with data, and the data set is the advertisement data set that I already showed you one picture from. It’s coming from a paper of Denis Nekipelov and Vasilis Syrgkanis, and maybe you’ve seen some of this if Vasilis talked about it, but let me remind you of maybe the most dominant feature of the data set. So, this is again

advertisers that I can think of as optimizing something; they are changing their bids arbitrarily frequently. I can also think of them as single-parameter advertisers: the budget is so sky-high that it’s an insurance, not a strategic parameter; it’s the bid that they’re changing all the time. And I look at the regret: how much error, how much regret they have. Now, this is a plot that you should definitely think about. I guess the blue bars are probably easier to see. So: the regret relative to what they gained.
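As a sketch of that measure (with made-up numbers, since the real bid data isn’t available here), regret relative to gain compares the best single fixed bid in hindsight against what the bidder actually earned:

```python
def normalized_regret(actual_utils, fixed_bid_utils):
    """Regret of a bid sequence, normalized by what was actually gained.
    actual_utils[t]: utility the bidder actually earned in round t.
    fixed_bid_utils[b][t]: utility a fixed bid b would have earned in round t."""
    gained = sum(actual_utils)
    best_fixed = max(sum(utils) for utils in fixed_bid_utils)
    return (best_fixed - gained) / gained

# made-up three-round example: the bidder earned 1 each round (3 total),
# while the best fixed bid would have earned 4 in total
ratio = normalized_regret([1, 1, 1], [[1, 1, 1], [2, 0, 2]])
```

Note that the measure can come out negative: a bidder who varies the bid over time can beat every single fixed bid, which is exactly the negative-regret possibility discussed below.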

Did that make sense? If you gained $100, a one-dollar loss of regret is different than if you only gained $1 and your regret is a $1 loss, so normalizing to your gain makes sense to me as a measure. If I look at someone out here, that poor guy is doing horribly: this is $1 gain, $1 regret. Okay, $1 gain, 77 cents

regret: shitty. If I look at someone down here, the guy has practically no regret. These people literally have no regret, and maybe a better way to put it, the plot is maybe broken: these guys could actually have negative regret, that is, they could be doing better than no-regret, which is possible because they do change the way they bid all the time, and the regret measure is against a single bid in hindsight. So these guys might actually

have negative regret here. Thirty percent of the bidders have non-positive regret. So, these guys are doing very well. I’m reasonably happy and can say that these guys here, maybe, I don’t know how far I should go, maybe till here, 40, 50 percent, are doing really well. These guys are not. Now, you have to

be a little careful here, or I should be a little careful: I don’t actually know what these guys’ values are; I inferred the values for this plot. So what I really know is that these guys are doing badly, and that’s a fact: no matter what the value is, the regret is close to 50 percent of the utility. So these guys are all doing very badly. These guys here might be doing well, reasonably well. What does this tell me? It certainly

doesn’t tell me that little or no regret is a good assumption. There were only 30 percent of the people down here, and 70 percent have some regret. And if I think about it, it makes me decide that I like this alternate version, which I put here on the slide, and on which we have a paper from a year ago at NIPS, working with this alternate assumption, which we called approximate no-regret. Instead of comparing utility additively, I said, “Well, allow a little bit of error. You can’t be that sensitive. If you’re within 10 percent, come on, that doesn’t

a little bit of error.” You can be that sensitive. If you’re within 10 percent, come on that doesn’t

matter anymore.” So one way to think about

this comparison here. If I change my classically

regret is an additive measure, I just compare

the difference between my utility and the optimum

utility that’s the top line. And as I pointed out, the classical measure

gives you a routine regret after time T. If

I’m willing to allow a little bit of multiplicative error, that is, I’m willing to take a parameter epsilon and say, “You know, within 10 percent I don’t care so much; please make the additive error as small as you can, but I give you 10 percent multiplicative error in addition,” then you can significantly push down the additive error; in particular, I can push it down to a constant. So this has all kinds

of advantages; I’ll come back to this in a second. But looking at the data, I think this is actually probably a better model, that multiplicative notion of regret, giving you a little bit of slack. The data would suggest over 50 percent of the participants have less than 10 percent regret error, whereas if I have to go with literally zero, there was practically no one. Yeah?>>Can’t you say that here people are

reaching something like a correlated equilibrium, and this definition seems almost the same as that, right?>>Yes, I’ll come back to this; give me one second before coming back to the games and the connection to correlated equilibrium. I want to spend one more slide on negatives on this issue, on whether this is a good assumption or not, but I’ll come back to your point in one second. Yes, the answer is yes, and I’ll come back to you in one second. This is an assumption; I can offer you one more thing

for the assumption, and then I’m going to offer you the main argument against it. This is reacting to what Hadid asked me, trying to talk about the general picture here. This is an assumption, and here are additional slides. I like it too, but to prompt discussion I should be more honest that there is a downside. It’s not clear how good

this assumption is. Yeah?>>Going back to that chart there, you showed the square-root-T detail; because you normalized it, it’s hard to say whether the regret is going to zero there, right? If you allow some kind of square-root-T error, what would this distribution look like?>>An even harder question, actually, one that I don’t know how to ask on this dataset: what is T? What is the moment I

am looking at this? And I guess one thing we’re looking at, for which I can’t show data yet, but we now have some charts, some datasets showing what happened: I want to distinguish between this 50% versus that 50%, the 50% that I know for sure has high regret according to what I’m looking at. What are they doing over the next seven days? And my going-in assumption, which I think the data shows is true, is that the people who have high regret more drastically change their bid over the next seven days than people who I think are doing okay. That is, they too

think they’re having a problem, and so they’re varying their bids very widely. And part of it is, maybe something changed and they are just waking up to it: no-regret is a good assumption if you think you’re playing against a steady distribution of some sort, and if the best response changes, you should adjust. Maybe for these guys here something did change, and they are slowly waking up to this, and effectively their T is smaller than these other guys’ T. But I don’t actually know, and I think the new data we’re currently collecting will tell us more precisely; we’ll get back to dealing with it after the EC deadline and hopefully give a better answer to this question. There are a couple of other datasets, other than my own, that I can rely on. There is Erev and Roth

datasets other than my own that I can rely on. There is Er’ev and Roth

in 86 in a lab experiment with two person

coordination game that I think actually it’s

pretty good support for this. I originally started

to think about some of the newer results

I’m going to tell you about. Listening to a talk at EC’14, about a lab experiment

was trying to tell us that people are actually not, no regret learners and in fact that’s what

the people does. It’s a buyer and seller game. Again it’s a human

subject experiment, the buyer-and-seller game, where only the sellers are human; the buyer is automated. The game is that the buyer has a uniformly random value between zero and 10. You’re the seller, and you’re supposed to make the buyer an offer; if you make an offer below his value, he’ll accept, and otherwise he doesn’t. And what offer should you make? And if you do

a little analysis, you can figure out what offer you should make. That’s not hard, but most people don’t do this, and especially since they weren’t told that it is a uniform distribution, they would have to observe that it was the uniform distribution. They instead make offers and see what happens, and a better model that fits their behavior is an incredibly recency-biased version. That is, they don’t best respond to past history; they best respond to what the hell happened in the last run, or the last couple of runs.>>Do they know

how many buyers are going to show up, or is it indefinitely many?>>I don’t actually know.
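A quick aside on the “little analysis” just mentioned: with the buyer’s value uniform on [0, 10] as stated, the seller’s expected revenue from an offer p is p · (10 − p) / 10, which peaks at p = 5. This is a small illustrative sketch of that calculation, not anything from the experiment itself.

```python
# Seller's expected revenue when the buyer's value is uniform on [0, 10]
# and the buyer accepts any offer below his value:
# Pr[accept at offer p] = (10 - p) / 10, so revenue(p) = p * (10 - p) / 10.
def expected_revenue(p):
    return p * (10 - p) / 10

# search a fine grid of offers; the maximum is at p = 5, earning 2.5
best_offer = max((k / 100 for k in range(0, 1001)), key=expected_revenue)
```

So the rational benchmark the subjects are being compared against is simply: offer 5.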

I should look at the paper.>>They didn’t know they were playing against the computer?>>I don’t think they knew that. No, they didn’t know they were playing against the computer, but what they showed was a heavily recency-biased behavior. The behavior showed

incredibly strong recency bias, and I’ll come back to that point in a minute. But I think recency bias is not crazy in a changing environment: recent experience is more relevant than past experience, which I think is fine. And then there is a recent [inaudible] and Nisan experiment which played with, in particular, the ad auction as a human-subject experiment, and mostly they recovered the values. They noticed that buyers or sellers to whom they gave tiny, tiny values in the experiment, so small that they really should have dropped out and should lose all the time, behaved irrationally; they didn’t play no-regret. But me, I would explain this by saying that these poor guys in a lab experiment were forced to sit there for however long the experiment was, and it’s frustrating to keep losing. They were paid to sit there, but still it’s frustrating

to keep losing. This doesn’t quite happen in real life, because people whose value is too low for this can just drop out and advertise elsewhere, so I think that’s less relevant for us. But to show you that it’s not as good as I’m making it out to be, let me actually show you what happens in a very, very simple two-person game, which is almost rock-paper-scissors. This is the standard notation

of rock-paper-scissors. This square here tells you that if scissors plays against rock, then scissors loses one dollar and rock gains a dollar. The game I really want, because remember I already told you that for zero-sum games things work out okay, is not a zero-sum game: it’s rock-paper-scissors, except we’re each going to pay Hadid nine dollars if we actually match. As it turns out, this game has the very same Nash equilibrium: 1/3, 1/3, 1/3. That is, 1/3 of the time we pay 18 dollars to Hadid. It’s a good game for Hadid, not for [inaudible]. The question I want to

ask is: what happens if I do learning in this game? And I’m even willing to get you started off from the 1/3, 1/3, 1/3 equilibrium. And I want to actually depict this in the 2D plane, because your strategy here lives in the 2D plane, since the probabilities sum to one. This is probability one on rock, probability one on paper, probability one on scissors, and everything in between; that point here is the 1/3, 1/3, 1/3. And let’s assume you start off at Nash and I start off at Nash, but I’m very bad at randomizing, as are all the rest of us, and as it turns out, I a little bit too

heavily play paper. And say I'm playing against Nikhil. Nikhil's learning algorithm will discover that I'm very heavily playing paper, and as a result his learning algorithm will very heavily play the thing that beats paper, which is called scissors. However, then my learning algorithm will discover that Nikhil's learning algorithm always plays scissors; to beat scissors, I should heavily play rock. I play rock, and what we are going to do is actually

cycle around like this, and the dynamic in this 2D plane comes out to this cycling behavior on the boundary of the simplex, which is cool. And actually this is better than playing the Nash, better for the two of us, because we're not paying the nine dollars on ties the way Nash would. We keep avoiding the diagonal in an interesting way, but in particular we're not playing the Nash, as you guys see; what we're playing instead is a correlated equilibrium. The play does not converge. What we're doing is correlating on our shared history. What we're doing is something of this form: we both learn, so we both satisfy this inequality, that our cost is within a little bit of learning error of any fixed alternative. But we are correlated, we're not doing it independently, hence it's not a Nash.
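The inequality just described — your realized payoff is within a small learning error of the best fixed action in hindsight — is exactly what no-regret algorithms guarantee. A minimal sketch of one standard such algorithm, Hedge (multiplicative weights), under full-information feedback; the step size and the random payoff stream are illustrative assumptions, not from the talk:

```python
import numpy as np

def hedge(payoff_rows, eta=0.1):
    """Hedge / multiplicative weights over a stream of payoff vectors.

    payoff_rows[t][a] = payoff action a would have earned at round t
    (full-information feedback). Returns the mixed strategy played
    each round; actions that did well get exponentially more weight.
    """
    weights = np.ones(len(payoff_rows[0]))
    played = []
    for u in payoff_rows:
        played.append(weights / weights.sum())
        weights = weights * np.exp(eta * np.asarray(u))
    return np.array(played)

# Illustrative run: random payoffs in [0, 1] for 3 actions over 2000 rounds.
rng = np.random.default_rng(0)
payoffs = rng.random((2000, 3))
plays = hedge(payoffs)
earned = float((plays * payoffs).sum())        # expected payoff of the algorithm
best_fixed = float(payoffs.sum(axis=0).max())  # best single action in hindsight
regret = best_fixed - earned
# Hedge's regret grows like sqrt(T), so per-round regret is small.
assert regret / len(payoffs) < 0.1
```

Note that the strategy it outputs is a probability distribution each round; as discussed next, actually achieving the guarantee requires randomizing according to it.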

Okay, last slides on the model of learning, and remember, the slide title is negative things about learning. So this is kind of a nice property; here is the reason they're not doing it. In order to get no-regret, I have to randomize. If I were deterministic, Nikhil, knowing my deterministic algorithm, would beat me every single time. He would know what I do and would play the thing that beats it. I have to randomize. And if you look at those plots, I said they look like gradient descent, which they kind of do, but they surely don't look like they're randomizing. They're not wiggling like crazy, so people are not randomizing. So that's a problem. Would I actually recommend people this algorithm

to do this? So this was the question: “Are they running no-regret algorithms?” And actually no, they're not running no-regret algorithms, because for an algorithm to be no-regret, you have to randomize, and they're clearly not randomizing. Second question: “Would I tell them, hey guys, you should run a no-regret algorithm?” And honestly, probably I should not, for many reasons. One reason is: what do you mean? No-regret is only against a fixed strategy; you might do better, and maybe you should try to do better. No-regret is not that bad, it sort of adapts to other people's strategies, but maybe I should do better, and I actually included the rock-paper-scissors example exactly to

show this to you. If, instead of automatically reacting to what Nikhil is doing right now, I know he's running a no-regret algorithm, I can predict what he's going to do tomorrow, and I should react to that instead. That's a better move than just reacting to his past. I know his algorithm, so I should react to his algorithm, to his next step, which I kind of know.>>I should do the same.>>Yes, well, we'd get caught if we both keep doing this. So I'm not so sure I would react to it, but I want to make

a different point, and that's me coming back to being positive about no-regret learning. I did not make the assumption that they are running my no-regret algorithm, which they clearly are not. I didn't recommend they run the no-regret algorithm. I was asking, can I model them as achieving no-regret, and it seems like I can. They can do better than just getting no-regret, and I would recommend they do better than no-regret. But they do seem to achieve a no-regret-like condition without necessarily running the algorithm. So what I'm going to do in modeling this is not to recommend they run this algorithm, but instead to model behavior with this condition, which is not the same assumption, and I think a much better assumption than a recommendation. Okay, so I'm going to

give you one other part. I don't know how many minutes I have for this other part, but I'll use a few. One benefit of this, which I already mentioned in one English sentence but want to give you a little context for, is that this no-regret assumption is such a clean, beautiful mathematical assumption. It's so usable, and so many things can be derived from it. And I would say, as someone trained as a mathematician, that having a clean mathematical model is definitely good for thinking about what follows from the assumption. This is where I want to

go back to your point, and actually I don't know if I can go back fast enough. There is a slide that I skipped in the middle, exactly this one. There I wanted to make exactly the point that you asked me about. If I take the limit, the relative regret goes down to zero, because the regret grows like the square root of T, more slowly than the time T, so in the limit there is absolutely no regret. The assumption here was about the utility of what they play: if I take the history of play as an empirical distribution, then that distribution will have the Nash-like no-regret property. It will have the property that the expected utility, in expectation over the history, of player i is at least as good as any fixed strategy with hindsight. This is called a correlated equilibrium. It's like a Nash equilibrium, the same inequality, except that the players are correlated. And the way they got correlated is that they both based their behavior on the same history, as we saw in the example.
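This no-regret property of the empirical history can be checked mechanically. Below is a small sketch: the `history` is the idealized cycling play from the rock-paper-scissors example earlier in the talk (the two players cycle around the boundary, never tying), and the check confirms that neither player regrets any fixed action even though play never converges. The encoding is mine, and strictly speaking the no-regret-versus-a-fixed-action condition defines a coarse correlated equilibrium:

```python
# Row player's payoff in the modified rock-paper-scissors (a tie costs 9).
# Actions: 0 = rock, 1 = paper, 2 = scissors. The game is symmetric,
# so the column player's payoff on (a, b) is payoff[b][a].
payoff = [[-9, -1,  1],
          [ 1, -9, -1],
          [-1,  1, -9]]

def regret(history, util):
    """Best fixed action's average utility minus realized average utility,
    for the player whose action is the first coordinate of each pair.
    The empirical distribution of play is an (approximate) coarse
    correlated equilibrium when this is small for every player."""
    T = len(history)
    realized = sum(util(a, b) for a, b in history) / T
    best_fixed = max(sum(util(x, b) for _, b in history) / T for x in range(3))
    return best_fixed - realized

# The cycling play from the talk: the pair moves around the boundary,
# never hitting the expensive diagonal (the ties).
history = [(0, 2), (1, 0), (2, 1)] * 100   # (row action, column action)

row_regret = regret(history, lambda a, b: payoff[a][b])
# For the column player, swap the roles in each pair (symmetric game).
col_regret = regret([(b, a) for a, b in history], lambda a, b: payoff[a][b])

print(row_regret, col_regret)   # both non-positive: no regret, no convergence
assert row_regret <= 0 and col_regret <= 0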

Going back to where I was, sorry for doing things in the wrong order: in the sense that they're converging to, or getting close to, a correlated equilibrium, we can ask a bunch of interesting questions. What can we say about the quality of a

correlated equilibrium? And what can we say about the speed at which they converge to this correlated equilibrium? And maybe the most fun question, the one I most want to emphasize, is: can they adjust? And the answer is they can, even if I don't keep the game stable. That is, learning is good because learners can adjust to a changing environment. If you're learning, you're not blindly playing a Nash equilibrium and hoping for the best, that it stays the same game. You're watching what happens, and if the environment changes, you will learn to adjust. That's what learning is good for. Can we actually make this mathematically precise and say learners adjust to the environment and constantly, or on average over time, achieve high social welfare? And the answer is yes. In the last maybe couple of minutes, I want to give you a sense of how no-regret learning actually gives you enough power to prove these kinds of things. So hopefully that's still okay. So what is the kind of thing I want to prove? This is something that often

goes under the name of price of anarchy, and maybe one nice simple example, from my work with Tim Roughgarden, is for traffic routing. It said that if you take the static routing game, then the cost of the Nash equilibrium is no worse than the cost of an optimally designed routing that is sending twice as much traffic.
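The flavor of this bicriteria bound can be seen on the textbook Pigou example (my choice of instance, not one from the talk): two parallel links, one with constant latency 1 and one with latency equal to its load.

```python
# Pigou's example: one unit of selfish traffic over two parallel links,
# link 1 with constant latency 1, link 2 with latency x2 (its own load).
def total_cost(x2, rate):
    """Total latency experienced when x2 of `rate` units use link 2."""
    return (rate - x2) * 1.0 + x2 * x2

# Nash: link 2's latency never exceeds 1, so all traffic takes it.
nash_cost = total_cost(1.0, rate=1.0)            # = 1.0

# Socially optimal split, found by a grid scan for this sketch.
opt_cost = min(total_cost(k / 1000, rate=1.0) for k in range(1001))    # = 0.75

print(nash_cost / opt_cost)   # price of anarchy 4/3 on this instance

# The bicriteria claim: Nash cost is at most the optimal cost of
# routing TWICE as much traffic through the same network.
opt_double = min(total_cost(k / 1000, rate=2.0) for k in range(2001))  # = 1.75
assert nash_cost <= opt_double
```

On this instance the plain price of anarchy is 4/3, while the Nash cost 1.0 sits well below the 1.75 needed to optimally route double the traffic, matching the bicriteria statement.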

You might say that the two there was not your favorite degradation constant, that you have to run twice as much traffic. But I want to point out that this is as opposed to much, much worse things, like what's usually known as the tragedy of the commons, where the degradation grows with more participants: the more participants, the worse you get. The price of anarchy is formally

defined as follows. Classically it's about Nash equilibrium: take the utility of the Nash and compare it to the optimum. Then, in a sequence of papers, originally by Blum and co-authors and then in a very nice paper by Tim Roughgarden, it got generalized to learning outcomes. What they said is: we repeat the same game capital T times, take the utilities of the players summed over time, and compare to T times the optimum. At that point the game and the players were stable, so the optimum didn't change. What I really would like to do, and this is the last part, is to keep changing the game too. So instead of having T times the optimum down here, I have to change it to the sum of the optima of all those games. That's what I really would like to shoot for. Our actual model

goes as follows. It's the same game, like we're still playing traffic routing every single step, but players get replaced. I guess we call this a dynamic population. Every iteration, there is a parameter, which I can tell you a little bit more about in a sec: with some probability p, every player vanishes and gets replaced with an arbitrary new player. So what I mean here: because every player vanishes with some probability p, if there are n players then a noticeable fraction, like pn of them, lots of them, vanish every single step.
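A quick simulation of this replacement process (the parameter values are illustrative, not from the talk): with n players and per-round replacement probability p, about p·n players churn every round, yet each individual survives about 1/p rounds, long enough to learn.

```python
import random

def simulate_turnover(n=500, p=0.01, rounds=2000, seed=0):
    """Each round, every player independently vanishes with probability p
    and is replaced by a fresh player. Returns the mean lifetime of
    replaced players and the average number replaced per round."""
    rng = random.Random(seed)
    ages = [0] * n
    lifetimes = []
    replaced = 0
    for _ in range(rounds):
        for i in range(n):
            if rng.random() < p:
                lifetimes.append(ages[i])  # record how long this player stayed
                ages[i] = 0                # a fresh player takes the slot
                replaced += 1
            else:
                ages[i] += 1
    return sum(lifetimes) / len(lifetimes), replaced / rounds

mean_life, churn = simulate_turnover()
print(mean_life, churn)
# mean lifetime is near 1/p = 100 rounds; churn is near p*n = 5 players/round
assert 80 < mean_life < 120 and 4 < churn < 6
```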

So something changes all the time. And because the change is arbitrary, if you stayed in the game, you have to keep watching. The situation might change and you have to keep learning. But at the same time, because you only vanish with probability p, any particular player will stay long enough that learning will be viable. In particular, I have to keep them in long enough that, in my no-regret learning assumption, the regret error will not dominate what they got; that is, so that their regret goes down to zero. I have to keep them around long enough that, in expectation, their regret error goes down to zero. Yes?>>Would things change

dramatically if players were getting replaced with higher probability if they had already incurred a very high regret? Sort of modeling the idea of ruin, like in gambling. People who are currently feeling like they're losing in this marketplace, they are likely to be-.>>More likely to leave. No; if anything,

it would help me. It certainly won’t

hurt. The result is very tolerant of which fraction leaves; it's just very important that you not replace a very big fraction at a time, so that most of the people can learn. One way the results are not as good as they could be is that I'm only looking at social welfare, which means I'm averaging everyone's behavior. So there could be people who are doing really badly and my average doesn't hurt, because there are so many others, and your replacement is actually helping in that category, not hurting. So your way of replacing them would only make it better: problematic people would vanish.>>Maybe you also have people leaving because they're winning.>>Right, that I need to think about; that might actually hurt. I have to be careful whether I can keep the result that way. The other direction helps, because it makes the problematic people go away. Good questions. So, we're

doing this in a class of games that Tim called smooth games; actually I'm using my version for auctions. I won't spend a lot of time explaining what these games are. A one-sentence summary is that almost all the Price of Anarchy results are for games of this class, but given that I probably shouldn't keep talking for too long, I won't spend too much time explaining what the class is. I want to tell you one line of how it gets proven, which I think is important for thinking about what there is to prove. So, all we have is the no-regret assumption, and

all these proofs need is this. Here these guys are playing a sort of auction game in this particular example: they want to win one of those colored items, maybe they have different values for different colors, and they're bidding on one of them. That's a game, because you have to decide how much to bid and which one to bid on, and of course, as usual with any of these, the best action might depend on how the other guys are doing. But the interesting thing in the proof is that in order to prove things, I don't have to always figure out what's best, and I don't have to do what's best. There is one particular action that I should not regret: the one in the optimal solution. So, if my optimal thing is to win that red assignment over here, then [inaudible] I should not regret bidding on that guy. That is, you're not asked to bid on that guy, you're not even asked to know what that guy is, but since you're not regretting anything, in particular you'll not regret that one either, and that's the only thing the proof uses. That is, there is one thing, the optimum, and you should not regret that behavior; that's the assumption the proof uses. There are lots and lots of

papers that use this kind of proof technique. I'm going to maybe skip a couple of things I might come back to, and show you the one other technical thing I wanted to show you. With this changing population, ideally I would like players to not just not regret the optimum, but to not regret the optimum at every single step, and that actually goes too far. Even if I take one guy out, the optimum might drastically change. Just to give you an example, say we are playing a matching game. Here's the optimum; one guy left and, of course, the optimum changed for everyone: everyone's cost and path changed. But if you look carefully, you can design something,

and I'm definitely cheating here, called a stable optimum: a sequence of optimal solutions that stay stable, that is, don't change much, or rarely change, or for any one player change very rarely. We have two techniques to generate such a thing. One is specialized to particular problems like matching; for example, greedy matching is much more stable than the optimal matching, so running a greedy-like system is better. But maybe even better, there's a whole technique developed out there called differential privacy, which, since Cynthia Dwork is a Microsoft person, or used to be a Microsoft person, hopefully you guys all know about. If you think about what differential privacy is trying to do, hey, it's the same thing, just a different argument. They want the solution to be insensitive to whether one guy participates or not. That's literally what I wanted too, for a different reason. So, I can inherit the whole differential

privacy literature and generate theorems of the form: for many, many of these games, we can recover the Price of Anarchy bound. Even with this changing population, I need the probability to not go above something like one over log n; this is because I want everyone to stay around for at least log n time, so that the regret error will not dominate the behavior in my bounds. The bounds I get are a product of three parameters. Of course, there is the Price of Anarchy bound: it's the same Price of Anarchy proof, so if there was a bound there, I'm recovering kind of the same bound. Then I have two losses, both of which go to one as things get better. There's a regret error term: I have to accommodate the regret error. If the probability is high, then they don't stay long enough and the regret error is relatively high; as I keep

people around longer, the regret error goes to zero. And the second is how much it costs me to have a stable solution. As the game gets bigger, a higher population, as you might know from Cynthia's work, and Cynthia has worked with many other people, the stable solution gets closer and closer to optimum, as it's a large-population game. In a smaller population it's harder to make it differentially private. So again, if the population is high and I keep people around a relatively long time, then I recover the Price of Anarchy bound; otherwise there is an extra little bit of degradation. And I guess I want

to end with this: I hope, as at the beginning, I did convince you that learning is actually a good assumption. It is not a bad way to adapt; it is much better than Nash. If I play Nash against a bad opponent — and I will not pick on either Adat or Nikhil as playing the bad solution — if I play Nash in rock-paper-scissors, no matter what the opponent does, I get 0 as my payoff. If the opponent is playing badly, I should beat him. So, it's a better way to adapt to the opponent, with no need for the common prior. And in the last part I tried to convince you that learning actually prepares players to do well even in dynamically changing environments, which I think is a nice feature. What I didn't do, and that's

a slide I skipped in the middle, is the open question part, and maybe I should end with that. I did criticize learning for the kind of thing Nikhil picked up on: that I should predict what the opponent does and react to that. There are a few papers that try to say things about what happens when you do such a prediction. At the moment they always get beaten, in the sense that they show something you could do with such a prediction, and then say, hey, but that's happening anyhow even if they're not doing any prediction. So at the moment, even though there were attempts at this, I'm not aware of any papers that say something better happens, to either the players or the game overall, if people play in a way that better predicts the opponent. But I think there is something there, and something good should happen. It's clearly the right thing to do, and at the moment I can't prove anything interesting, so I'm going to stop here and thank you.>>If you go down that road of trying to predict what

the opponent is going to play, and how he or she is going to respond to your [inaudible], will you then have to model the computation time? Do you have to think about limited resources?>>I certainly should. I mean, the actual

attempts that are out there are very much more simplistic. What they say is: any of these no-regret learning algorithms, multiplicative weights, for example, but all of them, adjust the probability distribution very slowly. That's an interesting fact. Use this fact, that is, use the fact that the distribution today is virtually the same as yesterday, and as a result run something they called Optimistic Mirror Descent. Optimistic, meaning that it's recency biased: it very heavily takes into account what you did today, and the proofs work with that. It's just optimistic; it doesn't need extra computation. It didn't do the thing that Nikhil was inducing me to think about, that if I'm going to take advantage of what you're doing, I should predict what you're doing and react to that. You just assume that we're both running this form of Optimistic Mirror Descent, rather than having one of us run a best response to Optimistic Mirror Descent, which is not itself Optimistic Mirror Descent. So, you're right, there's a computational problem here once I take this to the proper level; these papers do not address it at the moment. Yes?>>Can you connect that with the counterfactual regret

matching kinds of algorithms? Like recently, the heads-up Texas hold'em poker thing was solved by Libratus at CMU. They were essentially doing that kind of reasoning, essentially through insane amounts of compute and brute force, trying to reason through all the steps of bidding and betting. At its core it was a regret matching algorithm, but they had to precisely think about what the player's best response would be to this and [inaudible]>>Yes. So, I guess as Moore

actually points out, an issue that I swept under the rug, despite other issues that I focused on, is that the games I was playing here, both traffic routing and this bidding in ad auctions, are, compared to poker, super simple games. You're choosing a path, you're choosing a bid. And these multi-round games, even that particular simple poker, they're multi-round, so the strategy space is much, much more complex. Even the routing game, which has many paths — you're still choosing edges; really it's a shortest path computation. You're keeping parameters on the edges; that's very manageable. Poker is a much, much more complex game, and even running these algorithms I'm talking about would be problematic in poker, because there are so many different strategies. They're doable on paths, doable in low-dimensional spaces; that's a higher-dimensional space. So there is a computational problem there, even with these simple algorithms. They're doing something a bit more sophisticated, but the sophistication is more in that they're actually capable of running this. Yes.>>Hey, so sorry

if I missed this, but what information is the learning player being given? Do they get to see all the opponent's actions and utilities, or is that all hidden from them?>>Excellent question. Definitely, they don't see the opponent's actions. To get the actual bounds, to the extent there were bounds at all on these slides, they get information about what would have happened if they had done something else. So they need what I often call, in learning, full-information feedback. That is, you don't necessarily know what the other guys are doing, but you know what would have happened had you done that. Think of traffic routing: you drive on a path, but for whatever reason, you actually get traffic reports from everywhere, including where you were not.>>So for all actions

I could have taken, I know what I would have received?>>You know what would have happened. This is based on traffic reports, which may be reasonable for traffic routing. I'm not so sure it's reasonable for internet traffic routing, but I guess they're doing pings, which maybe can approximate this. It is actually quite reasonable in ad auctions, because the feedback they're getting is payoff curves. So they're getting this kind of thing; you guys, that is, your company, are offering this feedback to the bidders. That doesn't say who the opponent is; they have to go on the web. If they're a pizza company, honestly, probably all of you know the local pizza names, so they know who's advertising for pizza. But if not, they have to figure it out the way you would: go on the web. But they do get the counterfactual effect as a form of feedback, and that's what this assumes.>>So, they get a curve with

all the discontinuities as their bids drop below other people's bids, how things suddenly change?>>Yes. What they get as feedback is what would have happened, maybe not instantaneously, but aggregated over some period, and honestly, I don't know the period lengths. Maybe over a day; maybe some of you here know this. What I'm reacting to is: you mentioned the drops, but things are random enough that there aren't that many drops. I've seen those curves; they're pretty smooth.>>I see.>>And because it's aggregated, I think over a day, or an hour or something, there's some amount of aggregation in what they're getting.>>Okay.>>But your company, as does Google, feeds back curves like that, and that's what these results need.>>Thanks.>>So the main->>Yes.>>So the way to look at the social behavior is that people are doing this no-regret behavior. So is there a main takeaway, like that they converge, compared to, say, doing best response, where the Nash equilibrium doesn't converge? Because the [inaudible] if you assume these games are smooth, then the bounds would have held even for pure Nash equilibrium. So what separates this? Is it just the convergence, that this modeling explains that people converge? Whereas the Nash equilibrium wouldn't explain why people would converge to a Nash equilibrium?>>So, if I think

of traffic routing: traffic routing is one of those games where best response, or learning on a single game, does converge to a Nash, because it's a potential game. So there, there are two pluses being added. The speed of convergence, that is, what's the regret error, how fast are they converging? And second, and maybe that's the part I really rushed through at the end, that I can switch the player set. I'm switching the player set so fast that I roughly resample the players: I keep them around for about log n time, so almost a constant fraction of players vanishes every few timesteps. There is no way they're converging to any Nash, because the Nash keeps changing. Yet they keep the social welfare high. Right? The point is that I can do this even though they can't converge to Nash, because the game isn't stable enough. If one player is around for quite a while, he can do something, but the Nash moved around because I kept deleting and adding players. So they can't possibly converge to the Nash, because there isn't such a thing as the Nash; every single step the Nash is different, and yet they can make statements about welfare.>>But that, I feel, at least the way I look at it, is

somewhat different. That's much more like stability, because if again it's smooth, even if you don't converge to an exact Nash but you converge to an approximate Nash, you will still be good. Like, the Price of Anarchy bound will still go through if you make the assumption that the game is a smooth game. So, I feel like this.>>So, let me actually. In routing, indeed, it would go to a Nash, and would go to an approximate Nash reasonably fast, and the only plus here is that I can swap the players in and out, for which I think you would need to prove to me that the Nash moves slowly enough; I don't think it does. But if you take the auction game, the auction game is not a potential game.>>I see.>>So there, they're not

converging to a Nash; in fact, there is no guarantee they're converging, period. They can alternate the way I showed you in rock-paper-scissors; things never converge. That's not converging. So when something is not converging, then this is a separate statement, and I guess one thing I didn't emphasize, but is maybe useful to say, is that what this dynamic does, as I did say, is converge to the set of correlated equilibria. Now, what do I mean? The set of correlated equilibria is a convex set in a very high-dimensional space. So when you converge to a convex set, or any set, you can do two things. You can converge to a point in that set, and if they were to do that — which they do in routing, but do not do in ad auctions, or in item auctions — then the thing they converge to is a real Nash. But they can converge to a set in a different way: here is my set, and here is how I converge to it. I get closer and closer to the set; maybe I never went into it, and I certainly didn't converge in the classical sense, I just converged to the set. I'm really close to the set, circling around it. That's what's happening in rock-paper-scissors. In that sense, they didn't stabilize, they didn't converge, but everything in this whole correlated equilibrium set, the thing they're getting close to, has good social welfare. There the theorem proves something much stronger: they're not converging, period, and yet social welfare is good.>>I see. So we don't know

such results for, like, Nash equilibrium, you're saying? Like, if it doesn't, okay, maybe it doesn't even make sense.>>Well, it would make sense. I feel like I don't know such a result because it couldn't be true — there's a little cheating in what I'm saying. If play were to converge to a Nash equilibrium, I would almost have proven that finding a Nash equilibrium is computationally easy. Not quite, because I would only show that approximating it is easy, and what we know right now is that finding it is hard, not that approximating it is hard. But under the belief that this difference is minor, and really approximating it is also hard, then that thing should not be true: learning should not converge to a Nash, because finding a Nash is hard. This is also true in the ad and item auction games: finding a Nash is hard. So I don't have an example to prove to you that this doesn't converge to a Nash, but I know it cannot, because this is doable and that is computationally hard.>>Yes. Okay, thanks.>>Okay?

>>Thank the speaker.>>Thank you.
