Alternatives to ICM?

    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      Will have to give this some thought before I try my idea. I've added some extra results to my earlier post, but haven't got all the results yet as my MATLAB crashed last night. I'll get back to you later.
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      I've finished all the simulations now. I edited my original post, but here are the results again.

      0.65:0.35 (super turbo bubble)
      ---------------------------------------
      15 +0.017, 10 -0.009, 5 -0.008
      5 -0.010, 10 -0.004, 15 +0.014
      12 +0.006, 12 -0.002, 6 -0.004
      13.5 +0.008, 13.5 -0.003, 3 -0.005
      15 +0.016, 7.5 -0.014, 7.5 -0.002
      24 +0.016, 3 -0.012, 3 -0.004

      1:0 (freezeout)
      ---------------------------------------
      15 -0.007, 10 +0.001, 5 +0.006
      5 -0.001, 10 -0.002, 15 +0.003

      0.5:0.5 (satellite)
      ---------------------------------------
      15 +0.028, 10 -0.008, 5 -0.020
      5 -0.032, 10 +0.014, 15 +0.018

      Just to illustrate two extremes. Here's ICM working well for a freezeout (1:0:0).


      And here's ICM doing very badly for a satellite (0.5:0.5:0).


      Now more thought needed.
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      Originally posted by muebarek

      I have some questions on how you’re planning to implement this:
      - Do you want to shift tournament equity (I’m gonna say TEQ) directly, or do you intend to correct the stacks and plug those corrected effective stack sizes (which aren’t the real ones any more) into ICM to get the new TEQ? You suggested k = 0.25 yesterday, which makes me think you’re gonna use the second method. For a direct TEQ shift I plugged in some numbers and a k around 0.004 seemed to give reasonable results (the maximum resulting EVdiff would then be 0.01, which fits your data quite well)

      Strangely my brain was thinking 'shift TEQ' but my fingers were typing 'shift s'. I have it in mind to shift TEQ.
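
      Just to pin down the difference between those two options in code, here is a tiny sketch (the icm handle below is only a chip-share placeholder standing in for a real ICM routine, and the numbers are arbitrary):

      % placeholder "ICM": chip share times prize pool - NOT real ICM, just enough
      % to show where the correction is applied in each variant
      icm = @(s, pool) pool * s / sum(s);

      s = [5 10 15]; pool = 1; k = 0.035;
      [~, lo] = min(s); [~, hi] = max(s);

      % (1) direct TEQ shift: compute equities first, then move equity between players
      teq1 = icm(s, pool);
      teq1([lo hi]) = teq1([lo hi]) + k * teq1(lo) * [-1 1];

      % (2) corrected stacks: distort the stacks first, then feed them into ICM
      s2 = s;
      s2([lo hi]) = s2([lo hi]) + k * s(lo) * [-1 1];
      teq2 = icm(s2, pool);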


      - How do you handle a 13-12-5 distribution for example? In which ratio will you split up the equity you’re taking from the 5bb stack among the bigstacks?


      But fortunately I came up with a (very naïve) idea on how to extend your idea regarding position:
      Judging from your results so far, it seems bad to have a shorter stack to the left and a bigger one to the right. So I’d correct every stack’s tournament equity by

      a*(s_left – s) + b*(s – s_right)

      where a and b could depend on the payout structure (raise them for flatter payouts). My guess would be that choosing a and b around 0.2*k (if you make a direct TEQ shift, that is) might give reasonable corrections. Of course making it linear seems too simple, but at least it guarantees the conservation of total TEQ = total prize pool.

      What do you think about this?
      This may be a good idea. I'm going to think about it some more and then just try something.
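
      For reference, a minimal sketch of how such a positional correction could be bolted onto a set of ICM numbers. The function name, the neighbour convention (seat i-1 to the left, seat i+1 to the right, wrapping around the table) and the vectorised indexing are choices made for the sketch, not something settled in the thread.

      % Positional TEQ correction a*(s_left - s) + b*(s - s_right).
      % teq: ICM equities (shares of the prize pool), s: stacks in seat order.
      function teq = positional_correction(teq, s, a, b)
          n = numel(s);
          left  = s(mod((1:n) - 2, n) + 1);   % stack of the player to the left
          right = s(mod((1:n),     n) + 1);   % stack of the player to the right
          teq = teq + a .* (left - s) + b .* (s - right);
          % the corrections sum to zero around the table, so total TEQ is conserved
      end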
    • muebarek
      muebarek
      Bronze
      Joined: 31.07.2008 Posts: 532
      Originally posted by jbpatzer
      I'm going to think about it some more then just try something


      Now more thought needed.
      I agree. We now have to see whether we’re able to capture the effects your simulations showed reasonably well in a model. Maybe there really isn’t much more required than the two ideas we came up with so far. But as so often, the devil is in the details.

      Originally posted by jbpatzer
      Just to illustrate two extremes. Here's ICM working well for a freezeout
      This makes sense since ICM is equal to cEV in a winner-takes-all tournament! Given equally skilled players this basically means Chips ~ $, and so your ICM TEQ for this structure only depends on your own stack!

      Originally posted by jbpatzer
      And here's ICM doing very badly for a satellite (0.5:0.5:0).
      These differences are huge! Imo, this shows that a possible ICM+ would not just be a gimmick but quite an essential improvement on tournament poker theory!

      I’m going to toy around a bit more with our first attempts. Speaking in our variables, it seems quite clear that if you take F as the first-place payout, then
      a(F) and b(F) should be monotonically decreasing in F with a(1) ~= b(1) ~= 0.
      I’m not sure about k yet, besides that k(1) ~= 0 seems reasonable as well.
    • pzhon
      pzhon
      Bronze
      Joined: 17.06.2010 Posts: 1,151
      I suggest trying to separate the effects of position relative to other stacks and the stack dynamics. Eventually, incorporating all complexities at once might be good, but I would take smaller steps up from the ICM. For example, you could randomize positions after each hand. This gives you an equity function on the same space as the ICM, R+^n/S_n. Once you have an improvement there, you can break some or all of the symmetries.

      The effect of the schedule of future blind increases depends on position. Since you are not incorporating the blind schedule (it is an interesting enough problem with fixed blinds), symmetrizing positions is a more natural simplification.
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      That's an interesting idea.
    • muebarek
      muebarek
      Bronze
      Joined: 31.07.2008 Posts: 532
      Originally posted by pzhon
      I suggest trying to separate the effects of position relative to other stacks and the stack dynamics. Eventually, incorporating all complexities at once might be good, but I would take smaller steps up from the ICM. For example, you could randomize positions after each hand. This gives you an equity function on the same space as the ICM, R+^n/S_n. Once you have an improvement there, you can break some or all of the symmetries.

      The effect of the schedule of future blind increases depends on position. Since you are not incorporating the blind schedule (it is an interesting enough problem with fixed blinds), symmetrizing positions is a more natural simplification.
      Nice to have you back in the discussion :)

      Your suggestion is indeed interesting, but I think we can already extract the random position case from what we have so far, since randomizing position effectively averages the two cases (they will appear about equally often over a large number of simulations – judging from the graphs, jbpatzer’s 10,000 seems sufficient).

      So we should get the EVdifference for the random position case by taking the mean value of the two cases we already have. What are your opinions on this?
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      So I followed my usual instinct with modelling problems like this and tried the simplest model I could think of, which is to transfer some fraction k of the smallest stack's TEQ, as calculated by ICM, to the largest stack. With this ICM+ function and k = 0.035 the simulated TEQ agrees pretty well with the predicted TEQ for initial stacks of 5, 10 and 15 and a 0.65:0.35 payout structure. I then played two ICM players off against an ICM+ player with equal starting stacks, with disappointing results. The ICM player to the right of the ICM+ player did slightly better than either the ICM+ player or the other ICM player, but the effect was small.
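
      For concreteness, here is a minimal sketch of that ICM+ function (3-handed, two places paid; the function and variable names are just for the sketch). With k = 0 it reduces to plain ICM.

      % ICM equities for stacks s and payouts p = [p1 p2], then a fraction k of the
      % shortest stack's equity moved to the chip leader. Ties for shortest/biggest
      % are resolved arbitrarily by min/max - a sketch, not a polished implementation.
      function teq = icm_plus_k(s, p, k)
          n = numel(s);
          teq = zeros(size(s));
          for i = 1:n
              pwin = s(i) / sum(s);                     % P(finish 1st)
              psec = 0;                                 % P(finish 2nd)
              for j = [1:i-1, i+1:n]
                  psec = psec + (s(j)/sum(s)) * (s(i)/(sum(s) - s(j)));
              end
              teq(i) = p(1)*pwin + p(2)*psec;           % plain ICM equity
          end
          [~, lo] = min(s); [~, hi] = max(s);
          if lo ~= hi
              d = k * teq(lo);                          % transferred equity
              teq(lo) = teq(lo) - d;
              teq(hi) = teq(hi) + d;
          end
      end

      Calling icm_plus_k([5 10 15], [0.65 0.35], 0.035) gives the kind of shifted equities described above.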

      Graph of the 5 10 15 simulation for three ICM players


      Graph of the 5 10 15 simulation for three ICM+, k = 0.035 players (should say ICM+ in the figure)


      Graph of the 10 10 10 simulation for one ICM+, k = 0.035 player and two ICM players


      To head off a question: tuning k to the 5 10 15 starting-stack case shouldn't automatically give something that only works for that case, because the stack sizes fluctuate during each individual tourney, so a lot of different stack distributions get sampled. But it didn't work. If the big stack is just above 10BB and the smallest stack just below, it's probably not a very good idea to be transferring 3.5% of the smallest stack's TEQ, so I'm still hopeful that a nonlinear function with more tunable parameters might do the trick.

      I like pzhon's idea, so I'm going to run a set of simulations with the positions of the players chosen randomly before each hand so that we can isolate the effect of stack size, and then think again. I'm going to do this for the satellite payout structure (0.5:0.5) because the effects are larger, and should therefore be less prone to contamination by noise.

      EDIT: I just set 11 simulations going, which should take about 2 days. In the meantime, I have an exam to set! :s_evil:
    • pzhon
      pzhon
      Bronze
      Joined: 17.06.2010 Posts: 1,151
      The ICM is continuous. There really should be a slight discontinuity (where the shorter stacks are equal) due to the multiway penalty on overcalls, but transferring a fixed proportion of the short stack's equity to the chip leader does not seem accurate. It forces large, strange jumps around 3 equal stacks and when the larger two stacks are equal. This may cause the ICM+ player to behave erratically, decreasing both the benefit you will find with your model and the distance between ICM and ICM+ for the optimal k value.

      One way to look at the adjustments needed is to use the EQDiff reported by the HoldemResources Nash calculator, averaged over the positions of the big blind. I trust this over the intuitions of most players. Most players think it is a far larger disaster to hit the blinds (or to post the big blind all-in) than the Nash calculator says, and I think most players are simply wrong. Anyway, I have not systematically tested the average EQDiff, but I think you may find that the second stack is the one who wants to end the bubble, whose equity is lower than the ICM predicts, not the short stack as your model assumes. The second stack is quite risk-averse against the big stack, and has to endure ATC pushes until the bubble is over, while a clear short stack is generally not so risk-averse.

      Your suggestion is indeed interesting but I think we can already extract the random position case from what we have so far.
      No, not unless you are doing something different from what I understand from this thread. Randomizing positions means there are different transition probabilities between the states, including transition probabilities which are not averages of the transitions for different orderings. Randomizing positions means that if you wait, you might be in a better position next time relative to the big stack without repeated collisions between your opponents to transfer the chips.
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      Originally posted by pzhon
      The ICM is continuous. There really should be a slight discontinuity (where the shorter stacks are equal) due to the multiway penalty on overcalls, but transferring a fixed proportion of the short stack's equity to the chip leader does not seem accurate. It forces large, strange jumps around 3 equal stacks and when the larger two stacks are equal. This may cause the ICM+ player to behave erratically, decreasing both the benefit you will find with your model and the distance between ICM and ICM+ for the optimal k value.
      Ofc I realize the model I suggested is far too simple, but I wanted to try it anyway. My usual road to the right answer (whatever that means) involves going horribly wrong over and over again before stumbling across the solution. I am slowly assimilating the various pearls of wisdom that have been cast before me in this thread.

      Originally posted by pzhon
      Your suggestion is indeed interesting but I think we can already extract the random position case from what we have so far.
      No, not unless you are doing something different from what I understand from this thread. Randomizing positions means there are different transition probabilities between the states, including transition probabilities which are not averages of the transitions for different orderings. Randomizing positions means that if you wait, you might be in a better position next time relative to the big stack without repeated collisions between your opponents to transfer the chips.
      I agree.
    • pzhon
      pzhon
      Bronze
      Joined: 17.06.2010 Posts: 1,151
      The simplicity isn't what bothers me. I'm imagining that you have some high dimensional hill-climbing problem around the ICM, you pick a direction, and see how the function improves in that direction. As long as the direction is not orthogonal to the gradient, you should be able to improve along that line.

      I'll use the HoldemResources.net Nash calculator to look ahead one step from the 5-10-15 distribution with 65-35 structure, randomizing positions.

      5-10-15
      10-5-15
      10-15-5
      5-15-10
      15-5-10
      15-10-5

      5: +0.01259 -0.00615 -0.0172 +0.01481 -0.00371 -0.01487, avg. -0.00242
      10: -0.00721 +0.01 +0.01669 -0.0186 -0.00786 +0.00162, avg. -0.00089
      15: -0.00539 -0.00386 +0.00051 +0.0038 +0.01157 +0.01325, avg. +0.00331

      So, if everyone plays according to the ICM, then on the next hand, the short stack loses the most, the medium stack loses a little, and the big stack gains. However, all changes are less than a third of a percent of the prize pool. It is likely that the pushes will not be called, so the bubble will last a few hands on average, so you might expect larger changes generally in the same direction as you get over one hand.
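
      (The per-stack averages are just the row means of that table; a trivial sketch to reproduce them, with the numbers copied from above:)

      % EQDiff per stack (rows: 5bb, 10bb, 15bb) over the six seat orderings
      eqdiff = [ 0.01259 -0.00615 -0.0172   0.01481 -0.00371 -0.01487;
                -0.00721  0.01     0.01669 -0.0186  -0.00786  0.00162;
                -0.00539 -0.00386  0.00051  0.0038   0.01157  0.01325];
      avg = mean(eqdiff, 2);       % approx [-0.00242; -0.00089; +0.00331]
      total = sum(eqdiff(:));      % ~0: equity is only shuffled between the players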
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      Originally posted by pzhon

      5: +0.01259 -0.00615 -0.0172 +0.01481 -0.00371 -0.01487, avg. -0.00242
      10: -0.00721 +0.01 +0.01669 -0.0186 -0.00786 +0.00162, avg. -0.00089
      15: -0.00539 -0.00386 +0.00051 +0.0038 +0.01157 +0.01325, avg. +0.00331

      So, if everyone plays according to the ICM, then on the next hand, the short stack loses the most, the medium stack loses a little, and the big stack gains. However, all changes are less than a third of a percent of the prize pool. It is likely that the pushes will not be called, so the bubble will last a few hands on average, so you might expect larger changes generally in the same direction as you get over one hand.
      OK. Remember though that once you start using an ICM+ function, you shift the position of the equilibrium. Are you suggesting (or maybe I'm suggesting?) that an ICM+ function should give averages of zero in the above calculation? That would actually be much easier to calculate than doing 10000 simulations, as I could probably wrap another iterative loop around the Nash calculation and adjust parameters in a sensible ICM+ model to force the averages to zero. Needs a sensibly parameterized ICM+ model though, and thus we continue round in a circle! :f_mad:
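
      (As a sketch of what such a loop might look like: everything named here is hypothetical. avg_eqdiff stands in for the expensive part, i.e. solving the Nash equilibrium for the ICM+ model with parameter k and averaging the EQDiffs over positions; the toy handle below just pretends the averages vanish at k = 0.02 so that the snippet runs at all.)

      % toy stand-in for the real Nash-equilibrium averaging - NOT real data
      avg_eqdiff = @(k) (k - 0.02) * [1; 0.5; -1.5];

      obj = @(k) sum(avg_eqdiff(k).^2);    % drive the per-stack averages towards zero
      k_star = fminsearch(obj, 0.035);     % start from the value tried earlier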
    • muebarek
      muebarek
      Bronze
      Joined: 31.07.2008 Posts: 532
      Originally posted by pzhon
      Your suggestion is indeed interesting but I think we can already extract the random position case from what we have so far.
      No, not unless you are doing something different from what I understand from this thread. Randomizing positions means there are different transition probabilities between the states, including transition probabilities which are not averages of the transitions for different orderings. Randomizing positions means that if you wait, you might be in a better position next time relative to the big stack without repeated collisions between your opponents to transfer the chips.
      You’re right about that. I did not think it through sufficiently. :f_frown:


      Originally posted by pzhon
      Anyway, I have not systematically tested the average EQDiff, but I think you may find that the second stack is the one who wants to end the bubble, whose equity is lower than the ICM predicts, not the short stack as your model assumes.

      [...]

      5: +0.01259 -0.00615 -0.0172 +0.01481 -0.00371 -0.01487, avg. -0.00242
      10: -0.00721 +0.01 +0.01669 -0.0186 -0.00786 +0.00162, avg. -0.00089
      15: -0.00539 -0.00386 +0.00051 +0.0038 +0.01157 +0.01325, avg. +0.00331

      So, if everyone plays according to the ICM, then on the next hand, the short stack loses the most, the medium stack loses a little, and the big stack gains.
      Actually, this is a good example of how counterintuitive the math can be. I agree that one would normally expect the medium stack to be overvalued more than the short stack, since (like you said) he’s the one with the highest risk aversion against the big stack, so his ranges “suffer” (tighten up) the most on the bubble. I don’t really have a reasonable explanation for why the short stack ends up being the most overvalued by ICM once one runs the numbers.

      0.5:0.5 (satellite)
      ---------------------------------------
      15 +0.028, 10 -0.008, 5 -0.020
      5 -0.032, 10 +0.014, 15 +0.018

      The satellite, whose bubble should be a nightmare scenario for the 10bb stack, is according to these results even a lot worse for the short stack. pzhon, I guess you've played a lot more SNGs than jbpatzer and me combined; do you know how to resolve this (seeming) contradiction?

      Originally posted by pzhon
      It is likely that the pushes will not be called, so the bubble will last a few hands on average, so you might expect larger changes generally in the same direction as you get over one hand.
      jbpatzer is currently running simulations with randomized positions so we’ll probably have a good approximation of the real changes available tomorrow. I agree with your expectation of larger differences.

      Originally posted by jbpatzer
      [... ]
      as I could probably wrap another iterative loop around the Nash calculation and adjust parameters in a sensible ICM+ model to force the averages to zero.
      Needs a sensibly parameterized ICM+ model though
      That's the crux. Without a good parametrisation you'll always just manage to make the differences disappear for certain stack distributions.
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      Originally posted by muebarek

      0.5:0.5 (satellite)
      ---------------------------------------
      15 +0.028, 10 -0.008, 5 -0.020
      5 -0.032, 10 +0.014, 15 +0.018

      The satellite whose bubble should be a nightmare scenario for the 10bb stack even makes it a lot worse for the short stack according to these results. pzhon, I guess you played a lot more SNGs than jbpatzer and me combined, do you know how to solve this (seemingly) contradiction?
      These results show that things are worst for the short stack if he plays according to ICM. With an ICM+ that correctly evaluates the TEQ, it might be that the medium stack does worst. Presumably you mean that it should be bad for the medium stack because the big stack pushes on him so often, and he can call so infrequently, but it's precisely the 'so often' bit that ICM doesn't capture.
    • muebarek
      muebarek
      Bronze
      Joined: 31.07.2008 Posts: 532
      Originally posted by jbpatzer
      These results show that things are worst for the short stack if he plays according to ICM.
      Yeah, I do somehow agree. But I admit I don't understand the reasons for this yet.

      Are the Nash ranges suggested by ICM that far off for the short stacks? Does playing according to these ranges give them a losing strategy (maybe ICM misevaluates other stacks more than the short stacks, but the short stacks' strategy is affected the most)? Or aren't the ranges that bad after all, and it just isn't possible to get much more tournament equity out of a short stack (which would mean that ICM overvalues them in the first place, but it wouldn't have a big effect strategy-wise)?

      Probably both are part of the truth (since they influence each other). But to what degree?

      Just a hypothetical question about the extreme case (I don't think it's actually the case, but imo considering even the weird options might help):
      Can we safely exclude the possibility that ICM even gives a short stack too little equity, which causes him to make bad calls in the ICM equilibrium (not realizing he should be more risk-averse than ICM says)? Pushing/calling that badly would make us see an even lower TEQ than ICM predicts. Is there a valid argument for us to neglect this (vague) possibility?
    • pzhon
      pzhon
      Bronze
      Joined: 17.06.2010 Posts: 1,151

      OK. Remember though that once you start using an ICM+ function, you shift the position of the equilibrium. Are you suggesting (or maybe I'm suggesting?) that an ICM+ function should give averages of zero in the above calculation?
      True equity should be a martingale (in the technical sense, not the betting progression, of course), so the average should be 0 if the Nash equilibrium is calculated with respect to the equity function.

      I am surprised that the short stack lost so much equity according to this calculation. I remember doing other calculations where the medium stacks were squeezed more. However, I would follow the data rather than my a priori intuition, which might be relative to the popular idea that short stacks are worthless, and should be pushed with weak hands and insufficient folding equity.
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      Originally posted by pzhon

      OK. Remember though that once you start using an ICM+ function, you shift the position of the equilibrium. Are you suggesting (or maybe I'm suggesting?) that an ICM+ function should give averages of zero in the above calculation?
      True equity should be a martingale (in the technical sense, not the betting progression, of course), so the average should be 0 if the Nash equilibrium is calculated with respect to the equity function.
      I don't know the theory of martingales. Is that something relevant to poker that I should read up about? (Remember that my background is in partial differential equations.) So should I interpret your reply as answering my question with a 'yes'?
    • pzhon
      pzhon
      Bronze
      Joined: 17.06.2010 Posts: 1,151
      There is a crucial result in the theory of martingales which is useful both for normal analysis of poker and for the modelling you are doing.

      A (discrete) martingale is a sequence of random variables Z(0), Z(1), Z(2), ... so that the expected value of the step Z(n+1)-Z(n), conditioned on all values up to Z(n), is 0. If you assume players are equally skilled, then equity is supposed to be a martingale.

      Given a martingale with this notation, a stopping time is a random time T so that whether T=t only depends on the values Z(1),...Z(t). You can't say, "Turn before the bridge that will collapse if you drive on it" or "Stop before the first time you will toss 3 heads in a row." Whether you stop at time t can only depend on the martingale's values up to time t.

      Optional Stopping Theorem:
      Given a martingale Z(t) with stopping time T, if
      1) E(T) is finite, and
      2) E(|Z(t+1)-Z(t)|), conditioned on all values up to Z(t), is uniformly bounded by some constant c,
      then E(Z(T))=E(Z(0)).

      The assumptions are satisfied very easily when Z is a share of the prize pool. To see a failure, see the martingale betting progression, where you double your bet after each loss and stop at the first win. Assumption 2 is not satisfied, and neither is the conclusion.

      What this means is that the current equity equals the average over the situations you get looking a little ahead, drilling down into the current position. So, your equity now should equal your average equity after 1 hand, or 10 hands, or after a player is eliminated, or at the end of the tournament.
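
      In symbols (restating the above, with Z(n) a player's share of the prize pool after hand n and T a stopping time such as the end of the tournament):

      \mathbb{E}\big[\, Z(n+1) - Z(n) \mid Z(0), \dots, Z(n) \,\big] = 0
      \quad\Longrightarrow\quad
      \mathbb{E}\big[\, Z(T) \,\big] = \mathbb{E}\big[\, Z(0) \,\big]

      (under the two assumptions of the theorem). So a candidate equity function can only be the true TEQ if, at every stack distribution, its value equals its average value one hand later.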

      If your model equity function is not a martingale, then it can't be the tournament equity in all situations. On the flip side, if your model equity function has average step size 0 and is correct at the end of the tournament, then it must be equal to tournament equity. If you can bound the size of the average step, and the average length of the tournament, then you can estimate the difference between the model equity function and the true equity.

      Btw, the standard notation is X instead of Z, but X is used in various smileys like X( so it is hard to write X(0).
    • muebarek
      muebarek
      Bronze
      Joined: 31.07.2008 Posts: 532
      Hm...

      So if one could simulate tournaments of finite length with equally skilled players (ignoring position) whose stacks are on average the same after each hand, one might get quite a decent approximation of the real TEQ?!

      Did I get you right?

      EDIT: If I'm totally misunderstanding, ignore the following.

      So I’m gonna present the model I came up with about a year ago since I think it makes sense in conjunction with pzhon’s post about martingales.


      Transition model
      ---------------------------

      Let’s assume N equally skilled players with stacks S1, S2, …, SN are playing a poker tournament with randomized positions and a payout structure (f1, f2, …, fk) [k < N; f1 >= f2 >= … >= fk]. Let’s further assume these players know their exact real TEQ and always choose their shoving and calling ranges exactly according to the real-TEQ Nash equilibrium.

      Real TEQ has to be a martingale. So in their game with randomized positions, the expected value of each stack after playing a hand has to equal the stack before playing the hand (this is true since the stacks are the only difference between these players, so in a fixed payout structure the TEQ depends only on the stack distribution).

      What happens in the game of these N perfect players when they play a hand is that some players win chips and some others lose chips (of course the total number of chips is conserved). But on average no one’s stack will change, since TEQ is a martingale.

      This can be used to design a simple model. For the sake of simplicity, I assume hands play out in the simplest way possible: a “hand” is a transition of states in which one stack Si gains C chips and one other stack Sj loses C chips.

      (…, Si, …, Sj, …) --> (…, Si+C, …, Sj-C, …) (*)

      If one chooses the transition probabilities equal for each i and j (this means each player’s probability of gaining/losing chips in a hand is 1/N), the stack distribution is guaranteed not to change on average! So the change of state (averaging over all transition cases) does not change the TEQ of any stack. This simple model thus mimics the situation of “perfect players” playing hands against each other, with the restriction of only allowing transitions like (*). Of course, once a stack busts, he can’t gain chips anymore and N is reduced. If one lets them “play” a lot of tournaments (I usually take 100,000) and watches the outcome, one should get an idea of how much equity each stack has. Note that we don’t have any knowledge about their ranges (if we did, we wouldn’t be having this discussion ^^)! The only things we use are that TEQ is a martingale and that playing a hand in poker means chips usually change their owner.
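
      A rough sketch of what such a simulation could look like. The chip quantum C, the cap of a transfer at the smaller of the two stacks involved, and paying f_m to whoever busts with m players left are guesses at details the description above leaves open; choosing ordered pairs uniformly and capping the transfer symmetrically keeps the expected change of every stack at zero, so the stacks stay a martingale.

      % Monte Carlo TEQ estimate under the simple transition model (*)
      function teq = transition_model_teq(stacks0, payouts, C, ntrials)
          n0 = numel(stacks0);
          winnings = zeros(1, n0);
          for trial = 1:ntrials
              s = stacks0;
              alive = 1:n0;                                % remaining players
              while numel(alive) > 1
                  pick = alive(randperm(numel(alive), 2)); % random ordered pair
                  i = pick(1); j = pick(2);                % i gains, j loses
                  c = min([C, s(i), s(j)]);                % capped at the effective stack
                  s(i) = s(i) + c;
                  s(j) = s(j) - c;
                  if s(j) == 0                             % j busts
                      m = numel(alive);                    % j finishes in place m
                      if m <= numel(payouts)
                          winnings(j) = winnings(j) + payouts(m);
                      end
                      alive(alive == j) = [];
                  end
              end
              winnings(alive) = winnings(alive) + payouts(1);  % last player standing
          end
          teq = winnings / ntrials;
      end

      Something like transition_model_teq([5 10 15], [0.5 0.5], 1, 100000) would then give the model's TEQ estimate for the satellite structure.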

      So after reading pzhon’s post on martingales, I think this model may yield a decent approximation of the real TEQ in the case of randomized positions and equally skilled players.

      If you think it might be worth a try, I could run the model to calculate its TEQ predictions for lots of different stack distributions, so that we could try to parameterize the results, have the resulting Nash ranges calculated by jbpatzer’s program, and see whether they stand a chance against ICM.

      Sry in advance if this turns out to be complete bullshit :f_biggrin:
    • jbpatzer
      jbpatzer
      Bronze
      Joined: 22.11.2009 Posts: 6,955
      I have to say I'm struggling a bit here. All my experience is in deterministic systems, so I'm going to need some serious digestion time for the previous two posts! In the meantime, my simulations seem to have slowed down a little, presumably because randomly permuting (1,2,3) before every hand is more time consuming than I expected (the MATLAB command is 'randperm(3)'). E.t.a. for results is now tomorrow morning.