Professional Card Counting Simulation
FROM ET FAN:
The Last 0.01%: Indistinguishable
By ET Fan
(From Blackjack Forum Vol. XXVI #1, Spring 2007)
© 2007 Blackjack Forum
Professional gamblers are hard-nosed bastards. They have to be. When your livelihood depends on understanding every gamble better than your opponent does, self-delusion is the mole, the fly in the ointment, the saboteur. So when a new tool appears on the scene, it's natural for professionals to be skeptical. When the tool is a piece of sophisticated software that also happens to be free, a first-rate gambler would be foolish to bet on it without solid, credible evidence.
In my last article, "PowerSim: No Compromise on Accuracy," I gave several reasons why my free, open-source blackjack simulator should be taken seriously. However, it was pointed out that a blackjack simulator has no credibility unless it produces results similar to those of older, established simulators, "or at least in the same ballpark." I agree. But my nose is affixed so firmly to my face that I'm not happy just being in the same park. When two simulators are set to run identical games under identical conditions, the results must be statistically indistinguishable; otherwise there's a problem with one of the simulators.
What follows is one man's attempt to get his nose on straight. It's a detective story, if you will, spanning dozens of sims, months of self-inflicted torture, and the partnership that finally saved my sanity. The end result is to establish and buttress the claim of accuracy for both PowerSim and BJStrike, a professional-level simulator, index generator and practice program that has been some nine years in development.
"It is quite a three pipe problem . . . " —Sherlock Holmes
Background on the Case
Until recently, PowerSim was limited to 2^31 (just over 2 billion) rounds per run. This was due to the natural limit on the range of integer variables in a 32-bit language: a signed 32-bit integer can hold at most 2^31 - 1 = 2,147,483,647. And any time I compared PowerSim results to published sources, they agreed within one or two (occasionally three) Standard Errors (StErrs).
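For readers who haven't run into this kind of limit, here is a minimal sketch of the arithmetic. It assumes the round counter behaves like a two's-complement signed 32-bit integer; whether a real BASIC program silently wraps around or stops with an overflow error depends on the compiler, and `wrap_int32` is a hypothetical helper, not PowerSim code.

```python
# Sketch: why a signed 32-bit round counter tops out just above 2 billion.
# Assumption: the counter behaves like a two's-complement 32-bit integer.
INT32_MAX = 2**31 - 1  # 2,147,483,647


def wrap_int32(x: int) -> int:
    """Hypothetical helper: the value a two's-complement 32-bit integer would hold."""
    return (x + 2**31) % 2**32 - 2**31


rounds = 10_000_000_000            # a 10-billion-round sim
print(rounds > INT32_MAX)          # True: the count cannot fit in 32 bits
print(wrap_int32(INT32_MAX + 1))   # -2147483648: one more round wraps negative
print(rounds <= 2**63 - 1)         # True: a 64-bit counter has ample headroom
```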
All well and good. But then the 2 billion round limit was lifted. I have spent the last few months running 10- and 20-billion-round simulations, and found some unsettling discrepancies with, for example, the sims reported in Blackjack Attack, 3rd edition (BJA3). At the 10 billion round level, hundredths of a percent become significant in the win rate as well as in a number of other statistics. I eventually did get results coinciding to within 0.01%, but that last hundredth of a percent just wouldn't go away. (Have you noticed I've been strangely quiet the last few months?)
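Why do hundredths of a percent suddenly matter? The standard error of a simulated win rate shrinks as SD / sqrt(N). The sketch below assumes a per-round standard deviation of about 1.0 unit, which is roughly what the flat-betting StErr of 0.0010% at 10 billion rounds (quoted later in this article) implies. It also treats rounds as independent, the usual approximation. A bet spread raises the SD, so treat these numbers as a floor.

```python
import math


def win_rate_stderr_pct(sd_per_round: float, rounds: int) -> float:
    """Standard error of a simulated win rate, in percent of one unit.

    Assumes independent rounds with the given per-round standard deviation
    (in units bet), so StErr = SD / sqrt(N).
    """
    return 100 * sd_per_round / math.sqrt(rounds)


# Assumed per-round SD of about 1.0 unit (roughly flat betting).
for n in (2_000_000_000, 10_000_000_000):
    se = win_rate_stderr_pct(1.0, n)
    print(f"{n:>14,} rounds: StErr = {se:.4f}%, so 0.01% = {0.01 / se:.1f} StErrs")
```

At 2 billion rounds a 0.01% gap is under five StErrs; at 10 billion it is about ten, which is why a persistent 0.01% difference can no longer be written off as noise.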
I eventually concluded there were just too many tiny details that could influence that 0.01%. These include things as subtle as whether you round exactly 3.25 decks (169 cards) to 3.5 or to 3.0 when doing "round to nearest half-deck," or whether you count the hole card as "unseen" after the dealer checks under his ace (you now know it isn't a ten, so its tag will be positive, on average).
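Both details are easy to make concrete. The sketch below shows two plausible tie-breaking rules for "nearest half-deck" that disagree at exactly 169 cards, and the average tag of a hole card known not to be a ten. The count system is an assumption (Hi-Lo), the composition is a full deck with the dealer's ace ignored, and the function names are hypothetical, not taken from either simulator.

```python
import math
from fractions import Fraction

CARDS_PER_HALF_DECK = 26


def decks_round_half_up(cards_remaining: int) -> Fraction:
    """Nearest half-deck, sending exact ties (like 3.25 decks) up."""
    half_decks = Fraction(cards_remaining, CARDS_PER_HALF_DECK)
    return Fraction(math.floor(half_decks + Fraction(1, 2)), 2)


def decks_round_half_even(cards_remaining: int) -> Fraction:
    """Nearest half-deck, sending exact ties to the even number of half-decks."""
    half_decks = Fraction(cards_remaining, CARDS_PER_HALF_DECK)
    return Fraction(round(half_decks), 2)  # Fraction rounds half to even


cards = 169  # exactly 3.25 decks
print(float(decks_round_half_up(cards)))    # 3.5
print(float(decks_round_half_even(cards)))  # 3.0

# Hole card after the dealer peeks under an ace: it is known not to be a ten.
# Assumption: Hi-Lo tags, four cards of each non-ten rank per deck.
HI_LO = {"A": -1, "2": 1, "3": 1, "4": 1, "5": 1, "6": 1, "7": 0, "8": 0, "9": 0}
mean_tag = Fraction(sum(HI_LO.values()), len(HI_LO))
print(mean_tag)  # 4/9: positive on average
```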
I have a list of about a dozen other details that can influence a hundredth of a percent. There might easily be another dozen that haven't occurred to me. What I really needed was direct access to a simulator, and direct access to its programmer. Someone with the proper credentials, and enough interest to respond to a hundred needling, potentially embarrassing questions in the pure interest of Science. I found that person in KarlD, creator of BJStrike.
KarlD is an avid blackjack player and counter. I've seen his handle on the newsgroup rec.gambling.blackjack going back many years. He also has a Ph.D. in Computer Engineering, and currently works for a major computer company. Like most advantage players, he wishes to preserve a degree of anonymity, but he shares my passion for precision and mentioned that several years ago he went through a similar process of hard-nosed comparisons.
Let me briefly mention the important part open source simulators play in the science of blackjack. The free and open exchange of ideas and methods is a critical part of the scientific method. For example, if you invent a cold fusion process, but choose to keep the details on how it's done a secret, you may be doing the optimal thing for your wallet, but you sharply curtail your contribution to Science and to the human race. To my knowledge, PowerSim is the first serious, open source blackjack simulator, and as such it plants the discipline of card counting more firmly in the province of proven Science.
So while there's virtually no practical value in finding and destroying a hundredth of a percent here, a hundredth of a percent there (in the real world, one payoff error in 100 hours of play comes to about 0.01%), I sacrificed myself on the altar of Science, meanwhile driving Arnold and a half-dozen other people to doubt my sanity, and my status as blackjack geek of the omniverse.
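The parenthetical figure is easy to check. This sketch assumes about 100 rounds per hour and a one-unit payoff error measured against a one-unit average bet; both are illustrative assumptions, not figures from the article.

```python
# Rough check of "one payoff error in 100 hours comes to about 0.01%".
# Assumptions: about 100 rounds per hour, one-unit error, one-unit average bet.
hours = 100
rounds_per_hour = 100
error_units = 1.0

rounds = hours * rounds_per_hour
effect_pct = 100 * error_units / rounds
print(f"{rounds:,} rounds -> {effect_pct:.2f}% of action")  # 10,000 rounds -> 0.01% of action
```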
I mentioned that I started out trying to compare PowerSim results with those in Chapter 10 of BJA3. I won't bore you with the details except to say I had some 13 different versions of PowerSim running night and day. I slowly learned to concentrate on the true count frequencies, since the StErr for those numbers is much smaller than the StErr of the win rate (although the StErr of TC frequencies isn't easy to calculate or interpret, since True Counts aren't normally distributed). Also, if the frequencies differ, it's only luck when the win rates happen to match. TC frequencies reflect more nuts-and-bolts information about a simulator. For instance, if the dealer made every conceivable kind of payoff error, the win rate could be almost anything, but the count frequencies would be unaffected.
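A rough way to see why frequencies are the sharper test is to compare a naive binomial standard error for a count frequency with the win-rate StErr. This is only a sketch: it treats every round as independent, which understates the real StErr because rounds from the same shoe are correlated (one reason the true figure is hard to pin down). The 1.37% frequency and the 1.0-unit SD are illustrative assumptions.

```python
import math


def naive_freq_stderr_pct(p: float, rounds: int) -> float:
    """Binomial standard error of an observed frequency, in percentage points.

    Treats every round as independent. Rounds from the same shoe are
    correlated, so this understates the real StErr; it is only a rough guide.
    """
    return 100 * math.sqrt(p * (1 - p) / rounds)


rounds = 10_000_000_000
freq_se = naive_freq_stderr_pct(0.0137, rounds)  # a count seen about 1.37% of the time
wr_se = 100 * 1.0 / math.sqrt(rounds)            # win rate, assuming SD of about 1 unit
print(f"frequency StErr ~ {freq_se:.6f}%, win-rate StErr ~ {wr_se:.6f}%")
```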
I corresponded with four knowledgeable users of CVCX and CVData, but none could give me the kind of tortuously precise information I needed on things like rounding boundaries. The bottom line is that, based on information gleaned from the qfit web site, I was able to put together a version of PowerSim that differs from BJA3 Chapter 10 by approximately 0.01% in the frequencies for TC < 0 and for TC = 0, both in the same direction. Both PowerSim frequencies were larger than those in BJA3 for the 5.0/6 S17 DAS game I was concerned with at the time. This means the total frequency of positive TC rounds was about 0.02% smaller, even though every +TC frequency agreed with BJA3 to the nearest hundredth of a percent, as reported there. The lost 0.02% (give or take a few thousandths) must simply have been spread over several of the ten +TCs listed, in amounts too small to change any rounded value.
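The arithmetic behind "spread out over several of the ten +TCs" is short: a value reported to the nearest 0.01% can be off by up to 0.005% without the printed number changing. This sketch assumes, for illustration only, that the missing 0.02% is spread evenly; in practice a row sitting near a rounding boundary could still flip.

```python
# How much can hide behind ten frequencies reported to the nearest 0.01%?
reported_step = 0.01               # percent; each row is rounded to this step
max_error_each = reported_step / 2 # a rounded row can be off by up to 0.005%
n_positive_tcs = 10                # the ten +TC rows in the comparison

max_hidden_total = n_positive_tcs * max_error_each
missing_total = 0.02               # TC < 0 and TC = 0 each about 0.01% high
average_shift = missing_total / n_positive_tcs  # assumption: spread evenly

print(f"largest total ten rounded rows can hide: {max_hidden_total:.3f}%")
print(f"average shift per row if spread evenly: {average_shift:.3f}%")
print(average_shift < max_error_each)  # True: no rounded value needs to change
```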
You might think, with the lost positive TCs, that the overall win rate would be lower for any given bet spread. But no, after dozens of 20 billion round sims, the win rates attained with PowerSim remained consistently about 0.01% higher than the 0.83% reported in BJA3 for the play-all, 1-8 "practical" spread. (It's easier to compare practical spreads, since those bets are given precisely, not rounded.)
It's particularly puzzling that my TCs < 0 come up slightly more often than those in BJA3, since when TCs are floored, every negative running count results in a TC < 0 and every non-negative running count results in a TC of 0 or more, regardless of the method of half-deck rounding. In other words, the frequency of TC < 0 is simply the frequency of negative running counts. Is it possible I "hit away" some one counts and some zero counts? I re-re-rechecked my code, and ran tests on my random number generator. I rechecked my interpretation of the shufflepoint. I even wrote in the composition-dependent exceptions to basic strategy that Cacarulo mentions later in the book. It didn't help. However, it's not absolutely certain this was the strategy used in Chapter 10.
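The flooring argument can be checked by brute force. This sketch floors toward minus infinity (as BASIC's INT does) using exact fractions, so no floating-point error creeps in, and tries every deck estimate from half a deck to six decks in half-deck steps; the range of running counts is an arbitrary illustrative choice.

```python
import math
from fractions import Fraction


def floored_tc(rc: int, decks: Fraction) -> int:
    """True count floored toward minus infinity (like BASIC's INT)."""
    return math.floor(Fraction(rc) / decks)


# Any positive deck estimate: 0.5, 1.0, ..., 6.0 decks in half-deck steps.
estimates = [Fraction(k, 2) for k in range(1, 13)]

holds = all(
    (floored_tc(rc, d) < 0) == (rc < 0)
    for rc in range(-60, 61)
    for d in estimates
)
print(holds)  # True: with flooring, TC < 0 exactly when RC < 0
```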
For masochists who would like to reproduce my failed (but close enough for any practical purpose) effort at emulating the Chapter 10 sims, I left code in REM statements toward the end of PowerSim, in the ESDecision, InsDecision, SplDecision, SurDecision, DblDecision, HitDecision, and BetDecision routines. They are marked "nearest half-deck pen estimation" or "comp-dependent exception." One word of caution: don't try to reproduce the pitch games in Chapter 10 to within a hundredth of a percent. PowerSim assumes all cards are dealt face up (except the burn card and hole cards, of course). At present, there is no provision for pitch games.
At about this time, I began emailing KarlD. I thought "why not drag HIM into this mess!?"
The Beginning of a Beautiful Friendship
Well, one email turned into two. Two emails turned into four... At one point, KarlD generously offered to let me install and use BJStrike, to try to answer some of my questions. (I think he could tell he was dealing with a sad charity case.) Some sixty-seven emails later, we might finally, possibly, be headed toward some answers.
If your tolerance for geekdom is at low tide today, you may want to skip ahead to "The Royal Road to Sanity."
Here are some "things we learned along the way":
PRINT w! * -21, INT(w! * -21)
Result: -6 -7 (As any schoolboy knows, (-21)/3.5 = -6, and the INTeger of -6 is still -6.)
(This one caused a problem with TC conversion in my initial attempts to emulate BJStrike's 0.5-deck accuracy with the flooring option. w! is a pre-stored single-precision weight equal to 1/3.5, used when the next-card pointer is greater than 130 and less than 157. When the Running Count is -21, the product w! * -21 prints as -6 but is actually a hair below -6, so INT(w! * -21) gave the wrong answer, -7.)
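A likely mechanism, sketched in Python under stated assumptions: the `!` suffix marks a single-precision variable in Microsoft-style BASICs, and INT floors toward minus infinity. 1/3.5 = 2/7 has no exact binary representation, and its nearest single-precision value is slightly too large, so multiplying by -21 lands just below -6. PRINT shows only about seven significant digits and displays -6, while INT drops to -7. The `to_single` helper is hypothetical; it uses the standard `struct` module to round a Python float to IEEE-754 single precision.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical stand-in for w!: a weight of 1/3.5 stored in single precision.
w_single = to_single(1 / 3.5)
rc = -21

product = to_single(w_single * rc)  # multiply and keep a single-precision result
print(product)                      # a hair below -6
print(math.floor(product))          # -7: INT-style flooring crosses the boundary

# Dividing by the deck estimate in one step lands exactly on -6 in this case.
print(math.floor(rc / 3.5))         # -6
```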
These things did not affect the accuracy of PowerSim's True Count conversions because, although my formula does use a type of rounding (the INT function), the whole calculation is done on one line. They did, however, affect me as I tried to duplicate BJStrike results, since Karl does part of his TC conversions ahead of time. He stores weights in an array, which saves a lot of time, and adds what I call an "empirical fudge factor" to make them accurate under normal conditions.
So at first my BJStrike-emulated TC conversions were off. Later, when I wanted to try abnormal accuracy (TC resolution of 1/52 of a deck or finer, i.e., exact machine-like conversions), I found that the BJStrike fudge factor was just not up to it.
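To make "exact" versus "half-deck" conversion concrete, here is a minimal sketch of both, floored like INT. It is not either program's code. The exact version computes RC / (cards / 52) as RC * 52 / cards with integer floor division, so no stored floating-point weight is involved. The half-deck version assumes, purely for illustration, that the deck estimate is rounded up to the next half-deck; other estimation rules are equally plausible.

```python
CARDS_PER_DECK = 52
CARDS_PER_HALF_DECK = 26


def tc_exact(rc: int, cards_remaining: int) -> int:
    """Floored TC from the exact decks remaining: RC / (cards / 52) = RC * 52 / cards."""
    # Python's // floors toward minus infinity, like BASIC's INT, and stays in integers.
    return (rc * CARDS_PER_DECK) // cards_remaining


def tc_half_deck(rc: int, cards_remaining: int) -> int:
    """Floored TC from a half-deck estimate (assumed: round up to the next half-deck)."""
    half_decks = -(-cards_remaining // CARDS_PER_HALF_DECK)  # ceiling division
    # RC / (half_decks / 2) = 2 * RC / half_decks
    return (2 * rc) // half_decks


for cards in (170, 182):
    print(cards, tc_exact(-21, cards), tc_half_deck(-21, cards))
```

With 170 cards left (about 3.27 decks), the exact method gives -7 while the half-deck estimate of 3.5 decks gives -6; at 182 cards (exactly 3.5 decks) both give -6. Disagreements like this near the boundaries are where a few hundredths of a percent come from.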
Oh the joy, the joy, the day I proved, after setting up a custom shoe and count system in BJStrike, that TCs of -6 were occasionally wrong at the 0.011 accuracy level. Yes, dear PowerSim users, I have gone mad as a rainmaker in hell. At one point I warned Karl, "You know, if you continue with this dialogue, you're liable to go as crazy as me!" I feel I will need a lot of dealer error in my immediate future to nurse me back to health.
"What one man can invent another can discover." —Sherlock Holmes
The Royal Road to Sanity
I eventually managed to emulate BJStrike's "half-deck accuracy" conversion method, with flooring and other systems, under a broad range of rules and conditions. This doesn't happen by accident. Our algorithms are quite different and our random number generators are different, yet our frequencies match to within 0.005%, and our win rates are within one or two standard errors (i.e., also less than 0.01%). No changes were made in the actual playing of the hands to achieve these results.
Here then, at long last, are the results to show there is no accuracy issue with Blackjack PowerSim. (There once was a brief problem with insurance under the European no hole card rule, which was promptly acknowledged and fixed.)
First up is the first sim in which I successfully matched BJStrike results down to the last 0.01%. After some false starts, Karl suggested I run both simulators in running count mode. I also decided to strip away every option that could possibly be causing a problem. I had to set up a special strategy file for PowerSim and write a strategy script for BJStrike. I added code to PowerSim to track running counts and to give total frequencies for all negative counts and for all counts greater than zero. There was to be NO doubling, NO splitting, and only one player. Total-dependent basic strategy was used for hitting and standing only. The rules were 6 decks, S17, with the Shuffle Point at 260 and no burn cards. Both programs played 10 billion rounds.
As you can see, PowerSim reports the stats in a different format from BJStrike. PowerSim gives the actual number of hands played at a given RC (easily converted to a percent, in this case, with 10G rounds), and it gives the win rate as a simple number, which you can multiply by 100 if you want the percentage.
Every single RC frequency agrees perfectly when rounded to the nearest hundredth of a percent (the precision BJStrike reports), except the frequency at +11, which is 1.37499152% for PowerSim and 1.38% for BJStrike. This is no cause for concern, since PowerSim is within 0.00001% of rounding to the same number as BJStrike. In other words, the frequencies match as well as one could possibly hope. If they were any closer, I'd worry about the independence of the two programs.
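The conversion from PowerSim's raw hand counts to BJStrike-style percentages, and the near miss at +11, can be reproduced with exact decimal arithmetic. The hand count below is not quoted in the article; it is simply what 1.37499152% of 10 billion rounds works out to. Round-half-up is assumed as BJStrike's display rule.

```python
from decimal import Decimal, ROUND_HALF_UP

ROUNDS = 10_000_000_000  # 10G rounds


def hands_to_pct(hands: int) -> Decimal:
    """Convert a raw count of hands at one RC into a percentage of all rounds."""
    return Decimal(hands) * 100 / Decimal(ROUNDS)


# 1.37499152% of 10G rounds corresponds to 137,499,152 hands.
pct = hands_to_pct(137_499_152)
rounded = pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
gap_to_boundary = Decimal("1.375") - pct

print(pct)              # 1.37499152
print(rounded)          # 1.37 (BJStrike shows 1.38)
print(gap_to_boundary)  # 0.00000848 percentage points short of rounding up
```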
The win rates also match up extremely well. In a few cases they are two standard errors apart, which is to be expected. Also notice that the win rate turns positive at a count of +18, according to both programs. Nice to know that with a bet spread of a million to one you can get a positive EV even with no splitting or doubling. You just need a few billion in bankroll and plenty of patience.
The overall win rate for flat betting was -2.362% for PowerSim and -2.360% for BJStrike, with a StErr of 0.0010%.
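When two independent simulators are compared, the difference between their estimates has its own standard error, sqrt(SE_a^2 + SE_b^2), which is about 1.41 times either simulator's StErr when the two are equal. This sketch assumes the quoted StErr of 0.0010% applies to each program separately. `z_difference` is a hypothetical helper, and the test is the standard two-sample z-score rather than a method taken from either simulator.

```python
import math


def z_difference(a_pct: float, b_pct: float, se_a: float, se_b: float) -> float:
    """z-score for the difference of two independent estimates (all in percent)."""
    return (a_pct - b_pct) / math.sqrt(se_a**2 + se_b**2)


# Flat-betting win rates from the comparison above.
powersim, bjstrike, se = -2.362, -2.360, 0.0010
print(round((bjstrike - powersim) / se, 2))                # gap in single-sim StErrs
print(round(z_difference(bjstrike, powersim, se, se), 2))  # gap in StErrs of the difference
```

Measured against the standard error of the difference, the 0.002% gap is comfortably under 1.5, well within the range expected from chance.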
For the next several sims I will adopt a different format. Frequencies at RCs or TCs other than those shown all match perfectly except when there's a special note. All sims are 10 billion rounds. The first two lines are for the same sims shown above.
"It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth." —Sherlock Holmes
The Piece De Resistance
As an encore, KarlD decided to adjust his empirical fudge factor and add an option to BJStrike that enables my exact TC conversion method. Frankly, I use this method in PowerSim because it's easy to understand and easy to program. The exact method of TC conversion can add or subtract a few hundredths of a percent at most, so I opted for clarity and simplicity. Exact TC conversion will be an option in an upgrade to BJStrike to be announced shortly. Here then, hot off the presses, is a comparison of PowerSim vs. BJStrike, with 6 decks dealt face up to 4 players, to a shufflepoint of 260, S17, DOA, split up to a max of 4 hands, DAS, No RSA, aces receive one card only, the I18 outlined above, and insurance at
The win rates are for player number 4. Every single TC frequency rounds to the same hundredth of a percent except TC = -5, which is not a concern, since PowerSim is within 0.0002% of rounding to match BJStrike perfectly. If you compare the win rates using the standard errors (with the formula (BJSEv - PSEv * 100)/STDERR, since PowerSim reports win rates as plain numbers and BJStrike reports them as percentages), you will find that the WRs for a given count are generally within 1 or 2 STDERRs of each other. There is one outlier at TC = +11, where the WRs are 3.3 to 3.4 STDERRs apart, but this is to be expected when comparing so many win rates. (Note that the situation is slightly different when you compare against a known theoretical mean. For example, the true mean probably lies somewhere between the two +11 WRs, with each approximately 1.7 standard errors away.)
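Why one outlier "should be expected" can be quantified. If each simulator's StErr is used as the yardstick, a 3.4-STDERR gap between two independent estimates is about 3.4 / sqrt(2), or roughly 2.4, standard errors of their difference; the chance of seeing at least one gap that large grows quickly with the number of rows compared. The row counts below are hypothetical, since the comparison table isn't reproduced here, and the rows are treated as independent, which is an approximation.

```python
import math


def two_sided_tail(z: float) -> float:
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))


def chance_of_any_outlier(z: float, comparisons: int) -> float:
    """Chance that at least one of several independent comparisons exceeds |z|."""
    return 1 - (1 - two_sided_tail(z)) ** comparisons


# A 3.4-StErr gap, measured in single-sim StErrs, expressed as a z-score
# for the difference of two independent estimates with equal StErrs.
z_diff = 3.4 / math.sqrt(2)
for k in (1, 10, 20):  # hypothetical numbers of TC rows compared
    print(k, round(chance_of_any_outlier(z_diff, k), 3))
```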
Final verdict: the 4-player average win rate came in at +1.047% for BJStrike and +1.049% for PowerSim. The standard error on those numbers is 0.0014%. Average SCOREs were 29.83 and 29.89, respectively. And that, fellow number crunchers, doesn't happen by accident. A more aggressive 1-12 spread yields a SCORE in the $34-$35 range.
Only The Beginning
"But he had not that supreme gift of the artist, the knowledge of when to stop." —Sherlock Holmes
The above represents only about half the sims performed during this quest for Inspector #9's seal of approval. There's nothing to hide; I just wanted to spare you the agony of looking down blind alleys, and spare myself the agony of formatting all that chaff among the wheat.
With the resources now at my command via BJStrike, I can quickly run checks on the European no hole card rule, many different penetrations, a face-down deal option, and practically any other rule or procedure you could imagine. If I find anything serious (any persistent discrepancy of 0.005% or more in win rates, or of 3 StErrs in a normal statistic), I will do my best to locate the source of the problem. And should I fail, Dr. Watson, I shall report it to the ever-expanding base of loyal PowerSim users so we can track it down together. I would like to acknowledge KarlD's help in patiently responding to questions, and getting the necessary tools into my hands.
Or as another talented programmer once said of PowerSim, "... it's sort of close."♠
[Editor's note: I would like to thank Karl D., creator of the excellent BJ Strike blackjack simulation software, for his generosity and help in the research for this article. —Arnold Snyder]