What Would It Take???
-
- Forum Rookie
- Posts: 46
- Joined: Thu Jun 02, 2011 3:02 am
Re: What Would It Take???
If you guys want to measure the "final hand" to test for non-randomness, then how about simply keeping track of one-card royal flush draws? It's an easy play to recognize, and it's played the same way in every video poker game except when a wild card is in the original deal. Would finding out whether the royals do complete one out of 47 times answer the question? Frank, if you still have questions about the fifth-card flipover, the video interview sums up the issue, as do the several articles already cited. I'm surprised he didn't explain this to you when you met with him.
-
- VP Veteran
- Posts: 762
- Joined: Wed Feb 02, 2011 6:59 pm
If you guys want to measure the "final hand" to test for non-randomness, then how about simply keeping track of one-card royal flush draws? It's an easy play to recognize, and it's played the same way in every video poker game except when a wild card is in the original deal. Would finding out whether the royals do complete one out of 47 times answer the question? Frank, if you still have questions about the fifth-card flipover, the video interview sums up the issue, as do the several articles already cited. I'm surprised he didn't explain this to you when you met with him.

I will indeed add a check for 4RF draws, and most of the other 1-card draws as well. I just got an email that explained what a 5th-card flipover is, and I'll try to add something to check for it if it's not too hard. Thanks!

This would have been doubly impossible if not for everyone's contributions, since I had no idea what people were concerned about. ~FK
-
- Forum Rookie
- Posts: 46
- Joined: Thu Jun 02, 2011 3:02 am
Personally, I have no interest in fifth-card flipovers. I am more concerned about one-card royal draws really completing 1 in 47 times.
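That one is straightforward to test once a tally exists: under a fair draw, the number of completed royals in n one-card royal draws is binomial with p = 1/47. A quick sketch in Python, with made-up counts:

from scipy.stats import binomtest

# Hypothetical tally: one-card royal draws attempted and completed
attempts, royals = 4700, 85

# Two-sided test of the null hypothesis that the completion rate is 1/47
result = binomtest(royals, n=attempts, p=1 / 47)
print(f"expected ~{attempts / 47:.0f} royals, saw {royals}, p-value = {result.pvalue:.3f}")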
-
- Video Poker Master
- Posts: 1850
- Joined: Mon Sep 11, 2006 4:02 am
For the raw dealt and drawn card analysis, I was planning on using 10 sets of 104 samples. 104 because it would give an exact mode of 2 occurrences in 104 hands for each individual card. What level of confidence would this yield, and do you like the idea of 104, rather than 100 or some other number? I know that 10 samples of 104 hands is "better" than a single sample of 1040 hands. I do not know how much better, or if this is optimal. Perhaps 20 samples of 52 hands would be better yet. Can you answer this?

Actually Frank, if I understand you, in each group of 104 hands you should expect 10 occurrences of each individual card, not 2. I assume by "hand" you mean a 5-card hand, so 104 hands would have a total of 104 x 5 = 520 cards. With 10 samples of 104 hands each, you can expect 100 occurrences of each individual card. If instead you simply are going to sample the first card in each hand (or the second, etc.), then you would expect only 2 of each type, as you stated.

If your plan is to track the frequencies of the 52 different cards and test the null hypothesis that the probability of each card occurring is equal against the alternative hypothesis that the probabilities of at least 2 of the cards are unequal, the general test for such sampling, suggested by Karl Pearson in 1900, tracks a statistic that is the sum of 52 fractions, one per card. The number for each card would be (A_i - E_i)^2 / E_i, where E_i is the expected number of cards with a specific rank and suit that one would anticipate in a sample of size N, and A_i is the actual number of cards with that rank and suit. So the test statistic would be the sum for i = 1 to 52 of (A_i - E_i)^2 / E_i. This could be used to test either drawn or dealt cards (or both together), provided that you calculated the expected numbers appropriately.

Under the null hypothesis, this test statistic would be distributed approximately as a chi-squared random variable with 51 degrees of freedom. If the test statistic is too high, one would reject the null hypothesis. If the sum of those 52 ratios were > 68.67, you could reject the null hypothesis with a statement such as "There is less than a 5% chance that such a sample of actual numbers could occur if the null hypothesis were true." You can easily find the 68.67 number by entering the following formula into an Excel cell: "=CHIINV(0.05,51)". The corresponding numbers for tests of size 1% and 0.1% are 77.39 ["=CHIINV(0.01,51)"] and 87.97 ["=CHIINV(0.001,51)"]. These numbers (68.67, 77.39, 87.97) are typically called the critical values of the test.

In answering your question, "What level of confidence would this yield, and do you like the idea of 104, rather than 100 or some other number?": as you know, there are two types of errors. The Type I error rate is specified, regardless of sample size, by the small probability chosen above. Using one of the 3 critical values sets the probability of making a Type I error at 5%, 1%, or 0.1%, respectively. This is the error of rejecting the null hypothesis when it is in fact true (it is often referred to in textbooks by the Greek letter "alpha," as in alpha = 0.05 or alpha = 0.01). Sometimes you just get a bad sample and there is nothing wrong with the null hypothesis at all. I will talk about Type II errors later (when we discuss power).

One other typical concern, that the test statistic is NOT distributed approximately as chi-squared, is alleviated by your suggested sample size so long as the expected number in each cell is >= 5.0.
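If it helps when you start coding, here is one minimal sketch of that calculation in Python rather than Excel; the observed counts below are simulated stand-ins, not real data:

import numpy as np
from scipy.stats import chi2

# Hypothetical data: counts of each of the 52 cards across 520 sampled
# cards (e.g., 104 five-card hands); replace with real tallies.
observed = np.random.multinomial(520, [1 / 52] * 52)

expected = 520 / 52  # E_i = 10 for every card under the null
statistic = np.sum((observed - expected) ** 2 / expected)

# Same critical value as Excel's =CHIINV(0.05,51)
critical_5pct = chi2.ppf(0.95, df=51)  # 68.67
print(f"test statistic: {statistic:.2f}, reject at 5%: {statistic > critical_5pct}")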
Studies indicate that satisfying this >= 5.0 condition for all cells allows the test statistic to be a very good approximation to a chi-squared random variable. This means that when you think you are doing a 5%, 1%, or 0.1% test, you really are; there are no unknown "size" biases arising from your test statistic being distributed differently than you assumed. So any sample in which your expected number in each cell is greater than 10 should do the trick, with the following important caveat: so far we have said nothing about what is called the "power" of this test, which we will talk about in a second.

Choosing a number like 104 instead of 100 for dealt hands or cards (or a number like 94 instead of 100 for drawn cards) will allow your expected number of cards to always be an integer. Since I imagine you will be using a computer to sum up the 52 fractions comprising your test statistic rather than doing hand calculations, this is of no particular import. But I suppose it might make checking (or debugging) any calculations easier, and there certainly is no harm a priori in making the calculations easier.

In order to choose the proper sample size, you have to decide what level of probability inequality you want to be able to detect. A larger sample size decreases your chances of making a Type II error (which is often denoted by the Greek letter "beta"). This step is often skipped, but if it is, you know nothing about the probability that you will accept the null hypothesis when it is in fact false. It should be obvious that it is easier to detect a problem with a false null hypothesis of equal probabilities (and reject it) if two or more of the actual probabilities are far away from those stated by the null. If the probability of the ace of spades is actually 10% instead of 1/52 (with the other probabilities adjusted so the sum of all of them is still 100%), a test with any given sample size will be much more likely to reject the null hypothesis than if the probability of the ace of spades is actually 1/50 instead of 1/52.

With too small a sample size, you will be unable to reject the null hypothesis, even if it is false, if the deviation of the frequencies is too small. In order for me to give you advice as to what sample size is appropriate, you have to answer two questions: how big a deviation do you want to detect, and how often are you willing to let that deviation go undetected? You cannot determine "beta" unless you identify specifically which alternative hypothesis you are concerned about. In this case, that requires a complete specification of a set of alternative probabilities (adding to one) for the 52 individual cards. You can ignore this of course, but then you will never be able to determine the probability that your test was unable to detect a particular departure from your null hypothesis.

But I warn you: for very small deviations, any test with a reasonable sample size will have very little power (the probability of correctly rejecting a false null hypothesis). It is for this reason that I have asked you to be careful with statements such as "I'm 99.9% certain that machines are random," even with sample sizes in the tens of millions of hands. You may not have been able to reject the null hypothesis, but it is unlikely that you would be able to differentiate exactly equal probabilities from probabilities that were off by a few ten-thousandths of 1%.
It is for this reason that it is easier for a test to reject a claim of a specific probability situation (such as a 40% rank duplication rate in 1-card draws) than to accept a situation such as "all remaining cards will occur with exactly equal probability."

Power calculations are more involved than what I suspect you might have done in most statistics classes, and they differ for each specific alternative that you would suggest, so I'll have to wait for more specificity from you about the deviations you want to be able to detect before proceeding further here. Typically, one simulates what the test statistic would be under a particular alternative set of probabilities and calculates what proportion of the simulations' test statistics would be above the critical value. I can show you how to do that for any given alternative hypothesis.

ANOTHER CAVEAT: The test outlined above is technically for use when all cards have an equal chance of occurring at any time with replacement, so such tests would be exactly correct if we were to select each card in the hand from a different 52-card deck. Obviously that is not the case with video poker, where each card in a hand is different from the others. This means that if we were to focus on only the first card of each hand (or only the second card, etc.), the inferences from such a test would be exactly appropriate. I suspect that the results will not be biased greatly by measuring cards drawn without replacement, but since drawing without replacement automatically creates a more even distribution of cards than drawing with replacement, we will be less likely to reject the null hypothesis. This means that in the event that we DO reject, we will be more certain that the events did NOT happen simply by chance than the critical value would suggest. It is likely that someone has done studies to determine the level of bias in such situations, but I'm not aware of one.

Having stated this, I can say with some degree of expertise that many studies done in lots of disciplines based on these types of tests are in fact biased toward not rejecting the null hypothesis more often than the stated percentage at which they were testing. For future reference, this is called a size bias, and similar to the power calculations suggested above, determining the level of size bias requires a different set of simulations.

Finally, as to splitting the sample up into 10 piles, I have already answered that in an earlier post. I really don't know why you would do that. There is no harm in splitting here, but I don't think there is any benefit either. I will repeat what I already said in blue font below. Another guess as to what you were thinking of might be using a simulation of a less-than-perfect RNG: maybe drawing a sample with 10% of it taken starting from each of 10 different positions would be more uniform than drawing 100% of the sample starting in 1 position, but I would really need to see your reference to figure out why you think this would be better. I might be missing something obvious here, but nothing really comes to mind.

Continued wishes of good luck, New

I would need more specifics on this reference. My opinion is that, unless there were concerns about errors in the larger sample size, it does not make sense that simply dividing a sample up randomly into 10 piles would give you more information UNLESS there was a specific reason to divide them based on other characteristics.

I could hazard a guess that a statistic from what is called a "stratified" sample would be better for inference (in terms of having a lower variance), but this only works if the different "strata" themselves have different characteristics than the overall population. For example, in projecting election results, if we can divide a sample of 1000 into groups like male Republicans, female independents, etc., we can get by with a smaller sample size for a given level of precision (or a higher level of precision for a given sample size). The reason it works for stratified sampling is because of other knowledge we might have about the size of the strata in the larger population and an assumption that members have more in common with others in their own stratum than they do with the general population.

Having said that, I don't see a lot of application for stratified sampling here. I suppose that it would make some sense to keep information separate by casino or machine if one has some reason to believe that there would be differences. In such cases, this would likely require more sampling solely from the casinos suspected of having non-normal outcomes.
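As a postscript on that without-replacement caveat: one rough way to gauge the size bias is to simulate the null directly, dealing 5-card hands without replacement within each hand and checking how often the statistic above exceeds the nominal 5% critical value. A sketch, with the hand count and number of simulation runs chosen arbitrarily:

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng()
critical = chi2.ppf(0.95, df=51)
n_hands, n_sims, rejections = 104, 10000, 0

for _ in range(n_sims):
    counts = np.zeros(52)
    for _ in range(n_hands):
        # 5 cards dealt without replacement within a single hand
        hand = rng.choice(52, size=5, replace=False)
        counts[hand] += 1
    expected = n_hands * 5 / 52
    stat = np.sum((counts - expected) ** 2 / expected)
    rejections += stat > critical

# A rate below the nominal 0.05 would confirm the bias toward
# not rejecting that is described above.
print(f"actual rejection rate: {rejections / n_sims:.3f} (nominal 0.05)")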
-
- Video Poker Master
- Posts: 1850
- Joined: Mon Sep 11, 2006 4:02 am
Here is some more guidance on power and Type II errors. If it is not clear yet, power is the complement of the probability of a Type II error, or 1 - beta.

Let's consider three scenarios alternative to the null hypothesis that each card has a 1/52 probability of being chosen as any one of the drawn cards in a particular hand (or a 5/52 chance of being in a given 5-card hand).

Null Scenario: All 52 cards have a 1/52 chance of being chosen.

Alternative Scenario 1: 26 cards (let's call them red cards) each have a 3/104 chance of being chosen (instead of 2/104 = 1/52), while the other 26 cards each have a 1/104 chance.

Alternative Scenario 2: 13 cards (let's call them spades) each have a 103/5200 chance of being chosen (instead of 100/5200 = 1/52), while the other 39 cards each have a 99/5200 chance.

Alternative Scenario 3: 2 cards (let's call them one-eyed jacks) each have a 9/520 chance of being chosen (instead of 10/520 = 1/52), while the other 50 cards each have a 251/13000 chance.

We certainly could have a fourth scenario with only one card varying, or any number of other combinations.

As we increase the sample size, we will have an increasing chance of detecting any of the alternative scenarios, but for any given sample size, we will have a much greater chance of detecting the first alternative than the other two.

There are smaller deviations in Scenario 3 for 50 of the cards than there are in Scenario 2, but the deviations in Scenario 3 might be of more importance, especially to shadowman, since a smaller probability of one-eyed jacks would significantly affect the payout on his favorite game.

Without doing the simulations, I really don't know what the likelihood of detecting these smaller deviations with any test is. What I do know is that as delta, defined below, gets closer to zero, the probability that a test makes a Type II error gets larger and approaches 1 - alpha, recalling from before that alpha has been "pre-chosen" to be something like 5%, 1%, or 0.1%.

My suggestion would be to look at scenarios of each type for a given sample size and a given deviation (delta) and determine the power of the test. If we are satisfied with that level of power (detection ability), we stick with the sample size; if not, we increase it.

I would define delta as the percentage deviation from the null for the first type of cards listed in each scenario.

For Scenario 1, delta = (3/104 - 1/52) / (1/52) = 50%.
For Scenario 2, delta = (103/5200 - 1/52) / (1/52) = 3%.
For Scenario 3, delta = (9/520 - 1/52) / (1/52) = -10%.
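For anyone who wants to try the simulation approach I described, here is a rough Python sketch for these three scenarios; the sample size of 520 single cards (drawn with replacement) and the number of simulation runs are placeholder choices:

import numpy as np
from scipy.stats import chi2

def estimate_power(probs, n_cards=520, n_sims=10000, alpha=0.05):
    # Estimate the chance of rejecting the null when probs is the truth.
    critical = chi2.ppf(1 - alpha, df=51)
    expected = n_cards / 52  # expected count per card under the null
    rejections = 0
    for _ in range(n_sims):
        observed = np.random.multinomial(n_cards, probs)
        stat = np.sum((observed - expected) ** 2 / expected)
        rejections += stat > critical
    return rejections / n_sims

scenario1 = [3 / 104] * 26 + [1 / 104] * 26       # delta = 50%
scenario2 = [103 / 5200] * 13 + [99 / 5200] * 39  # delta = 3%
scenario3 = [9 / 520] * 2 + [251 / 13000] * 50    # delta = -10%

for name, p in [("1", scenario1), ("2", scenario2), ("3", scenario3)]:
    print(f"Scenario {name}: power ~ {estimate_power(p):.3f}")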
-
- VP Veteran
- Posts: 762
- Joined: Wed Feb 02, 2011 6:59 pm
Yes, of course, 10 not 2, sorry. I'll be busy today and tomorrow and will do a summary post of the current working idea on Friday. Before I type it up I'll check here to include anything anyone comes up with while I'm away. I believe I can start coding next week. ~FK
-
- VP Veteran
- Posts: 762
- Joined: Wed Feb 02, 2011 6:59 pm
One more quick post before leaving home for the day. I forgot to say thank you, especially to new2vp, who took all that time to share the math I needed to complete this and the "how to check" part. Thank you very much. Also thanks to everyone else for providing the "what to check for." Again, indispensable. And a special thanks to everyone for keeping the arguments out of this thread and focusing on the task. This is, in my mind, a VP forum at its best. I could not have even attempted it without participation from both sides. ~FK
-
- VP Veteran
- Posts: 762
- Joined: Wed Feb 02, 2011 6:59 pm
Video Poker Hypothesis Tester and Confidence Quantifier

Basic Concept: The utility will include three basic tests, each of which will be independent and optional. The user will be able to use some or all of them, as they choose. In addition to the three basic tests there will be some optional tests and a place for advanced user-defined tests. The utility will include printable sheets for casino record keeping and tallying. Optionally, video of your play can be used for input at home, but the paper version will be included. The utility will be written in MS Excel and will be completely free; if the code translates, then an OpenOffice version will also be made available for free for those who do not own MS Excel.
The Three Main Tests
A Test for Random Deal and Random Draw: All five dealt cards will be recorded, as will the cards drawn. The dealt and drawn cards will be analyzed for frequency of occurrence, both separately and together. Since we are testing the frequency of occurrence of single cards, each hand gives us 5 or more trials and minimizes the need for an impossibly large sample. This test is designed only for occasional or one-time use; it's not something you are going to be doing for the rest of your VP career.
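For illustration only, since the utility itself will live in Excel, here is a bare-bones Python sketch of the tally this first test needs; the two-character card codes are just an assumed encoding:

from collections import Counter

# Hypothetical encoding: "As" = ace of spades, "Th" = ten of hearts, etc.
dealt_counts, drawn_counts = Counter(), Counter()

def record_hand(dealt, drawn):
    # Tally the five dealt cards and any replacement cards separately.
    dealt_counts.update(dealt)
    drawn_counts.update(drawn)

# Example: dealt a pair of sevens, held it, drew three new cards
record_hand(dealt=["As", "Kd", "7c", "7h", "2s"], drawn=["Qh", "9s", "4d"])

# Separate and combined frequencies feed the chi-squared test discussed earlier.
combined = dealt_counts + drawn_counts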
The Made Hand Test: This test will be ongoing and of indefinite duration. You may decide to track all your straights and higher for the rest of your life. It will allow you to track as much or as little as you want; if you only wanted to track Royal Flushes, you could. Naturally, the more things you track, and the less rare they are, the higher the confidence level will be. The problem with this test is that it is based on your total number of hands and your strategy. It is therefore subject to error, depending on what your strategy is and how accurately you play. It might be amusing as a fun thing to do, but it is far too error-prone to be good science.
Strategy Independent Frequency Test: This is similar to the made hand test in the sense that you are recording hands like Flushes, Full Houses, 4K, SF, RF. Where it differs is that rather than comparing your total hands to the number of paying hands, we are instead looking only at how often your draws complete. The test will also include a dealt pat hand test as compared to total hands played, but again, that is not subject to strategy differences. You'll be able to check for as little or as much as you like, from 3K on up to RF, and it's designed for lifetime use or short-term use. Obviously, as the sample size increases over time, the confidence level will rise.
The utility will be designed to be non-partisan and side-neutral. That is to say, you can use it to test a hypothesis that machines are fair and your results are completely normal, or you could use it to test a hypothesis that machines are unfair and your results are abnormal. Most importantly, it will tell you your confidence level based on your sample size. ~FK
-
- VP Veteran
- Posts: 762
- Joined: Wed Feb 02, 2011 6:59 pm
Oh, one more thing: the utility will be completely open source, and the method as well as the utility will be published; both should always be considered a work in progress. As people make suggestions for how to do things better, they will be reviewed, updates made, and new versions made available. The new versions will be designed to work with your old data, so nothing will be lost and it will be totally backwards compatible. ~FK
-
- Video Poker Master
- Posts: 1850
- Joined: Mon Sep 11, 2006 4:02 am
It is important to note that for any statistical test to be valid, it must be conducted AFTER you decide on the hypothesis, and not in response to any given run of hands, whether particularly good, particularly bad, or even an average group of hands.

If you suspect a given machine or a given casino of "non-random" behavior, it is incorrect to use the data that caused you to suspect it in the first place. The rule is: once you suspect something, test forward (into the future). Any number of things will happen with randomness, some good, some bad. Just because something bad (or good) happened in the past is not sufficient to determine statistical significance.

There was a very good explanation of this by a poster on vpFREE a couple of days ago, but now that the administrator there has decided to move the thread due to comments disruptive to their rules, I can't find it.

Frank, my suggestion would be that you include language that you are comfortable with indicating this requirement for validity. I suppose actual examples might illustrate the concept better than a sentence or two of statistical jargon.