Evaluation & Results: Can Machine Learning Beat Lottery Randomness?
This section presents the results of a structured evaluation of how machine learning performs on a system that is designed to be random.
Using the Kerala State Lottery as a controlled environment, multiple algorithmic approaches were tested under real-world conditions. The focus is on measuring predictive performance and evaluating the practical relevance of any detected signal.
Experimental Setup
- Evaluation period: 91 days (out-of-sample)
- Total observations: 33,000+
- Algorithms tested: 36 (6 families × 6 time windows)
- Evaluation method: Book-level simulation
- Ticket price: ₹50
- Statistical testing: One-sided binomial test with Bonferroni correction
Because every model is scored only on out-of-sample data, the results are intended to reflect real-world performance without data leakage or overfitting.
Performance Metrics
Hit Rate
The fraction of predictions that match the actual outcome. Random baseline: 10%.
Lift
The ratio of a model's hit rate to the random baseline. Example: a lift of 1.15 means a 15% relative improvement over random selection.
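The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes each prediction and each outcome is a single value compared for an exact match, and the names `hit_rate`, `lift`, and `RANDOM_BASELINE` are hypothetical.

```python
from typing import Sequence

RANDOM_BASELINE = 0.10  # random hit rate from the text: 1 chance in 10


def hit_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the actual outcome."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)


def lift(rate: float, baseline: float = RANDOM_BASELINE) -> float:
    """Ratio of a model's hit rate to the random baseline (1.0 = no better than random)."""
    return rate / baseline


# Toy example: 2 hits out of 4 predictions.
rate = hit_rate([4, 4, 4, 4], [4, 1, 4, 2])
print(rate, round(lift(rate), 2))  # prints: 0.5 5.0
```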
Statistical Significance
Tests whether an improvement is likely to be due to chance, with the significance threshold adjusted for multiple comparisons across all 36 models.
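A minimal sketch of a one-sided binomial test with a Bonferroni correction, using only the Python standard library. It assumes each prediction is an independent trial with a 10% hit probability under randomness. The significance level `alpha = 0.05` is an assumption (the text does not state one), `num_tests = 36` matches the number of models, and the function names are hypothetical.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)  # log-sum-exp trick avoids underflow for tiny tails
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def significant_after_bonferroni(hits: int, n: int, p0: float = 0.10,
                                 alpha: float = 0.05, num_tests: int = 36):
    """Return (p_value, passes), where passes means p_value < alpha / num_tests."""
    p_value = binom_sf(hits, n, p0)
    return p_value, p_value < alpha / num_tests


# Hypothetical counts, not the study's data: 1,200 hits in 10,000 predictions.
p_value, passes = significant_after_bonferroni(hits=1200, n=10_000)
print(f"p = {p_value:.2e}, significant after correction: {passes}")
```

With the correction, each model must reach p < 0.05 / 36 ≈ 0.0014 instead of p < 0.05, which keeps the chance of any false positive across all 36 models at or below 5%.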
Return on Investment (ROI)
Simulates real-world financial outcomes to assess practical impact.
Results Overview
- Best hit rate: 11.55%
- Random baseline: 10%
- Lift: 1.154
- Statistical significance: p < 10⁻⁶
The best-performing models show a measurable improvement over random selection, indicating the presence of detectable statistical signals.
Model Performance Insights
Positional Digit Analysis
- Consistent performance across time windows
- Suggests minor structural variation in digit distribution
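One plausible reading of positional digit analysis is a per-position frequency count over a trailing window. The sketch below is hypothetical, since the study does not describe its exact method: it assumes each draw result is stored as a fixed-length digit string, ordered oldest to newest, and `positional_digit_prediction` is an illustrative name. The `window` parameter stands in for the time windows in the setup (for example 15 or 30 days, assuming one draw per day).

```python
from collections import Counter
from typing import List, Sequence


def positional_digit_prediction(history: Sequence[str], window: int = 30) -> List[str]:
    """Most frequent digit at each position over the last `window` draws.

    `history` is ordered oldest to newest, and every entry is a digit string of
    the same length. Ties go to the smallest digit so the output is deterministic.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = history[-window:]
    if not recent:
        raise ValueError("history is empty")
    prediction = []
    for pos in range(len(recent[0])):
        counts = Counter(draw[pos] for draw in recent)
        # max() keeps the first maximum it sees, so iterate digits in sorted order
        prediction.append(max(sorted(counts), key=counts.__getitem__))
    return prediction


# Toy history (hypothetical): a short window ignores the older "999" draws.
history = ["999", "999", "999", "999", "123", "143", "523", "128"]
print(positional_digit_prediction(history, window=3))   # prints: ['1', '2', '3']
print(positional_digit_prediction(history, window=30))  # prints: ['9', '9', '9']
```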
Composite Scoring
- Highest peak performance
- Combines multiple signals effectively
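A composite score could be built as a weighted sum of individually normalized signals. This is a hypothetical sketch: the signal names, the min-max normalization, and the weights are assumptions, not the study's configuration.

```python
from typing import Dict, Mapping


def normalize(scores: Mapping[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def composite_score(signals: Mapping[str, Mapping[str, float]],
                    weights: Mapping[str, float]) -> Dict[str, float]:
    """Weighted sum of normalized signals per candidate digit; unweighted signals count 0."""
    total: Dict[str, float] = {}
    for name, scores in signals.items():
        weight = weights.get(name, 0.0)
        for candidate, value in normalize(scores).items():
            total[candidate] = total.get(candidate, 0.0) + weight * value
    return total


# Hypothetical signals and weights for three candidate digits.
signals = {
    "frequency": {"0": 3, "1": 5, "2": 4},
    "recency": {"0": 1, "1": 0, "2": 2},
}
weights = {"frequency": 0.7, "recency": 0.3}
scores = composite_score(signals, weights)
print(max(scores, key=scores.get))  # prints: 1
```

Normalizing first puts signals measured on different scales (counts, gaps in draws) on a common 0 to 1 range before weighting; z-scores would be a reasonable alternative.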
Short-Term Models (15–30 days)
- Stronger performance than long-term models
- Indicates signals may be time-sensitive
Recency-Based Models
- Based on recent occurrence patterns
- Did not show consistent improvement
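Recency-based scoring can be sketched as the number of draws since each digit last appeared at a given position. This is a hypothetical illustration with assumed inputs (digit strings ordered oldest to newest) and an illustrative function name; a model could prefer either recently seen digits or long-absent ones, and the text does not say which direction was used.

```python
from typing import Dict, Sequence


def draws_since_last_seen(history: Sequence[str], position: int) -> Dict[str, int]:
    """Draws elapsed since each digit last appeared at `position`.

    `history` is ordered oldest to newest; 0 means the digit appeared in the
    latest draw. A digit never seen gets len(history), older than any draw.
    """
    never = len(history)
    gaps = {str(d): never for d in range(10)}
    for age, draw in enumerate(reversed(history)):
        digit = draw[position]
        if gaps[digit] == never:  # first (most recent) sighting walking backwards
            gaps[digit] = age
    return gaps


# Toy history (hypothetical), first digit position.
gaps = draws_since_last_seen(["12", "34", "15"], position=0)
print(gaps["1"], gaps["3"], gaps["9"])  # prints: 0 1 3
```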
Statistical Significance
- 17 out of 36 models achieved statistically significant results
- All positional digit models were significant
- Several composite models passed strict thresholds
This provides evidence of weak but measurable signals within the system.
ROI Analysis
- ROI range: −54% to −85%
Example:
- Random expected return: ~₹20 per ₹50 ticket
- Best model return: ~₹23 per ₹50 ticket
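The ROI figures follow from ROI = (return - cost) / cost. A minimal sketch using the ₹50 ticket price and the example returns above: ₹20 gives an ROI of −60%, and ₹23 gives −54%, the top of the reported range. The last calculation is a hypothetical consistency check that assumes expected return scales linearly with hit rate; `roi` is an illustrative name.

```python
TICKET_PRICE = 50.0  # ₹ per ticket, from the experimental setup


def roi(expected_return: float, cost: float = TICKET_PRICE) -> float:
    """Return on investment as a fraction: (return - cost) / cost."""
    return (expected_return - cost) / cost


random_roi = roi(20.0)  # ~₹20 back per ₹50 ticket under random selection
best_roi = roi(23.0)    # ~₹23 back per ₹50 ticket for the best model

# Hypothetical check, assuming expected return scales linearly with hit rate:
# apply the reported lift of 1.154 to the ~₹20 random return.
scaled_return = 20.0 * 1.154

print(f"{random_roi:.0%} {best_roi:.0%} {scaled_return:.2f}")  # prints: -60% -54% 23.08
```

Even with the lift carried through in full, the expected return stays far below the ₹50 cost of the ticket.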
Lottery participation is not an investment: unlike trading or financial assets, it is not structured to generate consistent returns. It is better viewed as entertainment, where outcomes are governed by chance rather than by predictable return patterns.
From a practical perspective, participation is best approached in moderation, using only discretionary funds set aside for entertainment. This study does not promote or encourage gambling; it provides a structured evaluation of machine learning performance within such systems.
Understanding Practical Constraints
Prize Distribution
High-value prizes are rare, while most wins are lower-value.
Execution Constraints
Practical limitations affect how strategies can be applied in real-world scenarios.
System Structure
Lottery systems have predefined payout structures, which keep the expected return per ticket well below the ticket price.
Key Insight: Detectable vs Practical Signal
- Signals can be identified and measured
- Their real-world impact depends on external factors such as prize distribution, execution constraints, and payout structure
Final Takeaway
- Machine learning can outperform random selection under controlled evaluation
- Performance improvements are measurable and statistically significant
- Real-world outcomes depend on system-level constraints
💬 Have thoughts or feedback? Message me on Instagram @iamniteeshk
📺 Watch more insights on my YouTube channel @iamnkcom

