Evaluation & Results: Can Machine Learning Beat Lottery Randomness?

This section presents the results of a structured evaluation built around a single question:

Can machine learning detect meaningful patterns in a system designed to be random?

Using the Kerala State Lottery as a controlled environment, multiple algorithmic approaches were tested under real-world conditions. The focus is on measuring predictive performance and evaluating the practical relevance of any detected signal.

Experimental Setup

  • Evaluation period: 91 days (out-of-sample)
  • Total observations: 33,000+
  • Algorithms tested: 36 (6 families × 6 time windows)
  • Evaluation method: Book-level simulation
  • Ticket price: ₹50
  • Statistical testing: One-sided binomial with Bonferroni correction

Because every model is scored on a 91-day out-of-sample period, the results are meant to reflect real-world performance while limiting the risk of data leakage and overfitting.

Performance Metrics

Hit Rate

Measures how often predicted numbers match actual outcomes. Baseline (random): 10%

Lift

The ratio of a model's hit rate to the random baseline. Example: a lift of 1.15 means a 15% relative improvement over random selection.
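
The two metrics above reduce to simple ratios. The sketch below is illustrative only: the function names and the counts (115 hits in 1,000 predictions) are hypothetical and are not taken from the study's data; only the 10% baseline comes from this section.

```python
def hit_rate(hits: int, trials: int) -> float:
    """Fraction of predictions that matched the drawn outcome."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return hits / trials


def lift(observed_rate: float, baseline_rate: float = 0.10) -> float:
    """Ratio of observed hit rate to the random baseline (1.0 = no gain)."""
    return observed_rate / baseline_rate


# Hypothetical counts, chosen only to illustrate the arithmetic.
rate = hit_rate(hits=115, trials=1000)
print(f"hit rate = {rate:.2%}, lift = {lift(rate):.2f}")
```

A lift below 1.0 would mean the model did worse than random selection.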

Statistical Significance

Evaluates whether an observed improvement could plausibly be due to chance. P-values are adjusted for multiple comparisons across the 36 models.
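
As a rough illustration of how a one-sided binomial test with Bonferroni correction can be applied, here is a minimal standard-library sketch. It is not the study's code: the helper names, the significance level of 0.05, and the example counts are assumptions made for illustration; the 10% baseline and the 36 tests come from this section.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)  # factor out the largest term to avoid underflow
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


def is_significant(hits, trials, baseline=0.10, alpha=0.05, n_tests=36):
    """One-sided test of 'hit rate > baseline' at a Bonferroni-adjusted level.

    alpha=0.05 is an assumed level; the section does not state the one used.
    """
    p_value = binom_sf(hits, trials, baseline)
    return p_value, p_value < alpha / n_tests


# Hypothetical counts, not the study's data.
for hits in (110, 140):
    p_value, passed = is_significant(hits, trials=1000)
    print(f"{hits}/1000 hits: p = {p_value:.2e}, significant = {passed}")
```

Dividing the significance level by the number of tests is the simplest (and most conservative) correction: with 36 models, each one has to clear a threshold of about 0.0014 instead of 0.05.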

Return on Investment (ROI)

Simulates real-world financial outcomes to assess practical impact.

Results Overview

  • Best hit rate: 11.55%
  • Random baseline: 10%
  • Lift: 1.154
  • Statistical significance: p < 10⁻⁶

The models demonstrate measurable improvement over random selection, indicating the presence of detectable statistical signals.

Model Performance Insights

Positional Digit Analysis

  • Consistent performance across time windows
  • Suggests minor structural variation in digit distribution

Composite Scoring

  • Highest peak performance
  • Combines multiple signals effectively

Short-Term Models (15–30 days)

  • Stronger performance than long-term models
  • Indicates signals may be time-sensitive

Recency-Based Models

  • Based on recent occurrence patterns
  • Did not show consistent improvement

Observed signals vary in strength and consistency depending on the modeling approach and time window.

Statistical Significance

  • 17 out of 36 models achieved statistically significant results
  • All positional digit models were significant
  • Several composite models passed strict thresholds

This supports the presence of weak but measurable signals within the system.

ROI Analysis

  • ROI range: −54% to −85%

Example:

  • Random expected return: ~₹20 per ₹50 ticket
  • Best model return: ~₹23 per ₹50 ticket

Machine learning models can improve selection efficiency within the dataset, but overall financial outcomes are influenced by the structural design of the lottery system, including prize distribution and payout mechanisms.
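
The example figures above translate into ROI as (return − cost) / cost. The short sketch below works through that arithmetic using the approximate returns and the ₹50 ticket price quoted in this section; the function name is hypothetical, and the formula is the standard ROI definition rather than a confirmed detail of the study's simulation.

```python
def roi(expected_return: float, ticket_price: float = 50.0) -> float:
    """Return on investment as a fraction: (return - cost) / cost."""
    return (expected_return - ticket_price) / ticket_price


# Approximate per-ticket returns quoted in the example above (in rupees).
for label, expected_return in [("random", 20.0), ("best model", 23.0)]:
    print(f"{label}: ROI = {roi(expected_return):.0%}")
```

On these figures, random selection comes out at roughly −60% and the best model at roughly −54%, which matches the better end of the reported ROI range: the model loses less per ticket, but it still loses.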

Lottery participation is not structured as an investment approach aimed at generating consistent returns like trading or financial assets. It is more appropriately viewed as an entertainment-based activity, where outcomes are governed by chance rather than predictable return patterns.

From a practical perspective, participation is best approached with moderation, using only discretionary or excess funds allocated for entertainment purposes. This study does not promote or encourage gambling behavior, but rather provides a structured evaluation of machine learning performance within such systems.

Understanding Practical Constraints

Prize Distribution

High-value prizes are rare, while most wins are lower-value.

Execution Constraints

Practical limitations affect how strategies can be applied in real-world scenarios.

System Structure

Lottery systems are designed with predefined payout structures that influence outcomes.

Key Insight: Detectable vs Practical Signal

A statistically detectable signal does not always translate directly into practical advantage.
  • Signals can be identified and measured
  • Their real-world impact depends on multiple external factors

Final Takeaway

  • Machine learning can outperform random selection under controlled evaluation
  • Performance improvements are measurable and statistically valid
  • Real-world outcomes depend on system-level constraints

This study highlights the difference between statistical insight and practical application in near-random systems.

💬 Have thoughts or feedback? Message me on Instagram @iamniteeshk

📺 Watch more insights on my YouTube channel @iamnkcom