Methodology: How Machine Learning Was Applied to Lottery Data
This section explains how machine learning techniques were applied to a system designed to be random. The objective is to evaluate whether any measurable signal exists within the data using structured and controlled analysis.
Problem Definition
The problem is defined as follows:
- Input: Historical lottery draw results
- Output: A ranked list of possible number endings (0000–9999)
- Goal: Identify whether top-ranked numbers show meaningful deviation from random expectation
Each prediction is generated using only past data. No future information is included at any stage of the process.
Data Description
The study uses publicly available Kerala State Lottery results.
- Outcome space: 0000 to 9999 (10,000 possible endings)
- Daily draw system
- Multiple prize tiers
- Winning numbers determined primarily by last four digits
Under ideal conditions, all outcomes are expected to follow a uniform distribution.
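To make the uniform expectation concrete, here is a minimal sketch of the random baseline. The function name and the assumption that winning endings within one draw are distinct are mine, not part of the study's code.

```python
import math

OUTCOME_SPACE = 10_000  # endings 0000 to 9999


def random_hit_probability(k: int, winners_per_draw: int = 1) -> float:
    """Chance that at least one winning ending lands in k random picks.

    Assumes every ending is equally likely and that the winning endings
    in a single draw are distinct (an illustrative assumption).
    """
    misses = math.comb(OUTCOME_SPACE - k, winners_per_draw)
    total = math.comb(OUTCOME_SPACE, winners_per_draw)
    return 1.0 - misses / total
```

With one winning ending per draw this reduces to k / 10,000, so 100 random picks would be expected to hit about 1% of the time.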
Algorithm Families
Six different algorithmic approaches were used to capture potential patterns:
1. Positional Digit Analysis
Analyzes the frequency of digits in each position of the four-digit ending (thousands, hundreds, tens, ones) and combines them to generate rankings.
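A rough sketch of this idea is shown below. It counts digits per position across the four digits of each ending and multiplies the smoothed per-position frequencies. The multiplication rule, the add-one smoothing, and the name `positional_scores` are assumptions for illustration, since the exact combination method is not specified here.

```python
from collections import Counter


def positional_scores(history: list[str]) -> dict[str, float]:
    """Score every ending 0000-9999 from per-position digit frequencies.

    `history` is a flat list of past four-digit endings (hypothetical input shape).
    """
    counts = [Counter() for _ in range(4)]
    for ending in history:
        for pos, digit in enumerate(ending):
            counts[pos][digit] += 1

    total = len(history)
    scores = {}
    for n in range(10_000):
        ending = f"{n:04d}"
        score = 1.0
        for pos, digit in enumerate(ending):
            # Add-one smoothing so unseen digits do not zero out the score.
            score *= (counts[pos][digit] + 1) / (total + 10)
        scores[ending] = score
    return scores
```

Because each position's smoothed frequencies sum to 1, the scores over all 10,000 endings also sum to 1, which makes them easy to compare against the uniform value of 1/10,000.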
2. Frequency-Based Scoring
Ranks numbers based on how frequently they appeared in recent history, testing short-term distribution patterns.
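As an illustration, frequency scoring over a recent window might look like the sketch below. The input shape (a list of draws, oldest first, each draw being a list of winning endings) and the function name are assumptions.

```python
from collections import Counter


def frequency_scores(draws: list[list[str]], window: int) -> dict[str, float]:
    """Relative frequency of each ending within the last `window` draws.

    Endings that never appeared in the window are omitted (implicit score 0).
    """
    recent = draws[-window:] if window > 0 else []
    counts = Counter(ending for draw in recent for ending in draw)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ending: count / total for ending, count in counts.items()}
```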
3. Gaussian Mixture Model (GMM)
Uses probabilistic clustering to identify structural patterns in higher-tier winning numbers.
4. Recency-Based Ranking
Assigns higher importance to numbers based on their recent occurrence patterns.
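One possible form of recency weighting is exponential decay, sketched below. The decay rule, the `half_life` parameter, and the choice to give more weight to more recent appearances are all assumptions; the study may weight recency differently.

```python
def recency_scores(draws: list[list[str]], half_life: float = 30.0) -> dict[str, float]:
    """Sum of decayed weights for each ending; draws are ordered oldest first.

    An appearance in the latest draw has weight 1.0, and the weight halves
    every `half_life` draws further back (illustrative assumption).
    """
    scores: dict[str, float] = {}
    last_index = len(draws) - 1
    for index, draw in enumerate(draws):
        age = last_index - index
        weight = 0.5 ** (age / half_life)
        for ending in draw:
            scores[ending] = scores.get(ending, 0.0) + weight
    return scores
```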
5. Composite Scoring
Combines multiple signals such as frequency, structure, and positional data into a unified scoring model.
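A simple way to picture a composite score is a weighted sum of normalized signals, as in the sketch below. The min-max normalization, the weights, and the signal names are hypothetical choices made for the example.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores into [0, 1]; constant inputs map to 0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {ending: 0.0 for ending in scores}
    return {ending: (value - low) / (high - low) for ending, value in scores.items()}


def composite_scores(
    signals: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> dict[str, float]:
    """Weighted sum of normalized signals; missing weights count as 0."""
    combined: dict[str, float] = {}
    for name, scores in signals.items():
        weight = weights.get(name, 0.0)
        for ending, value in normalize(scores).items():
            combined[ending] = combined.get(ending, 0.0) + weight * value
    return combined
```

Normalizing first keeps one signal with a large numeric range from dominating the others.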
6. Ensemble Method
Integrates outputs from multiple models using rank aggregation techniques to improve stability.
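Rank aggregation can be done in several ways. The sketch below uses a Borda count as one common example; it is an assumption that this resembles the aggregation used here, and the function name is illustrative.

```python
def borda_aggregate(rankings: list[list[str]]) -> list[str]:
    """Combine several rankings (best first) into one using Borda points.

    In a ranking of length n, the first item earns n points, the second
    n - 1, and so on. Ties are broken by the ending's string order.
    """
    points: dict[str, int] = {}
    for ranking in rankings:
        size = len(ranking)
        for position, ending in enumerate(ranking):
            points[ending] = points.get(ending, 0) + (size - position)
    return sorted(points, key=lambda ending: (-points[ending], ending))
```

Working on ranks rather than raw scores means models with very different score scales can be combined without any rescaling.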
Time Window Strategy
Each algorithm was tested across multiple historical windows:
- 15 draws
- 30 draws
- 60 draws
- 90 draws
- 180 draws
- 365 draws
This enables comparison between short-term and long-term pattern behavior.
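The windowing step can be sketched as slicing the draws that come immediately before the draw being predicted. Skipping windows that do not yet have enough history is an assumption, as are the names used.

```python
WINDOWS = (15, 30, 60, 90, 180, 365)


def trailing_windows(draws: list, target_index: int, windows=WINDOWS) -> dict[int, list]:
    """Map each window size to the draws just before `target_index`.

    The draw at `target_index` itself is never included, and window sizes
    larger than the available history are skipped.
    """
    result = {}
    for size in windows:
        if target_index >= size:
            result[size] = draws[target_index - size:target_index]
    return result
```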
Ranking Process
Each algorithm assigns a score to all possible number endings.
- Numbers are ranked from highest to lowest score
- Top-ranked numbers are selected
- Selections are compared against actual results
Performance is evaluated relative to a random baseline.
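The three steps above can be sketched as two small helpers. The names `top_k` and `count_hits` and the tie-breaking rule are illustrative assumptions.

```python
def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring endings, ties broken by string order."""
    return sorted(scores, key=lambda ending: (-scores[ending], ending))[:k]


def count_hits(picks: list[str], winners: list[str]) -> int:
    """Number of selected endings that match an actual winning ending."""
    return len(set(picks) & set(winners))
```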
Evaluation Design
The evaluation framework includes:
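- Hit rate (how often selected numbers matched actual results, compared with random selection)
- Lift (observed hit rate relative to the random baseline)
- Statistical significance (whether the observed hits could plausibly be explained by chance)
- Return on investment (ROI)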
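For reference, these four metrics can be computed along the following lines. This is a sketch under assumptions: the significance test shown is a one-sided exact binomial test, which may differ from the test actually used, and all function names are hypothetical.

```python
import math


def hit_rate(hits: int, trials: int) -> float:
    """Fraction of trials that produced a hit."""
    return hits / trials


def lift(observed_rate: float, baseline_rate: float) -> float:
    """Observed hit rate divided by the random-baseline hit rate."""
    return observed_rate / baseline_rate


def binomial_p_value(hits: int, trials: int, p: float) -> float:
    """P(X >= hits) for X ~ Binomial(trials, p), computed exactly.

    Suitable for a few hundred trials; very large counts would need a
    numerically safer method.
    """
    return sum(
        math.comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(hits, trials + 1)
    )


def roi(total_winnings: float, total_cost: float) -> float:
    """Net return as a fraction of the amount spent."""
    return (total_winnings - total_cost) / total_cost
```

A lift close to 1 together with a large p-value is what a truly uniform system would be expected to produce.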
Full evaluation results are available here:
Key Constraint: No Data Leakage
Every prediction is generated using only draws that occurred before the draw being predicted. This ensures that the evaluation reflects real predictive capability under practical conditions.
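One common way to enforce a no-leakage constraint is a walk-forward loop, where the scoring function only ever sees draws that precede the one being predicted. The sketch below is an assumed structure, not the study's actual code; `score_fn`, `k`, and `min_history` are hypothetical parameters.

```python
from typing import Callable


def walk_forward(
    draws: list[list[str]],
    score_fn: Callable[[list[list[str]]], dict[str, float]],
    k: int,
    min_history: int,
) -> list[int]:
    """Return the number of hits for each predicted draw, oldest first."""
    hits_per_draw = []
    for t in range(min_history, len(draws)):
        past = draws[:t]  # strictly earlier draws; draw t is never included
        scores = score_fn(past)
        picks = sorted(scores, key=lambda e: (-scores[e], e))[:k]
        hits_per_draw.append(len(set(picks) & set(draws[t])))
    return hits_per_draw
```

Keeping the slice `draws[:t]` as the only data passed to `score_fn` makes the constraint easy to audit in one place.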
Final Note
This methodology is designed to explore the limits of machine learning in near-random systems. The focus is on structured analysis and understanding how statistical signals behave under real-world constraints.

