

To access this data set, install the lottodata package from GitHub and type jackpot_size into R.


The variables included in the data set:

Variable      Description                                                         Type of variable
zip_code      The first 3 characters of the postal code (geographical region)     string
start_date    The start of the sales date (year-month-day format)                 date
end_date      The end of the sales date (year-month-day format)                   date
game          The specific lottery game (one of: Lotto Max, Lotto 649, Lottario)  string
ticket_sales  Number of tickets sold                                              integer
net_sales     The total CAD dollar amount of sales                                integer
jackpot_size  The jackpot size in CAD dollars                                     integer
year          Year                                                                integer
month         Month                                                               integer
day           Day                                                                 integer


Suppose you want to look more closely at the game Lottario in 2012:

library(dplyr)

# Which Lottario sales were recorded in zone M1B in 2012? (first six rows)
jackpot_size %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  head()
#> # A tibble: 6 x 10
#>   zip_code start_date end_date   game  ticket_sales net_sales jackpot_size  year
#>   <chr>    <date>     <date>     <chr>        <dbl>     <dbl>        <dbl> <dbl>
#> 1 M1B      2012-01-01 2012-01-07 Lott…           75       233       730000  2012
#> 2 M1B      2012-01-02 2012-01-07 Lott…          167       516       730000  2012
#> 3 M1B      2012-01-03 2012-01-07 Lott…          168       466       730000  2012
#> 4 M1B      2012-01-04 2012-01-07 Lott…          274       841       730000  2012
#> 5 M1B      2012-01-05 2012-01-07 Lott…          195       558       730000  2012
#> 6 M1B      2012-01-06 2012-01-07 Lott…          451      1469       730000  2012
#> # … with 2 more variables: month <dbl>, day <int>
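
To get a single total for Lottario in zone M1B in 2012 rather than individual rows, the filtered rows can be summarised. The following is a minimal sketch, not the package's own method. It uses a small stand-in data frame, `sample_sales` (a hypothetical name), built from the six rows printed above so that it runs without the package installed. It also assumes that summing `net_sales` across rows is a sensible measure of spending, which depends on how the rows are defined.

```r
library(dplyr)

# Stand-in data: only the six rows printed above, not the full 2012 data.
sample_sales <- data.frame(
  zip_code     = "M1B",
  game         = "Lottario",
  year         = 2012,
  ticket_sales = c(75, 167, 168, 274, 195, 451),
  net_sales    = c(233, 516, 466, 841, 558, 1469)
)

# Filter to the game, zone and year of interest, then total the sales columns.
yearly_spending <- sample_sales %>%
  filter(year == 2012, game == "Lottario", zip_code == "M1B") %>%
  group_by(zip_code, game, year) %>%
  summarise(
    total_tickets   = sum(ticket_sales),
    total_net_sales = sum(net_sales),
    .groups = "drop"
  )

yearly_spending
```

With the package loaded, replacing `sample_sales` with `jackpot_size` would apply the same steps to the full data set.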

ggplot2 example


library(dplyr)
library(ggplot2)

jackpot_data <- jackpot_size
jackpot_plot <- jackpot_data %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  ggplot(aes(day, ticket_sales, fill = as.factor(month))) +
  geom_col() +
  facet_wrap(~month, labeller = labeller(month = 
                                           c("1" = "January", "2" = "February", "3" = "March", "4" = "April", "5" = "May",
                                             "6" = "June", "7" = "July", "8" = "August", "9" = "September", "10" = "October",
                                             "11" = "November", "12" = "December"))) +
  labs(x = "Days", y = "# of tickets sold", title = "Lottario ticket sales in 2012") +
  theme(legend.position = "none") +
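  # NOTE: the source is cut off after the sixth colour; the remaining six are
  # assumed to be the rest of the 12-colour ColorBrewer "Paired" palette.
  scale_fill_manual(values = c("#a6cee3","#1f78b4","#b2df8a","#33a02c","#fb9a99","#e31a1c",
                               "#fdbf6f","#ff7f00","#cab2d6","#6a3d9a","#ffff99","#b15928"))

# Print the plot
jackpot_plot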
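
The month lookup passed to labeller() does not have to be typed out by hand. As a small sketch, the same named vector can be built from R's built-in month.name constant. The variable name `month_labels` is an assumption made for illustration.

```r
# month.name is a built-in R constant holding the twelve English month names.
# Name each entry by its month number so it can serve as a lookup vector.
month_labels <- setNames(month.name, as.character(1:12))

month_labels[c("1", "12")]
```

The result could then be passed as labeller(month = month_labels) in the facet_wrap() call.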


Example exploratory data analysis:

# EDA via base R

jackpot_eda <- function(x){
  hist(x, col = rainbow(30))
  data.frame(min = min(x),
             median = median(x),
             mean = mean(x),
             max = max(x),
             sd = sd(x),
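             range = max(x) - min(x))
}

# Call jackpot_eda() on a numeric column of jackpot_size;
# the summary below is the output of one such call.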
#>   min median     mean   max       sd range
#> 1   1    217 485.0569 17885 729.9844 17884
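
The same summary can be produced for several numeric columns at once. The sketch below is an illustration only. It assumes a helper named `summary_stats` (hypothetical, the same statistics as above without the histogram) and a stand-in data frame, `sample_sales`, holding two columns from the six rows printed earlier rather than the full data set.

```r
# Hypothetical helper: the same summary statistics, without drawing a histogram.
summary_stats <- function(x) {
  data.frame(min = min(x), median = median(x), mean = mean(x),
             max = max(x), sd = sd(x), range = max(x) - min(x))
}

# Stand-in data: two columns from the six rows printed earlier.
sample_sales <- data.frame(
  ticket_sales = c(75, 167, 168, 274, 195, 451),
  net_sales    = c(233, 516, 466, 841, 558, 1469)
)

# Apply the helper to each column and stack the results, one row per column.
eda_table <- do.call(rbind, lapply(sample_sales, summary_stats))
eda_table
```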