lottodata is an R data package designed to house data sets that can be easily accessible for me and everyone else working on the project. Check out this shiny app that uses the jackpot_size data set from this package.

Installation

You can install lottodata via GitHub:

devtools::install_github("andr3wli/lottodata")

Data

The package currently contains the following data sets:

Date set name Source About Description Size
jackpot_size Open Source Framework 430,579 rows &
10 columns
Jackpot size ($) and lotto ticket sales 1.3 MB
lotto_demographics Open Source Framework 96 rows &
7 coloumns
Demographic information about residents in
Ontario, Canada
5.4 MB

jackpot_size

The variables included in the data set:

Variable Description Type of variable
zip_code The first 3 digits of postal code (geographical region) string
start_date The start of the sales date (year-month-day format) date
end_date The end of the sales date (year-month-day format) date
game The specific lottery game (one of: Lotto Max, Lotto 649, Lottario) string
ticket_sales Number of tickets sold integer
net_sales The total cad dollar amount of sales integer
jackpot_size The jackpot size in cad dollars integer
year Year integer
month Month integer
day Day integer

lotto demographics

The variables included in the data set:

Variable Description Type of variable
zip_code The first 3 digits of postal code (geographical region) string
geo_id Geography ID integer
income Per capita income levels integer
education Highest completed level of education for the population float
mbsa Proportion of time spent in white collar employment. White collar
employment is defined as the proportion of residents aged 15 or
greater employed in management, business finance and administration,
health, education, law, social community and government services,
art, culture, natural and applied sciences and related occupations,
according to the National Occupational Classification
float
ses SES was calculated via takling the sum of the Z-scores of it’s
per-capita income, years of education, and proportion of white-collar
workers
float
description Describes where the location is in natural language string

Example

Suppose you want to look at the game Lottario more in 2014:

library(lottodata)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# What is the yearly spending for the Lottario in zone M1B in 2012?
jackpot_size %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  head()
#> # A tibble: 6 x 10
#>   zip_code start_date end_date   game  ticket_sales net_sales jackpot_size  year
#>   <chr>    <date>     <date>     <chr>        <dbl>     <dbl>        <dbl> <dbl>
#> 1 M1B      2012-01-01 2012-01-07 Lott…           75       233       730000  2012
#> 2 M1B      2012-01-02 2012-01-07 Lott…          167       516       730000  2012
#> 3 M1B      2012-01-03 2012-01-07 Lott…          168       466       730000  2012
#> 4 M1B      2012-01-04 2012-01-07 Lott…          274       841       730000  2012
#> 5 M1B      2012-01-05 2012-01-07 Lott…          195       558       730000  2012
#> 6 M1B      2012-01-06 2012-01-07 Lott…          451      1469       730000  2012
#> # … with 2 more variables: month <dbl>, day <int>

ggplot2 example

library(ggplot2)
theme_set(theme_classic())

jackpot_data <- jackpot_size
jackpot_plot <- jackpot_data %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  ggplot(aes(day, ticket_sales, fill = as.factor(month))) +
  geom_col() +
  facet_wrap(~month, labeller = labeller(month = 
                                           c("1" = "January", "2" = "February", "3" = "March", "4" = "April", "5" = "May",
                                             "6" = "June", "7" = "July", "8" = "August", "9" = "September", "10" = "October",
                                             "11" = "November", "12" = "December"))) +
  labs(x = "Days", y = "# of tickets sold", title = "Lottario ticket salees in 2012") +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#a6cee3","#1f78b4","#b2df8a","#33a02c","#fb9a99","#e31a1c",
                               "#fdbf6f","#ff7f00","#cab2d6","#6a3d9a","#ffff99","#b15928"))

jackpot_plot

Example exploratory data analysis:

# EDA via base R

jackpot_eda <- function(x){
  hist(x, col = rainbow(30))
  plot(x)
  plot(density(x))
  data.frame(min = min(x),
             median = median(x),
             mean = mean(x),
             max = max(x),
             sd = sd(x),
             range =max(x) - min(x) )
}

jackpot_eda(jackpot_size$ticket_sales)

#>   min median     mean   max       sd range
#> 1   1    217 485.0569 17885 729.9844 17884

Acknowledgement

We thank Dr. Ross Otto from Mcgill University for sharing these data sets on Open Source Framework. This project is being conducted with Dr. Luke Clark at the Centre for Gambling Research at UBC.

Code of Conduct

Please note that the lottodata project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.