lottodata is an R data package that houses data sets so they are easily accessible to me and everyone else working on the project. Check out this Shiny app that uses the jackpot_size data set from this package.
The package currently contains the following data sets:
| Data set name | Source | Dimensions | Description | Size |
|---|---|---|---|---|
| jackpot_size | Open Science Framework | 430,579 rows & 10 columns | Jackpot size ($) and lotto ticket sales | 1.3 MB |
| lotto_demographics | Open Science Framework | 96 rows & 7 columns | Demographic information about residents in Ontario, Canada | 5.4 MB |
jackpot_size
The variables included in the data set:
| Variable | Description | Type of variable |
|---|---|---|
| zip_code | The first 3 characters of the postal code (geographical region) | string |
| start_date | The start of the sales date (year-month-day format) | date |
| end_date | The end of the sales date (year-month-day format) | date |
| game | The specific lottery game (one of: Lotto Max, Lotto 649, Lottario) | string |
| ticket_sales | Number of tickets sold | integer |
| net_sales | The total dollar amount of sales (CAD) | integer |
| jackpot_size | The jackpot size in dollars (CAD) | integer |
| year | Year | integer |
| month | Month | integer |
| day | Day | integer |
lotto_demographics
The variables included in the data set:
| Variable | Description | Type of variable |
|---|---|---|
| zip_code | The first 3 characters of the postal code (geographical region) | string |
| geo_id | Geography ID | integer |
| income | Per-capita income level | integer |
| education | Highest completed level of education for the population | float |
| mbsa | Proportion of white-collar employment, defined as the proportion of residents aged 15 or older employed in management, business, finance and administration, health, education, law, social, community and government services, art, culture, natural and applied sciences and related occupations, according to the National Occupational Classification | float |
| ses | Socioeconomic status (SES), calculated as the sum of the z-scores of the region's per-capita income, years of education, and proportion of white-collar workers | float |
| description | Describes the location in natural language | string |
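The ses definition above (a sum of three z-scores) can be illustrated with a minimal sketch. The data frame `toy` and the helper `z_score` are hypothetical and the values are invented for illustration, not taken from lotto_demographics. The sketch also assumes the sample standard deviation (R's `sd`), which may differ from how the original authors standardized the variables.

```r
# Hypothetical toy values; these are NOT taken from lotto_demographics.
toy <- data.frame(
  zip_code  = c("A1A", "B2B", "C3C"),
  income    = c(30000, 40000, 50000),  # per-capita income
  education = c(11, 12, 13),           # years of education
  mbsa      = c(0.2, 0.3, 0.4)         # proportion of white-collar workers
)

# Standardize a numeric vector: subtract the mean, divide by the sample SD.
# (Assumption: the original SES may have used a different standardization.)
z_score <- function(x) (x - mean(x)) / sd(x)

# SES as the sum of the three z-scores, following the table description.
toy$ses <- z_score(toy$income) + z_score(toy$education) + z_score(toy$mbsa)
toy
```

Because each component is standardized before summing, income (in dollars) and mbsa (a proportion) contribute on the same scale despite their very different units.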
Suppose you want to take a closer look at the Lottario game in 2012:
library(lottodata)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
# What is the yearly spending for the Lottario in zone M1B in 2012?
jackpot_size %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  head()
#> # A tibble: 6 x 10
#>   zip_code start_date end_date   game  ticket_sales net_sales jackpot_size  year
#>   <chr>    <date>     <date>     <chr>        <dbl>     <dbl>        <dbl> <dbl>
#> 1 M1B      2012-01-01 2012-01-07 Lott…           75       233       730000  2012
#> 2 M1B      2012-01-02 2012-01-07 Lott…          167       516       730000  2012
#> 3 M1B      2012-01-03 2012-01-07 Lott…          168       466       730000  2012
#> 4 M1B      2012-01-04 2012-01-07 Lott…          274       841       730000  2012
#> 5 M1B      2012-01-05 2012-01-07 Lott…          195       558       730000  2012
#> 6 M1B      2012-01-06 2012-01-07 Lott…          451      1469       730000  2012
#> # … with 2 more variables: month <dbl>, day <int>
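The preview above only shows the first six rows. To turn the filtered rows into totals, the sales columns could be summed with `summarise()`. The sketch below retypes the six printed rows into a hypothetical data frame called `preview` so that it runs without lottodata installed. With the package loaded, `jackpot_size` would take its place and the totals would cover the whole year. It assumes each row is a distinct sales period, so that summing does not double count.

```r
library(dplyr)

# The six preview rows printed above, retyped by hand (only the columns
# needed here). `preview` is a stand-in for jackpot_size.
preview <- data.frame(
  zip_code     = "M1B",
  game         = "Lottario",
  year         = 2012,
  ticket_sales = c(75, 167, 168, 274, 195, 451),
  net_sales    = c(233, 516, 466, 841, 558, 1469)
)

# Filter to one region, game and year, then add up tickets and dollars.
totals <- preview %>%
  filter(year == 2012, game == "Lottario", zip_code == "M1B") %>%
  summarise(
    total_tickets   = sum(ticket_sales),
    total_net_sales = sum(net_sales)
  )
totals
```

For these six rows the totals cover only the first week of January 2012, not the full year.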
library(ggplot2)
theme_set(theme_classic())
jackpot_data <- jackpot_size
jackpot_plot <- jackpot_data %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  ggplot(aes(day, ticket_sales, fill = as.factor(month))) +
  geom_col() +
  facet_wrap(~month, labeller = labeller(month = 
                                           c("1" = "January", "2" = "February", "3" = "March", "4" = "April", "5" = "May",
                                             "6" = "June", "7" = "July", "8" = "August", "9" = "September", "10" = "October",
                                             "11" = "November", "12" = "December"))) +
  labs(x = "Days", y = "# of tickets sold", title = "Lottario ticket sales in 2012") +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#a6cee3","#1f78b4","#b2df8a","#33a02c","#fb9a99","#e31a1c",
                               "#fdbf6f","#ff7f00","#cab2d6","#6a3d9a","#ffff99","#b15928"))
jackpot_plot
# EDA via base R
jackpot_eda <- function(x){
  hist(x, col = rainbow(30))
  plot(x)
  plot(density(x))
  data.frame(min = min(x),
             median = median(x),
             mean = mean(x),
             max = max(x),
             sd = sd(x),
             range = max(x) - min(x))
}
jackpot_eda(jackpot_size$ticket_sales)


#>   min median     mean   max       sd range
#> 1   1    217 485.0569 17885 729.9844 17884

We thank Dr. Ross Otto from McGill University for sharing these data sets on the Open Science Framework. This project is being conducted with Dr. Luke Clark at the Centre for Gambling Research at UBC.
Please note that the lottodata project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.