jackpot_size.Rmd
library(lottodata)
To access this data set, install the lottodata package from GitHub
and type jackpot_size into R.
jackpot_size
The variables included in the data set:
Variable | Description | Type of variable |
---|---|---|
zip_code | The first 3 digits of the postal code (geographical region) | string |
start_date | The start date of sales (year-month-day format) | date |
end_date | The end date of sales (year-month-day format) | date |
game | The specific lottery game (one of: Lotto Max, Lotto 649, Lottario) | string |
ticket_sales | Number of tickets sold | integer |
net_sales | The total dollar amount of sales (CAD) | integer |
jackpot_size | The jackpot size in dollars (CAD) | integer |
year | Year | integer |
month | Month | integer |
day | Day | integer |
Suppose you want to take a closer look at the Lottario game in 2012:
library(lottodata)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# What is the yearly spending for the Lottario in zone M1B in 2012?
jackpot_size %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  head()
#> # A tibble: 6 x 10
#> zip_code start_date end_date game ticket_sales net_sales jackpot_size year
#> <chr> <date> <date> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 M1B 2012-01-01 2012-01-07 Lott… 75 233 730000 2012
#> 2 M1B 2012-01-02 2012-01-07 Lott… 167 516 730000 2012
#> 3 M1B 2012-01-03 2012-01-07 Lott… 168 466 730000 2012
#> 4 M1B 2012-01-04 2012-01-07 Lott… 274 841 730000 2012
#> 5 M1B 2012-01-05 2012-01-07 Lott… 195 558 730000 2012
#> 6 M1B 2012-01-06 2012-01-07 Lott… 451 1469 730000 2012
#> # … with 2 more variables: month <dbl>, day <int>
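The preview above lists individual sales records rather than a total. A minimal sketch of how the spending could be summed with dplyr's summarise() follows. To stay self-contained, it rebuilds only the six rows printed above in a hypothetical `m1b_sample` data frame, so the totals cover that sample and not the full year. With the package loaded, the same summarise() call could be applied to the filtered jackpot_size data instead.

```r
library(dplyr)

# Hypothetical sample: only the six M1B Lottario rows printed above,
# not the full jackpot_size data set.
m1b_sample <- data.frame(
  zip_code     = "M1B",
  game         = "Lottario",
  year         = 2012,
  ticket_sales = c(75, 167, 168, 274, 195, 451),
  net_sales    = c(233, 516, 466, 841, 558, 1469)
)

# Sum tickets sold and net sales over the filtered rows.
m1b_totals <- m1b_sample %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  summarise(
    total_tickets   = sum(ticket_sales),
    total_net_sales = sum(net_sales)
  )

m1b_totals
```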
library(ggplot2)
theme_set(theme_classic())
jackpot_data <- jackpot_size
jackpot_plot <- jackpot_data %>%
  filter(year == 2012 & game == "Lottario" & zip_code == "M1B") %>%
  ggplot(aes(day, ticket_sales, fill = as.factor(month))) +
  geom_col() +
  facet_wrap(~month, labeller = labeller(month =
    c("1" = "January", "2" = "February", "3" = "March", "4" = "April", "5" = "May",
      "6" = "June", "7" = "July", "8" = "August", "9" = "September", "10" = "October",
      "11" = "November", "12" = "December"))) +
  labs(x = "Days", y = "# of tickets sold", title = "Lottario ticket sales in 2012") +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99", "#e31a1c",
                               "#fdbf6f", "#ff7f00", "#cab2d6", "#6a3d9a", "#ffff99", "#b15928"))
jackpot_plot
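The twelve month labels do not have to be typed by hand. As a sketch, base R's built-in month.name constant can be turned into an equivalent named vector. The name `month_labels` is introduced here purely for illustration.

```r
# month.name is a base R constant holding the English month names.
# setNames() attaches "1".."12" as names, matching the values of `month`.
month_labels <- setNames(month.name, 1:12)

month_labels[c("1", "12")]

# It could then stand in for the hand-written vector in the plot:
# facet_wrap(~month, labeller = labeller(month = month_labels))
```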
# EDA via base R
jackpot_eda <- function(x) {
  hist(x, col = rainbow(30))
  plot(x)
  plot(density(x))
  data.frame(min = min(x),
             median = median(x),
             mean = mean(x),
             max = max(x),
             sd = sd(x),
             range = max(x) - min(x))
}
jackpot_eda(jackpot_size$ticket_sales)
#> min median mean max sd range
#> 1 1 217 485.0569 17885 729.9844 17884
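In the summary above, the mean (485.06) is more than twice the median (217) and the maximum is 17885, which points to a strongly right-skewed distribution. One common way to inspect such data is on a log scale. The sketch below uses simulated values in a hypothetical `sales_sim` vector rather than the real ticket_sales column. Since the minimum reported above is 1, log10() of the real column would also be defined.

```r
set.seed(1)

# Simulated right-skewed counts standing in for ticket_sales (an assumption,
# not the real data); "+ 1" keeps every value at or above 1.
sales_sim <- round(rlnorm(1000, meanlog = 5.4, sdlog = 1)) + 1

# Right skew shows up as a mean well above the median.
c(median = median(sales_sim), mean = mean(sales_sim))

# On the log10 scale the bulk of the distribution is easier to see.
hist(log10(sales_sim),
     main = "Simulated sales (log10 scale)",
     xlab = "log10(tickets sold)")
```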