What kind of data have we seen?
- Census
- Covers everyone (except people who don’t respond, and non-response could bias the results)
- Survey of individuals about themselves
- American Community Survey
- Sample of people - how representative is this?
- Individual data
- Crime
- Police reports (how representative is this of all crime?)
- Unclear how well reports reflect actual crime, since not every report leads to a conviction
- Recidivism
- Lots of data (though not all)
- Lots of factors
- Factors relate both to individuals and to proxies (e.g. neighborhood)
- Flu
- CDC data - sample of hospitals and doctors
- Google data - a proxy that correlates with flu, but a lot of data!
- Unemployment
- Government labor data - sample of households
- potential proxy: Google searches that correlate with unemployment
- not sure the information provided would accurately answer the questions of interest
- e.g. young people may or may not work, but that’s different from adults being out of work
What do you see in the unemployment data?
- unemployment is high in the winter (seasonal patterns)
- state unemployment tends to track national levels (though the numbers are different)
- unemployment was high in 2008-10, then decreased
What does it mean to sample?
- surveying the complete population is too expensive
- larger samples are more accurate (the unemployment survey covers 110k people)
- average multiple samples
- sample across multiple populations to get a sense of the entire population
- unemployment surveys use 800 of 2000 sampling units in the US, chosen to reflect rural/urban, industrial/farming, etc.
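
The points about sample size and averaging can be checked with a small simulation. This is only a sketch on synthetic data: the population of 100,000 people, the 6% unemployment rate, the sample sizes, and all function names are assumptions made for illustration, not figures from the actual unemployment survey.

```python
import random
import statistics


def simulate_population(size, unemployment_rate, seed=0):
    """Build a synthetic population: 1 = unemployed, 0 = employed (made-up data)."""
    rng = random.Random(seed)
    return [1 if rng.random() < unemployment_rate else 0 for _ in range(size)]


def sample_estimate(population, sample_size, rng):
    """Estimate the unemployment rate from one random sample (no replacement)."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


def typical_error(population, sample_size, trials, seed=1):
    """Average absolute error of the estimate over many repeated samples."""
    rng = random.Random(seed)
    true_rate = sum(population) / len(population)
    errors = [
        abs(sample_estimate(population, sample_size, rng) - true_rate)
        for _ in range(trials)
    ]
    return statistics.mean(errors)


# Hypothetical population and rate, chosen only for the demonstration.
population = simulate_population(100_000, 0.06)
true_rate = sum(population) / len(population)

# Larger samples tend to land closer to the true rate.
small_error = typical_error(population, sample_size=100, trials=200)
large_error = typical_error(population, sample_size=10_000, trials=200)

# Averaging many small samples also pulls the estimate toward the true rate.
rng = random.Random(2)
averaged = statistics.mean(
    sample_estimate(population, 100, rng) for _ in range(50)
)

print(f"true rate:                {true_rate:.4f}")
print(f"typical error, n=100:     {small_error:.4f}")
print(f"typical error, n=10,000:  {large_error:.4f}")
print(f"average of 50 small ones: {averaged:.4f}")
```

The simulation draws simple random samples from one pool. It does not model the rural/urban or industrial/farming sampling units mentioned above, which would need a stratified design.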