Book narratives

  • track sentiment through the course of a book (much as with Twitter and other text data)
  • consistent story arcs emerge from the data
  • see slides
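
A minimal sketch of how sentiment could be tracked through a text, assuming a word-level sentiment lexicon and a sliding window of words. The lexicon, the window size, and the sample sentence are all invented for illustration; the analysis in the slides may well use a different scoring method.

```python
# Hypothetical toy lexicon; a real analysis would use a large scored word list.
LEXICON = {"happy": 1.0, "love": 1.0, "joy": 1.0,
           "sad": -1.0, "death": -1.0, "fear": -1.0}

def sentiment_arc(text, window=4, step=2):
    """Return the mean lexicon score for each sliding window of words."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    arc = []
    for start in range(0, max(len(words) - window, 0) + 1, step):
        chunk = words[start:start + window]
        scores = [LEXICON[w] for w in chunk if w in LEXICON]
        # Windows with no scored words are treated as neutral (0.0).
        arc.append(sum(scores) / len(scores) if scores else 0.0)
    return arc

# Invented sample text: starts positive, ends negative.
sample = "Joy and love filled the house. Then fear came, and death, and all were sad."
arc = sentiment_arc(sample)
print(arc)
```

Plotting the returned list against window position gives the shape of the arc; with a full-length book the window would be thousands of words rather than four.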

Machine learning in book data (words)

  • history of efforts to measure the ratio of male- to female-centered books
  • understanding gender bias requires intensive categorization / reading
  • use 200 categorized books, and features of those books, to identify which features matter most in predicting whether a book is male- or female-centered
  • selecting important features (from a long list) is the same problem as selecting important predictors in marketing (i.e., who will buy) or in recidivism (i.e., who will reoffend)
    • see slides
  • test the model on a separate set of books and check how often it is right
  • predict whether each remaining book is male- or female-centered
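
The workflow above (rank features, fit on labeled books, test on held-out books) can be sketched roughly as follows. Everything here is an assumption made for illustration: the books are synthetic, the feature names are invented, and the method (a simple gap-between-class-means ranking followed by a nearest-centroid classifier) is a stand-in, not necessarily the model used in the slides.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def make_book(label):
    """Synthetic stand-in for one hand-categorized book (feature names are invented)."""
    he = rng.gauss(0.7 if label == "male" else 0.3, 0.1)
    features = {
        "share_he_pronouns": he,
        "share_she_pronouns": 1.0 - he,
        "chapter_count": rng.gauss(20, 5),     # noise: unrelated to the label
        "avg_sentence_len": rng.gauss(15, 3),  # noise: unrelated to the label
    }
    return features, label

def make_set(n):
    return [make_book("male" if i % 2 == 0 else "female") for i in range(n)]

train, test = make_set(200), make_set(100)

def class_means(books, names):
    """Mean of each named feature within each label."""
    return {lab: {f: mean([x[f] for x, y in books if y == lab]) for f in names}
            for lab in ("male", "female")}

def importance(books, name):
    """Gap between the two class means, in units of the feature's overall spread."""
    m = class_means(books, [name])
    spread = pstdev([x[name] for x, _ in books])
    return abs(m["male"][name] - m["female"][name]) / spread

# Rank all features on the training books and keep the two strongest.
names = list(train[0][0])
selected = sorted(names, key=lambda f: importance(train, f), reverse=True)[:2]
centroids = class_means(train, selected)

def predict(x):
    """Assign the label whose training centroid is closest on the selected features."""
    def dist(lab):
        return sum((x[f] - centroids[lab][f]) ** 2 for f in selected)
    return min(centroids, key=dist)

# Examine success on books the model has not seen.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(selected, accuracy)
```

Once accuracy on the held-out set looks acceptable, the same `predict` function would be applied to the remaining uncategorized books.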

Grouping Sherlock Holmes books (see the app)

  • without you providing a list of features, ML groups books based on word frequency
  • this is unsupervised ML (in contrast to supervised ML, where features are matched against known outcomes)
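
A minimal sketch of grouping texts by word frequency with no labels supplied. It assumes cosine similarity between word-count vectors and average-linkage agglomerative clustering; the app may use a different algorithm. The four "stories" are invented one-liners, not real Sherlock Holmes text, and the stop-word list is a hypothetical toy.

```python
from collections import Counter
from math import sqrt

STOP_WORDS = {"the", "a", "and", "on", "in", "at", "was"}  # hypothetical toy list

def word_freq(text):
    """Count the words in a text, ignoring a few very common ones."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def group(texts, k):
    """Merge the two most similar groups repeatedly until k groups remain."""
    vectors = {name: word_freq(t) for name, t in texts.items()}
    clusters = [[name] for name in texts]

    def similarity(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = max(candidates, key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

# Invented sample texts for illustration only.
stories = {
    "story_a": "the hound howled on the moor and the hound ran",
    "story_b": "a hound on the moor howled at night",
    "story_c": "the stolen jewel was hidden in the goose",
    "story_d": "a jewel thief hid the jewel in a goose",
}
groups = group(stories, k=2)
print(groups)
```

Note that no feature list or outcome is passed in: the only input is the text, and the groups fall out of shared vocabulary.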

Where are you from? (see the app)

  • pretty accurate
  • based on multiple questions, each of which shifts the probability of assigning you to a location
  • questions may differ for different people as particular sets of questions indicate particular locations
  • each question carries some information; no single answer is diagnostic
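
One way to picture how several weakly informative answers combine is Bayesian updating: start with equal probability for every location and reweight after each answer. This is a sketch under that assumption, not a description of how the app actually works. The regions, questions, and probabilities below are all invented.

```python
REGIONS = ["north", "south", "west"]  # hypothetical locations

# Invented values of P(answer | region) for two hypothetical questions.
LIKELIHOOD = {
    ("soft_drink", "pop"):      {"north": 0.6, "south": 0.1, "west": 0.3},
    ("soft_drink", "coke"):     {"north": 0.1, "south": 0.7, "west": 0.1},
    ("soft_drink", "soda"):     {"north": 0.3, "south": 0.2, "west": 0.6},
    ("you_plural", "y'all"):    {"north": 0.1, "south": 0.8, "west": 0.2},
    ("you_plural", "you guys"): {"north": 0.9, "south": 0.2, "west": 0.8},
}

def update(posterior, question, answer):
    """Reweight each region by how likely this answer is there, then renormalize."""
    weighted = {r: posterior[r] * LIKELIHOOD[(question, answer)][r] for r in posterior}
    total = sum(weighted.values())
    return {r: p / total for r, p in weighted.items()}

# Start with no information: every region equally likely.
posterior = {r: 1 / len(REGIONS) for r in REGIONS}
for question, answer in [("soft_drink", "coke"), ("you_plural", "y'all")]:
    posterior = update(posterior, question, answer)

best = max(posterior, key=posterior.get)
print(best, posterior)
```

Neither answer settles the question alone, but together they concentrate most of the probability on one region. Choosing the next question according to the current probabilities is one way different people could end up with different question sets.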

Lyrics (see interactive site)

  • some lyric themes are timeless (e.g., love)
  • repetition is increasing
  • number of words is decreasing
  • vulgarity is increasing
  • word use reflects society
  • ML can be used to develop a model to predict genre from lyrics
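
Claims such as "repetition is increasing" and "number of words is decreasing" need a way to measure them. A rough sketch, assuming two common proxies: the share of distinct words, and how well the text compresses (repetitive text compresses better). The interactive site may use a different metric, and both sample lyrics are invented.

```python
import zlib

def repetition_stats(lyrics):
    """Return word count plus two rough repetition proxies (lower share = more repetitive)."""
    words = lyrics.lower().split()
    raw = lyrics.encode("utf-8")
    return {
        "word_count": len(words),
        "unique_share": len(set(words)) / len(words),
        "compressed_share": len(zlib.compress(raw)) / len(raw),
    }

# Invented lyrics for illustration only.
repetitive = "baby baby baby oh\n" * 12
varied = (
    "the river bends past granite hills\n"
    "while lanterns flicker over quiet docks\n"
    "and strangers trade their winter stories\n"
)

rep_stats = repetition_stats(repetitive)
var_stats = repetition_stats(varied)
print(rep_stats)
print(var_stats)
```

Computing these numbers per song and averaging by release year would give the kind of trend line described above; the same per-song numbers could also serve as input features for a genre model.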

How we find music has changed

  • the music you listen to is now the result of targeting (e.g., Spotify suggestions) rather than general play