Data
Please pick one of these datasets as the primary source data for your team project.
Dataset 1: Yelp Reviews
About
Yelp is a widely used platform for discovering local businesses, including restaurants, shops, services, and more. It hosts a vast collection of user-generated reviews, photos, and detailed business information. The Yelp dataset is a curated subset of this data, designed to support academic research and educational purposes.
The Yelp dataset can be used to explore a range of research questions and practical applications, such as:
- What factors influence the ratings and reviews that businesses receive?
- Can we predict business success based on review sentiment?
- What are the key characteristics of businesses in different metropolitan areas?
- How do visual elements in user-uploaded photos relate to review sentiment or business ratings?
- What are the trends in customer feedback across different industries and regions?
- How can mobile apps leverage user-generated content to enhance local business discovery?
- What are the characteristics of businesses that receive a high volume of tips?
- What role do tips play in influencing other users’ perceptions and decisions?
- What role do review photos play in influencing user decisions and conversion rates?
Download the data
Multiple datasets are available for download.
- Visit the Yelp Dataset documentation page to download the data in JSON format.
- To convert the JSON files to CSV format, follow these steps:
  - Obtain the conversion script by downloading or copying the code from this Gist.
  - Save the script as json_to_csv_converter.py in your working directory.
  - Run the script from your terminal with the following command:
    python json_to_csv_converter.py yelp_academic_dataset.json
    Replace yelp_academic_dataset.json with the path to the JSON file you wish to convert.
The script will generate a CSV file with the same name as your input JSON file, located in the same directory. For example, the input yelp_academic_dataset.json will produce yelp_academic_dataset.csv.
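If you would rather not depend on the Gist, the conversion can be approximated with the Python standard library. The sketch below is not the Gist's script. It assumes each Yelp file stores one JSON object per line, and it flattens nested objects into dotted column names. The function names flatten and json_lines_to_csv and the dotted-key convention are illustrative choices, not part of the Yelp documentation.

```python
import csv
import json
from pathlib import Path


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def json_lines_to_csv(json_path):
    """Write a CSV next to `json_path` and return the CSV path.

    Assumes the input holds one JSON object per line.
    """
    json_path = Path(json_path)
    csv_path = json_path.with_suffix(".csv")

    # First pass: collect every column name, in order of first appearance.
    columns, seen = [], set()
    with json_path.open(encoding="utf-8") as src:
        for line in src:
            if line.strip():
                for key in flatten(json.loads(line)):
                    if key not in seen:
                        seen.add(key)
                        columns.append(key)

    # Second pass: stream rows to disk so large files are never held in memory.
    with json_path.open(encoding="utf-8") as src, \
            csv_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(flatten(json.loads(line)))
    return csv_path
```

Calling json_lines_to_csv("yelp_academic_dataset.json") would write yelp_academic_dataset.csv in the same directory, mirroring the naming described above. Reading the file twice costs some time, but it keeps memory use low on the larger review files.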
Dataset 2: IMDb
About
IMDb (Internet Movie Database) is a go-to online platform for information about movies, TV shows, actors, directors, and more. It offers details like titles, release dates, cast info, ratings, and reviews, making it a popular resource for entertainment enthusiasts and professionals. Subsets of IMDb data can be accessed for personal/non-commercial purposes.
The IMDb data can help you answer a variety of research questions, such as:
- How have trends in genre popularity evolved over time in the entertainment industry?
- Is there a connection between the format of a title (movie, TV series, etc.) and its audience reception?
- Does the involvement of specific creators (directors, writers) impact the success of their projects?
- Which types of titles generate the most online word of mouth (as indicated by user-generated content like reviews and ratings)?
- What factors most strongly influence the average rating of a movie or a TV show?
- Are there differences in audience engagement between content targeted at adults and non-adults?
- How does the popularity of crew members influence a title’s viewer engagement (e.g., votes, ratings)?
- Is an individual’s fame related to their birth year?
- Are there patterns in viewer engagement with TV series episodes based on release timing?
Download the data
Multiple datasets are available for download.