Much round, light burning, ancient people how to learn? Ever fell into fire cannot burn, drowned into the water cannot be die?
Just like a mani bead light, impervious halberd knives and swords.
Can also be kind, also be evil, kind or evil just with him.
When kind, can be immortal Buddha; When evil, like a beast. Havoc in Heaven with 72 changes, divine troops descending from Heaven also have no alternative.
In this page, we also will show the homework of the course of the Databases Technology in CMU just like the advance lists of ( Five ). All of the questions we tested in the postgresql.
This section mainly is to practice the query operation of the SQL, include SQL view, index, etc. We also sure that all the sqls will pass in the MySQL, this article also will provide the Chinese version in detail.
In this question we will use the MovieLens dataset released in May 2011. For more details, please refer to: movielens-readme
We preprocessed the original dataset and loaded the following two tables to PostgreSQL, this is same to the list of ( Five ):
• movies(mid, title, year, num ratings, rating) , Primary key: mid
• play in(mid, name, cast position), Primary key: mid, name
In the table movies, mid is the unique identifier for each movie, title is the movie’s title, and year is the movie’s year-of-release. Each movie receives a total number of num_ratings ratings from users, and the average rating is rating on a scale of 0.0-10.0.
The table play_in contains the main cast of movies. name is the actor’s name (assume each actor has an unique name). cast_position is the order of the actor where he/she appears on the movie cast list (For example, in the movie Titanic, the cast_positions of Leonardo DiCaprio and Kate Winslet are 1 and 2, respectively).
We will use Postgres, which is installed in the your own machines.
Write SQL queries for the following:
[Q1 – Actor Pool] Find the total number of distinct actors in the database. [5 points]
[Q2 – Best Rated Motion Pictures] Among all the movies rated for at least 10,000 times, select the top-10 movies (mid, title, and rating) which receive the highest average rating. [5 points]
[Q3 – Largest Crew] Return the movie (mid, title) that employed the largest number of actors. If there is a tie, sort the mid’s in ascending order. [10 points]
[Q4 – Active Actor Pool] Return the total number of active actors in the database. An active actor is one whose most recent movie is released after the year 2006 (i.e. year>2006). [10 points]
[Q5 – Working with Cruise] Return the actors (name) who has collaborated with Tom Cruise the most of times. If there is a tie, sort the name in ascending order. [10 points]
[Q6 – Working with the New Couple] Return the actors (name) who have collaborated with both Tom Cruise and Katie Holmes. Sort the output on ascending order of name. [10 points]
[Q7 – The Most Famous Actors] Find the top-10 active actors (as defined in Q4) whose movies receive the most ratings in total. For example, if Tom Hanks had played in three movies, whose num_ratings are 30, 20, and 10 respectively, then the total number of ratings of his movies would be 30+20+10=60. [10 points]
[Q8 – The Best Actors in a Leading Role] We define the leading role of each movie as the actors whose cast_position is either 1 or 2 (so the leading role of the movie Titanic is Leonardo DiCaprio and Kate Winslet). Then we say the star quality of each actor is the average rating of the movies in which he/she has played a leading role. For example, if Leonardo DiCaprio had acted in three movies, whose ratings are 7.0, 7.5 and 8.0 respectively, but he only played the first two movies as a leading role, then his star quality would be (7.0 + 7.5) / 2 = 7.25.
Find the actors (name) with top-10 star quality. To reduce noise, the actor must have played as a leading role for at least 5 times. [10 points]
[Q9 – The Breakout Actors in 2008] Find the number of actors who have acted in more movies in the year 2008 than any other year in their career. [15 points]
[Q10 – Dual Favorite Movies] One analyst wants to find a set of all-time best movies. He considers two metrics at the same time: its average rating (rating) and the number of ratings it receives (num_ratings). He uses the following rule to determine whether one movie is better than another: For two movies m 1 and m 2 , we define that m 1 dominates m 2 if and only if m 1 has
a higher average rating and m 1 receives more ratings.
Find a set of movies (title, num_ratings, rating) M, so that each movie in M is not dominated by any other movie in the database. Return the output on ascending order of title. (FYI: This query is also known as the Skyline Query) [15 points]
we give the Postgres version in detail, we will see you can tranfer it easily in mysql or others.
We should create tables of movies and play_in in advance. all the operations we have completed, you can see the refer to: the address: create tables[movies, play_in] and load datas .
sudo su - postgres # login postgres
select current_database(); # ===> show databases;
\l; # list all the databases
\c testdb; # switch to testdb database;
Like this in my ubuntu os:
SELECT COUNT(DISTINCT name) FROM play_in;
SELECT mid, title, rating FROM movies
CREATE VIEW freq
The flowing image show the test solution 1, 2, 3 infos in my ubuntu os:
CREATE VIEW latest
CREATE VIEW cruise_list
SELECT a1.name FROM play_in AS a1, play_in AS a2
SELECT play_in.name, SUM(num_ratings)
The flowing image show the test solution 4, 5, 6, 7 infos in my ubuntu os:
SELECT play_in.name, AVG(movies.rating)
CREATE VIEW helptable(name, year, num_movies) AS
SELECT title, num_ratings, rating FROM movies
The flowing image show the test solution 8, 9, 10 infos in my ubuntu os: