The solution of Databases Query problems(Sixth)


Preface

Much round, light burning, ancient people how to learn? Ever fell into fire cannot burn, drowned into the water cannot be die?
Just like a mani bead light, impervious halberd knives and swords.

Can also be kind, also be evil, kind or evil just with him.
When kind, can be immortal Buddha; When evil, like a beast. Havoc in Heaven with 72 changes, divine troops descending from Heaven also have no alternative.

DB-Sixth-kings


Presentation

In this page, we also will show the homework of the course of the Databases Technology in CMU just like the advance lists of ( Five ). All of the questions we tested in the postgresql.

This section mainly is to practice the query operation of the SQL, include SQL view, index, etc. We also sure that all the sqls will pass in the MySQL, this article also will provide the Chinese version in detail.

DB-Sixth-cn


Question Details

In this question we will use the MovieLens dataset released in May 2011. For more details, please refer to: movielens-readme

We preprocessed the original dataset and loaded the following two tables to PostgreSQL, this is same to the list of ( Five ):

• movies(mid, title, year, num ratings, rating) , Primary key: mid
• play in(mid, name, cast position), Primary key: mid, name

In the table movies, mid is the unique identifier for each movie, title is the movie’s title, and year is the movie’s year-of-release. Each movie receives a total number of num_ratings ratings from users, and the average rating is rating on a scale of 0.0-10.0.

The table play_in contains the main cast of movies. name is the actor’s name (assume each actor has an unique name). cast_position is the order of the actor where he/she appears on the movie cast list (For example, in the movie Titanic, the cast_positions of Leonardo DiCaprio and Kate Winslet are 1 and 2, respectively).


We will use Postgres, which is installed in the your own machines.

Queries

Write SQL queries for the following:

[Q1 – Actor Pool] Find the total number of distinct actors in the database. [5 points]


[Q2 – Best Rated Motion Pictures] Among all the movies rated for at least 10,000 times, select the top-10 movies (mid, title, and rating) which receive the highest average rating. [5 points]


[Q3 – Largest Crew] Return the movie (mid, title) that employed the largest number of actors. If there is a tie, sort the mid’s in ascending order. [10 points]


[Q4 – Active Actor Pool] Return the total number of active actors in the database. An active actor is one whose most recent movie is released after the year 2006 (i.e. year>2006). [10 points]


[Q5 – Working with Cruise] Return the actors (name) who has collaborated with Tom Cruise the most of times. If there is a tie, sort the name in ascending order. [10 points]


[Q6 – Working with the New Couple] Return the actors (name) who have collaborated with both Tom Cruise and Katie Holmes. Sort the output on ascending order of name. [10 points]


[Q7 – The Most Famous Actors] Find the top-10 active actors (as defined in Q4) whose movies receive the most ratings in total. For example, if Tom Hanks had played in three movies, whose num_ratings are 30, 20, and 10 respectively, then the total number of ratings of his movies would be 30+20+10=60. [10 points]


[Q8 – The Best Actors in a Leading Role] We define the leading role of each movie as the actors whose cast_position is either 1 or 2 (so the leading role of the movie Titanic is Leonardo DiCaprio and Kate Winslet). Then we say the star quality of each actor is the average rating of the movies in which he/she has played a leading role. For example, if Leonardo DiCaprio had acted in three movies, whose ratings are 7.0, 7.5 and 8.0 respectively, but he only played the first two movies as a leading role, then his star quality would be (7.0 + 7.5) / 2 = 7.25.

Find the actors (name) with top-10 star quality. To reduce noise, the actor must have played as a leading role for at least 5 times. [10 points]


[Q9 – The Breakout Actors in 2008] Find the number of actors who have acted in more movies in the year 2008 than any other year in their career. [15 points]


[Q10 – Dual Favorite Movies] One analyst wants to find a set of all-time best movies. He considers two metrics at the same time: its average rating (rating) and the number of ratings it receives (num_ratings). He uses the following rule to determine whether one movie is better than another: For two movies m 1 and m 2 , we define that m 1 dominates m 2 if and only if m 1 has
a higher average rating and m 1 receives more ratings.

Find a set of movies (title, num_ratings, rating) M, so that each movie in M is not dominated by any other movie in the database. Return the output on ascending order of title. (FYI: This query is also known as the Skyline Query) [15 points]


Answer

we give the Postgres version in detail, we will see you can tranfer it easily in mysql or others.

We should create tables of movies and play_in in advance. all the operations we have completed, you can see the refer to: the address: create tables[movies, play_in] and load datas .

Initialization:

1
2
3
4
5
6
sudo su - postgres     		# login postgres
psql

select current_database(); # ===> show databases;
\l; # list all the databases
\c testdb; # switch to testdb database;

Solution 1

1
SELECT COUNT(DISTINCT name) FROM play_in;

Solution 2

1
2
3
SELECT mid, title, rating FROM movies
WHERE num_ratings >= 10000
ORDER BY rating DESC LIMIT 10;

Solution 3

1
2
3
4
5
6
7
8
CREATE VIEW freq
AS
SELECT mid FROM play_in
GROUP BY mid
HAVING COUNT(*) >= ALL (SELECT COUNT(*)
FROM play_in GROUP BY mid);

SELECT freq.mid, title FROM freq, movies WHERE freq.mid = movies.mid ORDER BY mid DESC;

Solution 4

1
2
3
4
5
6
7
8
CREATE VIEW latest
AS
SELECT name, max(year) FROM play_in, movies
WHERE play_in.mid = movies.mid
GROUP BY name
HAVING max(year) > 2006;

SELECT COUNT(*) FROM latest;

Solution 5

1
2
3
4
5
6
7
8
9
CREATE VIEW cruise_list
AS
SELECT a1.name, COUNT(*) as C FROM play_in AS a1, play_in AS a2
WHERE a1.mid = a2.mid
AND a2.name = 'Tom Cruise'
AND a1.name <> 'Tom Cruise'
GROUP BY a1.name;

SELECT name FROM cruise_list where C >= ALL(SELECT C FROM cruise_list) ORDER BY name;

Solution 6

1
2
3
4
5
6
SELECT a1.name FROM play_in AS a1, play_in AS a2
WHERE a1.mid = a2.mid AND a2.name = 'Tom Cruise' AND a1.name <> 'Tom Cruise'
INTERSECT
SELECT a1.name FROM play_in AS a1, play_in AS a2
WHERE a1.mid = a2.mid AND a2.name = 'Katie Holmes' AND a1.name <> 'Katie Holmes'
ORDER BY name;

Solution 7

1
2
3
4
5
6
SELECT play_in.name, SUM(num_ratings) 
FROM play_in, movies
WHERE play_in.mid = movies.mid
GROUP BY name HAVING MAX(year) > 2006
ORDER BY SUM(num_ratings)
DESC LIMIT 10;

Solution 8

1
2
3
4
5
6
7
SELECT play_in.name, AVG(movies.rating) 
FROM play_in, movies
WHERE cast_position<3 AND play_in.mid = movies.mid
GROUP BY play_in.name
HAVING count(*) >= 5
ORDER BY avg(movies.rating)
DESC LIMIT 10;

Solution 9

1
2
3
4
5
6
7
8
9
CREATE VIEW helptable(name, year, num_movies) AS
SELECT name, year, count(*) FROM movies AS m, play_in AS p
WHERE m.mid = p.mid
GROUP BY name, year;

SELECT COUNT(t1.name) FROM helptable AS t1
WHERE t1.year = 2008 AND t1.num_movies > ALL (
SELECT num_movies FROM helptable AS t2
WHERE t1.name = t2.name AND t2.year <> 2008);

Solution 10

1
2
3
4
5
6
7
SELECT title, num_ratings, rating FROM movies
WHERE mid NOT IN
(SELECT a2.mid FROM movies AS a1, movies AS a2
WHERE a1.mid<>a2.mid
AND a1.num_ratings > a2.num_ratings
AND a1.rating > a2.rating)
ORDER BY title;

Reference

[1]. http://www.cs.cmu.edu/~christos/courses/dbms.S12/hws/HW2/PostgreSQLReadme.htm
[2]. http://blog.chinaunix.net/uid-26642180-id-3485465.html
[3]. http://www.cnblogs.com/gaojian/archive/2013/07/04/3170989.html

0%