This exploration is designed to help guide you through selecting items using SQL syntax. This is the most common operation you are going to run into as most databases tend to be read from a lot more than they are written to. When we are selecting things in SQL we can specify the table, the columns and the conditions we will use to select. In addition we have the ability to aggregate results by taking things like averages or sums across rows based on the content of the row.
In SQL it is convention to capitalize keywords. It isn’t required but helps make it more clear what is a SQL keyword and what is a table or column name. So the basic syntax for a SELECT
statement is as follows:
SELECT column1, column2, ... columnN FROM table WHERE condition;
So we have the following parts of this
SELECT
command is being run.,
. If we want we can rename this by replacing column1
with column1 AS new_name
. Assuming there was previously a column called column1
it would show up as new_name
in our result set. This is helpful when you are selecting data from several tables and there might be naming conflicts between them.You can use SELECT *
to select all columns. This is fine for working with data. But if you are planning to use the query in a program you should explicitly specify all the columns. If someone alters the structure of the table you can run into trouble because the columns that used to exist don’t any longer.
table
is simply replaced with the name of the table we want to select from.WHERE salary > 100000
. We will talk more about the sorts of conditions we can use shortly.;
. Sometimes the interpreter will be nice and add one for you but it is better to add it yourself and absolutely required if you run multiple queries in a row.An example query might look like
SELECT fname, lname FROM bsg_people WHERE age > 40;
This would select the first name and last name of people in the bsg_people
table only if their age is greater than 40.
This trips up a lot of people. There are two very similar characters that are used in SQL one is the accent grave `
the other is the single quote '
. The accent grave is typically found on the key to the left of the 1
key and it leans to the left. The single quote is usually found to the left of the return
key and is vertical.
You can surround names of things in SQL with accent graves. For example `bsg_people`
is acceptable and would be interpreted as a table name. You put single quotes around strings. So 'this is a string'
would be interpreted as a string as would 'bsg_people'
. Neither of these would work as a table name because they are strings.
If you see automatically generated SQL code it will almost always have accent graves around all table and column names.
You might ask, “Why? Why do such a thing!?” Well, the reason is because SQL wants to allow people to do terrible terrible things like create a table name with spaces `our employees`
for example. Never do this. Pretend like I never even mentioned it was a possibility. Whoever decided it was a good idea when designing the SQL spec needs to consider other lines of work. Kind of like this guy.
But, back to the topic at hand which involves absolutley no spaces in table or column names.
MySQL has documentaion (which is fairly hard to navigate) but does include a list of operators. They are pretty much what you might expect. I will talk about a couple notable exceptions.
=
not two. Weird right?IS TRUE
or IS FALSE
.NULL
in a second but this is how we see if something is NULL
.LIKE
is used to do simple pattern matching. You can use the wildcards %
to match any number of characters and _
to match exactly one character. So we could search for WHERE fname LIKE 'A%'
to find peoples who’s first name starts with the letter ‘A’ or something like A %time ago
to match phrases like “A long time ago” or “A long long time ago” or just “A time ago”.You can string together operators just like in Python to do something like WHERE age > 18 AND age < 65
to select people between 18 and 65.
In SQL you need to be a bit cautious around NULL
. Essentially logic in SQL is three valued. You can have TRUE
, FALSE
and NULL
. A null value represents an unknown or missing value. So say you were comparing the departments of employees from two tables you might have a condition like WHERE employee1.manager = employee2.manager
. You might expect to find that this would return all employees who have the same manager, which it does. It does not, however, return employees who both have a manager of NULL
. To also select them you would need a condition like employee1.manager IS NULL AND employee2.manager IS NULL
.
So be careful around nulls. If one is involved in a boolean expression it will generally end up NULL
rather than TRUE
or FALSE
.
Lets start with the following table of some truly amazing fictional movies for rent from a video rental store, this table is called demo_table
:
film_id | title | description | release_year | rental_duration | rental_rate | length | replacement_cost |
---|---|---|---|---|---|---|---|
50 | BAKED CLEOPATRA | A Stunning Drama of a Forensic Psychologist And a Husband who must Overcome a Waitress in A Monastery | 2006 | 3 | 2.99 | 182 | 20.99 |
100 | BROOKLYN DESERT | A Beautiful Drama of a Dentist And a Composer who must Battle a Sumo Wrestler in The First Manned Space Station | 2006 | 7 | 4.99 | 161 | 21.99 |
150 | CIDER DESIRE | A Stunning Character Study of a Composer And a Mad Cow who must Succumb a Cat in Soviet Georgia | 2006 | 7 | 2.99 | 101 | 9.99 |
200 | CURTAIN VIDEOTAPE | A Boring Reflection of a Dentist And a Mad Cow who must Chase a Secret Agent in A Shark Tank | 2006 | 7 | 0.99 | 133 | 27.99 |
250 | DRAGON SQUAD | A Taut Reflection of a Boy And a Waitress who must Outgun a Teacher in Ancient China | 2006 | 4 | 0.99 | 170 | 26.99 |
300 | FALCON VOLUME | A Fateful Saga of a Sumo Wrestler And a Hunter who must Redeem a A Shark in New Orleans | 2006 | 5 | 4.99 | 102 | 21.99 |
350 | GARDEN ISLAND | A Unbelieveable Character Study of a Womanizer And a Madman who must Reach a Man in The Outback | 2006 | 3 | 4.99 | 80 | 21.99 |
400 | HARDLY ROBBERS | A Emotional Character Study of a Hunter And a Car who must Kill a Woman in Berlin | 2006 | 7 | 2.99 | 72 | 15.99 |
450 | IDOLS SNATCHERS | A Insightful Drama of a Car And a Composer who must Fight a Man in A Monastery | 2006 | 5 | 2.99 | 84 | 29.99 |
500 | KISS GLORY | A Lacklusture Reflection of a Girl And a Husband who must Find a Robot in The Canadian Rockies | 2006 | 5 | 4.99 | 163 | 11.99 |
Lets do a simple query to find the rentals that are less than $1.
SELECT * FROM `demo_table` WHERE rental_rate < 1;
film_id | title | description | release_year | rental_duration | rental_rate | length | replacement_cost |
---|---|---|---|---|---|---|---|
200 | CURTAIN VIDEOTAPE | A Boring Reflection of a Dentist And a Mad Cow who must Chase a Secret Agent in A Shark Tank | 2006 | 7 | 0.99 | 133 | 27.99 |
250 | DRAGON SQUAD | A Taut Reflection of a Boy And a Waitress who must Outgun a Teacher in Ancient China | 2006 | 4 | 0.99 | 170 | 26.99 |
This one was pretty straight forward, it just returned the rows with a rental_rate
of $0.99 because they were less than $1.
Lets say we are only interested in stunning films. We could use the LIKE
keyword as follows:
SELECT * FROM `demo_table` WHERE description LIKE '%stunning%';
film_id | title | description | release_year | rental_duration | rental_rate | length | replacement_cost |
---|---|---|---|---|---|---|---|
50 | BAKED CLEOPATRA | A Stunning Drama of a Forensic Psychologist And a Husband who must Overcome a Waitress in A Monastery | 2006 | 3 | 2.99 | 182 | 20.99 |
150 | CIDER DESIRE | A Stunning Character Study of a Composer And a Mad Cow who must Succumb a Cat in Soviet Georgia | 2006 | 7 | 2.99 | 101 | 9.99 |
As we can see LIKE
is not case sensitive but we found all the movies which have stunning in their descriptions.
Finally lets do a little more complex query. Lets say we want to find the title
and description
of the movies that cost more than $1 to rent, are longer than 2 hours and that we can rent for 7 days.
SELECT title, description FROM demo_table WHERE rental_rate > 1 AND length > 120 AND rental_duration = 7;
title | description |
---|---|
BROOKLYN DESERT | A Beautiful Drama of a Dentist And a Composer who must Battle a Sumo Wrestler in The First Manned Space Station |
We get but a single result, BROOKLYN DESERT, it being the only row that met all of our conditions.
Sorting is straight forward. We add a ORDER BY
clause to the end of our SELECT
statement.
SELECT * FROM bsg_planets ORDER BY population DESC;
id | name | population | language | capital |
---|---|---|---|---|
17 | Canceron | 6700000000 | NULL | Hades |
3 | Caprica | 4900000000 | Caprican | Caprica City |
22 | Virgon | 4300000000 | NULL | Boskirk |
1 | Gemenon | 2800000000 | Old Gemenese | Oranu |
2 | Leonis | 2600000000 | Leonese | Luminere |
The syntax is
ORDER BY column1 {DESC|ASC}, column2 {DESC|ASC} ...
So we simply specify the columns we want to sort on in the order we want them to have precedence and then say if we want to sort the items in descending or ascending value.
We can also limit the number of results by adding LIMIT X
to the end of the query where X
is the number of rows we want returned.
SELECT * FROM bsg_planets ORDER BY population DESC LIMIT 1;
Returns only the first result
id | name | population | language | capital |
---|---|---|---|---|
17 | Canceron | 6700000000 | NULL | Hades |
Next we are going to talk about subqueries. These let us us the results of one query in another query. Sometimes you can use a subquery rather than a join and have things be quite a bit faster. If you are writing a program and need a primary key of a row it is a lot better to use a subquery to get it rather than hard code it.
The only interesting thing about subqueries is that you can alias the table so that you can refer to it. We do that by wrapping it parens and using AS
. For exmaple, if we wanted to find all the people from Caprica we could use a subquery like this
SELECT fname, lname FROM bsg_people
WHERE homeworld = (SELECT id FROM bsg_planets WHERE name = 'Caprica');
fname | lname |
---|---|
William | Adama |
Lee | Adama |
Laura | Roslin |
Kara | Thrace |
Gaius | Baltar |
This query returned all the characters from Caprica.
These are a bit more advanced and something we won’t probably be using too much of. But we can generate aggregate results using aggregate functions and the GROUP BY
clause.
Lets look at an example.
SELECT fname, lname, COUNT( * ) AS certs
FROM bsg_people
INNER JOIN bsg_cert_people ON bsg_cert_people.pid = bsg_people.id
GROUP BY bsg_people.id
ORDER BY fname ASC
This selects the first name, last name and the number of times someone shows up in our table. In this case we joined people with certifications and got a count of the number of times they showed up. We also renamed the COUNT(*)
as to certs
using the AS
keyword.
fname | lname | certs |
---|---|---|
Callandra | Henderson | 1 |
Galen | Tyrol | 1 |
Kara | Thrace | 1 |
Karl | Agathon | 1 |
Laura | Roslin | 1 |
Lee | Adama | 2 |
Margaret | Edmondson | 1 |
Saul | Tigh | 1 |
Sharon | Valerii | 1 |
So this table shows us how many certifications each person has. COUNT(*)
just tells us how many times the thing we GROUP BY
shows up. In this case we GROUP BY bsg_people.id
, so it gives us a count of how many times each id
shows up. Because we joined people
with the people_cert
table there is one row for each person/certification combo that exists.
Finally lets finish this by finding the maximum number of certifications someone has.
SELECT MAX(temp_table.certs) FROM
(SELECT fname, lname, COUNT( * ) AS certs
FROM bsg_people
INNER JOIN bsg_cert_people ON bsg_cert_people.pid = bsg_people.id
GROUP BY bsg_people.id
ORDER BY fname ASC) AS temp_table;
MAX(temp_table.certs) |
---|
2 |
This illustrates pretty much everything you might want to do in a select statement, we alias things, use aggregate functions and join tables. I will offer one word of warning this query will not return what you think.
SELECT fname, lname, MAX(temp_table.certs) FROM
(SELECT fname, lname, COUNT( * ) AS certs
FROM bsg_people
INNER JOIN bsg_cert_people ON bsg_cert_people.pid = bsg_people.id
GROUP BY bsg_people.id
ORDER BY fname ASC) AS temp_table;
We want the name of the person with the maximum number of certifications. We saw that was Lee Adama earlier.
But this table returns the following
fname | lname | MAX(temp_table.certs) |
---|---|---|
Callandra | Henderson | 2 |
What is going on?
Well we didn’t GROUP BY
anything. So it looks at the entire result set as a group and finds the maximum, which is 2. Thats fine. The problem is that it assumes any row in the group is representative of the group. So it takes the first row to pull the fname
and lname
from. Thats bad news.
A correct way to do this would be as follows.
SELECT fname, lname, COUNT( * ) AS certs
FROM bsg_people
INNER JOIN bsg_cert_people ON bsg_cert_people.pid = bsg_people.id
GROUP BY bsg_people.id
ORDER BY certs DESC LIMIT 1;
fname | lname | certs |
---|---|---|
Lee | Adama | 2 |
That gets us the result set we are looking for.
I would suggest working with a partner if you can. Take turns where one person comes up with a list of criteria for a selection query and the other implements that query.
This will give you practice with two different things. One is writing selection queries, but the other is interpreting language into a query which, once you get some practice will be the far harder task. When someone says they want PG-13 and G rated movies shorter than 100 minutes, they they want all PG-13 movies and only G movies shorter than 100 minutes? You will find out that they way you communicate these sorts of queries is often ambiguous and requires a lot of clarification.