...
Code Block | ||
---|---|---|
| ||
SELECT issns, COUNT( * ) FROM articles GROUP BY issns; |
GROUP BY
tells SQL what field or fields we want to use to aggregate the data. If we want to group by multiple fields, we give GROUP BY
a comma separated list.
...
Code Block | ||
---|---|---|
| ||
SELECT issns, COUNT( * ) FROM articles GROUP BY issns HAVING COUNT( * ) >= 10; |
...
The HAVING
keyword works exactly like the WHERE
keyword, but uses aggregate functions instead of database fields.
...
Code Block | ||
---|---|---|
| ||
DROP VIEW journal_counts; |
You can also add a view using Create View in the View menu and see the results in the Views tab just like a table.
Exercise
Write a query that returns the number of articles published in each journal on each month, sorted from most popular journal to the ones with least publications each month starting from the most recent records. Save this query as a
VIEW
.
Joins and aliases
Joins
To combine data from two tables we use the SQL JOIN
command, which comes after the FROM
command.
We also need to tell the computer which columns provide the link between the two tables using the word ON
. What we want is to join the data with the same journal name.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM articles
JOIN journals
ON articles.issns = journals.issns; |
ON
is like WHERE
, it filters things out according to a test condition. We use the table.colname
format to tell the manager what column in which table we are referring to.
Alternatively, we can use the word USING
, as a short-hand. In this case we are telling the manager that we want to combine articles
with journals
and that the common column is issns
.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM articles
JOIN journals
USING (issns); |
We often won’t want all of the fields from both tables, so anywhere we would have used a field name in a non-join query, we can use table.colname
.
For example, what if we wanted information on published articles in different journals, but instead of their ISSN we wanted the actual journal title.
Code Block | ||
---|---|---|
| ||
SELECT articles.issns, journal_title, title, first_author, citation_count, author_count, month, year
FROM articles
JOIN journals
ON articles.issns = journals.issns; |
Joins can be combined with sorting, filtering, and aggregation. So, if we wanted average number of authors for articles on different journals, we could do something like
Code Block | ||
---|---|---|
| ||
SELECT articles.issns, journal_title, ROUND(AVG(author_count), 2)
FROM articles
JOIN journals
ON articles.issns = journals.issns;
GROUP BY articles.issns; |
Aliases
As queries get more complex names can get long and unwieldy. To help make things clearer we can use aliases to assign new names to things in the query.
We can alias both table names:
Code Block | ||
---|---|---|
| ||
SELECT ar.title, ar.first_author, jo.journal_title
FROM articles AS ar
JOIN journals AS jo
ON ar.issns = jo.issns; |
And column names:
Code Block | ||
---|---|---|
| ||
SELECT ar.title AS title, ar.first_author AS author, jo.journal_title AS journal
FROM articles AS ar
JOIN journals AS jo
ON ar.issns = jo.issns; |
The AS
isn’t technically required, so you could do
Code Block | ||
---|---|---|
| ||
SELECT a.title t
FROM articles a; |
but using AS
is much clearer so it is good style to include it.
Exercises:
- How many plots from each type are there?
- How many papers have a single author? How many have 2 authors? How many 3? etc?
- How many articles are published for each language? (Ignore articles where language is unknown).
- How many articles are published for each licence type, and what is the average number of citations for that licence type
- Create a view with article, journal and publisher information