Using PostgreSQL
PostgreSQL is an open-source relational database management system based on the SQL (Structured Query Language) standard for relational databases. For more information, see
http://www.postgresql.org.In the CS department, PostgreSQL is running under Redhat Linux, and its interface is accessible from any Solaris 7 machine.
PostgreSQL databases can be accessed in many ways. This document covers only the psql command-line interface; see the PHP Tutorial for information on accessing PostgreSQL databases with PHP. For more information about the many other interfaces, which include several procedural languages such as Perl, as well as ODBC support, see
http://www.postgresql.org.This section presumes a basic understanding of relational databases. There are many introductions to relational databases on the web; for a starting point, try
http://www.uk.research.att.com/pub/docs/att/tr.98.2.pdf. Some of the information in this section has been adapted from the SQL commands section of the PostgreSQL User Guide; throughout this document, many explanations have links to specific sections of this guide for more information.CREATE TABLE [TEMP] <tablename> (<field1> <type1>,<field2> <type2>,...,<fieldn> <typen>)
This command will create a table with the named fields and data types; if the TEMP parameter is specified, the table only exists until the current database session ends. For example:
CREATE TABLE customers (firstname varchar(20), lastname varchar(20) NOT NULL, address varchar(40), city varchar(20), state char(2) DEFAULT "TN", zip int4);
would create a table named customers with five text fields of various sizes, and an integer field. The NOT NULL keyword specifies that when a record is inserted into the table a value must be provided for the lastname field; the DEFAULT keyword specifies that the string that follows, TN, will be filled into the state field if it is left blank when a record is inserted.
There are many data types available in PostgreSQL, including some complex data types beyond those specified in the SQL standard. In particular, PostgreSQL stands out for the special data types it provides for manipulation of geometric data and IP addresses. More information about all the data types available can be found in the
user's guide; a few of the most commonly used are:|
char(n) |
Character string with a fixed length of n characters |
|
varchar(n) |
Character string with a variable length of up to n characters |
|
text |
Variable-length character string |
|
int2, int4, int8 |
Signed 2, 4, or 8 byte integer |
|
float4, float8 |
4 or 8 byte floating-point number |
|
date |
Date without time of day |
|
time |
Time of day |
|
timestamp |
Date and time |
|
serial |
Used to create a unique sequence of numbers; each item added to the table will receive the first unused number in this field when added. |
If a table has a lot of data in it, lookups into it may become very time-consuming. To reduce the search time, you can create an index on one or more (but no more than 16) fields of the table. An index will allow rapid access of records based on indexed fields; however, it will make inserting and deleting records slower, as the index must be updated each time the table is modified. An index can also be used to guarantee uniqueness of an item in a table; for example, if you had a "customers" table and wanted to be sure each customer had a unique customer number, you could create a unique index on the customer number field.
CREATE [UNIQUE] INDEX indexname ON tablename (column1, column2, ..., columnn);
Note that a table can have more than one index; often, a table will have a unique index on one or more fields and also have another, non-unique index on one or more other fields to allow ease of searching on fields that do not need to have unique values.
DROP TABLE tablename
This will remove a table and any associated indices from the database.
Note that if the table contained any fields of serial type, the sequences created in the database for these fields is NOT automatically removed; you will have to explicitly issue the drop sequence command:
DROP SEQUENCE tablename_fieldname.
To delete an index without deleting the table it is associated with:
DROP INDEX indexname
If a large amount of data is to be loaded into or unloaded from a table, it may be advantageous to drop the indices from that table so that they do not need to be updated for each record; note however that unique indices should be generally retained to guarantee data integrity.
Note that PostgreSQL does NOT currently support removal of a column from a table; you must drop and recreate the table.
To select records from a single table:
SELECT column1, column2, ..., columnn
FROM tablename
WHERE criteria;
For the following examples, assume that three tables have been created; how to actually insert records into a table will be covered in the next section.
The tables:
items:
|
item_number |
description |
type |
color |
|
1 |
Red Widget |
widget |
red |
|
2 |
Blue Widget |
widget |
blue |
|
3 |
Red Shmoo |
shmoo |
red |
|
4 |
Blue Shmoo |
shmoo |
blue |
components:
|
item_number |
component |
|
1 |
part A |
|
1 |
part B |
|
1 |
part C |
|
2 |
part A |
|
2 |
part D |
|
3 |
part E |
|
3 |
part F |
|
3 |
part G |
|
4 |
part A |
|
4 |
part H |
|
4 |
part I |
stock:
|
component |
quantity_on_hand |
|
part A |
120 |
|
part B |
30 |
|
part C |
20 |
|
part D |
20 |
|
part E |
10 |
|
part F |
30 |
|
part G |
15 |
|
part H |
25 |
|
part I |
5 |
Example 1: Selecting all records from a table
The wildcard * can be used in place of the names of columns in a select statement if all columns are to be returned. So, the statement:
SELECT *
FROM items;
Returns:
|
item_number |
description |
type |
color |
|
1 |
Red Widget |
widget |
red |
|
2 |
Blue Widget |
widget |
blue |
|
3 |
Red Shmoo |
shmoo |
red |
|
4 |
Blue Shmoo |
shmoo |
blue |
Example 2: Selecting certain columns
Any columns from a table can be selected in any order. So, the statement:
SELECT type, item_number
FROM items;
Returns:
|
type |
item_number |
|
widget |
1 |
|
widget |
2 |
|
shmoo |
3 |
|
shmoo |
4 |
Example 3: Selecting certain rows, part 1
To select rows based on certain criteria, specify a condition in the where clause of the select statement.
For example,
SELECT *
FROM items
WHERE type = 'widget';
Returns:
|
item_number |
description |
type |
color |
|
1 |
Red Widget |
widget |
red |
|
2 |
Blue Widget |
widget |
blue |
The condition specified can also be >, <, >=, or <= as appropriate.
Example 4: Selecting certain rows, part 2
If you only want to see rows that satisfy more than one condition, you can append as many conditions as desired using AND. For example,
SELECT *
FROM stock
WHERE quantity_on_hand > 10
AND quantity_on_hand < 30;
Returns:
|
Component |
quantity_on_hand |
|
part C |
20 |
|
part D |
20 |
|
part G |
15 |
|
part H |
25 |
Example 5: Using OR
If you want to see the logical union of multiple sets of records, you can join conditions together with OR. (Note that SQL does have a UNION keyword, but it doesn't mean the same thing; see
the PostgreSQL documentation on UNION .) For example:SELECT *
FROM stock
WHERE quantity_on_hand < 10
OR quantity_on_hand >30;
Returns:
|
Component |
quantity_on_hand |
|
part A |
120 |
|
part I |
5 |
Example 6: Subqueries
Suppose you wanted to know the quantity on hand for components used in item 1. To find this out, you would first need to find the components used from the components table, then look up the quantities for those components in the stock table. SQL provides an easy way to combine these lookups with subqueries, which allow you to select rows having values drawn from a SELECT from another table. For example:
SELECT *
FROM stock
WHERE component IN
(SELECT component
FROM components
WHERE item_number = 1);
Returns:
|
Component |
quantity_on_hand |
|
part A |
120 |
|
part B |
30 |
|
part C |
20 |
Example 7: Selecting from multiple tables
Now suppose you wanted to know what the quantity on hand was for components used in widgets. You could use two subqueries:
SELECT *
FROM stock
WHERE component IN
(SELECT component
FROM components
WHERE item_number IN
(SELECT item_number
FROM items
WHERE type = 'widget'));
Fortunately SQL provides a powerful mechanism, known as a join, for crossing multiple tables that makes long nested queries like this unnecessary. To perform a join, specify all the tables you will be selecting from in the FROM clause and specify in the WHERE clause the columns that correspond in the tables. So, to perform the above query:
SELECT stock.component, stock.quantity_on_hand
FROM items, components, stock
WHERE items.item_number = components.item_number
AND components.component = stock.component
AND items.type = 'widget';
Returns:
|
Component |
quantity_on_hand |
|
part A |
120 |
|
part A |
120 |
|
part B |
30 |
|
part C |
20 |
|
part D |
20 |
One advantage of the nested subquery method is that while with a join, the
same row is returned twice in this case, it would appear only once
with the subquery method. See the section on selecting distinct values
below for a way to avoid this problem.
Note that whenever two tables in a query have columns with the same name, the column name must be specified as tablename.columnname to indicate which table the column is to come from. To save typing when table names are long, you can assign aliases to the tables in the FROM clause and use the aliases in place of the table names. For example, the above query could also be written:
FROM items i, components c, stock s
WHERE i.item_number = c.item_number
AND c.component = s.component
AND i.type = 'widget';
Example 8: Ordering results
If a query returns a lot of rows, it may be convenient to order them by one or more columns. For example, suppose you wanted to know what parts the stock was low on. You could order a selection from the stock table by quantity, like this:
SELECT *
FROM stock
ORDER BY quantity_on_hand;
Results:
|
Component |
quantity_on_hand |
|
part I |
5 |
|
part E |
10 |
|
part G |
15 |
|
part C |
20 |
|
part D |
20 |
|
part H |
25 |
|
part B |
30 |
|
part F |
30 |
|
part A |
120 |
You can also order by multiple columns, which will order by the first column, then by the second column where identical in the first column, etc. If you want to see the results in descending order, just add the keyword DESC after the column names in the ORDER BY clause.
Example 9: Counting records
If you only want to know how many records match a set of criteria, you can use the COUNT(*) function:
SELECT COUNT(*)
FROM components
WHERE item_number = 3;
Returns:
|
count(*) |
|
3 |
Example 10: Aggregates
SQL also includes several aggregate functions, including SUM(), MAX(), and MIN().
To use these, the GROUP BY clause is added to the end of the select statement to specify how records should be grouped together. For example:
SELECT c.item_number, MIN(s.quantity_on_hand)
FROM components c, stock s
WHERE c.component = s.component
AND c.item_number = 3
GROUP BY c.item_number;
Returns:
|
item_number |
quantity_on_hand |
|
3 |
10 |
Example 11: Selecting distinct values
Suppose you only wanted to see what different colors there are for products. You could select the color column from all rows, but you would get multiple copies of each result – not a big deal with four rows, but quite tedious for a large table. SQL remedies this problem with the DISTINCT keyword:
SELECT DISTINCT color
FROM items;
Returns:
|
color |
|
blue |
|
red |
INSERT INTO tablename (column1, column2, ..., columnn) VALUES(value1, value2, ..., valuen);
Value1 will be placed into column1, value2 in column2, and so on, regardless of the order of the columns in the table; columns may be omitted and will be filled in with nulls (an error will occur if a column specified as NOT NULL is omitted). If values are provided for all columns in the table in the order in which they appear in the table, the list of columns may be omitted.
To delete a record or records:
DELETE FROM tablename
WHERE condition;
Any valid condition for a WHERE clause can be used to specify records to delete.
To update a record or records:
UPDATE tablenameAny or all of the columns in a table may be given new values with an update statement, and again, any valid condition for a WHERE clause may be used. For example, if this statement were executed:
UPDATE items
SET description = 'Green Shmoo',
color = 'green'
WHERE item_number = 4;
Then the table would look like this afterwards:
|
item_number |
description |
type |
color |
|
1 |
Red Widget |
widget |
red |
|
2 |
Blue Widget |
widget |
blue |
|
3 |
Red Shmoo |
shmoo |
red |
|
4 |
Green Shmoo |
shmoo |
green |
PostgreSQL provides a COPY command as an addition to the SQL standard for moving data between tables and files. However, the COPY command can only be used by database administrators because of its direct interaction with the database. To give file access functionality to other users, psql provides a wrapper to the COPY command, \copy.
To use the copy command:
\copy tablename [from|to] filename
Copying to a file will create a file with the given name that includes all the rows of the table, one row per line, with the fields separated by tabs. To copy from a file, the file must be in the same format; data in the file will be added to the table.