PostgreSQL is an open-source Relational Database Management System (RDMS) thats popular for a number of reasons: It’s free, it’s secure, it supports custom functions, it’s object relational model architecture, and unlimited rows per table. Check out this article for a more in-depth breakdown. PostgreSQL is also used by many major companies including: NASA, Twitch, Apple, and Reddit. In this article we are going touch on the basics of PostgreSQL so you can get up and running fast.
Downloading PostgreSQL and pgAdmin4.
On a Mac, the process of downloading postgres is simplified thanks to the postgress.app installation package. Simply download the desired release, and save the app to the applications folder.
After downloading, you should be able to double click on the blue elephant icon and see something similar to this pop up on your screen. Note: The databases shown will be different for you, but will include both postgres and template1 databases.
template1 is a template database. By default, postgres uses the template1 database to create new databases whenever you run the CREATE DATABASE SQL query. Please see this tutorial for a walkthrough on changing template1.
Now, you should download PgAdmin4, which is the desktop GUI interface that is commonly used to interact with postgres databases. Keep in mind that when you first installed postgres, it came with psql . You should be able to type psql in your terminal, and control your database through the CLI in the same way you can control it using pgAdmin’s GUI.
Think of the difference between psql and PgAdmin as similar to the difference between using github from the command line and the github desktop GUI.
For the sake of simplicity, we will focus on how to perform basic actions using the pgAdmin GUI — but keep in mind that this can all be done through the command line using psql. The data used in this article is a subset of the National Address Database (NAD) that was compiled by the Department of Transportation. And you can download the .csv used in this article here.
Now, we’ll see the blog database in the left browser window. When we click on the blog database, it will turn yellow indicating that we have connected to the database, and expand so we can see things like triggers, extensions, and schemas.
Now lets click on the Schemas menu, to reveal the tables tab. In this tutorial, we are going to take data from a csv, and load it into postgres.
Right click on Tables then create a new table called ‘mailboxes’.
Now we need to declare the columns in our database. This is an important step and the columns of the mailboxes table need to match the columns of the .csv file that we plan to import. This includes an index column, do not forget to include the index column if your .csv has one. In our case, I don’t have an index column.
The columns present in our addresses.csv file are as follows:
- StN_PosDir (North,South,East,West)
- Add_Number (address number)
- StN_PosTyp (Street/Drive/Lane/Court/Place/Road etc.)
Let’s click on the Columns tab and add the above mentioned columns to the table. Adding columns also requires that we declare the data types of each column. It sometimes makes sense to set all columns to text when importing a file, then dealing with changing the datatype after the import. In our case we can keep all the data as text.
We should now see all of our columns in the mailboxes table.
Importing CSV data into PostgreSQL.
In order to bring the data from addresses.csv into the mailboxes table, we need to use the pgAdmin query tool on the mailboxes table.
Now use the following SQL query to load the CSV into postgres.COPY mailboxes
Don’t forget that mailboxes is the name of the table we want to send the CSV information to, and the order of the columns in the mailboxes table matches the order of the columns in the addresses.csvfile. The CSV HEADER indicates that the first row of the addresses.csv file are column names and should be skipped, and the DELIMITER is ',' because a CSV is a comma separated file.
You can use F5 within the query tool to run a query, and after you run the query above you should see the following output:
To make sure that all of our information in now in our SQL database, grab and return all of the rows using the following query:
SELECT * FROM mailboxes;
Then hit F5 to run the query. You should see all of the table information in the Data Output window of pgAdmin.
Exporting the results of a query as a CSV.
Let’s say we run an SQL query on our address information, and want to output that information as a CSV file. First, let's write a query to make the addresses more readable.
SELECT CONCAT(addressnumber, ' ', streetdirection, ' ', streetname, ' ', stn_postyp, ' ', comm, ' ', state) as formatted_address FROM mailboxes
Oops, it seems that the NULL entries in the streetdirection column are strings, and not actual postgres null values. Because of this, we should replace all of the NULL strings with empty strings. This way, when we compile the address we will only add a street direction if one is needed. To replace all of the NULL strings in the streetdirection column with empty strings, run the following query.UPDATE mailboxes
SET streetdirection = REPLACE(streetdirection,'NULL','');
Now, the NULL is replaced with an empty string, and we can properly format the street names using the CONCAT query from before:
In order to save these to their own CSV, we need to wrap the CONCAT query around the COPY keyword, specify the destination and name of the output file, as well as the delimiter. Here is the format:COPY ([Query]) TO '[File Path]' DELIMITER ',' CSV HEADER;
This gets a little messy, but if you copy and paste this query into your query editor, it should be easier to read:COPY (SELECT CONCAT(addressnumber, ' ', streetdirection, ' ', streetname, ' ', stn_postyp, ' ', comm, ' ', state) as formatted_address FROM mailboxes) TO '/Users/brendanferris/Desktop/scripts/vehicle_analysis/output.csv' DELIMITER ',' CSV HEADER;
Now, the desired data will be located in a CSV on the file path that you specified!