Tutorial
This tutorial walks through sq’s features.
If you haven’t installed sq, see here. If you’ve already installed sq, check that you’re on a recent version like so:
$ sq version
sq v0.40.0
If your version is older than that, please upgrade. Then start with sq help.
Basics
Let’s set out with an example. We’ll use a SQLite database, prepopulated with the Sakila dataset. Note that for many of these examples, the output will be abbreviated for brevity.
# No data source added to sq yet, so "sq ls" is empty.
$ sq ls
# Download the sample db.
$ wget https://sq.io/testdata/sakila.db
# Add a new source
$ sq add ./sakila.db --handle @tutorial_db
@tutorial_db sqlite3 sakila.db
# The new source should show up in "sq ls".
$ sq ls
@tutorial_db sqlite3 sakila.db
# For many commands, add -v (--verbose) to see more detail.
$ sq ls -v
HANDLE ACTIVE DRIVER LOCATION OPTIONS
@tutorial_db active sqlite3 sqlite3:///Users/neilotoole/work/sq/sq/scratch/tutorial/sakila.db
# Now, let's have a look at our new source. Output abbreviated for brevity.
$ sq inspect @tutorial_db
SOURCE DRIVER NAME FQ NAME SIZE TABLES VIEWS LOCATION
@tutorial_db sqlite3 sakila.db sakila.db/main 5.6MB 16 5 sqlite3:///Users/neilotoole/work/sq/sq/scratch/tutorial/sakila.db
NAME TYPE ROWS COLS
actor table 200 actor_id, first_name, last_name, last_update
address table 603 address_id, address, address2, district, city_id, postal_code, phone, last_update
category table 16 category_id, name, last_update
city table 600 city_id, city, country_id, last_update
country table 109 country_id, country, last_update
customer table 599 customer_id, store_id, first_name, last_name, email, address_id, active, create_date, last_update
film table 1000 film_id, title, description, release_year, language_id, original_language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features, last_update
film_actor table 5462 actor_id, film_id, last_update
Let’s step through the above:
- sq ls: list the current sources. There are none.
- wget: download a SQLite datafile to use for this demo.
- sq add: add a source. The driver type is detected to be sqlite3, and we give this source the handle @tutorial_db.
- sq ls: lists the sources again; this time we see our new @tutorial_db source.
- sq ls -v: lists the sources yet again, this time verbosely (-v).
- sq inspect: inspects @tutorial_db, showing the structure of the source.
sq commands feature sophisticated shell completion. Try it out by hitting TAB when typing a command.
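For instance, typing a partial handle and hitting TAB completes it. An illustrative sketch (the exact behavior depends on your shell and completion setup):
$ sq @tu<TAB>
$ sq @tutorial_db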
Query
Now that we have added a source to sq, we can query it. Let’s select everything from the actor table.
$ sq @tutorial_db.actor
actor_id first_name last_name last_update
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
3 ED CHASE 2020-02-15T06:59:28Z
# Being that "@tutorial_db" is the active source, you can omit the handle:
$ sq .actor
actor_id first_name last_name last_update
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
3 ED CHASE 2020-02-15T06:59:28Z
That listed the contents of the actor table.
The same query can be executed in native SQL:
$ sq sql "SELECT * FROM actor"
actor_id first_name last_name last_update
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
3 ED CHASE 2020-02-15T06:59:28Z
Let’s look at some examples of using the SLQ query language. See the query guide for in-depth documentation.
$ sq '.actor | where(.first_name == "MARY")'
actor_id first_name last_name last_update
66 MARY TANDY 2020-02-15T06:59:28Z
198 MARY KEITEL 2020-02-15T06:59:28Z
It should be obvious that the above query effectively performs a WHERE first_name = 'MARY'.
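In native SQL, that’s:
$ sq sql "SELECT * FROM actor WHERE first_name = 'MARY'"
actor_id first_name last_name last_update
66 MARY TANDY 2020-02-15T06:59:28Z
198 MARY KEITEL 2020-02-15T06:59:28Z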
$ sq '.actor | .first_name, .last_name | .[2:5]'
first_name last_name
ED CHASE
JENNIFER DAVIS
JOHNNY LOLLOBRIGIDA
Above we select (zero-indexed) rows 2 through 4 (the [2:5] range is half-open), and output columns first_name and last_name.
The same could be accomplished by:
$ sq sql 'SELECT first_name, last_name FROM actor LIMIT 3 OFFSET 2'
first_name last_name
ED CHASE
JENNIFER DAVIS
JOHNNY LOLLOBRIGIDA
We could also output in JSON using the -j (--json) flag:
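# Same query as before, now with the -j flag:
$ sq -j '.actor | .first_name, .last_name | .[2:5]'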
[
{
"first_name": "ED",
"last_name": "CHASE"
},
{
"first_name": "JENNIFER",
"last_name": "DAVIS"
},
{
"first_name": "JOHNNY",
"last_name": "LOLLOBRIGIDA"
}
]
The --jsonl (JSON lines) format is sometimes more convenient:
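# The same query, using the --jsonl flag:
$ sq --jsonl '.actor | .first_name, .last_name | .[2:5]'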
{"first_name": "ED", "last_name": "CHASE"}
{"first_name": "JENNIFER", "last_name": "DAVIS"}
{"first_name": "JOHNNY", "last_name": "LOLLOBRIGIDA"}
There are several other output formats available.
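For example, CSV (a sketch using the --csv flag; see sq help for the full list of format flags):
$ sq --csv '.actor | .first_name, .last_name | .[2:5]'
first_name,last_name
ED,CHASE
JENNIFER,DAVIS
JOHNNY,LOLLOBRIGIDA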
Join
We can join across the tables of the database:
$ sq '.actor | join(.film_actor, .actor_id) | join(.film, .film_id) | .first_name, .last_name, .title:film_title'
first_name last_name film_title
PENELOPE GUINESS ACADEMY DINOSAUR
PENELOPE GUINESS ANACONDA CONFESSIONS
PENELOPE GUINESS ANGELS LIFE
PENELOPE GUINESS BULWORTH COMMANDMENTS
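For reference, that join corresponds roughly to this native SQL:
$ sq sql "SELECT actor.first_name, actor.last_name, film.title AS film_title FROM actor JOIN film_actor ON actor.actor_id = film_actor.actor_id JOIN film ON film_actor.film_id = film.film_id"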
Stdin
Let’s grab another data source, this time in CSV. We’ll download the file.
$ wget https://sq.io/testdata/film.csv
Now let’s take a look at it:
$ cat film.csv | sq inspect -v
SOURCE DRIVER NAME FQ NAME SIZE TABLES VIEWS LOCATION
@stdin csv @stdin @stdin 189.6KB 1 0 @stdin
NAME TYPE ROWS COLS NAME TYPE PK
data table 1000 13 film_id INTEGER
title TEXT
description TEXT
release_year INTEGER
language_id INTEGER
original_language_id TEXT
rental_duration INTEGER
rental_rate NUMERIC
length INTEGER
replacement_cost NUMERIC
rating TEXT
special_features TEXT
last_update DATETIME
Note that because CSV is monotable (only has one table of data), its data is represented as a single table named data.
We can pipe that CSV file to sq and perform the usual actions on it:
$ cat film.csv | sq '.data | .title, .release_year | .[2:5]'
title release_year
ADAPTATION HOLES 2006
AFFAIR PREJUDICE 2006
AFRICAN EGG 2006
We could continue to cat the CSV file to sq, but since we’ll use it later in this tutorial, we’ll add it as a source:
$ sq add film.csv --handle @film_csv
@film_csv csv film.csv
stdin sources can’t take advantage of ingest caching, because the stdin pipe is “anonymous”, and sq can’t do a cache lookup for it. If you’re going to repeatedly query the same stdin data, you should probably just sq add it.
We’ve now got two sources: a SQLite database (@tutorial_db) and a CSV file (@film_csv). We can join across those sources:
$ sq '.actor | join(.film_actor, .actor_id) | join(@film_csv.data, .film_id) | .first_name, .last_name, .title | .[0:5]'
first_name last_name title
PENELOPE GUINESS ACADEMY DINOSAUR
PENELOPE GUINESS ANACONDA CONFESSIONS
PENELOPE GUINESS ANGELS LIFE
PENELOPE GUINESS BULWORTH COMMANDMENTS
PENELOPE GUINESS CHEAPER CLYDE
Active Source
Now that we’ve added multiple sources, let’s see what sq ls has to say:
$ sq ls
@film_csv csv film.csv
@tutorial_db* sqlite3 sakila.db
Note that @tutorial_db is the active source (it has an asterisk beside it, and renders in a different color on a color terminal).
We can do this with @film_csv:
$ sq '@film_csv.data | .title | .[0:2]'
title
ACADEMY DINOSAUR
ACE GOLDFINGER
But not this:
$ sq '.data | .title | .[0:2]'
sq: SQL query against @tutorial_db failed: SELECT "title" FROM "data" LIMIT 2 OFFSET 0: no such table: data
That’s because the active source is still @tutorial_db. To see the active source:
$ sq src
@tutorial_db sqlite3 sakila.db
Let’s switch the active source to the CSV file:
$ sq src @film_csv
@film_csv csv film.csv
Now we can use the shorthand form (omit the @film_csv handle) and sq will look for table .data in the active source (which is now @film_csv):
$ sq '.data | .title | .[0:2]'
title
ACADEMY DINOSAUR
ACE GOLDFINGER
Ping
A useful feature is to ping the sources to verify that they’re accessible:
# Ping sources in the root group, i.e. all sources.
$ sq ping /
@film_csv 0s pong
@tutorial_db 1ms pong
Instead of pinging all the sources in the group, we can specify the sources explicitly:
$ sq ping @film_csv @tutorial_db
@film_csv 0s pong
@tutorial_db 1ms pong
SQL Sources
Having read this far, you can be forgiven for thinking that sq only deals with file formats such as CSV or even SQLite, but that is not the case. Let’s add some SQL databases.
First we’ll do Postgres: we’ll start a pre-built Sakila database via docker on port 5432 (note that it will take a moment for the Postgres container to start up):
$ docker run -d -p 5432:5432 sakiladb/postgres:latest
Let’s add that Postgres database as a source:
$ sq add 'postgres://sakila:p_ssW0rd@localhost/sakila' --handle @sakila_pg
@sakila_pg postgres sakila@localhost/sakila
If you don’t want to type the password on the command line, use -p to be prompted:
$ sq add 'postgres://sakila@localhost/sakila' -p
Password: [ENTER]
The new source should show up in sq ls:
@film_csv* csv film.csv
@sakila_pg postgres sakila@localhost/sakila
@tutorial_db sqlite3 sakila.db
Ping the new source just for fun:
$ sq ping @sakila_pg
@sakila_pg 29ms pong
Now we’ll add and start a MySQL instance of Sakila:
$ docker run -d -p 3306:3306 sakiladb/mysql:latest
$ sq add "mysql://sakila:p_ssW0rd@localhost/sakila" --handle @sakila_my
@sakila_my mysql sakila@localhost/sakila
And get some data from @sakila_my:
$ sq '@sakila_my.actor | .[0:2]'
actor_id first_name last_name last_update
1 PENELOPE GUINESS 2006-02-15 04:34:33 +0000 UTC
2 NICK WAHLBERG 2006-02-15 04:34:33 +0000 UTC
Insert & Modify
In addition to JSON, CSV, etc., sq can write query results to database tables.
We’ll use the film_category table as an example: the table is in both @sakila_pg and @sakila_my. We’re going to truncate the table in @sakila_my and then repopulate it via a query against @sakila_pg.
# First, set the active source to @sakila_my for convenience.
$ sq src @sakila_my
@sakila_my mysql sakila@localhost/sakila
# Now, let's confirm the counts
$ sq '@sakila_pg.film_category | count'
count
1000
$ sq '@sakila_my.film_category | count'
count
1000
Make a copy of the table as a backup:
$ sq tbl copy .film_category .film_category_bak
Copied table: @sakila_my.film_category --> @sakila_my.film_category_bak (1000 rows copied)
Note that sq tbl copy makes use of each database’s own table copy functionality. Thus sq tbl copy can’t be used to directly copy a table from one database to another. But sq provides a means of doing this: keep reading.
Truncate the @sakila_my.film_category table:
$ sq tbl truncate @sakila_my.film_category
Truncated 1000 rows from @sakila_my.film_category
$ sq '@sakila_my.film_category | count'
count
0
The @sakila_my.film_category table is now empty. But we can repopulate it via a query against @sakila_pg. For this example, we’ll just do 500 rows.
$ sq '@sakila_pg.film_category | .[0:500]' --insert @sakila_my.film_category
Inserted 500 rows into @sakila_my.film_category
$ sq '@sakila_my.film_category | count'
count
500
We can now use the sq tbl commands to restore the previous state.
$ sq tbl drop .film_category
Dropped table @sakila_my.film_category
# Restore the film_category table from the backup table we made earlier
$ sq tbl copy .film_category_bak .film_category
Copied table: @sakila_my.film_category_bak --> @sakila_my.film_category (1000 rows copied)
# Verify that the table is restored
$ sq '.film_category | count'
count
1000
# Get rid of the backup table
$ sq tbl drop .film_category_bak
Dropped table @sakila_my.film_category_bak
jq
Note that sq plays nicely with jq. For example, list the names of the columns in table @sakila_pg.actor:
$ sq inspect --json @sakila_pg.actor | jq -r '.columns[] | .name'
actor_id
first_name
last_name
last_update