CSV & friends
The sq
CSV driver implements connectivity for CSV
and variants, such as TSV, pipe-delimited, etc..
Note that the CSV data sources are read-only. That is to say, while you can query the CSV source as if it were a SQL table, you cannot insert values into the CSV source.
Add source
When adding a CSV source via sq add
, the location string is simply the filepath.
For example:
$ sq add ./actor.csv
@actor_csv csv actor.csv
You can also pass an absolute filepath (and, in fact, any relative path is expanded to
an absolute path when saved to sq
’s config).
Usually you can omit the --driver=csv
flag, because sq
will inspect the file contents
and figure out that it’s a CSV file. However, it’s safer to explicitly specify the flag.
sq add --driver=csv ./actor.csv
The same is true for TSV files. You can specify the driver explicitly:
$ sq add ./actor.tsv
@actor_tsv tsv actor.tsv
But, if you omit the driver, sq
can generally figure out that it’s a TSV file.
sq add ./actor.tsv
Monotable
sq
considers CSV to be a monotable data source (unlike, say, a Postgres data source, which
obviously can have many tables). Like all other sq
monotable sources,
the source’s data is accessed via the synthetic .data
table. For example:
$ sq @actor_csv.data
actor_id first_name last_name last_update
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
Delimiters
It’s common to encounter delimiters other than comma. TSV (tab) is the most common, but other
variants exist, e.g. pipe (a|b|c
). Use the --opts delim=DELIM
flag to specify
the delimiter. Because the delimiter is often a shell token (e.g. |
), the delim
option
requires text aliases. For example:
sq add ./actor.csv --opts delim=pipe
The accepted values are:
Delim | Value |
---|---|
comma | , |
space | |
pipe | | |
tab | |
colon | : |
semi | ; |
period | . |
Note:
comma
is the default. You generally never need to specify this.tab
is the delimiter for TSV files. Because this is such a common variant,sq
allows you to specify--driver=tsv
instead. But usuallysq
will figure out that it’s a TSV file. The following are equivalent:$ sq add --driver=tsv ./actor.tsv $ sq add --driver=csv --opts delim=tab ./actor.tsv $ sq add ./actor.tsv
Header row
By default, sq
treats CSV files as raw data files, without a header row. The fields (columns)
are then named A
, B
, C
, etc.
$ sq @actor_noheader_csv.data
A B C D
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
But often a CSV file will have a header row. For example:
actor_id,first_name,last_name,last_update
1,PENELOPE,GUINESS,2020-02-15T06:59:28Z
2,NICK,WAHLBERG,2020-02-15T06:59:28Z
In that case, use the --header=true
option:
sq add --opts header=true ./actor.csv
Then the CSV header field names will become the column names.
$ sq @actor_csv.data
actor_id first_name last_name last_update
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
Explicit column names
If the CSV file doesn’t have a header row, you can use the cols
option to provide semantic
column names instead of the default A, B, C
. For example:
$ sq add --opts cols=id,first,last,date ./actor_noheader.csv
@actor_noheader_csv csv actor_noheader.csv
$ sq @actor_noheader_csv.data
id first last date
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
You can also use the cols
option to override the column names even if the CSV
file does have a header row.
$ sq add --opts 'header=true&cols=id,first,last,date' ./actor.csv
@actor_csv csv actor.csv
$ sq @actor_csv.data
id first last date
1 PENELOPE GUINESS 2020-02-15T06:59:28Z
2 NICK WAHLBERG 2020-02-15T06:59:28Z
--opts cols
.