Tab-separated values
Tab-separated values (TSV) is a simple, text-based file format for storing tabular data.[1] Records are separated by newlines, and values within a record are separated by tab characters. The TSV format is thus a delimiter-separated values format, similar to comma-separated values.
Filename extension | .tsv, .tab |
---|---|
Internet media type | text/tab-separated-values |
Type of format | multiplatform, serial data streams |
Container for | database information organized as field-separated lists |
Standard | IANA MIME type |
Because TSV is simple and widely supported, it is often used to exchange tabular data between programs. For example, a TSV file might be used to transfer information from a database to a spreadsheet.
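As a rough illustration of the two separators, here is a minimal Python sketch that splits records on newlines and fields on tabs. The function names `parse_tsv` and `write_tsv` are hypothetical, and the code assumes that no field ever contains a tab or a newline, so it does no quoting or escaping.

```python
def parse_tsv(text: str) -> list[list[str]]:
    """Split TSV text into records (on newlines) and fields (on tabs).

    Assumes fields never contain tabs or newlines.
    """
    lines = text.split("\n")
    # A newline after the last record should not produce an empty record.
    if lines and lines[-1] == "":
        lines.pop()
    return [line.split("\t") for line in lines]


def write_tsv(rows: list[list[str]]) -> str:
    """Join fields with tabs and end every record with a line feed."""
    return "".join("\t".join(row) + "\n" for row in rows)


rows = [["name", "qty"], ["apple", "3"], ["pear", "5"]]
text = write_tsv(rows)
print(repr(text))               # 'name\tqty\napple\t3\npear\t5\n'
print(parse_tsv(text) == rows)  # True
```

Ending every record, including the last, with a line feed is one common choice; the parser accepts input with or without that final newline.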
Example
The first five rows of the Iris flower data set can be stored as TSV using the following plain text (tabs may be displayed as spaces):
Sepal length	Sepal width	Petal length	Petal width	Species
5.1	3.5	1.4	0.2	I. setosa
4.9	3.0	1.4	0.2	I. setosa
4.7	3.2	1.3	0.2	I. setosa
4.6	3.1	1.5	0.2	I. setosa
5.0	3.6	1.4	0.2	I. setosa
The TSV plain text above corresponds to the following tabular data:
Sepal length | Sepal width | Petal length | Petal width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | I. setosa |
4.9 | 3.0 | 1.4 | 0.2 | I. setosa |
4.7 | 3.2 | 1.3 | 0.2 | I. setosa |
4.6 | 3.1 | 1.5 | 0.2 | I. setosa |
5.0 | 3.6 | 1.4 | 0.2 | I. setosa |
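Reading the sample back into named records takes only a few lines. The sketch below uses Python's standard `csv` module with a tab delimiter and `csv.QUOTE_NONE`, so that double quotes are treated as ordinary data, as in plain TSV. Treating the first row as the header is an assumption that matches this sample.

```python
import csv
import io

# The Iris sample above, written with real tab characters.
IRIS_TSV = (
    "Sepal length\tSepal width\tPetal length\tPetal width\tSpecies\n"
    "5.1\t3.5\t1.4\t0.2\tI. setosa\n"
    "4.9\t3.0\t1.4\t0.2\tI. setosa\n"
    "4.7\t3.2\t1.3\t0.2\tI. setosa\n"
    "4.6\t3.1\t1.5\t0.2\tI. setosa\n"
    "5.0\t3.6\t1.4\t0.2\tI. setosa\n"
)

# QUOTE_NONE disables quote handling, so only tabs and line ends are special.
reader = csv.DictReader(io.StringIO(IRIS_TSV), delimiter="\t", quoting=csv.QUOTE_NONE)
records = list(reader)

print(len(records))                       # 5
print(records[0]["Species"])              # I. setosa
print(float(records[0]["Sepal length"]))  # 5.1
```

All values come back as strings. Converting columns such as sepal length to numbers is left to the caller, because TSV itself carries no type information.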
Character escaping
The IANA media type definition for TSV keeps the format simple by disallowing tabs within fields.[2]
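A writer that follows this rule can refuse offending values instead of escaping them. The sketch below is hypothetical (`write_strict_tsv` is not a standard function). It also rejects line breaks, because a newline inside a field would be read as the end of a record.

```python
def write_strict_tsv(rows: list[list[str]]) -> str:
    """Serialize rows as TSV, refusing any field that contains a tab or a line break."""
    out = []
    for r, row in enumerate(rows):
        for c, field in enumerate(row):
            # A tab would be read as a field separator and a line break as a
            # record separator, so neither may appear inside a value.
            if any(ch in field for ch in "\t\n\r"):
                raise ValueError(f"row {r}, column {c}: field contains a tab or line break")
        out.append("\t".join(row) + "\n")
    return "".join(out)


print(repr(write_strict_tsv([["id", "note"], ["1", "ok"]])))  # 'id\tnote\n1\tok\n'
try:
    write_strict_tsv([["2", "has\ta tab"]])
except ValueError as err:
    print(err)  # row 0, column 1: field contains a tab or line break
```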
Because TSV values cannot contain literal tabs or newline characters, an escaping convention is needed to store text containing these characters without loss. A common convention uses the following escape sequences:[3][4]
Escape sequence | Meaning |
---|---|
\n | line feed |
\t | tab |
\r | carriage return |
\\ | backslash |
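The escaping and unescaping steps can be sketched in Python as follows. The function names are hypothetical. The sketch implements only the four escapes in the table above and leaves any other backslash sequence unchanged, which is one possible policy rather than a rule taken from the cited conventions.

```python
import re


def escape_field(value: str) -> str:
    """Replace backslash, line feed, tab and carriage return with two-character escapes."""
    # Backslashes are escaped first so that the backslashes introduced for
    # \n, \t and \r below are not doubled again.
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace("\t", "\\t")
                 .replace("\r", "\\r"))


_UNESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}
# A backslash followed by one of the four recognised escape characters.
_ESCAPE_RE = re.compile(r"\\([ntr\\])")


def unescape_field(value: str) -> str:
    """Reverse escape_field in a single left-to-right pass.

    Chained str.replace calls would misread an escaped backslash followed by
    the letter n as a newline. Unrecognised sequences are left unchanged.
    """
    return _ESCAPE_RE.sub(lambda m: _UNESCAPES[m.group(1)], value)


original = "C:\\new\tfolder\r\n"
escaped = escape_field(original)
print(escaped)                              # C:\\new\tfolder\r\n
print(unescape_field(escaped) == original)  # True
```

Escaping the backslash itself is what makes the conversion lossless: without it, a value that already contains the two characters `\` and `n` could not be told apart from an escaped newline.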
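Another common convention is to borrow the quoting rules of CSV from RFC 4180: values containing tabs, newlines, or double quotes are enclosed in double quotes, and any double quote inside such a value is doubled. Because plain TSV gives double quotes no special meaning, this leads to ambiguity: a reader that does not expect quoting keeps the quote characters as data and splits the value at its embedded tabs and newlines.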
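To make the ambiguity concrete, the sketch below uses Python's standard `csv` module with a tab delimiter to write a value that contains both a tab and double quotes, then reads the result back in two ways. The sample row is invented for illustration.

```python
import csv
import io

row = ["id-7", 'He said "hi"\tthen left']

# Write one record using CSV-style quoting with a tab as the delimiter.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
quoted = buf.getvalue()
print(repr(quoted))  # 'id-7\t"He said ""hi""\tthen left"\n'

# A reader that knows the quoting convention recovers the original fields...
print(next(csv.reader(io.StringIO(quoted), delimiter="\t")) == row)  # True

# ...but a plain TSV reader that only splits on tabs sees three fields,
# with the quote characters kept as data.
print(quoted.rstrip("\n").split("\t"))  # ['id-7', '"He said ""hi""', 'then left"']
```

The same bytes therefore decode to two fields or three depending on which convention the reader assumes, and nothing in the file says which one was intended.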
Another ambiguity is whether records are separated by a line feed (LF), as is typical on Unix platforms, or by a carriage return followed by a line feed (CRLF), as is typical on Microsoft platforms. Many programs, such as LibreOffice, expect a carriage return followed by a line feed.
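A reader can sidestep the line-ending question by accepting both forms. The minimal sketch below uses a hypothetical helper, `split_records`, and assumes that carriage returns appear only as part of a CRLF pair, never unescaped at the end of a field.

```python
def split_records(text: str) -> list[str]:
    """Split TSV text into records, accepting both LF and CRLF line endings."""
    records = text.split("\n")
    if records and records[-1] == "":
        records.pop()  # ignore the empty string after a final line ending
    # Remove a carriage return left over from a CRLF ending.
    return [r[:-1] if r.endswith("\r") else r for r in records]


unix = "a\tb\n1\t2\n"
windows = "a\tb\r\n1\t2\r\n"
print(split_records(unix) == split_records(windows))  # True
```

Splitting on `"\n"` explicitly is deliberate: Python's `str.splitlines` would also break on characters such as form feed, vertical tab, and U+2028, which may legitimately appear inside field values.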
References
- "How To Use Tab Separated Value (TSV) files". International Monetary Fund. Retrieved 2023-02-01.
- "Definition of tab-separated-values (tsv)". Internet Assigned Numbers Authority (IANA).
- "Linear TSV". Data Protocols - Open Knowledge Foundation.
- "jq Manual". stedolan.github.io.
Bibliography
- IANA, Text Media Types, Definition of tab-separated-values (tsv), Paul Lindner, U of MN Internet Gopher Team, June 1993
- Tab Separated Values (TSV): a format for tabular data exchange, Jukka Korpela, created 2000-09-01, last update 2005-02-12.