...
 
.es_keywords
.git
.gitignore
R/dev.R
README.org
TODO
TODO.org
inst/other
manual
various
various_scripts
2019-03-21 Enrico Schumann <es@enricoschumann.net>
* DESCRIPTION (Version): 0.6-0
* NAMESPACE: import 'datetimeutils::roundPOSIXt'
* R/functions.R (read_ts_tables): rename
argument 'fread' to 'read.fn', with default
NULL. To use package 'data.table', set it to
"fread".
(read_ts_tables): fixed -- 'drop.weekends'
now works for intraday data
(read_ts_tables): new argument 'frequency',
used only for intraday data
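The 'read.fn' entry above describes a switch between the default reader and data.table's fread. A minimal base-R sketch of such a dispatch follows. The helper name read_csv_table is hypothetical and not part of tsdb, and tsdb's own code may well differ.

```r
## Hypothetical helper, not part of tsdb: pick a CSV reader based on
## a 'read.fn' value (NULL = base R, "fread" = data.table::fread).
read_csv_table <- function(path, read.fn = NULL) {
    if (is.null(read.fn)) {
        utils::read.csv(path)
    } else if (identical(read.fn, "fread")) {
        ## data.table is only suggested, so check that it is installed
        if (!requireNamespace("data.table", quietly = TRUE))
            stop("package 'data.table' is needed for read.fn = \"fread\"")
        as.data.frame(data.table::fread(path))
    } else {
        stop("unsupported 'read.fn': ", read.fn)
    }
}

## usage: write a small file and read it with the default reader
path <- tempfile(fileext = ".csv")
writeLines(c("timestamp,A", "16801,1", "16804,2"), path)
x <- read_csv_table(path)
x
```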
2019-03-11 Enrico Schumann <es@enricoschumann.net>
* R/functions.R (as.ts_table.ts_table): add method
2019-02-14 Enrico Schumann <es@enricoschumann.net>
* DESCRIPTION (Imports, Suggests): move package
DBI to Suggests
2018-11-21 Enrico Schumann <es@enricoschumann.net>
* R/functions.R (write_ts_table): fix 'add' for
@@ -15,7 +39,7 @@
2018-03-21 Enrico Schumann <es@enricoschumann.net>
* R/functions.R (write_ts_table): with
-replace.file set to TRUE, check whether files
+'replace.file' set to TRUE, check whether files
exists before removal (this avoids the warning
about non-existing files)
@@ -23,6 +47,12 @@
* inst/tests/write_read.R: add timing test
2018-02-01 Enrico Schumann <es@enricoschumann.net>
* R/functions.R (read_ts_tables): experimental
new argument 'fread'; if TRUE, files are read
with data.table::fread instead of read.table
2017-12-11 Enrico Schumann <es@enricoschumann.net>
* NAMESPACE: export 'ttime'
@@ -41,7 +71,7 @@
2017-10-24 Enrico Schumann <es@enricoschumann.net>
-* DESCRIPTION (Version): Version: 0.5-0
+* DESCRIPTION (Version): 0.5-0
* R/functions.R (ts_table): store timestamp as
numeric (which reverts the change introduced in
......
@@ -2,16 +2,17 @@ Package: tsdb
Type: Package
Title: Terribly-Simple Data Base for Time Series
Version: 0.6-0
-Date: 2018-11-21
+Date: 2019-03-21
Maintainer: Enrico Schumann <es@enricoschumann.net>
Authors@R: person(given = "Enrico", family = "Schumann",
role = c("aut", "cre"),
email = "es@enricoschumann.net",
comment = c(ORCID = "0000-0001-7601-6576"))
-Description: A terribly-simple data base for numeric time
-series. All series are saved as csv files. The package
-offers utilities for saving files in a standardised
-format, and for retrieving and joining data.
+Description: A terribly-simple data base for numeric
+time series. Series are stored in CSV format. The
+package offers utilities for saving series in a
+standardised format, and for retrieving and joining
+data.
License: GPL-3
-Imports: DBI, datetimeutils, fastmatch, utils, zoo
-Suggests: MonetDBLite
+Imports: datetimeutils, fastmatch, utils, zoo
+Suggests: DBI, MonetDBLite, data.table
@@ -7,17 +7,10 @@ export(
write_ts_table
)
-importFrom("DBI",
-"dbConnect",
-"dbDisconnect",
-"dbGetQuery",
-"dbQuoteIdentifier",
-"dbWriteTable"
-)
importFrom("datetimeutils",
"is_businessday",
-"previous_businessday")
+"previous_businessday",
+"roundPOSIXt")
importFrom("fastmatch",
"fmatch")
@@ -28,6 +21,12 @@ importFrom("utils",
importFrom("zoo",
"zoo", "coredata", "index", "as.zoo")
S3method(print, ts_table)
S3method(as.ts_table, ts_table)
S3method(as.ts_table, zoo)
S3method(as.data.frame, ts_table)
S3method(as.matrix, ts_table)
S3method(as.zoo, ts_table)
S3method(print, file_info)
S3method(print, ts_table)
v0.6-0 (2019-03-21)
o fixed: 'write_ts_table' with option 'add' would
not rewrite (i.e. delete) data before 1 Jan 1970
o write_ts_table: new argument 'replace.file'
o write_ts_table: scientific notation is no longer
suppressed, i.e. numbers may now be written as
e.g. 1e10
o read_ts_tables: new arguments 'read.fn' and
'frequency'
o read_ts_tables: 'return.class' may also be
'ts_table'
o function 'ttime' is now exported
o there are public repositories at
https://github.com/enricoschumann/tsdb and
https://gitlab.com/enricoschumann/tsdb
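Since scientific notation is no longer suppressed, any software that reads the files must accept values such as 1e+10. The base-R check below is only an illustration (not tsdb code): it shows both spellings of the same number and that they parse to the same double.

```r
## Base-R illustration (not tsdb code): the same number in
## scientific and in fixed notation.
x <- 1e10
sci   <- format(x, scientific = TRUE)   # "1e+10"
fixed <- format(x, scientific = FALSE)  # "10000000000"
## Both spellings parse back to the same double.
as.numeric(sci) == as.numeric(fixed)
## [1] TRUE
```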
v0.5-0 (2017-10-24)
o fixed: write_ts_table does now also write empty
files
o write_ts_table: first argument has been renamed 'ts'
o new function 'file_info'
o read_ts_tables: rename argument 'column.name' to
'column.names' (plural)
o new ts_table method for as.matrix
v0.4-1 (2017-02-06)
o 'read_ts_tables' has a new argument 'column.name',
......
@@ -114,18 +114,19 @@
* About tsdb
-A terribly-simple data base for numeric time series. All series
-are saved as CSV files. The package offers utilities
-for saving files in a standardised format, and for
-retrieving and joining data.
+A terribly-simple data base for numeric time
+series. Series are stored in CSV format. The package
+offers utilities for saving series in a standardised
+format, and for retrieving and joining data.
** Good things about tsdb
- no setup needed, no system dependencies
(i.e. external software, such as a database)
- completely portable; moving from one computer to
-another requires no effort (the only thing to take
-care of is file encoding)
+another requires no effort other than copying the
+files (the only thing to take care of is file
+encoding if non-ASCII column names are used)
- data usable by other software
@@ -172,7 +173,7 @@ ts
: 5 rows [2016-01-01 -> 2016-01-05]: A
Note that we had to provide a column name (=A=) for the
-data. That is not optional. It is one of the things
+data. This is not optional. It is one of the things
that =ts_table= enforces. Another is that timestamps
need to be of class =Date= or =POSIXct=.
@@ -199,7 +200,8 @@ The written file will look like this:
You may notice that the dates have been replaced by
numbers. The mapping between these numbers and calendar
times is described later, when we discuss the
-representation of timestamps.
+representation of timestamps. (But if you can't wait:
+it is the number of days since 1 January 1970.)
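This is base R's own internal encoding of =Date= objects, so the stored numbers can be checked directly. A quick illustration in plain R, with no tsdb functions involved:

```r
## Date objects are stored internally as days since 1970-01-01.
d <- as.Date(c("2016-01-01", "2016-01-10"))
days <- as.numeric(d)
days
## [1] 16801 16810
```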
Let us write a second file. This time, we use
=ts_table= directly.
@@ -268,6 +270,7 @@ The written file looks like this:
16810,10,20
#+END_EXAMPLE
** Reading data
Use the function =read_ts_tables=.
@@ -324,8 +327,9 @@ More convenient may be to specify a =return.class=.
But wait. We provided and wrote to the file values for
1 January to 5 January. But we only got values for 1, 4
-and 5 January. The reason is that tsdb was written with
-financial data in mind, and on weekends there are no prices.
+and 5 January. The reason is that =tsdb= was written
+with financial data in mind, and on weekends there are
+no prices.
#+BEGIN_SRC R :session *R* :results output :exports both
weekdays(as.Date("2016-1-1")+0:4)
#+END_SRC
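tsdb imports =is_businessday= from datetimeutils (see the NAMESPACE above), presumably for this purpose. The following is only a base-R sketch of the same idea, not the package's implementation. It relies on the =wday= field of =POSIXlt=, which is 0 for Sunday and 6 for Saturday.

```r
## Base-R sketch of dropping weekends (not tsdb's implementation).
## POSIXlt's 'wday' is 0 for Sunday and 6 for Saturday.
dates <- as.Date("2016-01-01") + 0:4
wday <- as.POSIXlt(dates)$wday
business_days <- dates[!(wday %in% c(0L, 6L))]
format(business_days)
## [1] "2016-01-01" "2016-01-04" "2016-01-05"
```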
@@ -379,9 +383,10 @@ a single table, but we read tables.
10 2016-01-10 NA 20
#+end_example
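In the output above, a date that exists in only one file gets =NA= in the other file's column. A base-R sketch of that kind of outer join on timestamps follows; it is an illustration only and makes no claim about how =read_ts_tables= is implemented internally.

```r
## Base-R sketch of an outer join on timestamps (illustration only).
s1 <- data.frame(timestamp = as.Date("2016-01-01") + 0:2, A = c(1, 2, 3))
s2 <- data.frame(timestamp = as.Date("2016-01-02") + 0:2, B = c(11, 12, 13))
## all = TRUE keeps the union of timestamps; gaps become NA
joined <- merge(s1, s2, by = "timestamp", all = TRUE)
joined
```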
-The column names of the returned object consist of the filepaths and
-the column, which may be more information than we actually want. The
-argument =column.name= specifies the format; its default is
+The column names of the returned object consist of the
+filepaths and the column, which may be more information
+than we actually want. The argument =column.names=
+specifies the format; its default is
=%dir%/%file%::%column%=.
#+BEGIN_SRC R :session *R* :results output :exports both
read_ts_tables(c("example1", "example2"),
@@ -408,8 +413,8 @@ argument =column.name= specifies the format; its default is
#+end_example
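The column-name template works by substituting placeholders. A minimal sketch of that substitution, assuming the placeholders are plain text tokens that are replaced literally; the helper =expand_column_name= is hypothetical and not a tsdb function.

```r
## Hypothetical helper (not a tsdb function): fill the placeholders
## %dir%, %file% and %column% of a column-name template.
expand_column_name <- function(template, dir, file, column) {
    vapply(file, function(f) {
        out <- gsub("%dir%", dir, template, fixed = TRUE)
        out <- gsub("%file%", f, out, fixed = TRUE)
        gsub("%column%", column, out, fixed = TRUE)
    }, character(1), USE.NAMES = FALSE)
}

expand_column_name("%dir%/%file%::%column%", "~/tsdb/daily",
                   c("example1", "example2"), "close")
## [1] "~/tsdb/daily/example1::close" "~/tsdb/daily/example2::close"
expand_column_name("%file%", "~/tsdb/daily",
                   c("example1", "example2"), "close")
## [1] "example1" "example2"
```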
-Missing values are by default set to =NA=. That happens even for
-missing columns, with a warning, though.
+Missing values are by default set to =NA=. That happens
+even for missing columns, with a warning though.
#+BEGIN_SRC R :session *R* :results output :exports both
read_ts_tables(c("example1", "example2"),
dir = "~/tsdb/daily",
@@ -447,7 +452,7 @@ In read_ts_tables(c("example1", "example2"), dir = "~/tsdb/daily", :
(objects of class =ts_table=). A =ts_table= is a
numeric matrix, so there
is always a =dim= attribute. For a time-series table
-=x=, you get the number of observations with =dim(x)[[1]]=.
+=x=, you get the number of observations with =dim(x)[1L]=.
Attached to this matrix are several attributes:
@@ -458,8 +463,8 @@ In read_ts_tables(c("example1", "example2"), dir = "~/tsdb/daily", :
- columns :: a character vector that provides the
column names
-(There may be other attributes as well, but these three
-are always present.)
+There may be other attributes as well, but these three
+are always present.
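To make the layout concrete, here is a hypothetical mimic built with base R's =structure()=. Only the =columns= attribute is named in the text above; the attribute names =timestamp= and =t.type=, and the class =my_ts_table=, are assumptions for this sketch, so use =ts_table= itself for real data.

```r
## Hypothetical mimic of the ts_table layout (NOT tsdb's constructor).
## Attribute names 'timestamp' and 't.type' are assumptions.
make_table <- function(data, timestamp, columns) {
    m <- as.matrix(data)
    storage.mode(m) <- "double"                  # a numeric matrix
    structure(m,
              timestamp = as.numeric(timestamp), # numeric timestamps
              t.type = class(timestamp)[1L],     # e.g. "Date"
              columns = columns,                 # column names
              class = "my_ts_table")
}

x <- make_table(11:15, as.Date("2016-01-01") + 0:4, "A")
dim(x)[1L]                # number of observations
## [1] 5
attr(x, "timestamp")[1L]
## [1] 16801
```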
A =ts_table= is not meant as a time-series class. For
most computations (plotting, calculation of statistics,
@@ -467,15 +472,15 @@ etc), the =ts_table= must first be coerced to =zoo=, =xts=,
a data-frame or a similar data structure. Methods that
perform such coercions are responsible for converting
the numeric timestamp vector to an actual
-timestamp. For this, they may use the internal function
+timestamp. For this, they may use the function
=ttime=, whose pronunciation may remind you of a hot
beverage, but which really stands for =translate time=.
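The exact signature of =ttime= is not shown here. As a hypothetical stand-in, the translation it describes can be sketched as follows, assuming POSIXct values are interpreted in UTC; the function name =translate_time= is made up for this sketch.

```r
## Hypothetical stand-in for the idea behind 'ttime' (its real
## signature is not shown): turn stored numbers back into timestamps.
## Assumption: POSIXct values are interpreted in UTC.
translate_time <- function(x, t.type) {
    switch(t.type,
           Date    = as.Date(x, origin = "1970-01-01"),
           POSIXct = as.POSIXct(x, origin = "1970-01-01", tz = "UTC"),
           stop("unsupported t.type: ", t.type))
}

translate_time(16810, "Date")
## [1] "2016-01-10"
translate_time(1451642400, "POSIXct")
## [1] "2016-01-01 10:00:00 UTC"
```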
** The file format
-tsdb can store and load time-series data. The format
-it uses is plain CSV; a sample file may look as
+=tsdb= can store and load time-series data. The format
+it uses is plain CSV. A sample file may look as
follows:
#+BEGIN_EXAMPLE
@@ -491,19 +496,19 @@ beverage, but which really stands for =translate time=.
names of the columns, with the first column always
being named =timestamp=.
-The advantage of this plain format is that the data are
-in no way dependent on =tsdb=. The files can be used
-and manipulated by other software as well.
+The advantage of this plain format is that the data
+are in no way dependent on =tsdb=. The files can be
+used and manipulated by other software as well.
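For instance, such a file can be read with nothing but base R. The file contents below are made up for illustration; only the layout (a header line, then a first column named =timestamp= that holds days since 1970-01-01 for daily data) follows the description.

```r
## Read a tsdb-style daily file with base R only.
## The contents are made up; only the layout follows the description.
path <- tempfile(fileext = ".csv")
writeLines(c("timestamp,A,B",
             "16801,1,11",
             "16804,2,12"), path)

dat <- read.csv(path)
stopifnot(names(dat)[1L] == "timestamp")
## convert days since 1970-01-01 back to Date
dat$timestamp <- as.Date(dat$timestamp, origin = "1970-01-01")
dat
```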
** Timestamps
:PROPERTIES:
-:CUSTOM_ID: sec:timestamps
+:CUSTOM_ID: timestamps
:END:
Two types of timestamps are supported: =Date= and
=POSIXct=. As part of a =ts_table=, timestamps are
-always stored in their numeric representation: Daily
+always stored in their numeric representation: daily
timestamps are represented as the number of days
since 1 Jan 1970; intraday timestamps are the number
of seconds since 1 Jan 1970.
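Both representations are base R's internal ones, so they can be inspected with =as.numeric=. For intraday data, one second corresponds to a difference of 1 in the stored value:

```r
## POSIXct values are seconds since 1970-01-01 00:00:00 UTC.
t0 <- as.POSIXct("2016-01-01 10:00:00", tz = "UTC")
as.numeric(t0)
## [1] 1451642400
## consecutive one-second timestamps differ by 1 in their numeric form
as.numeric(t0 + 0:2) - as.numeric(t0)
## [1] 0 1 2
```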
@@ -2,9 +2,9 @@
test.ts_table <- function() {
-## require("RUnit")
-## require("tsdb")
-## require("zoo")
+require("RUnit")
+require("tsdb")
+require("zoo")
y <- ts_table(11:15, as.Date("2016-1-1")-5:1, "close")
checkEquals(y,
@@ -50,9 +50,9 @@ test.ts_table <- function() {
test.read_ts_tables <- function() {
-## require("RUnit")
-## require("tsdb")
-## require("zoo")
+library("RUnit")
+library("tsdb")
+library("zoo")
x <- ts_table(data = 11:15,
timestamp = as.Date("2016-1-1") + 1:5,
columns = "A")
@@ -73,12 +73,40 @@ test.read_ts_tables <- function() {
structure(c(11, 12, 13, 14, 15, 6, 7, 8, 9, 10),
.Dim = c(5L, 2L)))
## check POSIXct
z1 <- ts_table(11:15,
as.POSIXct("2016-1-1 10:00:00", tz = "UTC")+0:4,
"close")
write_ts_table(z1, dir, "X1")
z2 <- ts_table(1:5,
as.POSIXct("2016-1-1 10:00:00", tz = "UTC")+1:5,
"close")
write_ts_table(z2, dir, "X2")
z12 <- read_ts_tables(c("X1", "X2"), dir, columns = "close",
start = as.POSIXct("2016-1-1 10:00:00", tz = "UTC"),
end = as.POSIXct("2016-1-1 10:00:20", tz = "UTC"))
checkEquals(z12$data,
structure(c(11, 12, 13, 14, 15, NA,
NA, 1, 2, 3, 4, 5),
.Dim = c(6L, 2L)))
checkEquals(z12$timestamp,
as.POSIXct("2016-1-1 10:00:00", tz = "UTC")+0:5)
z12 <- read_ts_tables(c("X1", "X2"), dir, columns = "close",
start = "2016-1-1 11:00:00",
end = "2016-1-1 11:00:20")
checkEquals(z12$data,
structure(c(11, 12, 13, 14, 15, NA,
NA, 1, 2, 3, 4, 5),
.Dim = c(6L, 2L)))
checkEquals(z12$timestamp,
as.POSIXct("2016-1-1 10:00:00", tz = "UTC")+0:5)
## z1 <- ts_table(11:15, as.POSIXct("2016-1-1 10:00:00", tz = "UTC")+0:4, "close")
## write_ts_table(z1, dir, "X1")
## z2 <- ts_table(1:5, as.POSIXct("2016-1-1 10:00:00", tz = "UTC")+1:5, "close")
## write_ts_table(z2, dir, "X2")
## read_ts_tables(c("X1", "X2"), dir, columns = "close")
## check empty file
@@ -87,6 +115,11 @@ test.read_ts_tables <- function() {
checkEquals(em$timestamp, structure(numeric(0), class = "Date"))
checkEquals(em$data, structure(numeric(0), .Dim = 0:1))
}
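The second block of POSIXct checks in =test.read_ts_tables= passes the character bounds "2016-1-1 11:00:00" and "2016-1-1 11:00:20", yet expects data starting at 10:00 UTC. If those strings are converted in the session's local time zone (an assumption; the conversion code is not part of this diff), the checks only pass where local time is one hour ahead of UTC in January, such as Europe/Berlin. Passing POSIXct bounds with an explicit =tz=, as the first block does, avoids that dependence. A base-R check of the reasoning:

```r
## 11:00 local time equals 10:00 UTC only where local time is UTC+1.
utc    <- as.POSIXct("2016-01-01 10:00:00", tz = "UTC")
berlin <- as.POSIXct("2016-01-01 11:00:00", tz = "Europe/Berlin")
london <- as.POSIXct("2016-01-01 11:00:00", tz = "Europe/London")
utc == berlin   # same instant: Berlin is on UTC+1 in January
## [1] TRUE
utc == london   # different instant: London is on UTC in January
## [1] FALSE
```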
test.write_ts_table <- function() {
......
@@ -13,7 +13,6 @@
\usage{
file_info(dir, file)
}
-%- maybe also 'usage' for other objects documented here.
\arguments{
\item{dir}{
character
@@ -24,13 +23,23 @@ file_info(dir, file)
}
\details{
Experimental.
Provide information, such as number of entries, of
specified files.
It is recommended that code that uses the returned
information to alter or write tables explicitly
check whether a table exists (column \code{exists} in
the returned \code{\link{data.frame}}). For instance,
a value of \code{\link{NA}} for \code{min.timestamp}
would occur for a non-existing file, but also if the
file could not be read for some reason.
}
\value{
An object of type \code{file_info}, which is a
-\code{data.frame}.
+\code{data.frame} with information such as whether a
+file exists, minimum and maximum timestamp, and more.
}
\author{
@@ -41,6 +50,5 @@ file_info(dir, file)
}
\examples{
\dontrun{
-file_info(dir = "/tsdb",
-c("table1", "table2"))}
+file_info(dir = "/tsdb", c("table1", "table2"))}
}
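The details above recommend checking the =exists= column rather than relying on =NA= in =min.timestamp=. A sketch with a hand-made data frame follows; the values and the =file= column are hypothetical, and only =exists= and =min.timestamp= are taken from the documentation.

```r
## Hand-made stand-in for a file_info() result (values hypothetical).
info <- data.frame(file = c("table1", "table2", "table3"),
                   exists = c(TRUE, FALSE, TRUE),
                   min.timestamp = c(16801, NA, NA),
                   stringsAsFactors = FALSE)
## NA in min.timestamp alone cannot tell these two cases apart:
missing_files    <- info$file[!info$exists]
unreadable_files <- info$file[info$exists & is.na(info$min.timestamp)]
missing_files
## [1] "table2"
unreadable_files
## [1] "table3"
```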
@@ -12,7 +12,9 @@ read_ts_tables(file, dir, t.type = "guess",
return.class = NULL,
drop.weekends = TRUE,
column.names = "\%dir\%/\%file\%::\%column\%",
-backend = "csv")
+backend = "csv",
+read.fn = NULL,
+frequency = "1 sec")
}
\arguments{
\item{file}{
@@ -34,8 +36,12 @@ read_ts_tables(file, dir, t.type = "guess",
character. \bold{Currently only a single column is supported.}
}
\item{return.class}{
-NULL or character
-}
+\code{NULL} (default) or character: if \code{NULL}, a list is
+returned. Also supported are \code{zoo},
+\code{\link{data.frame}} and \code{\link{ts_table}}.
+}
\item{drop.weekends}{
logical
}
@@ -46,9 +52,19 @@ read_ts_tables(file, dir, t.type = "guess",
}
\item{backend}{
-character: currently, \code{csv} and \code{monetdb} are supported
+character: currently, only \code{csv} is fully supported
}
\item{read.fn}{
\code{NULL} or character: use \sQuote{\code{fread}}
to use \code{\link[data.table]{fread}}
from package \pkg{data.table}
}
\item{frequency}{
character: only used when \code{t.type} is
\code{POSIXct} (or guessed to be \code{POSIXct})
}
}
\details{
Read time-series data from csv files.
@@ -69,7 +85,8 @@ read_ts_tables(file, dir, t.type = "guess",
}
\examples{
\dontrun{
-read_ts_tables(c("table1", "table2"), dir = "/tsdb",
+read_ts_tables(c("table1", "table2"),
+dir = "~/tsdb",
columns = "close",
return.class = "zoo")}
}
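With =return.class = NULL= a list is returned; the unit tests above access its =timestamp= and =data= elements. A hypothetical list of that shape can be turned into a data frame in base R as follows. The column labels are made up, and =return.class = "data.frame"= remains the documented route.

```r
## Hypothetical list shaped like the one the tests inspect
## (elements 'timestamp' and 'data'); column labels are made up.
res <- list(timestamp = as.POSIXct("2016-01-01 10:00:00", tz = "UTC") + 0:2,
            data = matrix(c(11, 12, 13, NA, 1, 2), ncol = 2L))
df <- data.frame(timestamp = res$timestamp, res$data)
names(df) <- c("timestamp", "X1::close", "X2::close")
df
```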