Roadmap

Tau is in active development. The current release is v0.1.0. Future versions are marked (soft) — they describe intent and scope, not a committed schedule.


v0.1.0 — the engine is complete

The core engine and server are feature-complete and shipping. The data model, query language, storage backends, and simulation testing infrastructure are all in this release.

Engine

  • Half-open interval model with monoid concatenation semantics
  • Newest-layer-wins resolution by monotonic layer ID — deterministic, no configuration
  • O(log n) point lookup per layer; sweep-line normalisation compaction
  • Derived lenses with lazy closure composition and cycle detection at DERIVE time
  • Rolling window aggregations as first-class expression nodes
  • Arc-backed immutable layers — clones are pointer bumps

Storage

  • In-memory and binary disk backends
  • AES-256-GCM encryption at rest; per-entry CRC32 integrity
  • Write-ahead log with per-statement fsync and full WAL replay on startup
  • Schema DDL (CREATE LENS / DERIVE LENS) persisted and replayed
  • WAL checkpoint after compaction

Query language (TauQL)

  • CREATE / DROP / USE DATABASE; SHOW DATABASES / LENSES
  • CREATE / DROP LENS with static types
  • APPEND LENS; COPY LENS FROM for server-side CSV ingest
  • DERIVE LENS AS <expr>: lazy computed lenses with composable closures
  • AT, RANGE [WHERE <expr>], REDUCE USING (min|max|avg|sum|count)
  • Full expression grammar: arithmetic, comparison, logical, unary, rolling aggregations
  • START TRANSACTION / COMMIT / ROLLBACK: per-connection atomic transactions — mutations buffered and invisible until COMMIT; ROLLBACK discards the buffer

Server

  • Line-oriented TCP protocol with shared/exclusive locking
  • TLS (PEM cert/key or ephemeral self-signed)
  • Argon2id authentication; per-database CRUDA grants; wildcard grants
  • CREATE / DROP USER, GRANT / REVOKE, SHOW USERS / GRANTS
  • Connection cap with graceful rejection; per-connection idle timeout
  • GET /healthz liveness probe; Prometheus metrics via --metrics-port

Verification

  • Property-based tests (Hegel/Hypothesis): interval containment, layer lookup, value roundtrip, compaction query-equivalence, permission composition — each checked against hundreds of randomised inputs
  • Deterministic simulation tester (dst): every transport × auth × WAL combination, driven against a reference oracle, with fault injection and reproducible seeds

v0.2.0 (soft) — performance and operability

The engine is correct. v0.2.0 makes it fast enough to benchmark honestly, operable enough to run in production without documentation gaps, and expressive enough to cover real ingest and audit patterns.

Benchmarks

  • Published cargo bench suite using Criterion: AT, RANGE, REDUCE, and APPEND at varying layer counts and dataset sizes
  • Reproducible comparison against InfluxDB 2.x and QuestDB on standard ingest and query workloads, with methodology documented and results checked into the repo
  • Flamegraph-guided profiling; all regressions caught by the bench suite in CI

Query performance

  • Multi-layer merge iterator: single-pass query across N layers instead of N sequential passes — the primary hot path as layer count grows before compaction
  • Write throughput profiling and targeted optimisation of the WAL path

Transactions and batch ingest

  • START TRANSACTION / COMMIT / ROLLBACK: atomic multi-statement transactions — mutations buffered per-connection, invisible until COMMIT, discarded on ROLLBACK; shipped in v0.1.0
  • BATCH APPEND LENS <name> { ... }: single-statement bulk ingest for one lens — a list of intervals inside a block, committed as one layer without round-trip overhead; optimised for high-volume ingest paths

Layer introspection and audit

  • HISTORY LENS <name> [start end]: list all layers covering a time range, with their IDs, write timestamps, and interval coverage — answers "how many corrections have been applied here and when?"
  • AT LENS <name> <t> AS OF <timestamp>: point query against the state of the data as it existed at a given wall-clock time, using write timestamps recorded in the WAL; the user-facing audit API
  • AT LENS <name> <t> LAYER <n>: low-level audit query against a specific layer ID — used for debugging and by the DST

Backup and restore

  • BACKUP DATABASE <name> TO <path>: consistent snapshot — quiesces the WAL, copies the store file and a WAL checkpoint, resumes; atomic from the caller's perspective
  • RESTORE DATABASE <name> FROM <path>: replays a backup into a running server with WAL consistency checks
  • Tested against real failure scenarios: partial backup, interrupted restore, corrupt snapshot

Configuration

  • TOML configuration file replaces long flag lists; all current flags map to config keys
  • Config file is the canonical source of truth; flags override for one-off runs
  • Required groundwork for v0.3.0 cluster configuration

Client

  • Python client (pip install tau-py): connection management, authentication, AT, RANGE, REDUCE, APPEND, BATCH APPEND, and transaction support
  • Thin enough to drive the benchmark suite from a notebook

v0.3.0 (soft) — distributed

v0.3.0 makes Tau a multi-node system. The layer model is already well-suited to replication: layers are append-only, have monotonic IDs, and conflict resolution is a deterministic rule. The main addition is a consensus layer that assigns globally ordered layer IDs and replicates the WAL across nodes.

Replication model

  • Raft consensus via openraft: the WAL maps directly to a Raft replicated log — each WAL entry becomes a Raft log entry, so the distributed storage layer is largely already written
  • The leader node assigns globally monotonic layer IDs; the algebraic properties of the layer model are preserved exactly across the cluster
  • Read replicas serve AT, RANGE, and REDUCE with bounded replication lag; writes always route to the leader
  • CONSISTENCY STRONG query hint routes a read to the leader for linearisable results

Fault tolerance

  • Leader election and automatic failover via Raft
  • Cluster recovers from minority node loss without data loss and without manual intervention
  • The DST is extended to multi-node: simulates leader failures, network partitions, and follower lag, cross-checking every query result against the reference oracle

Cluster management (TauQL)

  • CLUSTER STATUS: node list, leader, replication lag per follower
  • ADD NODE <addr>, REMOVE NODE <id>: online membership changes via Raft joint consensus
  • tau binary gets --cluster, --peers <addr,...>, and --node-id flags; cluster config lives in the TOML config file

Backlog

Items with no assigned version — considered, not yet scheduled.

  • Named timestamp aliases: ISO-8601 or human-readable offsets in AT and RANGE
  • systemd unit file for local deployments
  • man page for the server binary
  • Online schema evolution: rename lens, change type with migration
  • Go client for integration-test ergonomics
  • Grafana data source plugin
  • Prometheus remote write adapter: ingest Prometheus metrics directly into Tau lenses
  • Leaderless replication via hybrid logical clocks: the algebraic approach to multi-master — post-v0.3.0 research direction