LakeSync

Introduction

LakeSync — declare what data goes where. The engine handles the rest.

LakeSync is an open-source TypeScript sync engine. Pluggable adapters connect any readable or writable system — Postgres, BigQuery, S3/Iceberg, Jira, Salesforce, or local SQLite. Declarative sync rules define what data flows between them. Every adapter is both a source and a destination. Local SQLite is one consumer among many — data can also materialise into Postgres, MySQL, or BigQuery destination tables.

How It Works

  1. Adapters connect to data sources — Postgres, BigQuery, S3/R2, Jira, Salesforce, or anything you implement the interface for
  2. Sync rules define what data flows to each consumer, using bucket-based filtering with comparison operators (eq, in, neq, gt, lt, gte, lte) and JWT claim references
  3. The gateway evaluates rules and routes data between adapters, with real-time WebSocket broadcast to connected clients
  4. Destinations receive data as queryable tables — local SQLite for offline-capable apps, or Postgres/MySQL/BigQuery via the materialise protocol
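
As a rough illustration of step 2, the sketch below evaluates bucket-style rules against a list of deltas. It is not the real rule format: the Bucket, Filter, Delta, and Claims shapes, the "jwt:<claim>" value convention, and the name filterDeltasSketch are all assumptions made for this example (the library's own filterDeltas() is listed under Key Features, but its signature is not shown here). Only eq and in are modelled.

```typescript
// Hypothetical shapes for illustration; not the actual @lakesync/core types.
type Op = "eq" | "in";

interface Filter {
  column: string;
  op: Op;
  // A literal value, or "jwt:<claim>" to look the value up in the caller's claims.
  value: string;
}

interface Bucket {
  name: string;
  tables: string[];
  filters: Filter[];
}

interface Delta {
  table: string;
  rowId: string;
  columns: Record<string, string>;
}

type Claims = Record<string, string | string[]>;

// Resolve a filter value to the list of values it allows.
function resolve(value: string, claims: Claims): string[] {
  if (value.startsWith("jwt:")) {
    const claim = claims[value.slice(4)];
    if (claim === undefined) return []; // missing claim matches nothing
    return Array.isArray(claim) ? claim : [claim];
  }
  return [value];
}

// A delta belongs to a bucket when its table is listed and every filter passes.
function matches(delta: Delta, bucket: Bucket, claims: Claims): boolean {
  if (!bucket.tables.includes(delta.table)) return false;
  return bucket.filters.every((f) => {
    const allowed = resolve(f.value, claims);
    const actual = delta.columns[f.column];
    if (actual === undefined) return false;
    return f.op === "eq"
      ? allowed.length === 1 && allowed[0] === actual
      : allowed.includes(actual);
  });
}

// Pure function: same inputs always produce the same filtered output.
function filterDeltasSketch(deltas: Delta[], buckets: Bucket[], claims: Claims): Delta[] {
  return deltas.filter((d) => buckets.some((b) => matches(d, b, claims)));
}

const buckets: Bucket[] = [
  { name: "my-todos", tables: ["todos"], filters: [{ column: "owner_id", op: "eq", value: "jwt:sub" }] },
];
const deltas: Delta[] = [
  { table: "todos", rowId: "1", columns: { owner_id: "alice", title: "a" } },
  { table: "todos", rowId: "2", columns: { owner_id: "bob", title: "b" } },
];
// Only the delta owned by "alice" passes the rule for this caller.
const visible = filterDeltasSketch(deltas, buckets, { sub: "alice" });
```

Keeping evaluation a pure function of (deltas, rules, claims) means the same code can run in the gateway and in tests without any adapter attached.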

Adapters

Two interfaces abstract all data sources. Adapters are both sources and destinations.

Adapter | Interface | Details
Postgres / MySQL | DatabaseAdapter | insertDeltas, queryDeltasSince, getLatestState, ensureSchema
BigQuery | DatabaseAdapter | Idempotent MERGE inserts, INT64 HLC precision, clustered by table + hlc
S3 / R2 (Iceberg) | LakeAdapter | putObject, getObject, listObjects, deleteObject; Parquet + Iceberg table format
Custom | Either | Implement the interface for any readable data source. CompositeAdapter routes to multiple backends.
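
To make the DatabaseAdapter row more concrete, here is a minimal in-memory sketch. It uses the four method names from the table. The parameter and return types, the RowDelta shape, and the Result type (a stand-in for the Result<T, E> pattern listed under Key Features) are assumptions for illustration. The real interface in @lakesync/core may differ.

```typescript
// Assumed shapes; the real DatabaseAdapter signatures in @lakesync/core may differ.
type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };

interface RowDelta {
  table: string;
  rowId: string;
  hlc: bigint; // hybrid logical clock timestamp
  columns: Record<string, unknown>;
}

interface DatabaseAdapterSketch {
  ensureSchema(table: string): Promise<Result<void>>;
  insertDeltas(deltas: RowDelta[]): Promise<Result<number>>;
  queryDeltasSince(hlc: bigint): Promise<Result<RowDelta[]>>;
  getLatestState(table: string, rowId: string): Promise<Result<Record<string, unknown> | null>>;
}

// Orders deltas by ascending HLC.
const byHlc = (a: RowDelta, b: RowDelta): number => (a.hlc < b.hlc ? -1 : a.hlc > b.hlc ? 1 : 0);

class InMemoryAdapter implements DatabaseAdapterSketch {
  private tables = new Set<string>();
  private log: RowDelta[] = [];

  async ensureSchema(table: string): Promise<Result<void>> {
    this.tables.add(table);
    return { ok: true, value: undefined };
  }

  // Failures are returned as values rather than thrown.
  async insertDeltas(deltas: RowDelta[]): Promise<Result<number>> {
    const unknown = deltas.find((d) => !this.tables.has(d.table));
    if (unknown) return { ok: false, error: new Error(`unknown table: ${unknown.table}`) };
    this.log.push(...deltas);
    return { ok: true, value: deltas.length };
  }

  // Returns deltas strictly newer than the given timestamp, oldest first.
  async queryDeltasSince(hlc: bigint): Promise<Result<RowDelta[]>> {
    return { ok: true, value: this.log.filter((d) => d.hlc > hlc).sort(byHlc) };
  }

  // Folds a row's deltas in HLC order so newer column values override older ones.
  async getLatestState(table: string, rowId: string): Promise<Result<Record<string, unknown> | null>> {
    const rows = this.log.filter((d) => d.table === table && d.rowId === rowId).sort(byHlc);
    if (rows.length === 0) return { ok: true, value: null };
    const state: Record<string, unknown> = {};
    for (const d of rows) Object.assign(state, d.columns);
    return { ok: true, value: state };
  }
}
```

A caller would check result.ok before reading result.value, which is how a Result-returning API replaces try/catch at the call site.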

Source Connectors

Source connectors poll external APIs on an interval and push changes into the sync gateway. They extend BaseSourcePoller in @lakesync/core, which handles lifecycle, chunked push, and memory-managed ingestion with automatic backpressure.

Connector | Package | Details
Jira Cloud | @lakesync/connector-jira | Issues, comments, and projects via JQL-filtered polling
Salesforce | @lakesync/connector-salesforce | Accounts, contacts, opportunities, and leads via SOQL queries
Database (Postgres / MySQL / BigQuery) | @lakesync/core | Cursor-based or diff-based polling via ConnectorIngestConfig
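
The polling pattern described above can be sketched as follows. This is not BaseSourcePoller itself, whose API is not shown in this document. PollerSketch, Change, PushFn, and the constructor parameters are hypothetical names, and memory-managed ingestion is not modelled. The sketch shows only the lifecycle (start/stop), the chunked push, and a simple form of backpressure obtained by awaiting each push before sending the next chunk.

```typescript
// Hypothetical sketch; BaseSourcePoller's real API is not shown in this document.
interface Change {
  id: string;
  payload: unknown;
}

type PushFn = (chunk: Change[]) => Promise<void>;

abstract class PollerSketch {
  private readonly push: PushFn;
  private readonly intervalMs: number;
  private readonly chunkSize: number;
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  private running = false;

  constructor(push: PushFn, intervalMs: number, chunkSize: number) {
    this.push = push;
    this.intervalMs = intervalMs;
    this.chunkSize = chunkSize;
  }

  /** Subclasses fetch whatever changed in the external system since the last poll. */
  protected abstract fetchChanges(): Promise<Change[]>;

  /** Runs one poll cycle and returns the number of changes pushed. */
  async pollOnce(): Promise<number> {
    if (this.running) return 0; // skip a cycle that would overlap the previous one
    this.running = true;
    try {
      const changes = await this.fetchChanges();
      for (let i = 0; i < changes.length; i += this.chunkSize) {
        // Awaiting each push means the next chunk waits until the previous one is accepted.
        await this.push(changes.slice(i, i + this.chunkSize));
      }
      return changes.length;
    } finally {
      this.running = false;
    }
  }

  start(): void {
    if (this.timer !== undefined) return;
    this.timer = setInterval(() => {
      this.pollOnce().catch((err) => console.error("poll failed", err));
    }, this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}

// Example connector that always reports the same five changes.
class StaticPoller extends PollerSketch {
  protected async fetchChanges(): Promise<Change[]> {
    return [1, 2, 3, 4, 5].map((n) => ({ id: String(n), payload: { n } }));
  }
}

const chunkSizes: number[] = [];
const poller = new StaticPoller(async (chunk) => { chunkSizes.push(chunk.length); }, 60000, 2);
```

Calling poller.pollOnce() here pushes the five changes as chunks of 2, 2, and 1. A connector built this way only has to implement fetchChanges().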

Key Features

  • Pluggable adapters: DatabaseAdapter for SQL-like sources, LakeAdapter for object storage. Both are bidirectional. Cross-backend flows are defined by sync rules.
  • Materialise protocol: All three database adapters (Postgres, MySQL, BigQuery) implement Materialisable, materialising flushed deltas into queryable destination tables via a generic SqlDialect pattern. Hybrid column model (synced columns + extensible props JSONB). Supports composite primary keys, soft delete (default), and external ID deduplication. Adding a new destination requires implementing four SQL dialect methods.
  • Source polling: BaseSourcePoller provides lifecycle management, chunked push, and memory-managed ingestion with automatic backpressure and flush. Connectors extend it to poll any external API.
  • Adapter-sourced pull: Pull data from named source adapters (BigQuery, Postgres, etc.) directly into local SQLite. The gateway queries the adapter and applies sync rules before returning filtered deltas.
  • Sync rules DSL: Declarative bucket-based filtering with eq/in/neq/gt/lt/gte/lte operators and jwt: claim references. Pure function evaluation via filterDeltas().
  • Column-level LWW: Conflicts are resolved per column using last-write-wins (LWW), not per row. Concurrent edits to different fields never overwrite each other.
  • Real-time sync: WebSocket-based server-initiated broadcast. When any client pushes, other clients receive the deltas in under 100 ms. Auto-reconnect with exponential backoff. HTTP polling as fallback.
  • Offline support: Local SQLite via sql.js WASM. Persistent IndexedDB outbox survives page refreshes and process crashes. Automatic drain on reconnect.
  • Hybrid Logical Clocks: Branded HLCTimestamp bigint (48-bit wall clock + 16-bit counter). Causal ordering with deterministic clientId tiebreaking.
  • Result-based error handling: Public APIs return Result<T, E> instead of throwing.
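
The Hybrid Logical Clock layout and column-level LWW described above can be sketched together. This is a minimal illustration under assumptions, not the library's implementation. It assumes the wall clock occupies the upper 48 bits and the counter the lower 16, and that on an exact HLC tie the lexicographically larger clientId wins. The helper names (encodeHLC, decodeHLC, mergeRow) and the ColumnWrite shape are hypothetical.

```typescript
// Assumed encoding: upper 48 bits are wall-clock milliseconds, lower 16 bits are a counter.
type HLCTimestamp = bigint & { readonly __brand: "HLCTimestamp" };

const COUNTER_BITS = 16n;
const COUNTER_MASK = 0xffffn;

function encodeHLC(wallMs: number, counter: number): HLCTimestamp {
  return ((BigInt(wallMs) << COUNTER_BITS) | (BigInt(counter) & COUNTER_MASK)) as HLCTimestamp;
}

function decodeHLC(ts: HLCTimestamp): { wallMs: number; counter: number } {
  return { wallMs: Number(ts >> COUNTER_BITS), counter: Number(ts & COUNTER_MASK) };
}

// Each column carries its own timestamp, so conflicts are decided per column.
interface ColumnWrite {
  value: unknown;
  hlc: HLCTimestamp;
  clientId: string;
}

type RowState = Record<string, ColumnWrite>;

// True when `a` should replace `b`: newer HLC first, then clientId as a
// deterministic tiebreak (the direction of the tiebreak is an assumption).
function wins(a: ColumnWrite, b: ColumnWrite): boolean {
  if (a.hlc !== b.hlc) return a.hlc > b.hlc;
  return a.clientId > b.clientId;
}

// Merges two versions of a row column by column.
function mergeRow(local: RowState, remote: RowState): RowState {
  const merged: RowState = { ...local };
  for (const [column, write] of Object.entries(remote)) {
    const existing = merged[column];
    if (existing === undefined || wins(write, existing)) merged[column] = write;
  }
  return merged;
}

const t1 = encodeHLC(1700000000000, 0);
const t2 = encodeHLC(1700000000000, 1); // same millisecond, later counter

const local: RowState = {
  title: { value: "Draft", hlc: t1, clientId: "a" },
  done: { value: true, hlc: t2, clientId: "a" },
};
const remote: RowState = {
  title: { value: "Final", hlc: t2, clientId: "b" },
  done: { value: false, hlc: t1, clientId: "b" },
};

// title comes from remote (newer write), done stays local (newer write).
const merged = mergeRow(local, remote);
```

Because the wall clock sits in the high bits, comparing two timestamps as plain bigints orders them by time first and by counter second, so no decoding is needed on the merge path.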

Packages

Package | Description
@lakesync/core | HLC, Delta, Result, conflict resolution, sync rules, adapter interfaces (LakeAdapter, DatabaseAdapter, Materialisable), base source poller, connector types
@lakesync/client | LocalDB (sql.js), SyncCoordinator, transports, queues
@lakesync/gateway | In-memory sync gateway with push/pull protocol
@lakesync/gateway-server | Self-hosted HTTP + WebSocket gateway server for Node.js and Bun
@lakesync/proto | Protocol Buffer serialisation for the sync wire format
@lakesync/adapter | Storage adapters: S3/R2, Postgres, MySQL, BigQuery, Composite, FanOut, Lifecycle
@lakesync/connector-jira | Jira Cloud source connector that polls issues, comments, and projects
@lakesync/connector-salesforce | Salesforce CRM source connector that polls accounts, contacts, opportunities, and leads
@lakesync/parquet | Parquet file encoding/decoding for delta persistence in Iceberg format
@lakesync/catalogue | Iceberg REST catalogue client for table metadata and commit operations
@lakesync/compactor | Background compaction, maintenance, and checkpoint generation
@lakesync/analyst | Analytical query engine powered by DuckDB-WASM
@lakesync/react | React hooks: useQuery, useMutation, useSyncStatus, LakeSyncProvider
lakesync | Unified package re-exporting all packages