A TypeScript library for efficient data serialization, achieved by separating data structure from their values and storing each in compact binary formats.
Most modern data serialization formats fall into one of two categories:
The aim of this project is to store values with the flexibility of JSON but the byte efficiency of dedicated formats.
To accomplish this, custom "types" are constructed, implicitly defining binary serialization formats for specific categories of values.
Types are defined from a rich set of primitive and compound types (see Data Types).
There is a fixed (universal) serialization format for types, and each type defines a serialization format for its values.
structure-bytes reduces the number of bytes needed to serialize values in several ways in comparison to JSON:
sb.IntType.structure-bytes serializes the type of a collection's elements as part of the type rather than the value, so it isn't repeated.This project is somewhat similar to Google's Protocol Buffers, but was not designed to match its functionality.
structure-bytes to eliminate..proto definition files, which are not designed to be transmitted like values.) This compactly stores complex types and allows values to be written along with their types so they can be read without knowing their types beforehand..proto files. This allows, for example, for a function which turns a type into another type that either contains an error message or an instance of the original type.structure-bytes provides a larger set of primitive and compound types.Byte (1-byte signed integer)Short (2-byte signed integer)Int (4-byte signed integer)Long (8-byte signed integer)BigInt (a signed integer with arbitrary precision)FlexInt (a signed integer from -(2 ** 52) to 2 ** 52 with a variable-length representation)UnsignedByte (1-byte unsigned integer)UnsignedShort (2-byte unsigned integer)UnsignedInt (4-byte unsigned integer)UnsignedLong (8-byte unsigned integer)BigUnsignedInt (an unsigned integer with arbitrary precision)FlexUnsignedInt (an unsigned integer below 2 ** 53 with a variable-length representation)Date (8-byte signed integer representing number of milliseconds since Jan 1, 1970)Day (3-byte signed integer representing a specific day in history)Time (4-byte unsigned integer representing a specific time of day)Float (IEEE 32-bit floating-point number)Double (IEEE 64-bit floating-point number)Boolean (a single true or false value)BooleanTuple (a constant-length array of Booleans)BooleanArray (a variable-length array of Booleans)Char (a single UTF-8 character)String (UTF-8-encoded text)Octets (an ArrayBuffer (raw binary data))Tuple<Type> (a constant-length array of Types)Struct (an object with a fixed set of fields, each with a fixed name and type)Array<Type> (a variable-length array of Types)Set<Type> (like an Array, except creates a set when read)Map<KeyType, ValueType> (a mapping of KeyType instances to ValueType instances)Enum<Type> (a fixed set of Types; useful when only a small subset of Type instances represent possible values, especially with Strings)Choice (a fixed set of types that values can take on)Recursive<Type> (a type that can reference itself and be used to serialize circular data structures)Singleton<Type> (a type that serializes only one value)Optional<Type> (either null or undefined or an instance of Type)Pointer<Type> (allows multiple instances of Type with the same serialization to be stored only once)The docs folder is hosted at
https://calebsander.github.io/structure-bytes.
const sb = require('structure-bytes')
const colorType = new sb.TupleType({
type: new sb.UnsignedByteType,
length: 3
})
const routeAttributesType = new sb.StructType({
color: colorType,
description: new sb.PointerType(new sb.StringType),
direction_names: new sb.PointerType(
new sb.ArrayType(new sb.StringType)
),
long_name: new sb.StringType,
text_color: colorType,
type: new sb.UnsignedByteType
})
const routesType = new sb.ArrayType(
new sb.StructType({
id: new sb.StringType,
attributes: routeAttributesType
})
)
const sb = require('structure-bytes')
const colorType = new sb.TupleType({
type: new sb.UnsignedByteType,
length: 3
})
const routeAttributesType = new sb.StructType({
color: colorType,
description: new sb.PointerType(new sb.StringType),
direction_names: new sb.PointerType(
new sb.ArrayType(new sb.StringType)
),
long_name: new sb.StringType,
text_color: colorType,
type: new sb.UnsignedByteType
})
const routesType = new sb.ArrayType(
new sb.StructType({
id: new sb.StringType,
attributes: routeAttributesType
})
)
const typeBuffer = routesType.toBuffer()
// ArrayBuffer { byteLength: 92 }
console.log(Buffer.from(typeBuffer))
// <Buffer 52 51 02 0a 61 74 74 72 69 62 75 74 65 73 51 06 05 63 6f 6c 6f 72 50 11 03 0b 64 65 73 63 72 69 70 74 69 6f 6e 70 41 0f 64 69 72 65 63 74 69 6f 6e 5f ... >
const toRGB = hex => [
parseInt(hex.slice(0, 2), 16),
parseInt(hex.slice(2, 4), 16),
parseInt(hex.slice(4, 6), 16)
]
fetch('https://api-v3.mbta.com/routes?filter[type]=0,1')
.then(res => res.json())
.then(({data}) => {
const routes = data.map(({id, attributes}) => {
delete attributes.short_name
delete attributes.sort_order
attributes.color = toRGB(attributes.color)
attributes.text_color = toRGB(attributes.text_color)
return {id, attributes}
})
const valueBuffer = routesType.valueBuffer(routes)
// ArrayBuffer { byteLength: 306 }
// (for comparison, JSON.stringify(routes).length == 1498)
})
const sb = require('structure-bytes')
// Buffer obtained somehow
const typeBuffer = new Uint8Array([0x50, 0x11, 0x03]).buffer
const type = sb.r.type(typeBuffer)
console.log(type)
// TupleType { type: UnsignedByteType {}, length: 3 }
// Buffer obtained somehow
const valueBuffer = new Uint8Array([0x00, 0x80, 0xFF]).buffer
console.log(type.readValue(valueBuffer))
// [ 0, 128, 255 ]
The I/O functions are implemented with callbacks, but can all be used with util.promisify() if you prefer to use Promises instead.
See the documentation for examples.
I recommend the file extensions .sbt for a serialized type, .sbv for a serialized value, and .sbtv for a serialized type and value.
const fs = require('fs')
const sb = require('structure-bytes')
const type = new sb.EnumType({
type: new sb.StringType,
values: [
'ON_TIME',
'LATE',
'CANCELLED',
'UNKNOWN'
]
})
sb.writeType({
type,
outStream: fs.createWriteStream('status.sbt')
}, err => {
sb.readType(fs.createReadStream('status.sbt'), (err, readType) =>
console.log(readType)
/*
EnumType {
type: StringType {},
values: [ 'ON_TIME', 'LATE', 'CANCELLED', 'UNKNOWN' ] }
*/
)
})
const value = 'CANCELLED'
sb.writeValue({
type,
value,
outStream: fs.createWriteStream('status.sbv')
}, err => {
sb.readValue({
type,
inStream: fs.createReadStream('status.sbv')
}, (err, value) =>
console.log(value)
// 'CANCELLED'
)
})
sb.writeTypeAndValue({
type,
value,
outStream: fs.createWriteStream('status.sbtv')
}, err => {
sb.readTypeAndValue(fs.createReadStream('status.sbtv'), (err, type, value) =>
console.log(value)
// 'CANCELLED'
)
})
Server-side:
const http = require('http')
const sb = require('structure-bytes')
const type = new sb.DateType
http.createServer((req, res) => {
sb.httpRespond({req, res, type, value: new Date}, err =>
console.log('Responded')
)
}).listen(80)
Client-side:
<script src = '/structure-bytes/compiled/download.js'></script>
<script>
// 'date' specifies the name of the type being transferred so it can be cached for future requests
sb.download({
name: 'date',
url: '/',
options: {} // optional options to pass to fetch() (e.g. cookies, headers, etc.)
})
.then(value =>
console.log(value.getFullYear())
// 2017
)
</script>
Server-side:
const http = require('http')
const sb = require('structure-bytes')
http.createServer((req, res) => {
sb.readValue({type: new sb.FlexUnsignedIntType, inStream: req}, (err, value) => {
res.end(String(value))
})
}).listen(80)
Client-side:
<script src = '/structure-bytes/compiled/upload.js'></script>
<button onclick = 'upload()'>Click me</button>
<script>
let clickCount = 0
function upload() {
clickCount++
sb.upload({
type: new sb.FlexUnsignedIntType,
value: clickCount,
url: '/click',
options: {method: 'POST'} // options to pass to fetch(); method 'POST' is required
})
.then(response => response.text())
.then(alert)
}
</script>
The entire project is written in TypeScript and declaration files are automatically generated. That means that if you are using TypeScript, the compiler can automatically infer the type of the various values exported from structure-bytes. To import the package, use:
import * as sb from 'structure-bytes'
The typings allow the compiler to automatically infer what types of values a Type can serialize. For example:
const type = new sb.StructType({
abc: new sb.DoubleType,
def: new sb.MapType(
new sb.StringType,
new sb.DayType
)
})
type.valueBuffer(/*...*/)
/*
If you hover over "valueBuffer", you can see
that it requires the value to be of type:
{ abc: number | string, def: Map<string, Date> }
*/
You can also add explicit VALUE generics to make the compiler check that your Type serializes the correct types of values.
Many non-primitive types (e.g. ArrayType, MapType, StructType) can take either one or two generics.
The first, VALUE, specifies the type of values the type can serialize.
The second, READ_VALUE, specifies the type of values the type will deserialize and defaults to VALUE if omitted.
READ_VALUE must extend VALUE.
Generally the two generics are the same, except in the case of numeric types (which write number | string but read number) and OptionalType (which writes E | null | undefined but reads E | null).
With StructType:
class Car {
constructor(
public make: string,
public model: string,
public year: number
) {}
}
const carType = new sb.StructType<Car>({
make: new sb.StringType,
model: new sb.StringType,
year: new sb.UnsignedShortType
})
// The compiler would have complained if one of the fields were missing
// or one of the field's types didn't match the type of the field's values,
// e.g. "make: new sb.BooleanType"
If you have transient fields (i.e. they shouldn't be serialized), you can create a separate interface for the fields that should be serialized:
interface SerializedCar {
make: string
model: string
year: number
}
class Car implements SerializedCar {
public id: number // transient field
constructor(public make: string, public model: string, public year: number) {
this.id = Math.floor(Math.random() * 1e6)
}
}
const carType = new sb.StructType<SerializedCar>({
make: new sb.StringType,
model: new sb.StringType,
year: new sb.UnsignedShortType
})
With ChoiceType, you can let the type be inferred automatically if each of the possible types writes the same type of values:
const choiceType = new sb.ChoiceType([ // Type<number | string, number>
new sb.ByteType,
new sb.ShortType,
new sb.IntType,
new sb.DoubleType
])
However, if the value types are not the same, TypeScript will complain about them not matching, and you should express the union type explicitly:
interface RGB {
r: number
g: number
b: number
}
interface HSV {
h: number
s: number
v: number
}
type CSSColor = string
const choiceType = new sb.ChoiceType<RGB | HSV | CSSColor>([
new sb.StructType<RGB>({
r: new sb.FloatType,
g: new sb.FloatType,
b: new sb.FloatType
}),
new sb.StructType<HSV>({
h: new sb.FloatType,
s: new sb.FloatType,
v: new sb.FloatType
}),
new sb.StringType
])
A RecursiveType has no way to infer its value type, so you should always provide the VALUE generic:
interface Cons<A> {
head: A
tail: List<A>
}
interface List<A> {
list: Cons<A> | null // null for empty list
}
const recursiveType = new sb.RecursiveType<List<string>>('linked-list')
recursiveType.setType(new sb.StructType<List<string>>({
list: new sb.OptionalType(
new sb.StructType<Cons<string>>({
head: new sb.StringType,
tail: recursiveType
})
)
}))
recursiveType.valueBuffer({
list: {
head: '1',
tail: {
list: {
head: '2',
tail: {list: null}
}
}
}
})
When reading types from buffers and streams, they will be of type Type<any>.
You should specify their value types before using them to write values:
const booleanType = new sb.BooleanType
const readType = sb.r.type(booleanType.toBuffer()) // Type<any>
// Will throw a runtime error
readType.valueBuffer('abc')
//vs.
const castReadType: sb.Type<boolean> = sb.r.type(booleanType.toBuffer())
// Will throw a compiler error
castReadType.valueBuffer('abc')
It may also sometimes be useful to be more specific about what types of values you want to be able to serialize. For example, if you want to serialize integer values that will always be represented as numbers (and never in string form), you can force the compiler to error out if you try to serialize a string value with the following:
const intType: sb.Type<number> = new sb.IntType
// Now this is valid:
intType.valueBuffer(100)
// But this is not, even though it would be if you omitted the type annotation on intType:
intType.valueBuffer('100')
In the following definitions, flexInt means a variable-length unsigned integer with the following format, where X represents either 0 or 1:
[0b0XXXXXXX] stores values from 0 to 2**7 - 1 in their unsigned 7-bit integer representations[0b10XXXXXX, 0bXXXXXXXX] stores values from 2**7 to 2**7 + 2**14 - 1, where a value x is encoded into the unsigned 14-bit representation of x - 2**7[0b110XXXXX, 0bXXXXXXXX, 0bXXXXXXXX] stores values from 2**7 + 2**14 to 2**7 + 2**14 + 2**21 - 1, where a value x is encoded into the unsigned 21-bit representation of x - (2**7 + 2**14)All numbers are stored in big-endian format.
The binary format of a type contains a byte identifying the class of the type followed by additional information to describe the type, if necessary.
For example, new sb.UnsignedIntType translates into [0x13], and new sb.StructType({abc: new sb.ByteType, def: new sb.StringType}) translates into:
[
0x51 /*StructType*/,
2 /*2 fields*/,
/*first field's name*/, 0x61 /*a*/, 0x62 /*b*/, 0x63 /*c*/, 0, 0x01 /*ByteType*/,
/*second field's name*/, 0x64 /*d*/, 0x65 /*e*/, 0x66 /*f*/, 0, 0x41 /*StringType*/
]
If the type has already been written to the buffer, it is also valid to serialize the type as:
0xFFoffset ([position of first byte of offset in buffer] - [position of type in buffer]) - flexIntFor example:
const someType = new sb.TupleType({
type: new sb.FloatType,
length: 3
})
const type = new sb.StructType({
one: someType,
two: someType
})
// type serializes to
[
0x51 /*StructType*/,
2 /*2 fields*/,
/*first field's name*/, 0x6f /*o*/, 0x6e /*n*/, 0x65 /*e*/, 0,
0x50 /*TupleType*/,
0x20 /*FloatType*/,
3 /*3 floats in the tuple*/,
/*second field's name*/, 0x74 /*t*/, 0x77 /*w*/, 0x6f /*o*/, 0,
0xff, /*type is defined previously*/
8 /*type is defined 8 bytes before this byte*/
]
In the following definitions, type means the binary type format.
ByteType: identifier 0x01ShortType: identifier 0x02IntType: identifier 0x03LongType: identifier 0x04BigIntType: identifier 0x05FlexIntType: identifier 0x07UnsignedByteType: identifier 0x11UnsignedShortType: identifier 0x12UnsignedIntType: identifier 0x13UnsignedLongType: identifier 0x14BigUnsignedIntType: identifier 0x15FlexUnsignedIntType: identifier 0x17DateType: identifier 0x1ADayType: identifier 0x1BTimeType: identifier 0x1CFloatType: identifier 0x20DoubleType: identifier 0x21BooleanType: identifier 0x30BooleanTupleType: identifier 0x31, payload:length - flexIntBooleanArrayType: identifier 0x32CharType: identifier 0x40StringType: identifier 0x41OctetsType: identifier 0x42TupleType: identifier 0x50, payload:elementType - typelength - flexIntStructType: identifier 0x51, payload:fieldCount - flexIntfieldCount instances of field:name - a serialized StringType valuefieldType - typeArrayType: identifier 0x52, payload:elementType - typeSetType: identifier 0x53, payload identical to ArrayType:elementType - typeMapType: identifier 0x54, payload:keyType - typevalueType - typeEnumType: identifier 0x55, payload:valueType - typevalueCount - flexIntvalueCount instances of value:value - a value that conforms to valueTypeChoiceType: identifier 0x56, payload:typeCount - flexInttypeCount instances of possibleType:possibleType - typeRecursiveType: identifier 0x57, payload:recursiveID (an identifier unique to this recursive type in this type buffer) - flexIntrecursiveType (the type definition of this type) - typeSingletonType: identifier 0x59, payload:valueType - typevalue - a value that conforms to valueTypeOptionalType: identifier 0x60, payload:typeIfNonNull - typePointerType: identifier 0x70, payload:targetType - typeByteType: 1-byte integerShortType: 2-byte integerIntType: 4-byte integerLongType: 8-byte integerBigIntType:byteCount - flexIntnumber - byteCount-byte integerFlexIntType: flexInt of value * 2 if value is non-negative, -2 * value - 1 if value is negativeUnsignedByteType: 1-byte unsigned integerUnsignedShortType: 2-byte unsigned integerUnsignedIntType: 4-byte unsigned integerUnsignedLongType: 8-byte unsigned integerBigUnsignedIntType:byteCount - flexIntnumber - byteCount-byte unsigned integerFlexUnsignedIntType: flexIntDateType: 8-byte signed integer storing milliseconds in Unix timeDayType: 3-byte signed integer storing days since the Unix time epochTimeType: 4-byte unsigned integer storing milliseconds since the start of the dayFloatType: single precision (4-byte) IEEE floating pointDoubleType: double precision (8-byte) IEEE floating pointBooleanType: 1-byte value, either 0x00 for false or 0xFF for trueBooleanTupleType: ceil(length / 8) bytes, where the nth boolean is stored at the (n % 8)th MSB (0-indexed) of the floor(n / 8)th byte (0-indexed)BooleanArrayType:length - flexIntbooleans - ceil(length / 8) bytes, where the nth boolean is stored at the (n % 8)th MSB (0-indexed) of the floor(n / 8)th byte (0-indexed)CharType: UTF-8 codepoint (somewhere between 1 and 4 bytes long)StringType:string - a UTF-8 string of any length not containing '\0'0x00 to mark the end of the stringOctetsType:length - flexIntoctets - length bytesTupleType:length values serialized by elementTypeStructType:fieldTypeArrayType:length - flexIntlength values serialized by elementTypeSetType:size - flexIntsize values serialized by elementTypeMapType:size - flexIntsize instances of keyValuePair:key - value serialized by keyTypevalue - value serialized by valueTypeEnumType:index of value in values array - flexIntChoiceType:index of type in possible types array - flexIntvalue - value serialized by specified typeRecursiveType:valueNotYetWrittenInBuffer - byte containing either 0x00 or 0xFFvalueNotYetWrittenInBuffer:value - value serialized by recursiveTypeoffset ([position of first byte of offset in buffer] - [position of value in buffer]) - flexIntSingletonType:OptionalType:valueIsNonNull - byte containing either 0x00 or 0xFFvalueIsNonNull:value - value serialized by typeIfNonNullPointerType:offset - flexInt:0offset] - [position of first byte of offset in the last instance of these value bytes])offset is 0 (i.e. this is the first instance):targetTypeVersions will be of the form x.y.z. They are in the semver format:
x is the major release; changes to it represent significant or breaking changes to the API, or to the type or value binary specification.y is the minor release; changes to it represent new features that do not break backwards compatibility.z is the patch release; changes to it represent bug fixes that do not change the documented API.To test the Node.js code, run npm test.
To test the HTTP transaction code, run node client-test/server.js and open localhost:8080 in your browser. Open each link in a new page. Upload and Download should each alert Success, while Upload & Download should alert Upload: Success and Download: Success.
Caleb Sander, 2017
Generated using TypeDoc