Transformations & Field Mapping

Overview

The transform engine processes data at the NDJSON (newline-delimited JSON) intermediate representation layer. This means transforms work consistently across all input formats (CSV, XML, JSON, NDJSON) and are applied in a streaming manner without buffering entire datasets.

Key Features

  • 🧮 30+ Compute Functions — String manipulation, math, logic, type conversions
  • 🎯 Type Coercion — Convert strings to numbers, booleans, timestamps
  • 🛡️ Error Policies — Control behavior for missing fields and coercion failures

Basic Usage

import { ConvertBuddy } from "convert-buddy-js";

const buddy = new ConvertBuddy();

const result = await buddy.convert(csvData, {
  outputFormat: "json",
  transform: {
    mode: "replace",
    fields: [
      { targetFieldName: "id", required: true },
      { targetFieldName: "full_name", compute: "concat(first, ' ', last)" },
      { targetFieldName: "age", coerce: { type: "i64" }, defaultValue: 0 },
      { targetFieldName: "email", compute: "lower(trim(email))" },
    ],
    onMissingField: "null",
    onCoerceError: "null",
  },
});

Transform Modes

Replace Mode (default)

mode: "replace" outputs only the fields you explicitly map. All original fields are discarded unless mapped.

// Input:  { "first": "Alice", "last": "Smith", "age": 30, "city": "NYC" }
// Output: { "full_name": "Alice Smith", "age": 30 }

transform: {
  mode: "replace",
  fields: [
    { targetFieldName: "full_name", compute: "concat(first, ' ', last)" },
    { targetFieldName: "age" },
  ],
}

Augment Mode

mode: "augment" preserves all original fields and adds or overwrites the fields you map. Perfect for adding computed columns to existing data.

// Input:  { "name": "Alice", "age": 30 }
// Output: { "name": "Alice", "age": 30, "name_upper": "ALICE", "is_adult": true }

transform: {
  mode: "augment",
  fields: [
    { targetFieldName: "name_upper", compute: "upper(name)" },
    { targetFieldName: "is_adult", compute: "gte(age, 18)" },
  ],
}

Field Mapping

Each field mapping supports multiple options:

type FieldMap = {
  targetFieldName: string;        // Output field name (required)
  originFieldName?: string;       // Input field name (defaults to targetFieldName)
  required?: boolean;             // Fail if missing
  defaultValue?: any;             // Fallback value when missing or null
  coerce?: Coerce;                // Type conversion specification
  compute?: string;               // Expression to compute value
};

Examples

Basic pass-through:

{ targetFieldName: "name" }  // Maps input "name" to output "name"

Rename field:

{ targetFieldName: "fullName", originFieldName: "full_name" }

Required field:

{ targetFieldName: "id", required: true }

Default value:

{ targetFieldName: "status", defaultValue: "active" }

Computed field:

{ targetFieldName: "full_name", compute: "concat(first, ' ', last)" }

Type coercion:

{ targetFieldName: "age", coerce: { type: "i64" } }

Compute Functions

Transform expressions support 30+ built-in functions. All functions are evaluated in Rust/WASM for maximum performance.

String Functions

FunctionDescriptionExample
concat(...)Join strings and values
concat(first, " ", last)
lower(s)Convert to lowercase
lower("HELLO")
"hello"
upper(s)Convert to uppercase
upper("hello")
"HELLO"
trim(s)Remove leading/trailing whitespace
trim("  text  ")
"text"
substring(s, start, end)Extract substring
substring("hello", 0, 3)
"hel"
replace(s, old, new)Replace text
replace("foo bar", "foo", "baz")
len(s)String length
len("hello")
5
starts_with(s, prefix)Check if starts with prefix
starts_with(url, "https")
ends_with(s, suffix)Check if ends with suffix
ends_with(file, ".pdf")
contains(s, substr)Check if contains substring
contains(text, "keyword")
pad_start(s, len, char)Left-pad to length
pad_start("42", 5, "0")
"00042"
pad_end(s, len, char)Right-pad to length
pad_end("x", 3, "_")
"x__"
trim_start(s)Remove leading whitespace
trim_start("  text")
trim_end(s)Remove trailing whitespace
trim_end("text  ")
repeat(s, n)Repeat string n times
repeat("ab", 3)
"ababab"
reverse(s)Reverse string
reverse("hello")
"olleh"

Math Functions

FunctionDescriptionExample
+, -, *, /Arithmetic operators
price * 1.1
round(n)Round to nearest integer
round(3.7)
4
floor(n)Round down
floor(3.9)
3
ceil(n)Round up
ceil(3.1)
4
abs(n)Absolute value
abs(-5)
5
min(...)Minimum value
min(a, b, c)
max(...)Maximum value
max(a, b, c)

Logic & Comparison

FunctionDescriptionExample
if(cond, true_val, false_val)Conditional expression
if(gte(age, 18), "adult", "minor")
not(bool)Boolean NOT
not(active)
eq(a, b)Equals
eq(status, "active")
ne(a, b)Not equals
ne(a, b)
gt(a, b)Greater than
gt(price, 100)
gte(a, b)Greater than or equal
gte(age, 18)
lt(a, b)Less than
lt(score, 50)
lte(a, b)Less than or equal
lte(price, 1000)

Type Checking & Utilities

FunctionDescriptionExample
is_null(val)Check if null
is_null(optional_field)
is_number(val)Check if number
is_number(val)
is_string(val)Check if string
is_string(val)
is_bool(val)Check if boolean
is_bool(val)
to_string(val)Convert to string
to_string(42)
"42"
parse_int(s)Parse string to integer
parse_int("123")
123
parse_float(s)Parse string to float
parse_float("3.14")
3.14
coalesce(...)First non-null value
coalesce(alt1, alt2, "default")
default(val, fallback)Fallback for null
default(optional, "N/A")

Complex Expressions

Functions can be nested and combined:

// Nested functions
    compute: "upper(trim(concat(first, ' ', last)))"

    // Arithmetic with functions
    compute: "round((price - discount) * 1.1)"

    // Conditional logic with comparisons
    compute: "if(gte(age, 18), 'adult', if(gte(age, 13), 'teen', 'child'))"

    // Multiple operations
    compute: "pad_start(to_string(round(price)), 10, '0')"

Type Coercion

Convert field values to specific types:

type Coerce =
      | { type: "string" }
      | { type: "i64" }      // 64-bit integer
      | { type: "f64" }      // 64-bit float
      | { type: "bool" }
      | { type: "timestamp_ms"; format?: "iso8601" | "unix_ms" | "unix_s" };

String Coercion

// Number → String: 42 → "42"
    // Boolean → String: true → "true"
    // Null → String: null → ""

    { targetFieldName: "age_str", coerce: { type: "string" } }

Integer Coercion (i64)

// String → Integer: "42" → 42
    // Float → Integer: 3.7 → 3 (truncates)
    // Boolean → Integer: true → 1, false → 0

    { targetFieldName: "count", coerce: { type: "i64" } }

Float Coercion (f64)

// String → Float: "3.14" → 3.14
    // Integer → Float: 42 → 42.0

    { targetFieldName: "price", coerce: { type: "f64" } }

Boolean Coercion

// String → Boolean: "true"/"false" → true/false
    // Number → Boolean: 0 → false, non-zero → true

    { targetFieldName: "active", coerce: { type: "bool" } }

Timestamp Coercion

// ISO8601 → Unix milliseconds
{
  targetFieldName: "created_ts",
  coerce: { type: "timestamp_ms", format: "iso8601" }
}
// "2024-01-15T10:30:00Z" → 1705315800000

// Unix seconds → milliseconds
{
  targetFieldName: "updated_ts",
  coerce: { type: "timestamp_ms", format: "unix_s" }
}
// 1705315800 → 1705315800000

Error Handling

onMissingField

Controls behavior when a mapped field is missing from input:

  • "error" — Fail the entire conversion
  • "null" — Insert null for missing fields
  • "drop" — Omit the field from output
transform: {
  fields: [{ targetFieldName: "optional" }],
  onMissingField: "null",  // Missing → null
}

onMissingRequired

Controls behavior when a required field is missing:

  • "error" — Fail the conversion (default)
  • "abort" — Stop processing this record
transform: {
  fields: [{ targetFieldName: "id", required: true }],
  onMissingRequired: "error",
}

onCoerceError

Controls behavior when type coercion fails:

  • "error" — Fail the entire conversion
  • "null" — Set field to null on failure
  • "dropRecord" — Silently skip records that fail coercion
// Skip records where age cannot be parsed
transform: {
  fields: [
    { targetFieldName: "age", coerce: { type: "i64" } }
  ],
  onCoerceError: "dropRecord",  // Skip invalid records
}

Performance Considerations

  • Zero JS overhead — All transforms execute in Rust/WASM
  • Streaming — Processes line-by-line without buffering
  • Memory efficient — Uses WASM linear memory, no JS objects
  • Type safe — Expression parser validates syntax at compile time
  • Low overhead — Typically adds <10% to conversion time

For extremely complex transformations requiring external API calls, database lookups, or custom business logic, use the onRecords callback for post-processing in JavaScript.

Complete Examples

E-commerce Product Enrichment

const controller = buddy.stream("products.csv", {
  outputFormat: "json",
  transform: {
    mode: "augment",
    fields: [
      // Price calculations
      {
        targetFieldName: "price_with_tax",
        compute: "round(price * 1.1 * 100) / 100"
      },
      {
        targetFieldName: "discount_amount",
        compute: "round((original_price - price) * 100) / 100"
      },
      // Categorization
      {
        targetFieldName: "price_tier",
        compute: "if(gt(price, 100), 'premium', if(gt(price, 50), 'mid', 'budget'))"
      },
      // SKU normalization
      {
        targetFieldName: "sku_normalized",
        compute: "upper(trim(replace(sku, '-', '')))"
      },
    ],
  },
  onRecords: async (ctrl, records) => {
    await db.products.upsertMany(records);
  },
});

User Data Cleaning Pipeline

const controller = buddy.stream("users-export.csv", {
  outputFormat: "ndjson",
  transform: {
    mode: "replace",
    fields: [
      { targetFieldName: "id", required: true },
      {
        targetFieldName: "email",
        compute: "lower(trim(email))",
        required: true
      },
      {
        targetFieldName: "full_name",
        compute: "trim(concat(first_name, ' ', last_name))"
      },
      {
        targetFieldName: "age",
        coerce: { type: "i64" },
        defaultValue: null
      },
      {
        targetFieldName: "is_verified",
        originFieldName: "verified",
        coerce: { type: "bool" },
        defaultValue: false
      },
      {
        targetFieldName: "created_ts",
        originFieldName: "created_at",
        coerce: { type: "timestamp_ms", format: "iso8601" }
      },
    ],
    onMissingRequired: "error",
    onCoerceError: "dropRecord",  // Skip malformed records
  },
  onRecords: async (ctrl, records, stats) => {
    console.log(`Processed ${stats.recordsOut} valid records`);
    await importToDatabase(records);
  },
});