Content Transformation

This document explains the content transformation system in ReX: how content is transformed as it flows through the layers of the system, from storage to presentation and back.

Core Transformation Concepts

Content transformation refers to the process of converting content between different representations while maintaining semantic equivalence. The transformation system follows these key principles:

  1. Immutability: Transformations produce new content instances rather than modifying existing ones
  2. Composability: Multiple transformations can be combined into pipelines
  3. Purity: Transformations are pure functions, so the same input always produces the same output and nothing outside the function is modified
  4. Type Safety: Transformations preserve or enhance type information
  5. Bidirectionality: Many transformations support both forward and reverse operations
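
The first three principles can be illustrated with a minimal sketch. It assumes a `Content<T>` shape with `data`, `contentType`, and `metadata` fields, matching the objects used later in this document. The names `Transformation`, `trimBody`, `countWords`, and `compose2` are hypothetical and exist only for this example.

```typescript
// Assumed Content shape, mirroring the objects used elsewhere in this document.
interface Content<T> {
  readonly data: T
  readonly contentType: string
  readonly metadata: Readonly<Record<string, unknown>>
}

// A transformation maps one Content value to another.
type Transformation<A, B> = (input: Content<A>) => Content<B>

// Immutability: return a new object instead of mutating the input.
const trimBody: Transformation<string, string> = input => ({
  ...input,
  data: input.data.trim(),
})

// Purity: the output depends only on the input.
const countWords: Transformation<string, string> = input => ({
  ...input,
  metadata: {
    ...input.metadata,
    wordCount: input.data.split(/\s+/).filter(Boolean).length,
  },
})

// Composability: two transformations combine into a third.
const compose2 = <A, B, C>(
  f: Transformation<A, B>,
  g: Transformation<B, C>
): Transformation<A, C> => input => g(f(input))

const original: Content<string> = {
  data: '  hello content world  ',
  contentType: 'text/plain',
  metadata: {},
}
const prepared = compose2(trimBody, countWords)(original)
```

After this runs, `prepared` holds the trimmed text and a `wordCount` of 3, while `original` is left untouched.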

Transformation Pipeline

The transformation pipeline processes content as it flows through the system:

Raw Storage Data → Parsing → Normalization → Validation → Enrichment → Presentation

Pipeline Stages

1. Parsing

Parsing converts raw storage format into structured content:

  • Text Parsing: Convert plain text to structured content
  • Format-Specific Parsing: Handle Markdown, MDX, JSON, etc.
  • Binary Parsing: Process binary formats like images or documents
typescript
// MDX parsing example
import { parseMdx } from '@lib/mdx/parser'

const rawContent = '# Title\n\nContent with *formatting*'
const parsedContent = await parseMdx(rawContent)
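
Format-specific parsing often begins by separating a metadata header from the body. The sketch below is a deliberately small stand-in and not the `parseMdx` implementation: it assumes a `---`-delimited header made of simple `key: value` lines, and `parseDocument` is a hypothetical name.

```typescript
interface ParsedDocument {
  frontmatter: Record<string, string>
  body: string
}

// Split a document into a `key: value` header and the remaining body.
function parseDocument(raw: string): ParsedDocument {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(raw)
  if (!match) {
    // No header found: treat the whole input as the body.
    return { frontmatter: {}, body: raw }
  }

  const header = match[1] ?? ''
  const frontmatter: Record<string, string> = {}
  for (const line of header.split(/\r?\n/)) {
    const separator = line.indexOf(':')
    if (separator === -1) continue // skip lines that are not key/value pairs
    const key = line.slice(0, separator).trim()
    frontmatter[key] = line.slice(separator + 1).trim()
  }

  return { frontmatter, body: raw.slice(match[0].length) }
}

const parsedDocument = parseDocument('---\ntitle: Example\n---\n# Title\n\nBody')
```

A real parser would delegate the header to a YAML library and the body to a Markdown or MDX parser. The point here is only the shape of the step: a raw string goes in and structured content comes out.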

2. Normalization

Normalization standardizes content structure and metadata:

  • Structure Normalization: Ensure consistent content structure
  • Metadata Normalization: Standardize metadata fields
  • Format Normalization: Convert between different formats
typescript
// Metadata normalization example
import { normalizeMetadata } from '@lib/content/processors'

const normalizedContent = {
  ...content,
  metadata: normalizeMetadata(content.metadata),
}
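
What metadata normalization might involve can be sketched as follows. The field names (`title`, `tags`, `date`) and the function `normalizeMetadataSketch` are assumptions made for illustration; the actual `normalizeMetadata` in `@lib/content/processors` may behave differently.

```typescript
interface RawMetadata {
  title?: string
  tags?: string[] | string
  date?: string
}

interface NormalizedMetadata {
  title: string
  tags: string[]
  date: string | null // ISO 8601, or null when missing or unparseable
}

function normalizeMetadataSketch(raw: RawMetadata): NormalizedMetadata {
  // Accept either an array of tags or a comma-separated string.
  const tagList = Array.isArray(raw.tags) ? raw.tags : (raw.tags ?? '').split(',')
  // Trim, lowercase, drop empties, and remove duplicates.
  const tags = Array.from(
    new Set(tagList.map(tag => tag.trim().toLowerCase()).filter(tag => tag !== ''))
  )

  const parsedDate = raw.date === undefined ? null : new Date(raw.date)
  const date =
    parsedDate !== null && !isNaN(parsedDate.getTime())
      ? parsedDate.toISOString()
      : null

  return { title: (raw.title ?? '').trim(), tags, date }
}

const normalizedExample = normalizeMetadataSketch({
  title: '  Example  ',
  tags: 'Docs, docs, API',
  date: '2024-01-15T00:00:00Z',
})
```

Here the three differently spelled tags collapse to `['docs', 'api']`, and the date is rewritten in full ISO form.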

3. Validation

Validation ensures content meets schema requirements:

  • Schema Validation: Verify content against JSON Schema
  • Type Validation: Check type constraints
  • Semantic Validation: Verify logical relationships and constraints
typescript
// Content validation example
import { validateContent } from '@lib/content/processors'
import { articleSchema } from '@lib/content/schemas'

const validationResult = validateContent(content, articleSchema)
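if (!validationResult.valid) {
  // `uri` identifies the content being validated; it and
  // ContentValidationError are assumed to be in scope here
  throw new ContentValidationError(uri, validationResult.errors)
}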
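
The `{ valid, errors }` result used above can be illustrated with a small self-contained validator. This sketch swaps JSON Schema for hand-written `typeof` checks so that it needs no library; `FieldSchema` and `validateMetadataSketch` are hypothetical names, not part of ReX.

```typescript
interface ValidationResult {
  valid: boolean
  errors: string[]
}

// A tiny schema format: field name mapped to the expected `typeof` result.
type FieldSchema = Record<string, 'string' | 'number' | 'boolean'>

function validateMetadataSketch(
  metadata: Record<string, unknown>,
  required: FieldSchema
): ValidationResult {
  const errors: string[] = []
  for (const field of Object.keys(required)) {
    const value = metadata[field]
    const expected = required[field]
    if (value === undefined) {
      errors.push(`${field}: missing required field`)
    } else if (typeof value !== expected) {
      errors.push(`${field}: expected ${expected}, got ${typeof value}`)
    }
  }
  // Collect every problem instead of stopping at the first one.
  return { valid: errors.length === 0, errors }
}

const articleFields: FieldSchema = { title: 'string', wordCount: 'number' }
const okResult = validateMetadataSketch({ title: 'Intro', wordCount: 120 }, articleFields)
const badResult = validateMetadataSketch({ wordCount: '120' }, articleFields)
```

Returning all errors at once, rather than throwing on the first, lets the caller decide whether to raise an exception or report every problem together.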

4. Enrichment

Enrichment adds derived or computed properties:

  • Computed Fields: Add calculated fields based on content
  • Relationships: Resolve references to other content
  • Contextual Data: Add environment-specific information
typescript
// Content enrichment example
import { enrichContent } from '@lib/content/processors'

const enrichedContent = await enrichContent(content, {
  resolveReferences: true,
  computeReadingTime: true,
  generateTableOfContents: true,
})
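
Two of the computed fields named above, reading time and a table of contents, can be sketched as pure functions over Markdown text. The 200 words-per-minute rate and both function names are assumptions made for this example, not values taken from `enrichContent`.

```typescript
interface TocEntry {
  depth: number
  text: string
}

// Estimate reading time in whole minutes (minimum 1).
function computeReadingTimeSketch(markdown: string, wordsPerMinute = 200): number {
  const words = markdown.split(/\s+/).filter(word => word !== '').length
  return Math.max(1, Math.ceil(words / wordsPerMinute))
}

// Collect ATX headings (`#` to `######`) into a flat table of contents.
// Simplification: heading-like lines inside code blocks are not excluded.
function buildTableOfContents(markdown: string): TocEntry[] {
  const entries: TocEntry[] = []
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.+)$/.exec(line)
    if (match) {
      const hashes = match[1] ?? ''
      const text = match[2] ?? ''
      entries.push({ depth: hashes.length, text: text.trim() })
    }
  }
  return entries
}

const sampleDoc = '# Guide\n\nIntro text here.\n\n## Setup\n\nMore text.'
const toc = buildTableOfContents(sampleDoc)
const minutes = computeReadingTimeSketch(sampleDoc)
```

Because both helpers derive their result from the content alone, their output can be stored alongside the content or recomputed at any time.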

5. Presentation

Presentation formats content for specific display contexts:

  • Component Mapping: Convert content to React components
  • Format Conversion: Transform to HTML, PDF, etc.
  • Styling Application: Apply design themes and styles
typescript
// MDX to React component transformation
import { mdxToReact } from '@lib/mdx/compiler'
import { components } from '@components/mdx'

const ReactComponent = await mdxToReact(content.data, {
  components,
  scope: { title: content.metadata.title },
})

Transformation Types

Data Transformations

Data transformations modify the content data itself:

  • Format Conversion: Markdown → HTML, JSON → YAML, etc.
  • Structural Transformation: Reorganizing content structure
  • Content Generation: Creating derived content
  • Content Filtering: Removing sensitive or unnecessary content

Metadata Transformations

Metadata transformations focus on the content metadata:

  • Schema Migration: Update metadata to new schemas
  • Default Values: Apply default values for missing fields
  • Derived Metadata: Calculate metadata from content
  • Metadata Normalization: Standardize field formats
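
Schema migration and default values often go together. The following sketch assumes two hypothetical metadata versions distinguished by a `schemaVersion` field; none of these field names come from ReX itself.

```typescript
interface MetadataV1 {
  schemaVersion: 1
  title: string
  author?: string
}

interface MetadataV2 {
  schemaVersion: 2
  title: string
  authors: string[]
  draft: boolean
}

// Migrate v1 to v2: turn `author` into an `authors` list and default `draft`.
function migrateV1ToV2(old: MetadataV1): MetadataV2 {
  return {
    schemaVersion: 2,
    title: old.title,
    authors: old.author !== undefined ? [old.author] : [],
    draft: false,
  }
}

// Accept either version and always return the latest one.
function upgradeMetadata(metadata: MetadataV1 | MetadataV2): MetadataV2 {
  return metadata.schemaVersion === 1 ? migrateV1ToV2(metadata) : metadata
}

const upgraded = upgradeMetadata({ schemaVersion: 1, title: 'Intro', author: 'Ada' })
```

Keeping each migration as its own function means a later v3 can be added as one more step without touching the existing ones.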

URI Transformations

URI transformations handle content addressing:

  • Path Normalization: Standardize path formats
  • URI Resolution: Resolve relative URIs
  • Link Rewriting: Update internal links
  • Identity Mapping: Map between different URI schemes
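
Path normalization and relative resolution can be sketched without any library. The helpers below (`normalizePath` and `resolveRelative`, both hypothetical names) treat URIs as plain slash-separated paths and ignore schemes, query strings, and fragments.

```typescript
// Collapse `.`, `..`, and repeated slashes in a slash-separated path.
function normalizePath(path: string): string {
  const isAbsolute = path.charAt(0) === '/'
  const segments: string[] = []
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue
    if (segment === '..') {
      if (segments.length > 0 && segments[segments.length - 1] !== '..') {
        segments.pop() // step back one directory
      } else if (!isAbsolute) {
        segments.push('..') // relative paths may climb above their start
      }
      // Absolute paths never climb above the root, so `..` is dropped there.
      continue
    }
    segments.push(segment)
  }
  return (isAbsolute ? '/' : '') + segments.join('/')
}

// Resolve a link relative to the directory of the referring document.
function resolveRelative(basePath: string, link: string): string {
  if (link.charAt(0) === '/') return normalizePath(link)
  const baseDir = basePath.slice(0, basePath.lastIndexOf('/') + 1)
  return normalizePath(baseDir + link)
}

const normalizedPath = normalizePath('/docs//guide/./intro/../setup')
const resolvedLink = resolveRelative('/docs/guide/intro.md', '../api/store.md')
```

With these inputs, `normalizedPath` is `/docs/guide/setup` and `resolvedLink` is `/docs/api/store.md`. Link rewriting can then be built on top by applying `resolveRelative` to every link found in a document.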

Transformation Implementation

Function Composition

Transformations are implemented using function composition:

typescript
import { pipe } from '@lib/utils/functional'

// Create transformation pipeline
const processMarkdown = pipe(
  parseMarkdown, // String → MarkdownAST
  extractFrontmatter, // MarkdownAST → {ast, frontmatter}
  validateFrontmatter, // Validate frontmatter
  transformAST, // Apply AST transformations
  renderToHTML // MarkdownAST → HTML
)

// Apply transformation
const result = await processMarkdown(rawContent)
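
The implementation of `pipe` in `@lib/utils/functional` is not shown here. As a sketch under that caveat, a left-to-right composition helper that accepts both synchronous and asynchronous steps could look like this. `pipeAsync`, `MaybePromise`, and `slugify` are hypothetical names, and the overloads cover up to three steps only.

```typescript
type MaybePromise<T> = T | Promise<T>

// Overloads keep the output type of each step aligned with the next input.
function pipeAsync<A, B>(f: (a: A) => MaybePromise<B>): (a: A) => Promise<B>
function pipeAsync<A, B, C>(
  f: (a: A) => MaybePromise<B>,
  g: (b: B) => MaybePromise<C>
): (a: A) => Promise<C>
function pipeAsync<A, B, C, D>(
  f: (a: A) => MaybePromise<B>,
  g: (b: B) => MaybePromise<C>,
  h: (c: C) => MaybePromise<D>
): (a: A) => Promise<D>
function pipeAsync(
  ...steps: Array<(input: any) => unknown>
): (input: any) => Promise<any> {
  return async input => {
    let current: unknown = input
    for (const step of steps) {
      // `await` passes plain values through and unwraps promises.
      current = await step(current)
    }
    return current
  }
}

// Example: a three-step pipeline with one asynchronous step.
const slugify = pipeAsync(
  (title: string) => title.trim(),
  async (title: string) => title.toLowerCase(),
  (title: string) => title.replace(/\s+/g, '-')
)
```

Calling `slugify('  Hello Content World ')` returns a promise that resolves to `hello-content-world`. A step whose input type does not match the previous step's output is rejected at compile time, which is the main reason for the overloads.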

Middleware Approach

Transformations can be implemented as middleware:

typescript
import { createContentStore } from '@lib/content/store'
import { withTransformation } from '@lib/content/middleware'

// Create store with transformation middleware
const store = createContentStore({
  adapter,
  middleware: [
    withTransformation({
      read: [parseMdx, extractFrontmatter, resolveLinks],
      write: [validateSchema, normalizeMetadata, serializeMdx],
    }),
  ],
})
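
How such a middleware might wrap a store can be sketched with a synchronous in-memory version. Everything below is hypothetical (`SimpleStore`, `withTransformationSketch`, and the two steps), and a real store would most likely be asynchronous; only the idea of separate read and write chains is taken from the example above.

```typescript
interface StoredContent {
  data: string
  metadata: Record<string, unknown>
}

type ContentStep = (content: StoredContent) => StoredContent

interface SimpleStore {
  read(uri: string): StoredContent | undefined
  write(uri: string, content: StoredContent): void
}

// Wrap a store so every read and write passes through a chain of steps.
function withTransformationSketch(
  store: SimpleStore,
  chains: { read: ContentStep[]; write: ContentStep[] }
): SimpleStore {
  const run = (steps: ContentStep[], content: StoredContent): StoredContent =>
    steps.reduce((current, step) => step(current), content)

  return {
    read(uri) {
      const stored = store.read(uri)
      return stored === undefined ? undefined : run(chains.read, stored)
    },
    write(uri, content) {
      store.write(uri, run(chains.write, content))
    },
  }
}

// In-memory store used only for this example.
const records = new Map<string, StoredContent>()
const memoryStore: SimpleStore = {
  read: uri => records.get(uri),
  write: (uri, content) => {
    records.set(uri, content)
  },
}

const trimData: ContentStep = content => ({ ...content, data: content.data.trim() })
const stampLength: ContentStep = content => ({
  ...content,
  metadata: { ...content.metadata, length: content.data.length },
})

const transformingStore = withTransformationSketch(memoryStore, {
  read: [stampLength],
  write: [trimData],
})
transformingStore.write('/notes/a', { data: '  hello  ', metadata: {} })
const loaded = transformingStore.read('/notes/a')
```

The write chain runs before the data reaches storage, so the stored record holds the trimmed text. The read chain runs on the way out, so `length` appears only on the returned object and is never persisted.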

Adapter Transformations

Adapters implement transformations between storage formats and Content objects:

typescript
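// Assumes Node's promise-based fs API; `resolveFilePath` (URI → file path)
// is assumed to be defined elsewhere
import { readFile, writeFile } from 'node:fs/promises'
import { createAdapter } from '@lib/content/adapters'

const jsonAdapter = createAdapter({
  // Transform storage format to Content object
  read: async uri => {
    // Read as UTF-8 text so JSON.parse receives a string, not a Buffer
    const data = await readFile(resolveFilePath(uri), 'utf8')
    const parsed = JSON.parse(data)

    return {
      data: parsed.content,
      contentType: 'application/json',
      // `??` falls back only when metadata is null or undefined
      metadata: parsed.metadata ?? {},
    }
  },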

  // Transform Content object to storage format
  write: async (uri, content) => {
    const data = JSON.stringify(
      {
        content: content.data,
        metadata: content.metadata,
      },
      null,
      2
    )

    await writeFile(resolveFilePath(uri), data)
  },
})

Type-Safe Transformations

The system uses TypeScript generics for type-safe transformations:

typescript
// Type-safe transformation
function transform<TInput, TOutput>(
  input: Content<TInput>,
  transformer: (data: TInput) => TOutput
): Content<TOutput> {
  return {
    ...input,
    data: transformer(input.data),
  }
}

// Usage example
const markdownContent: Content<string> = {
  data: '# Title\n\nContent',
  contentType: 'text/markdown',
  metadata: { title: 'Example' },
}

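// `transform` copies contentType from the input, so set the new type explicitly
const htmlContent: Content<string> = {
  ...transform(markdownContent, markdown => markdownToHtml(markdown)),
  contentType: 'text/html',
}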

MDX Transformation

MDX transformation is a specialized pipeline:

MDX Text → MDX AST → Extract Frontmatter → Process Imports → Transform AST → Compile to JSX → React Component

Key stages in MDX transformation:

  1. Parse MDX: Convert MDX text to abstract syntax tree (AST)
  2. Extract Frontmatter: Separate frontmatter from content
  3. Process Imports: Resolve and load imported components
  4. Transform AST: Apply plugins and transformations
  5. Compile to JSX: Convert AST to JSX
  6. Create Component: Compile to React component
typescript
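// MDX transformation pipeline
// Note: this version separates frontmatter from the raw source before parsing
// and does not show the import-processing stage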
const processMdx = async (source, options) => {
  const { content, frontmatter } = extractFrontmatter(source)
  const ast = await parseMdx(content)
  const processedAst = applyMdxPlugins(ast, options.plugins)
  const jsx = mdxAstToJsx(processedAst)
  const component = await compileJsx(jsx, {
    components: options.components,
    scope: { ...frontmatter, ...options.scope },
  })

  return {
    component,
    frontmatter,
  }
}

Serialization and Deserialization

Serialization converts content to storage format:

typescript
// Serialize content for storage
function serializeContent(content: Content): Buffer | string {
  switch (content.contentType) {
    case 'text/markdown':
      return serializeMarkdown(content)
    case 'application/json':
      return JSON.stringify(content.data)
    case 'text/mdx':
      return serializeMdx(content)
    default:
      return String(content.data)
  }
}

Deserialization converts storage format to content:

typescript
// Deserialize content from storage
function deserializeContent(
  data: Buffer | string,
  contentType: string
): Content {
  switch (contentType) {
    case 'text/markdown':
      return deserializeMarkdown(data.toString())
    case 'application/json':
      return {
        data: JSON.parse(data.toString()),
        contentType,
        metadata: {},
      }
    case 'text/mdx':
      return deserializeMdx(data.toString())
    default:
      return {
        data: data.toString(),
        contentType,
        metadata: {},
      }
  }
}

Transformation Context

Transformations can read a context object that carries the base URI, the environment, and caller-supplied options:

typescript
// Transformation with context
function transformWithContext<T>(
  content: Content<T>,
  context: TransformContext
): Content<T> {
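  // Only `environment` is read directly; `transformData` receives the full context
  const { environment } = context

  // Transform content using context
  return {
    ...content,
    data: transformData(content.data, context),
    metadata: {
      ...content.metadata,
      // The timestamp makes the result non-deterministic; supply the time
      // from the caller if strict purity is required
      transformedAt: new Date(),
      environment: environment.type,
    },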
  }
}

Error Handling in Transformations

Transformation errors are caught and rethrown as content-specific error types that carry the content URI:

typescript
// Error handling in transformations
try {
  const transformed = await transform(content)
  return transformed
} catch (error) {
  if (error instanceof SyntaxError) {
    throw new ContentFormatError(uri, error.message)
  } else if (error instanceof ValidationError) {
    throw new ContentValidationError(uri, error.errors)
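  } else {
    // `error` is `unknown` under strict settings, so narrow before reading `.message`
    const message = error instanceof Error ? error.message : String(error)
    throw new ContentError(`Transformation failed: ${message}`, uri)
  }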
}
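
The snippet above throws three error types without defining them. One possible shape is sketched below, with constructor signatures inferred from those calls. Note that `ContentError` takes the message first and the URI second, while its subclasses take the URI first. The message formats and the `toContentError` helper are assumptions.

```typescript
class ContentError extends Error {
  readonly uri: string

  constructor(message: string, uri: string) {
    super(message)
    this.name = new.target.name
    this.uri = uri
    // Keep `instanceof` reliable when compiling to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

class ContentFormatError extends ContentError {
  constructor(uri: string, detail: string) {
    super(`Invalid format in ${uri}: ${detail}`, uri)
  }
}

class ContentValidationError extends ContentError {
  readonly errors: string[]

  constructor(uri: string, errors: string[]) {
    super(`Validation failed for ${uri}: ${errors.join('; ')}`, uri)
    this.errors = errors
  }
}

// Convert any thrown value into a ContentError or one of its subtypes.
function toContentError(uri: string, error: unknown): ContentError {
  if (error instanceof ContentError) return error
  if (error instanceof SyntaxError) return new ContentFormatError(uri, error.message)
  const message = error instanceof Error ? error.message : String(error)
  return new ContentError(`Transformation failed: ${message}`, uri)
}

let wrapped: ContentError | undefined
try {
  JSON.parse('{ not valid json')
} catch (error) {
  wrapped = toContentError('/posts/broken.json', error)
}
```

Because every subtype extends `ContentError`, callers can catch the base class to handle all transformation failures, or a specific subtype when they need its extra fields.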

Bidirectional Transformations

Some transformations support bidirectional conversion:

typescript
// Bidirectional transformation
interface BidirectionalTransform<A, B> {
  forward: (a: A) => B
  backward: (b: B) => A
}

// Example: JSON <-> Object transformation
const jsonTransform: BidirectionalTransform<string, unknown> = {
  forward: json => JSON.parse(json),
  backward: obj => JSON.stringify(obj, null, 2),
}
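
Bidirectional transforms compose like ordinary functions, with the reverse direction running in the opposite order. The sketch below restates the interface so that it is self-contained; `composeBidirectional`, `invert`, and the two example transforms are hypothetical helpers.

```typescript
interface BidirectionalTransform<A, B> {
  forward: (a: A) => B
  backward: (b: B) => A
}

// Chain two transforms; `backward` undoes the second step first.
function composeBidirectional<A, B, C>(
  first: BidirectionalTransform<A, B>,
  second: BidirectionalTransform<B, C>
): BidirectionalTransform<A, C> {
  return {
    forward: a => second.forward(first.forward(a)),
    backward: c => first.backward(second.backward(c)),
  }
}

// Swap the two directions.
function invert<A, B>(transform: BidirectionalTransform<A, B>): BidirectionalTransform<B, A> {
  return { forward: transform.backward, backward: transform.forward }
}

const tagsToList: BidirectionalTransform<string, string[]> = {
  forward: text => (text === '' ? [] : text.split(',')),
  backward: list => list.join(','),
}
const listToJson: BidirectionalTransform<string[], string> = {
  forward: list => JSON.stringify(list),
  backward: json => JSON.parse(json) as string[],
}

const tagsToJson = composeBidirectional(tagsToList, listToJson)
const encoded = tagsToJson.forward('docs,api')
const decoded = tagsToJson.backward(encoded)
```

Here `decoded` equals the original `'docs,api'`. A round trip is only guaranteed when each transform is lossless: the `jsonTransform` example above, for instance, re-indents the JSON text, so `backward(forward(json))` can differ from `json` in whitespace.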

Performance Considerations

Transformation performance is optimized through:

  • Caching: Cache transformation results
  • Lazy Transformations: Transform only when needed
  • Incremental Processing: Process only what has changed
  • Worker Offloading: Run expensive transformations in workers
typescript
// Cached transformation
const cachedTransform = memoize(transform, content => contentHash(content))

// Lazy transformation
const lazyTransform = (content, options = {}) => {
  let transformed
  let computed = false // separate flag so falsy results are cached too

  return {
    ...content,
    get transformed() {
      if (!computed) {
        transformed = transform(content.data)
        computed = true
      }
      return transformed
    },
  }
}
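
The `memoize` helper used above is not defined in this document. A minimal sketch of key-based caching, assuming a caller-supplied key function in place of `contentHash`, might look like this (`memoizeByKey` and `renderUpperCase` are hypothetical names):

```typescript
// Cache results by a caller-supplied key so repeated inputs skip the work.
function memoizeByKey<Input, Output>(
  compute: (input: Input) => Output,
  keyOf: (input: Input) => string
): (input: Input) => Output {
  const cache = new Map<string, Output>()
  return input => {
    const key = keyOf(input)
    if (cache.has(key)) {
      // `has` is checked first so cached `undefined` results are still hits.
      return cache.get(key) as Output
    }
    const result = compute(input)
    cache.set(key, result)
    return result
  }
}

// Count invocations to show that the second call is served from the cache.
let renderCalls = 0
const renderUpperCase = (text: string): string => {
  renderCalls += 1
  return text.toUpperCase()
}

const cachedRender = memoizeByKey(renderUpperCase, text => text)
const firstRender = cachedRender('hello')
const secondRender = cachedRender('hello')
```

After both calls, `renderCalls` is 1. The cache in this sketch grows without bound, so a long-running process would need an eviction policy such as a size limit or expiry time.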

Released under the MIT License.