Content Transformation

This document explains the content transformation system in ReX: how content is converted as it moves between the layers of the system, from storage to presentation and back.

Core Transformation Concepts

Content transformation refers to the process of converting content between different representations while maintaining semantic equivalence. The transformation system follows these key principles:

Immutability: Transformations produce new content instances rather than modifying existing ones
Composability: Multiple transformations can be combined into pipelines
Purity: Transformations are pure functions with predictable outputs for given inputs
Type Safety: Transformations preserve or enhance type information
Bidirectionality: Many transformations support both forward and reverse operations
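
The first three principles can be illustrated with a minimal sketch. It assumes a simplified `Content` shape, and the helpers `trimBody`, `addWordCount`, and `prepare` are hypothetical names introduced only for illustration, not part of ReX:

```typescript
// Simplified, hypothetical Content shape used only for this sketch.
interface Content<T> {
  readonly data: T
  readonly contentType: string
  readonly metadata: Readonly<Record<string, unknown>>
}

// Purity and immutability: the same input always yields the same output,
// and the input object is copied rather than modified.
const trimBody = (content: Content<string>): Content<string> => ({
  ...content,
  data: content.data.trim(),
})

const addWordCount = (content: Content<string>): Content<string> => ({
  ...content,
  metadata: {
    ...content.metadata,
    wordCount: content.data.split(/\s+/).filter(Boolean).length,
  },
})

// Composability: small transformations combine into a larger one.
const prepare = (content: Content<string>): Content<string> =>
  addWordCount(trimBody(content))

const draft: Content<string> = {
  data: '  Hello transformation world  ',
  contentType: 'text/plain',
  metadata: { title: 'Example' },
}

const prepared = prepare(draft)
```

Because each step returns a new object, `draft` can still be inspected or reused after `prepare` has run.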

Transformation Pipeline

The transformation pipeline processes content as it flows through the system:

Raw Storage Data → Parsing → Normalization → Validation → Enrichment → Presentation

Pipeline Stages

1. Parsing

Parsing converts raw storage format into structured content:

Text Parsing: Convert plain text to structured content
Format-Specific Parsing: Handle Markdown, MDX, JSON, etc.
Binary Parsing: Process binary formats like images or documents

typescript

// MDX parsing example
import { parseMdx } from '@lib/mdx/parser'

const rawContent = '# Title\n\nContent with *formatting*'
const parsedContent = await parseMdx(rawContent)

2. Normalization

Normalization standardizes content structure and metadata:

Structure Normalization: Ensure consistent content structure
Metadata Normalization: Standardize metadata fields
Format Normalization: Convert between different formats

typescript

// Metadata normalization example
import { normalizeMetadata } from '@lib/content/processors'

const normalizedContent = {
  ...content,
  metadata: normalizeMetadata(content.metadata),
}
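
What `normalizeMetadata` does internally is not shown in this document. As a rough sketch of what metadata normalization might involve, assuming hypothetical `title`, `tags`, and `createdAt` fields:

```typescript
// Hypothetical metadata shapes; the real ReX schema is not shown in this document.
type RawMetadata = Record<string, unknown>

interface NormalizedMetadata {
  title: string
  tags: string[]
  createdAt: string | null // ISO 8601 string, or null if missing or unparseable
}

function normalizeMetadataSketch(raw: RawMetadata): NormalizedMetadata {
  // Trim the title and fall back to a placeholder when it is absent.
  const rawTitle = raw['title']
  const title =
    typeof rawTitle === 'string' && rawTitle.trim() !== ''
      ? rawTitle.trim()
      : 'Untitled'

  // Accept either an array of tags or a comma-separated string.
  const rawTags = raw['tags']
  let candidates: unknown[] = []
  if (Array.isArray(rawTags)) candidates = rawTags
  else if (typeof rawTags === 'string') candidates = rawTags.split(',')

  // Lowercase, trim, drop empty entries, and remove duplicates.
  const tags = candidates
    .filter((tag): tag is string => typeof tag === 'string')
    .map(tag => tag.trim().toLowerCase())
    .filter((tag, index, all) => tag !== '' && all.indexOf(tag) === index)

  // Convert any parseable date input to an ISO string.
  const rawDate = raw['createdAt']
  const parsed =
    typeof rawDate === 'string' || typeof rawDate === 'number'
      ? new Date(rawDate)
      : null
  const createdAt =
    parsed !== null && !isNaN(parsed.getTime()) ? parsed.toISOString() : null

  return { title, tags, createdAt }
}

const normalized = normalizeMetadataSketch({
  title: '  Getting Started ',
  tags: 'Guide, intro,guide',
  createdAt: '2024-01-15',
})
```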

3. Validation

Validation ensures content meets schema requirements:

Schema Validation: Verify content against JSON Schema
Type Validation: Check type constraints
Semantic Validation: Verify logical relationships and constraints

typescript

// Content validation example
import { validateContent } from '@lib/content/processors'
import { articleSchema } from '@lib/content/schemas'

const validationResult = validateContent(content, articleSchema)
if (!validationResult.valid) {
  throw new ContentValidationError(uri, validationResult.errors)
}
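
The example above relies on a result object with `valid` and `errors` fields. The sketch below shows one way such a result could be produced. It uses a hand-rolled list of field rules rather than JSON Schema, and `FieldRule` and `validateMetadataSketch` are hypothetical names; a real implementation would more likely delegate to a JSON Schema validator:

```typescript
interface ValidationResult {
  valid: boolean
  errors: string[]
}

// Hypothetical, simplified rule format: a field name, an expected primitive
// type, and whether the field must be present.
interface FieldRule {
  field: string
  type: 'string' | 'number' | 'boolean'
  required: boolean
}

function validateMetadataSketch(
  metadata: Record<string, unknown>,
  rules: FieldRule[]
): ValidationResult {
  const errors: string[] = []

  for (const rule of rules) {
    const value = metadata[rule.field]

    if (value === undefined) {
      // Missing optional fields are fine; missing required fields are not.
      if (rule.required) errors.push(`${rule.field}: required field is missing`)
      continue
    }

    if (typeof value !== rule.type) {
      errors.push(`${rule.field}: expected ${rule.type}, got ${typeof value}`)
    }
  }

  // Collect every problem instead of stopping at the first one.
  return { valid: errors.length === 0, errors }
}

const articleRules: FieldRule[] = [
  { field: 'title', type: 'string', required: true },
  { field: 'draft', type: 'boolean', required: false },
]

const okResult = validateMetadataSketch({ title: 'Hello' }, articleRules)
const badResult = validateMetadataSketch({ draft: 'yes' }, articleRules)
```

Returning all errors at once lets the caller report every problem in a single `ContentValidationError`.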

4. Enrichment

Enrichment adds derived or computed properties:

Computed Fields: Add calculated fields based on content
Relationships: Resolve references to other content
Contextual Data: Add environment-specific information

typescript

// Content enrichment example
import { enrichContent } from '@lib/content/processors'

const enrichedContent = await enrichContent(content, {
  resolveReferences: true,
  computeReadingTime: true,
  generateTableOfContents: true,
})
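
To make two of these options concrete, the following sketch computes a reading time and a table of contents from Markdown text. The function names, the 200 words-per-minute default, and the heading regex are assumptions for illustration, not the behavior of `enrichContent`:

```typescript
interface TocEntry {
  depth: number // 1 for "#", 2 for "##", and so on
  text: string
}

// Estimate reading time from a whitespace-separated word count.
function computeReadingTimeMinutes(markdown: string, wordsPerMinute = 200): number {
  const words = markdown.split(/\s+/).filter(word => word !== '').length
  return Math.max(1, Math.ceil(words / wordsPerMinute))
}

// Collect ATX-style headings ("# Title"). This sketch does not skip headings
// that appear inside fenced code blocks.
function generateTableOfContentsSketch(markdown: string): TocEntry[] {
  const entries: TocEntry[] = []
  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.+)$/.exec(line)
    if (match) {
      entries.push({ depth: match[1].length, text: match[2].trim() })
    }
  }
  return entries
}

const articleBody = '# Title\n\nSome intro text.\n\n## Details\n\nMore text.'

// Enrichment adds derived fields next to the existing data.
const enriched = {
  data: articleBody,
  metadata: {
    readingTimeMinutes: computeReadingTimeMinutes(articleBody),
    tableOfContents: generateTableOfContentsSketch(articleBody),
  },
}
```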

5. Presentation

Presentation formats content for specific display contexts:

Component Mapping: Convert content to React components
Format Conversion: Transform to HTML, PDF, etc.
Styling Application: Apply design themes and styles

typescript

// MDX to React component transformation
import { mdxToReact } from '@lib/mdx/compiler'
import { components } from '@components/mdx'

const ReactComponent = await mdxToReact(content.data, {
  components,
  scope: { title: content.metadata.title },
})

Transformation Types

Data Transformations

Data transformations modify the content data itself:

Format Conversion: Markdown → HTML, JSON → YAML, etc.
Structural Transformation: Reorganizing content structure
Content Generation: Creating derived content
Content Filtering: Removing sensitive or unnecessary content

Metadata Transformations

Metadata transformations focus on the content metadata:

Schema Migration: Update metadata to new schemas
Default Values: Apply default values for missing fields
Derived Metadata: Calculate metadata from content
Metadata Normalization: Standardize field formats
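
Schema migration and default values can be sketched as two small pure functions. The version numbers, the `author` to `authors` change, and the default fields below are invented for illustration and do not describe an actual ReX schema:

```typescript
type Metadata = Record<string, unknown>

// Hypothetical migration: version 1 stored a single `author` string,
// version 2 stores an `authors` array.
function migrateV1ToV2(metadata: Metadata): Metadata {
  // Already migrated: return the input untouched so the step is idempotent.
  if (metadata['schemaVersion'] === 2) return metadata

  const { author, ...rest } = metadata
  return {
    ...rest,
    authors: typeof author === 'string' ? [author] : [],
    schemaVersion: 2,
  }
}

// Hypothetical defaults applied to fields the author left out.
const metadataDefaults: Metadata = { draft: false, language: 'en' }

function applyDefaults(metadata: Metadata): Metadata {
  // Later spreads win, so explicit values override the defaults.
  return { ...metadataDefaults, ...metadata }
}

const migrated = applyDefaults(migrateV1ToV2({ title: 'A', author: 'Ada' }))
```

Making the migration a no-op for data that is already current means it can run on every read without tracking which items were migrated.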

URI Transformations

URI transformations handle content addressing:

Path Normalization: Standardize path formats
URI Resolution: Resolve relative URIs
Link Rewriting: Update internal links
Identity Mapping: Map between different URI schemes
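
Path normalization and relative resolution are the easiest of these to show in code. The sketch below works on plain slash-separated paths; `normalizePath` and `resolveRelative` are hypothetical helpers, and the actual ReX URI scheme is not assumed:

```typescript
// Collapse "." and ".." segments and duplicate slashes in an absolute path.
function normalizePath(path: string): string {
  const segments: string[] = []
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue
    if (segment === '..') segments.pop() // ".." at the root is ignored
    else segments.push(segment)
  }
  return '/' + segments.join('/')
}

// Resolve a reference found inside the document at `basePath`.
function resolveRelative(basePath: string, reference: string): string {
  // Absolute references ignore the base path.
  if (reference.charAt(0) === '/') return normalizePath(reference)

  // Relative references are resolved against the base document's directory.
  const baseDir = basePath.slice(0, basePath.lastIndexOf('/') + 1)
  return normalizePath(baseDir + reference)
}

const cleaned = normalizePath('/docs//guides/./intro/../setup')
const sibling = resolveRelative('/docs/guides/intro', './setup')
const cousin = resolveRelative('/docs/guides/intro', '../api/store')
```

Link rewriting can then be built by applying `resolveRelative` to every link found in a document.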

Transformation Implementation

Function Composition

Transformations are implemented using function composition:

typescript

import { pipe } from '@lib/utils/functional'

// Create transformation pipeline
const processMarkdown = pipe(
  parseMarkdown, // String → MarkdownAST
  extractFrontmatter, // MarkdownAST → {ast, frontmatter}
  validateFrontmatter, // Validate frontmatter
  transformAST, // Apply AST transformations
  renderToHTML // MarkdownAST → HTML
)

// Apply transformation
const result = await processMarkdown(rawContent)
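
The implementation of `pipe` is not shown here. Since the result above is awaited, the real helper presumably handles asynchronous steps; the sketch below is a simpler synchronous version, named `pipeSketch` to keep it distinct, that shows how overloads can carry types from one step to the next:

```typescript
// Overloads give each arity a precise type: the output of one step
// must match the input of the next.
function pipeSketch<A, B>(f1: (a: A) => B): (a: A) => B
function pipeSketch<A, B, C>(f1: (a: A) => B, f2: (b: B) => C): (a: A) => C
function pipeSketch<A, B, C, D>(
  f1: (a: A) => B,
  f2: (b: B) => C,
  f3: (c: C) => D
): (a: A) => D
function pipeSketch(...steps: Array<(value: any) => any>): (input: any) => any {
  // Feed the input through each step from left to right.
  return input => steps.reduce((value, step) => step(value), input)
}

// Illustrative steps: string -> string -> string[] -> number
const trimText = (text: string): string => text.trim()
const toWords = (text: string): string[] => text.split(/\s+/)
const countWords = (words: string[]): number => words.length

// Inferred as (a: string) => number
const wordCount = pipeSketch(trimText, toWords, countWords)

const total = wordCount('  one two  three ')
```

Swapping two steps, for example `pipeSketch(toWords, trimText)`, fails to compile because `string[]` is not assignable to `string`.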

Middleware Approach

Transformations can be implemented as middleware:

typescript

import { createContentStore } from '@lib/content/store'
import { withTransformation } from '@lib/content/middleware'

// Create store with transformation middleware
const store = createContentStore({
  adapter,
  middleware: [
    withTransformation({
      read: [parseMdx, extractFrontmatter, resolveLinks],
      write: [validateSchema, normalizeMetadata, serializeMdx],
    }),
  ],
})
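
How `withTransformation` applies its `read` and `write` chains is not documented here. One plausible reading is that the `read` steps run in order on data coming out of the adapter and the `write` steps run in order on data going in. The sketch below implements that reading with hypothetical names (`RawAdapter`, `createMemoryAdapter`, `withTransformationSketch`) and a synchronous in-memory adapter:

```typescript
type StepFn = (value: any) => any

// Hypothetical adapter that stores raw strings by URI.
interface RawAdapter {
  read(uri: string): string
  write(uri: string, raw: string): void
}

function createMemoryAdapter(): RawAdapter {
  const files = new Map<string, string>()
  return {
    read: uri => {
      const raw = files.get(uri)
      if (raw === undefined) throw new Error(`Not found: ${uri}`)
      return raw
    },
    write: (uri, raw) => {
      files.set(uri, raw)
    },
  }
}

// Run each step on the output of the previous one.
const runSteps = (steps: StepFn[], input: unknown): unknown =>
  steps.reduce((value, step) => step(value), input)

function withTransformationSketch(
  adapter: RawAdapter,
  chains: { read: StepFn[]; write: StepFn[] }
) {
  return {
    read: (uri: string): unknown => runSteps(chains.read, adapter.read(uri)),
    write: (uri: string, value: unknown): void => {
      const raw = runSteps(chains.write, value)
      if (typeof raw !== 'string') {
        throw new Error('write chain must end in a string')
      }
      adapter.write(uri, raw)
    },
  }
}

const memoryAdapter = createMemoryAdapter()
const sketchStore = withTransformationSketch(memoryAdapter, {
  read: [(raw: string) => JSON.parse(raw)],
  write: [
    (doc: { title: string }) => ({ ...doc, title: doc.title.trim() }),
    (doc: object) => JSON.stringify(doc),
  ],
})

sketchStore.write('content://post', { title: '  Hello  ' })
const loaded = sketchStore.read('content://post') as { title: string }
```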

Adapter Transformations

Adapters implement transformations between storage formats and Content objects:

typescript

import { createAdapter } from '@lib/content/adapters'

const jsonAdapter = createAdapter({
  // Transform storage format to Content object
  read: async uri => {
    const data = await readFile(resolveFilePath(uri))
    const parsed = JSON.parse(data)

    return {
      data: parsed.content,
      contentType: 'application/json',
      metadata: parsed.metadata || {},
    }
  },

  // Transform Content object to storage format
  write: async (uri, content) => {
    const data = JSON.stringify(
      {
        content: content.data,
        metadata: content.metadata,
      },
      null,
      2
    )

    await writeFile(resolveFilePath(uri), data)
  },
})

Type-Safe Transformations

The system uses TypeScript generics for type-safe transformations:

typescript

// Type-safe transformation
function transform<TInput, TOutput>(
  input: Content<TInput>,
  transformer: (data: TInput) => TOutput
): Content<TOutput> {
  return {
    ...input,
    data: transformer(input.data),
  }
}

// Usage example
const markdownContent: Content<string> = {
  data: '# Title\n\nContent',
  contentType: 'text/markdown',
  metadata: { title: 'Example' },
}

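// transform() copies contentType unchanged, so set it explicitly when the format changes
const htmlContent: Content<string> = {
  ...transform(markdownContent, markdown => markdownToHtml(markdown)),
  contentType: 'text/html',
}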

MDX Transformation

MDX transformation is a specialized pipeline:

MDX Text → Parse MDX → Extract Frontmatter → Process Imports → Transform AST → Compile to JSX → React Component

Key stages in MDX transformation:

Parse MDX: Convert MDX text to abstract syntax tree (AST)
Extract Frontmatter: Separate frontmatter from content
Process Imports: Resolve and load imported components
Transform AST: Apply plugins and transformations
Compile to JSX: Convert AST to JSX
Create Component: Compile to React component

typescript

// MDX transformation pipeline
const processMdx = async (source, options) => {
  const { content, frontmatter } = extractFrontmatter(source)
  const ast = await parseMdx(content)
  const processedAst = applyMdxPlugins(ast, options.plugins)
  const jsx = mdxAstToJsx(processedAst)
  const component = await compileJsx(jsx, {
    components: options.components,
    scope: { ...frontmatter, ...options.scope },
  })

  return {
    component,
    frontmatter,
  }
}

Serialization and Deserialization

Serialization converts content to storage format:

typescript

// Serialize content for storage
function serializeContent(content: Content): Buffer | string {
  switch (content.contentType) {
    case 'text/markdown':
      return serializeMarkdown(content)
    case 'application/json':
      return JSON.stringify(content.data)
    case 'text/mdx':
      return serializeMdx(content)
    default:
      return String(content.data)
  }
}

Deserialization converts storage format to content:

typescript

// Deserialize content from storage
function deserializeContent(
  data: Buffer | string,
  contentType: string
): Content {
  switch (contentType) {
    case 'text/markdown':
      return deserializeMarkdown(data.toString())
    case 'application/json':
      return {
        data: JSON.parse(data.toString()),
        contentType,
        metadata: {},
      }
    case 'text/mdx':
      return deserializeMdx(data.toString())
    default:
      return {
        data: data.toString(),
        contentType,
        metadata: {},
      }
  }
}
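
The helpers `serializeMarkdown` and `deserializeMarkdown` are referenced above but not shown. The sketch below illustrates the round trip they imply, using a deliberately simplified frontmatter format of flat `key: value` string pairs. It is not a YAML parser, and the `Sketch` function names are hypothetical:

```typescript
interface MarkdownContent {
  data: string
  contentType: 'text/markdown'
  metadata: Record<string, string>
}

// Write metadata as a frontmatter block in front of the Markdown body.
function serializeMarkdownSketch(content: MarkdownContent): string {
  const keys = Object.keys(content.metadata)
  if (keys.length === 0) return content.data

  const frontmatter = keys
    .map(key => `${key}: ${content.metadata[key]}`)
    .join('\n')
  return `---\n${frontmatter}\n---\n${content.data}`
}

// Split a stored document back into metadata and body.
function deserializeMarkdownSketch(raw: string): MarkdownContent {
  const metadata: Record<string, string> = {}
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(raw)

  // No frontmatter block: the whole input is the body.
  if (!match) return { data: raw, contentType: 'text/markdown', metadata }

  for (const line of match[1].split('\n')) {
    const separator = line.indexOf(':')
    if (separator === -1) continue // skip lines that are not key: value pairs
    metadata[line.slice(0, separator).trim()] = line.slice(separator + 1).trim()
  }
  return { data: match[2], contentType: 'text/markdown', metadata }
}

const stored = serializeMarkdownSketch({
  data: '# Title\n\nBody',
  contentType: 'text/markdown',
  metadata: { title: 'Example', author: 'Ada' },
})
const restored = deserializeMarkdownSketch(stored)
```

A useful check for any serializer and deserializer pair is that deserializing the serialized form gives back the original content, as `restored` does here.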

Transformation Context

Transformations can receive a context object that carries information such as the base URI, the environment, and transformation options:

typescript

// Transformation with context
function transformWithContext<T>(
  content: Content<T>,
  context: TransformContext
): Content<T> {
  // Access context information
  const { baseUri, environment, options } = context

  // Transform content using context
  return {
    ...content,
    data: transformData(content.data, context),
    metadata: {
      ...content.metadata,
      transformedAt: new Date(),
      environment: environment.type,
    },
  }
}

Error Handling in Transformations

Transformation errors are caught and rethrown as content-specific error types that carry the URI of the affected content:

typescript

// Error handling in transformations
try {
  const transformed = await transform(content)
  return transformed
} catch (error) {
  if (error instanceof SyntaxError) {
    throw new ContentFormatError(uri, error.message)
  } else if (error instanceof ValidationError) {
    throw new ContentValidationError(uri, error.errors)
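  } else {
    // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
    const message = error instanceof Error ? error.message : String(error)
    throw new ContentError(`Transformation failed: ${message}`, uri)
  }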
}
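
The error classes used above are not defined in this document. The sketch below shows one way such a hierarchy and the mapping logic could look. The constructor argument order follows the calls above, while the class bodies and the `toContentError` and `parseJsonContent` helpers are assumptions:

```typescript
class ContentError extends Error {
  readonly uri: string

  constructor(message: string, uri: string) {
    super(message)
    this.name = 'ContentError'
    this.uri = uri
    // Keeps `instanceof` working when the code is compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

class ContentFormatError extends ContentError {
  constructor(uri: string, detail: string) {
    super(`Invalid format: ${detail}`, uri)
    this.name = 'ContentFormatError'
  }
}

// Map any thrown value to a ContentError that carries the URI.
function toContentError(error: unknown, uri: string): ContentError {
  if (error instanceof ContentError) return error // already mapped
  if (error instanceof SyntaxError) {
    return new ContentFormatError(uri, error.message)
  }
  const detail = error instanceof Error ? error.message : String(error)
  return new ContentError(`Transformation failed: ${detail}`, uri)
}

// Example transformation that reports parse failures with the content URI.
function parseJsonContent(raw: string, uri: string): unknown {
  try {
    return JSON.parse(raw)
  } catch (error) {
    throw toContentError(error, uri)
  }
}
```

Centralizing the mapping in one function keeps individual transformations free of repeated `instanceof` chains.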

Bidirectional Transformations

Some transformations support bidirectional conversion:

typescript

// Bidirectional transformation
interface BidirectionalTransform<A, B> {
  forward: (a: A) => B
  backward: (b: B) => A
}

// Example: JSON <-> Object transformation
const jsonTransform: BidirectionalTransform<string, unknown> = {
  forward: json => JSON.parse(json),
  backward: obj => JSON.stringify(obj, null, 2),
}
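
Bidirectional transformations can themselves be inverted and composed. The sketch below restates the interface so that it is self-contained; `invert`, `compose`, and the two example transforms are hypothetical additions for illustration:

```typescript
interface BidirectionalTransform<A, B> {
  forward: (a: A) => B
  backward: (b: B) => A
}

// Swap the two directions.
function invert<A, B>(
  transform: BidirectionalTransform<A, B>
): BidirectionalTransform<B, A> {
  return { forward: transform.backward, backward: transform.forward }
}

// Chain two transforms; the backward direction runs them in reverse order.
function compose<A, B, C>(
  first: BidirectionalTransform<A, B>,
  second: BidirectionalTransform<B, C>
): BidirectionalTransform<A, C> {
  return {
    forward: a => second.forward(first.forward(a)),
    backward: c => first.backward(second.backward(c)),
  }
}

// Text <-> lines
const linesTransform: BidirectionalTransform<string, string[]> = {
  forward: text => text.split('\n'),
  backward: lines => lines.join('\n'),
}

// JSON string <-> string array
const jsonArrayTransform: BidirectionalTransform<string, string[]> = {
  forward: json => JSON.parse(json) as string[],
  backward: lines => JSON.stringify(lines),
}

// Text <-> JSON array string, built from the two transforms above.
const textToJson = compose(linesTransform, invert(jsonArrayTransform))

const asJson = textToJson.forward('a\nb')
const asText = textToJson.backward(asJson)
```

A round trip only returns the exact input when the input is already in the form that `backward` produces. For example, a `backward` that pretty-prints JSON will not reproduce a compact input string character for character.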

Performance Considerations

Transformation performance is optimized through:

Caching: Cache transformation results
Lazy Transformations: Transform only when needed
Incremental Processing: Process only what has changed
Worker Offloading: Run expensive transformations in workers

typescript

// Cached transformation
const cachedTransform = memoize(transform, content => contentHash(content))

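// Lazy transformation: the work runs on first access and the result is reused
const lazyTransform = (content, options = {}) => {
  let transformed
  let computed = false

  return {
    ...content,
    get transformed() {
      // Track completion explicitly so falsy results (such as '') are not recomputed
      if (!computed) {
        transformed = transform(content.data)
        computed = true
      }
      return transformed
    },
  }
}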
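
Neither `memoize` nor `contentHash` is defined in this document. The sketch below shows one way the cached transformation could work. The `memoizeSketch` and `contentHashSketch` names are hypothetical, and the hash is a simple FNV-1a-style function over UTF-16 code units, chosen only for brevity. A production cache would more likely use a cryptographic digest and an eviction policy:

```typescript
interface HashableContent {
  data: string
  contentType: string
}

// FNV-1a-style 32-bit hash over the content type and data.
function contentHashSketch(content: HashableContent): string {
  const text = `${content.contentType}\n${content.data}`
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16)
}

// Cache results by a caller-supplied key so equal content is transformed once.
function memoizeSketch<A, R>(
  fn: (arg: A) => R,
  keyOf: (arg: A) => string
): (arg: A) => R {
  const cache = new Map<string, R>()
  return arg => {
    const key = keyOf(arg)
    if (!cache.has(key)) cache.set(key, fn(arg))
    return cache.get(key) as R
  }
}

let renderCalls = 0
const renderUppercase = (content: HashableContent): string => {
  renderCalls++
  return content.data.toUpperCase()
}

const cachedRender = memoizeSketch(renderUppercase, contentHashSketch)

const draftA: HashableContent = { data: 'draft A', contentType: 'text/plain' }
const firstResult = cachedRender(draftA)
const secondResult = cachedRender({ ...draftA }) // equal content, different object
```

Keying the cache by content rather than by object identity is what allows the second call to hit the cache even though it receives a new object.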
