Content Transformation
This document explains how the ReX content transformation system converts content as it moves between layers of the system, from storage to presentation and back.
Core Transformation Concepts
Content transformation is the process of converting content between representations while preserving its meaning. The transformation system follows these principles:
- Immutability: Transformations produce new content instances rather than modifying existing ones
- Composability: Multiple transformations can be combined into pipelines
- Purity: Transformations are pure functions with predictable outputs for given inputs
- Type Safety: Transformations preserve or enhance type information
- Bidirectionality: Many transformations support both forward and reverse operations
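The principles above can be illustrated with a minimal sketch. The `Content` shape mirrors the objects used in the examples in this document; `mapData` and `compose` are hypothetical helpers introduced here for illustration and are not part of any documented ReX API.

```typescript
// Hypothetical content shape, modeled on the examples in this document.
interface Content<T> {
  readonly data: T
  readonly contentType: string
  readonly metadata: Readonly<Record<string, unknown>>
}

// A transformation is a pure function from one content value to another.
type Transform<A, B> = (input: Content<A>) => Content<B>

// Immutability: build a new object instead of mutating the input.
function mapData<A, B>(fn: (data: A) => B): Transform<A, B> {
  return input => ({ ...input, data: fn(input.data) })
}

// Composability: two transformations combine into one.
function compose<A, B, C>(
  first: Transform<A, B>,
  second: Transform<B, C>
): Transform<A, C> {
  return input => second(first(input))
}

const trim = mapData((text: string) => text.trim())
const countWords = mapData(
  (text: string) => text.split(/\s+/).filter(Boolean).length
)
// Type safety: the composed result is typed as Transform<string, number>.
const wordCount = compose(trim, countWords)

const original: Content<string> = {
  data: '  hello transformation world  ',
  contentType: 'text/plain',
  metadata: {},
}
const counted = wordCount(original)
// `original` is left untouched; `counted.data` is 3 for this input.
```

Because each step returns a fresh object and depends only on its argument, the same input always yields the same output, which is what makes caching and reordering safe later on.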
Transformation Pipeline
The transformation pipeline processes content as it flows through the system:
Raw Storage Data → Parsing → Normalization → Validation → Enrichment → Presentation
Pipeline Stages
1. Parsing
Parsing converts raw storage format into structured content:
- Text Parsing: Convert plain text to structured content
- Format-Specific Parsing: Handle Markdown, MDX, JSON, etc.
- Binary Parsing: Process binary formats like images or documents
// MDX parsing example
import { parseMdx } from '@lib/mdx/parser'
const rawContent = '# Title\n\nContent with *formatting*'
const parsedContent = await parseMdx(rawContent)
2. Normalization
Normalization standardizes content structure and metadata:
- Structure Normalization: Ensure consistent content structure
- Metadata Normalization: Standardize metadata fields
- Format Normalization: Convert between different formats
// Metadata normalization example
import { normalizeMetadata } from '@lib/content/processors'
const normalizedContent = {
...content,
metadata: normalizeMetadata(content.metadata),
}
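To make the idea concrete, here is a rough sketch of what a metadata normalizer might do. It is not the `normalizeMetadata` shipped in `@lib/content/processors`; the function name `normalizeMetadataSketch` and the field names `title`, `tags`, and `date` are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: the real normalizeMetadata may apply different rules.
type Metadata = Record<string, unknown>

function normalizeMetadataSketch(metadata: Metadata): Metadata {
  // Copy first so the caller's object is never mutated.
  const normalized: Metadata = { ...metadata }

  // Trim string titles so ' Intro ' and 'Intro' compare equal.
  const title = normalized.title
  if (typeof title === 'string') {
    normalized.title = title.trim()
  }

  // Accept a single tag or a list; emit a sorted, de-duplicated, lowercase list.
  const rawTags = normalized.tags
  const tagList: unknown[] = Array.isArray(rawTags)
    ? rawTags
    : rawTags === undefined
      ? []
      : [rawTags]
  const lowered = tagList.map(tag => String(tag).trim().toLowerCase())
  normalized.tags = Array.from(new Set(lowered)).sort()

  // Store date strings in ISO 8601 form when they parse to a valid date.
  const date = normalized.date
  if (typeof date === 'string') {
    const parsed = new Date(date)
    if (!Number.isNaN(parsed.getTime())) {
      normalized.date = parsed.toISOString()
    }
  }

  return normalized
}

const result = normalizeMetadataSketch({
  title: '  Intro ',
  tags: ['Docs', 'docs', 'API'],
  date: '2024-01-15T00:00:00Z',
})
// result.title is 'Intro', result.tags is ['api', 'docs'],
// and result.date is '2024-01-15T00:00:00.000Z'.
```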
3. Validation
Validation ensures content meets schema requirements:
- Schema Validation: Verify content against a JSON Schema
- Type Validation: Check type constraints
- Semantic Validation: Verify logical relationships and constraints
// Content validation example
import { validateContent } from '@lib/content/processors'
import { articleSchema } from '@lib/content/schemas'
const validationResult = validateContent(content, articleSchema)
if (!validationResult.valid) {
throw new ContentValidationError(uri, validationResult.errors)
}
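The `{ valid, errors }` result shape used above can be sketched with a deliberately small validator. This is an illustration only: the rule format (`SimpleSchema`, `FieldRule`) and the function `validateMetadataSketch` are invented here, and a real setup would more likely delegate to a full JSON Schema validator.

```typescript
// Hypothetical, deliberately tiny rule format (not JSON Schema).
interface FieldRule {
  type: 'string' | 'number' | 'boolean'
  required?: boolean
}
type SimpleSchema = Record<string, FieldRule>

interface ValidationResult {
  valid: boolean
  errors: string[]
}

function validateMetadataSketch(
  metadata: Record<string, unknown>,
  schema: SimpleSchema
): ValidationResult {
  const errors: string[] = []
  for (const field of Object.keys(schema)) {
    const rule = schema[field]
    const value = metadata[field]
    if (value === undefined) {
      // Missing fields are only an error when the rule marks them required.
      if (rule.required) errors.push(`${field}: required field is missing`)
      continue
    }
    if (typeof value !== rule.type) {
      errors.push(`${field}: expected ${rule.type}, got ${typeof value}`)
    }
  }
  // Collect every problem instead of stopping at the first one.
  return { valid: errors.length === 0, errors }
}

const articleRules: SimpleSchema = {
  title: { type: 'string', required: true },
  readingTime: { type: 'number' },
}

const ok = validateMetadataSketch({ title: 'Hello' }, articleRules)
const bad = validateMetadataSketch({ readingTime: 'five' }, articleRules)
// ok.valid is true; bad.errors holds two messages (missing title, wrong type).
```

Returning all errors at once, rather than throwing on the first, lets the caller decide whether to raise a single `ContentValidationError` carrying the full list.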
4. Enrichment
Enrichment adds derived or computed properties:
- Computed Fields: Add calculated fields based on content
- Relationships: Resolve references to other content
- Contextual Data: Add environment-specific information
// Content enrichment example
import { enrichContent } from '@lib/content/processors'
const enrichedContent = await enrichContent(content, {
resolveReferences: true,
computeReadingTime: true,
generateTableOfContents: true,
})
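Two of the computed fields named in the options above could be derived roughly as follows. The helper names, the `TocEntry` shape, and the 200 words-per-minute figure are assumptions for this sketch, not values taken from ReX.

```typescript
// Assumed default reading speed; adjust to taste.
const WORDS_PER_MINUTE = 200

// Reading time in whole minutes, never less than one.
function computeReadingTimeSketch(text: string): number {
  const words = text.split(/\s+/).filter(Boolean).length
  return Math.max(1, Math.ceil(words / WORDS_PER_MINUTE))
}

interface TocEntry {
  level: number
  title: string
}

// Collects ATX-style Markdown headings ("# Title", "## Section").
// Caveat: lines starting with "#" inside code fences are also matched.
function tableOfContentsSketch(markdown: string): TocEntry[] {
  const entries: TocEntry[] = []
  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line)
    if (match) {
      entries.push({ level: match[1].length, title: match[2] })
    }
  }
  return entries
}

const source = '# Title\n\nSome text.\n\n## Details\n\nMore text.'
const enriched = {
  readingTime: computeReadingTimeSketch(source),
  toc: tableOfContentsSketch(source),
}
// enriched.readingTime is 1; enriched.toc lists "Title" (level 1)
// and "Details" (level 2).
```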
5. Presentation
Presentation formats content for specific display contexts:
- Component Mapping: Convert content to React components
- Format Conversion: Transform to HTML, PDF, etc.
- Styling Application: Apply design themes and styles
// MDX to React component transformation
import { mdxToReact } from '@lib/mdx/compiler'
import { components } from '@components/mdx'
const ReactComponent = await mdxToReact(content.data, {
components,
scope: { title: content.metadata.title },
})
Transformation Types
Data Transformations
Data transformations modify the content data itself:
- Format Conversion: Markdown → HTML, JSON → YAML, etc.
- Structural Transformation: Reorganizing content structure
- Content Generation: Creating derived content
- Content Filtering: Removing sensitive or unnecessary content
Metadata Transformations
Metadata transformations focus on the content metadata:
- Schema Migration: Update metadata to new schemas
- Default Values: Apply default values for missing fields
- Derived Metadata: Calculate metadata from content
- Metadata Normalization: Standardize field formats
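Schema migration and default values can be combined in one pass, as in the following sketch. The field names (`schemaVersion`, `author`, `authors`, `draft`) and the function `migrateMetadataSketch` are invented for illustration; ReX's actual metadata schema may look quite different.

```typescript
type Metadata = Record<string, unknown>

// Hypothetical defaults applied to every document.
const DEFAULTS: Metadata = { draft: false, tags: [] }

// Each entry upgrades metadata from version N to version N + 1.
const migrations: Array<(m: Metadata) => Metadata> = [
  // v0 -> v1: a single `author` string becomes an `authors` array.
  m => {
    const { author, ...rest } = m
    return { ...rest, authors: typeof author === 'string' ? [author] : [] }
  },
]

function migrateMetadataSketch(metadata: Metadata): Metadata {
  const declared = metadata.schemaVersion
  let version = typeof declared === 'number' ? declared : 0
  let current = metadata
  // Apply every migration the document has not seen yet, in order.
  while (version < migrations.length) {
    current = migrations[version](current)
    version += 1
  }
  // Defaults are spread first so explicit values win.
  return { ...DEFAULTS, ...current, schemaVersion: version }
}

const migrated = migrateMetadataSketch({ title: 'Old post', author: 'Ada' })
// migrated.authors is ['Ada'], migrated.draft is false,
// migrated.schemaVersion is 1, and the old `author` key is gone.
```

Storing the version alongside the metadata keeps the migration idempotent: running it again on `migrated` applies no further steps.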
URI Transformations
URI transformations handle content addressing:
- Path Normalization: Standardize path formats
- URI Resolution: Resolve relative URIs
- Link Rewriting: Update internal links
- Identity Mapping: Map between different URI schemes
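Three of these operations are sketched below using the standard `URL` class. The function names and the `https://example.com` base are placeholders for illustration; the link pattern only handles relative Markdown links that start with `./` or `../`.

```typescript
// Path normalization: collapse duplicate slashes and resolve "." and "..".
function normalizePathSketch(path: string): string {
  const segments: string[] = []
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue
    if (segment === '..') segments.pop()
    else segments.push(segment)
  }
  return '/' + segments.join('/')
}

// URI resolution: resolve a relative reference against a base URI.
function resolveUriSketch(relative: string, base: string): string {
  return new URL(relative, base).href
}

// Link rewriting: turn relative Markdown link targets into absolute ones.
function rewriteLinksSketch(markdown: string, base: string): string {
  return markdown.replace(
    /\]\((\.{1,2}\/[^)\s]+)\)/g,
    (_match, target: string) => `](${resolveUriSketch(target, base)})`
  )
}

const base = 'https://example.com/docs/guide/intro.mdx'
const normalized = normalizePathSketch('/docs//guide/./../api/')
const resolved = resolveUriSketch('../api/store.mdx', base)
const rewritten = rewriteLinksSketch('See [the store](../api/store.mdx).', base)
// normalized is '/docs/api'
// resolved is 'https://example.com/docs/api/store.mdx'
// rewritten is 'See [the store](https://example.com/docs/api/store.mdx).'
```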
Transformation Implementation
Function Composition
Transformations are implemented using function composition:
import { pipe } from '@lib/utils/functional'
// Create transformation pipeline
const processMarkdown = pipe(
parseMarkdown, // String → MarkdownAST
extractFrontmatter, // MarkdownAST → {ast, frontmatter}
validateFrontmatter, // Validate frontmatter
transformAST, // Apply AST transformations
renderToHTML // MarkdownAST → HTML
)
// Apply transformation
const result = await processMarkdown(rawContent)
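Since the result above is awaited, `pipe` presumably tolerates asynchronous stages. The following is one way such a helper could be written; it is a sketch under that assumption, and the actual helper in `@lib/utils/functional` may differ in signature and typing. `pipeSketch` uses `any` between stages for brevity, which gives up the per-stage type checking a production version would want.

```typescript
// Hypothetical async-aware pipe. Stages are loosely typed for brevity.
type Stage = (input: any) => any

function pipeSketch(...stages: Stage[]): (input: unknown) => Promise<unknown> {
  return async input => {
    let value = input
    for (const stage of stages) {
      // Awaiting each result lets sync and async stages be mixed freely.
      value = await stage(value)
    }
    return value
  }
}

// Example pipeline: title -> lowercase -> dashes -> trimmed slug.
const slugify = pipeSketch(
  (title: string) => title.trim().toLowerCase(),
  async (title: string) => title.replace(/[^a-z0-9]+/g, '-'),
  (slug: string) => slug.replace(/^-+|-+$/g, '')
)

// slugify('  Hello, World!  ') resolves to 'hello-world'.
```

A stage that throws or rejects stops the pipeline, and the returned promise rejects with that error, so a single `try`/`catch` around the awaited call covers every stage.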
Middleware Approach
Transformations can be implemented as middleware:
import { createContentStore } from '@lib/content/store'
import { withTransformation } from '@lib/content/middleware'
// Create store with transformation middleware
const store = createContentStore({
adapter,
middleware: [
withTransformation({
read: [parseMdx, extractFrontmatter, resolveLinks],
write: [validateSchema, normalizeMetadata, serializeMdx],
}),
],
})
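The effect of such middleware can be sketched as a wrapper that runs the `read` steps after loading and the `write` steps before saving. Note the simplifications: the `Store` interface, `withTransformationSketch`, and the in-memory store below are all assumptions, and the wrapper takes the store directly rather than being registered through a `middleware` array as in the example above.

```typescript
// Hypothetical minimal shapes; ReX's real store contract may differ.
interface Content {
  data: string
  metadata: Record<string, unknown>
}
type Step = (content: Content) => Content | Promise<Content>

interface Store {
  read(uri: string): Promise<Content>
  write(uri: string, content: Content): Promise<void>
}

// Run steps in order, awaiting each so async steps are supported.
async function runSteps(content: Content, steps: Step[]): Promise<Content> {
  let current = content
  for (const step of steps) current = await step(current)
  return current
}

function withTransformationSketch(
  store: Store,
  steps: { read: Step[]; write: Step[] }
): Store {
  return {
    read: async uri => runSteps(await store.read(uri), steps.read),
    write: async (uri, content) =>
      store.write(uri, await runSteps(content, steps.write)),
  }
}

// In-memory store used only to exercise the wrapper.
const saved = new Map<string, Content>()
const memoryStore: Store = {
  read: async uri => {
    const content = saved.get(uri)
    if (!content) throw new Error(`Not found: ${uri}`)
    return content
  },
  write: async (uri, content) => {
    saved.set(uri, content)
  },
}

const store = withTransformationSketch(memoryStore, {
  read: [c => ({ ...c, metadata: { ...c.metadata, loaded: true } })],
  write: [c => ({ ...c, data: c.data.trim() })],
})
```

With this arrangement, callers of `store.write` and `store.read` never see the storage representation, and the write steps mirror the read steps in reverse, much like the `read`/`write` arrays in the example above.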
Adapter Transformations
Adapters implement transformations between storage formats and Content objects:
import { createAdapter } from '@lib/content/adapters'
const jsonAdapter = createAdapter({
// Transform storage format to Content object
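read: async uri => {
// Read as UTF-8 text so JSON.parse receives a string rather than a Buffer
// (assumes readFile is Node's fs/promises readFile)
const data = await readFile(resolveFilePath(uri), 'utf8')
const parsed = JSON.parse(data)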
return {
data: parsed.content,
contentType: 'application/json',
metadata: parsed.metadata || {},
}
},
// Transform Content object to storage format
write: async (uri, content) => {
const data = JSON.stringify(
{
content: content.data,
metadata: content.metadata,
},
null,
2
)
await writeFile(resolveFilePath(uri), data)
},
})
Type-Safe Transformations
The system uses TypeScript generics for type-safe transformations:
// Type-safe transformation
function transform<TInput, TOutput>(
input: Content<TInput>,
transformer: (data: TInput) => TOutput
): Content<TOutput> {
return {
...input,
data: transformer(input.data),
}
}
// Usage example
const markdownContent: Content<string> = {
data: '# Title\n\nContent',
contentType: 'text/markdown',
metadata: { title: 'Example' },
}
const htmlContent = transform(markdownContent, markdown =>
markdownToHtml(markdown)
)
MDX Transformation
MDX transformation is a specialized pipeline:
MDX Text → MDX AST → Extract Frontmatter → Process Imports → Transform AST → Compile to JSX → React Component
Key stages in MDX transformation:
- Parse MDX: Convert MDX text to abstract syntax tree (AST)
- Extract Frontmatter: Separate frontmatter from content
- Process Imports: Resolve and load imported components
- Transform AST: Apply plugins and transformations
- Compile to JSX: Convert AST to JSX
- Create Component: Compile to React component
// MDX transformation pipeline
const processMdx = async (source, options) => {
const { content, frontmatter } = extractFrontmatter(source)
const ast = await parseMdx(content)
const processedAst = applyMdxPlugins(ast, options.plugins)
const jsx = mdxAstToJsx(processedAst)
const component = await compileJsx(jsx, {
components: options.components,
scope: { ...frontmatter, ...options.scope },
})
return {
component,
frontmatter,
}
}
Serialization and Deserialization
Serialization converts content to storage format:
// Serialize content for storage
function serializeContent(content: Content): Buffer | string {
switch (content.contentType) {
case 'text/markdown':
return serializeMarkdown(content)
case 'application/json':
return JSON.stringify(content.data)
case 'text/mdx':
return serializeMdx(content)
default:
return String(content.data)
}
}
Deserialization converts storage format to content:
// Deserialize content from storage
function deserializeContent(
data: Buffer | string,
contentType: string
): Content {
switch (contentType) {
case 'text/markdown':
return deserializeMarkdown(data.toString())
case 'application/json':
return {
data: JSON.parse(data.toString()),
contentType,
metadata: {},
}
case 'text/mdx':
return deserializeMdx(data.toString())
default:
return {
data: data.toString(),
contentType,
metadata: {},
}
}
}
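A serializer and its deserializer should round-trip: deserializing what was serialized yields the original content. The pair below sketches this for Markdown with front matter. It is not the `serializeMarkdown`/`deserializeMarkdown` referenced above; it supports only flat, string-valued metadata with simple `key: value` lines, whereas real front matter is usually YAML and needs a proper parser.

```typescript
interface MarkdownContent {
  data: string
  contentType: 'text/markdown'
  metadata: Record<string, string>
}

// Writes metadata as a front matter block followed by the body.
function serializeMarkdownSketch(content: MarkdownContent): string {
  const lines = Object.keys(content.metadata).map(
    key => `${key}: ${content.metadata[key]}`
  )
  return lines.length > 0
    ? `---\n${lines.join('\n')}\n---\n${content.data}`
    : content.data
}

// Splits a leading front matter block from the body, if one is present.
function deserializeMarkdownSketch(raw: string): MarkdownContent {
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(raw)
  if (!match) {
    return { data: raw, contentType: 'text/markdown', metadata: {} }
  }
  const metadata: Record<string, string> = {}
  for (const line of match[1].split('\n')) {
    // Split on the first colon only, so values may contain colons.
    const separator = line.indexOf(':')
    if (separator > 0) {
      metadata[line.slice(0, separator).trim()] = line.slice(separator + 1).trim()
    }
  }
  return { data: match[2], contentType: 'text/markdown', metadata }
}

const original: MarkdownContent = {
  data: '# Title\n\nBody',
  contentType: 'text/markdown',
  metadata: { title: 'Example', author: 'Ada' },
}
const stored = serializeMarkdownSketch(original)
const restored = deserializeMarkdownSketch(stored)
// `restored` has the same data and metadata as `original`.
```

The round trip is exact only for values without leading or trailing whitespace, because the deserializer trims them.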
Transformation Context
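Transformations can read a context object that carries information such as the base URI, the environment, and options:
// Transformation with context
function transformWithContext<T>(
  content: Content<T>,
  context: TransformContext
): Content<T> {
  // Only the environment is read directly; the full context is passed on to transformData
  const { environment } = context
  // Return a new object rather than mutating the input
  return {
    ...content,
    data: transformData(content.data, context),
    metadata: {
      ...content.metadata,
      // The timestamp differs between calls, so output that includes it is not reproducible from the input alone
      transformedAt: new Date(),
      environment: environment.type,
    },
  }
}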
Error Handling in Transformations
Transformation errors are mapped to content-specific error types so that callers can handle them uniformly:
// Error handling in transformations
try {
const transformed = await transform(content)
return transformed
} catch (error) {
if (error instanceof SyntaxError) {
throw new ContentFormatError(uri, error.message)
} else if (error instanceof ValidationError) {
throw new ContentValidationError(uri, error.errors)
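} else {
// A caught value is `unknown` in TypeScript, so narrow it before reading `.message`
const message = error instanceof Error ? error.message : String(error)
throw new ContentError(`Transformation failed: ${message}`, uri)
}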
}
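The error types thrown above could be organized as a small hierarchy. The sketch below follows the constructor argument order used in this document's examples; the message formats, the `errors: string[]` type, and the `describeError` helper are assumptions.

```typescript
// Hypothetical base class: every content error carries the affected URI.
class ContentError extends Error {
  readonly uri: string
  constructor(message: string, uri: string) {
    super(message)
    this.name = 'ContentError'
    this.uri = uri
    // Keeps `instanceof` working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

class ContentFormatError extends ContentError {
  constructor(uri: string, detail: string) {
    super(`Invalid format in ${uri}: ${detail}`, uri)
    this.name = 'ContentFormatError'
  }
}

class ContentValidationError extends ContentError {
  readonly errors: string[]
  constructor(uri: string, errors: string[]) {
    super(`Validation failed for ${uri}: ${errors.join('; ')}`, uri)
    this.name = 'ContentValidationError'
    this.errors = errors
  }
}

// Callers can branch on the subclass while still catching the base type.
function describeError(error: unknown): string {
  if (error instanceof ContentValidationError) {
    return `${error.errors.length} validation error(s)`
  }
  if (error instanceof ContentError) {
    return `content error at ${error.uri}`
  }
  return 'unknown error'
}
```

Checking the most specific class first matters here, because a `ContentValidationError` is also an instance of `ContentError`.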
Bidirectional Transformations
Some transformations support bidirectional conversion:
// Bidirectional transformation
interface BidirectionalTransform<A, B> {
forward: (a: A) => B
backward: (b: B) => A
}
// Example: JSON <-> Object transformation
const jsonTransform: BidirectionalTransform<string, unknown> = {
forward: json => JSON.parse(json),
backward: obj => JSON.stringify(obj, null, 2),
}
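Bidirectional transformations compose just like one-way ones, as long as the backward direction runs the steps in reverse. The sketch below repeats the interface so it is self-contained; `composeBidirectional` and the two example transforms are hypothetical.

```typescript
interface BidirectionalTransform<A, B> {
  forward: (a: A) => B
  backward: (b: B) => A
}

// Compose A <-> B with B <-> C into A <-> C.
function composeBidirectional<A, B, C>(
  first: BidirectionalTransform<A, B>,
  second: BidirectionalTransform<B, C>
): BidirectionalTransform<A, C> {
  return {
    forward: a => second.forward(first.forward(a)),
    // The backward direction undoes the second step before the first.
    backward: c => first.backward(second.backward(c)),
  }
}

const csv: BidirectionalTransform<string, string[]> = {
  forward: line => line.split(','),
  backward: cells => cells.join(','),
}

const numbers: BidirectionalTransform<string[], number[]> = {
  forward: cells => cells.map(Number),
  backward: values => values.map(String),
}

const csvNumbers = composeBidirectional(csv, numbers)
// csvNumbers.forward('1,2,3') is [1, 2, 3]
// csvNumbers.backward([1, 2, 3]) is '1,2,3'
// csvNumbers.backward(csvNumbers.forward('1, 2')) is '1,2' (the space is lost)
```

The last line shows a common limitation: a round trip often returns a normalized form rather than the exact original, so it is worth stating which inputs a transformation preserves exactly.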
Performance Considerations
Transformation performance is optimized through:
- Caching: Cache transformation results
- Lazy Transformations: Transform only when needed
- Incremental Processing: Process only what has changed
- Worker Offloading: Run expensive transformations in workers
// Cached transformation
const cachedTransform = memoize(transform, content => contentHash(content))
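// Lazy transformation: the work runs on first access and the result is reused afterwards
const lazyTransform = (content, options = {}) => {
  let computed = false
  let transformed
  return {
    ...content,
    get transformed() {
      // Track completion separately so falsy results (such as '' or 0) are not recomputed
      if (!computed) {
        transformed = transform(content.data)
        computed = true
      }
      return transformed
    },
  }
}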
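The `memoize` helper used for caching is not shown in this document. A minimal version with a caller-supplied key function might look like the following; `memoizeSketch` is hypothetical, and `JSON.stringify` stands in for a real content hash.

```typescript
// Hypothetical memoize: results are cached under the key returned by keyOf.
function memoizeSketch<A, R>(
  fn: (arg: A) => R,
  keyOf: (arg: A) => string
): (arg: A) => R {
  const cache = new Map<string, R>()
  return arg => {
    const key = keyOf(arg)
    // Use has() rather than a truthiness check so falsy results are cached too.
    if (cache.has(key)) return cache.get(key) as R
    const result = fn(arg)
    cache.set(key, result)
    return result
  }
}

let calls = 0
const render = (content: { data: string }): string => {
  calls += 1
  return content.data.toUpperCase()
}

const cachedRender = memoizeSketch(render, content => JSON.stringify(content))

const first = cachedRender({ data: 'hello' })
const second = cachedRender({ data: 'hello' })
// Both results are 'HELLO', and `render` ran only once (calls is 1).
```

This cache grows without bound, so a long-running process would want an eviction policy such as a size limit. Caching is also only safe for transformations that are pure, which is one practical reason for the purity principle stated at the start of this document.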