Metadata System

This document explains the metadata system in ReX, which provides structured descriptive information about content.

Overview

The metadata system associates descriptive information with content beyond its primary data. Metadata lets content carry context (titles, timestamps, authorship), relationships to other content, and custom properties, which makes the content easier to organize, query, and display.

Metadata Fundamentals

Metadata in the system is represented by the ContentMetadata interface:

typescript

interface ContentMetadata {
  // Common fields
  title?: string
  description?: string
  createdAt?: Date
  updatedAt?: Date

  // References to other content
  references?: ContentReference[]

  // Extension point for custom metadata
  [key: string]: any
}

Key characteristics of the metadata system:

Flexible Structure: The index signature allows for arbitrary additional properties
Optional Fields: All standard fields are optional
Strong Typing: Core fields have specific types for better type safety
Content Relationships: The references field supports linking to other content
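
As an illustration of these characteristics, the sketch below repeats the interfaces from this document so that it stands alone, then attaches a custom field through the index signature. The `reviewedBy` field and the sample values are assumptions made for this example only.

```typescript
// Repeated from the definitions in this document so the example stands alone.
interface ContentReference {
  uri: string
  type: 'embed' | 'link' | 'dependency'
  title?: string
  description?: string
}

interface ContentMetadata {
  title?: string
  description?: string
  createdAt?: Date
  updatedAt?: Date
  references?: ContentReference[]
  [key: string]: any
}

// All standard fields are optional, so an empty object is valid metadata.
const emptyMetadata: ContentMetadata = {}

// Core fields are type-checked; `reviewedBy` is a hypothetical custom field
// accepted through the index signature.
const reviewedMetadata: ContentMetadata = {
  title: 'Release Notes',
  createdAt: new Date('2025-03-15T00:00:00Z'),
  reviewedBy: 'editorial-team',
}

// Custom fields are typed as `any`, so narrow them before use.
const reviewer: string =
  typeof reviewedMetadata.reviewedBy === 'string' ? reviewedMetadata.reviewedBy : 'unknown'
```

Because the index signature types custom fields as `any`, the compiler will not catch a misspelled custom key. A runtime check like the one on `reviewedBy` is one way to compensate.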

Core Metadata Fields

Identity Metadata

typescript

// Identity-related fields
title?: string;         // Human-readable title
description?: string;   // Brief description
slug?: string;          // URL-friendly identifier
id?: string;            // Unique identifier

Temporal Metadata

typescript

// Time-related fields
createdAt?: Date;       // Creation timestamp
updatedAt?: Date;       // Last update timestamp
publishedAt?: Date;     // Publication timestamp
expiresAt?: Date;       // Expiration timestamp

Categorization Metadata

typescript

// Categorization fields
tags?: string[];        // Simple tag list
categories?: string[];  // Category classifications
language?: string;      // Content language (ISO code)
region?: string;        // Geographic region

Authorship Metadata

typescript

// Authorship fields
author?: string | {     // Simple string or object
  name: string;
  email?: string;
  url?: string;
};
contributors?: Array<string | {   // List of contributors
  name: string;
  email?: string;
  url?: string;
  role?: string;
}>;
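
Since `author` accepts either a plain string or an object, consumers usually need to handle both forms. A minimal sketch of one way to do that follows; `AuthorInfo` and `normalizeAuthor` are hypothetical names introduced here, not part of the documented API.

```typescript
// Object form of an author, mirroring the authorship fields above.
interface AuthorInfo {
  name: string
  email?: string
  url?: string
}

// Hypothetical helper: convert either accepted form into the object form.
function normalizeAuthor(author: string | AuthorInfo | undefined): AuthorInfo | undefined {
  if (author === undefined) return undefined
  // A bare string is treated as the author's name.
  return typeof author === 'string' ? { name: author } : author
}
```

With a helper like this, rendering code can read `normalizeAuthor(metadata.author)?.name` without branching on the input type.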

State Metadata

typescript

// State-related fields
status?: 'draft' | 'published' | 'archived';  // Content status
visibility?: 'public' | 'private' | 'protected';  // Content visibility
version?: string;       // Content version identifier
draft?: boolean;        // Whether content is in draft state

References System

The metadata system includes a references system for expressing relationships between content:

typescript

interface ContentReference {
  uri: string // URI of referenced content
  type: 'embed' | 'link' | 'dependency' // Reference type
  title?: string // Optional title
  description?: string // Optional description
}

Reference types:

Embed: Content directly embedded within the parent (e.g., an image in a Markdown document)
Link: Content linked to but not embedded (e.g., a link to another article)
Dependency: Content required for functioning but not visually present (e.g., a CSS file for HTML)

Example:

typescript

const articleWithReferences: Content<string> = {
  data: '# Article with References\n\n![Image](image.jpg)\n\n[Link to related](related.md)',
  contentType: 'text/markdown',
  metadata: {
    title: 'Article with References',
    references: [
      { uri: 'image.jpg', type: 'embed', title: 'Featured Image' },
      { uri: 'related.md', type: 'link', title: 'Related Article' },
    ],
  },
}
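
In the example above the references are written by hand. They could also be derived from the Markdown itself. The sketch below assumes a hypothetical helper, `collectMarkdownReferences`, that handles only inline images and inline links, and uses the link text as the reference title.

```typescript
interface ContentReference {
  uri: string
  type: 'embed' | 'link' | 'dependency'
  title?: string
  description?: string
}

// Hypothetical helper: derive references from inline Markdown images and links.
// Reference-style links, HTML tags, and link titles are deliberately ignored.
function collectMarkdownReferences(markdown: string): ContentReference[] {
  // An optional leading "!" distinguishes an image (embed) from a link.
  const pattern = /(!?)\[([^\]]*)\]\(([^)\s]+)\)/g
  const references: ContentReference[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(markdown)) !== null) {
    const [, bang, label, uri] = match
    const reference: ContentReference = { uri, type: bang === '!' ? 'embed' : 'link' }
    // Use the bracketed text as the title when it is not empty.
    if (label !== '') reference.title = label
    references.push(reference)
  }
  return references
}
```

Dependencies such as stylesheets cannot be detected this way, because they do not appear in the Markdown text, so they would still need to be declared explicitly.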

Domain-Specific Metadata

The metadata system supports domain-specific extensions through specialized interfaces:

Document Metadata

typescript

interface DocumentMetadata extends ContentMetadata {
  author?: string
  publishedAt?: Date
  tags?: string[]
  language?: string
  wordCount?: number
  readingTime?: number
  toc?: Array<{ title: string; level: number; id: string }>
}
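
Fields such as `wordCount` and `readingTime` are typically computed rather than typed in by hand. A rough sketch of one way to compute them follows; the helper name `measureDocument` and the default of 200 words per minute are assumptions for illustration, not values taken from ReX.

```typescript
// Hypothetical helper: derive `wordCount` and `readingTime` (in minutes) from text.
// Words are whitespace-separated tokens, so Markdown markers such as "#" are counted too.
function measureDocument(
  text: string,
  wordsPerMinute = 200, // assumed reading speed
): { wordCount: number; readingTime: number } {
  const trimmed = text.trim()
  const wordCount = trimmed === '' ? 0 : trimmed.split(/\s+/).length
  // Round up so that any non-empty text takes at least one minute.
  return { wordCount, readingTime: Math.ceil(wordCount / wordsPerMinute) }
}
```

The result can be spread into a `DocumentMetadata` object, for example `{ title, ...measureDocument(body) }`.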

Media Metadata

typescript

interface ImageMetadata extends ContentMetadata {
  width?: number
  height?: number
  format?: string
  alt?: string
  caption?: string
}

interface VideoMetadata extends ContentMetadata {
  width?: number
  height?: number
  duration?: number
  format?: string
  thumbnail?: string
}

Data Metadata

typescript

interface DataMetadata extends ContentMetadata {
  schema?: string
  schemaVersion?: string
  validatedAt?: Date
  isValid?: boolean
}

Metadata Extraction

The system includes utilities for extracting metadata from content:

Frontmatter Extraction

typescript

// Extract metadata from Markdown frontmatter
const markdownWithFrontmatter = `---
title: Hello World
author: John Doe
tags:
  - example
  - markdown
createdAt: 2025-03-15
---

# Hello World

This is an example.`

const { data, metadata } = extractFrontmatter(markdownWithFrontmatter)
// data: "# Hello World\n\nThis is an example."
// metadata: {
//   title: "Hello World",
//   author: "John Doe",
//   tags: ["example", "markdown"],
//   createdAt: new Date("2025-03-15")
// }
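
To make the extraction step concrete, here is a deliberately simplified sketch. It is not the implementation of `extractFrontmatter`. The function name `extractSimpleFrontmatter` is hypothetical, and it supports only `key: value` pairs, `- item` lists, and `YYYY-MM-DD` dates. A real implementation would more likely delegate to a full YAML parser.

```typescript
interface FrontmatterResult {
  data: string
  metadata: Record<string, unknown>
}

// Simplified sketch: handles `key: value` pairs and `- item` lists only.
function extractSimpleFrontmatter(source: string): FrontmatterResult {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source)
  if (!match) return { data: source, metadata: {} }

  const metadata: Record<string, unknown> = {}
  let currentList: string[] | undefined

  for (const line of match[1].split(/\r?\n/)) {
    const item = /^\s+-\s+(.*)$/.exec(line)
    const pair = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line)
    if (item && currentList) {
      currentList.push(item[1].trim())
    } else if (pair) {
      const key = pair[1]
      const raw = pair[2].trim()
      if (raw === '') {
        // A key with no inline value starts a list.
        currentList = []
        metadata[key] = currentList
      } else {
        currentList = undefined
        // Plain dates (YYYY-MM-DD) become Date objects; everything else stays a string.
        metadata[key] = /^\d{4}-\d{2}-\d{2}$/.test(raw) ? new Date(raw) : raw
      }
    }
  }

  // Drop the frontmatter block and any blank lines that follow it.
  const data = source.slice(match[0].length).replace(/^(\r?\n)+/, '')
  return { data, metadata }
}
```

Input without a leading `---` block is returned unchanged with empty metadata, so the function can be applied to any Markdown source.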

Media Metadata Extraction

typescript

// Extract metadata from an image
const imageMetadata = await extractImageMetadata(imageData)
// {
//   width: 1200,
//   height: 800,
//   format: "jpeg",
//   exif: { ... }
// }
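
How `extractImageMetadata` works internally is not described here. As a sketch of the general idea for one format, the hypothetical helper below reads the dimensions of a PNG directly from its header. In a PNG file, an 8-byte signature is followed by the IHDR chunk, which stores width and height as big-endian 32-bit integers at byte offsets 16 and 20.

```typescript
// Hypothetical helper: read width and height from a PNG header.
// Only PNG is handled; other formats need their own parsers.
function readPngDimensions(
  bytes: Uint8Array,
): { width: number; height: number; format: 'png' } | undefined {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]
  const ihdr = [0x49, 0x48, 0x44, 0x52] // ASCII "IHDR"

  // 8-byte signature + 4-byte chunk length + 4-byte chunk type + width + height
  if (bytes.length < 24) return undefined
  if (!signature.every((value, index) => bytes[index] === value)) return undefined
  if (!ihdr.every((value, index) => bytes[12 + index] === value)) return undefined

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  // DataView reads big-endian by default, which matches the PNG byte order.
  return { width: view.getUint32(16), height: view.getUint32(20), format: 'png' }
}
```

Returning `undefined` for anything that is not a PNG lets a caller try the next format-specific parser.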

Metadata Validation

The system supports metadata validation through schema validation:

typescript

// Define a schema for blog post metadata
const blogPostMetadataSchema = {
  type: 'object',
  required: ['title', 'author', 'createdAt'],
  properties: {
    title: { type: 'string', minLength: 1 },
    author: { type: 'string', minLength: 1 },
    // ISO 8601 date-time string; a Date object would not satisfy type 'string'
    createdAt: { type: 'string', format: 'date-time' },
    tags: {
      type: 'array',
      items: { type: 'string' },
    },
    draft: { type: 'boolean' },
  },
}

// Validate metadata
const isValid = validateMetadata(metadata, blogPostMetadataSchema)
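
The behavior of `validateMetadata` is not specified beyond the call above. To show what such a check involves, the sketch below implements a small, hand-rolled subset of JSON Schema (`required`, `type`, `minLength`, and string arrays). The names `FieldSchema`, `ObjectSchema`, and `checkMetadata` are hypothetical, and `format` is accepted but not enforced. A production system would more likely use an established JSON Schema validator.

```typescript
// Subset of JSON Schema covered by this sketch.
interface FieldSchema {
  type: 'string' | 'boolean' | 'array'
  minLength?: number
  format?: string // accepted but not enforced in this sketch
  items?: { type: 'string' }
}

interface ObjectSchema {
  type: 'object'
  required?: string[]
  properties: Record<string, FieldSchema>
}

// Hypothetical stand-in for validateMetadata that returns every problem found.
function checkMetadata(metadata: Record<string, unknown>, schema: ObjectSchema): string[] {
  const errors: string[] = []

  for (const key of schema.required ?? []) {
    if (metadata[key] === undefined) errors.push(`${key}: required`)
  }

  for (const key of Object.keys(schema.properties)) {
    const field = schema.properties[key]
    const value = metadata[key]
    if (value === undefined) continue // missing optional fields are fine

    if (field.type === 'array') {
      const ok = Array.isArray(value) && value.every((item) => typeof item === 'string')
      if (!ok) errors.push(`${key}: expected array of strings`)
    } else if (typeof value !== field.type) {
      errors.push(`${key}: expected ${field.type}`)
    } else if (typeof value === 'string' && field.minLength !== undefined && value.length < field.minLength) {
      errors.push(`${key}: shorter than ${field.minLength}`)
    }
  }

  return errors
}
```

Returning a list of messages rather than a boolean makes it possible to show authors everything that needs fixing in one pass. An empty list means the metadata is valid.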

Metadata Serialization

Metadata can be serialized for storage or transfer:

typescript

// Serialize metadata to JSON (Date values become ISO 8601 strings)
const serialized = JSON.stringify(metadata)

// Fields that hold Date values and must be revived after parsing
const dateFields = new Set([
  'createdAt', 'updatedAt', 'publishedAt', 'expiresAt', 'validatedAt',
])

// Deserialize from JSON, converting date strings back to Date objects
const deserialized = JSON.parse(serialized, (key, value) =>
  dateFields.has(key) && typeof value === 'string' ? new Date(value) : value
)

Usage Patterns

Basic Metadata

typescript

// Creating content with basic metadata
const content: Content<string> = {
  data: '# Hello World\n\nThis is a sample document.',
  contentType: 'text/markdown',
  metadata: {
    title: 'Hello World',
    description: 'A sample Markdown document',
    createdAt: new Date(),
    updatedAt: new Date(),
  },
}

Rich Metadata

typescript

// Creating content with rich metadata
const blogPost: Content<string> = {
  data: '# Advanced Techniques\n\nThis post explores advanced techniques...',
  contentType: 'text/markdown',
  metadata: {
    title: 'Advanced Techniques',
    description: 'Exploring advanced content techniques',
    author: {
      name: 'Jane Smith',
      email: '[email protected]',
    },
    tags: ['advanced', 'tutorial', 'content'],
    createdAt: new Date('2025-03-01T12:00:00Z'),
    updatedAt: new Date('2025-03-15T09:30:00Z'),
    publishedAt: new Date('2025-03-16T10:00:00Z'),
    status: 'published',
    readingTime: 8, // minutes
    wordCount: 1500,
  },
}

Metadata with References

typescript

// Creating content with references in metadata
const articleWithImages: Content<string> = {
  data: '# Article with Images\n\n![First Image](images/first.jpg)\n\n![Second Image](images/second.jpg)',
  contentType: 'text/markdown',
  metadata: {
    title: 'Article with Images',
    description: 'An article demonstrating image references',
    references: [
      {
        uri: 'images/first.jpg',
        type: 'embed',
        title: 'First Image',
      },
      {
        uri: 'images/second.jpg',
        type: 'embed',
        title: 'Second Image',
      },
    ],
  },
}

Querying by Metadata

typescript

// Hypothetical query based on metadata
const recentPosts = await store.query({
  contentType: 'text/markdown',
  metadata: {
    tags: { $contains: 'tutorial' },
    publishedAt: { $gte: new Date('2025-01-01') },
    status: 'published',
  },
})
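
The query above is hypothetical, and so are its operators. Assuming `$contains` means "the array includes this value" and `$gte` means "greater than or equal to", a store could evaluate such a filter in memory roughly as follows. The names `Condition`, `matchesCondition`, and `matchesMetadata` are introduced for this sketch only.

```typescript
// Supported filter forms: an operator object or a literal for strict equality.
type Condition =
  | { $contains: unknown }
  | { $gte: number | Date }
  | string
  | number
  | boolean

function matchesCondition(value: unknown, condition: Condition): boolean {
  if (typeof condition === 'object') {
    if ('$contains' in condition) {
      const needle = condition.$contains
      return Array.isArray(value) && value.indexOf(needle) !== -1
    }
    // Dates compare by timestamp, numbers by value; mixed types never match.
    const bound = condition.$gte
    if (value instanceof Date && bound instanceof Date) {
      return value.getTime() >= bound.getTime()
    }
    return typeof value === 'number' && typeof bound === 'number' && value >= bound
  }
  return value === condition
}

// Every field in the query must match for the metadata to match.
function matchesMetadata(
  metadata: Record<string, unknown>,
  query: Record<string, Condition>,
): boolean {
  return Object.keys(query).every((key) => matchesCondition(metadata[key], query[key]))
}
```

A list of content items could then be filtered with `items.filter((item) => matchesMetadata(item.metadata, query))`.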

Updating Metadata

typescript

// Reading existing content
const content = await store.read('posts/hello.md')

// Updating only the metadata
await store.write('posts/hello.md', {
  ...content,
  metadata: {
    ...content.metadata,
    updatedAt: new Date(),
    tags: [...(content.metadata.tags || []), 'updated'],
  },
})
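
Note that the update above appends 'updated' on every run, so repeated updates would accumulate duplicate tags. One possible guard is a small helper such as the hypothetical `addTag` below.

```typescript
// Hypothetical helper: return a new tag list, adding `tag` only if it is absent.
function addTag(tags: readonly string[] | undefined, tag: string): string[] {
  const existing = tags ?? []
  // Always return a copy so the caller's array is never mutated.
  return existing.indexOf(tag) === -1 ? [...existing, tag] : [...existing]
}
```

In the update above, the `tags` line would then read `tags: addTag(content.metadata.tags, 'updated')`.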

Metadata and Content Lifecycle

Metadata changes throughout the content lifecycle:

Creation: Basic metadata is added (title, createdAt)
Editing: Metadata is updated (updatedAt, version)
Publishing: Publication metadata is added (publishedAt, status)
Categorization: Organizational metadata is added (tags, categories)
Linking: References to other content are added
Archiving: Archival metadata is added (archivedAt, status)
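
Three of the stages above can be sketched as pure functions that each return a new metadata object. The interface and function names here are hypothetical, and the sketch sets only the fields the list mentions for each stage.

```typescript
interface LifecycleMetadata {
  title?: string
  createdAt?: Date
  publishedAt?: Date
  archivedAt?: Date
  status?: 'draft' | 'published' | 'archived'
  [key: string]: any
}

// Creation: basic metadata is added.
function createMetadata(title: string, now: Date): LifecycleMetadata {
  return { title, createdAt: now }
}

// Publishing: publication metadata is added without mutating the input.
function publishMetadata(metadata: LifecycleMetadata, now: Date): LifecycleMetadata {
  return { ...metadata, publishedAt: now, status: 'published' }
}

// Archiving: archival metadata is added; earlier fields are preserved.
function archiveMetadata(metadata: LifecycleMetadata, now: Date): LifecycleMetadata {
  return { ...metadata, archivedAt: now, status: 'archived' }
}
```

Passing the timestamp in as a parameter, rather than calling `new Date()` inside each function, keeps the transitions deterministic and easy to test.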

Best Practices

Consistent Fields: Use common field names consistently across content
Standard Types: Use standard types (Date for timestamps, arrays for lists)
Minimal Required Fields: Keep required metadata minimal to simplify content creation
Explicit Extension: Create explicit interfaces for domain-specific metadata
Validation: Validate metadata against schemas for consistency

Related Concepts

Content Model: The overall content model
Content Structure: How content is structured
References (TODO): How content references work
