Category: File Formats
Definition
TFRecord is TensorFlow's native binary format for storing sequences of binary records, optimized for efficient streaming of training data.
How It Works
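A TFRecord file is a sequence of length-prefixed binary records, each protected by CRC checksums. The payload of a record is an opaque byte string; by convention it is a serialized Protocol Buffer message, most often tf.train.Example. A record can therefore hold images, text, or any other serialized data together with its labels.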
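The framing of a single record can be sketched in pure Python. This is a minimal sketch, assuming the record layout described in TensorFlow's documentation: a little-endian 64-bit length, a masked CRC-32C of the length, the payload bytes, and a masked CRC-32C of the payload. The helper names (crc32c, masked_crc, frame_record, read_records) are illustrative and are not part of any TensorFlow API.

```python
import struct

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def _build_table():
    """Precompute the 256-entry lookup table for byte-wise CRC-32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Plain CRC-32C (Castagnoli) of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """CRC-32C rotated right by 15 bits plus a constant, kept to 32 bits."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one payload as: length, crc(length), payload, crc(payload)."""
    length = struct.pack("<Q", len(payload))
    return (
        length
        + struct.pack("<I", masked_crc(length))
        + payload
        + struct.pack("<I", masked_crc(payload))
    )


def read_records(buf: bytes):
    """Yield payloads from a buffer of framed records, verifying both CRCs."""
    offset = 0
    while offset < len(buf):
        length_bytes = buf[offset:offset + 8]
        (length,) = struct.unpack("<Q", length_bytes)
        (length_crc,) = struct.unpack("<I", buf[offset + 8:offset + 12])
        if length_crc != masked_crc(length_bytes):
            raise ValueError("corrupt length field")
        payload = buf[offset + 12:offset + 12 + length]
        (data_crc,) = struct.unpack(
            "<I", buf[offset + 12 + length:offset + 16 + length]
        )
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload
        offset += 16 + length


# Two records back to back, as they would appear in an uncompressed file.
stream = frame_record(b"first") + frame_record(b"second record")
payloads = list(read_records(stream))
```

Each record costs 16 bytes of overhead on top of its payload, and a reader must walk the length prefixes from the start of the file to find later records, which is why the format suits sequential streaming rather than random access.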
Files can be compressed with GZIP or ZLIB, and a dataset is usually sharded across many files so that the workers in a distributed training job can read different shards in parallel.
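Sharding and compression can be sketched with TensorFlow's own writer and reader. This is an illustration under stated assumptions, not a reference pipeline: the feature names ("image", "label"), the shard file-name pattern, the round-robin assignment, and the helpers make_example, write_shards, and read_shards are all hypothetical choices made for this example. It assumes TensorFlow 2.x with eager execution.

```python
import os
import tempfile

import tensorflow as tf


def make_example(image_bytes: bytes, label: int) -> tf.train.Example:
    """Pack one sample into a tf.train.Example protocol buffer."""
    return tf.train.Example(features=tf.train.Features(feature={
        "image": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }))


def write_shards(samples, out_dir, num_shards=2):
    """Distribute samples round-robin over GZIP-compressed shard files."""
    options = tf.io.TFRecordOptions(compression_type="GZIP")
    paths = [
        os.path.join(out_dir, f"train-{i:05d}-of-{num_shards:05d}.tfrecord.gz")
        for i in range(num_shards)
    ]
    writers = [tf.io.TFRecordWriter(path, options=options) for path in paths]
    for index, (image_bytes, label) in enumerate(samples):
        record = make_example(image_bytes, label).SerializeToString()
        writers[index % num_shards].write(record)
    for writer in writers:
        writer.close()
    return paths


# Schema used to decode each serialized Example back into tensors.
FEATURE_SPEC = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def read_shards(paths):
    """Stream records from all shards and parse them lazily."""
    dataset = tf.data.TFRecordDataset(paths, compression_type="GZIP")
    return dataset.map(
        lambda record: tf.io.parse_single_example(record, FEATURE_SPEC))


# Six tiny fake samples: 4 bytes of "image" data and a binary label.
samples = [(bytes([i]) * 4, i % 2) for i in range(6)]
out_dir = tempfile.mkdtemp()
shard_paths = write_shards(samples, out_dir)
labels = sorted(
    int(example["label"])
    for example in read_shards(shard_paths).as_numpy_iterator()
)
```

The reader must be given the same compression type the writer used, since the file itself carries no marker that TFRecordDataset inspects. In a real job, each worker would typically be handed a different subset of shard_paths.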
Why It Matters
TFRecord lets TensorFlow stream data from disk during training, which is crucial for datasets too large to fit in memory. Because records are laid out for sequential reads, the format tends to perform well on storage where random access is slow, such as network file systems and object stores.
Google uses TFRecord internally for training models on petabyte-scale datasets.