In the previous chapter we covered the middleware ZooKeeper; this chapter introduces another middleware: Kafka. At present both of these middleware are implemented in Java.
We walked through several key Kafka concepts earlier. Messages sent by producers are ultimately written to disk on the broker nodes, so what does that data actually look like on the local filesystem?
Kafka's log is the core of its storage model. It is built around append-only writes and segmented storage, which together give it high throughput and durable persistence.
All messages are appended sequentially to the end of the log, avoiding random disk seeks and dramatically improving write performance.
# Topic: my-topic, the storage directory for partition 0
my-topic-0/
├── 00000000000000000000.log        # segment log file (stores the actual messages)
├── 00000000000000000000.index      # offset index file (fast lookup of a message's position)
├── 00000000000000000000.timeindex  # timestamp index file (look up messages by time)
├── 00000000000000000005.log
├── 00000000000000000005.index
└── ...
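The 20-digit file names are nothing mysterious: each segment is named after the offset of the first message it contains, zero-padded to 20 digits. A minimal sketch of the naming rule (class and method names are my own):

```java
public class SegmentName {
    // Kafka names each segment after the offset of its first record,
    // zero-padded to 20 digits: 5 -> 00000000000000000005.log
    static String logFileName(long baseOffset) {
        return String.format("%020d", baseOffset) + ".log";
    }

    public static void main(String[] args) {
        System.out.println(logFileName(5L));         // 00000000000000000005.log
        System.out.println(logFileName(11386211L));  // 00000000000011386211.log
    }
}
```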
[root@localhost my-topic123-0]# ll
total 1315664
-rw-r--r-- 1 root root 1634712 May 9 01:11 00000000000000000000.index
-rw-r--r-- 1 root root 1073741798 May 9 01:11 00000000000000000000.log
-rw-r--r-- 1 root root 2452080 May 9 01:11 00000000000000000000.timeindex
-rw-r--r-- 1 root root 10485760 May 9 01:41 00000000000011386211.index
-rw-r--r-- 1 root root 265066873 May 9 01:41 00000000000011386211.log
-rw-r--r-- 1 root root 10 May 9 01:11 00000000000011386211.snapshot
-rw-r--r-- 1 root root 10485756 May 9 01:41 00000000000011386211.timeindex
-rw-r--r-- 1 root root 8 May 8 22:32 leader-epoch-checkpoint
# Dump the actual stored messages
[root@localhost my-topic123-0]# /root/kafka_2.13-2.8.2/bin/kafka-dump-log.sh --files ./00000000000011386211.log --print-data-log |more
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Dumping ./00000000000011386211.log
Starting offset: 11386211
baseOffset: 11386211 lastOffset: 11386225 count: 15 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 0 CreateTime: 1746724313903 size: 1441 magic: 2 compresscodec: NONE crc: 3132510745 isvalid: true
| offset: 11386211 isValid: true crc: null keySize: -1 valueSize: 83 CreateTime: 1746724313899 baseOffset: 11386211 lastOffset: 11386225 baseSequence: -1 lastSequence: -1 producerEpoch: -1 partitionLeaderEpoch: 0 batchSize: 1441 magic: 2 compressType: NONE position: 0 sequence: -1 headerKeys: [] payload: {"timestamp": "2025-05-09 01:11:53", "count": 34152860,
"data": "Message-34152860"}
| offset: 11386212 isValid: true crc: null keySize: -1 valueSize: 83 CreateTime: 1746724313899 baseOffset: 11386211 lastOffset: 11386225 baseSequence: -1 lastSequence: -1 producerEpoch: -1 partitionLeaderEpoch: 0 batchSize: 1441 magic: 2 compressType: NONE position: 0 sequence: -1 headerKeys: [] payload: {"timestamp": "2025-05-09 01:11:53", "count": 34152861,
"data": "Message-34152861"}
| offset: 11386213 isValid: true crc: null keySize: -1 valueSize: 83 CreateTime: 1746724313899 baseOffset: 11386211 lastOffset: 11386225 baseSequence: -1 lastSequence: -1 producerEpoch: -1 partitionLeaderEpoch: 0 batchSize: 1441 magic: 2 compressType: NONE position: 0 sequence: -1 headerKeys: [] payload: {"timestamp": "2025-05-09 01:11:53", "count": 34152862,
"data": "Message-34152862"}
| offset: 11386214 isValid: true crc: null
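Notice the batch header above: baseOffset 11386211 through lastOffset 11386225 with count: 15, meaning the producer shipped 15 messages as a single batch and the broker appended that batch to the log as one unit. A minimal producer sketch that yields batches like this; the broker address, topic name, and tuning values are assumptions:

```java
import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // wait up to 10 ms to accumulate records, so several messages land
        // in one batch, like the 15-record batch in the dump above
        props.put(ProducerConfig.LINGER_MS_CONFIG, "10");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                producer.send(new ProducerRecord<>("my-topic123",
                        "{\"count\": " + i + ", \"data\": \"Message-" + i + "\"}"));
            }
        } // close() flushes any batch still in flight
    }
}
```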
# The offset index entries
[root@localhost my-topic123-0]# /root/kafka_2.13-2.8.2/bin/kafka-dump-log.sh --files ./00000000000011386211.index --print-data-log |more
Dumping ./00000000000011386211.index
offset: 11386508 position: 26767
offset: 11386567 position: 32194
offset: 11386626 position: 37437
offset: 11386676 position: 42159
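The offset index is sparse: the broker only writes an entry after roughly every log.index.interval.bytes of log data (4096 bytes by default), which is why the offsets above jump by about 50 to 60 messages at a time. To locate an arbitrary offset, Kafka binary-searches the index for the largest entry at or below the target and then scans the .log forward from that byte position. A minimal sketch of that lookup logic, using the entries from the dump above (class and method names are my own):

```java
public class OffsetIndexLookup {
    // one sparse index entry: logical offset -> byte position in the .log file
    static class Entry {
        final long offset; final int position;
        Entry(long offset, int position) { this.offset = offset; this.position = position; }
        public String toString() { return "offset=" + offset + " position=" + position; }
    }

    // binary-search for the largest entry whose offset is <= target;
    // the broker then scans the .log forward from entry.position
    static Entry floorEntry(Entry[] index, long target) {
        int lo = 0, hi = index.length - 1;
        Entry best = null;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid].offset <= target) { best = index[mid]; lo = mid + 1; }
            else hi = mid - 1;
        }
        return best;
    }

    public static void main(String[] args) {
        Entry[] index = {                 // values taken from the dump above
            new Entry(11386508, 26767),
            new Entry(11386567, 32194),
            new Entry(11386626, 37437),
            new Entry(11386676, 42159),
        };
        // to serve a fetch at offset 11386600, start reading at byte 32194
        System.out.println(floorEntry(index, 11386600));
    }
}
```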
# Timestamp-to-offset index entries
[root@localhost my-topic123-0]# /root/kafka_2.13-2.8.2/bin/kafka-dump-log.sh --files ./00000000000011386211.timeindex --print-data-log |more
Dumping ./00000000000011386211.timeindex
timestamp: 1746724314114 offset: 11386508
timestamp: 1746724314170 offset: 11386567
timestamp: 1746724314210 offset: 11386626
timestamp: 1746724314236 offset: 11386676
timestamp: 1746724314263 offset: 11386721
timestamp: 1746724314301 offset: 11386767
Of course, we would not normally read this data with dump scripts; I am using Kafka's bundled kafka-dump-log.sh here only to make the on-disk structures easier to understand.
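In practice you read these messages through the consumer API, and the .timeindex we just dumped is exactly what backs the consumer's offsetsForTimes() call: the broker maps your timestamp to the first offset at or after it, and you then seek() there. A minimal sketch, assuming a broker at localhost:9092 and the topic from the listings above:

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.*;

public class SeekByTime {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "dump-demo");               // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic123", 0);
            consumer.assign(Collections.singletonList(tp));

            // ask the broker for the first offset at or after this timestamp;
            // the lookup is served from the .timeindex files shown above
            Map<TopicPartition, OffsetAndTimestamp> found =
                consumer.offsetsForTimes(Collections.singletonMap(tp, 1746724314114L));
            OffsetAndTimestamp oat = found.get(tp);
            if (oat != null) {
                consumer.seek(tp, oat.offset());
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                records.forEach(r ->
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
            }
        }
    }
}
```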
To recap: each segment is named after the base offset of its first message (e.g. 00000000000000000005.log), with matching .index and .timeindex files alongside it. The .index file is what lets Kafka quickly locate a message's physical position inside the log file. The broker keeps only the most recent N days of data (7 days by default), controlled by log.retention.hours; expired segments are deleted when log.cleanup.policy=delete, or compacted by key when log.cleanup.policy=compact.
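log.retention.hours and log.cleanup.policy are broker-wide defaults in server.properties; individual topics override them with retention.ms and cleanup.policy. A minimal sketch of setting those per-topic overrides through the AdminClient; the broker address, topic name, and the three-day retention value are assumptions:

```java
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.ConfigResource;
import java.util.*;

public class RetentionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic123");
            Collection<AlterConfigOp> ops = Arrays.asList(
                // keep data for 3 days instead of the broker default of 7
                new AlterConfigOp(new ConfigEntry("retention.ms", "259200000"),
                                  AlterConfigOp.OpType.SET),
                // delete expired segments (the alternative is "compact")
                new AlterConfigOp(new ConfigEntry("cleanup.policy", "delete"),
                                  AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
        }
    }
}
```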