Kafka Log Storage (5): LogSegment
Published: 2019-03-01


To keep any single log file from growing too large, a partition's Log is split into multiple log files, each managed by a LogSegment. A LogSegment wraps a FileMessageSet (the on-disk message data file) and an OffsetIndex (the sparse offset index for that file). The LogSegment class is implemented as follows:

LogSegment class structure

class LogSegment(val log: FileMessageSet,
                 val index: OffsetIndex,
                 val baseOffset: Long,
                 val indexIntervalBytes: Int,
                 val rollJitterMs: Long,
                 time: Time) extends Logging {
  /* bytes written to the log since the last index entry was added */
  private var bytesSinceLastIndexEntry = 0
  /* creation time of this segment */
  var created: Long = time.milliseconds
}
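The key field is baseOffset: the index does not store absolute offsets but offsets relative to the segment's base, which keeps each entry small. The following is a minimal, self-contained sketch (not Kafka's code; SparseIndex and IndexEntry are hypothetical names) of how a floor lookup over relative offsets works:

```scala
// Sketch: a sparse index keyed by offsets relative to the segment's baseOffset.
object RelativeOffsetDemo {
  case class IndexEntry(relativeOffset: Int, position: Int)

  class SparseIndex(val baseOffset: Long) {
    private var entries = Vector.empty[IndexEntry]

    def append(absoluteOffset: Long, position: Int): Unit =
      entries :+= IndexEntry((absoluteOffset - baseOffset).toInt, position)

    // Floor lookup: the largest indexed offset that is <= the target offset.
    def lookup(absoluteOffset: Long): Option[IndexEntry] =
      entries.takeWhile(e => baseOffset + e.relativeOffset <= absoluteOffset).lastOption
  }

  def main(args: Array[String]): Unit = {
    val idx = new SparseIndex(baseOffset = 1000L)
    idx.append(1000L, 0)
    idx.append(1050L, 4096)
    // Offset 1060 is not indexed; the lookup returns the entry for 1050,
    // and a linear scan of the log file would continue from position 4096.
    println(idx.lookup(1060L).get.position) // 4096
  }
}
```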

The append method

append writes a message set to the end of the segment's data file and, once at least indexIntervalBytes bytes have accumulated since the last index entry, adds a new entry to the sparse index:

def append(offset: Long, messages: ByteBufferMessageSet): Unit = {
  if (messages.sizeInBytes > 0) {
    trace("Inserting %d bytes at offset %d at position %d".format(
      messages.sizeInBytes, offset, log.sizeInBytes()))

    // Write a sparse index entry once enough bytes have accumulated
    if (bytesSinceLastIndexEntry > indexIntervalBytes) {
      index.append(offset, log.sizeInBytes())
      this.bytesSinceLastIndexEntry = 0
    }

    // Append the messages to the data file and update the byte counter
    log.append(messages)
    this.bytesSinceLastIndexEntry += messages.sizeInBytes
  }
}
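The bookkeeping above is what makes the index sparse: most appends only advance a counter, and an entry is written only when the counter exceeds the interval. A minimal sketch of just that counting logic (not Kafka's code; simulate is a hypothetical helper):

```scala
// Sketch: count how many sparse-index entries a sequence of appends produces.
object SparseAppendDemo {
  def simulate(messageSizes: Seq[Int], indexIntervalBytes: Int): Int = {
    var bytesSinceLastIndexEntry = 0
    var indexEntries = 0
    for (size <- messageSizes) {
      // Mirror of append(): write an entry only after the interval is exceeded
      if (bytesSinceLastIndexEntry > indexIntervalBytes) {
        indexEntries += 1
        bytesSinceLastIndexEntry = 0
      }
      bytesSinceLastIndexEntry += size
    }
    indexEntries
  }

  def main(args: Array[String]): Unit = {
    // With a 4096-byte interval, ten 1000-byte batches produce a single
    // index entry (written before the 6th batch, once 5000 bytes accumulated).
    println(simulate(Seq.fill(10)(1000), 4096)) // 1
  }
}
```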

The read method

read returns up to maxSize bytes of messages starting at startOffset, bounded by the optional maxOffset and by maxPosition:

def read(startOffset: Long,
         maxOffset: Option[Long],
         maxSize: Int,
         maxPosition: Long = size): FetchDataInfo = {
  if (maxSize < 0)
    throw new IllegalArgumentException("Invalid max size for log read (%d)".format(maxSize))

  val logSize = log.sizeInBytes
  // Find the file position of the first message with offset >= startOffset
  val startPosition = translateOffset(startOffset)

  // startOffset is beyond the end of this segment
  if (startPosition == null)
    return null

  val offsetMetadata = new LogOffsetMetadata(
    startOffset,
    this.baseOffset,
    startPosition.position
  )

  if (maxSize == 0)
    return FetchDataInfo(offsetMetadata, MessageSet.Empty)

  // Compute how many bytes to read, bounded by maxSize, maxPosition
  // and (if present) the file position that maxOffset maps to
  val length = maxOffset match {
    case None =>
      min((maxPosition - startPosition.position).toInt, maxSize)
    case Some(offset) =>
      if (offset < startOffset)
        return FetchDataInfo(offsetMetadata, MessageSet.Empty)

      val mapping = translateOffset(offset, startPosition.position)
      val endPosition =
        if (mapping == null) logSize // maxOffset is beyond this segment
        else mapping.position

      min(min(maxPosition, endPosition) - startPosition.position, maxSize).toInt
  }

  FetchDataInfo(offsetMetadata, log.read(startPosition.position, length))
}
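The core of read is the length computation: three independent bounds are combined with min. A minimal sketch of just that arithmetic (not Kafka's code; readLength and the positions below are hypothetical):

```scala
// Sketch: the byte-length bound used by LogSegment.read.
object ReadLengthDemo {
  import scala.math.min

  def readLength(startPosition: Int, maxSize: Int, maxPosition: Long,
                 endPosition: Option[Long]): Int =
    endPosition match {
      // No upper offset: bounded only by maxPosition and maxSize
      case None      => min((maxPosition - startPosition).toInt, maxSize)
      // Upper offset maps to a file position: take the tighter bound
      case Some(end) => min(min(maxPosition, end) - startPosition, maxSize).toInt
    }

  def main(args: Array[String]): Unit = {
    // Plenty of data available: maxSize is the binding constraint.
    println(readLength(100, 500, 10000L, None))       // 500
    // maxOffset maps to position 300: only 200 bytes are readable.
    println(readLength(100, 500, 10000L, Some(300L))) // 200
  }
}
```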

The recover method

recover rebuilds the index by scanning the data file, validating each message's checksum, and truncating any trailing corrupt data; it returns the number of bytes truncated:

def recover(maxMessageSize: Int): Int = {
  // Discard the existing index; it is rebuilt during the scan
  index.truncate()
  index.resize(index.maxIndexSize)

  var validBytes = 0
  var lastIndexEntry = 0
  val iter = log.iterator(maxMessageSize)
  try {
    while (iter.hasNext) {
      val entry = iter.next
      // Verify the checksum; throws CorruptRecordException on failure
      entry.message.ensureValid()

      // Rebuild a sparse index entry every indexIntervalBytes of valid data
      if (validBytes - lastIndexEntry > indexIntervalBytes) {
        val startOffset = entry.message.compressionCodec match {
          case NoCompressionCodec =>
            entry.offset
          case _ =>
            // For compressed wrappers, index the first inner message's offset
            ByteBufferMessageSet.deepIterator(entry).next().offset
        }
        index.append(startOffset, validBytes)
        lastIndexEntry = validBytes
      }
      validBytes += MessageSet.entrySize(entry.message)
    }
  } catch {
    case e: CorruptRecordException =>
      logger.warn("Found invalid messages in log segment %s at byte offset %d: %s.".format(
        log.file.getAbsolutePath, validBytes, e.getMessage))
  }

  // Truncate everything after the last valid byte and shrink the index
  val truncated = log.sizeInBytes - validBytes
  log.truncateTo(validBytes)
  index.trimToValidSize()
  truncated
}
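The essential behavior is that the scan stops at the first corrupt entry, so everything after it is discarded even if later entries happen to be intact. A minimal sketch of that truncation logic (not Kafka's code; Entry and recover below are hypothetical stand-ins):

```scala
// Sketch: scan entries, stop at the first invalid one, report bytes to truncate.
object RecoverDemo {
  case class Entry(sizeBytes: Int, valid: Boolean)

  // Returns (validBytes, truncatedBytes) for a segment of totalBytes.
  def recover(entries: Seq[Entry], totalBytes: Int): (Int, Int) = {
    var validBytes = 0
    var corrupt = false
    val it = entries.iterator
    while (it.hasNext && !corrupt) {
      val e = it.next()
      if (e.valid) validBytes += e.sizeBytes
      else corrupt = true // everything from here on is discarded
    }
    (validBytes, totalBytes - validBytes)
  }

  def main(args: Array[String]): Unit = {
    val entries = Seq(Entry(100, true), Entry(200, true), Entry(50, false), Entry(80, true))
    // The final valid 80-byte entry is still truncated because it sits
    // after the corrupt one.
    println(recover(entries, 430)) // (300,130)
  }
}
```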

Reprinted from: http://gjxx.baihongyu.com/
