
Daily Observation: Is PostgreSQL's clog a log or data, and must it follow write-WAL-before-data?

Source: Tencent Cloud   Time: 2023-03-07 08:07:13


Summary

In principle, MVCC requires that, given a transaction ID, the status of that transaction can be looked up.

In PG, a transaction's status can be obtained through several paths:

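- Look it up in the snapshot (active transactions)
- Look up the status bits in the tuple header (inactive transactions)
- Look it up in CLOG (inactive transactions)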
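The three lookup paths above can be modeled in a simplified C sketch. All types and names here are hypothetical and much simpler than PostgreSQL's real structures: `Snapshot`, `TupleHeader`, `HINT_COMMITTED`, `inserter_status`, and a flat `clog` array. The sketch only shows the order of the checks and how a CLOG answer is cached back into the tuple header's status bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not PostgreSQL's real structures. */
typedef uint32_t Xid;
typedef enum { XS_IN_PROGRESS, XS_COMMITTED, XS_ABORTED } XactStatus;

typedef struct {
    Xid        xmin;   /* xids below this had finished when the snapshot was taken */
    Xid        xmax;   /* xids at or above this had not started yet */
    const Xid *xip;    /* xids in [xmin, xmax) still running at snapshot time */
    size_t     xcnt;
} Snapshot;

#define HINT_COMMITTED 0x1   /* hypothetical tuple-header status bits */
#define HINT_ABORTED   0x2

typedef struct {
    Xid     xmin;      /* inserting transaction */
    uint8_t hints;     /* cached commit/abort status */
} TupleHeader;

/* Path 1: is the transaction active according to the snapshot? */
static bool snapshot_says_active(const Snapshot *s, Xid xid)
{
    if (xid >= s->xmax)
        return true;
    if (xid < s->xmin)
        return false;
    for (size_t i = 0; i < s->xcnt; i++)
        if (s->xip[i] == xid)
            return true;
    return false;
}

XactStatus inserter_status(const Snapshot *s, TupleHeader *tup,
                           const XactStatus *clog, size_t clog_len)
{
    Xid xid = tup->xmin;

    if (snapshot_says_active(s, xid))          /* path 1: snapshot */
        return XS_IN_PROGRESS;
    if (tup->hints & HINT_COMMITTED)           /* path 2: tuple status bits */
        return XS_COMMITTED;
    if (tup->hints & HINT_ABORTED)
        return XS_ABORTED;

    assert(xid < clog_len);                    /* path 3: CLOG */
    XactStatus st = clog[xid];
    if (st == XS_COMMITTED)
        tup->hints |= HINT_COMMITTED;          /* cache the answer in the tuple */
    else if (st == XS_ABORTED)
        tup->hints |= HINT_ABORTED;
    return st;
}
```

After the first CLOG lookup for a tuple, later checks of the same tuple stop at the status bits and never reach the CLOG path.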

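Setting the implementation aside and looking only at the concept, the commit status of an inactive transaction could also be looked up in the XLOG. CLOG can be viewed as a cache, or mapping, of the XLOG commit/rollback records: a fast way to look up whether a transaction committed.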
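To make the "CLOG as a cache of XLOG commit/rollback records" view concrete, here is a hypothetical sketch that rebuilds a status map purely by replaying commit and abort records. It assumes PostgreSQL's encoding of 2 status bits per transaction: 4 transactions per byte, with 0 = in progress, 1 = committed, 2 = aborted. The record type and function names are made up for illustration, and subtransactions are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding: 2 status bits per transaction, 4 transactions per byte. */
#define BITS_PER_XACT  2
#define XACTS_PER_BYTE 4
#define ST_IN_PROGRESS 0x0
#define ST_COMMITTED   0x1
#define ST_ABORTED     0x2

/* Hypothetical WAL record: only the parts that matter for commit status. */
typedef enum { REC_COMMIT, REC_ABORT } RecKind;
typedef struct {
    RecKind  kind;
    uint32_t xid;
} WalRecord;

static void clog_set(uint8_t *clog, uint32_t xid, uint8_t status)
{
    size_t   byte  = xid / XACTS_PER_BYTE;
    unsigned shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT;

    /* clear this transaction's 2 bits, then store the new status */
    clog[byte] = (uint8_t) ((clog[byte] & ~(0x3u << shift)) | ((unsigned) status << shift));
}

uint8_t clog_get(const uint8_t *clog, uint32_t xid)
{
    unsigned shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT;

    return (uint8_t) ((clog[xid / XACTS_PER_BYTE] >> shift) & 0x3u);
}

/* Replays records in log order; any xid never mentioned stays "in progress". */
void clog_rebuild_from_wal(uint8_t *clog, size_t clog_bytes,
                           const WalRecord *recs, size_t n)
{
    memset(clog, 0, clog_bytes);
    for (size_t i = 0; i < n; i++)
    {
        assert(recs[i].xid / XACTS_PER_BYTE < clog_bytes);
        clog_set(clog, recs[i].xid,
                 recs[i].kind == REC_COMMIT ? ST_COMMITTED : ST_ABORTED);
    }
}
```

Everything in the map is derived from the log. That is why the map is the thing being protected ("data") and the log is the source of truth (WAL).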

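Therefore, under write-WAL-before-data, CLOG is treated as data: like a data page, a CLOG page may be written only after the WAL describing its changes has been flushed. Only the XLOG counts as WAL.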

How PostgreSQL writes clog to disk: SlruPhysicalWritePage

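In PostgreSQL, clog is read and written through the SLRU mechanism. Before an SLRU page is written to disk, a check ensures that the xlog is flushed first: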

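group_lsn stores, for each group of 32 transactions, the largest log sequence number (LSN) among their commit records. It matters mainly for asynchronous commit. In that case a transaction's status can be set in clog before its commit record has been flushed to disk.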

    static bool
    SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruWriteAll fdata)
    {
        ...
        if (shared->group_lsn != NULL)
        {
            /*
             * We must determine the largest async-commit LSN for the page. This
             * is a bit tedious, but since this entire function is a slow path
             * anyway, it seems better to do this here than to maintain a per-page
             * LSN variable (which'd need an extra comparison in the
             * transaction-commit path).
             */
            XLogRecPtr  max_lsn;
            int         lsnindex,
                        lsnoff;

            lsnindex = slotno * shared->lsn_groups_per_page;
            max_lsn = shared->group_lsn[lsnindex++];
            for (lsnoff = 1; lsnoff < shared->lsn_groups_per_page; lsnoff++)
            {
                XLogRecPtr  this_lsn = shared->group_lsn[lsnindex++];

                if (max_lsn < this_lsn)
                    max_lsn = this_lsn;     /* <<< find the largest LSN */
            }

            if (!XLogRecPtrIsInvalid(max_lsn))
            {
                /*
                 * As noted above, elog(ERROR) is not acceptable here, so if
                 * XLogFlush were to fail, we must PANIC.  This isn't much of a
                 * restriction because XLogFlush is just about all critical
                 * section anyway, but let's make sure.
                 */
                START_CRIT_SECTION();
                XLogFlush(max_lsn);         /* <<< first make sure XLOG is flushed up to this LSN! */
                END_CRIT_SECTION();
            }
        }
        ...
        if (pg_pwrite(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
        {
            ...
        }
    }
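The group_lsn bookkeeping and the flush-then-write order above can be reproduced in a small runnable model. It assumes the default 8 KB page and 2 status bits per transaction. One clog page then covers 8192 × 4 = 32768 transactions and, at 32 transactions per group, holds 32768 / 32 = 1024 group_lsn entries. The slot-index formula is loosely modeled on clog.c. `SimStorage`, `lsn_index`, `record_async_commit`, `wal_flush` and `slru_write_page` are hypothetical names, not PostgreSQL functions. The caller is assumed to pass the buffer slot that currently holds the xid's page, and no real I/O happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;
#define INVALID_LSN ((Lsn) 0)

/* Assumed geometry: default 8 KB page, 2 status bits per transaction. */
#define PAGE_BYTES          8192u
#define XACTS_PER_BYTE      4u
#define XACTS_PER_PAGE      (PAGE_BYTES * XACTS_PER_BYTE)            /* 32768 */
#define XACTS_PER_LSN_GROUP 32u
#define LSN_GROUPS_PER_PAGE (XACTS_PER_PAGE / XACTS_PER_LSN_GROUP)   /* 1024 */

typedef struct {
    Lsn wal_insert_lsn;   /* how far WAL has been generated */
    Lsn wal_flushed_lsn;  /* how far WAL is durable on disk */
    int pages_written;    /* stands in for the pg_pwrite() of the page */
} SimStorage;

/* Index of the group_lsn entry covering xid, for a page held in buffer slot slotno. */
static size_t lsn_index(unsigned slotno, uint32_t xid)
{
    return (size_t) slotno * LSN_GROUPS_PER_PAGE
        + (xid % XACTS_PER_PAGE) / XACTS_PER_LSN_GROUP;
}

/* Async commit: remember the largest commit-record LSN seen for the group. */
void record_async_commit(Lsn *group_lsn, unsigned slotno, uint32_t xid, Lsn commit_lsn)
{
    size_t idx = lsn_index(slotno, xid);

    if (group_lsn[idx] < commit_lsn)
        group_lsn[idx] = commit_lsn;
}

static void wal_flush(SimStorage *st, Lsn upto)
{
    assert(upto <= st->wal_insert_lsn);   /* cannot flush WAL that was never generated */
    if (st->wal_flushed_lsn < upto)
        st->wal_flushed_lsn = upto;
}

/* Write-WAL-before-data for one clog page; returns the LSN flushed to, or INVALID_LSN. */
Lsn slru_write_page(SimStorage *st, const Lsn *group_lsn, unsigned slotno)
{
    size_t idx = (size_t) slotno * LSN_GROUPS_PER_PAGE;
    Lsn    max_lsn = group_lsn[idx++];

    for (unsigned off = 1; off < LSN_GROUPS_PER_PAGE; off++)
    {
        Lsn this_lsn = group_lsn[idx++];

        if (max_lsn < this_lsn)
            max_lsn = this_lsn;           /* largest LSN on the page */
    }
    if (max_lsn != INVALID_LSN)
        wal_flush(st, max_lsn);           /* WAL first ... */
    st->pages_written++;                  /* ... then the page */
    return max_lsn;
}
```

The design computes one maximum per page at write time instead of tracking a per-page LSN on every commit. This is the trade-off described in the source comment: the commit path stays cheap, and the extra scan happens only on the already slow write path.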

How PostgreSQL writes user data to disk: FlushBuffer

Data pages work the same way: first get the page's LSN, flush the xlog up to that LSN, and only then write the data.

    static void
    FlushBuffer(BufferDesc *buf, SMgrRelation reln)
    {
        ...
        buf_state = LockBufHdr(buf);

        /*
         * Run PageGetLSN while holding header lock, since we don't have the
         * buffer locked exclusively in all cases.
         */
        recptr = BufferGetLSN(buf);     /* <<< get the page's LSN */

        /* To check if block content changes while flushing. - vadim 01/17/97 */
        buf_state &= ~BM_JUST_DIRTIED;
        UnlockBufHdr(buf, buf_state);

        /*
         * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
         * rule that log updates must hit disk before any of the data-file changes
         * they describe do.
         *
         * However, this rule does not apply to unlogged relations, which will be
         * lost after a crash anyway.  Most unlogged relation pages do not bear
         * LSNs since we never emit WAL records for them, and therefore flushing
         * up through the buffer LSN would be useless, but harmless.  However,
         * GiST indexes use LSNs internally to track page-splits, and therefore
         * unlogged GiST pages bear "fake" LSNs generated by
         * GetFakeLSNForUnloggedRel.  It is unlikely but possible that the fake
         * LSN counter could advance past the WAL insertion point; and if it did
         * happen, attempting to flush WAL through that location would fail, with
         * disastrous system-wide consequences.  To make sure that can't happen,
         * skip the flush if the buffer isn't permanent.
         */
        if (buf_state & BM_PERMANENT)
            XLogFlush(recptr);          /* <<< first make sure XLOG is flushed up to this LSN! */
        ...
        smgrwrite(reln,
                  BufTagGetForkNum(&buf->tag),
                  buf->tag.blockNum,
                  bufToWrite,
                  false);
        ...
    }
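The same rule for data pages can be shown in a hypothetical model that includes the `BM_PERMANENT` exception. `SimBuffer`, `SimWal`, `wal_flush` and `flush_buffer` are illustrative names, not PostgreSQL APIs. In the usage below, the second buffer plays the role of an unlogged GiST page whose fake LSN has run past the WAL insertion point. Because that buffer is not permanent, the flush is skipped instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;

typedef struct {
    Lsn  page_lsn;    /* LSN of the last WAL record that touched the page (or a fake LSN) */
    bool permanent;   /* analogous to BM_PERMANENT: false for unlogged relations */
    bool written;     /* stands in for smgrwrite() */
} SimBuffer;

typedef struct {
    Lsn insert_lsn;   /* WAL insertion point */
    Lsn flushed_lsn;  /* how far WAL is durable */
} SimWal;

/* Hypothetical flush: reports failure if asked to go past the insertion point. */
static bool wal_flush(SimWal *wal, Lsn upto)
{
    if (upto > wal->insert_lsn)
        return false;
    if (wal->flushed_lsn < upto)
        wal->flushed_lsn = upto;
    return true;
}

void flush_buffer(SimWal *wal, SimBuffer *buf)
{
    if (buf->permanent)
    {
        bool ok = wal_flush(wal, buf->page_lsn);   /* WAL first ... */

        assert(ok);
        (void) ok;
    }
    buf->written = true;                           /* ... then the data page */
}
```

For a permanent page this gives the same ordering as the clog path. For a non-permanent page the WAL state is left untouched.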
