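In principle, MVCC needs to be able to look up the status of a transaction given its transaction ID.
In PG, a transaction's status can be obtained through several paths:
- in the snapshot (active transactions)
- in the status bits of the tuple header (inactive transactions)
- in the CLOG (inactive transactions)
Setting the implementation aside and looking only at the concept, the commit status of an inactive transaction could also be looked up in the XLOG. The CLOG can be seen as a cache or mapping of the XLOG commit/rollback records: a fast way to look up a transaction's commit status.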
This is why, under the write-WAL-before-data rule, the CLOG is treated as data; only the XLOG counts as WAL.
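To make "the CLOG is a fast lookup structure that is handled like data" more concrete, here is a minimal sketch of how a transaction ID can be mapped to a position inside a CLOG page. It assumes the layout used by a default PostgreSQL build (8 kB pages, 2 status bits per transaction, hence 4 transactions per byte). All `toy_` and `TOY_` names are hypothetical and exist only for this illustration; they are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 8 kB page, 2 status bits per transaction. */
#define TOY_BLCKSZ          8192
#define TOY_BITS_PER_XACT   2
#define TOY_XACTS_PER_BYTE  4
#define TOY_XACTS_PER_PAGE  (TOY_BLCKSZ * TOY_XACTS_PER_BYTE)
#define TOY_XACT_BITMASK    ((1 << TOY_BITS_PER_XACT) - 1)

/* Hypothetical status codes for the sketch. */
#define TOY_XACT_IN_PROGRESS 0x00
#define TOY_XACT_COMMITTED   0x01
#define TOY_XACT_ABORTED     0x02

typedef uint32_t toy_xid_t;

/* Which CLOG page holds this transaction's status? */
uint32_t toy_xid_to_page(toy_xid_t xid)
{
    return xid / TOY_XACTS_PER_PAGE;
}

/* Which byte within that page? */
uint32_t toy_xid_to_byte(toy_xid_t xid)
{
    return (xid % TOY_XACTS_PER_PAGE) / TOY_XACTS_PER_BYTE;
}

/* How far to shift within that byte to reach the 2-bit status? */
uint32_t toy_xid_to_shift(toy_xid_t xid)
{
    return (xid % TOY_XACTS_PER_BYTE) * TOY_BITS_PER_XACT;
}

/* Store a status in an in-memory copy of one CLOG page. */
void toy_set_status(uint8_t *page, toy_xid_t xid, int status)
{
    uint32_t byteno = toy_xid_to_byte(xid);
    uint32_t shift = toy_xid_to_shift(xid);

    assert(status >= 0 && status <= TOY_XACT_BITMASK);
    page[byteno] = (uint8_t) ((page[byteno] & ~(TOY_XACT_BITMASK << shift))
                              | (status << shift));
}

/* Read the status back: one array access plus a shift and a mask. */
int toy_get_status(const uint8_t *page, toy_xid_t xid)
{
    return (page[toy_xid_to_byte(xid)] >> toy_xid_to_shift(xid))
           & TOY_XACT_BITMASK;
}
```

Under these assumptions one page covers 32768 transactions, so a status lookup is a constant-time array access rather than a scan of the log.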
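PostgreSQL reads and writes the CLOG through the SLRU mechanism. Before an SLRU page is written to disk, a check ensures that the XLOG is written first: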
group_lsn holds, for each group of 32 transactions, the largest log sequence number (LSN) in that group. It is mainly used when transaction commits are flushed to disk asynchronously.

static bool
SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruWriteAll fdata)
{
    ...
    if (shared->group_lsn != NULL)
    {
        /*
         * We must determine the largest async-commit LSN for the page. This
         * is a bit tedious, but since this entire function is a slow path
         * anyway, it seems better to do this here than to maintain a per-page
         * LSN variable (which'd need an extra comparison in the
         * transaction-commit path).
         */
        XLogRecPtr  max_lsn;
        int         lsnindex,
                    lsnoff;

        lsnindex = slotno * shared->lsn_groups_per_page;
        max_lsn = shared->group_lsn[lsnindex++];
        for (lsnoff = 1; lsnoff < shared->lsn_groups_per_page; lsnoff++)
        {
            XLogRecPtr  this_lsn = shared->group_lsn[lsnindex++];

            if (max_lsn < this_lsn)
                max_lsn = this_lsn;     /* <-- find the largest LSN */
        }

        if (!XLogRecPtrIsInvalid(max_lsn))
        {
            /*
             * As noted above, elog(ERROR) is not acceptable here, so if
             * XLogFlush were to fail, we must PANIC. This isn't much of a
             * restriction because XLogFlush is just about all critical
             * section anyway, but let's make sure.
             */
            START_CRIT_SECTION();
            XLogFlush(max_lsn);         /* <-- first make sure the XLOG is flushed up to this point! */
            END_CRIT_SECTION();
        }
    }
    ...
    if (pg_pwrite(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
    {
        ...
    }
}
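The ordering in the excerpt above can be reproduced in a small, self-contained model. The sketch below is not PostgreSQL code: `ToyWal`, `toy_wal_flush`, `toy_page_max_lsn` and `toy_write_slru_page` are hypothetical names, the page write itself is only a comment, and the number of LSN groups per page is shrunk to 4 for readability. It shows only the two steps: scan the page's group LSNs for the maximum, then flush the WAL up to that point before the page would be written.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_lsn_t;

#define TOY_INVALID_LSN      0   /* assumed marker for "no async commit recorded" */
#define TOY_GROUPS_PER_PAGE  4   /* tiny value, for illustration only */

/* Toy WAL state: how far the log is durable, and how often we flushed. */
typedef struct
{
    toy_lsn_t flushed_upto;
    int       flush_calls;
} ToyWal;

/* Make the log durable up to 'upto' (never moves backwards). */
void toy_wal_flush(ToyWal *wal, toy_lsn_t upto)
{
    wal->flush_calls++;
    if (upto > wal->flushed_upto)
        wal->flushed_upto = upto;
}

/* Largest LSN among the groups that belong to the page in 'slotno'. */
toy_lsn_t toy_page_max_lsn(const toy_lsn_t *group_lsn, int slotno)
{
    int       idx = slotno * TOY_GROUPS_PER_PAGE;
    toy_lsn_t max_lsn = group_lsn[idx++];

    for (int off = 1; off < TOY_GROUPS_PER_PAGE; off++)
    {
        toy_lsn_t this_lsn = group_lsn[idx++];

        if (max_lsn < this_lsn)
            max_lsn = this_lsn;
    }
    return max_lsn;
}

/* WAL first, page second. Returns the LSN that had to be durable. */
toy_lsn_t toy_write_slru_page(ToyWal *wal, const toy_lsn_t *group_lsn, int slotno)
{
    toy_lsn_t max_lsn = toy_page_max_lsn(group_lsn, slotno);

    if (max_lsn != TOY_INVALID_LSN)
        toy_wal_flush(wal, max_lsn);

    /* Invariant of the rule: the log covers everything on this page. */
    assert(wal->flushed_upto >= max_lsn);

    /* ... the page itself would be written to disk here ... */
    return max_lsn;
}
```

A page whose groups carry no LSN triggers no flush at all, which matches the point that group_lsn matters mainly for asynchronous commits: only those can leave a commit recorded in the CLOG buffer before its commit record is durable in the XLOG.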
The same applies to data pages: find the page's LSN, flush the XLOG up to that point, and only then write the data.
static void
FlushBuffer(BufferDesc *buf, SMgrRelation reln)
{
    ...
    buf_state = LockBufHdr(buf);

    /*
     * Run PageGetLSN while holding header lock, since we don't have the
     * buffer locked exclusively in all cases.
     */
    recptr = BufferGetLSN(buf);         /* <-- find the page's LSN */

    /* To check if block content changes while flushing. - vadim 01/17/97 */
    buf_state &= ~BM_JUST_DIRTIED;
    UnlockBufHdr(buf, buf_state);

    /*
     * Force XLOG flush up to buffer's LSN. This implements the basic WAL
     * rule that log updates must hit disk before any of the data-file changes
     * they describe do.
     *
     * However, this rule does not apply to unlogged relations, which will be
     * lost after a crash anyway. Most unlogged relation pages do not bear
     * LSNs since we never emit WAL records for them, and therefore flushing
     * up through the buffer LSN would be useless, but harmless. However,
     * GiST indexes use LSNs internally to track page-splits, and therefore
     * unlogged GiST pages bear "fake" LSNs generated by
     * GetFakeLSNForUnloggedRel. It is unlikely but possible that the fake
     * LSN counter could advance past the WAL insertion point; and if it did
     * happen, attempting to flush WAL through that location would fail, with
     * disastrous system-wide consequences. To make sure that can't happen,
     * skip the flush if the buffer isn't permanent.
     */
    if (buf_state & BM_PERMANENT)
        XLogFlush(recptr);              /* <-- first make sure the XLOG is flushed up to this point! */
    ...
    smgrwrite(reln,
              BufTagGetForkNum(&buf->tag),
              buf->tag.blockNum,
              bufToWrite,
              false);
    ...
}
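The reason for the BM_PERMANENT check in the comment above can be shown with a toy model. This is a sketch under stated assumptions, not PostgreSQL code: `SketchWal`, `SketchBuffer`, `sketch_wal_flush` and `sketch_flush_buffer` are hypothetical names, and a failed flush is modelled as a `false` return value rather than as the error the real server would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_lsn_t;

/* Toy WAL: records exist up to insert_pos, are durable up to flushed_upto. */
typedef struct
{
    sketch_lsn_t insert_pos;
    sketch_lsn_t flushed_upto;
} SketchWal;

/* Toy buffer: its page LSN, whether it is WAL-logged, whether it was written. */
typedef struct
{
    sketch_lsn_t page_lsn;
    bool         permanent;
    bool         on_disk;
} SketchBuffer;

/* Flushing past the insert position is impossible: that WAL does not exist. */
bool sketch_wal_flush(SketchWal *wal, sketch_lsn_t upto)
{
    if (upto > wal->insert_pos)
        return false;
    if (upto > wal->flushed_upto)
        wal->flushed_upto = upto;
    return true;
}

/* WAL-before-data for permanent buffers; no WAL flush for unlogged ones. */
bool sketch_flush_buffer(SketchWal *wal, SketchBuffer *buf)
{
    if (buf->permanent && !sketch_wal_flush(wal, buf->page_lsn))
        return false;           /* the page must not reach disk */

    /* For logged pages the log now covers every change on the page. */
    assert(!buf->permanent || wal->flushed_upto >= buf->page_lsn);

    buf->on_disk = true;        /* stands in for the actual page write */
    return true;
}
```

In this model an unlogged page carrying a "fake" LSN beyond the WAL insert position is written without touching the WAL, whereas a permanent page with the same LSN would fail the flush and would not be written.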