Learning Notes on Designing Data-Intensive Applications (vii)
Chapter 7. Transactions
This is a series of learning notes on Designing Data-Intensive Applications.
A transaction is a way for an application to group several reads and writes into one single unit. All the reads and writes in a transaction are executed as one operation: either the entire operation succeeds (commit) or it fails (abort, rollback). Transactions were created to simplify the programming model for applications accessing a database.
If a transaction fails, it can safely be retried, and there is no partial failure.
1. ACID
ACID (Atomicity, Consistency, Isolation, Durability) is often used to describe the safety guarantees provided by transactions.
- Atomicity. Something that cannot be broken down into smaller parts: when a client wants to make several writes but a fault occurs partway through, the whole transaction aborts and the client can safely retry. Abortability would have been a better term than atomicity.
- Consistency. Invariants on your data must always be true. The idea of consistency depends on the application’s notion of invariants and it’s the responsibility of the application to ensure that. Atomicity, isolation, and durability are properties of the database, whereas consistency is a property of the application.
- Isolation. Concurrently executing transactions are isolated from each other. This is also called serializability: each transaction can pretend that it is the only transaction running on the entire database, and the result is the same as if they had run serially. (True serializability carries a performance penalty, so a weaker level, e.g. snapshot isolation, is often used.)
- Durability. Once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes. In a single-node database this means the data has been written to nonvolatile storage, e.g. a hard drive. In a replicated database it means the data has been copied to some number of nodes.
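The atomicity guarantee can be sketched with Python's sqlite3 module (the accounts schema and the simulated fault are made up for illustration): if a fault occurs between the two writes of a transfer, the rollback undoes both writes, so a retry is always safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('Aa', 500), ('Bb', 500)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; on any error, roll back both writes."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # Simulate a fault between the two writes of this multi-object transaction:
        if dst == "nowhere":
            raise RuntimeError("crash between the two writes")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()   # atomicity: neither write survives

transfer(conn, "Aa", "nowhere", 100)   # fails partway -> rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'Aa': 500, 'Bb': 500} -- no partial failure
```
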
ACID in single vs. multi-object operations
Storage engines aim to provide atomicity and isolation at the level of a single object. Atomicity can be implemented using a log for crash recovery, and isolation using a lock on each object.
Multi-object transactions might be needed when a foreign key references a row in another table, or when denormalized information and secondary indexes need to be updated. However, many distributed datastores have abandoned multi-object transactions because they are difficult to implement across partitions.
2. Weak isolation levels
Concurrency issues come into play when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to simultaneously modify the same data.
Databases have long tried to hide concurrency issues from applications by providing transaction isolation, e.g. serializable isolation, but it carries a performance cost. It is common for systems to use weaker levels of isolation.
Weak isolation levels used in practice:
Read Committed
- Dirty reads
- Dirty writes
- Implementing read committed
Snapshot Isolation and Repeatable Read
- Implementing snapshot isolation
- Indexes and snapshot isolation
- Repeatable read and naming confusion
Preventing Lost Updates
- Automatically detecting lost updates
- Conflict resolution and replication
Write Skew and Phantoms
- Materializing Conflicts
A. Read committed
It makes two guarantees:
- No dirty reads: When reading from the database, you will only see data that has been committed. Writes by a transaction only become visible to others when that transaction commits.
- No dirty writes: When writing to the database, you will only overwrite data that has been committed. Dirty writes are usually prevented by delaying the second write until the first write’s transaction has committed or aborted.
Most databases prevent dirty writes by using row-level locks that hold the lock until the transaction is committed or aborted. Only one transaction can hold the lock for any given object.
For dirty reads, requiring read locks does not work well in practice, as one long-running write transaction can force many read-only transactions to wait. Instead, for every object that is written, the database remembers both the old committed value and the new value set by the transaction that currently holds the write lock. While that transaction is ongoing, any other transactions that read the object are simply given the old value.
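Read committed's "readers get the old committed value" behavior can be observed with two sqlite3 connections (the file path and accounts schema are made up for illustration; SQLite in WAL mode gives readers the last committed state):

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")   # readers never block the writer

writer.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES ('Aa', 500)")
writer.commit()

# Writer updates the row inside an open transaction, but does not commit yet.
writer.execute("UPDATE accounts SET balance = 600 WHERE name = 'Aa'")

# The reader still sees the old committed value: no dirty read.
before_commit = reader.execute("SELECT balance FROM accounts").fetchall()[0][0]

writer.commit()
after_commit = reader.execute("SELECT balance FROM accounts").fetchall()[0][0]
print(before_commit, after_commit)  # 500 600
```
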
B. Snapshot isolation and repeatable read
With the read committed isolation level there is still room for concurrency bugs, e.g. non-repeatable read or read skew: a value that has been read by a still in-flight transaction might be overwritten by another transaction (a read-only transaction in the presence of concurrent writes).
This happens because read committed isolation only applies a lock on values that are about to be modified, not on reads. Thus, a long-running read-only transaction can observe the value of one or more objects changing between when the transaction starts and when it ends.
There are situations that cannot tolerate such inconsistencies:
- Backups: During the time that the backup process is running, which normally takes a long time, writes will continue to be made to the database.
- Analytic queries and integrity checks: You may get nonsensical results if they observe parts of the database at different points in time.
Definition: Snapshot isolation is the most common solution: each transaction reads from a consistent snapshot of the database, i.e. a transaction sees only the data that was committed in the database at the start of the transaction. Readers never block writers, and writers never block readers.
To implement snapshot isolation, databases potentially keep multiple different committed versions of the same object, a technique known as multi-version concurrency control (MVCC). This contrasts with read committed isolation, where only two copies of an object are kept: the committed version and the overwritten-but-uncommitted version.
MVCC-based snapshot isolation is implemented by giving each transaction a unique, always-increasing transaction ID. Any writes to the database by a transaction are tagged with the transaction ID of the writer. Each row in the table is tagged with created_by and deleted_by fields holding the transaction ID that performed the creation or deletion (when applicable).
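The visibility rule can be sketched in plain Python (a simplified model: the `committed` set and the `created_by < txid` check stand in for the full snapshot bookkeeping, which also has to exclude transactions still in flight when the reader started):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    created_by: int                    # txid that inserted this version
    deleted_by: Optional[int] = None   # txid that deleted/overwrote it, if any

def visible(version, txid, committed):
    """A version is visible to `txid` if it was created by a transaction that
    committed before `txid` started (or by `txid` itself) and not deleted by one."""
    created_ok = version.created_by == txid or (
        version.created_by < txid and version.created_by in committed)
    deleted = version.deleted_by is not None and (
        version.deleted_by == txid or
        (version.deleted_by < txid and version.deleted_by in committed))
    return created_ok and not deleted

committed = {1, 2}   # txids that committed before transaction 3 started
versions = [
    RowVersion("balance=500", created_by=1, deleted_by=2),  # overwritten by tx 2
    RowVersion("balance=600", created_by=2),                # current committed version
    RowVersion("balance=700", created_by=4),                # in-flight writer: invisible
]
print([v.value for v in versions if visible(v, txid=3, committed=committed)])
# ['balance=600']
```
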
C. Preventing lost updates
This might happen with concurrent writes when an application reads some value from the database, modifies it, and writes it back (read-modify-write). If two transactions do this concurrently, one of the modifications can be lost (the later write clobbers the earlier write).
Atomic write operations:
One solution is to avoid read-modify-write cycles in application code and use the database’s atomic operations instead:
UPDATE counters SET value = value + 1 WHERE key = 'foo'
Atomic operations are implemented by taking an exclusive lock on the object when it is read, preventing any other transaction from reading it until the update has been applied. This forces the read-modify-write cycles to happen sequentially.
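The difference can be demonstrated with sqlite3, reusing the counters example above (the interleaving is written out sequentially to stand in for two concurrent clients):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('foo', 0)")

def current():
    return conn.execute("SELECT value FROM counters WHERE key = 'foo'").fetchone()[0]

# Two interleaved read-modify-write cycles: both read 0, both write back 1.
v1 = current()
v2 = current()
conn.execute("UPDATE counters SET value = ? WHERE key = 'foo'", (v1 + 1,))
conn.execute("UPDATE counters SET value = ? WHERE key = 'foo'", (v2 + 1,))
after_lost_update = current()   # 1, not 2: one increment was lost

# The atomic form pushes the whole cycle into the database engine:
conn.execute("UPDATE counters SET value = value + 1 WHERE key = 'foo'")
conn.execute("UPDATE counters SET value = value + 1 WHERE key = 'foo'")
after_atomic = current()        # 3: both increments applied
print(after_lost_update, after_atomic)
```
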
Automatically detecting lost updates:
Another solution is to allow transactions to execute in parallel and, if the transaction manager detects a lost update, abort the offending transaction and force it to retry its read-modify-write cycle.
Compare-and-set:
When an operation wants to update a value, it reads the previous value and only completes the write if the value at the time of the update is the same as the value it read earlier. If the current value does not match the previous read, the update has no effect.
UPDATE wiki_pages SET content = 'new content'
WHERE id = 1234 AND content = 'old content';
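The application can detect whether the compare-and-set succeeded by checking how many rows the UPDATE affected (a Python/sqlite3 sketch reusing the wiki_pages example; note this is only safe if the database does not evaluate the WHERE clause against an old snapshot):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki_pages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO wiki_pages VALUES (1234, 'old content')")

def compare_and_set(page_id, expected, new):
    """Apply the update only if nobody changed the page since we read it."""
    cur = conn.execute(
        "UPDATE wiki_pages SET content = ? WHERE id = ? AND content = ?",
        (new, page_id, expected))
    return cur.rowcount == 1   # 0 rows matched: someone else won, caller retries

first = compare_and_set(1234, 'old content', 'new content')   # succeeds
second = compare_and_set(1234, 'old content', 'other edit')   # fails: stale read
print(first, second)  # True False
```
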
Conflict resolution and replication:
With multi-leader or leaderless replication, compare-and-set does not apply, as there can be multiple up-to-date copies of the data on different nodes at the same time.
A common approach in replicated databases is to allow concurrent writes to create several conflicting versions of a value (also known as siblings), and to use application code to resolve and merge these versions after the fact.
D. Write skew and phantoms
Write skew happens when two transactions read the same objects and then update some of those objects (different transactions may update different objects), together breaking an invariant that held when each transaction checked it. In the special case where the transactions update the same object, it degenerates into a dirty write or lost update anomaly.
Examples: username uniqueness constraints; preventing double-spending. The general pattern:
- A SELECT query checks whether some requirement is satisfied;
- Depending on the result of the first query, the application code decides whether to go ahead;
- If so, a write command is issued.
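The check-then-act race can be simulated in plain Python (sequential code stands in for two concurrent transactions; the "at least one account stays above 0" constraint is borrowed from the scenario at the end of these notes):

```python
balances = {"Aa": 500, "Bb": 500}

def some_account_positive_after(account, amount):
    """Invariant check: after this withdrawal, at least one account stays above 0."""
    after = dict(balances)
    after[account] -= amount
    return any(v > 0 for v in after.values())

# Both checks run against the same initial state, as under snapshot isolation:
ok_a = some_account_positive_after("Aa", 500)   # True: Bb would still hold 500
ok_b = some_account_positive_after("Bb", 500)   # True: Aa would still hold 500

# Each transaction then updates a *different* row, so no write-write conflict
# is ever detected:
if ok_a:
    balances["Aa"] -= 500
if ok_b:
    balances["Bb"] -= 500

print(balances)                                  # {'Aa': 0, 'Bb': 0}
print(any(v > 0 for v in balances.values()))     # False: invariant broken
```
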
The effect, where a write in one transaction changes the result of a search query in another transaction, is called a phantom.
Ways to prevent write skew are more restricted:
- Atomic single-object operations don’t help, since multiple objects are involved.
- Automatically preventing write skew requires true serializable isolation.
- The second-best option is probably to explicitly lock the rows that the transaction depends on. Unfortunately, if the original query returns no rows (say it’s checking for the absence of rows matching a condition), there is nothing to attach locks to.
The approach of taking a phantom and turning it into a lock conflict on a concrete set of rows introduced into the database is known as materializing conflicts.
3. Serializability
This is the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially.
There are three techniques:
- Literally executing transactions in a serial order;
- Two-phase locking;
- Serializable snapshot isolation
A. Actual serial execution
This has only become feasible recently because:
- RAM has become cheap enough to fit an entire dataset in memory.
- OLTP transactions are usually short and make only a small number of reads and writes (vs. OLAP queries).
Serial execution requires encapsulating transactions in stored procedures, i.e. submitting the entire transaction code to the database ahead of execution. With interactive transactions, too much time would be spent in network communication between application and database.
Executing all transactions serially limits the transaction throughput to the speed of a single CPU.
In order to scale to multiple CPU cores, you can potentially partition your data so that each partition has its own transaction processing thread. But any transaction that touches multiple partitions requires coordination across all of them and is vastly slower than single-partition transactions.
As a result, serial execution fits when:
- Each transaction is small and fast
- Active dataset can fit in memory
- Write throughput must be low enough to be handled by a single CPU core.
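The single-threaded, stored-procedure style can be sketched in Python (a toy model in the spirit of Redis or VoltDB; the `SerialExecutor` class and transfer procedure are made up for illustration):

```python
from queue import Queue
from threading import Thread

class SerialExecutor:
    """One worker thread applies whole transactions one at a time, so no
    locks are needed inside a transaction."""
    def __init__(self, data):
        self.data = data
        self.queue = Queue()
        Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            proc, reply = self.queue.get()
            reply.put(proc(self.data))   # the whole "stored procedure" runs serially

    def submit(self, proc):
        reply = Queue()
        self.queue.put((proc, reply))
        return reply.get()               # block until the transaction has run

db = SerialExecutor({"Aa": 500, "Bb": 500})

def transfer(data):                      # entire transaction shipped to the db
    data["Aa"] -= 100
    data["Bb"] += 100
    return data["Aa"], data["Bb"]

print(db.submit(transfer))  # (400, 600)
```
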
B. Two-phase locking (2PL)
Note: two-phase locking (2PL), a concurrency control protocol, should not be confused with the two-phase commit protocol (2PC), a distributed algorithm that coordinates all the processes participating in a distributed atomic transaction on whether to commit or abort (a specialized type of consensus protocol). Under 2PL:
- A transaction cannot write a value that has been read by another transaction.
- A transaction cannot read a value that has been written by another transaction.
Unlike snapshot isolation, here readers can block writers and writers can block readers. Blocking is implemented by having a lock on each object in the database, in either shared or exclusive mode.
The first phase is when the locks are acquired; the second phase is when all the locks are released. Several transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon as anyone wants to write an object, exclusive access is required.
- To read an object, a transaction must first acquire the lock in shared mode.
- To write to an object, a transaction must first acquire the lock in exclusive mode.
- If a transaction first reads and then writes an object, it may upgrade its shared lock to an exclusive lock.
- After a transaction has acquired a lock, it must hold the lock until the end of the transaction (commit or abort).
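These rules can be sketched as a per-object lock state machine (a simplified model: `acquire` only reports whether the request would be granted or must wait; real lock managers also handle blocking, queuing, and deadlock detection):

```python
class TwoPhaseLock:
    """Per-object lock state for 2PL (a sketch, not a full lock manager)."""
    def __init__(self):
        self.shared = set()      # transactions holding the lock in shared mode
        self.exclusive = None    # transaction holding it in exclusive mode

    def acquire_shared(self, tx):
        if self.exclusive not in (None, tx):
            return False         # a writer holds it: the reader must wait
        self.shared.add(tx)
        return True

    def acquire_exclusive(self, tx):
        if (self.shared - {tx}) or self.exclusive not in (None, tx):
            return False         # other readers or another writer: must wait
        self.shared.discard(tx)  # lock upgrade: shared -> exclusive
        self.exclusive = tx
        return True

    def release_all(self, tx):   # phase 2: only at commit or abort
        self.shared.discard(tx)
        if self.exclusive == tx:
            self.exclusive = None

lock = TwoPhaseLock()
assert lock.acquire_shared("T1") and lock.acquire_shared("T2")  # readers coexist
assert not lock.acquire_exclusive("T1")   # T2 still reads: upgrade must wait
lock.release_all("T2")
assert lock.acquire_exclusive("T1")       # now T1 upgrades to exclusive
assert not lock.acquire_shared("T2")      # writer blocks readers
```
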
It can happen that transaction A is stuck waiting for transaction B to release its lock, and vice versa (deadlock).
The big downside is performance: databases running 2PL can have unstable latencies, and they can be very slow at high percentiles. One slow transaction, or one transaction that accesses a lot of data and acquires many locks, can bring the rest of the system to a halt.
With phantoms, one transaction may change the results of another transaction’s search query. Predicate locks can prevent phantoms by locking all the objects that match a condition, even objects that do not yet exist in the database (but could be added in the future). When a transaction runs a SELECT query, a predicate lock is taken on all objects that could match that query’s condition.
Predicate locks do not perform well due to the overhead of checking for matching locks, so most databases use a looser approximation called index-range locks.
For example, if the original query used an index, the database can lock writes to that index entry (or a range of index entries), even for writes that don’t actually match the condition.
C. Serializable snapshot isolation (SSI)
It provides full serializability with only a small performance penalty compared to snapshot isolation. SSI is fairly new.
This comes down to the difference between pessimistic versus optimistic concurrency control.
- Two-phase locking is pessimistic concurrency control because if anything might possibly go wrong, it’s better to wait.
- Serial execution is also pessimistic, as it is equivalent to each transaction having an exclusive lock on the entire database.
- Serializable snapshot isolation is optimistic concurrency control. Instead of blocking if something potentially dangerous happens, transactions continue anyway, in the hope that everything will turn out all right. The database is responsible for checking whether anything bad happened. If so, the transaction is aborted and has to be retried.
SSI is based on snapshot isolation, reads within a transaction are made from a consistent snapshot of the database. On top of snapshot isolation, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transactions to abort.
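The optimistic commit-time check can be sketched in plain Python (a crude stand-in for SSI: real implementations track reads at the index level and detect rw-dependency cycles between transactions, rather than comparing per-key version numbers; the store layout and `go_off_call` procedure are made up for illustration):

```python
store = {"on_call": (2, 0)}   # key -> (value, version)

def run_optimistic(tx, retries=5):
    """Run `tx` against a snapshot; at commit time, abort and retry if
    anything it read was modified in the meantime."""
    for _ in range(retries):
        read_versions = {}
        def read(key):
            value, version = store[key]
            read_versions[key] = version
            return value
        writes = tx(read)
        # "Commit": validate that everything we read is still current.
        if all(store[k][1] == v for k, v in read_versions.items()):
            for k, value in writes.items():
                store[k] = (value, store[k][1] + 1)
            return True
        # Conflict detected: abort and retry the whole transaction.
    return False

def go_off_call(read):
    doctors = read("on_call")
    assert doctors >= 2, "constraint: at least one doctor stays on call"
    return {"on_call": doctors - 1}

ok = run_optimistic(go_off_call)
print(ok, store["on_call"])  # True (1, 1)
```
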
Example scenario: two bank accounts, Aa and Bb, each with a balance of 500; A and B are two clients.
- Dirty reads (a client reads another client’s uncommitted data): B deposits 100 into account Aa. A does not see Aa at 600 until B’s transaction has committed, and only sees 500 while the transaction is in progress.
- Dirty writes (a client overwrites another client’s uncommitted data): B deposits 100 into account Aa. A cannot write to Aa until B’s transaction has completed.
- Read skew (a client sees different parts of the database as of different points in time): B transfers 100 from account Aa to Bb (a deduction from Aa plus an addition to Bb). A reads Aa before B’s transaction commits (500) and reads Bb after it commits (600), so the total appears to be 1100; or A reads Bb before the commit (500) and Aa after it (400), so the total appears to be 900. → MVCC
- Lost update (concurrent read-modify-write cycles, where A’s write overwrites B’s write): a program reads the current balance of Aa, adds 100, and writes it back. If A and B both run the program concurrently, A reads 500, B reads 500, B writes 600 back, and A also writes 600 back (but it should be 700). → atomic write operations / explicit locking / lost-update detection
- Write skew (a generalisation of lost update): two writes happen concurrently and each succeeds, but together they break consistency. E.g. with a constraint that at least one account must stay above 0: if A and B withdraw 500 from Aa and Bb separately, the constraint holds at the time each checks the balances, but after both withdrawals it is broken. → serializable isolation
- Phantom reads (a client reads objects matching some condition while another client makes writes that affect the results of that read).