编程辅导 www.cardiff.ac.uk/medic/irg-clinicalepidemiology – cscodehelp代写
www.cardiff.ac.uk/medic/irg-clinicalepidemiology
Concurrency control
Copyright By cscodehelp代写 加微信 cscodehelp
Information modelling
& database systems
Concurrency
large databases are used by many users
many users many transactions
if these transactions are run sequentially, then long transactions will make others wait for long periods
therefore, it is desirable to let transactions run concurrently
… but we need to preserve isolation
let C be a column
transaction A: decrease the value of C by 5, i.e. C := C – 5
transaction B: increase the value of C by 5, i.e. C := C + 5
What would be the effect on C after completing transactions A and B?
(C – 5) + 5 = C no change in the value of C
Problem: lost update
the final value of C has increased by 5 (i.e. C=12), but should not have changed (i.e. C=7–5+5=7)
Transaction A Time Transaction B e.g. C = 7
read(C) t1 7
C = C – 5 t2 2
t3 read(C) 7
t4 C = C + 5 12
write(C) t5 2
t6 write(C) 12
COMMIT t7 2
t8 COMMIT 12
occurs when two different transactions are trying to update the same cell at the same time
Problem: uncommitted update (“dirty read”)
the final value of C has not changed (i.e. C=7), but should have increased by 5 (i.e. C=12)
it should be as if transaction A never happened
Transaction A Time Transaction B e.g. C = 7
read(C) t1 7
C = C – 5 t2 2
write(C) t3 2
t4 read(C) 2
t5 C = C + 5 7
t6 write(C) 7
ROLLBACK t7 7
t8 COMMIT 7
occurs when a transaction reads data that has been modified by another transaction, but not yet committed
Problem: inconsistent analysis
SUM should be C + D = 2 + 9 = 11
Transaction A Time Transaction B e.g. C = 7, D = 4
read(C) t1 7
C = C – 5 t2 2
write(C) t3 2
t4 read(C) 2
t5 read(D) 4
t6 SUM = C + D 6
read(D) t7 4
D = D + 5 t8 9
write(D) 9
print(SUM) 6
occurs when a transaction reads several values trying to aggregate them, but another transaction updates some of them
Concurrency control
a transaction may be correct in itself, but it can still produce incorrect result if its execution is interfered by other transactions
concurrency control is about managing simultaneous execution of multiple transactions without them interfering with one another
to solve the problem, transactions use locks on shared data items before operating on them
Serialisability
a schedule is a time–ordered execution of operations from a set of concurrent transactions
a serial schedule is a schedule in which …
operations of individual transactions are executed consecutively
… and do not interleave with operations from other transactions
… and each transaction commits before another one is allowed to begin
Serial schedule
serial schedules are guaranteed to avoid interference and keep the database consistent
however, databases need concurrent access, which means interleaving operations from different transactions
Transaction A Time Transaction B e.g. C = 7
read(C) t1 7
C = C – 5 t2 2
write(C) t3 2
COMMIT t4 2
t5 read(C) 2
t6 C = C + 5 7
t7 write(C) 7
t8 COMMIT 7
Serialisability
two schedules are equivalent if they always have the same effect
a schedule is serialisable if it is equivalent to some serial schedule
if two transactions only read some data items, then the order is which they do it is not important
if transaction A reads and updates a data item C and transaction B reads and updates a different data item D, then again they can be scheduled in any order
Serial vs. serialisable
if two transactions only read some data items,
then the order is which they do it is not important
interleaved schedule
serial schedule
Transaction Operation
Transaction Operation
this schedule is serialisable
Conflict serialisability
Do two transactions have a conflict?
NO if they refer to different resources
NO if they only read
YES if at least one is a write and
they use the same resource
a schedule is conflict serialisable if transactions in the schedule have a conflict, but the schedule is still serialisable
Conflict serialisable schedule
interleaved schedule
serial schedule
Transaction Operation
A write(X)
B write(X)
A write(Y)
B write(Y)
this schedule is serialisable even though A and B read and write the same resources X and Y, i.e. they have a conflict
Transaction Operation
A write(X)
A write(Y)
B write(X)
B write(Y)
Conflict serialisable schedules
the main focus of concurrency control
they allow for interleaving and at the same time they are guaranteed to behave like serial schedules
important questions:
How to determine whether a schedule is conflict serialisable?
How to construct conflict serialisable schedules?
Precedence graphs
to determine whether a schedule is serialisable or not, we build a precedence graph
nodes are transactions
edges are precedence: there is an edge from A to B if A must happen before B in any equivalent serial schedule
the schedule is serialisable if there are no cycles in its precedence graph
Precedence
let A and B be two transactions
let a be an action of A and b is an action of B
A takes precedence over B if:
a is ahead of b in the schedule
both a and b involve the same resource R
at least one of a and b is a write action
in other words, A B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)
remember, A B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)
Transaction Action
A write(Y)
D write(X)
this schedule is serialisable
remember, A B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)
Transaction Action
A write(Y)
D write(X)
A write(Z)
this schedule is not serialisable
locking is a procedure used to control concurrent access to data by ensuring serialisability of concurrent transactions
in order to use a resource (e.g. table, row, etc.) a transaction must first acquire a lock on that resource
this may deny access to other transactions to prevent incorrect results
two types of locks:
read lock (shared lock or S–lock)
write lock (exclusive lock or X–lock)
read lock allows several transactions simultaneously to read a resource, but no transactions can change it at the same time
write lock allows one transaction exclusive access to write to a resource and no other transaction can read this resource at the same time
the lock manager in the DBMS assigns locks and records them in the data dictionary
Concurrency control by locking
let T be a transaction and R be a resource
if T holds a write lock on R, then no other transactions may lock R
if T holds a read lock on R, then no other transactions may write lock A
T must acquire a read lock on R before reading R
T must acquire a write lock on R before writing R
after using a lock on R, T must release the lock in order to free up R
if the requested lock is not available, transaction waits
Two–phase locking
a transaction follows the two-phase locking protocol (2PL) if all locking operations precede the first unlock operation in the transaction
two phases:
growing phase where locks are acquired on resources
shrinking phase where locks are released
A follows 2PL protocol
all of its locks are acquired before any of them is released
B does not follow 2PL
it releases its lock on X and then goes on acquire a lock on Y
Transaction A Transaction B
read-lock(X) read-lock(X)
read(X) read(X)
write-lock(Y) unlock(X)
unlock(X) write-lock(Y)
read(Y) read(Y)
Y = Y + X Y = Y + X
write(Y) write(Y)
unlock(Y) unlock(Y)
Serialisability theorem
Any schedule of two–phased transactions is conflict serialisable.
Lost update cannot happen with 2PL
Comment Transaction A Transaction B Comment
read–lock(C) read(C)
read(C) read–lock(C)
cannot acquire write–lock(C) because B has read–lock(C) write(C)
write(C) cannot acquire write–lock(C) because A has read–lock(C)
Uncommitted update cannot happen with 2PL
Comment Transaction A Transaction B Comment
read–lock(C) read(C)
write–lock(C) write(C)
read(C) waits for A
to release
write–lock(C)
locks released ROLLBACK
Inconsistent analysis cannot happen with 2PL
Comment Transaction A Transaction B Comment
read–lock(C) read(C)
write–lock(C) write(C)
read(C) waits for A
to release
write–lock(C) and later write–lock(D)
sum = C + D
read–lock(D) read(D)
write–lock(D) write(D)
deadlock detection
deadlock prevention
timestamping
the use of locks solves one problem (serialising schedules)
… but introduces another (deadlocked schedules)
deadlock is a situation in which two or more transactions are in a simultaneous wait state, each waiting for others to release a lock
transaction A has a lock on a resource C and is waiting for a lock on a resource D
transaction B has a lock on a resource D and is waiting for a lock on a resource C
https://www.youtube.com/watch?v=lSpNhAX7DNY
Wait–for graphs
given a schedule, potential deadlocks can be detected using a wait–for graph (WFG)
nodes are transactions
there is an edge from transaction B to transaction A if B waits for A, i.e.
A holds a lock on a resource R
B is waiting for a lock on the resource R
B cannot get the lock on R unless A releases it
if the graph does not contain cycles, then the schedule will not be deadlocked
transaction B waits for A in any of the following scenarios:
A read–locks R, then B tries to write–lock it
A write–locks R, then B tries to read–lock it
A write–locks R, then B tries to write–lock it
the transactions will be deadlocked
Transaction Action Lock Waits for
A write(P) write–lock(P)
C read(S) read–lock(S)
D read(R) read–lock(R)
C read(P) A
B write(R) D
C read(S) read–lock(S)
B read(Q) read–lock(Q)
D write(P) A
A read(Q) read–lock(Q)
A write(Q) B
B read(S) read–lock(S)
Deadlock prevention
deadlocks can arise with two–phase locking
deadlock is less of a problem than an inconsistent database
we can detect and recover from deadlock
… but it would be nice to avoid it altogether
e.g. conservative two–phase locking
all locks must be acquired before the transaction starts
low ‘lock utilisation’ – transactions can hold on to locks for a long time, but not effectively use them much
Deadlock prevention
deadlocks may be prevented if transactions are to lock resources in some arbitrary but fixed order
we impose an ordering on the resources
transactions must acquire locks in this order
transactions can be ordered on the last resource they locked
this prevents deadlock
if B is waiting for a resource from A then that resource must come after all of A’s current locks
all edges in the wait–for graph point ‘forwards’, so no cycles
Resource ordering
let the resource order be X < Y, i.e.
if a transaction need locks on X and Y, it will first acquire a lock on X and only afterwards a lock on Y
it is impossible to end up in a situation where:
B is waiting for a lock on X held by A, and
A is waiting for a lock on Y held by B
therefore, no deadlocks
Timestamping
Timestamping
transactions can run concurrently using a variety of techniques
we previously looked at using locks to prevent interference
an alternative technique is timestamping
requires less overhead in terms of tracking locks or detecting deadlocks
determines the order of transactions before they are executed
Timestamping
each transaction has a timestamp, TS
if transaction A starts before transaction B,
then TS(A) < TS(B)
timestamps can be generated using the system clock or an incrementing counter
each resource X has two timestamps:
R(X) the largest timestamp of any transaction
that has read X
W(X) the largest timestamp of any transaction
that has written X
Timestamp protocol
let T be a transaction and X be a resource
If T tries to read X, then
if TS(T) < W(X), then T is rolled back and restarted with a later timestamp
otherwise the read succeeds and R(X) is set to be max(R(X), TS(T))
if T tries to write X, then
if TS(T) < W(X) or TS(T) < R(X), then T is rolled back and restarted with a later timestamp
otherwise the write succeeds and W(X) is set to TS(T)
let A and B be two transactions
we assume that:
the transactions make alternate actions
timestamps are allocated from a counter starting with 1
A goes first
transactions
transactions
no W(X) stamp, so A succeeds in read(X)
R(X) is set to TS(A), which is 1
transactions
no W(X) stamp, so B succeeds in read(X)
R(X) is set to max(R(X), TS(B)) = max(1, 2) = 2
transactions
no W(Y) stamp, so A succeeds in read(Y)
R(Y) is set to TS(A), which is 1
transactions
no W(Y) stamp, so B succeeds in read(Y)
R(Y) is set to max(R(Y), TS(B)) = max(1, 2) = 2
1 Y = Y + X
transactions
no reading or writing, so no change in timestamps
1 Y = Y + X
2 Z = Y – X
transactions
no reading or writing, so no change in timestamps
2 Z = Y – X
1 write(Y)
transactions
TS(A) = 1 < 2 = read(Y)
transactions
A is rolled back and restarted with a later timestamp
2 write(Z)
transactions
B succeeds to write(Z)
W(Z) is set to TS(B), which is 2
transactions
A succeeds in read(X)
R(X) is set to TS(A), which is 3
transactions
A succeeds in read(Y)
R(Y) is set to TS(A), which is 3
3 Y = Y + X
transactions
no reading or writing, so no change in timestamps
3 write(Y)
transactions
A succeeds to write(Y)
W(Y) is set to TS(A), which is 3
Timestamping
transactions with higher timestamps take precedence
equivalent to running transactions in order of their final time values
no waiting, no deadlock
disadvantages
long transactions might keep getting restarted by new transactions
rolls back old transactions, which may have done a lot of work
/docProps/thumbnail.jpeg
程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: kyit630461@163.com