Source: https://3months.tistory.com/307 [Deep Play]


Lock-based Concurrent Data Structures

코딩하는 랄뚜기 2021. 11. 13. 17:40

Lock-based Concurrent Data Structures

  • Adding locks to a data structure makes the structure thread safe.
    • How locks are added determines both the correctness and performance of the data structure.

What is worth remembering here is that correctness and performance are in a trade-off relationship.


Example : Concurrent Counters with Locks

 

typedef struct __counter_t{
    int value;
    pthread_mutex_t lock;
} counter_t;

void init(counter_t *c){
    c->value = 0;
    Pthread_mutex_init(&c->lock,NULL);
}

void increment(counter_t *c){
    Pthread_mutex_lock(&c->lock);
    c->value++;
    Pthread_mutex_unlock(&c->lock);
}

void decrement(counter_t *c){
    Pthread_mutex_lock(&c->lock);
    c->value--;
    Pthread_mutex_unlock(&c->lock);
}

int get(counter_t *c){
    Pthread_mutex_lock(&c->lock);
    int rc = c->value;
    Pthread_mutex_unlock(&c->lock);
    return rc;
}
  • Add a single lock.
    • The lock is acquired when calling a routine that manipulates the data structure.

To make the counter concurrent, the lock must be acquired every time the data is accessed.
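As a quick sanity check, the counter above can be exercised from several threads at once; with the lock in place the final value is deterministic. (This test harness, including `worker` and `run_demo`, is my own addition, not from the text.)

```c
#include <pthread.h>

typedef struct __counter_t {
    int value;
    pthread_mutex_t lock;
} counter_t;

void init(counter_t *c) {
    c->value = 0;
    pthread_mutex_init(&c->lock, NULL);
}

void increment(counter_t *c) {
    pthread_mutex_lock(&c->lock);
    c->value++;
    pthread_mutex_unlock(&c->lock);
}

int get(counter_t *c) {
    pthread_mutex_lock(&c->lock);
    int rc = c->value;
    pthread_mutex_unlock(&c->lock);
    return rc;
}

// Each thread increments the shared counter 100,000 times.
void *worker(void *arg) {
    counter_t *c = (counter_t *)arg;
    for (int i = 0; i < 100000; i++)
        increment(c);
    return NULL;
}

// Run four workers and return the final count; because every
// update holds the lock, the result is always 4 * 100,000.
int run_demo(void) {
    counter_t c;
    init(&c);
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, &c);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return get(&c);
}
```

Without the lock, the same harness would usually report a value below 400,000 because of lost updates.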


The performance costs of the simple approach

  • Each thread updates a single shared counter.
    • Each thread updates the counter one million times.
    • iMac with four Intel 2.7GHz i5 CPUs.

When counting to one million concurrently, if the shared count must be updated under the lock on every single increment, each thread performs a million locked updates, and performance drops dramatically.


Sloppy counter

  • The sloppy counter works by representing...
    • A single logical counter via numerous local physical counters, one per CPU core.
    • A single global counter
    • There are locks: one for each local counter and one for the global counter.
  • Example : on a machine with four CPUs
    • Four local counters
    • One global counter

The idea of the sloppy counter is that a thread does not update the global count on every increment, but only once its local count reaches a chosen threshold. Each thread's local counter needs its own local lock, and a single global lock is needed as well.
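A minimal sketch of this scheme follows (function names and the fixed `NUMCPUS` constant are my own; the structure follows the sloppy counter described above):

```c
#include <pthread.h>

#define NUMCPUS 4

// Sloppy (approximate) counter: per-CPU local counts are flushed
// to the global count only when they reach the threshold S.
typedef struct __scounter_t {
    int             global;          // global count
    pthread_mutex_t glock;           // global lock
    int             local[NUMCPUS];  // per-CPU local counts
    pthread_mutex_t llock[NUMCPUS];  // one lock per local counter
    int             threshold;       // update frequency S
} scounter_t;

void sc_init(scounter_t *c, int threshold) {
    c->threshold = threshold;
    c->global = 0;
    pthread_mutex_init(&c->glock, NULL);
    for (int i = 0; i < NUMCPUS; i++) {
        c->local[i] = 0;
        pthread_mutex_init(&c->llock[i], NULL);
    }
}

// Usually only the local lock is taken; the global lock is taken
// only when the local count reaches the threshold and is flushed.
void sc_update(scounter_t *c, int cpu, int amt) {
    pthread_mutex_lock(&c->llock[cpu]);
    c->local[cpu] += amt;
    if (c->local[cpu] >= c->threshold) {
        pthread_mutex_lock(&c->glock);
        c->global += c->local[cpu];
        pthread_mutex_unlock(&c->glock);
        c->local[cpu] = 0;
    }
    pthread_mutex_unlock(&c->llock[cpu]);
}

// Returns the global count, which may lag the true total by up to
// NUMCPUS * (threshold - 1).
int sc_get(scounter_t *c) {
    pthread_mutex_lock(&c->glock);
    int val = c->global;
    pthread_mutex_unlock(&c->glock);
    return val;
}
```

For example, with S = 5, seven updates on one CPU leave the global count at 5 and two increments still sitting in the local counter.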

 

Tracing the Sloppy Counters (S = 5)


Importance of the threshold value S

  • Each of four threads increments a counter 1 million times on four CPUs
    • Low S -> Performance is poor; the global count is always quite accurate.
    • High S -> Performance is excellent; the global count lags.

Remember: the more accurate you keep the global count, the worse the performance, and the more inaccuracy you allow, the better the performance.


Concurrent Linked List

//basic node structure
typedef struct __node_t{
    int key;
    struct __node_t *next;
} node_t;

//basic list structure (one used per list)
typedef struct __list_t{
    node_t *head;
    pthread_mutex_t lock;
} list_t;

void List_Init(list_t *L){
    L->head = NULL;
    pthread_mutex_init(&L->lock,NULL);
}

int List_Insert(list_t *L, int key){
    pthread_mutex_lock(&L->lock);
    node_t *new = malloc(sizeof(node_t));
    if(new == NULL){
        perror("malloc");
        pthread_mutex_unlock(&L->lock);
        return -1;
    }
    new->key=key;
    new->next=L->head;
    L->head=new;
    pthread_mutex_unlock(&L->lock);
    return 0;//success
}

int List_Lookup(list_t *L,int key){
    pthread_mutex_lock(&L->lock);
    node_t *curr = L->head;
    while(curr){
        if(curr->key==key){
            pthread_mutex_unlock(&L->lock);
            return 0; //success
        }
        curr = curr->next;
    }
    pthread_mutex_unlock(&L->lock);
    return -1; //failure
}

 


  • The code acquires a lock in the insert routine upon entry.
  • The code releases the lock upon exit.
    • If malloc() happens to fail, the code must also release the lock before failing the insert.
    • This kind of exceptional control flow has been shown to be quite error prone.
    • Solution: the lock acquire and release only surround the actual critical section in the insert code.

When using a lock, it should surround only the actual critical section. The code above therefore needs to be rewritten as follows.

void List_Insert(list_t *L, int key){
    // synchronization not needed
    node_t *new = malloc(sizeof(node_t));
    if(new == NULL){
        perror("malloc");
        return;
    }
    new->key=key;
    
    // just lock critical section
    pthread_mutex_lock(&L->lock);
    new->next = L->head;
    L->head = new;
    pthread_mutex_unlock(&L->lock);
}

int List_Lookup(list_t *L, int key){
    int rv = -1;
    pthread_mutex_lock(&L->lock);
    node_t *curr = L->head;
    while(curr){
        if(curr->key==key){
            rv=0;
            break;
        }
        curr = curr->next;
    }
    pthread_mutex_unlock(&L->lock);
    return rv; // now both success and failure
}

Scaling Linked List

  • Hand-over-hand locking (lock coupling)
    • Add a lock per node of the list instead of having a single lock for the entire list.
    • When traversing the list, first grabs the next node's lock. And then releases the current node's lock.
  • Enables a high degree of concurrency in list operations.
    • However, in practice, the overheads of acquiring and releasing locks for each node of a list traversal are prohibitive.

Instead of implementing concurrency with a single lock for the whole list, putting a lock on every node allows much finer-grained concurrency. (When moving to the next node, grab that node's lock first, and then release the current node's lock.) Implemented this way, however, the cost of all the lock acquires and releases becomes far too high.
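A sketch of hand-over-hand lookup is shown below (type and function names such as `hnode_t` and `HList_Lookup` are my own; the traversal follows the lock-coupling rule described above):

```c
#include <pthread.h>
#include <stdlib.h>

// Hand-over-hand (lock coupling) list: one lock per node.
typedef struct __hnode_t {
    int key;
    struct __hnode_t *next;
    pthread_mutex_t lock;
} hnode_t;

typedef struct __hlist_t {
    hnode_t *head;
    pthread_mutex_t lock;  // protects only the head pointer
} hlist_t;

void HList_Init(hlist_t *L) {
    L->head = NULL;
    pthread_mutex_init(&L->lock, NULL);
}

int HList_Insert(hlist_t *L, int key) {
    hnode_t *new = malloc(sizeof(hnode_t));
    if (new == NULL)
        return -1;
    new->key = key;
    pthread_mutex_init(&new->lock, NULL);
    pthread_mutex_lock(&L->lock);
    new->next = L->head;   // insert at head under the list lock
    L->head = new;
    pthread_mutex_unlock(&L->lock);
    return 0;
}

int HList_Lookup(hlist_t *L, int key) {
    pthread_mutex_lock(&L->lock);
    hnode_t *curr = L->head;
    if (curr == NULL) {
        pthread_mutex_unlock(&L->lock);
        return -1;
    }
    pthread_mutex_lock(&curr->lock);  // grab the first node's lock...
    pthread_mutex_unlock(&L->lock);   // ...then release the list lock
    while (curr) {
        if (curr->key == key) {
            pthread_mutex_unlock(&curr->lock);
            return 0;  // success
        }
        hnode_t *next = curr->next;
        if (next)
            pthread_mutex_lock(&next->lock);  // grab next node's lock first
        pthread_mutex_unlock(&curr->lock);    // then release the current one
        curr = next;
    }
    return -1;  // failure
}
```

Each traversal step performs one lock acquire and one release, which is exactly the per-node overhead the bullet above calls prohibitive.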


Michael and Scott Concurrent Queues

  • There are two locks.
    • One for the head of the queue.
    • One for the tail.
    • The goal of these two locks is to enable concurrency of enqueue and dequeue operations.
  • Add a dummy node
    • Allocated in the queue initialization code
    • Enables the separation of head and tail operations
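The two-lock queue with a dummy node can be sketched as follows (this follows the Michael and Scott design described above; the specific names like `Queue_Enqueue` are my own):

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct __qnode_t {
    int value;
    struct __qnode_t *next;
} qnode_t;

typedef struct __queue_t {
    qnode_t *head;
    qnode_t *tail;
    pthread_mutex_t head_lock;  // serializes dequeuers
    pthread_mutex_t tail_lock;  // serializes enqueuers
} queue_t;

void Queue_Init(queue_t *q) {
    // The dummy node lets head and tail operations proceed
    // without ever touching the same node.
    qnode_t *tmp = malloc(sizeof(qnode_t));
    tmp->next = NULL;
    q->head = q->tail = tmp;
    pthread_mutex_init(&q->head_lock, NULL);
    pthread_mutex_init(&q->tail_lock, NULL);
}

void Queue_Enqueue(queue_t *q, int value) {
    qnode_t *tmp = malloc(sizeof(qnode_t));
    tmp->value = value;
    tmp->next = NULL;
    pthread_mutex_lock(&q->tail_lock);  // only the tail lock is needed
    q->tail->next = tmp;
    q->tail = tmp;
    pthread_mutex_unlock(&q->tail_lock);
}

int Queue_Dequeue(queue_t *q, int *value) {
    pthread_mutex_lock(&q->head_lock);  // only the head lock is needed
    qnode_t *tmp = q->head;             // the current dummy node
    qnode_t *new_head = tmp->next;      // the real first element
    if (new_head == NULL) {
        pthread_mutex_unlock(&q->head_lock);
        return -1;  // queue was empty
    }
    *value = new_head->value;
    q->head = new_head;                 // new_head becomes the new dummy
    pthread_mutex_unlock(&q->head_lock);
    free(tmp);
    return 0;
}
```

Because enqueue touches only the tail and dequeue touches only the head (the dummy node keeps them apart even when the queue holds a single element), one enqueuer and one dequeuer can run fully in parallel.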


Concurrent Hash Table

  • Focus on a simple hash table
    • The hash table does not resize.
    • Built using the concurrent lists
    • It uses a lock per hash bucket, each of which is represented by a list.
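Such a hash table can be sketched by reusing the concurrent list from above as the bucket type (the bucket count `BUCKETS` and the `Hash_*` names are my own choices):

```c
#include <pthread.h>
#include <stdlib.h>

#define BUCKETS 101

// Per-bucket concurrent list, same shape as the list_t above.
typedef struct __node_t {
    int key;
    struct __node_t *next;
} node_t;

typedef struct __list_t {
    node_t *head;
    pthread_mutex_t lock;
} list_t;

void List_Init(list_t *L) {
    L->head = NULL;
    pthread_mutex_init(&L->lock, NULL);
}

int List_Insert(list_t *L, int key) {
    node_t *new = malloc(sizeof(node_t));
    if (new == NULL)
        return -1;
    new->key = key;
    pthread_mutex_lock(&L->lock);  // lock only the critical section
    new->next = L->head;
    L->head = new;
    pthread_mutex_unlock(&L->lock);
    return 0;
}

int List_Lookup(list_t *L, int key) {
    int rv = -1;
    pthread_mutex_lock(&L->lock);
    for (node_t *curr = L->head; curr; curr = curr->next)
        if (curr->key == key) { rv = 0; break; }
    pthread_mutex_unlock(&L->lock);
    return rv;
}

// Hash table: one lock per bucket, so operations that hash to
// different buckets never contend with each other.
typedef struct __hash_t {
    list_t lists[BUCKETS];
} hash_t;

void Hash_Init(hash_t *H) {
    for (int i = 0; i < BUCKETS; i++)
        List_Init(&H->lists[i]);
}

int Hash_Insert(hash_t *H, int key) {
    return List_Insert(&H->lists[key % BUCKETS], key);
}

int Hash_Lookup(hash_t *H, int key) {
    return List_Lookup(&H->lists[key % BUCKETS], key);
}
```

This per-bucket locking is why the hash table scales so much better than a single-lock list: threads working on different buckets proceed without any contention.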

Performance of Concurrent Hash Table

  • From 10,000 to 50,000 concurrent updates from each of four threads.

The simple concurrent hash table scales magnificently.
