代写Data Structures | 算法代做 | Algorithms | C代写 – Data Structures and Algorithms
代写Data Structures | 算法代做 | Algorithms | C代写 – 这是一个利用C语言进行BST搜索算法的任务,对时间复杂度有要求,属于比较典型的数据结构题目
Data Structures and Algorithms
######## LECTURE 26
Motivation
######## Organization and retrieval of information is at the heart of most computer
######## applications, and searching is the most frequently performed task
Searching
If the array is not sorted, the search requires O( n ) time
If the value isnt there, we need to search all n elements
If the value is there, we search n /2 elements on average
If the array is sorted, we can do a binary search
A binary search requires O(log n ) time
About equally fast whether the element is found or not
It doesnt seem like we could do much better
How about constant time search, O(1)?
We can do it if the array is organized in a particular way
Consider the problem of searching an array for a given value
######## Hashing is a technique used for performing insertions, deletions, and
######## searches in constant average time
######## Tree operations that require any ordering information among the elements
######## are not supported efficiently
######## The Data structure used is the hash table(a generalization of ordinary arrays)
oSeveral method of implementing the hash table
oCompare these methods
Hashing
Hash table
######## Given a table T and a record x (a record has a keyand an entry), we want to
######## support only the dictionary operations:
oInsert(T, x)
oDelete(T, x)
oSearch(T, x)
######## Goal
Fast operations without sorting the records beforehand
What is a table?
######## A table is simply another term for an array
It has several fields (i.e. types of information)
e.g., a telephone book has field name, field address, field phonenumber ,etc.
Smith 124 Moody St. 967 - 583 - 3322
White 1 South St. 781 - 222 - 1111
...
...
######## To find an entry in the table, you only need to know the content of one of
######## the fields (not all of them)
o This field is called key
######## Ideally, a key uniquely identifies an entry
Hash table big idea
######## Banking application
Imagine we have a file Account Info, and we want to be able to use a customers account
number to quickly (constant time) bring up their info
Account # Name
1345 Anderson
5076 Baker
0014 Cooper
2015 Dunhill
8428 Erickson
1007 Fourier
We could use an array , of length 10,000, and simply use the account
######## numbers to index directly into the array
Direct-address table
######## Direct-address tables are ordinary arrays
######## Facilitate direct addressing
o Element whose key is k is obtained by indexing into the k thposition of the array
######## It is applicable when we can afford to allocate an array with one position for
######## every possible key
######## Insert/Delete/Search can be implemented to take O(1)time
####### 11/9/
Direct-address table
######## We have n elements that need to be stored
######## Universe U = {0, 1, …, m 1} of keys
m >= n
######## Direct-address table
T [0 ... m 1]
Each slot has a key k from K (actual keys used) in U
Empty slots contain NIL
254 Chapter 11 Hash Tables
11.1 Direct-address tables
Direct addressing is a simple technique that works well when the universeUof
keys is reasonably small. Suppose that an application needs a dynamic set in which
each element has a key drawn from the universeUDf0; 1; : : : ; m! (^1) g, wherem is not too large. We shall assume that no two elements have the same key. To represent the dynamic set, we use an array, or direct-address table ,denoted byT0::m! 1 , in which each position, or slot ,correspondstoakeyintheuni- verseU.Figure11.1illustratestheapproach;slotkpoints to an element in the set with keyk. If the set contains no element with keyk, thenTkDNIL. The dictionary operations are trivial to implement: DIRECT-ADDRESS-SEARCH.T; k/ 1 return Tk DIRECT-ADDRESS-INSERT.T; x/ 1 Tx: key Dx DIRECT-ADDRESS-DELETE.T; x/ 1 Tx: key DNIL Each of these operations takes onlyO.1/time. T (universe of keys) U K (actualkeys) 2 3 (^58) 1 9 4 0 7 6 2 3 5 8 key satellite data 2 0 1 3 4 5 6 7 8 9 Figure 11.1 How to implement a dynamic set by a direct-address tableT. Each key in the universe UDf0; 1; : : : ; 9gcorresponds to an index in the table. The setKDf2;3;5;8gof actual keys determines the slots in the table that contain pointers to elements. The other slots, heavily shaded, containNIL. Note: data does not have to be stored in an external object referenced from the slot in T. It can be just stored in the actual slot in T UUniverse of all possible keys KSet of keys actually stored in the table Account # Name 1345 Anderson 5076 Baker 0014 Cooper 2015 Dunhill 8428 Erickson 1007 Fourier
######## The disadvantageis that we could end up with a big array that has a lot of
######## empty space. This wastes lots of memory.
o The bigger we scale, the worse this problem gets (imagine if we used 9-digit account
numbers, and had only 10,000 customers
Direct-address table
Cons direct-address table
######## If U is large then a direct-address table T of size | U | is impractical or impossible
######## (considering memory)
oThe range of keys determines the size of T
######## The set K could be so small relative to U that most of the allocated space is
######## wasted
Direct-address table vs. Hash table
Direct-address table
o Element with key k is stored in slot k
Hash table
o Use hash function h ( k ) to compute the slot
Hash table big idea
Account # Name
1345 Anderson
5076 Baker
0014 Cooper
2015 Dunhill
8428 Erickson
1007 Fourier
######## We can make the array smaller, and use the hash function to select which slot
######## we put our values into. For example the hash function = modulus function
######## Basically, we are compressing the domain of our keys so that they fit into the
######## array
0 1 2 3 4 5 6 7 8 9 10
Hash table big idea
Account # Name
1345 Anderson
5076 Baker
0014 Cooper
2015 Dunhill
8428 Erickson
1007 Fourier
0 1 2 3 4 5 6 7 8 9 10
1345 mod 11 = 3
Anderson
h (k) = k mod m, where m is the size of your hash table
Hash table big idea
Account # Name
1345 Anderson
5076 Baker
0014 Cooper
2015 Dunhill
8428 Erickson
1007 Fourier
0 1 2 3 4 5 6 7 8 9 10
5076 mod 11 = 5
Anderson Baker
Hash table big idea
Account # Name
1345 Anderson
5076 Baker
0014 Cooper
2015 Dunhill
8428 Erickson
1007 Fourier
0 1 2 3 4 5 6 7 8 9 10
0014 mod 11 = 3
Anderson Baker
Cooper
Hash Tables
######## The size of the hash table is proportional to | K |
oWe lose the direct-addressing ability
oWe need to define functions that map keys to slots of the hash table
Hash function definition
######## A hash function h transforms a key k into a number that is used as an index in
######## an array to locate the desired location (bucket or slot)
oAn element with key k hashes to slot h ( k )
o h ( k ) is the hash value of key k
o h maps all keys in K from universe U into the slots of the hash table T[0 ... m 1]
######## Hash function
oSimple to compute
oAny two distinct keys get different cells
oThis is impossible, thus we seek a hash function that distributes the keys evenly among the
cells
########
######## h : U { 0 , 1 ,…, m 1 }
####### 11/9/
Hash function
256 Chapter 11 Hash Tables
11.2 Hash tables
The downside of direct addressing is obvious: if the universeUis large, storing
a tableTof sizejUjmay be impractical, or even impossible, given the memory
available on a typical computer. Furthermore, the setKof keys actually stored
may be so small relative toUthat most of the space allocated forTwould be
wasted.
When the setKof keys stored in a dictionary is much smaller than the uni-
verseUof all possible keys, a hash table requires much less storage than a direct-
address table. Specifically, we can reduce the storage requirement to.jKj/while
we maintain the benefit that searching for an element in the hash table still requires
onlyO.1/time. The catch is that this bound is for the average-case time ,whereas
for direct addressing it holds for the worst-case time.
With direct addressing, an element with keykis stored in slotk.Withhashing,
this element is stored in sloth.k/; that is, we use a hash function hto compute the
slot from the keyk.Here,hmaps the universeUof keys into the slots of a hash
table T0::m! 1 :
hWU!f0; 1; : : : ; m! 1 g;
where the sizemof the hash table is typically much less thanjUj.Wesaythatan
element with keyk hashes to sloth.k/;wealsosaythath.k/is the hash value of
keyk.Figure11.2illustratesthebasicidea.Thehashfunctionreducestherange
of array indices and hence the size of the array. Instead of a size ofjUj, the array
can have sizem.
T
U
(universe of keys)
K
(actual
keys)
0
m
k 1
k 2
k 3
k (^4) k 5 h ( k 1 ) h ( k 4 ) h ( k 3 ) h ( k 2 ) = h ( k 5 ) Figure 11.2 Using a hash functionhto map keys to hash-table slots. Because keysk 2 andk 5 map to the same slot, they collide.
######## The ideal hash table is an array of some fixed size m , containing items (keyand
######## additional entry)
######## Each key k is mapped (using a hash function) into some number in the range 0
######## to m 1 and places in the appropriate cell
######## Insertion of a new entry is O(1)
######## Lookup for data is O(1) on average
oStatistically unlikely that O( n ) will occur
######## Decide what to do when two keys hash to the same value ( collision )
######## Choose a good hash function
######## Choose the size of the table
Issues with Hashing
Division method
Maps key k into one of m slots by taking the remainder of k divided by m
o h ( k ) = k % m
oFor example, hash table size m = 12 and key k = 100 h ( k ) = 4
######## Reasonable strategy, unless the key happens to have some undesirable
######## properties
oFor example, if the table size is 10 and all of the keys end in 0
Hash Function
######## Finding a perfect hashing function is not always possible
######## A hash function maps most of the keys onto unique integers, but maps some
######## other keys onto the same integer
######## This is called collision
Hash Function
######## Multiple keys can hash to the same slot: collisions are unavoidable
######## Design a good hash function will minimizethe number of collisions that
######## occur, but they will still happen, so we need to be able to deal with them
######## Design collision-resolution techniques
Collisions
######## There are two basic policies to handle collisions:
o Chaining
Store all elements that hash to the same slot in a linked list
Store a pointer to the head of the linked list in the hash table slot
Open addressing
All elements stored in hash table itself
When collision occur, use a systematic procedure to store elements in free slots of the table