## Reading

Here is some discussion of a hash table: https://cs.csub.edu/~msarr/cmps2020/lab/lab27/lec/hashtables.pdf

A hash table is a specialized container that attempts to make insert and search operations O(1). Depending on the data and the effectiveness of the hashing function it can get very fast results; HOWEVER, it generally does not work well as it becomes full. For these examples we will add values until the hash table is a certain percent full. If an insert or search has 0 collisions, the operation will be O(1). As you can see below, above 60% full we see a dramatic decrease in efficiency:

```
Hash insert  203 to Hash of size 2039 ( 10%): time   255 : average time  1 : longest   3 : collisions      0
Hash insert  407 to Hash of size 2039 ( 20%): time   597 : average time  1 : longest  80 : collisions      0
Hash insert  611 to Hash of size 2039 ( 30%): time   826 : average time  1 : longest  59 : collisions      0
Hash insert  815 to Hash of size 2039 ( 40%): time  1168 : average time  1 : longest  72 : collisions    175
Hash insert 1019 to Hash of size 2039 ( 50%): time  1443 : average time  1 : longest  76 : collisions    379
Hash insert 1223 to Hash of size 2039 ( 60%): time  1729 : average time  1 : longest  75 : collisions    583
Hash insert 1427 to Hash of size 2039 ( 70%): time  2000 : average time  1 : longest  69 : collisions    787
Hash insert 1631 to Hash of size 2039 ( 80%): time  3858 : average time  2 : longest  69 : collisions   6144
Hash insert 1835 to Hash of size 2039 ( 90%): time 17338 : average time  9 : longest  87 : collisions  49840
Hash insert 2039 to Hash of size 2039 (100%): time 36020 : average time 17 : longest 225 : collisions 110591
```

Takeaway: the larger the hash table is, the faster it will be, but that could mean a lot of unused space. If memory is a concern that matters; if not, make it big. As you can see, making the table small relative to how many values you plan to insert can really hurt performance. Another key aspect of a hashing approach is to create a hash key that spreads our potential values out evenly.
Here are a few examples of creating different hash keys for a string input value. We are using names, and our hash table will have a size of 2047 (ideally this would be a prime number, though 2047 = 23 × 89 is actually composite).

```cpp
#define HASHSIZE 2047

/* Pass in a string, add up the ASCII values of all the individual characters,
   mod the sum with the size of the hash table we will insert into,
   and return the value. */
int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i];
    }
    return hashCode % HASHSIZE;
}
```

Results:

```
Hash "Bob Roberts "   hash1 = 1044
Hash "Rob Boberts "   hash1 = 1044
Hash "Eduardo Lopez " hash1 = 1262
Hash "Batman "        hash1 = 595
Hash "Bartman "       hash1 = 709
```

There are two problems here: two similar names hash to the same value, and our values seem to be clustering in the 600-1200 range when we want something like 0-2046. So let's try again.

```cpp
/* Multiply the ASCII value of each character in the string by a constant
   to increase the spread of the values of the keys. */
#define PRIME_CONST 31

int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * PRIME_CONST; // make the key larger
    }
    return hashCode % HASHSIZE;
}
```

Results:

```
Hash "Bob Roberts "   hash1 = 1659
Hash "Rob Boberts "   hash1 = 1659
Hash "Eduardo Lopez " hash1 = 229
Hash "Batman "        hash1 = 22
Hash "Bartman "       hash1 = 1509
```

That's a bit better: now we are covering the range 22-1659, so our keys are more spread out in the hash. But "Bob Roberts" and "Rob Boberts" still have the same hash key value, because they contain the exact same characters. So let's try again.

```cpp
/* Factor in each character's position in the string as well, so that
   "Bob Roberts" and "Rob Boberts" don't get the same key even though
   they contain the exact same characters. */
#define PRIME_CONST 31

int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST; // ASCII * position in string * constant prime number
    }
    return hashCode % HASHSIZE;
}
```

(Note that because i starts at 0, the first character contributes nothing to the hash.)

Results:

```
Hash "Bob Roberts "   hash1 = 255
Hash "Rob Boberts "   hash1 = 318
Hash "Eduardo Lopez " hash1 = 1464
Hash "Batman "        hash1 = 286
Hash "Bartman "       hash1 = 276
```

Now "Bob Roberts" and "Rob Boberts" are different, but the values still seem to cluster in the 200-400 range. Let's try changing the prime number we multiply by to 67.

```cpp
#define PRIME_CONST 67

int hashFunction(string key)
{
    int hashCode = 0;
    for (int i = 0; i < key.length(); i++)
    {
        hashCode += key[i] * i * PRIME_CONST; // ASCII * position in string * constant prime number
    }
    return hashCode % HASHSIZE;
}
```

Results:

```
Hash "Bob Roberts "   hash1 = 287
Hash "Rob Boberts "   hash1 = 93
Hash "Eduardo Lopez " hash1 = 853
Hash "Batman "        hash1 = 354
Hash "Bartman "       hash1 = 1587
```

OK, now that's a pretty good spread; we can probably insert names into the hash and minimize collisions. Let's try a few more, I'm curious:

```
Hash "Bob Roberts "      hash1 = 287
Hash "Rob Boberts "      hash1 = 93
Hash "Eduardo Lopez "    hash1 = 853
Hash "Batman "           hash1 = 354
Hash "Millhouse "        hash1 = 1361
Hash "Lisa "             hash1 = 1002
Hash "Homer J. Simpson " hash1 = 313
Hash "Marge Simpson "    hash1 = 1867
Hash "Maggie Simpson "   hash1 = 1385
Hash "Krusty The Clown " hash1 = 1981
Hash "Sideshow Bob "     hash1 = 975
Hash "Mayor Quimby "     hash1 = 235
Hash "Edna Krabappel "   hash1 = 1875
```

Looks good.