Chengwei LEI, Ph.D.    Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield

 

CMPS 4450 Data Mining and Visualization

 

Introduction



=====>!!!Do not use any "magic function" in this class!!!<=====






 

 

Introduction to Data Science

 




Interesting Examples


 

Day and Night is a television series directed by Wang Wei and written by Zhiwen. The series tells the story of Guan Hongfeng, the former captain of the Changfeng Criminal Investigation Detachment, who solves many cases to get his brother Guan Hongyu exonerated.



Guan Hongfeng, a former police captain suffering from nyctophobia, returns to solving mysteries alongside the hot-tempered Captain Zhou Xun and rookie officer Zhou Shutong. However, he has a hidden agenda: to clear his identical twin brother, Guan Hongyu, of the alleged murder of an entire family.



Assume you are Captain Zhou Xun, with a Computer Science Ph.D. background. :)
Can you figure out their identities based on their behavior patterns? How?

 


Hint: English Professor style or Math Professor style?
Both!



 

!!!DISHONEST CASINO!!!

A casino has two dice, one fair and one loaded. The casino player switches back and forth between the fair die and the loaded die once in a while.







The dice are unfair!!!!!!!!!!



Fair die
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

Loaded die
P(1) = P(2) = P(3) = P(4) = P(5) = 1/10
P(6) = 1/2

Switching between fair and loaded die
P(trans) = 0.05
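
To get a feel for the model above, here is a minimal simulation sketch in plain Python. It assumes the switch decision is made after every roll with probability P(trans) = 0.05, and that either die is equally likely at the start; the page does not state either detail. The names EMIT, P_SWITCH, and simulate are illustrative only.

```python
import random

# Emission probabilities taken from the model above.
FAIR = {face: 1 / 6 for face in "123456"}
LOADED = {**{face: 1 / 10 for face in "12345"}, "6": 1 / 2}
EMIT = {"F": FAIR, "L": LOADED}
P_SWITCH = 0.05  # P(trans); assumed to apply after every roll


def simulate(n, seed=0):
    """Return (rolls, states) as two strings of length n."""
    rng = random.Random(seed)
    state = rng.choice("FL")  # assumption: either die may come first
    rolls, states = [], []
    for _ in range(n):
        faces = list(EMIT[state])
        weights = list(EMIT[state].values())
        rolls.append(rng.choices(faces, weights=weights)[0])
        states.append(state)
        if rng.random() < P_SWITCH:
            state = "L" if state == "F" else "F"
    return "".join(rolls), "".join(states)


rolls, states = simulate(40)
print(rolls)   # the observed outcomes
print(states)  # the hidden die behind each outcome (F or L)
```

Printing the two strings on top of each other shows how hard the guessing task is: a run of sixes hints at the loaded die, but the fair die produces sixes too.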


Given the dice model above, we have a series of die-roll outcomes:

61226351241253662512......123643452416243

Try your best to guess which outcomes were generated by the fair die and which by the loaded die.


Are you sure you can do it? Yes? Then try this one! (1000 outcomes)

You can evaluate your guess on Odin with the following command:
/home/fac/clei/checker/HMM/DiceChecker1000 WhateverYourAns.txt
Sample Answer (F stands for Fair, L stands for Loaded)
Can you make it better than 80%?
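
One possible approach, sketched here without any library "magic function", is to treat the dice as a two-state hidden Markov model and decode the most probable F/L sequence with the Viterbi algorithm. This is only a sketch under assumptions: the switch probability is applied between consecutive rolls, both dice are equally likely at the start, and the function name viterbi is illustrative. The page does not say this is the intended solution, and the exact answer-file format expected by the checker is not specified here.

```python
import math

EMIT = {
    "F": {face: 1 / 6 for face in "123456"},
    "L": {**{face: 1 / 10 for face in "12345"}, "6": 1 / 2},
}
P_SWITCH = 0.05  # P(trans); assumed to apply between consecutive rolls
STATES = "FL"


def viterbi(rolls):
    """Return the most probable F/L label string for a string of rolls."""
    if not rolls:
        return ""
    log_stay, log_switch = math.log(1 - P_SWITCH), math.log(P_SWITCH)
    # score[s]: best log-probability of any path that ends in state s
    score = {s: math.log(0.5) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at roll t + 1
    for roll in rolls[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            other = "L" if s == "F" else "F"
            stay, switch = score[s] + log_stay, score[other] + log_switch
            pointers[s] = s if stay >= switch else other
            new_score[s] = max(stay, switch) + math.log(EMIT[s][roll])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("61226351241253662512"))
```

Working in log space avoids numerical underflow on long sequences such as the 1000-outcome one, where the raw product of probabilities would shrink toward zero.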




Here is a public dataset drawn from the U.S. Army Anthropometric Survey, from the University of Michigan.

We have some data sheets in our office, but they are ruined by rats.

Rats Are Eating Files Along With Food Scraps In East Delhi Municipal  Headquarter 

Try to write a program to fix the following broken dataset

AllData

I split the data for your convenience

OkayData
RuinedData

Test your answer (Sample Answer) with:
/home/fac/clei/checker/KNN/armyChecker1 YourAnsForData1.txt
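
The checker path suggests a k-nearest-neighbors approach. As a hedged sketch of that idea: for each ruined row, find the k most similar rows in the intact data using only the columns that survived, then fill each missing value with the neighbors' average. The row layout below (lists of floats with None for a ruined value), the toy numbers, and the names distance and knn_fill are all assumptions for illustration; the real files may be laid out differently.

```python
import math


def distance(a, b, columns):
    """Euclidean distance between rows a and b over the given column indices."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in columns))


def knn_fill(ruined_row, okay_rows, k=3):
    """Fill None entries of ruined_row with the mean of the k nearest okay rows."""
    known = [c for c, v in enumerate(ruined_row) if v is not None]
    nearest = sorted(okay_rows, key=lambda row: distance(ruined_row, row, known))[:k]
    filled = list(ruined_row)
    for c, v in enumerate(ruined_row):
        if v is None:
            filled[c] = sum(row[c] for row in nearest) / len(nearest)
    return filled


# Made-up toy rows (height cm, weight kg, arm span cm), not survey data.
okay = [
    [160.0, 55.0, 158.0],
    [162.0, 57.0, 161.0],
    [180.0, 80.0, 182.0],
    [182.0, 84.0, 185.0],
]
print(knn_fill([181.0, None, 183.0], okay, k=2))  # prints [181.0, 82.0, 183.0]
```

With real measurements the columns have very different ranges, so scaling each column before computing distances (for example, to zero mean and unit variance) is usually worth trying, as is experimenting with different values of k.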

 

 


Hint: "I can do it too" or "How can I do it"?