Are you an Aspiring Data Scientist? Here is what you need to know


Most of us are now familiar with terms like Big Data, Analytics, Data Science and Data Scientist. The White House recently announced D. J. Patil (@dpatil) as the first U.S. Chief Data Scientist (White House blog). According to an article in Harvard Business Review, Data Scientist is the “Sexiest Job of the 21st Century” (HBR). Most tech-savvy companies, including Google, LinkedIn, Facebook and Twitter, have built internal data science teams headed by Chief Data Scientists or Chief Data Officers. Many non-tech companies in sectors such as retail, healthcare, aviation, logistics and human resources are in the process of building their own data science and analytics teams to gain the power of data-driven decision making. So if you are looking for a career in data analytics and are an aspiring Data Scientist, here are the top 10 things you may find useful.

1] Join a Masters Program: With the rise of Big Data, the demand for data analytics professionals is growing every day, and many top universities have started specialized programs in data analytics and data science to meet it. Most of these Masters programs can be completed in one year. They focus not only on the technical skills needed for the analytics profession but also on business fundamentals, project management and leadership skills, which are equally essential. A few select university programs are listed below, and readers can find many more with a web search.

1. Master of Science in Business Analytics by University of Minnesota
2. Master of Science in Business Analytics by University of Texas Austin
3. Master of Science in Analytics by Georgia Tech
4. Master of Science in Analytics by Northwestern University

For those of you who are currently working full time and are interested in joining a Masters program, a few universities offer online programs in analytics, for example Northwestern Predictive Analytics. Here is a consolidated list of most of the analytics programs in the United States, compiled by NCSU (List).

2] Start Learning R and Python: R and Python are the two programming languages most widely used by analytics professionals. Whether you are an undergraduate thinking of pursuing a Masters, a college student or an experienced professional, having these languages in your armory is going to be a huge advantage. There are a lot of online courses available on Coursera, edX and many other MOOC platforms. Code School is one more source where you can learn a number of programming languages. Here is an introduction to R.

3] Learn From Peers: A lot of professionals in the data analytics field post their work on their blogs or share links to their GitHub repositories. You can find many useful posts on blogs like R-Bloggers, FiveThirtyEight and Revolutions, and on websites like KDnuggets.

4] Join Data Science Competitions: There are many platforms, such as Kaggle, CrowdAnalytix and DrivenData, where you can participate in data science competitions and even win cash prizes. You can learn a lot by applying your knowledge to real data sets on these platforms and by comparing yourself against your peers. Their forums and discussion boards are also useful places to learn new techniques and approaches shared by other participants.

5] Read Case Studies and Articles in Analytics: “In theory, theory and practice are the same. In practice, they are not.” (a quip often attributed to Albert Einstein). Knowing the techniques and the theory behind applying models and tuning their parameters is one thing, but applying them from the ground up requires understanding the scenario from a business perspective, and this is where learning from what others have done becomes useful. Domain knowledge of the industry, down to small details about its practices, can be very useful when solving an industry-specific problem. Reading case studies and articles on how analytics was implemented gives the reader a sense of how the techniques learnt in theory are applied. There are a lot of case studies available on HBR, as well as those shared by top consulting firms including Mu Sigma, McKinsey and Accenture.

6] Follow the Leaders in your Field: There is no better way to stay updated on the latest trends and happenings in your industry than learning from experts. Follow the influencers and experts in analytics on social platforms like Twitter. Here is a list of the top 10 influencers in predictive analytics shared by dataeconomy.

7] Network, Meet People, Attend Conferences: Networking with people in the analytics profession and joining local groups on websites like meetup.com to find like-minded people can be a great source of knowledge. If possible and affordable, attend analytics and data science conferences to stay abreast of the latest happenings. Strata and HP Vertica are a few of the top conferences that happen every year.

8] Develop the Ability of Storytelling: A Data Scientist is not only required to understand the technical and business aspects of a problem, but also needs to explain the end results to top management and CXOs in the form of a story that enables them to take the right business decisions. So start practicing by explaining a complex business problem and its solution to a person who doesn’t know anything about it.

9] Write Blogs, Share Articles: Writing blog posts about your work and sharing them with others is a great way to showcase your work and to help others learn from you. Writing up a solution to a complex analytics problem you solved and explaining it in the form of a story is also great practice for your storytelling skills.

10] Learn to Ask Why: This is the most important skill that all leading data scientists possess. They want to know why something is happening the way it is, whether it is increased sales, an increased response to an advertising campaign or a sudden drop in clicks. A Data Scientist needs to have the curiosity to understand why it is happening this way and to use the data to find the answer to that WHY.

Please share your thoughts about the post.

*Image credit: Data Scientist graphic. For Bersin by Deloitte, Deloitte Consulting LLP. 2014, jenniferhines.net


A Tale of the Titanic Through the Eyes of “Data”


For this analysis I used a data set published at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/DataSets

The data has information about the 1309 passengers who were on board the Titanic. The list excludes crew members. It contains various attributes of each passenger, such as age, class, gender and fare.

I used R to analyze and visualize this data. Age values were missing for 263 passengers, and I used a regression tree model to fill in these missing values.
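The imputation idea can be sketched in a few lines. The original analysis was done in R, and the post does not say which predictors or tree settings were used, so the following is only a minimal Python sketch under assumptions: the predictors (`pclass`, `sibsp`), the toy passengers, the depth limit and the helper names `build_tree`, `predict` and `impute_ages` are all hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors of the values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, ages, depth=0, max_depth=2, min_leaf=2):
    """Fit a tiny regression tree; each row is a tuple of numeric features."""
    if depth >= max_depth:
        return mean(ages)  # leaf: predict the mean age of this group
    best = None  # (error, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [a for r, a in zip(rows, ages) if r[f] <= t]
            right = [a for r, a in zip(rows, ages) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:
        return mean(ages)  # no valid split: make a leaf
    _, f, t = best
    go_left = [r[f] <= t for r in rows]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, g in zip(rows, go_left) if g],
                           [a for a, g in zip(ages, go_left) if g],
                           depth + 1, max_depth, min_leaf),
        "right": build_tree([r for r, g in zip(rows, go_left) if not g],
                            [a for a, g in zip(ages, go_left) if not g],
                            depth + 1, max_depth, min_leaf),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its mean age."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def impute_ages(passengers):
    """Return all ages, with missing ones replaced by tree predictions."""
    known = [p for p in passengers if p["age"] is not None]
    tree = build_tree([(p["pclass"], p["sibsp"]) for p in known],
                      [p["age"] for p in known])
    return [p["age"] if p["age"] is not None
            else predict(tree, (p["pclass"], p["sibsp"]))
            for p in passengers]

# Made-up passengers for illustration only (not rows from the real data set).
toy_passengers = [
    {"pclass": 1, "sibsp": 0, "age": 45.0},
    {"pclass": 1, "sibsp": 1, "age": 39.0},
    {"pclass": 1, "sibsp": 0, "age": 41.0},
    {"pclass": 3, "sibsp": 0, "age": 26.0},
    {"pclass": 3, "sibsp": 0, "age": 24.0},
    {"pclass": 3, "sibsp": 4, "age": 5.0},
    {"pclass": 3, "sibsp": 3, "age": 7.0},
    {"pclass": 1, "sibsp": 0, "age": None},
    {"pclass": 3, "sibsp": 4, "age": None},
]
imputed = impute_ages(toy_passengers)
print(imputed)
```

Each missing age is replaced by the mean age of the known passengers that land in the same leaf, so a passenger travelling with many siblings gets a much younger estimate than a first-class passenger travelling alone.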

Being a woman, being under 18 and holding a class 1 ticket were some of the factors that helped people on the Titanic be among the lucky ones who survived.

On 15th April 1912, the British passenger liner Titanic, travelling from Southampton, UK to New York City, USA, sank after colliding with an iceberg, and as the giant ship broke apart the lives of the 1309 passengers aboard were suddenly in grave danger. The ship was carrying people from different classes of society and different age groups, and many were on board with family members including parents, siblings, spouses and children. What followed was one of the worst disasters of the century: of the 1309 passengers aboard, only 500 managed to survive. What really helped these 500 people save their lives and be the lucky ones to escape death? Let’s see what the “Data” reveals.

Of the 1309 people aboard, 466 were female, which is approximately 36% of the total, and 843, or around 64%, were male.

[Figure: passengers by gender]

The ship was carrying people of all ages, with the youngest child being 2 months old and the oldest person being 80 years old.

[Figure: age distribution of passengers]

To get more insight into the distribution of age and the survival of people in each age range, I binned the ages into three categories: people below 18 years of age were labeled ‘Children’, people above 60 years of age were labeled ‘Senior Citizen’ and the rest were labeled ‘Adult’. Below we can see the number of people aboard in each of the 3 categories.
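The binning rule is simple enough to write down directly. The original work was done in R, so this is only an illustrative Python sketch; the function name `age_category` and the sample ages are assumptions, while the thresholds and labels follow the description above.

```python
from collections import Counter

def age_category(age):
    """Label an age (in years) with one of the three bins described above."""
    if age < 18:
        return "Children"
    if age > 60:
        return "Senior Citizen"
    return "Adult"  # ages from 18 to 60 inclusive

# Made-up ages for illustration; 2/12 stands for a two-month-old child.
sample_ages = [2 / 12, 17, 18, 35, 60, 61, 80]
counts = Counter(age_category(a) for a in sample_ages)
print(counts)
```

Note that an age of exactly 18 or exactly 60 falls in the ‘Adult’ bin under this reading of "below 18" and "above 60".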

[Figure: passengers by age category]

Similarly, I analyzed the number of people who embarked from different cities and travelled in different classes.

[Figure: passengers by point of entry]

As we can see, the largest number of people embarked at Southampton, followed by Cherbourg and Queenstown. If we also look at the number of people in each class and their gender distribution, as shown in the figure below, we can see that class 3 had the highest number of people of both genders, followed by class 1 and class 2.

[Figure: passengers by class and gender]

Having looked at these descriptive statistics about the people who embarked on the Titanic, let’s look at some of the factors that contributed to the survival of the 500 people.

So what made these 500 people lucky?

  • Gender

[Figure: survival by gender]

As we can see from the graph above, 339 of the 466 females on board survived, a survival rate of approximately 73%, whereas only 161 of the 843 males survived, a mere 19%. One possible reason for this huge difference is that there may have been a high number of married women on board, and since there were very few lifeboats, in fact only 20, husbands may have tried to save their wives first at the cost of their own lives. Being female was therefore one of the most important factors for survival.
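The percentages quoted for gender can be reproduced from the counts given in the text. This is a small Python sketch rather than the original R code; the helper name `pct` is an assumption.

```python
def pct(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)

total_passengers = 1309
females, males = 466, 843
female_survivors, male_survivors = 339, 161

print(pct(females, total_passengers))   # share of females on board
print(pct(males, total_passengers))     # share of males on board
print(pct(female_survivors, females))   # female survival rate
print(pct(male_survivors, males))       # male survival rate
print(female_survivors + male_survivors)  # total survivors
```

The two survivor counts add up to the 500 survivors mentioned throughout the post, which is a quick sanity check on the figures.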

  • Class

[Figure: survival by class]

As we can see in the graph above, around 200 of the 323 people in class 1 survived, which is approximately 62%. Class 2 followed with a survival rate of around 43%, with 119 of its 277 people surviving. Class 3 passengers were not so lucky, with a survival rate of just 25%. The reason behind such a huge difference between survival rates could be the way the three classes were located on the ship. It may have been easier for class 1 and class 2 passengers to reach the lifeboat boarding points than for class 3 passengers.

  • Age

[Figure: survival by age category]

Age was also one of the most important factors for survival, as can be seen from the graph above. For the child age group, those under 18, the survival rate was almost 51%, followed by the adult age group, for which the survival rate was approximately 38%. The senior citizen age group had a very low survival rate, with only 18 of 65 people surviving. The high survival rate of the child age group suggests that children were given preference over adults and senior citizens when people were sent to the lifeboats.

  • Point of Entry

 

[Figure: survival by point of entry]

Point of entry was also one of the factors associated with survival. As we can see from the plot above, around 150 of the 270 people who embarked at Cherbourg survived, which is the highest survival rate of the three entry points. One reason could be that a lot of the people who embarked at Cherbourg were in class 1 and hence were well placed to reach the lifeboats.

  • Family size

[Figure: survival by family size]

I created a new attribute, family size, by adding the number of siblings, spouses, children and parents on board plus the person themselves. As we can see, the survival rate was better for people with a family size between 2 and 4 and lower for people with a family size greater than 4. It is reasonable to guess that people in larger families tried to gather and look for all their family members before trying to save themselves. Interestingly, a family size of 1 means the person was travelling alone, and the survival rate for this category is very poor, as can be seen from the graph above. Many of these solo travellers may have purchased a class 3 ticket, which would explain the lower survival rate.
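The derived attribute and the grouping used in the discussion can be sketched as below. This is an illustrative Python version, not the original R code. It assumes the data set stores the siblings/spouses count and the parents/children count in columns named `sibsp` and `parch`; the function names and the toy records are made up.

```python
from collections import defaultdict

def family_size(sibsp, parch):
    """Siblings/spouses plus parents/children plus the passenger themselves."""
    return sibsp + parch + 1

def family_group(size):
    """Group family sizes the way the discussion above does."""
    if size == 1:
        return "alone"
    if size <= 4:
        return "2 to 4"
    return "more than 4"

def survival_rate_by_group(passengers):
    """Return {group: fraction survived} for a list of passenger dicts."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survived, total]
    for p in passengers:
        group = family_group(family_size(p["sibsp"], p["parch"]))
        counts[group][0] += p["survived"]
        counts[group][1] += 1
    return {g: s / n for g, (s, n) in counts.items()}

# Made-up records for illustration only (not rows from the real data set).
toy_records = [
    {"sibsp": 0, "parch": 0, "survived": 0},
    {"sibsp": 0, "parch": 0, "survived": 1},
    {"sibsp": 1, "parch": 1, "survived": 1},
    {"sibsp": 1, "parch": 0, "survived": 1},
    {"sibsp": 4, "parch": 2, "survived": 0},
]
rates = survival_rate_by_group(toy_records)
print(rates)
```

Running the same grouping over the real passenger list would give the per-group rates plotted in the family size graph.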

From the above analysis we can see that a "women and children first" strategy was applied in the Titanic rescue operation, and that people in class 1 and class 2 had an advantage over people in class 3 in reaching the lifeboats and ultimately in surviving. People in smaller families were able to gather all their family members sooner and hence had a higher survival rate.