DataAspirant July 2015 newsletter

Home | About | Data scientists Interviews | For beginners | Join us | Monthly newsletter



For this July newsletter, our team rounded up the best blog posts for anyone interested in learning more about data science. Whether you are experienced in data science or have only just heard of the field, these posts provide enough detail and context for you to understand what you’re reading. We hope you enjoy the July dataaspirant newsletter.

Blog Posts:

1. Can deep learning help find the perfect girl:

When a Machine Learning PhD student at University of Montreal starts using Tinder, he soon realizes that something is missing in the dating app – the ability to predict to which girls he is attracted. Harm de Vries applies Deep Learning to assist in the pursuit of the perfect match.

Read Complete Post On : kdnuggets Blog… 

2. Data scientists to follow on GitHub:

The lives of people on GitHub don’t appear as tempting as those you would observe on other platforms, but if you love coding, programming and data science, you’ll surely enjoy the company of 9 million users on this platform!

Following influencers is usually a good practice. It has helped me in multiple ways.

Read Complete Post On :

3. What is data science? Why study data science?

“Data Science is extraction of knowledge from data”.

“By combining aspects of statistics, computer science, applied mathematics, and visualization, data science can turn the vast amounts of data the digital age generates into new insights and new knowledge”.

Read Complete Post On :  welcomedata

4. What is deep learning?

Imagine a baby coming into the world. From the moment she opens her eyes, there is a world of learning to absorb. Every action will create an imprint on her brain, and she will not stop learning until the moment she leaves the world. She develops because of her environment, because of the things she studied, the books she read, the work she carried out, the people she talked with, etc.

Now, imagine an artificial neural network – that learns and draws its own conclusions. Similar in some ways to a human brain, but far easier to program, and with a limitless potential for growth. It analyses arrays of data – looking at inputs and outputs. If a cat is thrown into some water, it won’t be too happy. A deep learning array could work that one out without breaking a sweat.

Read Complete Post On :

5. Introduction to big data with Apache Spark:

With the advent of new technologies, there has been an increase in the number of data sources. Web server logs, machine log files, user activity on social media, recordings of a user’s clicks on a website and many other data sources have caused an exponential growth of data. Individually this content may not be very large, but when taken across billions of users, it produces terabytes or petabytes of data. For example, Facebook is collecting 500 terabytes (TB) of data every day with more than 950 million users. Such a massive amount of data, which is not only structured but also unstructured and semi-structured, falls under the umbrella known as Big Data.

Read Complete Post On : rideondata blog

6. Data manipulation primitives in R and Python:

Both R and Python are incredibly good tools for manipulating your data, and their integration is becoming increasingly important. The latest tool for data manipulation in R is dplyr, whilst Python relies on pandas.

This blog post shows you the fundamental primitives for manipulating your dataframes using both libraries, highlighting their major advantages and disadvantages.
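As a rough illustration of what such primitives look like, here is a minimal sketch in plain Python (standard library only) of the filter, select, mutate and group-and-summarise operations. It does not use dplyr or pandas, and the sample records and helper names are hypothetical rather than taken from the linked post.

```python
from collections import defaultdict

# Hypothetical sample data: each dict plays the role of one dataframe row.
rows = [
    {"name": "a", "group": "x", "value": 10},
    {"name": "b", "group": "y", "value": 25},
    {"name": "c", "group": "x", "value": 30},
    {"name": "d", "group": "y", "value": 5},
]


def filter_rows(data, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in data if predicate(row)]


def select_columns(data, columns):
    """Keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in data]


def mutate(data, new_column, func):
    """Return copies of the rows with one extra computed column."""
    return [{**row, new_column: func(row)} for row in data]


def group_mean(data, key, column):
    """Group rows by `key` and average `column` within each group."""
    groups = defaultdict(list)
    for row in data:
        groups[row[key]].append(row[column])
    return {k: sum(v) / len(v) for k, v in groups.items()}


large = filter_rows(rows, lambda r: r["value"] >= 10)
names = select_columns(large, ["name", "value"])
doubled = mutate(rows, "double", lambda r: r["value"] * 2)
means = group_mean(rows, "group", "value")
```

Dataframe libraries offer the same kinds of operations over whole columns at once, which is usually both shorter to write and faster on large data.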

Read Complete Post On : blog

7. Brain vs deep learning singularity:

This blog post is complex as it arcs over multiple topics in order to unify them into a coherent framework of thought. I have tried to make this article as readable as possible, but I might not have succeeded in all places. Thus, if you find yourself in an unclear passage, it might become clearer a few paragraphs down the road where I pick up the thought again and integrate it with another discipline.

Read Complete Post On : timdettmers blog

8. Classification trees using R:

The goal of classification trees is to predict or explain responses on a categorical dependent variable, such as species of trees or customer segmentation classes. As such, the available techniques have much in common with the techniques used in the more traditional methods of Discriminant Analysis, Cluster Analysis, Nonparametric Statistics, and Nonlinear Estimation. The flexibility of classification trees makes them a very attractive analysis option. However, like all methods, I do not recommend their use to the exclusion of more traditional methods. Indeed, when the typically more stringent theoretical and distributional assumptions of more traditional methods are met, the traditional methods may be preferable. But as an exploratory technique, or as a technique of last resort when traditional methods fail, classification trees are, in the opinion of many researchers, unsurpassed.
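To make the idea concrete, here is a minimal sketch of the core step of a classification tree: choosing the threshold on one numeric feature that minimises the weighted Gini impurity of the two resulting groups. It is written in Python rather than R, the data is made up, and the function names are hypothetical. Real tree packages add recursion, pruning and support for many features.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    The threshold is None when no split improves on the unsplit data.
    """
    n = len(values)
    best = (None, gini(labels))  # baseline: do not split at all
    candidates = sorted(set(values))
    # Try the midpoint between each pair of adjacent distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: leaf length (cm) and tree species.
lengths = [2.0, 2.5, 3.0, 6.0, 6.5, 7.0]
species = ["birch", "birch", "birch", "oak", "oak", "oak"]
threshold, impurity = best_split(lengths, species)
```

A full tree would apply the same search recursively to the left and right groups until the groups are pure or too small to split further.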

Read Complete Post On :


1. Sparkling pandas

2. Data mining technique clustering for bank data set

3. How to Become a Data Scientist

4. How to Interview a Data Scientist

5. Scalable, Distributed, Machine Learning for Big Data


That’s all for the July 2015 newsletter. Please leave your suggestions on the newsletter in the comment box so that we can improve next month’s newsletter. To see all dataaspirant newsletters, you can visit the monthly newsletter page. Subscribe to our blog so that you get our newsletter in your inbox every month.





Interview with Data science expert Kai Xin Thia, Data scientist at Lazada, Co-Founder DataScience SG


Interview with Kai Xin Thia


We are excited to interview Kai Xin Thia as the first data scientist for our dataaspirant blog lovers. He has shared some interesting things about data science, so let us see what he has to say.

Hi Kai Xin Thia, we are delighted to interview you, and thanks a lot for your time with us. Before starting the interview, let me introduce Kai Xin Thia.


Kai Xin is a data scientist at Lazada. He specializes in behavioral analytics and has an interest in large recommendation systems. He has been building behavioral models for 3 years and is in the top 1% on Kaggle, an international data science competition portal. He is also the Co-Founder of DataScience SG (the largest data science community in Singapore) and a volunteer at DataKind SG (an NGO that helps other NGOs through data science).

Hi Thiakx! Let me start by asking about your background. Can you describe it for our data science enthusiasts?

Hi, I began my data journey at Singapore Management University, where I graduated with a degree in information systems, business intelligence and analytics. I then spent time working at SAS and EMC, building my foundation. I moved on to focus on healthcare analytics at Khoo Teck Puat Hospital and I am currently at Lazada, working on retail and behavioral analytics.

That’s great; now we know about your healthcare analytics background. What is your definition of data science?

Sure. Generally, data science is the use of hacking skills, math & stats, and domain expertise to generate useful insights for business, and you can see some reference like.

I am personally excited to know more about you. How did you get started with data science, and what inspired you most towards it?

When I first got started, it was all about business intelligence & business analytics, which was pretty much about generating reports to understand the current performance of businesses. Things started to get interesting when I started on Kaggle, building predictive models based on historical data.

Can you share your experience with data science (especially regarding your projects and your startup “Foxhole”)?

I would say there is a growing interest among companies in Singapore (and probably Asia in general) regarding the use of data science in their operations, but we are still behind our US counterparts.

You have been building behavioral models for 3 years. Can you give us an introduction to behavioral models and some insights into them?

Behavioral modeling, as its name suggests, is about understanding why people behave or respond in a certain way and how we can encourage them to adjust their behavior using data models. Beyond data models, there is a lot to learn in this field, for example, how predictably irrational most people are:

You have participated in data science competitions. You were the 2nd place winner in the “Unilever Prediction Challenge on consumer preference” and “Singapore’s Data in the City Visualization Challenge on education”. Can you share your experience of those?

Data in the City was interesting as we took the chance to research and understand Singapore’s education journey and how we grew from a third world, impoverished country into a developed city with an education system that attracts students from all around the region. In the Unilever challenge, we had the opportunity to present to management and learn from them what truly matters: sometimes it is not just about building the most complex models but rather, the act of balancing model accuracy with the ease of deploying the models into production.

You studied information systems at Singapore Management University. How has information systems helped you in your career? What would be your recommendation for data science enthusiasts regarding this?

University is the best time to pick up technical skills. If you are interested to try out / enter the data science industry, don’t be afraid to sign up for some difficult mathematics / statistics / machine learning modules. Use this opportunity to make mistakes and learn from them.   

What is your opinion about online courses for Data science? Which are your recommended online courses for Data science enthusiasts?

Coursera / edX / Stanford Online are fantastic platforms for learning. Here is what I recommend:

*Johns Hopkins’ data science specialization is not worth the money but is alright for a quick introduction to data science.

Can you share your favorite list of data science books for us?


What do you think are the prerequisites for a data science fresher who is starting from zero?

Not giving up. Most smart, rational peeps give up after 3-6 months because it is too hard / too boring / not earning them money. It takes years to train a doctor; it takes at least as long to train a data scientist.

What are the best programming languages for data science, and which one is your favorite?

Learn R for quick prototyping, Python to deal with larger datasets and Apache Spark for enterprise level work.

What are the primary questions that are asked in data scientist interviews?

See the list from Quora, quite accurate.

I would like to add one question that I was asked before: “Describe to me the greatest data project that you have worked on so far.”

What is the present scope of data science, and how will it look in the future?

Current popular data science tools (R/Python) are limited to single machines while enterprise software tools (SAS / Teradata) are expensive and unwieldy. Next generation tools like Spark will bridge the gap, bringing enterprise level scalability to popular data science tools (R/Python).

Final question. Can you share your opinion on our Blog?

It will be really interesting if you can interview more data scientists 🙂

Sure. We have more interviews coming up 🙂

Thank you so much for this enlightening interview. It will definitely add value for our readers. Once again, thank you.



I hope you liked today’s interview. If you have any questions, feel free to comment. If you want to ask a data scientist a question, let us know in the comments so that we can ask it in upcoming data scientist interviews. You can find the link for comments just below the title.

