Home | About | Data scientists Interviews | For beginners | Join us


A brief history of data mining

In the 1960s, statisticians used terms like “Data Fishing” or “Data Dredging” to refer to what they considered the bad practice of analyzing data without an a-priori hypothesis. The term “Data Mining” appeared around 1990 in the database community.

Data mining in technical terms

Data mining is the process of extracting specific information from data and presenting it in a relevant, usable form that can help solve problems. It covers several specializations, such as text mining, web mining, audio and video mining, pictorial data mining, and social network data mining.

Why is data mining a hot topic for this generation?

Data mining is a young and promising field for the present generation because of its wide range of applications. It has attracted a great deal of attention in the information industry and in society, due to the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and science exploration. This is why data mining is also called knowledge discovery from data.

Understanding data mining with an apple-buying example


Before explaining data mining with fresh apples, let me share some interesting facts about apples.

Nutrition:  According to the United States Department of Agriculture, a typical apple serving weighs 242 grams and contains 126 calories with significant dietary fiber and modest vitamin C content, with otherwise a generally low content of essential nutrients.

Toxicity of apple seeds: The seeds of apples contain small amounts of amygdalin, a sugar and cyanide compound known as a cyanogenic glycoside. Ingesting small amounts of apple seeds will cause no ill effects, but extremely large doses can cause adverse reactions. There is only one known case of fatal cyanide poisoning from apple seeds; in this case the individual chewed and swallowed one cup of seeds. It may take several hours before the poison takes effect, as cyanogenic glycosides must be hydrolyzed before the cyanide ion is released.

Now we will step into our example.

Suppose your family wants to visit someone who is suffering from pancreatic cancer. It has been reported that eating apples could help reduce the risk of pancreatic cancer by up to 23 percent. So your father asks you to bring apples home from a nearby shop. He also teaches you how to buy apples by giving you a set of rules.

Rules for buying apples

  • Big apples are less tasty than small apples.
  • Dark red apples are not fresh.
  • Light red apples are fresh.
  • Green apples are good for health.

With this list of rules you can pick the apples you want to buy. Your family wants to give these apples to someone who is unwell, so you obviously pick green apples. When you go to the shop, you pick small apples that are green. That is the whole story of selecting apples that are good for health.

Non-data-mining algorithm

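# Apple-selection algorithm: hand-written rules, nothing is learned from data
def select_apple(size, color):
    if size == "small":
        if color == "green":
            return True   # select apple
        else:
            return False  # don't select apple
    return False          # not small: don't select apple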

Comparing with data mining

  • Randomly select apples from the shop (training data).
  • Make a table of the physical characteristics of each apple, such as color and size (features).
  • Record whether each apple is tasty or good for health (output variables).
  • When you go to another shop and buy apples there, those apples are the test data.
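
The four points above can be sketched in a few lines of Python. This is a minimal illustration, not a real data mining pipeline: the apple records are made up, and `train` and `predict` are hypothetical helper names. The "model" here is only a lookup table that remembers the most common label seen for each (size, color) combination.

```python
from collections import Counter, defaultdict

# Made-up training data: apples sampled from the first shop.
# Each record is ((size, color), label); the label is the output variable.
training_data = [
    (("small", "green"), "good"),
    (("small", "green"), "good"),
    (("small", "light red"), "good"),
    (("big", "dark red"), "bad"),
    (("big", "dark red"), "bad"),
    (("big", "green"), "bad"),
    (("big", "green"), "good"),
    (("big", "green"), "bad"),
]

def train(records):
    """Build a model: for each feature combination, keep the majority label."""
    counts = defaultdict(Counter)
    for features, label in records:
        counts[features][label] += 1
    return {features: c.most_common(1)[0][0] for features, c in counts.items()}

def predict(model, features, default="unknown"):
    """Look up the learned label; fall back to `default` for unseen apples."""
    return model.get(features, default)

model = train(training_data)

# Made-up test data: apples from another shop, with their true labels.
test_data = [
    (("small", "green"), "good"),
    (("big", "dark red"), "bad"),
    (("big", "green"), "good"),
]

correct = sum(predict(model, features) == label for features, label in test_data)
accuracy = correct / len(test_data)
print(f"accuracy on test data: {accuracy:.2f}")
```

The test apples are kept separate from the training apples on purpose: accuracy measured on apples the model has already seen would say little about how it behaves in a new shop.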

You can now buy apples with great confidence, without worrying about the details of how to choose the best ones. What's more, you can improve your algorithm over time (loosely speaking, a kind of reinforcement learning, though strictly it is retraining on new data), so that its accuracy improves as it reads more training data and it corrects itself when it makes a wrong prediction. But the best part is that you can use the same algorithm to train different models, one each for predicting the quality of apples, oranges, bananas, grapes, cherries and watermelons, and keep all your loved ones happy.
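
The idea of a model that improves as it sees more data can be sketched as follows. This is an assumed, simplified illustration of incremental updating: the class name `AppleModel`, its methods, and the sample apples are all hypothetical. The model keeps label counts per feature combination and is updated with every newly tasted apple, so wrong predictions shift the counts and can change future predictions.

```python
from collections import Counter, defaultdict

class AppleModel:
    """Hypothetical model: majority label per (size, color) combination."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, features, label):
        """Learn from one more labelled apple."""
        self.counts[features][label] += 1

    def predict(self, features, default="unknown"):
        """Return the most frequent label seen so far for these features."""
        if features not in self.counts:
            return default
        return self.counts[features].most_common(1)[0][0]

model = AppleModel()
model.update(("big", "green"), "bad")  # the only big green apple seen so far
before = model.predict(("big", "green"))

# Made-up new observations: two big green apples that turn out to be good.
new_apples = [(("big", "green"), "good"), (("big", "green"), "good")]

mistakes = 0
for features, label in new_apples:
    if model.predict(features) != label:
        mistakes += 1  # wrong prediction: the update below adjusts the counts
    model.update(features, label)

after = model.predict(("big", "green"))
print(before, "->", after, f"({mistakes} wrong predictions along the way)")
```

The same class could be instantiated once per fruit, which is the sense in which one algorithm can train several models.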

This type of learning is called supervised learning in data mining. In the next post I will give you a clear picture of the difference between supervised and unsupervised learning, with real-life examples.



I hope you liked today's post. If you have any questions, feel free to comment below. If you want me to write about a specific topic, tell me in the comments below.


7 thoughts on “DATA MINING…….!”

  1. Hi Sai, thanks for the basics of data mining. What is training, and what is a model here? And what does training a model mean?
    Could you please help me understand these basic terms?


  2. Hi Sai Madhu! Great article to introduce data mining.
    I have a question. You mentioned “reinforcement learning”. What exactly is this and how does it happen? How can an algorithm learn over time? Does it store anything in memory to remember? Please give me a clear picture of this.


    • Hii someone 🙂
      Thanks for your compliment.
      Reinforcement learning:
      I used the term reinforcement learning in the introduction to data mining post to express that learning from data is like trial-and-error learning. That is, you build the model from your training data using a few parameters. If the model does not give good accuracy when you test it, you change the parameters you chose earlier to get better accuracy. This process continues until you are satisfied with your model's accuracy.
      Suppose that in our apples example we believed that small, green apples are the good ones, but when you buy green apples you find that they are not good enough. You then look for new parameters, such as whether buying in the morning or in the evening is better, and change your model based on those new parameters.
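
The trial-and-error loop described in the reply above can be sketched like this. The data, the `time` feature values, and the helper names `train` and `accuracy` are assumptions made up for illustration: the same majority-vote model is trained with two different feature sets, and the set with the better test accuracy is kept.

```python
from collections import Counter, defaultdict

# Made-up apples: each record is (features, label).
train_rows = [
    ({"size": "small", "color": "green", "time": "morning"}, "good"),
    ({"size": "small", "color": "green", "time": "morning"}, "good"),
    ({"size": "small", "color": "green", "time": "evening"}, "bad"),
    ({"size": "small", "color": "green", "time": "evening"}, "bad"),
    ({"size": "small", "color": "green", "time": "evening"}, "bad"),
]
test_rows = [
    ({"size": "small", "color": "green", "time": "morning"}, "good"),
    ({"size": "small", "color": "green", "time": "evening"}, "bad"),
]

def train(rows, feature_names):
    """Majority label for each combination of the chosen features."""
    counts = defaultdict(Counter)
    for features, label in rows:
        key = tuple(features[name] for name in feature_names)
        counts[key][label] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def accuracy(model, rows, feature_names):
    """Fraction of rows whose predicted label matches the true label."""
    hits = 0
    for features, label in rows:
        key = tuple(features[name] for name in feature_names)
        hits += model.get(key) == label
    return hits / len(rows)

# Try the original parameters first, then add the time of purchase.
candidates = [("size", "color"), ("size", "color", "time")]
scores = {}
for names in candidates:
    model = train(train_rows, names)
    scores[names] = accuracy(model, test_rows, names)

best = max(scores, key=scores.get)
print(scores, "best:", best)
```

With this made-up data, size and color alone cannot separate good apples from bad ones, so adding the time of purchase is what raises the accuracy.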

      Coming to your next question, how an algorithm learns over time:

      In real-world problems the training data is updated at regular intervals, so the model is updated too, and its accuracy changes as well. This is why we say the algorithm learns over time.

      Coming to your next question, whether it stores anything in memory to remember:

      Once the model is built, we store it in a variable. When the model changes, we update that variable, and that's it.

      if you have any other questions you can mail to

