# Integrating heterogeneous agriculture information using naive Bayes and FCA.

INTRODUCTION

Agriculture is one of the fields that has not yet made full use of the potential of technology. Technology and data-driven techniques should be applied to help farmers, who provide our most basic need: food. Many factors affect the growth of a crop, and all of them have to be analysed before investing in a particular crop. This project aims to use data mining techniques to help farmers. The farmer's location is obtained, the factors relevant to that location are analysed, and data sets of past years' production are examined. Farmers can thus predict which crops to plant in the future, which helps them maximize their profit.

Prior research identifies mechanisms for obtaining good-quality, improved crop yields using a new algorithm named the "Agro algorithm", implemented on the Hadoop platform, which uses the Hadoop framework to handle large data sets. Another paper presents an ontology-based approach and a formal concept analysis (FCA) approach to integrating heterogeneous tourism information for online tour planning; an analytic hierarchy process is used to rank the tourism attractions suggested by the ontology and FCA-based approaches.

In another study, Naive Bayes and a Genetic Algorithm are applied to classify the texture of agricultural soil data. The database contains measurements of various soil profiles. The performance achieved by each method was compared and analyzed on the collected soil data, and the Naive Bayes algorithm performed much better than the Genetic Algorithm. A further paper compares different classifiers; the outcome of that research could improve the management of soil use across a large number of fields.

Our goal is to find the crop that farmers can plant to obtain maximum profit. The farmer's location is obtained and analysed for its soil, rainfall, and climate. The recommendation system then recommends the crop that best suits the farmer's location.

Architecture:

The recommendation process consists of two stages. The first stage applies the Naive Bayes algorithm to classify the preprocessed data sets. Weka, a free open-source data mining tool, is used to classify the data based on a threshold value.

In parallel, input data such as the user's location, investment, and land area is collected from the user.

The next stage maps the results of the Naive Bayes algorithm to the user input. The crops that match the location are marked, the profit for each is calculated to find the best result, and the user is told which crop would yield the maximum profit.

[FIGURE 1 OMITTED]

Naive Bayes Algorithm:

Naive Bayes classifiers are linear classifiers that are known for being simple yet very efficient. The probabilistic model of naive Bayes classifiers is based on Bayes' theorem, and the adjective naive comes from the assumption that the features in a dataset are mutually independent. In practice, the independence assumption is often violated, but naive Bayes classifiers still tend to perform very well under this unrealistic assumption. Especially for small sample sizes, naive Bayes classifiers can outperform the more powerful alternatives.
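The independence assumption described above can be written out as a short worked formula. This is the standard textbook form rather than notation taken from this paper; the symbols $c$ (a class, for example a crop), $x_1, \dots, x_n$ (feature values such as soil type or rainfall level), and $\hat{c}$ (the predicted class) are introduced here only for illustration.

```latex
% Bayes' theorem applied to a class c and features x_1, ..., x_n
P(c \mid x_1, \dots, x_n) = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}

% Naive assumption: features are conditionally independent given the class
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)

% Resulting decision rule (the denominator does not depend on c)
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```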

Being relatively robust, easy to implement, fast, and accurate, naive Bayes classifiers are used in many different fields. Some examples include the diagnosis of diseases and making decisions about treatment processes, the classification of RNA sequences in taxonomic studies, and spam filtering in e-mail clients. However, strong violations of the independence assumptions and non-linear classification problems can lead to very poor performance of naive Bayes classifiers. We have to keep in mind that the type of data and the type of problem to be solved dictate which classification model we want to choose. In practice, it is always recommended to compare different classification models on the particular dataset and to consider the prediction performance as well as the computational efficiency.
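To make the classifier concrete, the following is a minimal sketch of a categorical naive Bayes model written from scratch. It is not the Weka configuration used in this work: the function names, the Laplace smoothing parameter `alpha`, and the toy soil/rainfall records are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies (alpha = Laplace smoothing)."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values, alpha


def predict_proba(model, row):
    """Return normalized posterior probabilities for each class."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            numerator = feature_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)  # log likelihood
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


# Hypothetical toy records: (soil type, rainfall level) -> crop.
rows = [("clay", "high"), ("clay", "high"), ("sandy", "low"),
        ("sandy", "low"), ("clay", "low")]
labels = ["rice", "rice", "millet", "millet", "millet"]

model = train_naive_bayes(rows, labels)
probs = predict_proba(model, ("clay", "high"))
best_crop = max(probs, key=probs.get)
```

Working in log space avoids underflow when many features are multiplied together, and the smoothing term keeps a value that never co-occurred with a class from forcing that class's probability to zero.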

Formal Concept Analysis:

Formal concept analysis (FCA) is a method of data analysis with growing popularity across various domains. FCA analyzes data which describe the relationship between a particular set of objects and a particular set of attributes. Such data commonly appear in many areas of human activity. FCA produces two kinds of output from the input data. The first is a concept lattice. A concept lattice is a collection of formal concepts in the data which are hierarchically ordered by a subconcept-superconcept relation. Formal concepts are particular clusters which represent natural human-like concepts such as "organism living in water", "car with all-wheel drive system", "number divisible by 3 and 4", etc. The second output of FCA is a collection of so-called attribute implications. An attribute implication describes a particular dependency which is valid in the data such as "every number divisible by 3 and 4 is divisible by 6", "every respondent with age over 60 is retired", etc. A distinguishing feature of FCA is an inherent integration of three components of conceptual processing of data and knowledge, namely, the discovery and reasoning with concepts in data, discovery and reasoning with dependencies in data, and visualization of data, concepts, and dependencies with folding/unfolding capabilities.

Integration of these components makes FCA a powerful tool which has been applied to various problems. Examples include hierarchical organization of web search results into concepts based on common topics, gene expression data analysis, information retrieval, analysis and understanding of software code, debugging, data mining, and design in software engineering, internet applications including analysis and organization of documents and e-mail collections, annotated taxonomies, and further various data analysis projects described in the literature.

A table with logical attributes can be represented by a triplet $(X, Y, I)$, where $I$ is a binary relation between $X$ and $Y$. Elements of $X$ are called objects and correspond to table rows, elements of $Y$ are called attributes and correspond to table columns, and for $x \in X$ and $y \in Y$, $(x, y) \in I$ indicates that object $x$ has attribute $y$, while $(x, y) \notin I$ indicates that $x$ does not have $y$. For instance, Fig. 1 depicts a table with logical attributes. The corresponding triplet $(X, Y, I)$ is given by $X = \{x_1, x_2, x_3, x_4\}$, $Y = \{y_1, y_2, y_3\}$, and we have $(x_1, y_1) \in I$, $(x_2, y_3) \notin I$, etc.

| | $y_1$ | $y_2$ | $y_3$ | ... |
|---|---|---|---|---|
| $x_1$ | × | × | × | ... |
| $x_2$ | × | × | | ... |
| $x_3$ | | × | × | ... |
| ... | | | | |

| | $y_1$ | $y_2$ | $y_3$ | ... |
|---|---|---|---|---|
| $x_1$ | 1 | 1 | 0.7 | ... |
| $x_2$ | 0.8 | 0.6 | 0.1 | ... |
| $x_3$ | 0 | 0.9 | 0.9 | ... |
| ... | | | | |

Figure 1: Tables with logical attributes: crisp attributes (top), fuzzy attributes (bottom).

Since representing tables with logical attributes by triplets is common in FCA, we say just "table $(X, Y, I)$" instead of "triplet $(X, Y, I)$ representing a given table". FCA aims at obtaining two outputs out of a given table. The first one, called a concept lattice, is a partially ordered collection of particular clusters of objects and attributes. The second one consists of formulas, called attribute implications, describing particular attribute dependencies which are true in the table.

The clusters, called formal concepts, are pairs $(A, B)$ where $A \subseteq X$ is a set of objects and $B \subseteq Y$ is a set of attributes such that $A$ is the set of all objects which have all attributes from $B$, and $B$ is the set of all attributes which are common to all objects from $A$. For instance, $(\{x_1, x_2\}, \{y_1, y_2\})$ and $(\{x_1, x_2, x_3\}, \{y_2\})$ are examples of formal concepts of the (visible part of the) table in Fig. 1. An attribute implication is an expression $A \Rightarrow B$ with $A$ and $B$ being sets of attributes. $A \Rightarrow B$ is true in table $(X, Y, I)$ if each object having all attributes from $A$ has all attributes from $B$ as well. For instance, $\{y_3\} \Rightarrow \{y_2\}$ is true in the (visible part of the) table in Fig. 1, while $\{y_1, y_2\} \Rightarrow \{y_3\}$ is not ($x_2$ serves as a counterexample).
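The definitions above can be checked mechanically on the visible part of the crisp table in Fig. 1. The sketch below is a brute-force illustration, not an efficient lattice-construction algorithm; the names `context`, `extent`, `intent`, `formal_concepts`, and `implication_holds` are chosen here for illustration, and the table entries are the ones implied by the examples in the text.

```python
from itertools import combinations

# Visible part of the crisp table in Fig. 1: object -> attributes it has.
context = {
    "x1": {"y1", "y2", "y3"},
    "x2": {"y1", "y2"},
    "x3": {"y2", "y3"},
}
attributes = {"y1", "y2", "y3"}


def extent(attrs):
    """All objects that have every attribute in attrs."""
    return {x for x, ys in context.items() if attrs <= ys}


def intent(objs):
    """All attributes shared by every object in objs."""
    shared = set(attributes)
    for x in objs:
        shared &= context[x]
    return shared


def formal_concepts():
    """Enumerate concepts by closing every subset of attributes."""
    concepts = set()
    for size in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), size):
            a = extent(set(attrs))
            b = intent(a)
            concepts.add((frozenset(a), frozenset(b)))
    return concepts


def implication_holds(premise, conclusion):
    """True if every object having all of premise also has all of conclusion."""
    return all(conclusion <= ys for ys in context.values() if premise <= ys)


concepts = formal_concepts()
```

Enumerating all attribute subsets is exponential in the number of attributes, so this approach is only suitable for small tables like the one shown here.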

Experimental Results:

Datasets are collected from various sources available on the internet and analyzed to understand what data they provide. All unwanted, missing, and noisy data are removed, and the cleaned dataset is used for further processing. A threshold value is selected for classifying the preprocessed data, which is given as input to the Naive Bayes algorithm. The algorithm classifies the data based on probability values, and only the values above the threshold are considered for further processing. Input is obtained from the user through a user interface, where the user fills in fields such as investment capability, soil type, rainfall, etc. This input is passed to the formal concept analysis, which is used here as a matrix mapping technique. The two entities passed as input to FCA are the output of the Naive Bayes algorithm and the data received from the user. Both entities are mapped against each other, the best results of the formal concept analysis are chosen, and the crops that produce maximum profit are found. These results are further analyzed using the F-measure for recommendation to users.
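The flow described above (threshold filtering, matching against user input, then ranking by profit) can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the implementation used in this work: the matching step is reduced to a simple attribute-subset check, profit is assumed to scale linearly with area, and every name, probability, requirement set, and profit figure below is hypothetical.

```python
def recommend(nb_probabilities, crop_requirements, user_attributes,
              profit_per_acre, area_acres, threshold=0.5):
    """Return matching crops ordered from highest to lowest estimated profit."""
    # Stage 1: keep crops whose naive Bayes probability is above the threshold.
    candidates = [crop for crop, p in nb_probabilities.items() if p > threshold]
    # Stage 2: keep crops whose required attributes all appear in the user input.
    matched = [crop for crop in candidates
               if crop_requirements[crop] <= user_attributes]
    # Rank the remaining crops by estimated profit (assumed linear in area).
    return sorted(matched,
                  key=lambda crop: profit_per_acre[crop] * area_acres,
                  reverse=True)


# Hypothetical inputs for illustration only.
nb_probabilities = {"rice": 0.82, "sugarcane": 0.70, "millet": 0.55, "cotton": 0.30}
crop_requirements = {
    "rice": {"clay", "high_rainfall"},
    "sugarcane": {"high_rainfall"},
    "millet": {"low_rainfall"},
    "cotton": {"black_soil"},
}
user_attributes = {"clay", "high_rainfall"}
profit_per_acre = {"rice": 300, "sugarcane": 450, "millet": 200, "cotton": 500}

ranking = recommend(nb_probabilities, crop_requirements, user_attributes,
                    profit_per_acre, area_acres=2.0)
```

With these made-up inputs, cotton is dropped by the threshold, millet is dropped by the matching step, and the two remaining crops are ordered by estimated profit.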

Related Work:

The existing system classifies only the soil, and it uses a Genetic Algorithm. With Genetic Algorithms, a binary decision tree that explains the prediction of the soil data category is built first. Generating the decision tree for the dataset takes considerable time, and the rules are then classified with that decision tree. Genetic Algorithms are also not suitable for handling imprecision or uncertainty in soil data.

Proposed Work:

Naive Bayes and Formal Concept Analysis are used in the proposed system. Datasets collected over previous years are used to predict the future. Naive Bayes makes use of probability to arrive at its results and handles both real-valued and discrete data well. Formal Concept Analysis is used to map the results of Naive Bayes to the farmer's requirements.

Experimental Results:

Table:

| | Genetic Algorithm | Naive Bayes and FCA |
|---|---|---|
| Recall | 0.8701 | 0.8403 |
| Precision | 0.6912 | 0.9318 |
| F-Measure | 0.7696 | 0.8838 |
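For readers who want to relate the three rows of the table, the sketch below computes the F-measure from precision and recall, assuming the balanced form F = 2PR / (P + R). The paper does not state which variant of the F-measure was used, so this is an assumption; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the table.
f_genetic = f_measure(precision=0.6912, recall=0.8701)
f_nb_fca = f_measure(precision=0.9318, recall=0.8403)
```

Under this assumption the computed values come out close to the reported F-measures, with small differences that may be due to rounding of the reported precision and recall.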

Graph:

[Bar graph omitted: comparison of Genetic Algorithm and Naive Bayes & FCA on recall, precision, and F-measure, using the values in the table above.]

Conclusion:

Data mining is the analysis step of the "knowledge discovery in databases" process; it means extracting not raw data but knowledge, and discovering patterns from data. Agriculture is one field that needs to make use of this knowledge. A pattern is analyzed from large data sets, and the current production is predicted based on probability using the Naive Bayes algorithm. Formal Concept Analysis is then used to map the output of the Naive Bayes algorithm to the user input, so that both the future prediction and the user requirements are taken into account. Further, the profit from that prediction is approximately calculated and a recommendation is provided. This recommendation creates an opportunity for farmers to make effective use of technology for future prediction. Thus two algorithms are employed to provide recommendations in the field of agriculture.

REFERENCES

[1.] Kushawa, A.K. and B. Sweta, 2015. "Crop yielding prediction using agro algorithm in Hadoop", IRACST, 5: 2.

[2.] Huang, Y. and L. Bian, 2015. "Using Ontologies and Formal Concept Analysis to Integrate Heterogeneous Tourism Information".

[3.] Bhargavi, P., 2012. "Comparative Study of Naive Bayes with Genetic Algorithm for Soil Dataset", IJCAET, 1: 3.

[4.] Ramesk, V. and K. Ramar, 2011. "Classification of Agriculture Land Soil: A data mining Approach", Agricultural Journal.

[5.] Yethiraj, N.G., 2012. "Applying data mining techniques in the field of agriculture and allied sciences", International Journal of Business Intelligents, ISSN: 2278-2400, 01: 02.

[6.] Minaei-Bidgoli, B. and W. Punch, 2003. "Using Genetic Algorithms for Data Mining Optimization in an Educational Web-based System", Genetic and Evolutionary Computation, pp: 2252-2263.

[7.] Poelmans, J., D.I. Ignatov, S.O. Kuznetsov and G. Dedene, 2013. "Formal concept analysis in knowledge processing: A survey on applications", Expert Syst. 40(16): 6530-6560.

[8.] Nesbit, W.R., 1973. "The art of forecasting domestic air travel: A survey assessment and overview", in Proc. 4th Annu. Conf. Travel Res. Assoc., pp: 285-290.

[9.] Shanwad, U.K., V.C. Patil and H. Honne Gowda, 2014. "Precision Farming: Dreams and Realities for Indian Agriculture", Map India.

[10.] Takoi, K. Hamrita, Jeffrey S. Durrence, George Vellidis, 2009. "Precision Farming Practices", IEEE Industry Applications Magazine.

(1) J. S. Kanchana and (2) S. Sujatha

(1) Department of Information Technology, K.L.N College Of Engineering, Pottapalayam

(2) Asst. Prof, MCA Dept, Anna University Regional Office, BIT Campus, Tiruchirapalli

Received 27 May 2016; Accepted 28 June 2016; Available 12 July 2016

Address For Correspondence:

J. S. Kanchana, Department of Information Technology, K.L.N College Of Engineering, Pottapalayam

Fig. 1: Applying FCA on a crisp set

| | $y_1$ | $y_2$ | $y_3$ |
|---|---|---|---|
| $x_1$ | × | × | × |
| $x_2$ | × | × | |
| $x_3$ | | × | × |


Author: Kanchana, J.S.; Sujatha, S.

Publication: Advances in Natural and Applied Sciences

Date: Jun 30, 2016
