In this paper, we show how deep learning can be used for cancer detection and cancer type analysis, and that it is superior to many previously studied machine learning methods such as Artificial Neural Networks, Bayesian Networks, Support Vector Machines, and Decision Trees.
The technique is here applied to the classification of cancer types. In this domain we show that the performance of this method is better than that of previous methods, promising a more comprehensive and generic approach for cancer detection and diagnosis. We present a framework covering various cancer types and the classification methods applied to them. The size of the dataset was found to be a factor in determining the cancer. Theoretically, the deep learning method for classifying cancer was found to be better than older classification methods.

Chapter 1 Introduction
Machine learning, the ability of machines to learn without being explicitly programmed, has proved to be a promising part of artificial intelligence. Because of new computing technologies, machine learning today is not like the machine learning of the past. It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see whether computers could learn from data. The iterative aspect of machine learning is important because, as models are exposed to new data, they are able to adapt independently. They learn from previous computations to produce reliable, repeatable decisions and results. It is a science that is not new, but one that has gained fresh momentum.
Artificial intelligence has been at the root of solving problems in many fields such as economics, robotics, linguistics, and medical diagnosis. Machine learning in cancer research dates back to the 20th century. Machine learning can be supervised, unsupervised, or semi-supervised, the latter proving to be the most useful. New methods consisting of modified algorithms are being implemented for predicting and treating cancer. Efficiency is essential when predicting cancer because even one life matters: only an accuracy of 100% would mean that not even one prediction was wrong. So the search for the perfect method of predicting cancer continues. The Artificial Neural Network (ANN) has been the gold standard; the Bayesian Network (BN) has proved good at predicting certain cancers such as colon cancer; Decision Trees (DT) have been an efficient tool; and deep learning is the promising method that has been researched only recently and has not yet been covered in many review papers. It uses a variety of optimization techniques that permit it to learn from past training and detect complex patterns in large and complex data sets.
Since cancer prediction needs a large dataset for training and testing, deep learning could be more efficient and, in comparison to the other methods, could be the best. Cancer is the general name for a group of more than 100 diseases. Although cancer includes different types of diseases, they all start because abnormal cells grow out of control. Without treatment, cancer can cause serious health problems and even loss of life. Early detection of cancer may reduce mortality and morbidity. AI techniques are approaches used to produce and develop computer software programs. AI is an application that can re-create human perception. It normally requires input to endow the system with analysis or problem-solving ability, as well as the ability to categorize and identify objects.
This paper describes various AI techniques, such as the support vector machine (SVM), neural networks, fuzzy models, the artificial neural network (ANN), and the K-nearest neighbor method (K-NN). Feedforward neural networks capable of classifying cancer cases with a high accuracy rate have become an effective tool. Computation time is fixed, and extremely high computation speed results from the parallel structure. Moreover, the approach is fault-tolerant because of the distributed nature of network knowledge. General solutions can be learned from presented training data. Neural networks eliminate the requirement to produce an explicit model of a process, and they can easily model parts of a process that cannot otherwise be modeled or are usually unidentified.
A neural network can also learn from incomplete and noisy data.

Chapter 2 Background

The various machine learning methods discussed in the referenced papers are summarized below.

Purely supervised learning algorithms:
1. Logistic Regression (using Theano for something simple)
2. Multilayer Perceptron (an introduction to layers)
3. Deep Convolutional Network (a simplified version of LeNet5)

Unsupervised and semi-supervised learning algorithms:
• Autoencoders and Denoising Autoencoders
• Stacked Denoising Autoencoders (easy steps into unsupervised pre-training for deep nets)
• Restricted Boltzmann Machines (single-layer generative RBM model)
• Deep Belief Networks (unsupervised generative pre-training of stacked RBMs followed by supervised fine-tuning)

All of these will be discussed ahead.
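As a taste of the first item in the list, logistic regression can be sketched without any framework. This is a minimal pure-Python illustration on made-up toy data, not the Theano implementation the tutorial items refer to:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weights w and bias b by stochastic gradient descent on the log-loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Toy, linearly separable data: label 1 only when both features are 1 (AND).
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [0, 0, 0, 1]
w, b = train_logistic(xs, ys)
preds = [int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5) for x in xs]
print(preds)  # matches ys on this toy set
```

The same gradient-descent pattern underlies the deeper models in the list; they differ mainly in how many layers of transformation sit between input and loss.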
2.1 Methods

Artificial Neural Network (ANN)

a. Input layer
The input layer receives the values of the explanatory attributes for each observation. Usually the number of input nodes equals the number of explanatory variables. The patterns are introduced to the network through this layer and communicated to one or more hidden layers. Nodes of this layer do not change the data: each receives a single value on its input and duplicates that value to its many outputs, the hidden nodes.

b. Hidden layer
The hidden layers, which can be many in number, apply given transformations to the input values inside the network. Each hidden node connects with outgoing arcs to output nodes or to other hidden nodes. Here the actual processing is done via a system of weighted connections: the values entering a hidden node are multiplied by weights, a set of predetermined numbers stored in the program, and the weighted inputs are then added to produce a single number.

c. Output layer
Output layers are linked from the hidden layers. They receive connections from hidden layers or from the input layer and return an output value that corresponds to the prediction of the response variable. In classification problems there is usually only one output node. Data is changed in this layer of the network. The ability of the neural network to provide useful data manipulation lies in the proper selection of the weights.
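A minimal sketch of the three layers just described, assuming made-up weights and a sigmoid activation (purely illustrative, not the networks used in the experiments):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_w, output_w):
    """One forward pass: the input layer copies values unchanged; hidden and
    output nodes each take a weighted sum of their inputs, then apply a sigmoid."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in hidden_w]
    output = [sigmoid(sum(w * hi for w, hi in zip(ws, hidden))) for ws in output_w]
    return output

# Hypothetical weights: 2 input nodes -> 3 hidden nodes -> 1 output node.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
output_w = [[1.0, -1.0, 0.5]]
print(forward([0.7, 0.1], hidden_w, output_w))  # a single value in (0, 1)
```

Training consists of adjusting `hidden_w` and `output_w` so that the output matches the known class labels, which is what the selection of the weights mentioned above refers to.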
Some sub-methods of ANN:

a) Fuzzy neural network: A neuro-fuzzy system is represented as a special three-layer feedforward neural network. The first layer corresponds to the input variables, the second layer symbolizes the fuzzy rules, and the third layer represents the output variables. The fuzzy sets are encoded as (fuzzy) connection weights.

b) k-NN: In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.

c) Multilayer perceptron (MLP): A multilayer perceptron is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes; except for the input nodes, each node is a neuron that uses a nonlinear activation function.

d) Self-organizing map: A self-organizing map (SOM), or self-organizing feature map (SOFM), is a type of artificial neural network trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map; it is therefore a method of dimensionality reduction.

Support Vector Machine (SVM): A Support Vector Machine is a discriminative classifier formally defined by a separating hyperplane.
In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

Hybrid network k-SVM: It uses John Platt's SMO algorithm for solving the SVM QP problem in most SVM formulations. For the spoc-svc, kbb-svc, C-bsvc and eps-bsvr formulations, a chunking algorithm based on the TRON QP solver is used. For multiclass classification with k classes, k > 2, ksvm uses the one-against-one approach, in which k(k-1)/2 binary classifiers are trained; the appropriate class is found by a voting scheme. The spoc-svc and kbb-svc formulations deal with multiclass classification by solving a single quadratic problem involving all the classes. If the predictor variables include factors, the formula interface must be used to get a correct model matrix.
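The k-NN rule described earlier can be sketched in a few lines. This is a toy illustration with hypothetical 2-D points and labels, not the actual cancer features used later in the paper:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points
    (Euclidean distance in feature space)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors with made-up labels.
train = [((1.0, 1.0), 'benign'), ((1.2, 0.9), 'benign'),
         ((3.0, 3.1), 'malignant'), ((3.2, 2.9), 'malignant'),
         ((2.9, 3.3), 'malignant')]
print(knn_predict(train, (3.0, 3.0)))  # -> 'malignant'
print(knn_predict(train, (1.1, 1.0)))  # -> 'benign'
```

For regression, the majority vote would simply be replaced by the average of the k neighbors' target values.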
Decision Tree: A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.

C4.5/J48: C4.5 is an algorithm used to generate a decision tree, developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier.

Bayesian network (BN): A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph. Suppose that there are two events which could cause grass to be wet: either the sprinkler is on or it is raining. Also, suppose that the rain has a direct effect on the use of the sprinkler (namely, that when it rains, the sprinkler is usually not turned on). Then the situation can be modeled with a Bayesian network. All three variables have two possible values, T (for true) and F (for false).

Deep learning: Deep learning is a new area of machine learning research, introduced with the objective of moving machine learning closer to one of its original goals: artificial intelligence.
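The sprinkler network described above can be queried by brute-force enumeration of the joint distribution. The conditional probability values below are illustrative assumptions, not numbers from this paper:

```python
from itertools import product

# Illustrative CPTs (assumed values for the sketch, not from the paper):
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(Sprinkler | Rain=T)
               False: {True: 0.4, False: 0.6}}    # P(Sprinkler | Rain=F)
P_grass = {(True, True): 0.99, (True, False): 0.9,
           (False, True): 0.8, (False, False): 0.0}  # P(Grass=T | Sprinkler, Rain)

def joint(s, r, g):
    """Joint probability, factored along the DAG: P(R) * P(S|R) * P(G|S,R)."""
    pg = P_grass[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * (pg if g else 1.0 - pg)

# P(Rain=T | Grass=T), summing out the hidden sprinkler variable.
num = sum(joint(s, True, True) for s in (True, False))
den = sum(joint(s, r, True) for s, r in product((True, False), repeat=2))
print(round(num / den, 4))  # -> 0.3577
```

Enumeration like this is exponential in the number of variables; real Bayesian network tools use smarter inference, but the semantics are exactly this factored joint distribution.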
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

Performance measures used:
• Sensitivity = tp / (tp + fn)
• Specificity = tn / (fp + tn)
• Accuracy = (tp + tn) / (tp + fp + fn + tn)
• Area under the curve
where tp, fp, fn, and tn are the true positive, false positive, false negative, and true negative counts.

Tools used:
a) Datasets from the UCI repository: the Wisconsin original breast cancer dataset, the breast cancer dataset (long), the colon cancer dataset, and the lung cancer dataset; all are processed datasets, but the coding was changed (in Java) before running them through Weka.
b) The WEKA application software: Weka contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities in C, and a Makefile-based system for running machine learning experiments.
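The first three performance measures listed above can be computed directly from the confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts,
    as defined in the list of performance measures above."""
    return {
        'sensitivity': tp / (tp + fn),
        'specificity': tn / (fp + tn),
        'accuracy': (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for illustration only:
m = metrics(tp=90, fp=10, fn=5, tn=95)
print(m)  # accuracy = 0.925; sensitivity ~ 0.947; specificity ~ 0.905
```

Note that accuracy needs the parentheses around both the numerator and the denominator; writing it as tp + tn / tp + fp + fn + tn, as plain text sometimes suggests, computes something entirely different.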
This original version was primarily designed as a tool for analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research. Advantages of Weka include:
a) Free availability under the GNU General Public License.
b) Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
c) A comprehensive collection of data preprocessing and modeling techniques.
d) Ease of use due to its graphical user interfaces.

Chapter 3 Literature Review

3.1 Classifying tumor cells
Physicians can benefit from abstract tumor attributes by better understanding the properties of different types of tumors [2]. Different kinds of machine learning and statistical approaches are used to classify tumor cells [14].
Hybrid methods have proved to be very accurate. The K-SVM methodology, a hybrid of ANN and SVM, improves the accuracy to 97.38% when tested on the Wisconsin Diagnostic Breast Cancer (WDBC) data set from the University of California, Irvine machine learning repository. The results show diagnostic capability and time saving during the training phase [2].

3.2 Various techniques used for prediction
According to the better designed and validated studies, machine learning methods have proved to substantially (by 15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality [27]. Even though some progress has been achieved, many challenges and directions for further research remain, such as developing better classification algorithms and integrating classifiers to reduce false positives [1].
Using automated computer tools, and in particular machine learning, to facilitate and enhance medical analysis and diagnosis is a promising and important area [3]. One review article surveyed the applications, opportunities and barriers of intelligent data analysis as an approach to improving cancer care management. It showed that Intelligent Data Analysis (IDA) has a significant role in improving cancer care and prevention, increasing the speed and accuracy of diagnosis and treatment, and reducing costs, proving in every way that machine learning is a promising way of detecting cancer. It has been noticed that different methods provide high accuracies for different types of cancer. Neural networks are currently the most active research area in medical science, especially in cardiology, radiology, oncology and urology. Classification between normal, abnormal and cancerous cells by an artificial neural network produces more accurate results than manual screening methods such as the Pap smear and the liquid-based cytology (LBC) test [14].
But the ANN mentioned is an older technique, and better ML techniques are available [10]. A growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on older techniques such as artificial neural networks (ANNs) and support vector machines (SVMs), instead of more recently developed or more easily interpretable machine learning methods, can be noticed [27]. The motivations for using ensemble classifiers are that the results are less dependent on peculiarities of a single training set, and that the ensemble system outperforms the best base classifier in the ensemble [26]. Results of neural network structures can be enhanced by proper settings of the neural network parameters. Although neural network techniques provide a good classification rate, their training time is very high. Several researchers therefore hybridize neural network techniques with optimization algorithms such as PSO for further enhancement of accuracy.
The optimization algorithms are used for dimensionality reduction; they shrink the search space and therefore reduce the training time of the neural network. FLANN alone shows 63.4% accuracy, whereas PSO-FLANN provides a good classification rate of 92.36%. In future studies, the accuracy of a neural network can be enhanced by increasing the number of neurons in the hidden layer, and different training and learning rules can be applied for training the ANN in order to improve the performance of the classifier [14]. Results in paper [13] show that with an increased number of training samples, the false positive and false negative rates decrease, and the authors further propose increasing the number of patients tested for the implementation of their proposed method, GONN. In one paper [6], SSL proved to be the best among ANN and SVM, and the differences in performance were statistically significant [6].
3.3 Gap and superior method
Deep learning, specifically the deep neural network, has been considered a superior method by several researchers. DeeperBind (mentioned in [20]) is an application using deep learning that can model the positional dynamics of probe sequences; it can be trained and tested on datasets containing sequences of different lengths. It has been claimed to be the most accurate pipeline for predicting the binding specificities of DNA sequences from data produced by high-throughput technologies, through utilization of the power of deep learning [23]. The database HGMD, consisting of information about germline mutations in nuclear genes, has been compared with other related databases such as OMIM and ClinVar and was found to be superior [9].
So data could be collected from this database too, though what we used in this paper is from the UCI repository, including the Wisconsin Breast Cancer original dataset. In paper [20], among the four main algorithms SVM, NB, k-NN and C4.5 applied to the Wisconsin Breast Cancer (original) dataset, SVM proved its efficiency in breast cancer prediction and diagnosis and obtained the best performance in terms of precision and low error rate [20]. But deep learning methods were not implemented for that dataset, so we are interested in applying deep learning techniques to the same dataset used in the mentioned paper.

Chapter 4 Methodology

Running methods via Weka
To find out which machine learning method is superior, Weka, a software collection of machine learning algorithms, was used. The targeted methods, SVM (as SMO), ANN (as Multilayer Perceptron), Decision Tree (as Decision Stump), and Bayes Network (specifically NaiveBayes), were run on the breast, colon and lung cancer datasets. The Wisconsin original dataset for breast cancer was run through Weka for checking.
These are some screenshots showing results:
• ANN for the Wisconsin original dataset (shows "Correctly Classified Instances", i.e. an accuracy of 96.2264%)
• SVM for the same dataset
• SVM (SMO) for the colon cancer dataset
• SVM for the lung cancer dataset (having the highest accuracy, i.e. the lowest incorrectness of 4.4235%)

Chapter 5 Results and Discussion

Method                          Paper   Accuracy   Sensitivity   Specificity
Decision Tree                   [5]     0.936      0.958         0.907
Decision Tree                   [17]    0.93       –             –
Artificial Neural Network       [6]     0.65       0.73          0.58
Artificial Neural Network       [17]    0.835      –             –
  Multilayer Perceptron (MLP)   [5]     0.947      0.956         0.928
Support Vector Machine          [1]     0.6456     1             0.6449
Support Vector Machine          [6]     0.51       0.65          0.52
Support Vector Machine          [5]     0.957      0.971         0.945
Support Vector Machine          [17]    0.69       –             –
Support Vector Machine          [17]    0.75       –             –
Bayesian Network                [17]    –          –             –
Deep Learning                   –       0.921      0.887         0.941
Semi-supervised Learning        [6]     0.71       0.76          0.65
  Graph-based                   [17]    0.807      –             –
  Graph-based                   [17]    0.767      –             –

Table: A framework of the accuracy of different cancer prediction techniques. In the original table, a dot against each row marked the category of cancer (colon, breast, oral, basal cell, or lung) that the method was applied to.

For the Wisconsin original dataset for breast cancer, according to the findings of this paper, the highest accuracy could be achieved using ANN. However, with ANN no results could be obtained from huge datasets: in Weka it would take hours or more to produce them.
This shows that ANN is inappropriate for huge datasets. It can also be seen that for long datasets the accuracy is very low: for example, for the colon cancer dataset the highest accuracy obtained was via SMO, at 85.4839%. J48/C4.5, a decision tree method, gives nearly 83%, clearly indicating that we need a better method for long datasets if more accuracy is wanted. The Multilayer Perceptron (MLP) gives 97.1% accuracy, PNN (Probabilistic Neural Network) provides 96%, the Perceptron 93%, and ART1 92% [14]. The "Recall" shown in the screenshots is the sensitivity of each method.

Chapter 6 Conclusion

It can clearly be seen from the screenshots and calculated results above that ANN is comparatively an old approach for most classification purposes (exceptions occur with short datasets).
Support Vector Machines, though, prove to be superior for the colon and lung cancer datasets. Further, deep learning methods can and should be applied, so that better accuracy is achieved through their more sophisticated approach.