1.0 INTRODUCTION

For a long time, computer scientists have struggled with two questions: can computers truly learn to perform a task from examples or previously solved tasks, and can they improve significantly on the basis of past mistakes? Machine learning research began in order to answer these questions, and it is now working towards making this a reality.

For a computer or computer-controlled robot to perform a task, traditional programming demands that a programmer write a correct algorithm for the task and then implement that algorithm on the computer using a programming language. This process is usually tedious and time-consuming, and it is best done by trained personnel [7]. Machine learning promises to reduce the burden of hand programming. According to Tom Mitchell, machine learning “is concerned with the question of how to construct computer programs that automatically improve with experience” [3].
This paper therefore looks to understand what machine learning is and how it can improve software testing, particularly the testing method known as “fuzzing”, which consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code [5].

2.0 RELATED LITERATURE/WORKS

· Thomas J. Cheatham wrote a paper on the use of machine learning techniques to identify the attributes that are important in predicting software testing costs and software testing time in a particular company.
· Du Zhang and Jeffrey J. P. Tsai examined the possibility of applying machine learning in software engineering. Their paper describes the characteristics and applicability of some frequently used machine learning algorithms and offers guidelines on applying machine learning methods to software engineering tasks.
· In 2017, William Blum, Rishabh Singh, and Mohit Rajpal, all Microsoft researchers, began a research project looking at ways to improve fuzzing techniques using machine learning and deep neural networks. They wanted to see what a machine learning model could learn if a deep neural network were inserted into the feedback loop of a greybox fuzzer.
· Patrice Godefroid, Hila Peleg, and Rishabh Singh, in their paper “Learn&Fuzz: Machine Learning for Input Fuzzing”, show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. They present a detailed case study with a complex input format, namely PDF, and a large, complex, security-critical parser for this format, namely the PDF parser embedded in Microsoft’s new Edge browser. They also present a new algorithm for this learn&fuzz challenge, which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.

3.0 SUMMARY OF FINDINGS FROM LITERATURE

Tom Mitchell states in his book “Machine Learning” [3]:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

For example, a computer program that learns to play checkers might improve its performance, as measured by its ability to win at the class of tasks involving playing checkers games, through experience obtained by playing games against itself. In general, a well-defined learning problem involves these three features: the class of tasks, the measure of performance to be improved, and the source of experience.
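The three features of a learning problem can be made concrete with a small sketch. The example below is purely illustrative and is not taken from Mitchell's book: the task T is deciding whether a number lies at or above a hidden threshold, the performance measure P is accuracy on a fixed test set, and the experience E is a growing pool of labelled examples. All names (`learn_threshold`, `accuracy`, `make_examples`, `run_experiment`) and the threshold value 0.73 are assumptions chosen for the illustration.

```python
import random


def learn_threshold(examples):
    """Estimate a decision threshold from labelled (x, label) pairs (experience E)."""
    positives = [x for x, label in examples if label]
    negatives = [x for x, label in examples if not label]
    if not positives or not negatives:
        return 0.5  # no usable experience yet: fall back to a fixed guess
    # Place the threshold midway between the largest negative and smallest positive.
    return (max(negatives) + min(positives)) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((x >= threshold) == label for x, label in test_set)
    return correct / len(test_set)


def make_examples(rng, n, true_threshold=0.73):
    """Draw n points in [0, 1); the label says whether x is at or above the hidden threshold."""
    points = [rng.random() for _ in range(n)]
    return [(x, x >= true_threshold) for x in points]


def run_experiment(seed=0):
    """Measure P on task T after 0, 5, 50 and 500 examples of experience E."""
    rng = random.Random(seed)
    test_set = make_examples(rng, 2000)
    experience = []
    results = []
    for size in (0, 5, 50, 500):
        # Grow the pool of experience up to the requested size.
        experience.extend(make_examples(rng, size - len(experience)))
        results.append((size, accuracy(learn_threshold(experience), test_set)))
    return results


if __name__ == "__main__":
    for size, score in run_experiment():
        print(f"examples seen: {size:4d}   accuracy: {score:.3f}")
```

In this sketch the program "learns" in Mitchell's sense only if the accuracy measured on the test set rises as more labelled examples are supplied; the estimate is not guaranteed to improve at every single step, only in the long run.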
The emergence of machine learning was the result of two significant developments. The first was the realization by Arthur Samuel in 1959 that, rather than teaching computers everything they need to know about the world and how to carry out tasks, it might be possible to teach them to learn for themselves. The second was the emergence of the internet and the explosive increase in the amount of digital information available for analysis.

To better understand machine learning, it helps to consider its role within the following three niches in the software world, as stated by Tom Mitchell [2]:

a. Data mining: the use of older data, saved over time, to improve subsequent decision making.
b. Difficult-to-program applications: machine-learning algorithms can play an essential role in developing applications that have proven too difficult for traditional programming, such as face recognition and speech understanding.
c. Customized software applications: in computer applications such as online news browsers and personal calendars, it is preferable for the system to customize itself automatically to the needs of different users after it has been developed.

3.1 Artificial Intelligence, Machine Learning and Deep Learning

Artificial intelligence, machine learning and deep learning are three terms often used interchangeably, which makes the differences between them somewhat unclear.
The simplest way to understand their relationship is to imagine three concentric circles: AI is the outermost circle, machine learning, a subset of AI, sits inside it, and deep learning, an approach within machine learning, fits inside both. Calum McClelland differentiates the three as follows [12]:

· Artificial Intelligence: a term first coined by John McCarthy in 1956. “AI involves machines that can perform tasks that are characteristic of human intelligence”.
· Machine Learning: simply a way of achieving AI. The term was coined by Arthur Samuel in 1959, not long after AI, and he defined it as “the ability to learn without being explicitly programmed.”
· Deep Learning: one of many approaches to machine learning.
3.2 Real Life Applications

Some real-life examples of the use of machine learning:

i. Learning to recognize spoken words: The SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for recognizing the primitive sounds (phonemes) and words from the observed speech signal.
ii. Learning to drive an autonomous vehicle: The ALVINN system (Pomerleau 1989) has used its learned strategies to drive unassisted at 70 miles per hour for 90 miles on public highways among other cars.
iii. Learning to classify new astronomical structures: Decision tree learning algorithms have been used by NASA to learn how to classify celestial objects from the second Palomar Observatory Sky Survey (Fayyad et al. 1995).
iv. Learning to play world-class backgammon: The world’s top computer program for backgammon, TD-GAMMON (Tesauro 1992, 1995), learned its strategy by playing over one million practice games against itself. It now plays at a level competitive with the human world champion.
v. Software testing: Microsoft has released a tool called Microsoft Security Risk Detection, which makes use of fuzz testing, or fuzzing. It significantly simplifies security testing and does not require the user to be a security expert in order to root out software bugs.

3.3 Fuzzing it with Machine Learning

Software testing has always been a tedious yet important part of the software development cycle, and fuzz testing is one of the most widely used automated software testing techniques. Fuzzing is done by presenting a target program with crafted malicious input designed to trigger unexpected behaviors such as crashes, buffer overflows, memory errors, and exceptions.
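The basic loop described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation of any real fuzzer: `parse_record` is a hypothetical target with a deliberately planted bug, and `mutate` and `fuzz` are illustrative names. The fuzzer repeatedly overwrites one random byte of a valid sample input, runs the target, and records every input that raises an unexpected exception.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical target: parses a record made of a length byte plus a payload."""
    if len(data) < 2:
        raise ValueError("record too short")  # documented rejection, not a bug
    length = data[0]
    payload = data[1:]
    # Planted bug: the declared length is trusted without being checked
    # against the actual payload size.
    return payload[length - 1]


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed_input: bytes, iterations: int = 500, seed: int = 0):
    """Run the target on mutated inputs and collect those that fail unexpectedly."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            continue  # the target rejected malformed input as designed
        except Exception as exc:  # any other exception counts as a finding
            crashes.append((candidate, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    findings = fuzz(parse_record, b"\x03abc")
    print(f"{len(findings)} crashing inputs found")
    if findings:
        print("example:", findings[0])
```

This sketch uses no feedback from the target at all: every mutation is drawn blindly from the same sample input. That keeps the example short, and it is also the reason such a fuzzer wastes most of its effort on bytes that cannot trigger the bug.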
The current fuzzing techniques can be broadly divided into three main categories:

i) Blackbox fuzzers rely solely on the sample input files to generate new inputs.
ii) Whitebox fuzzers analyze the target program, either statically or dynamically, to guide the search for new inputs aimed at exploring as many code paths as possible.
iii) Greybox fuzzers make use of a feedback loop to guide their search based on observed behavior from previous executions of the program [6].

Neural networks can then be trained to learn patterns in the input files from previous fuzzing explorations in order to guide future fuzzing explorations.

4.0 FUTURE RESEARCH/DEVELOPMENT PROPOSITIONS

The neural fuzzing research project carried out by Microsoft only scratches the surface of what can be achieved using deep neural networks for fuzzing. For now, the model only learns fuzzing locations, but it could also be used to learn other fuzzing parameters, such as the type of mutation or the strategy to apply.

The possibility of developing computer programs that improve with experience can lead to software that is developed with greater ease yet is able to optimize itself over time.

5.0 CONCLUDING REMARKS

The emergence of the internet and the explosion in available data that followed have greatly helped the development of machine learning. With new data being generated daily, machine learning still has a long way to go in its development, and it can be incorporated more fully into the field of software development. Because machine learning is a subset of artificial intelligence, its growth will contribute to solving the problem AI aims to solve: making computer programs that can be considered smart, able to perform tasks that are characteristic of human intelligence.