For a long time, computer scientists have struggled
with the question: can computers truly learn to perform a task from examples
or previously solved tasks? Can computers improve significantly on the basis of
past mistakes? To answer these questions, “machine learning” research began, and
it is now working towards making this possible in computers.
For a computer or computer-controlled
robot to perform a task, traditional programming demands that a programmer
write a correct algorithm for the task and then implement that
algorithm on the computer using a programming language. This process is usually
tedious and time-consuming, and is best done by trained personnel 7.
Machine learning promises to reduce the burden of hand programming.
According to Tom Mitchell, machine learning
“is concerned with the question of how to construct computer programs that
automatically improve with experience” 3.
This paper therefore seeks to understand what
machine learning is and how it can improve software testing, particularly the
testing method known as “fuzzing”,
which consists of repeatedly testing an application with modified, or fuzzed,
inputs with the goal of finding security vulnerabilities in input-parsing code.
J. Cheatham wrote a paper on the use of Machine Learning techniques to identify
attributes that are important in predicting software testing costs and software
testing time in a particular company.
Zhang and Jeffrey J.P. Tsai examined the possibility of applying machine
learning in software engineering. In their paper they describe the
characteristics and applicability of some frequently used machine learning
algorithms, and they offer guidelines on applying machine learning
methods to software engineering tasks.
In 2017, William Blum, Rishabh Singh, and Mohit Rajpal, all Microsoft
researchers, began a research project looking at ways to improve fuzzing
techniques using machine learning and deep neural networks. They wanted to see
what a machine learning model could learn if a deep neural
network were inserted into the feedback loop of a greybox fuzzer.
Godefroid, Hila Peleg, and Rishabh Singh, in their paper “Learn&Fuzz: Machine
Learning for Input Fuzzing”, show how to automate the generation of an input
grammar suitable for input fuzzing using sample inputs and neural-network-based
statistical machine-learning techniques. They then present a detailed case
study with a complex input format, namely PDF, and a large, complex,
security-critical parser for this format, namely the PDF parser embedded in
Microsoft’s new Edge browser. They also present a new algorithm for this
learn&fuzz challenge, which uses a learnt input probability distribution to
intelligently guide where to fuzz inputs.
2.0 SUMMARY OF FINDINGS FROM
Tom Mitchell stated in his book
“Machine Learning” 3 that:
“A computer program is said to learn
from experience E with respect to some class of tasks T and performance measure
P, if its performance at tasks in T, as measured by P, improves with experience
E.”
For example, a computer program that learns to play checkers might improve its
performance, as measured by its ability to win at the class of tasks involving
playing checkers games, through experience obtained by playing games against
itself.
In general, a well-defined learning problem involves these three features: the
class of tasks, the measure of performance to be improved, and the source of
experience.
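
The three features above can be made concrete with a small sketch. It is a toy illustration and not an example taken from Mitchell's book: the task T is classifying integers against a hidden threshold, the performance measure P is accuracy over the integers 0 to 99, and the experience E is a list of labeled examples. The names `HIDDEN_THRESHOLD`, `learn_threshold`, and `performance` are hypothetical and introduced only for this sketch.

```python
from typing import List, Tuple

Example = Tuple[int, bool]  # (input value, correct label)

HIDDEN_THRESHOLD = 60  # assumption: the rule the learner has to discover


def true_label(x: int) -> bool:
    """Task T: decide whether x lies at or above the hidden threshold."""
    return x >= HIDDEN_THRESHOLD


def learn_threshold(experience: List[Example]) -> float:
    """Estimate the threshold as the midpoint between the largest negative
    example and the smallest positive example seen so far (experience E)."""
    negatives = [x for x, label in experience if not label]
    positives = [x for x, label in experience if label]
    return (max(negatives) + min(positives)) / 2


def performance(threshold: float) -> float:
    """Performance measure P: accuracy over the integers 0..99."""
    inputs = range(100)
    correct = sum((x >= threshold) == true_label(x) for x in inputs)
    return correct / len(inputs)


little_experience = [(10, False), (90, True)]
more_experience = little_experience + [(55, False), (65, True)]

print(performance(learn_threshold(little_experience)))  # 0.9
print(performance(learn_threshold(more_experience)))    # 1.0
```

With only two examples the learner misclassifies the values 50 to 59. After two more examples its estimate matches the hidden threshold, so performance at T, as measured by P, has improved with experience E.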
The emergence of machine learning was
the result of two significant developments.
The first was Arthur Samuel's realization
in 1959 that, rather than teaching computers everything they need to
know about the world and how to carry out tasks, it might be possible to teach
them to learn for themselves.
The second was the emergence of the
internet and the explosive increase in the amount of digital information made
available for analysis.
To better understand machine learning,
it helps to consider its role within the following three niches in the
software world, as stated by Tom Mitchell 2:
Data mining: Data mining is the use of
historical data collected over time to improve subsequent decision making.
Difficult-to-program applications: Machine-learning algorithms can play
an essential role in developing applications that have proven too difficult
for traditional programming, such as face recognition and speech understanding.
Customized software applications: In computer applications such as
online news browsers and personal calendars, it is preferable that the
system can automatically customize itself to the needs of different users after it
has been developed.
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine
Learning and Deep Learning are three terms often used interchangeably, which makes
the differences between them somewhat unclear. The simplest way to
understand their relationship is to imagine three concentric circles: AI is the
largest, machine learning (a subset of AI) sits inside it, and deep learning
(an approach within machine learning)
fits inside both. Calum McClelland differentiates the three as follows:
Artificial Intelligence: The term was first coined
by John McCarthy in 1956. “AI involves machines that can perform tasks that are
characteristic of human intelligence”.
Machine Learning: Machine learning is
simply a way of achieving AI. The term was coined by Arthur Samuel in 1959, not
long after AI, and he defined it as “the ability to learn without being explicitly
programmed”.
Deep learning: Deep learning is one of
many approaches to machine learning.
Real Life Applications
Some real-life examples of the use of
machine learning:
Learning to recognize spoken words: The
SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for
recognizing primitive sounds (phonemes) and words from observed speech.
Learning to drive an autonomous
vehicle: The ALVINN system (Pomerleau 1989) has used its learned strategies to
drive unassisted at 70 miles per hour for 90 miles on public highways among
other cars.
Learning to classify new astronomical
structures: Decision tree learning algorithms have been used by NASA to
learn how to classify celestial objects from the second Palomar Observatory Sky
Survey (Fayyad et al. 1995).
Learning to play world-class backgammon:
The world’s top computer program for backgammon, TD-GAMMON (Tesauro 1992,
1995), learned its strategy by playing over one million practice games against
itself. It now plays at a level competitive with the human world champion.
In software testing, Microsoft has released
a tool called Microsoft
Security Risk Detection, which makes use of fuzz testing, or
fuzzing. It significantly simplifies security testing and does not require you
to be a security expert in order to root out software bugs.
3.3 Fuzzing it with Machine Learning
Software testing has always been a
tedious yet important part of the software development cycle, and fuzz testing
is one of the most widely used automated software testing techniques. Fuzzing is
done by presenting a target program with crafted, malformed input designed to
trigger unexpected behaviors
such as crashes, buffer overflows, memory errors, and exceptions.
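
The idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions and not a production fuzzer: `parse_record` is a hypothetical toy target with a deliberately planted bug, and `mutate` and `fuzz` are names made up for this sketch. A `ValueError` is treated as a graceful rejection, and any other exception counts as a crash.

```python
import random
from typing import Callable, List


def parse_record(data: bytes) -> int:
    """Hypothetical toy target: byte 0 is a length field for the payload."""
    if not data:
        raise ValueError("empty input")  # expected, graceful rejection
    length = data[0]
    payload = data[1:]
    # Planted bug: the length field is trusted without a bounds check.
    return payload[length - 1] if length else 0


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of the seed with one randomly chosen byte replaced."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target: Callable[[bytes], int], seed: bytes,
         trials: int, rng: random.Random) -> List[bytes]:
    """Feed mutated inputs to the target and collect those that crash it."""
    crashes = []
    for _ in range(trials):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # the target rejected the input cleanly
        except Exception:
            crashes.append(candidate)  # unexpected failure: record the input
    return crashes


crashing_inputs = fuzz(parse_record, b"\x03abc", 200, random.Random(0))
print(len(crashing_inputs), "crashing inputs found")
```

Every crashing input found this way has a corrupted length byte, which points straight at the missing bounds check in the toy parser.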
Current fuzzing techniques can be broadly grouped into three main
categories: i) Blackbox fuzzers, which rely solely on the sample input files to
generate new inputs; ii) Whitebox fuzzers, which analyze the target program
either statically or dynamically to guide the search for new inputs aimed at
exploring as many code paths as possible; and iii) Greybox fuzzers, which make
use of a feedback loop to guide their search based on observed behavior from
previous executions of the program. 6
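
The greybox feedback loop can be sketched as follows. This is a simplified illustration and not the implementation of any particular fuzzer: `toy_target` is a hypothetical program that reports the branches it executed by returning a set of labels, whereas real greybox fuzzers obtain coverage through instrumentation. A mutated input is kept in the corpus only if it reaches a branch that has not been seen before.

```python
import random
from typing import Callable, List, Set, Tuple


def toy_target(data: bytes) -> Set[str]:
    """Hypothetical instrumented program: returns the branches it executed."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("magic1")
        if len(data) > 1 and data[1] == ord("Z"):
            covered.add("magic2")
            if len(data) > 2 and data[2] == ord("!"):
                covered.add("deep")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def greybox_fuzz(target: Callable[[bytes], Set[str]], seeds: List[bytes],
                 iterations: int, rng: random.Random
                 ) -> Tuple[List[bytes], Set[str]]:
    """Coverage-guided loop: keep only inputs that reach new branches."""
    corpus = list(seeds)
    seen: Set[str] = set()
    for seed in corpus:
        seen |= target(seed)
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        coverage = target(child)
        if not coverage <= seen:  # feedback: the child hit a new branch
            seen |= coverage
            corpus.append(child)  # later mutations can build on it
    return corpus, seen


corpus, seen = greybox_fuzz(toy_target, [b"AAAA"], 2000, random.Random(1))
print(sorted(seen))
```

Because inputs that make progress are retained, the nested checks can be solved one byte at a time. A blackbox fuzzer would have to guess all of the magic bytes in a single mutation.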
Neural networks can then be trained to
learn patterns in the input files from previous fuzzing explorations and use
them to guide future fuzzing explorations.
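
To illustrate the idea of learning from previous explorations where to fuzz next, the sketch below swaps the neural network for a much simpler stand-in: a per-position frequency count. It is not the model used in the Microsoft project. The function names and the shape of the `history` records (the mutated byte position and whether that mutation uncovered new behavior) are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def learn_position_weights(history: Sequence[Tuple[int, bool]],
                           input_length: int) -> List[float]:
    """Estimate, per byte position, how often mutating it was useful.

    history holds (mutated position, uncovered new behavior?) pairs
    collected during earlier fuzzing runs.
    """
    hits = [0] * input_length
    tries = [0] * input_length
    for position, useful in history:
        tries[position] += 1
        hits[position] += int(useful)
    # Laplace smoothing keeps untried positions from getting weight zero.
    return [(hits[i] + 1) / (tries[i] + 2) for i in range(input_length)]


def choose_position(weights: Sequence[float], rng: random.Random) -> int:
    """Sample a byte position in proportion to its learned weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def guided_mutate(data: bytes, weights: Sequence[float],
                  rng: random.Random) -> bytes:
    """Mutate one byte, preferring positions that paid off in the past."""
    buf = bytearray(data)
    buf[choose_position(weights, rng)] = rng.randrange(256)
    return bytes(buf)


history = [(0, True), (0, True), (1, False), (1, False)]
weights = learn_position_weights(history, input_length=3)
print(weights)  # [0.75, 0.25, 0.5]
print(guided_mutate(b"abc", weights, random.Random(0)))
```

Position 0, which paid off twice, receives the highest weight. Position 1, which never helped, receives the lowest, and the untried position 2 stays in between so that it is still explored.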
FUTURE RESEARCH/DEVELOPMENT PROPOSITIONS
The neural fuzzing research project
done by Microsoft is just scratching the surface of what can be achieved using
deep neural networks for fuzzing. For now, the model only learns fuzzing
locations, but it could also be used to learn other fuzzing parameters such as
the type of mutation or the strategy to apply.
The possibility of developing computer programs that are capable of improving with
experience can lead to the creation of computer software that is developed with
greater ease yet is able to optimize itself over time.
The emergence of the internet and the explosion in available data that followed have
greatly helped the development of machine learning. With new data being generated
daily, machine learning still has a long way to go in its development and can be
better incorporated into the
field of software development. Since machine learning is a subset of
Artificial Intelligence, its growth will soon contribute to
solving the problem AI aims to solve: making computer programs that can truly be
considered smart, able to perform tasks that are characteristic
of human intelligence.