Artificial Intelligence


Artificial intelligence (AI) is the field of software engineering which builds computer systems, and occasionally robots, to perform tasks which require intelligence. The term "artificial intelligence" was coined by John McCarthy (1958), then an assistant professor of mathematics at Dartmouth, for a summer workshop held at Dartmouth in 1956. This two-month workshop marks the official birth of AI and brought together young researchers who would nurture the field as it grew over the next several decades: Marvin Minsky, Claude Shannon, Arthur Samuel, Ray Solomonoff, Oliver Selfridge, Allen Newell and Herbert Simon.

During the 1940s many researchers, under the guise of Cybernetics, had worked out much of the theoretical groundwork for AI and even designed the first computers. Among the most significant contributions upon which AI depended were Alan Turing's theory of computation and ACE computer, John von Neumann's design for the stored-program EDVAC computer, Claude Shannon's theory of communication, Norbert Wiener's theory of information and negative feedback, Warren McCulloch and Walter Pitts' neuronal logic networks, W. Ross Ashby's theory of learning and adaptive mechanisms, and W. Grey Walter's autonomous robotic tortoises. The young Dartmouth group differed from the earlier work of the Cyberneticians in that they concerned themselves primarily with writing digital computer programs which performed tasks that for humans would be deemed to require intelligence, rather than building machines or modelling brains.

The key aspects of intelligent behaviour around which AI research focused included automated reasoning, decision making, machine learning, machine vision, natural language processing, pattern recognition, automated planning, problem-solving, and robot control. This field of research set itself ambitious goals, seeking to build machines which could "out think" humans in particular domains of skill and knowledge, and it achieved some success in this. Some researchers even speculated that it would be possible to build machines which could imitate human behaviour in general by the end of the 20th century, but most researchers today consider this goal to be unattainable even by the end of the 21st century.

The first AI program, the Logic Theorist, was presented by Newell and Simon (1956) at the Dartmouth workshop. Logic Theorist proved theorems of mathematical logic from a given set of axioms and a set of rules for deducing new theorems from those it already had. Given a theorem, Logic Theorist would attempt to build a proof by trying various chains of deductive inference until it arrived at the desired theorem. This program was followed up by the General Problem Solver (Newell & Simon, 1961), which demonstrated that the technique of proving theorems could be applied to all sorts of problems by defining a "goal" and conducting a search to find a series of valid moves which lead from what is already known to the goal that is sought. This technique can work well for simple problems, but the total number of alternative moves which are possible can grow exponentially in the number of steps to a solution, and since the program has to keep backtracking and trying all the alternative routes, the technique quickly breaks down for problems with many steps. These challenges led Newell and Simon to suggest that AI research on problem-solving ought to focus on finding good heuristics to use when searching. A heuristic is a search strategy or rule-of-thumb, and a good heuristic helps one find a solution faster by reducing the number of dead-ends encountered during a search.
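
The effect of a heuristic on search can be illustrated with a minimal sketch in Python. It is not a reconstruction of Logic Theorist or the General Problem Solver: the grid puzzle, the function names and the "Manhattan distance" rule of thumb are all assumptions chosen for illustration. The same search routine is run twice, once blind and once guided by the heuristic, and it reports how many states each run had to expand.

```python
import heapq
from itertools import count


def search(start, goal, moves, heuristic=lambda state: 0):
    """Best-first search. Returns (path, number of states expanded).

    With the default heuristic (always 0) the search is blind and simply
    tries the cheapest routes first; with an informative heuristic it
    behaves like A* and is drawn toward the goal.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best_cost = {start: 0}
    expanded = 0
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, expanded
        if cost > best_cost[state]:
            continue  # a cheaper route to this state was already found
        expanded += 1
        for nxt in moves(state):
            new_cost = cost + 1  # every move costs one step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                entry = (new_cost + heuristic(nxt), next(tie),
                         new_cost, nxt, path + [nxt])
                heapq.heappush(frontier, entry)
    return None, expanded


SIZE = 21  # hypothetical 21 x 21 grid of squares
START, GOAL = (10, 10), (10, 18)


def grid_moves(state):
    """Valid moves: one square up, down, left or right, staying on the grid."""
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE:
            yield (x + dx, y + dy)


def manhattan(state):
    """Rule of thumb: the distance left if nothing were in the way."""
    return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])


blind_path, blind_expanded = search(START, GOAL, grid_moves)
guided_path, guided_expanded = search(START, GOAL, grid_moves, manhattan)
print("blind: ", len(blind_path) - 1, "moves,", blind_expanded, "states expanded")
print("guided:", len(guided_path) - 1, "moves,", guided_expanded, "states expanded")
```

Both runs find a route of the same length, but the blind run expands well over a hundred states while the guided run expands only the eight states lying directly between the start and the goal. On harder problems the heuristic is rarely this exact, and the savings are correspondingly smaller.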

On June 27, 1963, AI research was catapulted forward by a huge grant from the US military's DARPA to the MIT AI Laboratory, which was partly motivated by US fears after the Soviet launch of Sputnik, and partly motivated by extreme enthusiasm on the part of AI researchers that computers would soon have human-like intelligence. By the early 1970s, actual research at MIT was limiting search spaces by studying very simplified application domains, or micro-worlds as they came to be called. The most famous program, SHRDLU, planned manipulations in a world consisting only of wooden blocks sitting on a table, called the blocks world (Winograd, 1972). While these systems did what their designers intended, often led to theoretically interesting results, and eventually developed into useful technologies, they did not live up to either the public or military expectations for intelligent machines. By the end of the 1970s, DARPA became deeply concerned that AI would fail to deliver on its promises and eventually cut its research funding.

The insights gained from the micro-worlds research eventually did find application in the area of automated planning. Generally, planning starts from knowing the current state of the "world," the desired state of the world, and a set of actions, called operators, which can be taken to transform the world. The Stanford Research Institute Problem Solver (STRIPS) was an early planner which used a language to describe actions that is still widely used and enhanced (Fikes & Nilsson, 1971). The STRIPS operators consist of three components: 1) the action description, 2) the preconditions of the action, or the way the world must be before the action can be taken, and 3) the effect, or how the world has been changed after the action has been taken. To develop a plan, the system searches for a reasonably short or cost-efficient sequence of operators which will achieve the goal. Planning systems have been used widely to generate production schedules in factories, to find the most efficient ways to lay out circuits on microchips or to machine metal parts, and to plan and coordinate complex projects involving many people and organizations, such as space shuttle launches.
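
The three-part operator description can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the original STRIPS program: the `Operator` type, the fact strings such as `on(C,A)`, and the simple breadth-first forward search are hypothetical choices, and the effect of an action is represented in the usual way as a list of facts it adds and a list of facts it deletes.

```python
from collections import deque
from typing import FrozenSet, NamedTuple


class Operator(NamedTuple):
    name: str                      # 1) the action description
    preconditions: FrozenSet[str]  # 2) facts that must hold beforehand
    add: FrozenSet[str]            # 3) the effect: facts made true ...
    delete: FrozenSet[str]         #    ... and facts made false


def plan(initial, goal, operators):
    """Breadth-first forward search; returns a shortest list of action names."""
    start, goal = frozenset(initial), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for op in operators:
            if op.preconditions <= state:
                nxt = (state - op.delete) | op.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None  # no sequence of operators reaches the goal


def to_table(x, y):
    """Take block x off block y and put it on the table."""
    return Operator(f"move {x} from {y} to table",
                    frozenset({f"on({x},{y})", f"clear({x})"}),
                    frozenset({f"ontable({x})", f"clear({y})"}),
                    frozenset({f"on({x},{y})"}))


def stack(x, y):
    """Pick block x up from the table and put it on block y."""
    return Operator(f"stack {x} on {y}",
                    frozenset({f"ontable({x})", f"clear({x})", f"clear({y})"}),
                    frozenset({f"on({x},{y})"}),
                    frozenset({f"ontable({x})", f"clear({y})"}))


blocks = "ABC"
operators = [make(x, y) for make in (to_table, stack)
             for x in blocks for y in blocks if x != y]

# A small blocks world: C sits on A; A and B sit on the table.
initial = {"on(C,A)", "ontable(A)", "ontable(B)", "clear(C)", "clear(B)"}
goal = {"on(A,B)", "on(B,C)"}

steps = plan(initial, goal, operators)
print(steps)
# ['move C from A to table', 'stack B on C', 'stack A on B']
```

Breadth-first search is used here only because it is short and returns a shortest plan; practical planners rely on heuristics to avoid examining every reachable state.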

One AI technology which had early real-world applications is the expert system. Expert systems are systems which utilize a large amount of knowledge about a small area of expertise in order to solve problems in that domain. The first such system was DENDRAL (Buchanan et al., 1969), which could logically infer the structure of a molecule if given its chemical formula and information from a mass spectrogram of the molecule. DENDRAL achieved this difficult task because it was provided with rules-of-thumb and tricks for recognizing common patterns in the spectrograms, developed in collaboration with Joshua Lederberg, a Nobel Prize-winning geneticist. The next-generation expert system, MYCIN, used rules which incorporated uncertainty as probability weights on inferences. MYCIN used some 450 such rules to diagnose infectious blood diseases (Buchanan & Shortliffe, 1984). Expert systems have proven to be one of the most successful applications of AI so far. Thousands of expert systems are currently in use for medical diagnoses, servicing and trouble-shooting complex mechanical devices, and aiding information searches.
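
The idea of rules that carry weights can be sketched as follows. The rules, findings and numbers below are invented for illustration and are not taken from MYCIN. The combination formula is the one usually given for two positive "certainty factors" supporting the same conclusion, and the treatment of a rule as only as certain as its weakest premise is a simplifying assumption.

```python
def combine(cf1, cf2):
    """Combine two positive certainty factors for the same conclusion."""
    return cf1 + cf2 * (1.0 - cf1)


# (premises, conclusion, weight of the rule) -- all hypothetical
RULES = [
    ({"fever", "stiff_neck"}, "infection_x", 0.6),
    ({"positive_culture"}, "infection_x", 0.7),
    ({"rash"}, "infection_y", 0.4),
]


def diagnose(findings):
    """findings maps each observed fact to a certainty between 0 and 1."""
    conclusions = {}
    for premises, conclusion, weight in RULES:
        if premises.issubset(findings):
            # Assumption: a rule is only as certain as its weakest premise.
            cf = weight * min(findings[p] for p in premises)
            previous = conclusions.get(conclusion, 0.0)
            conclusions[conclusion] = combine(previous, cf)
    return conclusions


result = diagnose({"fever": 1.0, "stiff_neck": 0.5, "positive_culture": 1.0})
print(result)  # infection_x is supported with a certainty of about 0.79
```

Two rules fire in this example. The first contributes 0.6 x 0.5 = 0.3 and the second contributes 0.7, and combining them gives 0.3 + 0.7 x (1 - 0.3) = 0.79, so independent pieces of evidence reinforce one another without the total ever exceeding 1.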

Another successful AI technology has been machine learning, which develops techniques for machines to actually learn from experience and improve over time. Machine learning was first conceived by Ashby (1940), while the first successful program for learning was Samuel's (1959) checkers-playing program. Most forms of machine learning involve using statistical induction techniques to infer rules and discover relationships in sample or training data. Machine learning is useful in solving problems in which the rules governing the domain are difficult to discover, and where a large amount of data is available for analysis.

Pattern recognition is the most common type of problem for machine learning applications. The most popular kind of pattern recognition system, and perhaps the most popular single AI technology, is the neural network, which learns from experience by adjusting weighted connections in a network. Typical feedforward neural networks perform some version of statistical pattern classification: they induce statistical patterns from training data to learn a representative function, then apply this function to classify future examples. A classification is simply a mapping function from inputs to outputs, and so a neural network just maps the objects to be classified into their types or classes.
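
Learning by adjusting weighted connections can be shown with the simplest possible network, a single artificial neuron trained with the classic perceptron rule. This is a sketch under assumptions: the training data (the logical OR of two inputs), the learning rate and the number of passes are arbitrary choices, and real networks have many more units and use more elaborate training procedures.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, rate=0.1):
    """Nudge each weight in proportion to the error made on each example."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Training data: the class is 1 when at least one input is on.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(samples)

for inputs, target in samples:
    print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

After a few passes over the data the weights stop changing and every training example is classified correctly. A single neuron of this kind can only separate classes with a straight line, which is one reason practical networks use several layers of units.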

Consider, for example, the problem of classifying some 2-dimensional geometric shapes into one of the types in the set: [square, circle, triangle, other]. A total mapping function would assign every member of the set of 2-dimensional geometric shapes to one of these four types. There are many possible mapping functions, however, and only a few of these will classify the inputs in a desirable way. Good techniques for neural networks will find these mappings efficiently, and avoid getting stuck in statistical dead-ends, called "local minima." Other examples of pattern recognition include speech recognition, face recognition, hand-written letter recognition, robotic vision and scene analysis, where the program must match audio or visual patterns to words, faces, letters, objects or scenes respectively (Duda & Hart, 1973).
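
A local minimum is easiest to see in one dimension. The sketch below is not a neural network: it uses an arbitrary curve with two valleys as a stand-in for the error of a network with a single weight, and the function names, step size and starting points are assumptions made for illustration. Training by gradient descent simply walks downhill, so where it ends up depends on where it starts.

```python
def error(w):
    """Arbitrary one-weight 'error surface' with two valleys."""
    return w**4 - 3 * w**2 + w


def slope(w):
    """Derivative of error(w)."""
    return 4 * w**3 - 6 * w + 1


def descend(w, rate=0.01, steps=2000):
    """Plain gradient descent: keep stepping downhill from the start point."""
    for _ in range(steps):
        w -= rate * slope(w)
    return w


stuck = descend(2.0)    # settles in the shallow valley near w = 1.13
better = descend(-2.0)  # settles in the deeper valley near w = -1.30

# One simple remedy: restart from several points and keep the best result.
starts = (-2.0, -1.0, 0.0, 1.0, 2.0)
best = min((descend(w0) for w0 in starts), key=error)

print(round(stuck, 2), round(error(stuck), 2))
print(round(better, 2), round(error(better), 2))
print(round(best, 2))
```

Starting from the right-hand side, the descent halts in the shallower valley even though a much lower error is available elsewhere. Restarting from several points is only one of many remedies, and it becomes costly when a network has thousands of weights rather than one.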

Another important area of AI research has been Natural Language Processing (NLP). NLP attempts to provide computers with the ability to understand natural human languages, such as English or Russian. Work in this area has drawn heavily on theories of grammar and syntax borrowed from computational linguistics, and attempted to decompose sentences into their grammatical structures, assign the correct meanings to each word, and interpret the overall meaning of the sentence. This task turned out to be very difficult because of the possible variations of language, and the many kinds of ambiguity which exist. The applications of successful NLP programs have included machine translation from one natural language to another, and natural language computer interfaces. A great deal of success has been achieved in the related areas of optical character recognition and speech recognition which employ machine learning techniques to translate text and sound inputs into words but stop short of interpreting the meaning of those words.
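
The kind of ambiguity that makes this task difficult can be demonstrated with a toy grammar. The grammar, the vocabulary and the function name below are hypothetical and cover only one example sentence; the chart-filling method is the standard CYK technique for grammars whose rules have exactly two parts on the right-hand side. Instead of building the grammatical structures, the sketch merely counts how many different structures the grammar allows.

```python
from collections import defaultdict

# Toy grammar: each rule rewrites a parent as a (left, right) pair.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the seeing was done with something
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the thing seen has something
    ("PP", "P", "NP"),
]

LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees for the sentence (CYK chart parsing)."""
    n = len(words)
    # chart[(i, j)][symbol] = number of ways symbol can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, []):
            chart[(i, i + 1)][symbol] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every way to split the span in two
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    chart[(i, j)][parent] += ways
    return chart[(0, n)][root]


print(count_parses("I saw the man".split()))                     # 1
print(count_parses("I saw the man with the telescope".split()))  # 2
```

The longer sentence has two structures: one in which the telescope is the instrument used for seeing, and one in which the man is holding it. A grammar alone cannot choose between them; doing so requires knowledge about meaning and context, which is why interpretation has proved so much harder than recognition.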

Game-playing programs have done much to popularize AI. Programs to play simple games like tic-tac-toe (noughts and crosses) are trivial, but games such as checkers (draughts) and chess are more difficult. At IBM, Samuel began working in 1952 on the program which would be the first to play tournament level checkers, a feat it achieved by learning from its own mistakes. The first computer to beat a human grandmaster in a chess match was HITECH (Berliner, 1989). And in May of 1997, IBM's DEEP BLUE computer beat the top-ranked chess player in the world, Garry Kasparov. Unfortunately, success in a single domain such as chess does not translate into general intelligence, but it does demonstrate that seemingly intelligent behaviours can be automated at a level of performance which exceeds human capabilities. One of the most common consumer AI applications is probably the computer opponent of video games which "plays against" the human user. These AIs use techniques of varying sophistication to challenge human opponents, and often allow the human to select the skill level of their opponent.
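
Tic-tac-toe is simple enough that a program can examine every possible continuation of the game, a technique known as minimax search. The following is a minimal sketch, assuming a board represented as a nine-character string; the representation and function names are illustrative choices. Programs for checkers and chess cannot search to the end of the game in this way and must cut the search short and estimate the value of the positions reached.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (value, move) with perfect play; +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won:
        return (1 if won == "X" else -1), None
    if " " not in board:
        return 0, None  # board full: a draw
    other = "O" if player == "X" else "X"
    results = []
    for i, cell in enumerate(board):
        if cell == " ":
            child = board[:i] + player + board[i + 1:]
            value, _ = minimax(child, other)
            results.append((value, i))
    # X picks the outcome best for X; O picks the outcome worst for X.
    return max(results) if player == "X" else min(results)


print(minimax(" " * 9, "X")[0])     # 0: perfect play from both sides is a draw
print(minimax("XX OO    ", "X"))    # (1, 2): X completes the top row and wins
```

The board squares are numbered 0 to 8 from left to right and top to bottom, so the second call reports that X should play in the top-right corner.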

It would be difficult to argue that the technologies derived from AI research had a profound effect on people's ways of life by the end of the 20th century. Yet AI technologies have been successfully applied in many industrial settings, medical diagnoses and video games. And programming techniques developed in AI research have become incorporated into more widespread programming practices, such as high-level programming languages and time-sharing operating systems. While AI did not succeed in constructing a computer which displays the general mental capabilities of a typical human, such as the HAL computer in Arthur C. Clarke and Stanley Kubrick's film 2001: A Space Odyssey, it has produced programs which can perform some apparently intelligent tasks, often at a much greater level of skill and reliability than humans (see Stork, 1997). More than this, it has provided a powerful and defining image of what computer technology might someday be capable of achieving.

by Peter M. Asaro



Ashby, W. Ross, "Adaptiveness and Equilibrium," Journal of Mental Science, Vol. 86: 478-483, 1940.


Berliner, H. J., "HITECH Chess: From Master to Senior Master with No Hardware Change," In MIV-89: Proceedings of the International Workshop on Industrial Applications of Machine Intelligence and Vision (Seiken Symposium), pp. 12-21, 1989.


Buchanan, B. G., G. L. Sutherland, and E. A. Feigenbaum, "Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry," in B. Meltzer, D. Michie and M. Swann, eds. Machine Intelligence 4, Edinburgh, UK: Edinburgh University Press, pp. 209-254, 1969.


Buchanan, B. G., and E. H. Shortliffe (editors), Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Reading, MA: Addison-Wesley, 1984.


Duda, R., and P. Hart, Pattern Classification and Scene Analysis. John Wiley, New York, 1973.


Fikes, R. E. and N. J. Nilsson, "STRIPS: A New Approach to the Application of Theorem-Proving to Problem-Solving," Artificial Intelligence, 2 (3-4): 189-208, 1971.


McCarthy, John, "Programs with common sense," Proceedings of the Symposium on Mechanisation of Thought Processes, vol. 1. London: Her Majesty's Stationery Office, pp. 77-84, 1958. Reprinted in Minsky, M. L., Ed. Semantic Information Processing. Cambridge, MA: MIT Press, pp. 403-418, 1968.


Newell, Allen, and Herbert Simon, "The logic theory machine: a complex information processing system," IRE Transactions on Information Theory IT-2, 3: 61-79, 1956.


Newell, Allen, and Herbert Simon, "GPS, A Program that Simulates Human Thought," in H. Billing (editor), Lernende Automaten, pp. 109-124, 1961. Reprinted in E. A. Feigenbaum and J. Feldman (editors), Computers and Thought. New York, NY: McGraw-Hill, 1963, pp. 279-293.


Samuel, Arthur, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, 3(3): 210-229, 1959.


Stork, David G. (editor), HAL's Legacy: 2001's Computer as Dream and Reality. Cambridge, MA: MIT Press, 1997.


Winograd, Terry, Understanding Natural Language, New York, NY: Academic Press, 1972.


For Further Research


Anderson, James A., and Edward Rosenfeld (editors), Talking Nets: An Oral History of Neural Networks, Cambridge, MA: MIT Press, 1998.


Barr, A., E. A. Feigenbaum, and P. R. Cohen (editors), The Handbook of Artificial Intelligence, Vols. 1-4. Stanford and Los Altos, CA: HeurisTech Press and Kaufmann, 1981-89.


Crevier, Daniel, AI: The Tumultuous History of the Search for Artificial Intelligence, New York, NY: Basic Books, 1993.


Dreyfus, Hubert, What Computers Can't Do: A Critique of Artificial Reason, New York: Harper and Row, 1979. Reprinted as What Computers Still Can't Do: A Critique of Artificial Reason. Cambridge, MA: MIT Press, 1992.


Haykin, Simon, Neural Networks: A Comprehensive Foundation, Englewood Cliffs, NJ: Prentice-Hall, 1998.


Jurafsky, Dan et al., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Englewood Cliffs, NJ: Prentice-Hall, 2000.


McCorduck, Pamela, Machines Who Think, San Francisco, CA: W. H. Freeman and Company, 1979.


Mitchell, Tom, Machine Learning, New York, NY: McGraw-Hill, 1997.


Newborn, M. and M. Newborn, Kasparov Versus Deep Blue: Computer Chess Comes of Age, 1996.


Russell, S. J., and P. Norvig, Artificial Intelligence: A Modern Approach, Englewood Cliffs, NJ: Prentice-Hall, 1995.


Shapiro, S. C. (editor), Encyclopedia of Artificial Intelligence, 2nd ed. New York: Wiley, 1992.


Webber, B. L., and N. J. Nilsson (editors), Readings in Artificial Intelligence, San Mateo, CA: Morgan Kaufmann, 1981.