Written by David Tebbutt, PC Pro 02/95
Untold time and money has been expended on researching the interactions between humans and computers. But despite all this effort, most people are still stuck with the mouse and the keyboard as their main means of input, command and control. Some use trackballs or mini-joysticks, such as IBM's TrackPoint, but these are merely mouse substitutes. In the mass market, the aged mouse and its derivatives, along with the even more ancient keyboard, are the primary means of interaction with computers.
Can this really be the best the computer industry has to offer? Have all those millions of days been in vain? Certainly not, because promising new technologies are in the offing, just waiting for computer power to increase (and prices to drop) to the point at which they'd be chosen over present tools. Speech recognition is increasing its influence, as are pens, touchpads and strange virtual reality gadgets. However, none of them is likely to become as ubiquitous as the mouse - not in the near future, anyway.
The truth is that the mouse and its derivatives are simple, take up little space and are more or less intuitive to use. The keyboard is a nightmare for non-typists, but anyone familiar with it can get information into a computer more quickly that way than by using the main alternative of speech recognition. Various new keyboard layouts have been tried out and judged effective. The Maltron, Dvorak and Cy Endfield's six-key 'chord' keyboards all have their fans, but none has caught on to the extent of the familiar QWERTY arrangement.
There are few means of expressing your intentions to a computer. You can speak, pull faces, gesture, posture, touch things and move your eyes. Some specialised devices used by the handicapped can sense sucks and blows on a tube or react to tongue movement. Although these are innovative creations, they won't become mainstream because of their low bandwidth - each action has only a limited number of variations, so the information transmitted can't be very complex. A lot of effort is required to achieve very little.
The aim of human-computer interaction is, or should be, to provide as natural an interface as possible. In fact, the perfect solution would be one in which the user wasn't aware of an 'interface' at all. You don't talk about the touch interface when you shake hands or the speech interface when you talk; you just do it. So perhaps the absence of an interface would be the computer user's Nirvana.
A world without keyboards, mice, joysticks and other contraptions would be a world in which the computer could shrink in size. If speech became the primary input, you might be able to wear your computers on your wrists or in your hair. This assumes, of course, that components continue to get smaller. The world has moved from room-sized computers to pocket-sized devices of equivalent power in the past 20 years, so it's not unreasonable to expect this miniaturisation to continue.
Output is a different problem, and not strictly part of this issue. Nevertheless, your small computers will be able to whisper in your ears or beam their information to other devices - be they printers, telephone antennae, other computers or display goggles.
On the input side, designs are often dictated by the requirements of the hardware or software. No technology illustrates this better than the QWERTY keyboard, the layout of which was designed to stop people typing too fast and jamming the mechanics of the original machines. This has long since ceased to be an issue, but the layout remains. And, even though the Dvorak keyboard produces results quicker, it's unlikely to become mainstream because QWERTY has become a self-perpetuating standard. In today's large vocabulary speech recognition systems, users have to pause between words. The pause is brief, but unnatural. Once again, the demands of the technology have had to take precedence over the desires of the user.
Each method of communicating with a computer has its strengths and weaknesses. A mouse or joystick can't easily be used for typing, and neither the keyboard nor speech is ideal for manoeuvring the cursor. Most devices can be used for control, in addition to their main function of positioning or data entry. A mouse without a button or a keyboard without cursor controls would be ridiculously frustrating.
TOUCHY SUBJECT
Today's computers expect you to push buttons and move a pointer around the screen. A few devices, such as the virtual reality dataglove, are able to sense your position in space and the flexion in your fingers. One of the criteria for a successful user interface is that it shouldn't inconvenience users. At the moment, datagloves tether people to their machines and limit their ability to do other things with the gloved hand.
For certain tasks, this might be acceptable, but it probably rules out such devices for general use. The best devices are those which can be picked up and put down at will.
When pen technology first appeared, the pens were tethered by an umbilical cable which supplied power and received electrical signals from the pen's spring-loaded tip. Later, they contained batteries to make them freestanding. Modern pens, such as those manufactured by Wacom, pick up their electrical charge from the tablet and release it through a tuned circuit. The pen's pressure and location are captured by the sensitive tablet on which it's being used (see Wacom ArtPad). Some machines, such as PDAs (personal digital assistants), make the screens do all the work, so a fingernail or a chopstick could be substituted for the stylus.
Pens are best for freehand drawing and painting. They allow you to create excellent artwork on the computer screen, often far better than you could possibly achieve using conventional materials. Depending on the software you use, the pens afford access to a variety of paper textures, marking tools (pen, charcoal, paint, airbrush, for example) and special effects. They can also operate the screen controls with almost the same ease as the mouse. The only thing to watch out for is that 'double taps' have to hit the screen at more or less the same spot, although you can define an acceptable range in pixels.
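To give a flavour of the logic involved, here is a minimal sketch in Python of how a pen driver might decide whether two taps count as a 'double tap'. The tolerance and timing values are invented for illustration, not taken from any real driver.

    import math

    # Illustrative settings: the acceptable range (in pixels) and the
    # maximum interval (in seconds) between the two taps.
    DOUBLE_TAP_RADIUS_PX = 5
    DOUBLE_TAP_INTERVAL_S = 0.4

    def is_double_tap(tap1, tap2):
        """Decide whether two (x, y, time) pen taps form a double tap:
        the second must land within a small pixel radius of the first
        and arrive quickly enough after it."""
        x1, y1, t1 = tap1
        x2, y2, t2 = tap2
        close_enough = math.hypot(x2 - x1, y2 - y1) <= DOUBLE_TAP_RADIUS_PX
        quick_enough = (t2 - t1) <= DOUBLE_TAP_INTERVAL_S
        return close_enough and quick_enough

    # Two taps 3 pixels and 0.25 seconds apart count; 40 pixels apart don't.
    print(is_double_tap((100, 200, 0.00), (103, 200, 0.25)))   # True
    print(is_double_tap((100, 200, 0.00), (140, 200, 0.25)))   # False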
Clearly, a pen isn't much use when it comes to data entry, although, in the case of touch-sensitive screens, a dummy keyboard can be displayed. In fact, on a touch-sensitive screen, the pen has proved an agreeable editing tool. Circling text and running the line to the insertion point will usually be quicker using a pen than selecting text, cutting it to the clipboard, moving to the insertion point and then issuing a paste command. Admittedly, many word processors already allow selected text to be dragged and dropped to a new location. A similar technique has been used in design applications, which allows you to move objects from one place to another.
Most people associate the pen with handwriting recognition. This is a tedious and imprecise art and is probably doomed to the dustbin of history as soon as speech recognition fulfils its true potential. Sure, if you write carefully and consistently, a product like Apple's Newton MessagePad will recognise much of what you write, but the process is slow compared with typing or speaking and should only be considered for tiny volumes of data. It's far better to capture the 'ink' image and store that with a few 'recognised' words to provide access through an index. Printing is much better from the point of view of recognition, but is hopeless if you're in a hurry.
Two other pressure-sensitive devices which are growing in popularity are the Alps GlidePoint and the Perex TouchMate. The GlidePoint is a small pressure-sensitive tablet which lets you use a finger instead of a mouse or trackball. Taps of the finger select and move screen items, or two buttons can stand in for the mouse buttons. A third button can be programmed to do your bidding. The TouchMate is a pressure-sensitive platform on which you mount your computer screen. Any touch on the screen is sensed by the platform and translated into cursor co-ordinates.
Most of these pressure-sensitive devices involve tapping and touching. One which requires no physical contact with the user is the simple camera. It silhouettes the user or parts of the user onto the computer screen, where the user's shadow interacts with on-screen objects. I've seen people playing drums and doing finger painting using such systems. When you see the whole rig, it makes some sense, but when you don't notice the camera and just see an individual cavorting around in front of their computer, it all seems most peculiar. On a smaller scale, these camera-based systems can be programmed to recognise hand gestures, such as pointing, clenching or splaying, but they're unlikely to catch on in a big way when so many simpler alternatives exist.
Virtual reality will undoubtedly hit the big time for games and for telepresence activities such as remote surgery, as well as for exploring virtual worlds such as those generated using CAD programs. Typically, the input is from a dataglove which senses finger movement, or from a hand-held device containing pressure sensors and control buttons. Users wear a headset containing stereo screens and speakers while holding the controller out in mid air. The positions and attitudes of the headset and controller are used to determine the user's view of the virtual scene (see Reality Hits Home). As computer speeds increase, the precision of these devices and the realism of the displays should improve beyond today's fairly crude level. At present, the resolution of the displays is limited to 750 x 240, so photo realism is some way off.
The most common control device, the mouse, comes in one-, two- or three-button varieties. Cynics believe that manufacturers chose the number of buttons in order to differentiate themselves from their competitors. Sun has three buttons, Microsoft has two, while Apple has only one.
You can't do much with a single button, so the Apple mouse is arguably more instinctive to use, whereas the others require more thought on the part of the user. Unfortunately, Apple has weakened its case by using special keys on the keyboard to amplify the effects of its single button. This seems somewhat less convenient than having additional buttons on the mouse.
LOOK WHO'S TALKING
The benefits of speech recognition are undeniable. Most computer users can talk, so speech is a natural way of entering information into the machine and issuing it with instructions. Because speech needs no physical contact, it would be possible to enter data and control a computer from a distance - from the other side of the room, over a radio microphone link or over the telephone. And speech can be used while the hands are otherwise occupied.
Speech offers a good way of activating menus and executing macros. Say 'File', and you're presented with a menu of sub options. It's relatively simple for a speech recogniser to match what you say next with one of the few commands on the submenu. However, while speech is good for getting the computer to carry out big tasks, more detailed instructions, such as those needed when editing a document, are still better carried out with a physical device. It's clearly impractical to sit at the computer muttering 'up, up, up, right, right, delete'.
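Matching the utterance against a handful of submenu commands is a tiny search problem, which is why it works so well. A hypothetical Python sketch follows; the menu contents and the fuzzy-match cutoff are invented for illustration.

    import difflib

    # Saying a top-level menu name narrows the recogniser's task to
    # matching one of a handful of sub-commands.
    MENUS = {
        "file": ["new", "open", "save", "print", "close"],
        "edit": ["undo", "cut", "copy", "paste", "select all"],
    }

    def recognise_command(menu, utterance):
        """Match a (possibly imperfect) utterance against the few
        commands on the active submenu."""
        matches = difflib.get_close_matches(
            utterance, MENUS[menu], n=1, cutoff=0.6)
        return matches[0] if matches else None

    print(recognise_command("file", "sav"))    # 'save'
    print(recognise_command("file", "opun"))   # 'open'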
The main objection to speech as input is the irritation this would inflict on other people. Imagine sitting on an aeroplane next to a passenger who decides to 'write' a report by talking during the flight. Such behaviour would be even more antisocial than tapping a keyboard. Speech as a means of communicating with a computer is probably best done in privacy, over the telephone or in the company of others doing the same thing. Uniform background noise isn't usually a problem, though sudden loud noises and exclamations and expletives from adjacent colleagues could be picked up.
Speech systems come in a variety of flavours. Most of those that work in real time rely on words being uttered separately with clear pauses between each one. Some allow continuous speech, but have trouble at the joins. A classic example of how confusion arises is the phrase 'a grey tape', which could be interpreted as 'a great ape'. At least by separating the words, this confusion is avoided.
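The ambiguity is easy to reproduce with a toy lexicon. The phoneme spellings in this Python sketch are crude inventions, but they show how a single pause-free stream of sound supports two readings:

    # Without pauses, the same phoneme stream carves into words in
    # more than one way.
    LEXICON = {
        "a":     ["ah"],
        "grey":  ["g", "r", "ey"],
        "great": ["g", "r", "ey", "t"],
        "tape":  ["t", "ey", "p"],
        "ape":   ["ey", "p"],
    }

    def segmentations(phonemes):
        """Yield every way of carving the phoneme stream into words."""
        if not phonemes:
            yield []
            return
        for word, sound in LEXICON.items():
            if phonemes[:len(sound)] == sound:
                for rest in segmentations(phonemes[len(sound):]):
                    yield [word] + rest

    stream = ["ah", "g", "r", "ey", "t", "ey", "p"]
    for reading in segmentations(stream):
        print(" ".join(reading))   # 'a grey tape' and 'a great ape'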
The other primary way to reduce confusion is for the recognition system to work in a particular context. A number of suppliers offer systems designed to recognise medical, legal, radiological and engineering terms. If a particular profession is sizeable, known to be anti-keyboard, or its users are generally busy with their hands, then it may be worth the suppliers' while to create a special vocabulary for them.
Some companies offer continuous speech recognition devices. These make the best they can of the slurs and blurs in speech, sometimes by doing the processing off-line. In any event, users are expected to do some heavy editing on the end result. The results are reckoned to be much worse than the work of an audio typist of average ability. By contrast, the discrete word recognisers can achieve accuracy of 95 per cent and beyond, once trained to a user's voice. The training involves reading specific texts to the computer for an hour or so (see IBM VoiceType Dictation for details of one company's approach).
Some systems, such as those used to deal with telephone queries automatically, need to be able to recognise anybody's voice. If the system doesn't understand, it passes the caller to an operator or drops down into a press-the-buttons-on-your-phone mode. Such a system will be designed to recognise very few words at any point in the dialogue and will often discard words such as 'please send me details of your' while waiting for the key words 'notebook computers', or whatever. AT&T calls this 'word spotting'. The system then records your spoken name and address for a human audio typist.
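A Python sketch of the word-spotting idea, with an invented list of expected phrases:

    # The dialogue state expects only a few key phrases; everything
    # else in the utterance is treated as padding and discarded.
    EXPECTED = ["notebook computers", "desktop computers", "printers"]

    def spot(utterance):
        """Return the first expected key phrase found in the utterance,
        ignoring padding such as 'please send me details of your'."""
        text = utterance.lower()
        for phrase in EXPECTED:
            if phrase in text:
                return phrase
        return None   # hand over to an operator or touch-tone menu

    print(spot("Please send me details of your notebook computers"))
    # -> 'notebook computers'
    print(spot("Er, I'm not sure what I want"))   # -> None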
Apart from text entry and direct control, speech lends itself well to the delegation of responsibility to the computer. You could say 'call Derek' and leave your machine to look up the telephone number, dial it, call an alternative number if there's no reply and let you know when a connection has been made.
Once speech over the phone can be recognised, you'll be able to issue instructions from a distance. You could, for example, dictate a fax and tell your machine to 'add a cover sheet and fax it to Derek'. The same could apply to email.
Of course, the computer doesn't understand what you say in the same way as a human does. It simply stores information or reacts to a small set of commands. The artificial intelligence researcher's goal of a computer that truly understands continuous speech in real time is a long way off, which means that dreams of the computer as a companion, as a working partner, still belong in science fiction - but we're getting closer.
IN THE PIPELINE
University and laboratory researchers the world over would like to invent the next mouse, and many experimental devices have been proposed. Bill Buxton is a musician and widely published user interface expert. He's convinced that two hands are better than one and has proposed a number of companion devices for the mouse to help overcome its limitations.
When drawing, for example, he believes that users should be able to move the electronic paper with one hand while mousing with the other. He's experimented with touch-sensitive strips and slider bars for paper positioning and proved that productivity can be increased by between 15 and 25 per cent. Nevertheless, his two-handed approach has so far remained a laboratory curiosity.
Apple wanted to supplement the mouse functionality in design applications. In a furniture arranging application, the goal was to move things to the right position in the room and then rotate them to the correct orientation. A thumb wheel mouse was invented which, as the name suggests, had a milled wheel built into its left side. When the issue of 'handedness' was raised, the thumb wheel was replaced with a roller at the front of the mouse. Again, this particular rodent has still not escaped from the laboratory.
Cy Endfield and Chuck Moore, the creator of the Forth programming language, both had the idea of using a chord keyboard. Combinations of keys are pressed simultaneously to give one-handed data entry. In the early 1980s, Endfield's Microwriter was a compact notebook-sized machine. Later, the design was used in the Agenda personal organiser from Microwriter. A mnemonic system made it easy to remember the keystrokes but, apart from a few fanatics, it never really caught on.
Experiments have also been conducted into using eye movement to move the cursor. Unfortunately, our gaze tends to wander, especially while we're thinking. So, unless the computer responds in some trivial way to the direction of our gaze, this technology has the potential to cause havoc. One system was described at a conference on Human Factors in Computing Systems held in 1990. The 'Gaze Responsive Self-disclosing Display' overcame this problem by measuring the frequency with which the user's gaze fell on a particular part of the screen and using this as a guide to what interested the viewer most at that point. When a particular threshold was reached, the software took appropriate action.
The example system was based on a displayed model of the planet from The Little Prince, the story by Antoine de Saint-Exupery. Synthesised speech told the story, but if the child viewing the screen kept looking at a particular part of the display, the voice would shift context and talk about the part in which the child was interested. So the child could direct the flow of the narrative merely by looking at objects of interest.
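The dwell-time mechanism behind such a display is simple to express. In this Python sketch, the sampling rate, the threshold and the region names are all invented:

    from collections import defaultdict

    # Accumulate how long the gaze rests on each screen region and act
    # only when a region passes a threshold, so ordinary wandering of
    # the eyes is ignored.
    DWELL_THRESHOLD_S = 1.5

    def interesting_region(gaze_samples, sample_period_s=0.05):
        """gaze_samples is a sequence of region names, one per
        eye-tracker sample. Returns the first region whose accumulated
        dwell time crosses the threshold, or None if the gaze merely
        wandered."""
        dwell = defaultdict(float)
        for region in gaze_samples:
            dwell[region] += sample_period_s
            if dwell[region] >= DWELL_THRESHOLD_S:
                return region
        return None

    # A child glances at the sky, then stares at the volcano:
    samples = ["sky"] * 10 + ["volcano"] * 35 + ["rose"] * 5
    print(interesting_region(samples))   # 'volcano'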
Other experiments have taken two technologies and blended them in an attempt to come up with a solution greater than the sum of its parts. One famous system combined speech with gesture tracking. The user would point to an object on a display screen and say 'put that', thus identifying the object and issuing the first part of a command. Then the finger would point to a new screen position and the user would say 'there'. The selected object would then be moved. Another experiment is investigating how speech recognition can be improved by incorporating the ability to lip read.
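The fusion logic of 'put that there' can be sketched in a few lines of Python; the class and the way the pointing gesture is sampled are invented for illustration:

    # The spoken command supplies the verb; the pointing gesture,
    # sampled at the moment each word is heard, supplies the operands.
    class PutThatThere:
        def __init__(self):
            self.selected = None

        def on_word(self, word, pointed_at):
            """word: recognised token; pointed_at: whatever the finger
            is aimed at when the word arrives."""
            if word == "that":
                self.selected = pointed_at        # remember the object
            elif word == "there" and self.selected is not None:
                print(f"moving {self.selected} to {pointed_at}")
                self.selected = None

    session = PutThatThere()
    session.on_word("put", None)
    session.on_word("that", "blue square")
    session.on_word("there", (320, 180))  # moving blue square to (320, 180)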
These and many other experiments have helped shape the thinking of the user interface experts. For every successful discovery, tens, maybe hundreds, are consigned to the scrapheap. The most successful user interfaces will, not surprisingly, be those that build on our natural abilities and past experiences. Don Norman is a Professor Emeritus at the University of California, San Diego, an Apple Fellow and the author of Things That Make Us Smart and The Design of Everyday Things. He's much respected in human interface design circles, so perhaps it's fitting to give the last words to him.
He told PC Pro: 'Sure, some marvellous new input device would be wonderful. It's the Holy Grail of Human-Computer Interaction. Problem is, there aren't any easy answers. People communicate among themselves by speech, gesture, body posture, as well as by written and printed material, drawings and music. Computers can barely type. They certainly can't understand language or emotions (body language and facial expressions). Your dream of a superior input device is a worthwhile one: I share the dream. Meanwhile, what do I recommend? - Learn how to type.'
IBM VOICETYPE DICTATION
IBM has been experimenting with speech technology for 22 years. Systems which once required huge computers can now run on a PC providing, of course, it has 12 or 16Mb of memory, a decent processor and hard disk space running to three figures of megabytes.
To train the IBM VoiceType Dictation system you read it 150 sentences through either the supplied head-mounted microphone or your own. The training environment needs to be similar to that of the working environment. All the recogniser card 'hears' is digitised sound, comprising background noise plus the speech itself. The speech card becomes familiar with the background noise and subtracts it from the speech recording before it's processed. Once compressed, the speech models fit on a single floppy disk and can be used on any machine running VoiceType. At the moment, the sampling frequency is 11kHz, which is too high for reliable use over the telephone. IBM promises to bring this down to the telephone system's 8kHz.
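IBM doesn't reveal how the subtraction is performed; one classic technique consistent with the description is spectral subtraction, sketched here in Python with invented frame sizes and signals:

    import numpy as np

    def spectral_subtract(frame, noise_estimate):
        """Subtract an average noise magnitude spectrum from a speech
        frame's spectrum, clamping at zero, then rebuild the waveform.
        This is one textbook approach, not necessarily IBM's."""
        spectrum = np.fft.rfft(frame)
        magnitude = np.abs(spectrum)
        phase = np.angle(spectrum)
        cleaned = np.maximum(magnitude - noise_estimate, 0.0)
        return np.fft.irfft(cleaned * np.exp(1j * phase), n=len(frame))

    # Estimate the noise spectrum from a stretch where nobody speaks,
    # then apply the subtraction to each speech frame.
    rng = np.random.default_rng(0)
    noise_only = rng.normal(0, 0.1, (20, 256))
    noise_estimate = np.abs(np.fft.rfft(noise_only, axis=1)).mean(axis=0)
    speech_frame = (np.sin(np.linspace(0, 40 * np.pi, 256))
                    + rng.normal(0, 0.1, 256))
    cleaned = spectral_subtract(speech_frame, noise_estimate)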
The IBM program can be used for command and control of the machine and its applications, as well as for its primary purpose of dictation. IBM claims accuracy immediately following training is 85 per cent, rapidly rising to around 95 per cent as the software refines its understanding of your accent and learns new words. The average person uses around 5,000 words regularly and the system has a basic vocabulary of 20,000, to which you can add specialised vocabularies to a maximum of 34,000 words in total. IBM sells professional vocabularies, starting with medical and legal, to extend the system. For instance, a Radiology supplementary dictionary costs £499.
A card in the computer (ISA, MCA or PCMCIA) digitises your speech and breaks it down into the 44 phonemes of UK English. These are then matched against your speech characteristics and the vocabulary to make a first guess at the word you uttered and display it on the screen. As succeeding words are uttered, the system repeats the process, but also starts up a statistical assessment system which weighs the likelihood of the current word being the right one in the context of the two preceding words. As you continue speaking, the words on the screen are, unnervingly, being altered. The final process is a grammar checker which discriminates between words that sound similar - 'to', 'too' or 'two', for example.
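The context-weighing step can be illustrated with a toy trigram table in Python. The counts here are invented; a real system derives them from large bodies of text:

    # Among words that sound alike, pick the one most likely given the
    # two preceding words.
    TRIGRAM_COUNTS = {
        ("i", "want"):  {"to": 900, "two": 5, "too": 3},
        ("want", "to"): {"go": 400, "buy": 250},
    }

    def pick_word(prev2, prev1, candidates):
        """Choose the candidate seen most often after (prev2, prev1)."""
        counts = TRIGRAM_COUNTS.get((prev2, prev1), {})
        return max(candidates, key=lambda w: counts.get(w, 0))

    # The recogniser hears something like 'to'/'too'/'two' after 'I want':
    print(pick_word("i", "want", ["to", "too", "two"]))   # 'to'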
Once a text is completed, it takes a few seconds for the software to catch up, then you need to check the accuracy. A common error results from forgetting to pause between two words, in which case a typed or spoken correction is adequate. If the word was spoken correctly but guessed incorrectly, the recognition engine needs to be notified. A list of other words that closely match the original utterance is displayed and the user picks the correct one. If the word isn't there, it has to be typed. In the case of initials, such as 'AT&T', the software may never learn this as a word and may keep throwing it up as 'eighteen eighty'. An advanced correction menu instructs the computer to recognise the individual parts of the expression 'A T and T' and to store it in the vocabulary as 'AT&T'. This dialogue box can be used to create macro expressions which the software expands into sequences of words or commands. The system costs £775 for the ISA/MCA version and £855 for the PC Card version.
REALITY HITS HOME
Virtuality Group is a world leader in the design and manufacture of virtual reality systems for arcade use, and will soon reach businesses and homes through an agreement with IBM. Although different in appearance and robustness, both public and private systems use similar technology. One day, private systems may be freed from the constraints of being tethered, but this is unlikely to happen with public-use systems because the tether helps to prevent the equipment from being stolen.
The main components of a Virtuality system are a headset, a joystick and four types of controller card. These plug into ISA card slots in either an IBM 486 PC or a 19in rack-mounted industrial 486 machine. The processor itself does little more than manage the flow of signals across the bus. The four cards are where the real work takes place: one handles the real-time graphics, one the video output, one the tracking, and one the delivery of sound and video to the headset. The position and attitude of the hand and head are measured electromagnetically using InsideTrack from Polhemus Technologies.
The headset and the joystick each contain a sugar-cube-sized bundle of three coils, set at 90 degrees to each other. A larger version of the coils is used to generate low-power oscillating magnetic fields with a radius of 5ft. The smaller coils act rather like aerials, sending field-intensity signals down the cable to the tracking board. The result is a system which knows the attitude (pitch, yaw and roll) and position (x, y and z) of the user's head and hand in 3D space.
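Recovering all six degrees of freedom from the nine coil-to-coil signals involves more mathematics than a sketch can carry, but one part is easy to show: a dipole field falls off with the cube of distance, so the overall signal level yields range. A heavily simplified Python illustration, with an invented calibration constant:

    import numpy as np

    # Each transmitter coil induces a signal in each receiver coil,
    # giving a 3x3 signal matrix. Its overall amplitude falls off with
    # the cube of range; its structure (not solved here) encodes
    # direction and attitude.
    CAL_CONSTANT = 3 ** 0.5   # invented: unit signal matrix = unit range

    def estimate_range(signal_matrix):
        """Estimate sensor distance from overall signal amplitude,
        using the 1/r^3 fall-off of a magnetic dipole field."""
        amplitude = np.linalg.norm(signal_matrix)   # Frobenius norm
        return (CAL_CONSTANT / amplitude) ** (1.0 / 3.0)

    # A sensor twice as far away returns one eighth the signal:
    print(estimate_range(np.eye(3) * 1.0))     # 1.0
    print(estimate_range(np.eye(3) * 0.125))   # 2.0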
The headset can receive mono or stereo graphics, although stereo requires extra processing cards. The joystick contains four pressure-sensitive finger points and a pair of controller buttons. Users of the IBM version will be able to exchange the button controls for a roller ball and an encircling button. A development system with 400MIPS graphics processing is likely to cost in the region of $50,000. Virtuality Group is on 0116 2542127.
WACOM ARTPAD
The Wacom ArtPad pressure-sensitive graphics tablet has a sensitive area of 191 x 175mm (see PC Pro, issue 3, page 130). It comes with a cordless, batteryless pen which can draw with an accuracy of +/-0.5mm and can sense changes in pen position as small as 0.01mm. Its tip has a maximum travel of 40 microns. This gives the firmness and control of drawing with solid tools such as pencil or chalk, while providing the flexibility of an airbrush or felt marker. It plugs into a serial port on the PC or into the Apple Desktop Bus.
Under the surface of the tablet lies a printed circuit board with long single loop coils, pitched at 6.4mm and arranged in the x and y axes. These overlapping coils are energised in sequence by an alternating current. The pen contains a coil and a capacitor. When the pen is near the tablet, this circuit is energised, acting as a miniature transmitter. Wacom calls this process its 'Give and Take' system. The pen tip rests on the capacitor, which is slightly compressible. As the pen is pressed on the tablet, the capacitance is altered and the phase/frequency of the transmission changes. This makes the surface appear to be pressure sensitive.
The tablet switches back and forth between transmission and reception modes. When receiving, it reads the signals induced in the long coils. It also measures any phase shift caused by the squeezing of the capacitor. In this way, it determines the exact pen location and the pressure being applied by the user. The tablet costs £159 and is distributed by Letraset on 0171 928 3411.
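Putting this description together, here is a much-simplified Python sketch of one axis of the read-out. The signal values, the interpolation scheme and the pressure mapping are all invented for illustration:

    COIL_PITCH_MM = 6.4

    def pen_position_mm(coil_signals):
        """Locate the pen along one axis from per-coil signal strengths,
        interpolating between the peak coil and its neighbours."""
        peak = max(range(len(coil_signals)), key=lambda i: coil_signals[i])
        lo = coil_signals[peak - 1] if peak > 0 else 0.0
        hi = coil_signals[peak + 1] if peak < len(coil_signals) - 1 else 0.0
        total = lo + coil_signals[peak] + hi
        offset = (hi - lo) / total if total else 0.0   # fraction of a pitch
        return (peak + offset) * COIL_PITCH_MM

    def pressure(phase_shift_deg, full_scale_deg=30.0):
        """Map the phase shift caused by squeezing the pen's capacitor
        onto a 0..1 pressure value (mapping invented)."""
        return min(max(phase_shift_deg / full_scale_deg, 0.0), 1.0)

    signals = [0.0, 0.1, 0.8, 0.4, 0.05]   # strongest near the third coil
    print(pen_position_mm(signals), pressure(12.0))   # ~14.3mm, 0.4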