Hawk Eye
You cannot be serious! Public Understanding of Technology with special reference to `Hawk-Eye’
Harry Collins and Robert Evans
An edited version of this paper will be published by Public Understanding of Science 17, 3, July 2008.
Public understanding of science, though it approaches the specialist knowledge of experts only in rare circumstances, can be enhanced more broadly in respect of the processes of science and technology.
The public understanding of measurement errors and confidence intervals could be enhanced if `sports-decision aids’, such as the Hawk-Eye system, were to present their results in a different way. There is a danger that Hawk-Eye as used could inadvertently cause naive viewers to overestimate the ability of technological devices to resolve disagreement among humans because measurement errors are not made salient. For example, virtual reconstructions can easily be taken to show `exactly what really happened’.
Suggestions are made for how confidence levels might be measured and represented and `health warnings’ attached to reconstructions. A general principle for the use of sports decision aids is put forward. A set of open questions about Hawk-Eye is presented which, if answered, could help inform discussions of its use and accuracy.
Keywords: Sports decision aids; Hawk-Eye; public understanding of technology; simulations; cricket; tennis
1. Introduction
At the heart of the debate about public understanding of science is the relationship between esoteric knowledge and ubiquitous knowledge.
The debate turns on the extent to which the ubiquitous knowledge alone is enough to make judgements that touch on esoteric knowledge. The discredited `deficit model’ held that public discomfort with science and technology was caused by a lack of specialist knowledge: if only the public could share in the specialist knowledge of science then their values – in respect of such things as the desirability of new technologies – would align with those of the scientific community.
This was obviously incorrect since, apart from the difficulty of making the esoteric knowledge of specialists available to non-specialists, scientists themselves – who, by definition, understand more science than anyone – disagree about both truth and values. Social studies of science showed that science was itself shot through with ordinary thinking and decision-making: science is not an automatic procedure for generating either truth or uniform opinions – ordinary human judgment lies at its heart.
For the kind of science that is likely to cause debates in the public domain, this means that ordinary people’s judgment is not dissimilar to scientists’ judgment in some respects. This finding has given rise to a widespread tendency in contemporary science and technology studies (STS) to treat the public as more or less continuous with experts when it comes to technological decision-making in the public domain. [i] More recently this approach has been subject to a new challenge, concentrating on differentials, not in access to the truth, but in expertise and experience (Collins and Evans 2002, 2007).
This paper takes the position on public understanding that has been put forward in books such as The Golem series (Collins and Pinch, 1993, 1998, 2005). The position is that, first, there is a deficit in public understanding of the technicalities of science and, second, there is a deficit in understanding the processes of science; while not much can be done about the former, the latter can be, to some extent, remedied. The Golem series was intended to improve public understanding of scientific processes. Understanding the degree of certainty that attends any finding or claim is part of understanding the process of science.
Thus The Golem at Large (Collins and Pinch 1998) pointed to the uncertainty surrounding technologies that had been presented to the public as fully understood — such as the Patriot anti-missile missile and the space shuttle. [ii] Here we analyse a case of technology in the public domain which, like Patriot and the space shuttle, can give the impression that its outcomes are not subject to the errors that are always associated with measurement and prediction. Where the Hawk-Eye system differs from these cases, however, is in its impact and relative accessibility.
Because Hawk-Eye and similar technologies are becoming an increasingly common part of televised sports coverage, making their capacities and technological limits more clearly visible in their day-to-day use has the potential to promote, across a wide section of society, a more nuanced and widespread understanding of the statistics of uncertainty. To teach a wide public about the nuances of statistical decision-making under uncertainty may seem over-ambitious in the light of studies that show that, on the whole, the public have a poor grasp of statistical inference. [iii] Nevertheless, there are grounds for optimism.
We know that, when groups of the public have a burning interest in issues involving mathematical or statistical reasoning, high levels of understanding can be attained. Thus, Lave (1988) showed that the public are good at arithmetical calculations when it comes to supermarket transactions or weight-watching diets; gamblers with no significant education can develop a remarkable grasp of the statistics related to their passion; and some sports fans have a thorough knowledge of the statistics of their favoured team’s performance and how these relate to the performance of other teams. [iv] The `circle is squared’ because participation is the key to gaining knowledge and the public become participants, or quasi-participants, in the statistical and mathematical domains in question rather than remaining mere onlookers. We propose that this potential for a still more widespread understanding of the domain of statistical uncertainty should be actualised through the presentation of measurement errors and confidence intervals in the outputs of automated sports decision aids.
As the most casual observations make evident, committed sports fans watching television act as quasi-participants in the decisions made by umpires and referees!
2. Method
The substance around which this paper turns is an analysis of technical aids designed to supplement or replace the decision-making of cricket and tennis umpires, football referees and the like. The device known as `Hawk-Eye’ is used as the principal illustrative example as it is the best known of the commercial systems and, as we discuss later on, it is currently being used to make decisions in major tennis competitions.
In addition, because it has been used for several years it has a well-developed website and has been the subject of a range of media coverage and one or two published articles. In what follows, nearly all the material we use in the analysis is drawn from such sources and so can be readily checked. Aside from some initial inquiries we were unable to obtain significant information directly from Hawk-Eye Innovations.
As explained below, we discovered toward the end of the analysis that a number of our questions and proposals had already been put by contributors to newspaper websites. We cannot find any detailed response to the newspaper queries either, suggesting that our experience is not unrepresentative. To gain as much information as we could, we used the major search engines and data bases systematically to search popular websites and the academic literature.
Specifically, three people spent a total of about 15 hours searching Google and Google Scholar (we looked at the first 20 pages that were returned) plus Web of Science (there were only three, irrelevant, hits) for articles relating to Hawk-Eye (spelt in various different ways). [v] Of the information we uncovered, the most useful was on newspaper websites, a discussion site called `Cricinfo’ and on Hawk-Eye Innovations’ own website.
We also examined the original patent application, which is available online from the European Patent Office (reference number WO 01/41884). We also discovered an article in Scientific American (Fischetti 2007) and a recently published analysis of line calls in tennis (Mather 2008). We found only one article about Hawk-Eye in an engineering journal but it was very hard to access in both senses of the term:[vi] it has been referred to in the main body of the paper and we attempt to explain its contents more fully in the paper’s final note.
We do not know how this paper, which reports development of a device under contract with Hawk-Eye Innovations, bears on the technology which is currently used but, in any case, no substantive difference would be made to the analysis if it included more reference to this paper unless it is true, as the paper implies, that Hawk-Eye takes its television feeds from existing TV network cameras; this would indicate lower camera frame rates than are discussed in this paper. [vii] Reliance on public domain data means that our information about how Hawk-Eye works is not complete.
We do our best to warn about the gaps in our knowledge by inserting throughout the paper the symbol, `{caveat n}.’ The `n’ part of the symbol is a number indicating the particular pieces of missing information. These are recast as a series of questions in note 8, each cross-referenced to the paper by the number. [viii] The questions in note 8 are a subset of those we would have explored had our research project involved the kind of interactions with scientists and technologists that have supported our many case studies of science and technology over the years.
Note 8 lists, then, a series of open inquiries that others might like to pursue.
3. Hawk-Eye and measurement error
To repeat, as it is currently used we believe that Hawk-Eye could inadvertently mislead the public about the degree of certainty that can be brought to a scientific measurement. We are not the first to notice this. For example, contributors to both the Timesonline and Daily Telegraph websites in July 2007 questioned whether Hawk-Eye can measure the position of a fast moving tennis ball to the nearest millimetre as is implied by some of its reports. [ix] We, however, suggest that the potential for misunderstanding Hawk-Eye’s capabilities could be obviated by incorporating information about measurement error into the presentation and we develop the concerns of the contributors to the websites in a more systematic way by linking them to the concept of measurement error. Thus, humans and machines make errors of two kinds. `Systematic errors’ are repeated errors that have a similar effect each time; the causes of such errors can sometimes be understood and their impact can be predicted and compensated for. `Random errors’ cannot be predicted except that their typical size, and the shape of the random distribution, can be estimated. They cannot be compensated for but they can be taken into account in assigning a degree of confidence to a measurement. Decision aid devices are often presented as though it must be a good thing if they make decisions that are more accurate than humans but we show that `more accurate’ can mean different things. To know what is meant by `more accurate’ both random errors and systematic errors in humans must be taken into account.
Such a consideration gives rise to what we call the `Automated Decision Principle,’ which is a proposal for how sports decision aids should be used and turns on consideration of both kinds of error.
4. What are sports decision aids?
In sport, decision aids are meant to help referees and umpires make decisions. Sometimes the action is very fast, or the official’s view is obscured, and this makes it hard to make a good judgment; in these cases decision aids can help.
A well-known and relatively uncontroversial example is the television replays used by referees in rugby union to determine whether a ball has been fairly touched down so as to merit the award of a `try.’ It could be said that a much simpler and much older aid is the string net which hangs beneath a basketball hoop; the purpose of this net is to slow and mark the descent of the ball so it is easy to see that it has passed through the hoop.
Other such devices used in cricket to detect whether the ball has touched the bat include `Snicko,’ which detects slight sounds, and `Hot Spot’ which uses heat-sensitive cameras to indicate contact; in ice hockey the puck and the line interact electronically to indicate whether a goal has been scored. Here we examine more closely an innovatory design of decision aid utilising artificial intelligence techniques and known as `Hawk-Eye.’ It is regularly used in cricket and tennis though in the former case it is television viewers alone who benefit from its contribution. [x]
5. The technology of Hawk-Eye
Hawk-Eye, as we understand it, is a video processing system combining a number of cameras and a computer to store and process the data. We believe that the cameras – the patent application specifies six but also acknowledges that not all cameras will produce usable data – track the flight of the ball and that these camera feeds are then used by the computer to reconstruct the trajectory of the ball by analysing the pixels in each frame of each relevant camera feed. [xi] The field of play is also modelled within the system, as are some of the rules relating to the game. By combining the trajectory of the ball with the model of the pitch and the database of rules, the path of the ball can be reconstructed against the background of the main features of the playing area as a virtual reality and a decision given (e.g. should the batsman be given out in the case of cricket or, in the case of tennis, did the ball land inside or outside the line). The reconstruction can be shown to television viewers.
It is this representation rather than the internal technicalities of Hawk-Eye that is of main interest when considering the public understanding of science. The cameras will have a limited frame rate and the position of the ball between frames has to be interpolated by the software. In the patent application, this process is not described in detail but it is clear that the analysis is statistical, with predicted and actual paths being compared in order to generate the final trajectory. The explanatory framework of this paper adopts certain simplifying conventions that, we believe, do not affect the principles of the argument.
In this we essentially follow Hawk-Eye’s own practice in describing its operation: its website presents the device as calculating a trajectory from a series of images of the ball captured in television frames. [xii] We are, then, unconcerned with the exact statistical algorithm used in the interpolation of the path of the ball. We take it that whatever the algorithm the accuracy of the final result has an upper limit set by the accuracy of the data. We take it that the crucial data concerning the path of the ball consist of a series of `data points.’ In the first instance information is generated from camera frames containing certain pixels which are taken to signify the ball and other pixels which are taken to signify the line, wicket, or whatever is needed to locate the ball-pixels in the space of the playing area. When we refer to a `data point’ we mean a three dimensional reconstruction of the position of the ball which will involve information from two or more cameras. The idea of a data point serves the analysis adequately whether or not such data points are ever actually constructed as the Hawk-Eye processor runs.
When we refer to `frame rate’ we are talking about individual cameras. It might be that the `effective frame rate’ of Hawk-Eye is higher than this because it combines frames from more than one camera to establish the trajectory of the ball in any one dimension. We simply do not know. Therefore we assume `effective frame rate’ is whatever frame rate we find mentioned in our sources. Any rough calculations we make are based on these seemingly plausible assumptions but they are open to correction {caveat 1}.
In our language, then, each effective frame provides a single `data point’ and we would expect the reconstructions to be more accurate with higher frame rates as this would minimise the distance between data points. In addition, having more data points would also allow more complex curves to be fitted. Thus, two data points could provide information that can be used to infer the straight-line direction and velocity of the ball (subject to errors). For trajectories that depart from a straight line, more data points would be needed {caveat 2}. [xiii]
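The relation between the number of data points and the kind of path that can be inferred can be illustrated with a minimal sketch. It is not Hawk-Eye's algorithm, which we do not know; the function names and the one-dimensional `(time, position)` representation of a data point are our own assumptions, and measurement error is ignored.

```python
# Hypothetical sketch: each data point is a (time_s, position_m) pair in one
# dimension. This is NOT Hawk-Eye's method, only an illustration of how many
# points each kind of path needs.

def straight_line_velocity(p0, p1):
    """Velocity implied by two data points, assuming straight-line motion."""
    (t0, x0), (t1, x1) = p0, p1
    return (x1 - x0) / (t1 - t0)


def quadratic_through(p0, p1, p2):
    """Coefficients (a, b, c) of x(t) = a*t**2 + b*t + c through three points.

    Uses Newton divided differences; three points are the minimum needed
    to recover any curvature at all.
    """
    (t0, x0), (t1, x1), (t2, x2) = p0, p1, p2
    d01 = (x1 - x0) / (t1 - t0)      # slope between first two points
    d12 = (x2 - x1) / (t2 - t1)      # slope between last two points
    a = (d12 - d01) / (t2 - t0)      # change of slope, i.e. curvature term
    b = d01 - a * (t0 + t1)
    c = x0 - a * t0 ** 2 - b * t0
    return a, b, c
```

With only two points the curvature term cannot be estimated, and with exactly three points the curve passes through every point, so any error in a single data point is carried straight into the extrapolation.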
In some circumstances, Hawk-Eye projects a hypothetical trajectory beyond its last data point. One such case is the `leg-before-wicket’ (lbw – see below) decision in cricket in which the continued trajectory of the ball beyond the last data point is projected forward and then represented on a virtual reality display that shows the television viewer what ‘would’ have happened had the ball not hit the batsman. We start with the lbw situation because the technicalities (if not the game) are relatively easy to understand and it was the lbw decision that gave rise to our initial questions.
The second part of the technical analysis concerns the tennis line-calling system.
6. The lbw rule and the Hawk-Eye system
Cricket is a complicated game. We must explain some of the rules here on the assumption that not every reader of this journal will know them. We will, however, assume that readers who are not familiar with cricket will know baseball. Cricket is like baseball in that it involves the equivalent of a pitcher and a batter, known, respectively, as the `bowler’ and the `batsman.’ [xiv] The bowler `bowls’ the cricket ball to the batsman and, as in baseball, the batsman tries to hit it. Unlike baseball, there is no limit to the number of balls the batsman may receive – on a good day a batsman may face hundreds of balls before being out. In international matches, one game may continue for up to five days. As in baseball, there are a number of ways of being out, such as when one of the `fielders’ catches the ball before it hits the ground. In cricket the batsman stands in front of a `wicket’ (otherwise known as `the stumps’) that he has to defend with his bat.
If the ball hits the wicket, the batsman is out – there is no equivalent in baseball. The wicket is a set of three vertical sticks or `stumps.’ The wicket is 28 inches high and 9 inches wide overall. The top of each stump has a shallow groove cut at right angles to the direction from which the ball is coming; two smaller sticks, known as `bails,’ are carefully balanced in these grooves, the ends of the two bails touching each other where they meet in the middle of the groove cut in the central stump (see Figure 2, below).
The working, and universally accepted, definition of `the ball hitting the wicket’ is that one or both of the delicately balanced bails fall to the ground – the wicket must be `broken.’ On very rare occasions a ball grazes the stumps, or rolls very gently against them, but no bail falls; in such a case the batsman is not out because the wicket has not been broken. In cricket, the bowler nearly always directs the ball in such a way that it hits the ground before it reaches the batsman and it usually then bounces toward the batsman’s legs.
The batsman wears a `pad’ to protect each leg. Each pad is an armored sheath running from ankle to just above the knee. The ball is very hard, about as hard as wood at the beginning of the game, though it begins to soften slightly as the hours pass (the same ball is used for many hours before it is changed). The ball can sometimes be bowled at more than 90 mph. Allowing the ball to hit the pads is an integral part of the game. Clearly, the batsman would never be out if he simply stood in front of the wicket, kept his bat out of the way, and allowed the ball to hit him or his pads. [xv] To make that impossible the notoriously complicated `lbw rule’ says that a batsman is out in certain restricted circumstances if the pads alone stop a ball that would otherwise hit the wicket – this counts as out in virtue of `leg before wicket.’ In the normal way, the umpire, who stands at the point from which the bowler bowls the ball, is the sole judge of whether the ball (a) falls within the restrictions and (b) would have gone on to hit the wicket. The question that concerns us here is the decision about whether the ball would have gone on to break the wicket if it had not hit the batsman’s pads. [xvi]
One of the earliest uses of Hawk-Eye was to project the path of the ball after it hit the batsman’s pad in an attempt to judge whether it would have gone on to hit the wicket. Figure 1 is a two-dimensional schematic version of the situation from side-on. The ball, traveling from left to right, bounces and then hits the batsman’s pad-protected leg. The dotted portion of the trajectory is what has to be judged or estimated. Television viewers see a 3-dimensional virtual reality representation of the projected path of the ball against a virtual cricket field and they can see it either hitting or missing the wicket.
For a number of years after the introduction of Hawk-Eye, cricket commentators would simply remark on what Hawk-Eye showed on the screen, giving the impression, perhaps inadvertently, that the virtual reality represented exactly what would actually have happened had the pad not been struck. This is where our analysis of Hawk-Eye begins.
Figure 1: A two dimensional schematic of a potential lbw situation
A cricket ball is not uniformly spherical. Around its `equator’ it has a raised seam and the two `hemispheres’ become more asymmetrical as the game goes on.
The trajectory of the ball after it hits the ground can vary enormously. The bounce depends on the speed, the hardness and texture of the ball – which changes during the game, the state of the ground at the exact point of the bounce, the spin on the ball and the position of the seam. The `swing’ – which is the aerodynamically induced curve in the flight of the ball, which can be in any plane — depends on the ball’s speed, its spin, its state, its orientation, the orientation of the seam and the state of the atmosphere.
As a result, what happens to the ball after it bounces is not going to be fully predictable from its pre-bounce trajectory so that, as far as we can see, Hawk-Eye has to estimate the post-bounce trajectory largely or entirely from post-bounce behavior of the ball {caveat 3} for which it can gather data between the bounce and impact on the pad. This certainly seems to be the implication of the claim made by Paul Hawkins, the Director of Hawk-Eye Innovations, in response to a criticism of Dennis Lillee, the Australian fast bowler: … Hawk-Eye simply observes and then calculates the actual trajectory of the ball.
Whether the cause of this trajectory was due to atmospheric conditions, the wicket, or the ball hitting the seam is irrelevant from a Hawk-Eye perspective. Hawk-Eye just tracks what happened – it does not try to predict nor to answer why it happened. So, if the ball rears up unexpectedly after hitting the seam or a crack on the pitch, Hawk-Eye will track the trajectory off the pitch to predict the future course of the ball. Similarly, the tracking system will come into play if the ball shoots along the ground after hitting a dry spot on the pitch. (Hawkins quoted by S. Rajesh `Give Hawk-Eye a Chance,’ on Cricinfo website, December 18 2003.) [xvii]
Our concern in analyzing what Hawk-Eye can do is to understand more fully what it means to ‘track’ and ‘predict’ the path of the ball. Predictions are extrapolations and the accuracy of these extrapolations is limited by, among other things, the quality of the data. No measurement is ever exact. Heisenberg established this as a deep principle of physics with the `uncertainty principle,’ but here we are talking of macroscopic measuring processes such as are discussed by, say, Thomas Kuhn in his 1961 paper on measurement and, of course, by physicists and most other scientists as a matter of ordinary fact in their day-to-day work.
As a result, it is normal in science to associate a measurement with an estimate of its potential error. A decision is not a measurement. A decision is binary like the `guilty/not guilty’ decision of an English jury; in cricket the batsman is either ‘OUT’ or ‘NOT OUT’. The process of what we will call `digitization’ is used to turn inexact measurements into discrete decisions. In most sports, the referees or umpires are the people who do the digitization and what we are discussing here could be described as technical aids to digitization.
The bails in cricket are one such aid to digitization. As discussed above, it can sometimes be difficult to tell whether the ball has touched the wicket or not and the falling of a bail converts this uncertainty into one of two discrete possibilities which have merely to be `read off’ by the umpire. In the case of lbw decisions the bails can’t help because the flight of the ball has been stopped before it gets to the wicket and so the wicket is never broken.
Instead, the umpire has to judge whether the bails would have been dislodged had the ball continued on its trajectory and passed ‘through’ the batsman instead of being stopped by his legs. Here Hawk-Eye’s reconstruction and extrapolation appears to have the potential to take on the role of the bails by showing what would have happened. In practice, however, Hawk-Eye’s digitization cannot correspond exactly with that of the bails in all cases because there are some circumstances in which Hawk-Eye will not be able to predict with certainty whether the ball really would break the wicket if it had not hit the batsman’s pad.
To take the extreme circumstance where the ball just touches the wicket, whether the bails would fall would depend on how firmly they are sitting in their grooves, how rigidly the stumps are held in the ground (the tapered part at the base of each stump is pushed into the turf to hold them upright), or whether the bails and stumps are wet or dry. Again, the ball is not exactly spherical so whether it would cause a bail to fall can depend on its orientation as it passes the wicket.
We assume that predicting all these things is beyond the capacity of both Hawk-Eye {caveat 4} and human umpires, but all we are doing in this paragraph is establishing the principle of the imperfection of all such measurements. The usual way of coping with measurement error in experimental science is to report a confidence interval of the errors. The width of the confidence interval is a function of two things: the confidence level which is chosen by the experimenter and the dispersion of the distribution of errors.
If the dispersion of errors is known, then each prediction can be associated with a confidence interval defined by a chosen level of confidence such as 95% or 99%: the first would mean that it is estimated that there is only a 5% chance that the error is greater than the outer limits indicated by the 95% confidence interval; the second would mean that there is only a 1% chance that the error is greater than the (wider) limits indicated by the 99% confidence interval. [xviii] To anticipate, we have not found any detailed indications of the dispersions of Hawk-Eye’s errors in the public domain.
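How the chosen confidence level and the dispersion together fix the width of the interval can be shown with a short sketch. Since no dispersion has been published, the 5 mm standard deviation used in the usage note is purely hypothetical, as is the assumption that the errors are normally distributed; the function name is ours.

```python
from statistics import NormalDist


def confidence_interval(estimate_mm, sd_mm, level):
    """Two-sided interval around an estimate, ASSUMING normal errors.

    estimate_mm: the predicted position
    sd_mm:       standard deviation of the errors (hypothetical here)
    level:       confidence level chosen by the experimenter, e.g. 0.95
    """
    # Number of standard deviations that leaves (1 - level) in the two tails.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd_mm
    return estimate_mm - half_width, estimate_mm + half_width
```

For example, `confidence_interval(0.0, 5.0, 0.95)` gives an interval of roughly plus or minus 9.8 mm, while the same dispersion at the 0.99 level gives the wider interval of roughly plus or minus 12.9 mm.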
The nearest we could find are the following quotations from Paul Hawkins when interviewed by S. Rajesh on the `Cricinfo’ website:
… Hawk-Eye has shown that balls pitched on roughly the same area on the wicket have passed the stumps at widely varying heights. And in tests conducted, thousands of deliveries were bowled from a bowling machine and filmed by Hawk-Eye. The camera feeds were cut about two metres from the stumps, approximately the point where the batsman would normally intercept the ball. When the ball hit the wicket, Hawk-Eye was able to determine, to within about 5 mm, the point of impact. “Hawk-Eye requires between 1 to 2 feet of travel after the ball has pitched to be able to accurately track the ball out of the bounce (this is significantly less than an umpire requires). In instances when this does not happen, a Hawk-Eye replay is not offered to TV. (18th December 2003) [xix]
… in most cases Hawk-Eye’s output is accurate to within five millimetres in predicting the path of the ball. The accuracy levels are highest when the ball has travelled a fair distance after pitching, but even when the point of contact is very close to the pitch of the ball, the accuracy levels are still within 20mm. (13th June 2006) [xx]
We don’t know if the developers of Hawk-Eye have attempted to estimate the dispersion of their errors in a way that is more exact than is indicated in the above quotations but we have not found any such report {caveat 5}. As the quotations intimate, the size of the error will be affected by the length of the bounce-to-pad trajectory but it will also depend on other factors. The bounce-to-pad trajectory is not of fixed length because the position of the bounce is variable and the batsman can move forward or back.
The length of the projected pad-to-wicket trajectory also depends on where the batsman is standing when struck. The accuracy of Hawk-Eye’s estimates is likely to have a direct dependence on the distance between the bounce and the moment of impact on the pad (the longer the better) and an inverse dependence on the distance between the impact on the pad and the wicket (the shorter the better) {caveat 6}. In the case of the human umpire making an lbw decision it is acknowledged that the accuracy of the judgment is affected by how close the batsman is to the wickets when the pads are struck by the ball.
If the batsman whose pads are struck is well forward in his stance then he is rarely given out. In this way, human judges deliberately introduce a systematic error into their judgments that favors the batsman — the so-called `benefit of the doubt’ rule. The importance of this rule will become clear later. None of the information that we have found in the public domain gives any indication of overall dispersion of Hawk-Eye’s errors and the only indication about how the error increases with short bounce-to-pad trajectory that we can find is given in the quotation above.
We can find no systematic information about how the size of the error relates to the position at which the ball bounced, the speed of the ball, the length of pad to wicket trajectory, the length of bounce to pad trajectory, the degree of spin, the degree of swing, the nature of the pitch surface, and the nature of the atmosphere. Although the patent application does acknowledge that the distance of the camera to the ball is important (e.g. frame sizes are set to make the ball appear as large as possible) and that the position of the sun matters (e.g. it is important to distinguish the ball from its shadow) it does not tell us how variations in these or other parameters affect the accuracy with which the position of the ball can be tracked. Unfortunately, the importance of accounting for these errors is greatest when the decision is most difficult. Whenever the projected point of impact is well away from the edge of the wicket there is unlikely to be any real doubt about the ‘correct’ decision and so Hawk-Eye’s errors are probably `hardly worth reporting.’ [xxi] Where the judgment is much more difficult – for both human and machine – then the potential error becomes crucial.
To understand the issues it is necessary, first, to acknowledge that there is a distribution of measurement error and, second, to describe the characteristics of this distribution (its mean and dispersion). At worst these should be described for the general case but it would be much better to provide separate analysis for each of the main conditions that can affect the accuracy of the prediction. For example, it would be nice to know the dispersion of errors associated with fast balls and slow balls, with different lengths and ratios of the various trajectories and, perhaps, with different conditions of the pitch, the atmosphere, and the ball. For example, is Hawk-Eye likely to make bigger errors in conditions when the ball tends to swerve as it travels through air after hitting the wicket? The frame rate of the cameras will affect the accuracy of the prediction. Though the technical article referred to above suggests that Hawk-Eye takes its feed from standard broadcast cameras (which would have a frame rate of around 30 per second) we will assume that the frame rate is 120 fps, as reported in 2004 on the Cricinfo website {caveat 2}. [xxii] In this case, if the ball is traveling at 80mph, it would travel about one foot between frames.
This would make sense of Hawk-Eye’s claim to need between one and two feet to make a prediction. In the worst case scenario, a ball traveling at 120ft per second, filmed at a frame rate of 120 per second, would need a minimum of two feet to provide three data points, which should be enough to calculate some kinds of curved trajectory (subject to errors) {caveat 7}. Fortunately this lack of sure knowledge of frame rate and method of calculation does not affect the general principle of the argument {see caveats}.
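The frame-spacing arithmetic above can be checked with a short sketch. The speeds and frame rates are the illustrative figures used in the text, not confirmed Hawk-Eye specifications, and the function names are hypothetical.

```python
# Sketch: how far a ball travels between successive camera frames.
# Speeds and frame rates are the illustrative figures from the text,
# not confirmed Hawk-Eye specifications.

FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def mph_to_ft_per_s(speed_mph: float) -> float:
    """Convert miles per hour to feet per second."""
    return speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR


def feet_between_frames(speed_ft_per_s: float, frames_per_s: float) -> float:
    """Distance the ball travels from one frame to the next."""
    return speed_ft_per_s / frames_per_s


def feet_spanned(speed_ft_per_s: float, frames_per_s: float, points: int) -> float:
    """Length of trajectory covered by `points` consecutive frames."""
    return (points - 1) * feet_between_frames(speed_ft_per_s, frames_per_s)


if __name__ == "__main__":
    for fps in (30, 120):
        gap = feet_between_frames(mph_to_ft_per_s(80), fps)
        print(f"80 mph at {fps} fps: {gap:.2f} ft between frames")
    span = feet_spanned(120, 120, 3)
    print(f"3 data points at 120 ft/s and 120 fps span {span:.1f} ft")
```

At the assumed 120 fps an 80mph ball moves just under one foot per frame, whereas at a broadcast rate of 30 fps it would move nearly four feet, which is why the assumed frame rate matters to the one-to-two-feet claim.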
How error could be measured in the case of lbw
We do not know if data from the tests involving `thousands of balls’ (see above) have been preserved {caveat 8}. If they have they might give an initial indication of the distribution of errors. The beginnings of a more complete analysis could be made if, in a test like this, the cut-off point for the camera feed was systematically varied and the bounce point of the ball was systematically varied so that a more complete range of potential errors was analysed.
In a still better test these parameters, and other parameters that affect the post-bounce behavior of the ball, would be recorded and measured so that the likely size of error could be reported for different circumstances. Some of this may be beyond the ability of Hawk-Eye to measure but if it is then it could be clearly stated {caveat 8}. Our own preference would be as follows. The error should be estimated from empirical tests either in the way suggested above or in some other way. Reports on the method of testing and its degree of completeness should be made readily available.
Subsequently, using whatever knowledge of dispersion of measurement error was available, confidence levels would be associated with Hawk-Eye reconstructions in real time {caveat 9}. The graphic shown to television viewers could be adjusted either to show something like an `error-bar,’ or `error circle,’ around the projected position of the ball, or to indicate it in some other way such as a numerical confidence level for the prediction that the wicket would have been struck/not struck. Figure 2 (the putative errors are not drawn to scale) indicates some of the possibilities though no doubt it could be improved upon. [xxiii]
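As a rough illustration of how a numerical confidence level might be attached to an lbw projection, the sketch below treats only the sideways position of the ball and assumes a zero-mean normal measurement error. The function name, the default wicket half-width and ball radius, and the 10mm standard deviation are all hypothetical values chosen for illustration; none comes from Hawk-Eye.

```python
from statistics import NormalDist


def hit_confidence(projected_offset_mm: float,
                   sigma_mm: float,
                   half_width_mm: float = 114.0,
                   ball_radius_mm: float = 36.0) -> float:
    """Probability that the ball would have touched the wicket.

    projected_offset_mm: projected sideways distance of the ball's centre
        from the centre of the wicket.
    sigma_mm: assumed standard deviation of the projection error.
    The defaults are illustrative round numbers, not measured values.
    Only the sideways direction is modelled; height is ignored.
    """
    # The ball touches the wicket if its centre lies within this distance
    # of the wicket's centre line.
    limit = half_width_mm + ball_radius_mm
    true_position = NormalDist(mu=projected_offset_mm, sigma=sigma_mm)
    return true_position.cdf(limit) - true_position.cdf(-limit)


if __name__ == "__main__":
    for offset in (0.0, 130.0, 150.0, 170.0):
        print(f"offset {offset:5.1f} mm -> P(hit) = {hit_confidence(offset, 10.0):.3f}")
```

With these assumed numbers a ball projected onto the middle of the wicket gets a confidence very close to 1, while one projected exactly onto the outer limit gets 0.5, which is the kind of figure a commentator could report.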
If such changes were implemented, commentators might remark, `Hawk-Eye was 99.9% sure the ball was going to hit the wicket so the umpire was right,’ or `Hawk-Eye was only 90% sure the ball was going to hit the wicket – the umpire should not have given it out,’ or some such. This way, not only would Hawk-Eye’s abilities be presented in a clearer and less easily misunderstood way, but the technology itself could fulfill a valuable role in educating the public about the way uncertainties are turned into decisions. What we are suggesting is not much more than is mentioned in the Hawk-Eye patent that we have found.
There it is claimed that the HIT-MISS decision made by the apparatus is based on: `whether the probability of the ball going on to hit the stumps is high e.g. above a given probability threshold (p8).’ We are asking for a more complete explanation of what the threshold is, what the probability is on any occasion, and for this information to be offered to the public.
[pic] Figure 2: Some possible ways of indicating Hawk-Eye’s possible measurement errors[xxiv]
7. Tennis line calls
In tennis, unlike cricket where it is used only to enhance television coverage, Hawk-Eye is now being used to take decisions.
In high level tennis tournaments players can make `challenges’ to the umpire’s decisions and, if the player’s challenge is supported by Hawk-Eye, the original decision is over-turned. In the summer of 2007 Hawk-Eye figured in at least three disputed line calls in which the ball was called OUT but, after a challenge from one of the players, was subsequently called IN by Hawk-Eye. Two of these are worth reporting in detail as they bear on the argument that follows:
Disputed line call A: Dubai
As reported on the Gulfnews website (March 03, 2007), in a match in Dubai between Nadal and Youzhny, a challenge made by Youzhny was supported by Hawk-Eye:
World No 2 Rafael Nadal has questioned the efficiency of the new Hawk Eye line calling technology. Thursday’s first set between Nadal and Youzhny ended in a controversy with the tie-break score at 6-5 in favour of the Russian. Nadal thought a ball from the Russian had landed wide. So the Hawk Eye was pressed into service and it showed the ball had skimmed the line. But Nadal, chair umpire Roland Herfel of Germany and even Youzhny believed that the ball had landed wide after watching the Hawk Eye.
But officials are bound to accept the Hawk Eye ruling. “The mark of the ball was still on court and it was outside. But in the challenge it was in, so that’s unbelievable. The Hawk Eye system is not perfect,” fumed Nadal. “I told the chair umpire: ‘Look, the ball is out’ and he said: ‘I know’. … Even Youzhny agreed the ball appeared to have gone out. “I saw the mark, but I just challenged because it was a very important point,” the Russian said. [xxv]
Disputed line call B: Wimbledon
The technology was also central to a disputed line call in the Wimbledon men’s final between Federer and Nadal.
Nadal hit a ball which appeared to television viewers, to the umpire, and to Federer as impacting well behind the baseline, but Hawk-Eye called it IN. Federer appealed to the umpire but the umpire accepted the Hawk-Eye judgement. The following is the account from the Daily Telegraph newspaper website. The story is dated 10/07/2007. Federer, a tennis conservative, has always been against the introduction of Hawk-Eye, and he was as angry as he had ever been on Centre Court when an ‘out’ call on one of Nadal’s shots was successfully challenged by the Spaniard in the fourth set.
The Hawk-Eye replay suggested that the ball had hit the baseline; Federer thought otherwise. It was then that Federer asked umpire Carlos Ramos whether the machine could be turned off. Ramos declined but also seemed to suggest that he had thought the ball had landed long. The Hawk-Eye review gave Nadal a break point, which he converted for a 3-0 lead, and Federer continued to complain during the change of ends. “How in the world was that ball in? S***. Look at the score now. It’s killing me, Hawk-Eye is killing me,” the Swiss said.
So, a system which was introduced to prevent McEnroe-style rants at officialdom actually left one of the sport’s gentlest champions fuming. [xxvi]
Technology and tennis
In some ways the technical aspect of the tennis case is easier than the lbw case because Hawk-Eye has more data points to go on: it can follow the ball’s trajectory right up to the point of impact and, sometimes, beyond (though we do not know if it uses post-bounce data points in its calculations {caveat 10}). Given that no combination of cameras provides an infinite frame rate, the computers still have to project forward (and back?) to generate the virtual trajectory. Again, we don’t know frame rates so we don’t know how much projection has to be done. Tennis balls can be served at up to 150mph (c. 220 feet per second) so in this respect the problem is worse for tennis than for cricket (we don’t know how fast balls travel in the course of a rally). In the case of tennis there exists no traditional physical method for digitizing line-calls which is intrinsic to the game, as with the bails in cricket. Decisions are traditionally made on the basis of fallible human observation – which is to say that the digitization is normally done by human beings.
It seems quite likely that Hawk-Eye could do better than a human umpire in most circumstances but, once more, this is not the same as saying it is always correct. [xxvii] Again, an argument from first principles suggests that it is bound to make occasional mistakes. Hawk-Eye reports on its own website that the mean error in the position of the tennis ball as measured by its system is 3.6mm. Again, what it does not report is the distribution and dispersion of errors or the conditions under which errors are greater or smaller. Here, if we take the 3.6mm as the `mean deviation’ of the errors, we can do some simple `as if’ calculations {see caveats}. These calculations do not necessarily bear on the actual performance of Hawk-Eye because we have too little information but they indicate the kind of thinking and calculating that might be done. These calculations assume that the distribution of Hawk-Eye’s errors is the normal distribution and that there is no systematic error in Hawk-Eye’s measurements as systematic error is normally understood (but see section 8 below). Dispersions of errors are usefully reported in terms of the `standard deviation.’ If the distribution of the errors was the well-known and frequently encountered `normal distribution,’ {caveat 5} then if the mean deviation is 3.6mm the standard deviation would be about 3.6mm x 1.25 = 4.5mm. [xxviii] Because, in a normal distribution, 95% of the points lie within approximately 2 standard deviations of the mean and 99% lie within about 2.6 standard deviations, we can estimate some putative confidence intervals. In this case we could say that in 5% of Hawk-Eye’s predictions (that is, 1 in 20) the error could be greater than about 9mm and in 1% it could be greater than about 11.7mm (2.6 x 4.5mm).
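The `as if’ calculation can be reproduced in a few lines. This is a minimal sketch under the same assumptions as the text (zero-mean, normally distributed errors, with the reported 3.6mm taken as the mean absolute deviation); the function names are hypothetical. For a normal distribution the mean absolute deviation equals the standard deviation multiplied by the square root of 2/pi, which is where the factor of about 1.25 comes from.

```python
from math import pi, sqrt
from statistics import NormalDist

MEAN_ABS_ERROR_MM = 3.6  # mean error reported on Hawk-Eye's website


def sd_from_mean_deviation(mean_abs_dev: float) -> float:
    """Standard deviation of a zero-mean normal distribution.

    For such a distribution E|X| = sigma * sqrt(2 / pi), so
    sigma = E|X| * sqrt(pi / 2), a factor of roughly 1.25.
    """
    return mean_abs_dev * sqrt(pi / 2)


def error_exceeded(sigma: float, tail_probability: float) -> float:
    """Error size exceeded (in either direction) with the given probability."""
    z = NormalDist().inv_cdf(1 - tail_probability / 2)
    return z * sigma


if __name__ == "__main__":
    sigma = sd_from_mean_deviation(MEAN_ABS_ERROR_MM)
    print(f"standard deviation: {sigma:.2f} mm")
    print(f"exceeded 1 time in 20:  {error_exceeded(sigma, 0.05):.1f} mm")
    print(f"exceeded 1 time in 100: {error_exceeded(sigma, 0.01):.1f} mm")
```

The exact multipliers (about 1.96 and 2.58) give slightly smaller figures than the rounded multipliers of 2 and 2.6, which yield about 9mm and 11.7mm, but the difference does not affect the argument.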
The physics of the situation means that there could be an absolute upper cut off point for the errors and this could be smaller than the calculation from an assumed normal distribution would imply, but we have no firm information as to whether this is the case. Even if the numbers we have calculated are correct this would not mean that Hawk-Eye’s call would be wrong every time it makes a significant mistake. This is because rightness and wrongness in terms of the binary decision (IN or OUT) depend on the direction of the error.
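The point about direction can be made concrete with another `as if’ sketch. A call is wrong only when the error is larger than the measured margin and points the unfavourable way, so under the assumed zero-mean normal distribution the chance of a wrong call is a one-sided tail probability. The 4.5mm standard deviation is the putative figure derived above; the function name and the example margins are illustrative.

```python
from statistics import NormalDist


def chance_call_is_wrong(margin_mm: float, sigma_mm: float) -> float:
    """Chance that the true position is on the other side of the edge.

    margin_mm: how far inside (or outside) the edge the system places the ball.
    sigma_mm: assumed standard deviation of a zero-mean normal error.
    Only errors larger than the margin AND in the unfavourable direction
    reverse the call, hence the one-sided tail.
    """
    return NormalDist(mu=0.0, sigma=sigma_mm).cdf(-margin_mm)


if __name__ == "__main__":
    for margin in (1.0, 5.0, 9.0, 20.0):
        p = chance_call_is_wrong(margin, 4.5)
        print(f"margin {margin:4.1f} mm -> chance of wrong call {p:.3f}")
```

On these assumptions a ball placed only 1mm inside the line has a chance of roughly 0.4 of actually being out, while one placed 9mm inside has a chance of only about 0.02. These numbers say nothing about Hawk-Eye's real performance, which depends on the unpublished distribution of its errors.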
Nevertheless, if the figures were correct it would be likely to be wrong on some of those occasions and the incidents described above could have been such occasions. According to Hawk-Eye Innovations’ own website, in the case of the Federer-Nadal call, Hawk-Eye called the ball IN by only 1mm: the possibility for mistakes is clear even if we look no further than the 3.6mm mean deviation {see caveats}. But what does the mean error of 3.6mm in tennis imply? Is this the mean measured for all shots including, say, lobs and low fast drives or serves {caveat 11}?
Just as in the case of varying kinds of ball in cricket, it seems likely that the error will not be equally dispersed in tennis for different kinds of shot. For example, it is likely to be greatest in the direction of travel of a fast moving ball. In this case velocity across the line of travel is zero, or almost zero, but in the direction of travel small errors in measurement will make a big difference to the position of the ball. In Figure 3 the back line of a court is shown with a ball clipping the back edge. Uncertainty is indicated by a dotted circle surrounding the topmost ball.
The diagram is only very roughly to scale at best but the circle is meant to show the error associated with between 2 and 3 standard deviations assuming a normal distribution and the other assumptions made above. In other words, on these assumptions, between one time in 20 and one time in 100 the actual position of the ball will be nestled up somewhere against the inside of the dotted circle. The lower ball shows roughly what the error would look like if it was concentrated into the direction of flight of the ball.
Again, scale is not accurate and the degree of elongation of the oval might be exaggerated but we cannot tell in the absence of more information. If the 3.6mm mean is in fact averaged over all kinds of shot it could be that in the case of fast drives or serves the elongation should be even more exaggerated. Here we are not in a position to make a positive claim about these things, merely to indicate possibilities that have not been discussed in the public domain as far as we can see.
[pic] Figure 3: Is the error concentrated in the direction of travel of the ball and to what extent?
In an initial email exchange with Hawk-Eye Innovations’ Tennis Operations Manager, we were referred to the International Tennis Federation if we wished to understand the methods of testing Hawk-Eye’s errors. The International Tennis Federation (ITF) provides details of its testing procedures for automated line-callers on a website. We understand the ITF has the true position of the ball measured with very high-speed cameras. The crucial passages read as follows:
A4.5 Accuracy and Reliability
The decision-making success rate (i.e. “in” or “out” decisions) for all balls bouncing between 100 mm inside the line and 50 mm outside the line should be 100% with a tolerance of ± 5 mm.
The average absolute discrepancy for all impacts on a single line on court should be no more than 5 mm.
The maximum discrepancy between the system’s measure of the distance from the line and the true distance should be 10 mm for all impacts.
The system should be capable of making the correct in/out decision if a ball legally crosses a line from outside to inside.
We found these rules difficult to understand.
Initially, we could not understand how in/out decision-making can be 100% accurate if there is a tolerance of 5mm. On the face of it, these statements seem incompatible – a ball could be 5mm out and still be called IN. We thought that even if we forget about distribution of errors and just accept the 5mm at face value, if Hawk-Eye was taking its measure of accuracy from the ITF {caveat 12} the Federer-Nadal disputed ball might well have been OUT by nearly a quarter of an inch, even though Hawk-Eye called it IN. If we accept Hawk-Eye Innovations’ own figure of 3.6mm average error and its claim that the ball was 1mm IN, the possibility for a mistake is still obvious. The ITF appeared to agree and in response to our inquiries (all of which took place on 26th Jan 2008), their spokesman said:
… in general, if the ball landed sufficiently close to the edge of the line, there is a chance that Hawk-Eye could make the wrong call.
Hawk-Eye Innovations’ own website contains a discussion of this specific line call. The introductory paragraph remarks:
This document provides more information about the line call that Roger Federer questioned during the Wimbledon Men’s Singles Final on Sunday 8th July.
Whilst it is unable to prove conclusively that the ball was 1mm IN as shown by Hawk-Eye, it can show that 1mm IN is a likely [sic]. [xxix]
The ITF was also able to clear up at least some of our confusion in a speedy way. Here is the gist of the initial response from the ITF.
All decisions made by a line-calling system (“in” or “out”) must be correct, unless the ball lands within 5 mm of the outside edge of the line, when an incorrect decision is allowed, providing that the absolute error in the system’s measured impact location is no more than 10 mm.
Example 1. True impact location: 4 mm “out”. System’s measured impact location: 2 mm “in”. Outcome: Acceptable (wrong decision, but absolute discrepancy < 10 mm).
Example 2. True impact location: 4 mm “out”. System’s measured impact location: 8 mm “in”. Outcome: Unacceptable (wrong decision, absolute discrepancy > 10 mm).
In sum, the ITF accepts errors of up to 10 mm for individual impacts, and the system may still pass the accuracy test overall (contingent on meeting the other performance criteria).
Incidentally, we asked the ITF how many impacts were involved in their tests. They explained:
Over the full evaluation, at least 80, and normally 100-120. Of these, around 10% land within 5 mm of the line.
Thus, it could be that the ITF tested Hawk-Eye’s performance in the crucial zone around the edge of a line on only around 6 to 15 impacts of ball with court. To conclude on random error in tennis, it seems to us that the contribution of Hawk-Eye would be much better understood if, just as in cricket, it were admitted that on a few occasions it will be wrong and that each prediction were associated with a confidence interval. Each line call provided by Hawk-Eye in real time should be associated with a claim about the confidence.
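The ITF acceptance rule for a single impact, as explained in the response quoted above, can be restated as a small predicate. This is our reading of that response rather than the ITF's own procedure: positions are signed distances from the outside edge of the line (positive meaning IN), a ball exactly on the edge is treated as IN, and the function name is hypothetical.

```python
def itf_acceptable(true_mm: float, measured_mm: float) -> bool:
    """Would this single impact be acceptable under the quoted ITF rule?

    Positions are signed distances from the outside edge of the line:
    positive = IN, negative = OUT (an assumed sign convention).
    """
    # No impact may be mismeasured by more than 10 mm.
    if abs(measured_mm - true_mm) > 10.0:
        return False
    # A correct in/out decision within that limit is always acceptable.
    if (true_mm >= 0) == (measured_mm >= 0):
        return True
    # A wrong decision is tolerated only for balls that truly landed
    # within 5 mm of the outside edge of the line.
    return abs(true_mm) <= 5.0


if __name__ == "__main__":
    print(itf_acceptable(-4.0, 2.0))  # the ITF's Example 1
    print(itf_acceptable(-4.0, 8.0))  # the ITF's Example 2
```

Written this way, the apparent conflict between a `100%’ success rate and a 5mm tolerance disappears: wrong calls inside the 5mm band simply do not count against the system.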
Figure 4 suggests ways in which these possibilities could be indicated to the public (though the error circle might need to be elongated as in Figure 3).
[pic] Figure 4: Possible ways of indicating possible error in tennis line-calls
8. Decision-maker or decision aid?
Whilst random error can be dealt with by estimating confidence intervals, the case of systematic error is more complex. Whilst the common sense idea might be to off-set it in such a way as to return the mean error to zero, applying this principle in the case of tennis raises more subtle questions about the role of human and machine judges.
The Hawk-Eye Innovations’ website makes the claim, supported by stills from a high-speed camera, that the human eye, and television replays, can be systematically misleading under certain circumstances. [xxx] The ball may just touch the line but skid so that it is still in contact with the ground when it bounces upward well beyond the line, giving the impression that it did not, in fact, touch the line at all. The website seems to show, then, that it is possible for the ball to appear OUT on a television replay or to the naked eye but still be just IN.
Barring unknown sources of human or machine `malfunction,’ the disagreement between humans and Hawk-Eye in the case of the Wimbledon dispute has, then, two possible explanations. The first is the one given in the preceding sections, namely that it was a random measurement error in Hawk-Eye {see caveats}. The second, suggested in Hawk-Eye Innovations’ own analysis, is that the disagreement results from a systematic error in human judgment. A different way of looking at the problem comes from the sociology of technology. [xxxi] It might be that Hawk-Eye could become defined as the `decision-maker’ rather than the `decision aid’ such that questions concerning its accuracy would no longer be relevant. This is already beginning to happen in tennis, though at the moment it occurs only when a player makes `a challenge’ and Hawk-Eye’s reconstruction fulfills the same definitive role as the bails in cricket. Making Hawk-Eye the authority in this way could resolve a number of problems by providing readily acceptable explanations for otherwise borderline decisions.
Thus, in cricket, it is not unknown for the ball to brush against the wicket but fail to dislodge the bails. No one claims the bails are `inaccurate’ because everyone accepts that bail displacement is the digital definition of hitting the wicket. Similarly, if Hawk-Eye’s decisions were to be made the defining criterion of lbw, IN or OUT in tennis, and so forth, players would come to talk in terms of `bad luck’ if a call went against their own judgment rather than `inaccuracy,’ just as they now talk of bad luck rather than inaccuracy if a bail does not fall when the ball gently touches a stump.
The question is: how would this change the game? In cricket there could be substantial changes. For instance, it has been argued that if Hawk-Eye’s `face value’ lbw projections were taken as the defining criterion, many cricket games in which it was used would be much shorter, perhaps leading to a financial crisis. As discussed above, in lbw decisions made by umpires, a systematic error is deliberately introduced via the well-established rule that the batsman gets the benefit of the doubt.
Since there is a lot of doubt in human lbw decisions, it is often quite hard to get a decision made in the bowler’s favor. It has been suggested that Hawk-Eye’s projections, if taken literally, would greatly increase the number of lbw decisions unless batsmen started to play differently {caveat 13}. [xxxii] In tennis similar considerations apply though they are of a more subtle nature. Let us assume that the Federer-Nadal ball was actually 1mm IN as Hawk-Eye called it.
That is, let us suppose there was no significant random error in this case but that a fast traveling ball had distorted and skidded on hitting the surface such that though there was a small fraction of a second when the trailing portion of the skin of the ball was in contact with the line, no human eye would spot it. Let us suppose, as Hawk-Eye Innovations’ website argues, that in most such cases humans are likely to call the ball OUT. It follows that in nearly all pre-Hawk-Eye matches such a case would have been called OUT – as the umpire did in fact first call it.
Assuming there is such a systematic bias, the introduction of Hawk-Eye as the authority would change the game of tennis meaning, among other things, that umpires and players would be less confident in stopping a rally when the ball appeared OUT to the naked eye. This raises the question: Do we (either as supporters, viewers or players) want the games to change in these ways? In passing it should be noted that the question of whether Hawk-Eye would bring about such a change {see caveats} is now being obscured.
Those who watched the Australian Open tennis championship in the early part of 2008 will have noticed that where Hawk-Eye was used to make a decision a television replay was not offered – it was Hawk-Eye or nothing. The viewer had nothing to go on as regards any question of whether Hawk-Eye was right or not. This seems an unnecessary restriction and allowing viewers to see both the normal replay and the Hawk-Eye reconstruction would allow for a more informed debate about which kind of decision-maker is preferable.
As it is, this debate has been closed off because only one set of evidence is available.
How should we use sports decision aids?
Understanding random and systematic errors is, then, intimately related to changes in sport. In cricket we have seen the possibility that if Hawk-Eye’s decisions were taken at face value and used to replace the umpire in the matter of lbw, it might cause a financial crisis. In tennis the potential changes concern players and umpires as they make decisions about whether to stop a rally if the ball appears OUT to the naked eye.
In both cricket and tennis, and presumably other sports not yet considered, it would mean that rules would be applied differently in the top echelons of the game, where the devices were in use, as compared to the lower echelons; top level games would become still more different from lower level games than they are now. Finally, the game as seen by the viewer would become different to the game as seen by the technology. The last paragraph invites a resolution in the form of three rules of thumb:
`Don’t make the cure worse than the disease.
Minimize the gulf between the technically assisted game and the non-technically assisted game.
Don’t disappoint the spectator.’
Applying these principles suggests that technological decision aids should be adjusted to make the same systematic errors as are made by existing human judges so the game changes as little as possible. When it comes to random errors things are different. No one likes to see games decided by bizarre decisions made by harassed umpires and referees but the spectacle is becoming more and more common thanks to television replays.
There is little doubt that devices such as Hawk-Eye are more reliable than human decision-makers where `reliable’ means they will make the same decision again and again in the same circumstances and they are not likely to make bizarrely wrong decisions as a result of lapses of attention. Even forgetting about the bizarre, devices like Hawk-Eye are almost certain to make smaller random errors than referees and umpires in circumstances where the ball is not close to a critical edge and as computer processing capacity and speeds increase they should improve further. (A possible counter example would be the ball that bounces on or just in front of the batsman’s foot in cricket, in which case Hawk-Eye cannot gather any data but, if the batsman is well back, a human umpire may be able to make a sound `OUT’ decision {caveat 14}.) Combining these concerns leads to an over-arching rule covering both random and systematic error which we will call `The Automated Decision Principle’:
THE AUTOMATED DECISION PRINCIPLE
Use automated sports decision aids to reproduce human systematic error while minimizing random error; explain what is done and assign confidence levels to automated judgments.
This approach maintains human systematic errors as an integral part of the game. In contrast, as things stand, the role of sports decision devices is implicitly defined as to make more accurate decisions than umpires and referees – which implies correcting both random and human systematic errors. The point is made clear in the case of the Federer-Nadal challenge. Under the Automated Decision Principle, even if we neglect random error and accept that the ball really was one millimeter IN, it would still have been called OUT because all humans would see it as OUT.
Figure 5 summarises the way human systematic error could be reproduced by Hawk-Eye-like devices: [pic] Figure 5: Possible ways to reproduce human systematic error On the left of Figure 5 is shown a wicket. To reproduce the `benefit of the doubt’ rule it would only be necessary for Hawk-Eye to make its decisions on a smaller virtual set of stumps indicated by the shaded box. Balls predicted to hit the wicket in the area outside that box would count as `NOT OUT’ on the basis of benefit of the doubt. The same result could be produced by giving a high cut-off point – such as 99. % to the confidence demanded before an `OUT’ decision was made. [xxxiii] On the right of Figure 5 is shown a tennis line at the rear of the court with a shaded area. If it was predicted that the skin of the ball touched this shaded area, the call would still be `OUT. ’ Unfortunately, in tennis the matter is more complicated because the size of the shaded area should depend on the speed of the ball. For example, in the case of a lob the shaded area would be smaller if our analysis is correct {see caveats}.
The actual decision about how to apply these rules would have to be made by the various sports’ governing bodies. How large should the virtual wicket be, or what would be the correct cut-off point for confidence level in case of lbw, if the human benefit-of-the-doubt rule were to be reproduced? To do it properly some observation and analysis of existing human practices would be a good idea. In tennis, high-speed cameras could be used to measure human judges’ propensities to call OUT even when balls moving at various speeds just touch the line.
This information would help to establish an appropriate rule for Hawk-Eye-like devices. This is how systematic error could be handled. In the case of random error the matter is more complicated. In the case where a ball impacted well away from the in-out edge, decision-making devices could, essentially, replace human line callers. They would improve upon human line-callers because they could obviate cases where the human caller was momentarily unsighted or when a bizarre call is made for some reason.
Where the ball was close to the in-out edge something more sophisticated would be needed even after systematic error had been discounted. Imagine that random measurement error could be reliably estimated and consider a line call where the confidence was less than, say, 99%. The tennis authorities could adopt a `benefit of the doubt rule’ – always in or always out. Or they could have the call decided at random – as effectively happens now but without acknowledgement or systematic understanding of the bias in the errors. Or they could ask for the point to be replayed. (Most neutral viewers of the Nadal-Federer match would probably have been happier to see that point replayed rather than called IN.) Exactly what the cut-off point should be, and what the rule should be, or whether there should be a series of rules for different cut-off points, ball speeds, and surfaces, is not something that can be properly thought through without knowing the dispersion of errors in the technological assistant. It might even be that the umpire should retain the final say in these cases, using the reported confidence associated with each automated call as an aid.
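One way the Automated Decision Principle might be turned into a line-calling rule is sketched below. It is an illustration only: the shaded-band width that stands in for human systematic error, the 4.5mm standard deviation, the 99% cut-off, and the choice of a replay for low-confidence calls are all assumptions, and the names are hypothetical. A governing body could equally substitute a benefit-of-the-doubt rule or refer the call to the umpire.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class LineCall:
    verdict: str       # "IN", "OUT" or "REPLAY"
    confidence: float  # probability attached to the more likely outcome


def automated_line_call(measured_mm: float,
                        sigma_mm: float,
                        human_bias_mm: float = 0.0,
                        min_confidence: float = 0.99) -> LineCall:
    """Sketch of a call that reproduces human bias and reports confidence.

    measured_mm: signed distance by which the ball is measured to overlap
        the line (positive = touching the line).
    sigma_mm: assumed standard deviation of zero-mean normal random error.
    human_bias_mm: width of the band at the edge of the line in which
        humans would still call OUT (the shaded area of Figure 5).
    min_confidence: cut-off below which no automatic call is made.
    """
    # Probability that the true overlap exceeds the shaded band.
    p_in = 1.0 - NormalDist(mu=measured_mm, sigma=sigma_mm).cdf(human_bias_mm)
    p_out = 1.0 - p_in
    if p_in >= min_confidence:
        return LineCall("IN", p_in)
    if p_out >= min_confidence:
        return LineCall("OUT", p_out)
    # Too close to call at the required confidence: replay the point.
    return LineCall("REPLAY", max(p_in, p_out))


if __name__ == "__main__":
    print(automated_line_call(1.0, 4.5))                      # marginal call
    print(automated_line_call(50.0, 4.5))                     # clearly IN
    print(automated_line_call(5.0, 4.5, human_bias_mm=20.0))  # inside the shaded band
```

Under these assumed numbers a ball measured 1mm IN would be replayed rather than called, while a ball that only touches the shaded band would be called OUT with high reported confidence, as human judges would call it.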
To proceed in this way, at the very least the information indicated by the questions listed in note 8 would need to be provided or technical explanations given for its irrelevance. Or the doubts should be resolved by estimating error in some other demonstrable and accountable way and reporting the levels of error clearly.
9. Artificial Intelligence, micro-worlds and virtual realities
The difference between Hawk-Eye and human judgment can be understood in a more generalized way that pertains to the entire enterprise known as `artificial intelligence.’ This is the difference between the real world and what has been called a `micro-world.’[xxxiv] Hawk-Eye called the Federer-Nadal ball IN by one millimeter. Such a call could be made only in a `micro-world’ – the world of Hawk-Eye’s virtual reality. In real life, the edge of a line painted on grass cannot be defined to an accuracy of one millimeter: first, because grass and paint are not like that and, second, because, even given perfect paint and a perfect surface to draw on, the apparatus used to paint the line is unlikely to maintain its accuracy to one millimeter over the width of the court.
Furthermore, tennis balls are furry and it is not clear that their edges can be defined to an accuracy of one millimeter. In short, in the real world of tennis we do not quite know what `touching the line’ means. In the real world of tennis it is also possible that a ball that touches the perfectly defined virtual line in the supposedly equivalent micro-world {caveat 15} might not touch the fuzzy edged and not-exactly straight real line actually painted on the court. In short, at Wimbledon there is no such thing as `in by one millimeter.’
A frequently encountered mistake in artificial intelligence is to take micro-worlds to stand for, or even to be superior to, real worlds and to take possibilities that could pertain in a micro-world (stacking of blocks by an automatic crane, exact measurement, exact machine translation, exact speech-transcription, and so forth) to pertain in the real world. To some extent this may be happening here. The micro-world ethos would certainly encourage the claim that where Hawk-Eye’s decisions differ systematically from those of humans, it is Hawk-Eye that should be taken as the authority because it is the `more accurate.’ In sum, uncritical acceptance of the artificial intelligence approach directs the use of sports decision devices away from the Automated Decision Principle.
10. Public Understanding
A device like Hawk-Eye, which is squarely in the public domain, should be properly understood by the public whose lives it affects. Furthermore, devices like Hawk-Eye could have a valuable role to play in public education, the benefits of which would spread to all technological decision-making in the public domain.
It is vital that people understand uncertainty and come to understand that some decisions that are made for the best are bound to turn out to be wrong because of the levels of uncertainty that attend every decision. This paper has set out some of the ways in which this public understanding could be enhanced – by clearly stating