When we choose to say there are "a lot" of gorillas, or "17 gorillas,"
or "around 20," we are making a choice based on our interests and
abilities. Those interests and abilities are systematically expressed
in the numbers we choose to send out into the world. The Internet provides
us with a large and diverse database of the stuff people have sent out
into the world, and it naturally includes large numbers of numbers.
Since 1997, we have collected at intervals a novel set of data on the
popularity of numbers: by performing a massive automated Internet search
on each of the integers from 0 to 1,000,000 and counting the number
of pages which contained each, we have obtained a picture of the Internet
community's numeric interests and inclinations. The interactive visualization
which accompanies this statement attempts to make some of the more striking
trends visible. Our data can be explored point by point, or viewed in
larger sets, and may lend some insight into the cognitive structure
of numeracy, culture, and memory.
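The counting step described above can be sketched in a few lines. This is only a minimal illustration, not the crawler used for the project: it assumes the pages are already available as plain-text strings, and the function name count_pages_per_integer and the sample pages are hypothetical. For each integer, it counts the number of pages that contain that integer at least once.

```python
import re
from collections import Counter

# Simplification: any standalone run of digits counts as one integer,
# so "1,000,000" or "9.8" would be split into separate pieces.
INTEGER = re.compile(r"\b\d+\b")

def count_pages_per_integer(pages, limit=1_000_000):
    """Return a Counter mapping each integer to the number of pages containing it."""
    counts = Counter()
    for text in pages:
        # A set ensures each page contributes at most once per integer.
        seen = {int(token) for token in INTEGER.findall(text)}
        counts.update(n for n in seen if n <= limit)
    return counts

# Made-up sample pages, for illustration only.
pages = [
    "There are 17 gorillas, or around 20.",
    "The 80486 shipped with 20 MHz parts; 17 were returned.",
    "Call 800 now.",
]
counts = count_pages_per_integer(pages)
```

Counting pages rather than raw occurrences means a single page that repeats a number many times does not inflate that number's popularity.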
Certain patterns are made readily visible in the data browser, such
as people's preference for multiples of 10, or reduplicative numbers
such as 1010, 1111, 1212, etc. While American zip codes do not present
a similarly coherent visual pattern, they can nevertheless be quite
prominent—and unlike viewing cities from space, where only the larger
cities are visible, here, both the larger and the more interesting places
are brighter. Further highlights in the data reveal more fleeting reflections
of our activities: bright spots such as those for 80486 and 68040 reveal
not only our interest in technology, but also the state of
that technology at the time. Other points indicate our interest, or
lack of interest, in history. And popular culture inevitably makes its
presence felt, but perhaps less than we might expect.
Linguists have noted that the ideal construct of proper "Language"
diverges considerably from its actual use, suggesting that the linguistic
"ideal" may be a pale abstraction of a far more nuanced and textured
practice. Although we like to think of the use of numbers as objective
and removed from our personal lives, it appears they also display elements
of "practice." The denizens of the number line are not the mere automatons
or corporate tools we have made them out to be: each has a personality,
talents, communities, and sometimes a little je ne sais quoi.
They reflect us. This unusual reflection is the focus of this project.
NUMBERS ARE TOOLS
In learning how to abstract, we learn that all information is potentially
expressible in numbers. The ability to abstract from perceived phenomena
(such as a group of cows, or the effects of gravity) to descriptions
of the physical world (such as 23 or 9.8 m/s^2) has allowed us to see commonalities
in phenomena that may first have appeared to be distinct.
One consequence of abstraction is that we must ignore the individual
characteristics of the entities we abstract. As a result, the numbers
we use to codify these abstractions must also lack character. Twenty-three
cows may be better (or worse) than three cows, but "23" is not better
than "3." Both numbers are simply descriptors, which inherit their meaning
solely from taking part in fixed systems of fixed relations with other
numbers. Apart from the existence of the numeric system (and the numbers'
participation in it) individual numbers have no meaning.
Thus, our number system is seen as an objective tool—a tool that does
not reflect human preference, emotion, or inconsistency. As such it
is a tool reserved not for expressing ourselves, but only for describing
the world around us. We do not write poetry with numbers, nor do we
express our personal doubts or prejudices through them... except as
our humanity is projected onto the emotionless toil of mathematical
proof, ledger balances, or pedagogical exercises. But like every symbiotic
couple, the tool we would like to believe is separate from us (and thus
objective) actually provides an intricate reflection of our thoughts,
interests, and capabilities. One intriguing result of this symbiosis
is that the numeric system we use to describe patterns is actually
used in a patterned fashion to describe.
We are imperfect users of our perfect tool. Buildings often skip the
13th floor, there is no year 0, and our only contact with very large
numbers comes from government debt, numbers which remain unreal to us
for their very size. We spend most of our time using numbers not for
calculating, or even measuring, but in acts of remembering, guessing,
and simplifying. The secret life of numbers presented here deals less
with the apparently inviolate laws we have contrived for the staid little
number, than with our own natures expressed through their quixotic use.
This secret life tells us not only about the number system we have fashioned,
but also about our cognitive, professional, and creative liaisons with
the various inhabitants of the number line.
THE DATA ITSELF
The characteristics of the data we describe below include aspects of
the population as a whole, patterns of smaller groups, and individual
charm. All tell us something about our culture as expressed through
The first and most striking characteristic of the data is the overall
distribution of the numbers' occurrences. Instead of the uniform distribution
one might expect if every number were equally useful, we see an exponential
drop-off in popularity beginning with the number 1. These earliest
and most popular individuals aren't a glamorous set, but instead see
their popularity rise from their accessibility. They are the first numbers
we learn, and are the easiest to understand and use. For these reasons,
with every increase in magnitude along the number line, the numbers
see a sharp drop in this kind of basic popularity.
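One way to check the shape of such a drop-off is to fit a straight line to the logarithm of each count: if popularity falls off exponentially, as described above, the points lie close to a line with a negative slope. The sketch below is a minimal illustration under that assumption. The function name log_linear_slope is hypothetical, and the counts are synthetic, not taken from our data.

```python
import math

def log_linear_slope(counts):
    """Least-squares slope of ln(count) plotted against the number itself."""
    points = [(n, math.log(c)) for n, c in counts.items() if c > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

# Synthetic counts that halve with every step along the number line.
synthetic = {n: 1024 / 2 ** n for n in range(1, 11)}
slope = log_linear_slope(synthetic)  # close to -ln(2) for this made-up series
```

A slope near zero would correspond to the uniform distribution one might naively expect; the steeper the negative slope, the faster popularity decays with magnitude.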
There are, however, certain numbers further down the line that enjoy
great popularity in spite of their greater number of digits. These numbers
comprise some of the basic "royal families" of the number
line: the base-2, base-10, base-12, and base-60 families. While most
of these families have their niches in technology, time-based media
and the English measurement system, the most prominent of these families
is the one which clearly reflects our biology. The 10 family can be
seen everywhere: numbers at multiples of 10 (and powers of ten, like
100, 1000, etc.) enjoy a popularity far greater than their neighbors
throughout the data. Our biases for "rounding" suggest that most of
these numbers' high standing comes at the direct expense of their nearest
neighbors. The positively ignored 49949 is one such serf, apparently
yielding its worldly recognition to its more prominent neighbor 50000.
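The prominence of a round number over its neighbors can be expressed as a simple ratio. The following sketch is one hedged way to do so, not the measure used in the visualization: the function name prominence, the radius parameter, and the counts are all hypothetical. It divides a number's count by the mean count of the numbers immediately around it.

```python
def prominence(counts, n, radius=2):
    """Ratio of n's count to the mean count of its neighbors within `radius`."""
    neighbors = [counts.get(n + d, 0) for d in range(-radius, radius + 1) if d != 0]
    mean = sum(neighbors) / len(neighbors)
    # If every neighbor has a zero count, the ratio is treated as infinite.
    return counts.get(n, 0) / mean if mean else float("inf")

# Made-up counts around a round number, for illustration only.
counts = {49998: 40, 49999: 50, 50000: 900, 50001: 60, 50002: 50}

royal = prominence(counts, 50000)           # well above 1: favored over its neighbors
serf = prominence(counts, 49999, radius=1)  # below 1: overshadowed by 50000
```

Values far above 1 mark the "royal" numbers, while values below 1 mark the neighbors that appear to yield their recognition to them.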
In addition to the population at large, and the most prominent families
within it, some numbers enjoy a certain degree of popularity owing to
their occupation. Here are just a few examples; many more can yet be
found in the data:
90-99, and 1990-2002 (no time like the present)
90210 (the television show)
The Dorks & Techies
68040, 68030, 68000 (Macintosh)
286, 386, 486, 8086, 80286, 80386, 80486 (its competitor)
2, 4, 16, 32, 64, 128, 256, 512, 1024 (base-2, now RAM sizes)
2400, 4800, 9600, 19200, 38400 (baud rates)
8859 (from the ISO-8859 character set)
The Responsible Citizens
1040, 1041 (Uncle Sam loves you)
10036, 26161, 13131, 77058… (American zip codes)
800, 888, 877 (toll-free phone number prefixes)
52062, 52064, 52066 (German postal codes)
98, 99 (why don't things ever just cost $1.00 even?)
900 (sex sells)
A LOOK AT OURSELVES
Moving up a level from the individual points and patterns covered in
the data, what can we deduce about ourselves by examining the kinds
of interests we display? The explosion of occurrences of numbers in
the range from 1990-2002 points not only to the growth of the Web during
this period, but also to a kind of temporal narcissism. We are most
interested in the year in which we live, and are less interested in
events in the past regardless of their import.
Furthermore, historical years before the 1990s do not have magnitudes
that reflect an atemporal vantage point, but appear instead to be talked
about less, the further they fall from the present. (Perhaps this trend
would not be visible if we were to do these searches on Web sites devoted
only to history.) These phenomena point to the possibility of measuring
the longevity of our memory, or the degree to which we care about how
historical events may have shaped our present lives. In the slope of
our curve between 1600 and 2002, we see an image of our cultural rate
The discussion above appears to suggest some degree of historical ignorance
on the Web. Can we observe any awareness of historical context—a context
which is actually present in our collective consciousness—on the Web?
In the periodic gathering of this data, a vernacular history suggests
itself. From the information we have gathered, we can construct a time
line of historic individuals searched for by year of birth, death, or
significant event, to create a history that consists solely of individuals
holding the public interest in any given year. This history could be
reconstructed every year with hopes of observing shifts in this expressed
historical context over time. A preliminary look at the names that accompany
number searches tells us that we may retrieve not only biographical
(Sartre, 1905-1980) and historical information (Columbus, 1492), but
also glimpses into how people are feeling about the individuals singled
out (Bill Gates, 666).
Further comparisons over time are possible, and are likely to yield
more interesting results than a look at any single slice of time can
provide. Numeric searches on a defined subset of Web pages, such as
those devoted to medicine, history, physics, or literature, may generate
further insights. Differentiating country of origin may also prove interesting.
On the most basic level, arranging the data using different parameters
may shed light on patterns not visible in the current arrangements.
The numeric system has helped document the regularity and periodicity
inherent in our environments and ourselves for millennia. In allowing
us to examine our own patterns of use, we hope this data will
shed some light on our cultural biases and numeric capacities. We
also hope to underscore the influence technology has had in changing
the set of numbers we can and cannot imagine.