Chapter 08 - Statistical Methodology
Pedagogical Anthropology - English Restoration
## Chapter 08 - Statistical Methodology
Having taken measurements with the rigorous technical precision that is to-day demanded by anthropometry, we should know how to extract from these figures certain *laws*, or at least certain statistical conclusions.
There are two principal methods of regrouping the figures: *mean averages* and *seriations*.
**Mean Averages.**—Averages are obtained, as is a matter of common knowledge and practice, by taking the sum of all the figures and dividing the result by the number of data. The general formula is as follows:
(a+b+c+d)/(1+1+1+1)
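As a minimal sketch of the formula above, the following Python function adds the figures together and divides by their number. The function name `mean_average` and the four sample statures are assumptions made for illustration; they are not data from the text.

```python
def mean_average(figures):
    """Sum the figures and divide the result by the number of data."""
    if not figures:
        raise ValueError("at least one figure is required")
    return sum(figures) / len(figures)


# Hypothetical statures in centimetres (assumed values, for illustration only).
statures = [160.0, 162.0, 165.0, 169.0]
print(mean_average(statures))
```

With these assumed inputs the result coincides with none of the four figures entered, which is the usual case for an average.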
When comparative figures are given, as, for example, those recorded by Quétélet for the stature, the diameters of the head, etc., such figures are always mean averages.
Such averages may be more or less general. We might, for example, obtain a mean average of the stature of Italians, and this would be more general than the mean stature for a single region of Italy, and this again more general than the mean stature for a city, or for some specified social class, etc.
It is interesting to know how the mean will be affected, according to the number of individuals examined, because it is obvious that the mean stature of Italians cannot be based upon measurements of *all* Italians, but upon a larger or smaller number of individuals. Now, if we take various different numbers of individuals, shall we obtain different mean statures? And if so, what number of subjects must we have at our disposal in order to obtain a constant medial figure, and hence the one that represents the *real mean average*? It has been determined that a relatively small number will suffice to give the mean, if the measurements are taken with uniform method and from the same class of subjects (sex, age, race, etc.); for the cranium, 25 subjects are sufficient, and for the stature, 100 subjects.
This method furnishes us with an abstract number, insofar as it does not correspond to any *real individual*, but it serves to give us the synthetic idea of an entirety. In anthropology we need this sort of fundamental synthesis before proceeding to individual analysis for the purpose of interpreting a specified person.
Now, it is evident that the figures representing the mean stature for each region in Italy give us a basis for judging of the distribution of this important datum, while an accumulation of a hundred thousand individual figures would lead to nothing more profitable than confusion and weariness.
The following table, however, is quite clear and instructive:
MEAN STATURE IN ITALY
(According to Departments)
| Departments | Stature in centimetres |
| ----------- | ---------------------- |
| Piedmont | 162.7 |
| Liguria | 163.7 |
| Lombardy | 163.6 |
| Venetia | 165.4 |
| Emilia | 164.0 |
| Tuscany | 164.3 |
| Marches | 162.4 |
| Umbria | 162.7 |
| Latium | 162.5 |
| Abruzzi and Molise | 160.6 |
| Campania | 161.3 |
| Apulia | 160.4 |
| Basilicata | 158.9 |
| Calabria | 159.4 |
| Sicily | 161.1 |
| Sardinia | 158.9 |
Yet the interpretation of such a table is not simple; it is necessary to read the numbers, to remember them in their reciprocal relation; and it demands effort and time to acquire a *clear and synthetic* idea of the distribution in Italy of this one datum, *stature*.
On the other hand, we must lose as little time as possible and spare our forces as far as we can. The value of positive methodology lies in the extent to which it accomplishes these two objects.
Geographical charts serve the purpose of this desired simplification. Let us take an outline map of Italy, divide it into regions, and *colour* these different regions darker or lighter, in proportion as the stature is higher or lower.
The gradations and shadings in colour will tell us at a single glance, and without any fatigue on our part, what the table of figures reveals at the cost of a very perceptible effort. Little squares must be added on the margin of the chart, corresponding to the gradations in colour, and opposite them the figures which they respectively indicate—after the fashion in which the scale of reduction is given in every geographical map. In this way we may *study* these charts, and their examination is pleasant and interesting, while it successfully associates the two ideas of an "anthropometric datum" and of a "region," a result which a series of figures, pure and simple, could not achieve.
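The colouring rule just described can be sketched in Python as an assignment of each department to a shade class, together with the marginal legend. The stature figures are those of the table above; the class limits (`LOWEST_BOUND_CM`, `CLASS_WIDTH_CM`) and the four shade names are assumptions chosen only for illustration, not the gradations used in any actual chart.

```python
# Mean statures in centimetres, copied from the table of departments.
MEAN_STATURE_CM = {
    "Piedmont": 162.7, "Liguria": 163.7, "Lombardy": 163.6, "Venetia": 165.4,
    "Emilia": 164.0, "Tuscany": 164.3, "Marches": 162.4, "Umbria": 162.7,
    "Latium": 162.5, "Abruzzi and Molise": 160.6, "Campania": 161.3,
    "Apulia": 160.4, "Basilicata": 158.9, "Calabria": 159.4,
    "Sicily": 161.1, "Sardinia": 158.9,
}

LOWEST_BOUND_CM = 158.0  # assumed lower edge of the lightest class
CLASS_WIDTH_CM = 2.0     # assumed width of each class
SHADES = ["lightest", "light", "dark", "darkest"]  # assumed legend labels


def shade_for(stature_cm):
    """Return the shade for a mean stature: the taller, the darker."""
    index = int((stature_cm - LOWEST_BOUND_CM) // CLASS_WIDTH_CM)
    index = max(0, min(index, len(SHADES) - 1))  # clamp to the legend
    return SHADES[index]


def legend():
    """Return (shade, lower limit, upper limit) for the marginal squares."""
    rows = []
    for i, shade in enumerate(SHADES):
        low = LOWEST_BOUND_CM + i * CLASS_WIDTH_CM
        rows.append((shade, low, low + CLASS_WIDTH_CM))
    return rows


for department, stature in MEAN_STATURE_CM.items():
    print(f"{department:20s} {stature:6.1f}  {shade_for(stature)}")
```

A narrower class width would give more gradations and a finer map, at the cost of a longer legend.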
We have seen Livi's charts of Italy, both for stature and for the cephalic index. Analogous charts may be constructed for all the different data, for example, the colour of the hair, the shape of the nose, the facial index, etc. In the same manner we may proceed to a still more analytical distribution of anthropometric data among the different provinces of a single *region*. For example, I myself prepared charts of this sort for the stature, the cephalic index and the pigmentation of the population of Latium.
Sometimes we want to see in one single, comprehensive glance, the *progress* of some anthropological datum; for instance, in its development through different ages. Quétélet's series of figures for growth in stature, in weight, in the diameters of the head, the cranial circumference, etc., offer when read the same difficulty as the similar tables of distribution according to regions. On the contrary, we get a synthetic, sweeping glance in *diagrams*, such as the one which shows the growth of stature in the two sexes. The method of constructing such diagrams is very simple, and is widely employed. When we wish to represent in physics certain phenomena and laws; or in hygiene, the progress of mortality through successive years, etc., we make use of the method of diagrams.
Let us draw two fundamental lines meeting in a right angle at *A* (Fig. 151): *AS* is known as the *axis of the abscissæ*; *AO*, the *axis of the ordinates*. We divide each of these lines into equal parts. Let us assume that the divisions of *AS* represent the years of age, and those of *AO* the measurements of stature in centimetres; and since the new-born child has an average height of 50 cm., we may place 50 as the initial figure. From the figure 0 (age) and from 50 cm. (measure), we erect perpendiculars meeting at *a*, where we mark the point. At the age of one year the average stature is about 70 cm.; accordingly we erect perpendiculars from 1 (age) and from 70 (measure), obtaining the point *c*. Since the stature at two years is about 80 cm., the same procedure gives us the point *e*. Since the stature at the age of three is about 86 cm., we erect the perpendicular from a level slightly higher than half-way between 80 and 90, obtaining the point *i*; and so on, for the rest. Meanwhile we begin to be able to see at a glance that the stature increases greatly in the first year and that thereafter the intensity of its growth steadily diminishes.
![](https://www.gutenberg.org/files/46643/46643-h/images/page394.png =524x405)
Fig. 151
If we unite the points thus constructed, the line of representation is completed.
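The construction can be followed numerically with a short Python sketch. It lists the four points named in the text as pairs of abscissa and ordinate, and then takes the difference between successive ordinates to show how the yearly gain diminishes. The function names are illustrative assumptions; the figures are the approximate statures quoted above.

```python
# Approximate average statures from the text: age in years -> stature in cm.
STATURE_BY_AGE = {0: 50, 1: 70, 2: 80, 3: 86}


def points(series):
    """Return the (abscissa, ordinate) pairs in order of age."""
    return sorted(series.items())


def yearly_increments(series):
    """Return the gain in stature between each pair of consecutive ages."""
    pts = points(series)
    return [later[1] - earlier[1] for earlier, later in zip(pts, pts[1:])]


print(points(STATURE_BY_AGE))             # the points a, c, e, i
print(yearly_increments(STATURE_BY_AGE))  # gains of 20, 10 and 6 cm
```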
The verticals 0*a*, 1*c*, 2*e*, etc., are the *ordinates*, and the horizontals 50*a*, 70*c*, etc., are the abscissæ of the line of representation; and since it is constructed along the intersections of these lines, they are for that reason collectively called *coordinates*. It is usual in constructing these diagrams to mark the coordinates in such a way that they will not be apparent, instead of which only the axes and the line representing the development of the phenomenon are shown (Fig. 152).
Sometimes a different method of representing the phenomenon graphically is followed, namely, by tracing the successive series of distances developed on the ordinates (Fig. 153); in which case the characteristic arrangement of the lines causes this to be known as the *organ-pipe* method.
![](https://www.gutenberg.org/files/46643/46643-h/images/page395fig152.png =513x510)
Fig. 152.
![](https://www.gutenberg.org/files/46643/46643-h/images/page395fig153.png =481x448)
Fig. 153.
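A rough text rendering of the organ-pipe method is sketched below in Python. It is only an approximation under stated assumptions: the pipes are drawn horizontally rather than vertically, the scale of one mark per 2 cm (`CM_PER_MARK`) is arbitrary, and the four statures are the approximate figures quoted earlier for the first years of life.

```python
# Approximate average statures from the text: age in years -> stature in cm.
STATURE_BY_AGE = {0: 50, 1: 70, 2: 80, 3: 86}
CM_PER_MARK = 2  # assumed scale: one '#' for every 2 cm


def organ_pipes(series, cm_per_mark=CM_PER_MARK):
    """Return one text line per age, its length proportional to the stature."""
    lines = []
    for age, stature in sorted(series.items()):
        bar = "#" * (stature // cm_per_mark)
        lines.append(f"{age:>2} | {bar} {stature}")
    return lines


for line in organ_pipes(STATURE_BY_AGE):
    print(line)
```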
The diagram for the growth in stature, given earlier in this volume, is constructed according to the method shown in Fig. 151. When there are a great number of data to represent, which overlap and interweave, this method of graphic representation still lends itself admirably to the purpose; in such a case we shall have a number of broken lines, either parallel or intersecting, which may be distinguished by different colours or different methods of tracing (dots, stars, etc.), so that they may interweave without becoming confused, thus giving us at a glance the development of several phenomena at once (for example, total stature and sitting stature, length of upper and lower limbs, in one and the same diagram).
For the purpose of practice, a graphic representation of the changes in the ponderal index through the different ages may be constructed in class. The figures for stature and weight at each age should be read aloud; one student can find the corresponding *ponderal index* in the tables, while another constructs the graphic line upon the blackboard.
In this manner we can see better than by reading the figures, how the ponderal index increases during the first year and becomes much higher during early infancy; and then how it diminishes up to the age of puberty, holding its ground with slight oscillations during the puberal period; after which it again increases when the individual begins to *fill out* after the seventeenth year, and once again later when he takes on flesh, to fall off again during the closing years, when old age brings lean and shrunken limbs.
**Seriation.**—Another method of rearranging the figures is that of *seriation*. Let us assume that we are taking the average of a thousand statures, or of hundreds of thousands. We will try to find some means of simplifying the calculation. Since the individual oscillations of stature are contained within a few centimetres and the individuals amount to thousands, large numbers will be found to have the same *identical* statures. Accordingly, let us rearrange the individuals according to their stature, obtaining the following result:
| Stature in metres | Number of individuals |
| ----------------- | --------------------- |
| 1.50 | 20 |
| 1.55 | 80 |
| 1.60 | 140 |
| 1.61 | 200 |
| 1.62 | 300 |
| 1.63 | 450 |
| 1.70 | 100 |
| 1.75 | 80 |
| 1.80 | 10 |
By multiplying the 1.50 by 20, 1.55 by 80, etc., and by adding the results, we shall have simplified the process for obtaining the sum total which must then be divided by the number of individuals.
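The shortened calculation can be sketched in Python as follows. The pairs of stature and number of individuals are copied from the table above; the names `SERIATION`, `total_individuals` and `mean_from_seriation` are assumptions made for illustration.

```python
# (stature in metres, number of individuals), copied from the table.
SERIATION = [
    (1.50, 20), (1.55, 80), (1.60, 140), (1.61, 200), (1.62, 300),
    (1.63, 450), (1.70, 100), (1.75, 80), (1.80, 10),
]


def total_individuals(seriation):
    """Add up the number of individuals in every group."""
    return sum(count for _, count in seriation)


def mean_from_seriation(seriation):
    """Multiply each stature by its count, add the products, then divide."""
    weighted_sum = sum(stature * count for stature, count in seriation)
    return weighted_sum / total_individuals(seriation)


print(total_individuals(SERIATION))
print(round(mean_from_seriation(SERIATION), 3))
```

Nine multiplications thus take the place of 1,380 separate additions, and the mean for this table comes to about 1.63 m.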
Well, while doing this for the purpose of simplifying the calculation, we have hit upon the method of distributing the individuals in a *series*, that is, we have regrouped the corresponding figures according to *seriation*.
Seriation has been discovered as a method of *analysing* the mean average, and it demonstrates three things. First, it shows the extent of the oscillations of anthropological data, a thing which the mean average completely hides: indeed, we have seen in the case of the cephalic index that the mean averages oscillate between 75 and 85 when calculated for the separate regions, while in the case of individuals the oscillations extend from 70 to 90. Second, it shows the numerical prevalence of individuals for the one or the other measurement. Third, and finally, seriation reveals a law to us, namely, that the distribution of individuals according to anthropological data is not a matter of chance: there is a prevalence of individuals corresponding to certain average figures, and the number of individuals diminishes in proportion as the measurements depart from the mean average, equally whether they increase or diminish.
I take from Livi certain numerical examples of serial distribution:
| Stature in inches | Number of observations |
| ----------------- | ---------------------- |
| 60 | 6 |
| 61 | 26 |
| 62 | 32 |
| 63 | 26 |
| 64 | 160 |
| 65 | 154 |
| 66 | 191 |
| 67 | 128 |
| 68 | 160 |
| 69 | 89 |
| 70 | 45 |
| 71 | 7 |
| 72 | 6 |
| 73 | 3 |
| 74 | 1 |
Although these figures are not rigorously exact, there is a certain numerical prevalence of individuals in relation to the stature of 66 inches, and above and below this point the number of individuals diminishes, becoming very few toward the extremes.
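The reading of this table can be checked with a short Python sketch that finds the stature with the greatest number of observations and lists the places where the series fails to diminish regularly on leaving that point. The counts are copied from the table above; the function names, and the definition of an "irregular" stature as one with more observations than its neighbour nearer the prevailing stature, are assumptions made for illustration.

```python
# Stature in inches -> number of observations, copied from the table.
OBSERVATIONS = {
    60: 6, 61: 26, 62: 32, 63: 26, 64: 160, 65: 154, 66: 191, 67: 128,
    68: 160, 69: 89, 70: 45, 71: 7, 72: 6, 73: 3, 74: 1,
}


def modal_stature(series):
    """Return the stature with the largest number of observations."""
    return max(series, key=series.get)


def irregular_statures(series):
    """Return statures that outnumber their neighbour nearer the mode."""
    mode = modal_stature(series)
    statures = sorted(series)
    irregular = []
    for low, high in zip(statures, statures[1:]):
        # Of each adjacent pair, decide which member lies farther from the mode.
        nearer, farther = (high, low) if high <= mode else (low, high)
        if series[farther] > series[nearer]:
            irregular.append(farther)
    return irregular


print(sum(OBSERVATIONS.values()))        # total number of observations
print(modal_stature(OBSERVATIONS))       # prevailing stature
print(irregular_statures(OBSERVATIONS))  # where the descent is broken
```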
The lack of exactness and of agreement in serial distribution is due to the numerical scarcity of individuals. If this number were doubled, if it were centupled, we should see the serial distribution become systematised to the point of producing, for example, such symmetrical series as the following:
| First series | Second series | Third series |
| ------------ | ------------- | ------------ |
| 1 | 1 | 1 |
| 12 | 16 | 15 |
| 66 | 120 | 105 |
| 220 | 560 | 455 |
| 495 | 1,820 | 1,365 |
| 792 | 4,368 | 3,003 |
| 924 | 8,008 | 5,005 |
| 792 | 11,440 | 6,435 |
| 495 | 12,870 | 6,435 |
| 220 | 11,440 | 5,005 |
| 66 | 8,008 | 3,003 |
| 12 | 4,368 | 1,365 |
| 1 | 1,820 | 455 |
| | 560 | 105 |
| | 120 | 15 |
| | 16 | 1 |
| | 1 | |
This law of distribution is one of the most widespread laws; it ordains the way in which the characteristics of animals and plants alike must behave; and the statistical method which is beginning to be introduced into botany sheds much light upon it.
![](https://www.gutenberg.org/files/46643/46643-h/images/page398.png =572x314)
Fig. 154.
This law may be represented graphically by arranging the anthropologic data on the abscissæ (*e.g.*, those of stature), and the number of individuals on the ordinates.
In such cases we have a curve with a maximum central height and a symmetrical bilateral diminution (Fig. 154): this is the curve of Quétélet.
Or better yet, it is known as *Quétélet's binomial curve*, because this anthropologist was the first to represent the law graphically and to perceive that its development was the same as that so well known in mathematics for the coefficients in Newton's binomial theorem.
Newton's binomial theorem is the law for raising any binomial to the *n*th power, and is expanded in algebra as follows:
$$
\begin{aligned}
(a+b)^{n} ={}& a^{n} + n\,a^{n-1}b + \frac{n(n-1)}{2}\,a^{n-2}b^{2} + \frac{n(n-1)(n-2)}{2\cdot 3}\,a^{n-3}b^{3} \\
&+ \frac{n(n-1)(n-2)(n-3)}{2\cdot 3\cdot 4}\,a^{n-4}b^{4} + \frac{n(n-1)(n-2)(n-3)(n-4)}{2\cdot 3\cdot 4\cdot 5}\,a^{n-5}b^{5} + \dots + b^{n}
\end{aligned}
$$
Substituting for *n* some definite exponent, for example 10, the binomial would develop, in regard to its coefficients, after the following fashion:
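$$
\begin{aligned}
(a+b)^{10} ={}& a^{10} + 10\,a^{9}b + \frac{10\cdot 9}{2}\,a^{8}b^{2} + \frac{10\cdot 9\cdot 8}{2\cdot 3}\,a^{7}b^{3} + \frac{10\cdot 9\cdot 8\cdot 7}{2\cdot 3\cdot 4}\,a^{6}b^{4} \\
&+ \frac{10\cdot 9\cdot 8\cdot 7\cdot 6}{2\cdot 3\cdot 4\cdot 5}\,a^{5}b^{5} + \frac{10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5}{2\cdot 3\cdot 4\cdot 5\cdot 6}\,a^{4}b^{6} + \frac{10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5\cdot 4}{2\cdot 3\cdot 4\cdot 5\cdot 6\cdot 7}\,a^{3}b^{7} \\
&+ \frac{10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3}{2\cdot 3\cdot 4\cdot 5\cdot 6\cdot 7\cdot 8}\,a^{2}b^{8} + \frac{10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3\cdot 2}{2\cdot 3\cdot 4\cdot 5\cdot 6\cdot 7\cdot 8\cdot 9}\,ab^{9} + b^{10}.
\end{aligned}
$$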
Whence it appears that, after performing the necessary reductions, the coefficients following the central one diminish symmetrically in the same manner as they increased: that is, according to the selfsame law that we meet in the anthropological statistics of seriations.
Indeed, here is the binomial theorem with the reductions made:
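$$
\begin{aligned}
(a+b)^{10} ={}& a^{10} + 10\,a^{9}b + \frac{10\cdot 9}{2}\,a^{8}b^{2} + \frac{10\cdot 9\cdot 8}{2\cdot 3}\,a^{7}b^{3} + \frac{10\cdot 9\cdot 8\cdot 7}{2\cdot 3\cdot 4}\,a^{6}b^{4} \\
&+ \frac{10\cdot 9\cdot 8\cdot 7\cdot 6}{2\cdot 3\cdot 4\cdot 5}\,a^{5}b^{5} + \frac{10\cdot 9\cdot 8\cdot 7}{2\cdot 3\cdot 4}\,a^{4}b^{6} + \frac{10\cdot 9\cdot 8}{2\cdot 3}\,a^{3}b^{7} \\
&+ \frac{10\cdot 9}{2}\,a^{2}b^{8} + 10\,ab^{9} + b^{10}.
\end{aligned}
$$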
And after calculating the coefficients, we obtain the following numbers in a symmetrical series:
* 10
* 45
* 120
* 210
* 252
* 210
* 120
* 45
* 10
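The coefficients listed above can be reproduced with a few lines of Python, offered here as a sketch rather than as part of the original method. It relies on `math.comb` from the standard library (Python 3.8 or later); the function name `binomial_coefficients` is an assumption made for illustration. The complete expansion also carries a coefficient of 1 on the first and last terms, which the list above leaves out.

```python
from math import comb


def binomial_coefficients(n):
    """Return the coefficients of (a + b)**n, from a**n through b**n."""
    return [comb(n, k) for k in range(n + 1)]


coefficients = binomial_coefficients(10)
print(coefficients)        # full series, including the two end terms of 1
print(coefficients[1:-1])  # the series as listed in the text

# The series reads the same in both directions, which is its symmetry.
print(coefficients == coefficients[::-1])
```

Calling the same function with a larger exponent gives a longer symmetrical series that rises to a single central maximum in the same way.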
This is why the curve of Quétélet is called *binomial*.
Let us assume that we wish to represent by means of Quétélet's curves, two seriations, for instance in regard to the stature of children of the same race, sex and age, but of opposite social conditions: the poor and the rich.
These two curves of Quétélet's, provided that they are based upon an equal and very large number of individuals, will be identical, because the law itself is universal. Only, the curve for the rich children will be shifted along toward the figures for high statures, and that for the poor children toward the low statures.
![](https://www.gutenberg.org/files/46643/46643-h/images/page401.png =603x394)
Fig. 155. Horizontal axis: statures, in ascending series.
At a certain point *A* the two curves meet and intersect, each invading the field of the other: so that within the space *ABC* there are individual rich children who are shorter than some of the poor, and individual poor children who are taller than some of the rich: *i.e.*, the conditions are contrary to those generally established by the curve as a whole. This rule also, of the intersection of binomial curves, is of broad application; whenever a general principle is stated, *e.g.* that the rich are taller than the poor, it is necessary to understand it in a liberal sense, knowing that wherever we should descend to details, the opposite conditions could be found (superimposed area *ABC*). For all that, the principle as a whole does not alter its characteristic, which is a differentiation of diverse types (for example, the tall rich and the short poor).

The same would hold true if we made a comparison of the stature of men and women; the curve for men would be shifted toward the higher figures and that for women toward the lower, but there would be a point where the two curves would intersect, and in the triangle *ABC* there would be women taller than some of the men, and men shorter than some of the women. The differences have reference to the numerical *majority* (the high portions of the curves) which are clearly separated from each other, like the tops of cypress trees which have roots interlacing in the earth.

Now, it is the *numerical prevalence* of individuals, in any mixed community, that gives that community its distinctive type, whether of class or of race. If we see gathered together in a socialistic assemblage a proletarian crowd, suffering from the effects of pauperism, the majority of the individuals have stooping shoulders, ugly faces and pallid complexions; all this gives to the crowd a general aspect, one might say, of physical inferiority. And we say that this is the type of the labouring class of our epoch in which labour is proletarian, a type of caste. On the other hand, if we go to a court ball, what strikes us is the numerical prevalence of tall, distinguished persons, finely shaped, with velvety skin and delicate and beautiful facial lineaments, so that we recognise that the assemblage is composed of privileged persons, constituting the type of the aristocratic class. But this does not alter the fact that among the proletariat there may be some handsome persons, well developed, robust and quite worthy of being confounded with the privileged class; and conversely, among the aristocrats, certain undersized individuals, sad and emaciated, with stooping shoulders and features of inferior type, who seem to belong to the lower social classes.
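The intersection of two curves described above can be illustrated numerically with a Python sketch. It builds two identical binomial series, displaces one of them toward the higher statures, and counts the individuals lying in the area common to both. Everything numerical here is an assumption made for illustration: the exponent `N`, the displacement `SHIFT`, and the use of abstract stature classes in place of real measurements.

```python
from math import comb

N = 10     # assumed exponent: each curve spans 11 stature classes
SHIFT = 3  # assumed displacement of the second curve, in classes


def binomial_series(n, offset=0):
    """Map each stature class to its number of individuals for one curve."""
    return {offset + k: comb(n, k) for k in range(n + 1)}


def overlap(first, second):
    """Count the individuals in the area that the two curves share."""
    common = first.keys() & second.keys()
    return sum(min(first[c], second[c]) for c in common)


poor = binomial_series(N)                # curve toward the low statures
rich = binomial_series(N, offset=SHIFT)  # same curve, shifted upward

shared = overlap(poor, rich)
print(shared, shared / sum(poor.values()))
print(max(poor, key=poor.get), max(rich, key=rich.get))  # the two summits
```

Under these assumptions roughly a third of each group falls within the shared area, while the two summits remain three classes apart, which is the sense in which the general principle survives the individual exceptions.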
For this same reason it is difficult to give *clear-cut* limits to any law and any distinction that we meet in our study of life. This is why it is difficult in zoology and in botany to establish a system, because although every species differs from the others, in the salience of its characteristics and the numerical prevalence of individuals very much alike, none the less every species grades off so insensibly into others, through individuals of intermediate characteristics, that it is difficult to separate the various species sharply from one another. It is only the treetops that are separate, but at their bases life is intertwined; and in the roots there is an inseparable unity. The same may be said when we wish to differentiate normality from pathology and degeneration. The man who is clearly sane differs beyond doubt from the one who is profoundly ill or degenerate; but certain individuals exist whose state it would be impossible to define.
Now, while seriations analyse certain particularities of the individual distribution, by studying the actual truth, mean averages give us only an abstraction, which nevertheless renders distinct what was previously nebulous and confused in its true particulars. The synthesis of the mean average brings home to us forcibly the true nature of the characteristics in their general effect. The analysis of the seriation brings home to us forcibly the truth regarding this effect when we observe it in the actuality of individual cases.
"When, from the topmost pinnacle of the Duomo of Milan or from the hill of the Superga," says Livi in felicitous comparison, "we contemplate the magnificent panorama of the Alpine chain, we see the zone of snow distinguished from that free from snow by a line that is visibly horizontal and that stretches evenly throughout the length of the chain. But if we enter into the Alpine valleys and try to reach and to touch the point at which the zone of snow begins, that regularity which we previously admired disappears before our eyes; we see, at one moment, a snow-clad peak, and at the next another free from snow that either is or seems to be higher than the former."
Now, through the statistics of mean averages, we are able to see the general progress of phenomena, like the spectator who gazes from a distance at the Alpine chain and concludes that the zone of snow is above and the open ground is below; while, by means of seriation, we are in the position of the person who has entered the valley and discovers the actuality of the particular details which go to make up the uniform aspect of the scene as a whole. Both aspects are true—just as both of those statistical methods are useful—for they reciprocally complete each other, concurring in revealing to us the laws and the phenomena of anthropology.