Duda-Hart splitting criteria


As I emphasised earlier, it is extremely important to eliminate the possibility that the c. 180 clusters into which three-click codas seemed to sort were artificially generated. A first thought about what might be triggering such a high degree of splitting is a faulty recording device, or one with an embedded truncation error. In this day and age that seems unlikely, but it is still worth keeping in mind. A more likely cause would be faulty software, but how could a software package fail that badly on the simplest possible data set?

To investigate further, I obtained the analytic software from Mike and checked whether it was in standard use for the analysis of sperm whale codas. It was. The package was designed to take the time intervals between successive clicks as its raw data. This gives a data array of n dimensions, where n is one less than the number of clicks in each coda type. The data can be analysed in that form. Alternatively, and more commonly, they are normalised so that the entire coda length is taken as the unit of time. The software performs this normalisation so that, for the three dimensions of a four-click coda, say, we have the following transformation:

(x, y, z) -> (x’, y’, z’), where x’ = x/(x+y+z), y’ = y/(x+y+z), and z’ = z/(x+y+z)
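As a rough illustration, the transformation above can be sketched in Python with NumPy. This is not code from the analysis package: the function name `normalised_intervals` and the example click times are hypothetical.

```python
import numpy as np

def normalised_intervals(click_times):
    """Turn one coda's click times (seconds) into inter-click intervals
    rescaled so that the whole coda length is one unit of time."""
    t = np.asarray(click_times, dtype=float)
    icis = np.diff(t)           # n clicks give n - 1 intervals: x, y, z, ...
    return icis / icis.sum()    # x' = x / (x + y + z + ...), and so on

# Hypothetical four-click coda with clicks at 0.0, 0.2, 0.5 and 1.0 s
xyz = normalised_intervals([0.0, 0.2, 0.5, 1.0])
print(xyz)          # approximately [0.2, 0.3, 0.5]
print(xyz.sum())    # 1, up to rounding
```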

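Notice that it still looks as if we have three dimensions, but we don’t, since the last coordinate is completely defined by the first two: (x’, y’, z’) = (x’, y’, 1-x’-y’). The software should therefore analyse the data as (x’, y’), not (x’, y’, z’). Software is often written piecemeal, so could the error be as simple as that? If it were, this would explain why the three-click codas were the worst affected: with only two intervals they normalise to (x’, 1-x’), so they are really one-dimensional, and half of their apparent dimensions are redundant, a larger share than for any longer coda type.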
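The redundancy is easy to confirm numerically. The Python sketch below normalises simulated intervals and measures the rank of the centred data; the interval means, spread and sample size are hypothetical placeholders, and `effective_dims` is an illustrative helper, not part of the analysis package. Normalisation removes exactly one dimension, so three-click codas keep one of their two apparent dimensions and four-click codas two of their three.

```python
import numpy as np

rng = np.random.default_rng(0)

def effective_dims(mean_icis, n_codas=200, sd=0.02):
    """Return (apparent, effective) dimensions of normalised interval data.
    Interval means and the common SD are hypothetical placeholders."""
    raw = rng.normal(mean_icis, sd, size=(n_codas, len(mean_icis)))
    norm = raw / raw.sum(axis=1, keepdims=True)     # every row now sums to 1
    centred = norm - norm.mean(axis=0)
    return norm.shape[1], int(np.linalg.matrix_rank(centred))

print(effective_dims([0.30, 0.50]))          # three-click coda: (2, 1)
print(effective_dims([0.20, 0.30, 0.50]))    # four-click coda: (3, 2)
```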

By far the easiest way to test the software is to run it on a Monte Carlo sample of the same size as the test data, but I struck a problem: the software required MATLAB to run. If I were attached to a university, that would be fine, but as a private individual it would cost me 2,000–10,000 USD. Considering I only wanted it for an hour-long preliminary check, this was out of the question.
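For anyone without MATLAB, a check of this kind can be sketched in free software such as Python with NumPy. The sketch below is an illustration under stated assumptions, not the package's code. It draws one normally distributed variable x’ (a stand-in for the normalised first interval of a single three-click coda type, with a hypothetical mean, SD and sample size) and applies recursive splitting with the standard Duda-Hart rule: a cluster of n points in p dimensions is split if Je(2)/Je(1) < 1 - 2/(πp) - z·sqrt(2(1 - 8/(π²p))/(np)), where Je(1) and Je(2) are the within-cluster sums of squared errors for one and two clusters. The critical value z = 3.2 is an assumed, commonly quoted choice, and because the simulated points lie on a line, the best two-way split is found exactly by trying every cut in sorted order (the real software may split clusters differently). Running it once with x’ analysed as one variable (p = 1) and once as the pair (x’, 1-x’) (p = 2) isolates the effect of the redundant coordinate.

```python
import numpy as np

def best_split(X):
    """Exact best two-way split (minimum within-cluster SSE) for points lying on a line.
    For collinear data the optimal 2-means partition is one cut in sorted order."""
    X = X[np.argsort(X[:, 0])]                    # sort along the first coordinate
    X = X - X.mean(axis=0)                        # centring leaves every SSE unchanged
    n = len(X)
    k = np.arange(1, n)                           # candidate sizes of the left group
    csum = np.cumsum(X, axis=0)[:-1]              # coordinate sums of the first k points
    csq = np.cumsum((X ** 2).sum(axis=1))[:-1]    # squared norms summed over the first k points
    tot_sum, tot_sq = X.sum(axis=0), (X ** 2).sum()
    left = csq - (csum ** 2).sum(axis=1) / k
    rsum = tot_sum - csum
    right = (tot_sq - csq) - (rsum ** 2).sum(axis=1) / (n - k)
    i = int(np.argmin(left + right))
    je1 = tot_sq - (tot_sum ** 2).sum() / n       # SSE about the single overall mean
    return je1, left[i] + right[i], X[: i + 1], X[i + 1 :]

def dh_threshold(n, p, z=3.2):
    """Duda-Hart critical value for Je(2)/Je(1) in p dimensions (z = 3.2 is an assumption)."""
    return 1 - 2 / (np.pi * p) - z * np.sqrt(2 * (1 - 8 / (np.pi ** 2 * p)) / (n * p))

def count_clusters(X, p, z=3.2, min_size=3):
    """Split recursively while the rule rejects 'one cluster'; return the final count."""
    stack, final = [X], 0
    while stack:
        C = stack.pop()
        if len(C) >= min_size:
            je1, je2, A, B = best_split(C)
            if je1 > 0 and je2 / je1 < dh_threshold(len(C), p, z):
                stack.extend([A, B])
                continue
        final += 1
    return final

# Monte Carlo null sample: ONE coda type, so the honest answer is one cluster.
# Mean, SD and sample size are hypothetical placeholders.
rng = np.random.default_rng(2)
x = rng.normal(0.375, 0.03, size=1000)

n_correct = count_clusters(x[:, None], p=1)                  # x' analysed as one variable
n_wrong = count_clusters(np.column_stack([x, 1 - x]), p=2)   # x' analysed as (x', 1 - x')
print(n_correct, n_wrong)
```

Because each squared distance for the pair (x’, 1-x’) is exactly twice the corresponding one for x’ alone, the ratio Je(2)/Je(1) is the same in both runs; only the threshold differs. For large n it approaches 1 - 2/(πp), about 0.36 for p = 1 but about 0.68 for p = 2, whereas splitting a single normal variable in two gives a ratio close to 0.36, so the p = 2 run tends to keep rejecting "one cluster" until the pieces become small. The exact counts depend on the assumed z, mean, SD and sample size, so they illustrate the mechanism rather than reproduce the c. 180 clusters.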

Fortunately, the foremost expert in the field of coda analysis is not just exceptional at understanding the proper use of statistics; he is one of the few in science who also comprehend statistical modelling at a fundamental level. Given that his knowledge of sperm whales was also peerless, I decided to put the question directly to him. The answer I received was a jaw-dropper. He knew of the clusters and thought the poly-modal clustering was real, but was ignoring their significance as it pertains to language. I will go into this in more detail later, but for now I have this to say…

I am not at all sure that enough time and effort have been spent on eliminating the possibility that the breakdown of codas under Duda-Hart (D-H) analysis is at least partially due to software problems. Given the sample size, the number of 1D clusters is roughly what I would expect (for back-of-the-envelope reasons I will not go into) if a normally distributed variable x were mistakenly analysed as the data pair (x, 1-x). So if you are a statistics or mathematics student with time on your hands, let me know. I can give you the code, and within a couple of days we will know whether this is a factor or not. If not, then CONTINUE (when I have added the next page).