Friday Fisking: WW and ERVs

Yes, it’s the return of Friday Fisking. My first target is a chap calling himself William Wallace over at ERV. Some time ago, William left a comment in which he expressed skepticism that ERV data is justifiably used to support common descent. About a week later, he announced that he had made a model based on random insertion, and then asked for some help in creating an equivalent common descent model, with an aside that the results of the random model “doesn’t look good for your side.” This announcement was met with great derision, with calls for William to explain his conclusion. I directly addressed his question about his common descent model, pointing out where his assumption was incorrect. Even so, I remained puzzled as to how he achieved the results he claimed for both models.

Recently, the model came up again in relation to another post about ERVs. Although William refused to give any details about his model, he claimed that his results were unsurprising if you understood the math. He eventually offered to give me details via email. After a series of exchanges, I received enough information to be confident about the basics of his model.

At issue was whether a nested hierarchy can be created from 14 ERVs solely by random insertion, without common descent. What William did was generate a series of datasets using a pseudo-random number generator. Each dataset consisted of ten species, each assigned what he calls an ERV ID hash. Basically, for each species, it sounds like he is using the pRNG to generate an integer between 0 and 16,383. Those familiar with binary numbers should immediately see what is going on. That range of integers can be represented using 14 bits. Each bit, therefore, represents whether or not a given ERV is present in the species. For example, say the pRNG picks 14,849. Let’s convert that to binary:

11101000000001

This species would have ERVs 1, 2, 3, 5, and 14, but none of the others. Repeat this for the remaining species, and you have one dataset. William sent me a sample output of his model:

                   SPECIES  4_    _Species 5
                              \  /
                   SPECIES 3_  | |  _Species 6
                             \ | | /
                 SPECIES 2_  | | | |  _Species 7
                           \ | | | | /
               SPECIES 1_  | | | | | |  _Species 8
                         \ | | | | | | /
             SPECIES 0_  | | | | | | | |  _Species 9
                       \ | | | | | | | | /
    ERV ID             | | | | | | | | | | (hits)
--------------------   - - - - - - - - - - ------
                   1   0   2 3 4 5   7   9 (7)
                   2   0   2 3 4 5   7   9 (7)
                   3   0   2 3 4 5   7   9 (7)
                   4       2 3 4 5   7   9 (6)
                   5       2 3 4 5   7   9 (6)
                   6         3 4 5   7   9 (5)
                   7         3 4 5   7   9 (5)
                   8           4 5   7   9 (4)
                   9           4 5   7     (3)
                  10           4     7     (2)
                  11           4     7     (2)
                  12           4           (1)
                  13           4           (1)
                  14           4           (1)

As you can see, a distinct nested hierarchy is present. This result is not surprising, nor is his admission that he has to look closely to find this hierarchy (chosen because it mimics the 7 species nested hierarchy he was emulating) and others in the data. With small species counts and small binary trait counts, nested hierarchies can occasionally arise from purely random assortments. The probability decreases quickly as the number of species and traits increases. This is well known, and it is part of the reason phylogenies based on morphological features in particular try to include as many traits as possible. It is also why we talk about consensus phylogenies: we compare numerous phylogenies and try to find a best fit.
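For anyone who wants to play with this, here is a minimal Python sketch of the model as I have described it. It is my reconstruction, not William’s code: the function names, the bit order (most significant bit is ERV 1), the trial count, and the seed are all my own assumptions. It also checks the whole dataset for a nested hierarchy (every pair of ERVs is carried by species groups that are either nested or disjoint), rather than hunting for a favorable subset of species the way a human eye would.

```python
import random


def decode_ervs(hash_value, n_ervs=14):
    """List the 1-based ERV numbers present, reading the leftmost bit as ERV 1."""
    bits = format(hash_value, f"0{n_ervs}b")
    return [i + 1 for i, bit in enumerate(bits) if bit == "1"]


def random_dataset(n_species, n_ervs, rng):
    """One dataset: a random n_ervs-bit integer (the 'ERV ID hash') per species."""
    return [rng.randrange(2 ** n_ervs) for _ in range(n_species)]


def carriers(dataset, n_ervs):
    """For each ERV, the set of species indices whose hash has that bit set."""
    groups = []
    for erv in range(n_ervs):
        mask = 1 << (n_ervs - 1 - erv)
        groups.append(frozenset(s for s, h in enumerate(dataset) if h & mask))
    return groups


def is_nested_hierarchy(groups):
    """True if every pair of carrier sets is either disjoint or nested."""
    for i, a in enumerate(groups):
        for b in groups[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


def estimate_rate(n_species, n_ervs, trials=2000, seed=0):
    """Fraction of random datasets that form a nested hierarchy by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        dataset = random_dataset(n_species, n_ervs, rng)
        if is_nested_hierarchy(carriers(dataset, n_ervs)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(decode_ervs(14849))  # the worked example: [1, 2, 3, 5, 14]
    for n_species, n_ervs in [(3, 3), (5, 5), (10, 14)]:
        print(n_species, n_ervs, estimate_rate(n_species, n_ervs))
```

Running it with a few different sizes should show the chance rate falling off as species and ERVs are added. Relaxing the check to allow a hand-picked subset of species, which is closer to what William did, will raise the rate for small datasets, but it does not change the direction of the trend.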

So William’s results are, on their face, unsurprising. They also illustrate that we need to exercise some caution when discussing phylogenies. Then again, this is not particularly noteworthy, and it’s nothing we didn’t already have a firm grasp on.

The problem for William is that this doesn’t actually model what he claims it models. Contrary to his claim, this does not model random insertion of ERVs. An early response to his announcement nailed his error:

And Willy, what assumptions are you making about _where ERVs integrate in the genome_?

The answer is, William assumed that each of the 14 ERVs can only insert into a single location. But that is not true. ERVs randomly insert into the genome, though there is often a bias for where a certain ERV can insert. But there are millions and even billions of potential sites. When we talk about different species having the same ERVs, they are not considered the same ERV unless they are sited in the exact same spot as well as being nearly identical in structure. And there’s some hidden information in that last sentence. How do we know that these insertions are in the exact same spot? Because we have already done a phylogeny, one in which almost all the genome matches, and found some spots that differ in a specific manner. His model, if he wished to make it even somewhat realistic, should have used 50,000 bits instead of 14. The probability of finding a nested hierarchy with that many bits is astronomically low when all you have is random insertion, let alone one that matches the consensus.
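To put a rough number on how much the count of potential insertion sites matters, here is a back-of-the-envelope Python sketch. It assumes each of two species independently picks up k insertions at sites chosen uniformly from n_sites candidates, which ignores the insertion bias mentioned above. The function names and the one-million-site figure are illustrative assumptions of mine, not anything taken from William’s model or from real genome data.

```python
import random


def expected_shared_sites(k, n_sites):
    """Expected number of sites hit in both species under uniform random insertion.

    Each of the k sites in one species is also hit in the other with
    probability k / n_sites, so the expectation is k * k / n_sites.
    """
    return k * k / n_sites


def simulate_shared_sites(k, n_sites, trials=10000, seed=0):
    """Monte Carlo check of the same quantity (assumed uniform site choice)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = set(rng.sample(range(n_sites), k))
        b = set(rng.sample(range(n_sites), k))
        total += len(a & b)
    return total / trials


if __name__ == "__main__":
    # With only 14 possible sites, every insertion is shared by construction.
    print(expected_shared_sites(14, 14))
    # With a million candidate sites, shared insertions are rare.
    print(expected_shared_sites(14, 1_000_000))
    print(simulate_shared_sites(14, 1_000_000, trials=2000))
```

Under these assumptions, a model with 14 sites forces the species to share insertions, while a model with a million sites makes even a single chance match between two species unlikely, and a whole nested set of them far more so.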

That he used only 14 bits also explains why ‘recent’ insertions were dominating ‘early’ insertions. When there are only 14 places to mutate, it doesn’t take long for a back-mutation to occur.

A model is only as good as its assumptions. Unfortunately for William, his assumptions were so erroneous that they rendered his model useless for its intended purpose.


8 Responses to “Friday Fisking: WW and ERVs”

  1. LanceR, JSG Says:

    Thank you! I for one appreciate when those better educated than I lean in and “take one for the team” by wading through this sort of garbage.

  2. eddie Says:

    All this for the sake of WW trying to show an absence of common ancestry while still having a model in which an organism’s genes evolve. ERVs are not just about changing genes during a single lifetime.
    And it doesn’t even begin to address the many other issues where fundies’ biology would differ from evolution and natural selection. For them, despite any mutation (by ERV or otherwise) and its associated phenotypic response, an organism is only going to get sick if the sky fairy is somehow offended by it. And reproductive success is, to them, measured against a fitness landscape of their faux morality. And even that doesn’t matter because there is no heritable trait other than original guilt. Sheesh.

  3. LanceR, JSG Says:

    Color me surprised that Limp Willy has not tried to defend his position further. I am shocked, SHOCKED that he would dump and run from further debate!

    No. Not really. Anyone wanna take bets on whether he ever broaches this subject again?

  4. W. Kevin Vicklund Says:

    I would be surprised if he didn’t broach it again. He’ll probably wait a month or two, though.

  5. charles soper Says:

    I searched for the pol gene of CERV1 (4544..5611 from NCBI’s AY692036.1) in the macaque genome and then took contiguous sequences of the hits from this with adjacent non CERV containing sequence and blasted it against Pan Trog.
    In the macaque contig NW_001112574.1, a portion of the CERV1 pol gene is found at 7352782-7351802(-), (though there’s a gap in pol sequence from its 487th to 664th nucleotide).
    Two non CERV containing adjacent fragments (I called a and b) from the macaque contig are also with the pol gene in the chimp contig for chromosome 2, Contig33.125 (AACZ03012828.1).
    The fragments a and b are found at 7347266-7347577 and 7349461-7349794 in NW_001112574.1. None of 94 other chimp contigs shared either of these two non-CERV fragments from the macaque genome and CERV1 pol sequence.
    A third non CERV fragment from the same macaque contig (which I called c) 7346538-7350059, near the pol sequence, was not in the chimp contig, but showed widespread distribution throughout the genome of several primates (nearly 3 k hits in man and 3,048 in orangutan). It was inside the regions of 15 different macaque genes.

    The second interesting site was shared between the gorilla contig CABD02426596.1 and chimp AADA01328632.1 . Most of the shared sequence is from CERV1, but there is a non CERV1 region from 6129 to 6395 in the gorilla sequence also found in the chimp sequence at 800 to 1065 (Blast scores: 375 375 100% 3e-107 92%).

  6. William Wallace Says:

    Wow. I just stumbled upon this place. The lawn needs mowing.

    The answer is, William assumed that each of the 14 ERVs can only insert into a single location.

    What evidence do you have that I assumed that?

    …ERV ID hash…

    Okay, now you’re getting closer. Strange, though, since you didn’t point out that your summary contradicted your assumption regarding my assumption.

    if he wished to make it even somewhat realistic, should have used 50,000 bits instead of 14.

    Where did you get 14 bits?

    In any event, the hypothesis tested is whether or not a pseudo random process can independently generate nested hierarchies consistent with common descent. It can, and it did. The same process also generated nested hierarchies that were not consistent with common descent.

    That is, other nested hierarchies also existed, that contradicted the hierarchy suggested by speciation events, and by the cherry picked nested hierarchy. It wasn’t surprising to me. It wasn’t surprising to you (though I am not convinced you understand).

    Yet when we (Darwinists) search for evidence of speciation in a process that is believed to be random, and we find a supporting and possibly cherry picked nested hierarchy, we simply advertise our results in support of our preconceived ideas, waving our hands wildly as we assert that coincidence is just too unlikely.

    Even though improbability doesn’t rule out coincidence, a more convincing argument could be made from your side if: A. All of the genomes of the species considered were completely mapped out. B. Every ERV match were cataloged. and C. Every single match only supported the nested hierarchy consistent with speciation, and no match supported any other nested hierarchy.

    I’m not a biologist, but it looks like Charles Soper might have just made the point disputed.

  7. W. Kevin Vicklund Says:

    Note: I saw your comment, but we have a guest this week. I might be able to compose an answer Saturday.

  8. Nobody Says:

    Hi William,

    I’m happy to chat about this, and specify a model for presence or absence of ERVs in extant species assuming common descent. First, is Kevin’s description of your model accurate? If so, it should be quite simple to switch from just simulating data to actually computing a likelihood function, and being able to compare the models on some real data.

