We want this:
We could use a regular expression, but that's overkill. The alpha part is variable length, so we have to be a little bit smart, but these are so regular that it's easy. UniFrac also wants an underscore, so we add that.
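As a rough sketch of the idea without a regular expression: walk along the label until the letters stop, then put the underscore at that point. The function name `add_underscore` and the sample labels are made up for illustration, and the sketch assumes each label is a run of letters followed by something else (digits, say); the real labels may differ.

```python
def add_underscore(label):
    """Insert an underscore between the leading letters and the rest.

    Assumes a label shaped like <letters><rest>; this shape is an
    assumption for illustration, not taken from the real data.
    """
    # Walk forward until the first non-letter; that index ends the alpha part.
    split = 0
    while split < len(label) and label[split].isalpha():
        split += 1
    # Leave labels with no alpha prefix, or nothing after it, untouched.
    if split == 0 or split == len(label):
        return label
    return label[:split] + "_" + label[split:]


# Hypothetical labels with alpha parts of different lengths.
print(add_underscore("AB12"))   # AB_12
print(add_underscore("XYZQ7"))  # XYZQ_7
```

Because the split point is found by scanning, the alpha part can be any length, which is the only "smart" bit needed here.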
Just do this from the command line:
If you count the sequences, you should have 1124.
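One way to check the count, sketched under the assumption that the sequences are stored in FASTA format (one header line starting with `>` per sequence). The helper name `count_fasta_records` and the sample records are hypothetical.

```python
def count_fasta_records(lines):
    """Count sequences by counting header lines that start with '>'.

    Assumes FASTA-formatted input; that format is an assumption here.
    """
    return sum(1 for line in lines if line.startswith(">"))


# Tiny made-up example: two records, one of them wrapped over two lines.
example = [">AB_12", "ACGT", ">XYZQ_7", "GGCA", "TTAA"]
print(count_fasta_records(example))  # 2
```

An open file handle can be passed in place of the list, since iterating over it yields lines in the same way.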