Name gender calculator

Consider the name Lucifer. Although it refers to Satan, who is traditionally regarded as male, it has a very female ring to it in English because it sounds like the female names Lucy and Jennifer. This program takes a statistical approach to categorising names as male or female based on their similarities to existing English names.

How it works

The name gender calculator uses existing English names to generate its initial statistics. I obtained lists of 2,925 male and 4,971 female given names from the Consortium for Lexical Research.

The algorithm hinges upon trigraphs, or letter triplets. For example, the name Peter is split into the trigraphs pet, ete, and ter. (Non-alphabetic characters usually function as word separators, so Jean-Paul is not considered to contain anp or npa.) For each trigraph, a male/female score is given based on that trigraph's occurrences in existing names, and finally the scores are averaged into a single percentage.

I used a modified version of the program to test the algorithm on all of the existing names. The results were pleasing, with 90% of male and 56% of female names correctly categorised, i.e. a 68% success rate (5,381 correct out of 7,894). Why was it more accurate with male names? Perhaps partly because there are more trigraphs unique to male names (2,352) than there are to female ones (2,090).

One problem with this method is that it does not deal well with male names that can be extended into their female forms. For example, Paul is classified as female because there are many female derivatives that contain it (Paula, Pauletta, Paulina, Pauline, and so on).

For invented names, though, the gender calculator gives a pretty good approximation of how male or female the name sounds. Try it with names from fantasy or science fiction, and you'll probably be impressed.
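
As a rough illustration of the trigraph scoring described under "How it works", here is a minimal Python sketch. It is not the original program. The function names, the tiny name lists, and the scoring formula are all assumptions made for illustration: each trigraph is scored by its raw male share of occurrences, trigraphs never seen in either list are skipped, only ASCII letters count as alphabetic, and no correction is made for the female list being larger than the male one.

```python
import re
from collections import Counter

def trigraphs(name):
    """Split a name into letter triplets; non-letters act as word separators."""
    parts = re.findall(r"[a-z]+", name.lower())  # assumption: ASCII letters only
    return [part[i:i + 3] for part in parts for i in range(len(part) - 2)]

def count_trigraphs(names):
    """Count how often each trigraph occurs across a list of names."""
    counts = Counter()
    for name in names:
        counts.update(trigraphs(name))
    return counts

def male_score(name, male_counts, female_counts):
    """Average the male share of each known trigraph (assumed formula).

    Returns a value from 0.0 (female) to 1.0 (male), or None when the
    name contains no trigraph seen in either list.
    """
    shares = []
    for tri in trigraphs(name):
        m, f = male_counts[tri], female_counts[tri]
        if m + f:  # skip trigraphs with no statistics at all
            shares.append(m / (m + f))
    return sum(shares) / len(shares) if shares else None

# Toy stand-ins for the real name lists (hypothetical, far too small to be useful).
male_counts = count_trigraphs(["Peter", "Paul"])
female_counts = count_trigraphs(["Paula", "Pauline", "Lucy", "Jennifer"])

print(trigraphs("Peter"))      # ['pet', 'ete', 'ter']
print(trigraphs("Jean-Paul"))  # ['jea', 'ean', 'pau', 'aul']
print(male_score("Paul", male_counts, female_counts))
print(male_score("Lucifer", male_counts, female_counts))
```

Even on this toy data, Paul comes out at about one third male, because pau and aul each occur once in the male list and twice in the female one. Lucifer scores 0.0, since its only known trigraphs (luc, ife, fer) come from Lucy and Jennifer.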