├── .gitignore
├── LICENSE
├── README.md
├── examples
    ├── Allosaurus.txt
    ├── Python.txt
    └── Yeti.txt
├── nb.py
├── requirements.txt
└── sample-data
    ├── cryptids
        ├── Ahool.txt
        ├── Bigfoot.txt
        ├── Chupacabra.txt
        ├── Jackalope.txt
        ├── Kraken.txt
        ├── Loch-Ness-Monster.txt
        ├── Megaconda.txt
        ├── New-Nessie.txt
        └── Skunk-Ape.txt
    └── dinos
        ├── Brachiosaurus.txt
        ├── Compsognathus.txt
        ├── Gallimimus.txt
        ├── Parasaurolophus.txt
        ├── Stegosaurus.txt
        ├── Trex.txt
        ├── Triceratops.txt
        └── Velociraptor.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2014 yhat
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
23 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Naive Bayes in Python Source
2 |
3 |
4 | ```bash
5 | $ git clone git@github.com:yhat/python-naive-bayes.git
6 | $ cd python-naive-bayes
7 | $ python nb.py
8 | ('File Name :', u'examples/Allosaurus.txt')
9 | ('Score(dino) :', 1.7777297062779534e+26)
10 | ('Score(crypto):', 1617656.2932267354)
11 | ('File Name :', u'examples/Python.txt')
12 | ('Score(dino) :', 5482.325210726829)
13 | ('Score(crypto):', 832.0706697339581)
14 | ('File Name :', u'examples/Yeti.txt')
15 | ('Score(dino) :', 2601.7664705783586)
16 | ('Score(crypto):', 25239.08993198242)
17 | ```
18 |
--------------------------------------------------------------------------------
/examples/Allosaurus.txt:
--------------------------------------------------------------------------------
1 | Allosaurus /ˌælɵˈsɔrəs/ is a genus of large theropod dinosaur that lived 155 to 150 million years ago during the late Jurassic period (Kimmeridgian to early Tithonian[1]). The name "Allosaurus" means "different lizard". It is derived from the Greek ἄλλος/allos ("different, other") and σαῦρος/sauros ("lizard / generic reptile"). The first fossil remains that can definitely be ascribed to this genus were described in 1877 by paleontologist Othniel Charles Marsh, and it became known as Antrodemus. As one of the first well-known theropod dinosaurs, it has long attracted attention outside of paleontological circles. Indeed, it has been a top feature in several films and documentaries about prehistoric life.
2 |
3 | Allosaurus was a large bipedal predator. Its skull was large and equipped with dozens of large, sharp teeth. It averaged 8.5 m (28 ft) in length, though fragmentary remains suggest it could have reached over 12 m (39 ft).
Relative to the large and powerful hindlimbs, its three-fingered forelimbs were small, and the body was balanced by a long and heavily muscled tail. It is classified as an allosaurid, a type of carnosaurian theropod dinosaur. The genus has a complicated taxonomy, and includes an uncertain number of valid species, the best known of which is A. fragilis. The bulk of Allosaurus remains have come from North America's Morrison Formation, with material also known from Portugal and possibly Tanzania. It was known for over half of the 20th century as Antrodemus, but study of the copious remains from the Cleveland-Lloyd Dinosaur Quarry brought the name "Allosaurus" back to prominence, and established it as one of the best-known dinosaurs. 4 | 5 | As the most abundant large predator in the Morrison Formation, Allosaurus was at the top of the food chain, probably preying on contemporaneous large herbivorous dinosaurs and perhaps even other predators. Potential prey included ornithopods, stegosaurids, and sauropods. Some paleontologists interpret Allosaurus as having had cooperative social behavior, and hunting in packs, while others believe individuals may have been aggressive toward each other, and that congregations of this genus are the result of lone individuals feeding on the same carcasses. It may have attacked large prey by ambush, using its upper jaw like a hatchet. -------------------------------------------------------------------------------- /examples/Python.txt: -------------------------------------------------------------------------------- 1 | Python reticulatus, also known as the (Asiatic) reticulated python,[4] is a species of python found in Southeast Asia. Adults can grow to 6.95 m (22.8 ft) in length[5] but normally grow to an average of 3–6 m (10–20 ft). They are the world's longest snakes and longest reptile, but are not the most heavily built. Like all pythons, they are nonvenomous constrictors and normally not considered dangerous to humans. 
Although large specimens are powerful enough to kill an adult human, attacks are only occasionally reported. 2 | 3 | An excellent swimmer, Python reticulatus has been reported far out at sea and has colonised many small islands within its range. The specific name, reticulatus, is Latin meaning "net-like", or reticulated, and is a reference to the complex color pattern.[6] -------------------------------------------------------------------------------- /examples/Yeti.txt: -------------------------------------------------------------------------------- 1 | The Yeti (/ˈjɛti/)[3] or Abominable Snowman (Nepali: हिममानव, lit. "mountain man") is an ape-like cryptid taller than an average human that is said to inhabit the Himalayan region of Nepal and Tibet.[4] The names Yeti and Meh-Teh are commonly used by the people indigenous to the region, and are part of their history and mythology. Stories of the Yeti first emerged as a facet of Western popular culture in the 19th century. 2 | 3 | The scientific community generally regards the Yeti as a legend, given the lack of conclusive evidence,[5] but it remains one of the most famous creatures of cryptozoology. In 2014, however, two hair samples taken from remote regions of the Himalayas have been found to show a 100 per cent genetic match to a prehistoric polar-bear-like creature that existed more than 40,000 years ago. 
An Oxford scientist prepares expedition to find it.[6]
--------------------------------------------------------------------------------
/nb.py:
--------------------------------------------------------------------------------
1 | import math
2 | import re
3 | import string
4 |
5 | def remove_punctuation(s):
6 |     """See: http://stackoverflow.com/a/266162
7 |     """
8 |     exclude = set(string.punctuation)
9 |     return ''.join(ch for ch in s if ch not in exclude)
10 |
11 | def tokenize(text):
12 |     text = remove_punctuation(text)
13 |     text = text.lower()
14 |     return re.split(r"\W+", text)
15 |
16 | def count_words(words):
17 |     wc = {}
18 |     for word in words:
19 |         wc[word] = wc.get(word, 0.0) + 1.0
20 |     return wc
21 |
22 | s = "Hello my name, is Greg. My favorite food is pizza."
23 | count_words(tokenize(s))
24 | # returns {'favorite': 1.0, 'food': 1.0, 'greg': 1.0, 'hello': 1.0, 'is': 2.0, 'my': 2.0, 'name': 1.0, 'pizza': 1.0}
25 |
26 | from sh import find
27 |
28 | # setup some structures to store our data
29 | vocab = {}
30 | word_counts = {
31 |     "crypto": {},
32 |     "dino": {}
33 | }
34 | priors = {
35 |     "crypto": 0.,
36 |     "dino": 0.
37 | }
38 | docs = []
39 | for f in find("sample-data"):
40 |     f = f.strip()
41 |     if not f.endswith(".txt"):
42 |         # skip non .txt files
43 |         continue
44 |     elif "cryptid" in f:
45 |         category = "crypto"
46 |     else:
47 |         category = "dino"
48 |     docs.append((category, f))
49 |     # ok time to start counting stuff...
50 |     priors[category] += 1
51 |     with open(f) as fh:
52 |         text = fh.read()
53 |     words = tokenize(text)
54 |     counts = count_words(words)
55 |     for word, count in list(counts.items()):
56 |         # if we haven't seen a word yet, let's add it to our dictionaries with a count of 0
57 |         if word not in vocab:
58 |             vocab[word] = 0.0 # use 0.0 here so Python does "correct" math
59 |         if word not in word_counts[category]:
60 |             word_counts[category][word] = 0.0
61 |         vocab[word] += count
62 |         word_counts[category][word] += count
63 |
64 | for f in find("examples"):
65 |     f = f.strip()
66 |     if not f.endswith(".txt"):
67 |         # skip non .txt files
68 |         continue
69 |     with open(f) as fh:
70 |         new_doc = fh.read()
71 |     words = tokenize(new_doc)
72 |     counts = count_words(words)
73 |
74 |     prior_dino = (priors["dino"] / sum(priors.values()))
75 |     prior_crypto = (priors["crypto"] / sum(priors.values()))
76 |
77 |     log_prob_crypto = 0.0
78 |     log_prob_dino = 0.0
79 |     for w, cnt in list(counts.items()):
80 |         # skip words that we haven't seen before, or words of 3 letters or fewer
81 |         if w not in vocab or len(w) <= 3:
82 |             continue
83 |
84 |         p_word = vocab[w] / sum(vocab.values())
85 |         p_w_given_dino = word_counts["dino"].get(w, 0.0) / sum(word_counts["dino"].values())
86 |         p_w_given_crypto = word_counts["crypto"].get(w, 0.0) / sum(word_counts["crypto"].values())
87 |
88 |         if p_w_given_dino > 0:
89 |             log_prob_dino += math.log(cnt * p_w_given_dino / p_word)
90 |         if p_w_given_crypto > 0:
91 |             log_prob_crypto += math.log(cnt * p_w_given_crypto / p_word)
92 |         d_rate = prior_dino * p_w_given_dino / (p_w_given_dino * prior_dino + p_w_given_crypto * prior_crypto)
93 |         c_rate = prior_crypto * p_w_given_crypto / (p_w_given_dino * prior_dino + p_w_given_crypto * prior_crypto)
94 |         #print("Bayes probabilities in the two groups for ", w)
95 |         #print("In dino group: ",prior_dino * p_w_given_dino / (p_w_given_dino * prior_dino + p_w_given_crypto * prior_crypto))
96 |         #print("In crypto group: ",prior_crypto * p_w_given_crypto / (p_w_given_dino * prior_dino + p_w_given_crypto * prior_crypto))
97 |     print("File Name :", f)
98 |     print("Score(dino) :", math.exp(log_prob_dino + math.log(prior_dino)))
99 |     print("Score(crypto):", math.exp(log_prob_crypto + math.log(prior_crypto)))
100 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | sh==1.12.8
2 |
--------------------------------------------------------------------------------
/sample-data/cryptids/Ahool.txt:
--------------------------------------------------------------------------------
1 | The ahool is a flying cryptid, supposedly a giant bat,[1] or by other accounts, a living pterosaur or flying primate.[2] Such a creature is unknown to science and there is no objective evidence that it exists as claimed.
2 |
3 | Like many cryptids, it is not well documented, and little reliable information - and in this case, no material evidence - exists. Named for its distinctive call A-hool (other sources render it ahOOOooool), it is said to live in the deepest rainforests of Java. It is described as having large dark eyes, large claws on its forearms (approximately the size of an infant), and a body covered in gray fur. Possibly the most intriguing and astounding feature is that it is said to have a wingspan of 3 m (10 ft). This is almost twice as long as the largest (known) bat in the world, the common flying fox.
4 |
5 | According to Loren Coleman and Jerome Clark,[3] it was first described by Dr. Ernest Bartels.[4] Bartels published regular accounts of his work while exploring the Salak Mountains on the island of Java.
6 |
7 | One speculation on its existence by the cryptozoologist Ivan T. Sanderson is that it might be a relative of Kongamato in Africa.[5] Others have suggested it were a living fossil pterosaur, on account of its supposedly leathery wings[citation needed].
As is known today, most pterosaurs seem to have had wings that were covered with a downy fluff to prevent heat loss; this may or may not have been necessary in a tropical environment depending on these animals' metabolism. On the other hand, there might be an entirely mundane explanation: 8 | 9 | Two large earless owls exist on Java, the Spotted Wood-owl (Strix seloputo) and the Javan Wood-owl (Strix (leptogrammica) bartelsi[6]). They are intermediate in size between the Spotted Owl of North America or the Tawny Owl of Eurasia, and an eagle owl (horned owl), being 40–50 cm (16–20 in) long and with a wingspan of perhaps 1.20 meters (4 ft). Despite this discrepancy, wingspans are usually overestimated[verification needed]in flying animals not held in hand (see also Thunderbird), especially by frightened observers. 10 | 11 | Size nonwithstanding, the Javan or Bartels's Wood-owl seems an especially promising candidate to resolve the ahool enigma:[7] it has a conspicuous flat "face" with large dark eyes exaggerated by black rings of feathers and a beak that protrudes but little, and it appears greyish-brown when seen from below. Its call is characteristic, a single shout, given intermittently, and sounding like HOOOH!.[8] Like most large owls, it is highly territorial in breeding season and will frighten away intruders by mock attacks from above and behind. Its flight, being an owl, is nearly completely silent, so that the victim of such sweeps usually only becomes aware of the owl when it homes in, diving with outstretched talons (held at "breast" height to the observer), and they would just have time to duck away. The Javan Wood-owl is a decidedly rare and elusive bird not often observed even by ornithologists,[9] as it hides during day. It is found in remote montane forest at altitudes of probably around 1,000-1,500 meters, and does not well tolerate human encroachment, logging and other disturbances. 
12 | 13 | From its appearance and behavior, the Javan Wood-owl matches the characteristics of the ahool surprisingly well, despite the cryptid at first glance giving the impression of a mammal. Observer error due to the circumstances of being dive-bombed in a remote gloomy forest by a fierce snarling and clawing bird may well account for the apparent discrepancies. Notwithstanding, the wood-owls of Java are not generally mentioned in cryptozoological discussions of the ahool, and most authors of cryptozoologial works seem to be entirely unaware of the birds' existence. Be that as it may, it is not resolved how well the owls are known to locals, especially the local name - if any - and whether they are present in locations of ahool reports would seem to be highly relevant. It is also possible[verification needed] that the cry and the flying animal are not identical; even the local population is sometimes unaware which jungle animal makes which vocalization (see for example Satanic Eared-nightjar). -------------------------------------------------------------------------------- /sample-data/cryptids/Bigfoot.txt: -------------------------------------------------------------------------------- 1 | Bigfoot is described in reports as a large hairy ape-like creature, in a range of 2–3 m (6.6-9.8 ft) tall, weighing in excess of 500 pounds (230 kg), and covered in dark brown or dark reddish hair.[5][7] Purported witnesses have described large eyes, a pronounced brow ridge, and a large, low-set forehead; the top of the head has been described as rounded and crested, similar to the sagittal crest of the male gorilla. 
Bigfoot is commonly reported to have a strong, unpleasant smell by those who claim to have encountered it.[8] The enormous footprints for which it is named have been as large as 24 inches (60 cm) long and 8 inches (20 cm) wide.[7] While most casts have five toes — like all known apes — some casts of alleged Bigfoot tracks have had numbers ranging from two to six.[9] Some have also contained claw marks, making it likely that a portion came from known animals such as bears, which have five toes and claws.[10][11] Proponents claim that Bigfoot is omnivorous and mainly nocturnal.[12] -------------------------------------------------------------------------------- /sample-data/cryptids/Chupacabra.txt: -------------------------------------------------------------------------------- 1 | The chupacabra or chupacabras (Spanish pronunciation: [tʃupaˈkaβɾas], from chupar "to suck" and cabra "goat", literally "goat sucker") is a legendary cryptid rumored to inhabit parts of the Americas, with the first sightings reported in Puerto Rico.[1] The name comes from the animal's reported habit of attacking and drinking the blood of livestock, especially goats. 2 | 3 | Physical descriptions of the creature vary. It is purportedly a heavy creature, the size of a small bear, with a row of spines reaching from the neck to the base of the tail. 4 | 5 | Eyewitness sightings have been claimed as early as 1995 in Puerto Rico, and have since been reported as far north as Maine, and as far south as Chile, and even being spotted outside the Americas in countries like Russia and The Philippines, but many of the reports have been disregarded as uncorroborated or lacking evidence. 
Sightings in northern Mexico and the southern United States have been verified as canids afflicted by mange.[2] Biologists and wildlife management officials view the chupacabra as a contemporary legend.[3] -------------------------------------------------------------------------------- /sample-data/cryptids/Jackalope.txt: -------------------------------------------------------------------------------- 1 | The jackalope is a mythical animal of North American folklore (a so-called "fearsome critter") described as a jackrabbit with antelope horns or deer antlers and sometimes a pheasant's tail (and often hind legs). The word "jackalope" is a portmanteau of "jackrabbit" and "antelope". 2 | 3 | The story of the jackalope was popularised in Wyoming in the 1930s after a local hunter used taxidermy skills to graft deer antlers onto a jackrabbit carcass, selling the creature to a local hotel. It is possible that the tales of jackalopes were inspired by sightings of rabbits infected with the Shope papilloma virus, which causes the growth of horn- and antler-like tumors in various places on the rabbit's head and body.[1][2] The concept of an animal hybrid occurs in many cultures, such as the griffin and the chimera, and horned hares were described in medieval and early Renaissance texts. -------------------------------------------------------------------------------- /sample-data/cryptids/Kraken.txt: -------------------------------------------------------------------------------- 1 | Kraken (/ˈkreɪkən/ or /ˈkrɑːkən/)[1] is a legendary sea monster of large proportions that is said to dwell off the coasts of Norway and Greenland. The legend may have originated from sightings of giant Octopus that are estimated to grow to 9–11 m (30–40 ft) in length, including the tentacles.[2][3] The sheer size and fearsome appearance attributed to the kraken have made it a common ocean-dwelling monster in various fictional works. 
-------------------------------------------------------------------------------- /sample-data/cryptids/Loch-Ness-Monster.txt: -------------------------------------------------------------------------------- 1 | The Loch Ness Monster is a cryptid, a creature whose existence has been suggested but has not been discovered or documented by the scientific community.[4] It is reputedly a large unknown animal that inhabits Loch Ness in the Scottish Highlands. It is similar to other supposed lake monsters in Scotland and elsewhere, though its description varies from one account to the next. Popular interest and belief in the animal's existence has varied since it was first brought to the world's attention in 1933. Evidence of its existence is anecdotal, with minimal and much-disputed photographic material and sonar readings. 2 | 3 | The most common speculation among believers is that the creature represents a line of long-surviving plesiosaurs.[5] The scientific community regards the Loch Ness Monster as a modern-day myth, and explains sightings as including misidentifications of more mundane objects, outright hoaxes, and wishful thinking.[6] Despite this, it remains one of the most famous examples of cryptozoology. The legendary monster has been affectionately referred to by the nickname Nessie[b] (Scottish Gaelic: Niseag)[7] since the 1940s.[8] -------------------------------------------------------------------------------- /sample-data/cryptids/Megaconda.txt: -------------------------------------------------------------------------------- 1 | Reports of giant anacondas date back as far as the discovery of South America, when sightings of anacondas upwards of 50 metres (167 feet) began to circulate amongst colonists, and the topic has been a subject of debate ever since among cryptozoologists and zoologists. Anacondas can grow to sizes of 6 metres (20 ft) and beyond,[1] and 150 kilograms (330 lbs.) 
in weight.[2] Although some python species can grow longer,[2] the anaconda, particularly the green or common anaconda, is the heaviest and largest in terms of diameter of all snakes, and it is the second-longest extant snake in the world behind the reticulated python.[1][2] The longest reputably-measured and confirmed anacondas are about 7.5 metres (25 feet) long.[3] Lengths of 50–60 feet have been reported for this species, but such extremes lack verification. The only real reliable claims that can be found describe measured anacondas ranging from 26 to 39 feet, although these remain unverified.[3] -------------------------------------------------------------------------------- /sample-data/cryptids/New-Nessie.txt: -------------------------------------------------------------------------------- 1 | The Zuiyo-maru carcass (ニューネッシー Nyū Nesshii?, lit. "New Nessie") is a creature caught by the Japanese fishing trawler Zuiyō Maru (瑞洋丸?) off the coast of New Zealand in 1977. The carcass's peculiar appearance led to speculation that it might be the remains of a sea serpent or prehistoric plesiosaur. 2 | 3 | Although several scientists insisted it was "not a fish, whale, or any other mammal",[1] analysis later indicated it was most likely the carcass of a basking shark by comparing the number of sets of amino acids in the muscle tissue.[2][3] Decomposing basking shark carcasses lose most of the lower head area and the dorsal and caudal fins first, making them resemble a plesiosaur. -------------------------------------------------------------------------------- /sample-data/cryptids/Skunk-Ape.txt: -------------------------------------------------------------------------------- 1 | The Skunk Ape, also known as the Swamp Ape, Stink Ape, Florida Bigfoot, Myakka Ape, and the Myakka Skunk Ape, is a hominid cryptid said to inhabit Florida,[1] as well as North Carolina and Arkansas, although reports from Florida are more common. 
It is named for its appearance and for the unpleasant odor that is said to accompany it. According to the United States National Park Service, the Skunk Ape does not exist.[2] Reports of the Skunk Ape were particularly common in the 1960s and 1970s. In the fall of 1974, numerous sightings were reported in suburban neighborhoods of Dade County, Florida, of a large, foul-smelling, hairy, ape-like creature, which ran upright on two legs. 2 | 3 | Skeptical investigator Joe Nickell has written that some of the Skunk Ape reports may represent sightings of the black bear (Ursus americanus) and it is likely that other sightings are hoaxes or misidentification of wildlife.[3] -------------------------------------------------------------------------------- /sample-data/dinos/Brachiosaurus.txt: -------------------------------------------------------------------------------- 1 | Brachiosaurus /ˌbrækiəˈsɔrəs/ is a genus of sauropod dinosaur from the Jurassic Morrison Formation of North America. It was first described by Elmer S. Riggs in 1903 from fossils found in the Grand River Canyon (now Colorado River) of western Colorado, in the United States. Riggs named the dinosaur Brachiosaurus altithorax, declaring it "the largest known dinosaur". Brachiosaurus had a disproportionately long neck, small skull, and large overall size, all of which are typical for sauropods. However, the proportions of Brachiosaurus are unlike most sauropods – the forelimbs were longer than the hindlimbs, which resulted in a steeply inclined trunk, and its tail was shorter in proportion to its neck than other sauropods of the Jurassic. 2 | 3 | Brachiosaurus is the namesake genus of the family Brachiosauridae, which includes a handful of other similar sauropods. 
Much of what is known by laypeople about Brachiosaurus is in fact based on Giraffatitan brancai, a species of brachiosaurid dinosaur from the Tendaguru Formation of Tanzania that was originally described by German paleontologist Werner Janensch as a species of Brachiosaurus. Recent research shows that the differences between the type species of Brachiosaurus and the Tendaguru material are significant enough that the African material should be placed in a separate genus. Several other potential species of Brachiosaurus have been described from Africa and Europe, but none of them are thought to belong to Brachiosaurus at this time. 4 | 5 | Brachiosaurus is one of the rarer sauropods of the Morrison Formation. The type specimen of B. altithorax is still the most complete specimen, and only a relative handful of other specimens are thought to belong to the genus. It is regarded as a high browser, probably cropping or nipping vegetation as high as possibly 9 metres (30 ft) off of the ground. Unlike other sauropods, and its depiction in the film Jurassic Park, it was unsuited for rearing on its hindlimbs. It has been used as an example of a dinosaur that was most likely ectothermic due to its large size and the corresponding need for forage, but more recent research finds it to have been warm-blooded. -------------------------------------------------------------------------------- /sample-data/dinos/Compsognathus.txt: -------------------------------------------------------------------------------- 1 | Compsognathus (/kɒmpˈsɒɡnəθəs/;[1] Greek kompsos/κομψός; "elegant", "refined" or "dainty", and gnathos/γνάθος; "jaw")[2] is a genus of small, bipedal, carnivorous theropod dinosaurs. Members of its single species Compsognathus longipes could grow to the size of a turkey. They lived about 150 million years ago, the Tithonian age of the late Jurassic period, in what is now Europe. 
Paleontologists have found two well-preserved fossils, one in Germany in the 1850s and the second in France more than a century later. Today, C. longipes is the only recognized species, although the larger specimen discovered in France in the 1970s was once thought to belong to a separate species and named C. corallestris. 2 | 3 | Many presentations still describe Compsognathus as "chicken-sized" dinosaurs because of the small size of the German specimen, which is now believed to be a juvenile. Compsognathus longipes is one of the few dinosaur species for which diet is known with certainty: the remains of small, agile lizards are preserved in the bellies of both specimens. Teeth discovered in Portugal may be further fossil remains of the genus. 4 | 5 | Although not recognized as such at the time of its discovery, Compsognathus is the first theropod dinosaur known from a reasonably complete fossil skeleton. Until the 1990s, it was the smallest known non-avialan dinosaur; earlier it was the closest supposed relative of the early bird Archaeopteryx. -------------------------------------------------------------------------------- /sample-data/dinos/Gallimimus.txt: -------------------------------------------------------------------------------- 1 | Gallimimus (/ˌɡælɨˈmaɪməs/ gal-i-my-məs; meaning "chicken or rooster mimic") is a genus of ornithomimid theropod dinosaur from the late Cretaceous period (Maastrichtian stage) Nemegt Formation of Mongolia. With individuals as long as 8 metres (26 ft),[1] it was one of the largest ornithomimosaurs.[2] Gallimimus is known from multiple individuals, ranging from juvenile (about 0.5 metres tall at the hip) to adult (about two metres tall at the hip). 
-------------------------------------------------------------------------------- /sample-data/dinos/Parasaurolophus.txt: -------------------------------------------------------------------------------- 1 | Parasaurolophus (/ˌpærəsɔːˈrɒləfəs/ parr-ə-saw-rol-ə-fəs or /ˌpærəˌsɔrəˈloʊfəs/ parr-ə- sawr-ə-loh-fəs; meaning "near crested lizard" in reference to Saurolophus) is a genus of ornithopod dinosaur that lived in what is now North America during the Late Cretaceous Period, about 76.5–73 million years ago.[2] It was a herbivore that walked both as a biped and a quadruped. Three species are recognized: P. walkeri (the type species), P. tubicen, and the short-crested P. cyrtocristatus. Remains are known from Alberta (Canada), and New Mexico and Utah (USA). The genus was first described in 1922 by William Parks from a skull and partial skeleton found in Alberta. 2 | 3 | Parasaurolophus was a hadrosaurid, part of a diverse family of Cretaceous dinosaurs known for their range of bizarre head adornments. This genus is known for its large, elaborate cranial crest, which at its largest forms a long curved tube projecting upwards and back from the skull. Charonosaurus from China, which may have been its closest relative, had a similar skull and potentially a similar crest. Visual recognition of both species and sex, acoustic resonance, and thermoregulation have been proposed as functional explanations for the crest. It is one of the rarer hadrosaurids, known from only a handful of good specimens. -------------------------------------------------------------------------------- /sample-data/dinos/Stegosaurus.txt: -------------------------------------------------------------------------------- 1 | Stegosaurus (/ˌstɛɡɵˈsɔrəs/, meaning "roof lizard" or "covered lizard" in reference to its bony plates[1]) is a genus of armored stegosaurid dinosaur. 
They lived during the Late Jurassic period (Kimmeridgian to early Tithonian), some 155 to 150 million years ago in what is now western North America. In 2006, a specimen of Stegosaurus was announced from Portugal, showing that they were present in Europe as well.[2] Due to its distinctive tail spikes and plates, Stegosaurus is one of the most recognizable dinosaurs. At least three species have been identified in the upper Morrison Formation and are known from the remains of about 80 individuals.[3] 2 | 3 | A large, heavily built, herbivorous quadruped, Stegosaurus had a distinctive and unusual posture, with a heavily rounded back, short forelimbs, head held low to the ground and a stiffened tail held high in the air. Its array of plates and spikes has been the subject of much speculation. The spikes were most likely used for defense, while the plates have also been proposed as a defensive mechanism, as well as having display and thermoregulatory functions. Stegosaurus had a relatively low brain-to-body mass ratio. It had a short neck and small head, meaning it most likely ate low-lying bushes and shrubs. It was the largest of all the stegosaurians (bigger than genera such as Kentrosaurus and Huayangosaurus) and, although roughly bus-sized, it nonetheless shared many anatomical features (including the tail spines and plates) with the other stegosaurian genera. -------------------------------------------------------------------------------- /sample-data/dinos/Trex.txt: -------------------------------------------------------------------------------- 1 | Tyrannosaurus (/tɨˌrænəˈsɔrəs/ or /taɪˌrænəˈsɔrəs/ ("tyrant lizard", from the Ancient Greek tyrannos (τύραννος), "tyrant", and sauros (σαῦρος), "lizard"[1]) is a genus of coelurosaurian theropod dinosaur. The species Tyrannosaurus rex (rex meaning "king" in Latin), commonly abbreviated to T.rex, is one of the most well-represented of the large theropods. 
Tyrannosaurus lived throughout what is now western North America, which then was an island continent named Laramidia. Tyrannosaurus had a much wider range than other tyrannosaurids. Fossils are found in a variety of rock formations dating to the Maastrichtian age of the upper Cretaceous Period, 67 to 66 million years ago.[2] It was among the last non-avian dinosaurs to exist before the Cretaceous–Paleogene extinction event. 2 | 3 | Like other tyrannosaurids, Tyrannosaurus was a bipedal carnivore with a massive skull balanced by a long, heavy tail. Relative to its large and powerful hind limbs, Tyrannosaurus forelimbs were short but unusually powerful for their size and had two clawed digits. Although other theropods rivaled or exceeded Tyrannosaurus rex in size, it was the largest known tyrannosaurid and one of the largest known land predators. In fact, the most complete specimen measures up to 12.3 m (40 ft) in length,[3] up to 4 metres (13 ft) tall at the hips,[4] and up to 6.8 metric tons (7.5 short tons) in weight.[5] By far the largest carnivore in its environment, Tyrannosaurus rex may have been an apex predator, preying upon hadrosaurs, ceratopsians, and possibly sauropods,[6] although some experts have suggested the dinosaur was primarily a scavenger. The debate about whether Tyrannosaurus was an apex predator or scavenger was among the longest ongoing feuds in paleontology; however, most scientists now agree that Tyrannosaurus rex was an opportunistic carnivore, acting as both a predator and a scavenger.[7] It is estimated to be capable of exerting one of the largest bite forces among all terrestrial animals.[8][9] 4 | 5 | More than 50 specimens of Tyrannosaurus rex have been identified, some of which are nearly complete skeletons. Soft tissue and proteins have been reported in at least one of these specimens. The abundance of fossil material has allowed significant research into many aspects of its biology, including its life history and biomechanics.
The feeding habits, physiology and potential speed of Tyrannosaurus rex are a few subjects of debate. Its taxonomy is also controversial, as some scientists consider Tarbosaurus bataar from Asia to be a second species of Tyrannosaurus, while others maintain that Tarbosaurus is a separate genus. Several other genera of North American tyrannosaurids have also been synonymized with Tyrannosaurus. -------------------------------------------------------------------------------- /sample-data/dinos/Triceratops.txt: -------------------------------------------------------------------------------- 1 | Triceratops ("three-horned face" in Greek) is a genus of herbivorous ceratopsid dinosaur that first appeared during the late Maastrichtian stage of the late Cretaceous period, about 68 million years ago (Mya) in what is now North America. It is one of the last known non-avian dinosaur genera, and became extinct in the Cretaceous–Paleogene extinction event 66 million years ago.[1] The term Triceratops, which literally means "three-horned face", is derived from the Greek τρί- (tri-) meaning "three", κέρας (kéras) meaning "horn", and ὤψ (ops) meaning "face".[2][3] 2 | 3 | Bearing a large bony frill and three horns on its large four-legged body, and conjuring similarities with the modern rhinoceros, Triceratops is one of the most recognizable of all dinosaurs and the best known ceratopsid. It shared the landscape with and was probably preyed upon by the fearsome Tyrannosaurus,[4] though it is less certain that the two did battle in the manner often depicted in traditional museum displays and popular images. 4 | 5 | The exact placement of the Triceratops genus within the ceratopsid group has been debated by paleontologists. Two species, T. horridus and T. prorsus, are considered valid although many other species have been named.
Research published in 2010 suggested that the contemporaneous Torosaurus, a ceratopsid long regarded as a separate genus, represents Triceratops in its mature form.[5][6] The view was immediately disputed[7][8][9] and examination of more fossil evidence is expected to settle the debate. 6 | 7 | Triceratops has been documented by numerous remains collected since the genus was first described in 1889, including at least one complete individual skeleton.[10] Paleontologist John Scannella observed: "It is hard to walk out into the Hell Creek Formation and not stumble upon a Triceratops weathering out of a hillside." Forty-seven complete or partial skulls were discovered in just that area during the decade 2000–2010.[11] Specimens representing life stages from hatchling to adult have been found.[12] 8 | 9 | The function of the frills and three distinctive facial horns has long inspired debate. Traditionally these have been viewed as defensive weapons against predators. More recent theories, noting the presence of blood vessels in the skull bones of ceratopsids, find it more probable that these features were primarily used in identification, courtship and dominance displays, much like the antlers and horns of modern reindeer, mountain goats, or rhinoceros beetles.[13] The theory finds additional support if Torosaurus represents the mature form of Triceratops, as this would mean the frill also developed holes (fenestrae) as individuals reached maturity, rendering the structure more useful for display than defense.[5] -------------------------------------------------------------------------------- /sample-data/dinos/Velociraptor.txt: -------------------------------------------------------------------------------- 1 | Velociraptor (/vɨˈlɒsɨræptər/; meaning "swift seizer")[1] is a genus of dromaeosaurid theropod dinosaur that lived approximately 75 to 71 million years ago during the later part of the Cretaceous Period.[2] Two species are currently recognized, although others 
have been assigned in the past. The type species is V. mongoliensis; fossils of this species have been discovered in Mongolia. A second species, V. osmolskae, was named in 2008 for skull material from Inner Mongolia, China. 2 | 3 | Smaller than other dromaeosaurids like Deinonychus and Achillobator, Velociraptor nevertheless shared many of the same anatomical features. It was a bipedal, feathered carnivore with a long tail and an enlarged sickle-shaped claw on each hindfoot, which is thought to have been used to tackle prey. Velociraptor can be distinguished from other dromaeosaurids by its long and low skull, with an upturned snout. 4 | 5 | Velociraptor (commonly shortened to "raptor") is one of the dinosaur genera most familiar to the general public due to its prominent role in the Jurassic Park motion picture series. In the films it was shown with anatomical inaccuracies, including being much larger than it was in reality and without feathers. Some of these inaccuracies, along with the head's larger dome in the movies, may suggest that the dinosaurs in the movies were actually modeled on Deinonychus.[3] Velociraptor is also well known to paleontologists, with over a dozen described fossil skeletons, the most of any dromaeosaurid. One particularly famous specimen preserves a Velociraptor locked in combat with a Protoceratops. --------------------------------------------------------------------------------