Linguistic Data Resources on the Internet
A topically organized list of language data resources on the Internet.
Texts
Electronic Text Centers
- Center for Electronic Texts in the Humanities (CETH)
- CTI Centre for Textual Studies at Oxford University
- Directory of Electronic Text Centers Worldwide
- Electronic Text Center (ETC) at Elmer Holmes Bobst Library, New York University
- Electronic Text Center at UVirginia combines an on-line archive of thousands of SGML-encoded electronic texts (some of which are publicly available) with a library-based Center housing hardware and software suitable for the creation and analysis of text
- The Electronic Text (Etext) Pages
- Electronic Text Service at Columbia University
- LETRS: Library Electronic Texts Resource Service
- The Survey of English Usage
Digital Libraries
- Cornell Digital Library, Prototype
- Digital Library Initiative at UIUC
- Stanford University Digital Libraries Project
- UC Berkeley Digital Library Project
- University of Michigan Digital Library Project
- University of Virginia Electronic Text Library
Text Collections
- Aboriginal Studies Electronic Data Archive
- Alex: A Catalog of Electronic Texts
- AMALGAM, Automatic Mapping Among Lexico-Grammatical Annotation Models
- ARTFL Project, University of Chicago, Project for American and French Research on the Treasury of the French Language
- British National Corpus, corpora page.
- CCALAS, Centre for Computer Analysis of Language and Speech
- CCAT: Classical Studies and Religious Studies at UPenn
- CELT: Corpus of Electronic Texts, contemporary and historical Irish documents
- Corpus Cyrillo-Methodianum Helsingiense, an electronic corpus of Old Church Slavonic texts
- Corpus Linguistics, by Michael Barlow at Rice University, includes a list of corpora by language.
- Dante Project
- The Data Archive at the Univ. of Essex, computer-readable data in the social sciences and humanities
- ECI Multilingual Corpus
- E-Text Archives
- Goteborg Language Bank of Swedish
- ICAME Collection of English Language Corpora
- ICAME (Text Corpora) via Web
- International Corpus of English
- IPL Reading Room Public Online Texts
- Japanese Text Initiative
- The Labyrinth (medieval studies)
- Linguistic Data Consortium
- Literature, Electronic Books and Journals Directory via Rice Univ.
- Online Book Initiative e-texts
- On-line Books Page
- On-line books FAQ
- Oxford Text Archive (OTA)
- Penn-Helsinki Parsed Corpus of Middle English, a database of 510,000 words of syntactically parsed Middle English text for use by historical linguists
- Perseus Project, Classical Greek texts both in Greek and in English translation
- Philosophy Etexts
- Project Gutenberg
- Project Libellus (Classics)
- Spanish corpora
- SUSANNE Corpus
- UMich Humanities Text Initiative
- WWW-to-PAT Gateway: exploiting an SGML-aware system through the Web
Dictionaries, Lexica, and Lexical Resources
Indexes and General
- A Web of On-line Dictionaries
- CELEX - The Dutch Center for Lexical Information
- Electronically Available Dictionaries and Corpora
- E-LEX: Discussing Design of Electronic Dictionaries
- Language Dictionaries and Translators
- Language Representation Database
- Lexicography e-mail discussion list
- Lexicool.com, Directory of Translation Dictionaries
- Linguistic Bibliography, compiled by Koninklijke Bibliotheek - National Library of the Netherlands
- List of Dictionaries
- Online Language Dictionaries and Translators
- Special Interest Group on the Lexicon of the ACL
Collections
- CHILDES, Child Language Data Exchange System
- EDICTA: Early Dictionaries
- Lexica from CLR (Consortium for Lexical Research)
- Lexica available from the UMich Linguistics Archive
- The Moby lexicon project (word lists, part-of-speech, thesaurus, etc.)
- Pedro's Dictionaries (Pedro M. Coutinho)
- travlang's Translating Dictionaries (German, Dutch, French, Spanish, Danish, Portuguese, etc.)
- Wordlists via Oxford
Individual Resources
- Jeffrey's Japanese/English Dictionary Server.
- ARIES Natural Language Tools, a lexical platform for the Spanish language
- ARTFL Project Reference Collection: French, English, and South Asia dictionaries
- ARTFL Project: French Verb Conjugation.
- ARTFL Project: TLF Dictionary Form, Jean Nicot's Thresor de la langue françoyse (1606). Provided to ARTFL by Professor T.R. Wooldridge of the University of Toronto.
- ARTFL Project: Webster's Revised Unabridged Dictionary, 1913 Edition
- BioTech's Biotechnology Dictionary
- COBUILD English Dictionary
- COMLEX Syntax, a monolingual English Dictionary consisting of 38,000 head words intended for use in natural language processing
- Turkish-English dictionary
- CoreLex, systematic polysemy and underspecification
- EURODICAUTOM, a database of official and technical terms
- An 'English-Romanian Dictionary of Equivalent Proverbs' (second edition) from De Proverbio, University of Tasmania, Australia
- English-Urdu Dictionary
- English verb index from English Verb Classes and Alternations, by Beth Levin
Download file: evca.zip [28K]
- English Wordlists via SIL
- English wordlist with part-of-speech tags
Download file: keiras.zip [51K]
- Gamilaraay Dictionary (Australian indigenous language)
- The Kamusi Project / Internet Living Swahili Dictionary Project, Yale University, Martin Benjamin, General Editor
- LOGOS: Translations, Deja Vu, and Dictionary
- Old English resources
- Perseus Project, Greek and Latin lexica
- Roget's Thesaurus version 1.02. Provided by MICRA Inc and the Gutenberg Project
- Spanish wordlist, 90,000+ entries
Download file: span-lex.zip [261K]
- The Survey of English Usage based at University College London
- Thesaurus Linguae Latinae
- Thesaurus Linguae Graecae
- Visual Thesaurus displays interrelationships between words and meanings as spatial maps.
- Word Lists of English, Spanish, Basque, and French translated to Occitan
