Lecture 20: Data Repositories, Corpora, and Tools for Text Mining
Further Reading:
Behavior Research Methods: Volume 51, Issue 4, August 2019; Special Issue on Big Data. Guest Editors: Gary Lupyan and Rob Goldstone
Goldstone, R. L., & Lupyan, G. (2016). Discovering psychological principles by mining naturally occurring data sets. Topics in Cognitive Science, 8(3), 548-568.
Jones, M. N. (Ed.). (2016). Big data in cognitive science. Psychology Press.
Paxton, A., & Griffiths, T. L. (2017). Finding the traces of behavioral and cognitive processes in big data and naturally occurring datasets. Behavior Research Methods, 49(5), 1630-1638.
=======
General Sample of Archived Data Repositories:
- APA's Open Psychology Laboratory
- OSF Open Data Repository
- Databrary (For archived data in Human Development)
- Neurosynth (For meta-analysis of neuroimaging literature)
- Re3data (Registry of Research Data Repositories)
- Data.gov (data, tools, and resources for conducting research, developing web and mobile applications, and designing data visualizations)
- World Bank Open Data
- Academic Torrents
- Google Dataset Search Engine
- Generic databases/repositories: Zenodo, Figshare, Dryad, Pangaea.de, Mendeley Data, Datahub.io, Harvard Dataverse, data.opendatasoft.com (+10,000 open datasets).
Norms and Behavioral Databases:
- Semantic Priming Project
- English Lexicon Project
- University of South Florida Free Association Norms
- Buchanan's Listing of Psycholinguistic Databases
- WordBank (Stanford)
- Small World of Words Project
Linguistic/Experiential Corpora:
- Linguistic Data Consortium
- Brigham Young Corpus Mining Tools
- ACL Data Repository
- Web-as-Corpus Initiative (WaCky)
- Latest Dump from English Wikipedia
- Google Open Data: Wikipedia relations, YouTube feature vectors, ClueBase, etc.
- SUBTLEXus (Subtitle Corpus)
- Bergelson SEEDLingS data
- TalkBank
- CHILDES Corpora
- Human Speechome Project
Sample of Simple Text-Mining Tools:
- Suite of Automatic Linguistic Analysis Tools (Kris Kyle)
- Linguistic Inquiry and Word Count (LIWC) and Free Version
- Text Analysis, Crawling, and Interpretation Tool (TACIT)
- Coh-Metrix
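As a rough illustration of what dictionary-based text-mining tools of this kind compute, the sketch below counts the share of words in a text that fall into each category of a word list. The category names and word lists are made up for this example and are not taken from LIWC or any other tool listed above; real tools use much larger, validated dictionaries and more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical toy categories for illustration only (NOT the LIWC dictionary).
CATEGORIES = {
    "positive": {"good", "happy", "love"},
    "negative": {"bad", "sad", "hate"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_proportions(text, categories=CATEGORIES):
    """Return, for each category, the proportion of tokens that belong to it."""
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        for name, words in categories.items():
            if tok in words:
                counts[name] += 1
    total = len(tokens)
    # Guard against empty input so we never divide by zero.
    return {name: (counts[name] / total if total else 0.0) for name in categories}


if __name__ == "__main__":
    print(category_proportions("I love this good day but hate rain"))
```

Proportions rather than raw counts are used here so that texts of different lengths can be compared.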
Semantic Modeling Tools:
- Original LSA Web Interface
- spaCy (Python)
- Gensim
- Word2vec pretrained embeddings (many languages)
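The tools above all represent words as vectors and compare them by similarity. A minimal sketch of that idea, using only the Python standard library, is given below: it builds raw co-occurrence count vectors from a tiny made-up corpus and compares words by cosine similarity. This is a simplified count-based stand-in, not the algorithm used by LSA, Gensim, spaCy, or word2vec; the function names, the window size, and the example sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Toy corpus invented for this example.
    corpus = [
        "the cat drinks milk",
        "the dog drinks water",
        "the car needs fuel",
    ]
    vecs = cooccurrence_vectors(corpus)
    print("cat vs dog:", cosine(vecs["cat"], vecs["dog"]))
    print("cat vs car:", cosine(vecs["cat"], vecs["car"]))
```

In this toy corpus, "cat" and "dog" share more context words than "cat" and "car", so the first similarity comes out higher. Pretrained embeddings apply the same comparison to dense vectors learned from far larger corpora.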
Image Databases:
Networks/Misc: