
Text Mining Tools and Methods: Start Here

This guide contains resources for researching with text mining

Text mining overview

What is text mining?

Text mining is the practice of using computers to discover information in large amounts of structured and unstructured text. Unstructured text is data not formatted according to an encoding structure like HTML or XML, while structured text is organized into defined fields, as in CSV files or an SQL database.
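To make the distinction concrete, here is a minimal Python sketch. The sample record and sentence are invented for illustration and do not come from any real dataset. With structured data, a field can be looked up by name; with unstructured text, the same information has to be inferred with a pattern, which may over- or under-match.

```python
import csv
import io
import re

# Structured text: a hypothetical CSV record with named fields.
structured = io.StringIO("author,year,title\nSmith,2019,Mining the Archive\n")
rows = list(csv.DictReader(structured))
years = [int(row["year"]) for row in rows]  # the field is addressed by name

# Unstructured text: the same (made-up) information as a plain sentence.
unstructured = "In 2019 Smith published Mining the Archive, a study of archives."
# No field names exist, so a pattern guesses which tokens look like years.
candidate_years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", unstructured)]

print(years, candidate_years)
```

The regular expression here is only a rough heuristic for four-digit years; real projects typically rely on more robust extraction methods.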

Examples of data used for text mining include Twitter posts, journal and news articles, blog posts, and email.

Researchers use text mining for tasks such as:

  • frequency calculation
  • theme summary
  • sentiment analysis
  • entity extraction
  • document summarization

By using these methods, researchers can make connections and draw conclusions about the content of large text corpora. 
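As an illustration of the first task in the list, frequency calculation, the following is a minimal Python sketch using only the standard library. The two sample sentences, the stopword list, and the function names `tokenize` and `term_frequencies` are assumptions made for this example; they are not part of any tool listed in this guide.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; a real project would load many documents from files.
documents = [
    "Text mining helps researchers detect patterns in text.",
    "Researchers use text mining to study large text corpora.",
]

# A tiny, illustrative stopword list (common words to ignore).
STOPWORDS = {"in", "to", "use", "the", "a", "of"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_frequencies(docs, stopwords=STOPWORDS):
    """Count how often each non-stopword term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in stopwords)
    return counts

frequencies = term_frequencies(documents)
print(frequencies.most_common(3))
```

Term counts like these are often the starting point for the other tasks above, although sentiment analysis and entity extraction generally need dedicated libraries or services rather than simple counting.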

The image on the right is one example of what you can do with text mining. 

Text mining goals

Why do text mining?

Text mining helps researchers detect patterns and connections in large volumes of textual material.

The objective of text mining is to find previously unknown information: something that has yet to be understood and so could not yet have been written down. Text mining enables researchers to draw conclusions from volumes of material too large for them to read, synthesize, and incorporate into their scholarship otherwise.

Researchers in fields ranging from chemical sciences to the humanities have begun using text mining to detect patterns and discover unknown information. 

List of text mining software

On-line Text Mining / Text Analytics Tool

Commercial Text Mining / Text Analytics Software

  • ActivePoint, offering natural language processing and smart online catalogues, based on contextual search and ActivePoint's TX5(TM) Discovery Engine.
  • Aiaioo Labs, offering APIs for intention analysis, sentiment analysis and event analysis.
  • IBM SPSS Predictive Analytics suite for data and text mining.
  • IKANOW Infinit.e, all-in-one big data analytics solution for harvesting and analyzing both structured and unstructured data, including social media data from Twitter, Facebook, and Google+.
  • SAS Text Miner, provides a rich suite of text processing and analysis tools.
  • Semantex from Janya Inc., enterprise-class information extraction system, detecting entities, attributes, relationships and events.
  • Skyttle API, a SaaS platform for sentiment analysis and keyword extraction. Supports English, French, German and Russian. See online demo at www.skyttle.com/demoin.

Free and Open-Source Text Mining / Text Analytics Software

  • Aika, an open-source library for mining frequent patterns within text, using ideas from neural nets and grammar induction.
  • Coding Analysis Toolkit (CAT), free, open source, web-based text analysis tool.
  • Data Science Toolkit, includes geo, text, NLP, and sentiment analysis tools.
  • Datumbox, a free API with functions for sentiment analysis, language detection, and topic classification, intended for easily building intelligent apps.
  • FreeLing, an open source language analysis tool suite, GNU GPL.
  • GATE, a leading open-source toolkit for Text Mining, with a free open source framework (or SDK) and graphical development environment.

Ask Me

Kevin Walker
Contact:
Kevin W. Walker, Ph.D.
Head of Assessment & Govt. Information

The University of Alabama Libraries
Box 870266
Tuscaloosa, AL 35487-0266
205-348-1357
Subjects: Data Services