HTML-TM - About

HTML-TM (HyperText Markup Language for Text Mining) is a tool for exploring and analyzing large text datasets. It provides an interactive interface for navigating the words and documents of a corpus and the relationships between them. The key components and functions of the tool are described below.

Key Components of HTML-TM

1. WORDS.html

The WORDS.html file serves as the primary entry point for exploring words within the dataset. It contains a structured table with the following columns:

2. TEXTS.html

The TEXTS.html file is designed for exploring documents within the corpus. It includes a table with the following columns:

Related Documents Page

Clicking the "Related Doc." link in either WORDS.html or TEXTS.html opens a Related Documents page. This page contains a table with the following columns:

Search Functionality

HTML-TM provides a table search tool that allows users to filter rows using complex queries, including logical operators (AND/OR), column-specific searches, and regular expressions. Detailed information can be accessed via the Help button located next to the search button on the WORDS.html and TEXTS.html pages.
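To illustrate how a row filter of this kind can combine logical operators, column-specific terms, and regular expressions, here is a minimal Python sketch. It is not HTML-TM's implementation, and the query syntax is an assumption made for the example: terms are joined with " AND " and " OR ", and a term of the form column:pattern restricts the match to one column. The column names "word" and "docs" and the functions parse_query, term_matches, and filter_rows are likewise hypothetical. The Help button remains the reference for the syntax the tool actually accepts.

```python
import re


def parse_query(query):
    """Parse a query into OR-groups, each a list of (column, regex) AND-terms.

    Hypothetical syntax: "word:^prot AND docs:^1 OR gene".
    AND binds tighter than OR. A pattern that itself contains ':'
    is not supported by this simple parser.
    """
    groups = []
    for or_part in query.split(" OR "):
        terms = []
        for term in or_part.split(" AND "):
            term = term.strip()
            column, sep, pattern = term.partition(":")
            if not sep:
                # No "column:" prefix, so the term applies to every column.
                column, pattern = None, term
            terms.append((column, re.compile(pattern, re.IGNORECASE)))
        groups.append(terms)
    return groups


def term_matches(row, column, regex):
    """Return True if the regex matches the given column (or any column if None)."""
    if column is None:
        return any(regex.search(str(value)) for value in row.values())
    return column in row and regex.search(str(row[column])) is not None


def filter_rows(rows, query):
    """Keep rows that satisfy at least one OR-group, all of whose terms match."""
    groups = parse_query(query)
    return [
        row
        for row in rows
        if any(all(term_matches(row, col, rx) for col, rx in group) for group in groups)
    ]


# Hypothetical table rows; real HTML-TM tables have their own columns.
rows = [
    {"word": "protein", "docs": 120},
    {"word": "proteome", "docs": 45},
    {"word": "gene", "docs": 300},
]

and_result = filter_rows(rows, "word:^prot AND docs:^1")
or_result = filter_rows(rows, "word:^gene$ OR word:ome$")
any_column_result = filter_rows(rows, "300")
```

In this sketch a query is evaluated as a disjunction of conjunctions, so "a AND b OR c" keeps rows that match both a and b, or that match c.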

Authors and Institution

The HTML-TM tool is developed by:

Affiliations:

  1. Laboratory of Artificial Intelligence Applied to Bioinformatics, Federal University of Paraná, Curitiba, PR, Brazil
  2. Graduate Program in Bioinformatics, Federal University of Paraná, Curitiba, PR, Brazil