NDL Ngram Viewer (English)

https://lab.ndl.go.jp/ngramviewer/

Overview

NDL Ngram Viewer enables the visualization and enumeration of the frequency of occurrence of query words and phrases ("keywords") by publication date from OCR-generated text data.

The vertical axis of the visualization graph can be switched between two different axes:

  • frequency of occurrence, which indicates how many times the keyword appears in each publication year;
  • occurrence ratio, which indicates the frequency of occurrence divided by the total number of ngrams in that publication year.
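
The relationship between the two axes can be sketched in a few lines of Python. This is only an illustration of the definition above: the function name `occurrence_ratio` and all counts are hypothetical and are not taken from the service or its datasets.

```python
def occurrence_ratio(frequency_by_year, total_ngrams_by_year):
    """Return {year: frequency / total ngrams in that year}."""
    ratios = {}
    for year, frequency in frequency_by_year.items():
        total = total_ngrams_by_year.get(year, 0)
        # Skip years with no recorded ngrams to avoid dividing by zero.
        if total > 0:
            ratios[year] = frequency / total
    return ratios


# Made-up counts, for illustration only.
frequency = {1930: 50, 1935: 120}
totals = {1930: 1_000_000, 1935: 2_000_000}

ratios = occurrence_ratio(frequency, totals)
for year in sorted(ratios):
    print(year, ratios[year])
```

In this made-up example the keyword appears more than twice as often in 1935 as in 1930, but its ratio rises only slightly because the total number of ngrams also doubled. This is why the ratio view is useful for comparing years with very different amounts of material.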

As of January 2023, the search targets are approximately 1.7 billion keywords derived from the OCR text data of approximately 970,000 books and 1.32 million periodicals from the digitized materials provided in the NDL Digital Collections.

NDL Ngram Viewer was newly developed by the R&D Office at the NDL as an experimental service utilizing full-text data, which is the product of the OCR text conversion project in FY2021.

The datasets used by this service are available through the following link:

(Reference)

Number of books and periodicals by year of publication

The following graph shows the number of books and periodicals by decade. It excludes approximately 14,000 items for which the year of publication is unknown.

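Image of a bar graph showing the number of materials, with publication year in 10-year intervals on the horizontal axis (provisional)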

For more detailed information on the number of materials by publication year, see the "Books" and "Periodicals" columns in "Details of materials to be converted to OCR text" (Excel file, 13 KB) on the "OCR text conversion of digitized materials in FY2021" page.

How to search

Two types of queries are supported.

N.B. "Multiple keyword queries" and "regular expression queries" cannot be used together.

1. Multiple keyword query

By separating multiple keywords with slashes (/), their frequency of occurrence can be simultaneously visualized in order to compare them.
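
As a rough illustration of how such a query breaks down into individual keywords, here is a minimal Python sketch. The helper `split_keywords` is hypothetical, and trimming whitespace and ignoring empty segments are assumptions; the service may treat those cases differently.

```python
def split_keywords(query):
    """Split a slash-separated query into individual keywords."""
    # Assumption: surrounding whitespace and empty segments are ignored.
    return [part.strip() for part in query.split("/") if part.strip()]


keywords = split_keywords("登別温泉/有馬温泉/草津温泉")
print(keywords)
```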

2. Regular expression query

Search using regular expressions is available. Keywords matching the regular expression are enumerated and displayed in order of total frequency.

Three-character strings with any single character between "平" and "盛," such as "平重盛" or "平清盛," are listed as search results.

This query returns as search results strings containing any two characters (except hiragana and katakana) before "温泉," such as "登別温泉" or "有馬温泉."

This returns strings containing "春の海ひねもす" followed by any number of repetitions of "のたり" (including none), such as "春の海ひねもす," "春の海ひねもすのたり," "春の海ひねもすのたりのたり," etc.
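
The three descriptions above can be tried out locally with Python's `re` module. The pattern `[^あ-んア-ン]{2,2}温泉` is the one shown in the search results list on this page. The other two patterns, `平.盛` and `春の海ひねもす(のたり)*`, are assumed reconstructions from the descriptions, since the queries themselves are not shown here. The sketch also assumes that a keyword must match the pattern as a whole, and the service's regular expression dialect may differ from Python's.

```python
import re

# Pattern that appears in the search results list on this page.
ONSEN = re.compile(r"[^あ-んア-ン]{2,2}温泉")
# Assumed reconstructions of the other two example queries.
TAIRA = re.compile(r"平.盛")
HARU = re.compile(r"春の海ひねもす(のたり)*")


def matches(pattern, keyword):
    """Return True when the whole keyword matches the pattern."""
    return pattern.fullmatch(keyword) is not None


examples = [
    (TAIRA, "平重盛"),
    (TAIRA, "平清盛"),
    (ONSEN, "登別温泉"),
    (ONSEN, "有馬温泉"),
    (HARU, "春の海ひねもす"),
    (HARU, "春の海ひねもすのたりのたり"),
]
for pattern, keyword in examples:
    print(pattern.pattern, keyword, matches(pattern, keyword))
```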

Functionalities of visualization graph

Image of visualization graph

1. Switching between frequency and ratio of occurrence

By clicking the switch at the top of the graph, you can change the vertical axis of the visualization graph between "frequency of occurrence," which represents the number of occurrences per publication year, and "occurrence ratio," which represents the frequency of occurrence divided by the total number of ngrams per publication year.

2. Adjustment of the number of displayed items

By sliding the "Number of visualizations" bar at the top of the graph, you can choose how many of the keywords matching your query are plotted, from the top 1 to the top 10 by total frequency (5 by default).
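
The selection rule described here (rank by total frequency, then keep the top 1 to 10) can be sketched as follows. The function `top_keywords` and the counts are hypothetical, and clamping out-of-range values to the 1 to 10 range is an assumption made for the sketch.

```python
def top_keywords(total_frequency, n=5):
    """Return the n keywords with the highest total frequency."""
    # Assumption: values outside 1..10 are clamped to that range.
    n = max(1, min(10, n))
    ranked = sorted(total_frequency.items(), key=lambda item: item[1], reverse=True)
    return [keyword for keyword, _ in ranked[:n]]


# Made-up totals, for illustration only.
totals = {"登別温泉": 300, "有馬温泉": 450, "草津温泉": 500}
print(top_keywords(totals, n=2))
```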

3. Deletion of individual sparklines

Clicking a legend entry hides the corresponding sparkline. The display range of the graph is then automatically adjusted to fit the remaining sparklines.

The following example shows a graph in which the legend for "Kusatsu Onsen" is clicked and hidden.

a graph in which the legend for Kusatsu Onsen is clicked and hidden.

4. Search from data points

By clicking on a data point on the graph, you can perform a keyword search on our full-text search services for that publication date.

If you select "図書・雑誌 (Books & Periodicals)," "図書のみ (Books only)" or "雑誌のみ (Periodicals only)," you will be linked to search results in the NDL Digital Collections, including materials for which the copyright protection period has not yet expired.

If you select "著作権保護期間満了図書のみ (Only books whose copyright protection period has expired)," you will be directed to the search results in the Next Digital Library.

Shown below are a screenshot taken when clicking the 1935 data point on the "登別温泉" (Noboribetsu Onsen) sparkline, and the resulting search results in the NDL Digital Collections.

Image of narrowing the search to Noboribetsu Onsen in 1935.

Image of the result of narrowing the search to Noboribetsu Onsen in 1935.

Features in search results list

The top 10,000 hit keywords are displayed at the bottom of the page in order of total frequency.

List of keywords matching [^あ-んア-ン]{2,2}温泉

On the right of each hit keyword, a link to the search results for the query keyword in the NDL's full-text search services is displayed.

If you select "図書・雑誌 (Books & Periodicals)," "図書のみ (Books only)" or "雑誌のみ (Periodicals only)," you will be linked to search results in the NDL Digital Collections, including materials for which the copyright protection period has not yet expired.

If you select "著作権保護期間満了図書のみ (Only books whose copyright protection period has expired)," you will be directed to the search results in the Next Digital Library.

2. Downloading search results

By clicking the "Download Results" button, you can perform a bulk download of the search results. The format is tab-separated text. From left to right, the columns are "Keyword" and "Total_Frequency," followed by one frequency column per publication year, each named after that year.
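
A downloaded file with this layout could be read with Python's standard `csv` module, as in the minimal sketch below. The function `parse_results` and the sample rows are hypothetical. The sketch also assumes that the first line is a header row, that frequencies are integers, and that an empty cell means zero; none of this is confirmed by the description above.

```python
import csv
import io


def parse_results(tsv_text):
    """Parse tab-separated results into a list of dicts.

    Each row becomes {"keyword": str, "total": int, "by_year": {year: int}}.
    """
    # QUOTE_NONE keeps any quote characters inside keywords as-is.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    years = header[2:]  # columns after "Keyword" and "Total_Frequency"
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        # Assumption: an empty cell means the keyword did not occur that year.
        by_year = {year: int(value or 0) for year, value in zip(years, record[2:])}
        rows.append({"keyword": record[0], "total": int(record[1]), "by_year": by_year})
    return rows


# Made-up sample in the described layout, for illustration only.
sample = (
    "Keyword\tTotal_Frequency\t1934\t1935\n"
    "登別温泉\t30\t10\t20\n"
    "有馬温泉\t12\t5\t7\n"
)
rows = parse_results(sample)
for row in rows:
    print(row["keyword"], row["total"], row["by_year"])
```

For a real download, the file contents would be read from disk (for example with `open(path, encoding="utf-8").read()`) and passed to the same function. The UTF-8 encoding is likewise an assumption.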

Terms of Use

NDL Ngram Viewer is subject to the terms and conditions of the Creative Commons Attribution 4.0 International License (CC BY). For details on use, please refer to the Terms of Use of the National Diet Library's website.