NDL Ngram Viewer (English)
■ https://lab.ndl.go.jp/ngramviewer/
Overview
NDL Ngram Viewer enables the visualization and enumeration of the frequency of occurrence of query words and phrases ("keywords") by publication date from OCR-generated text data.
The vertical axis of the visualization graph can be switched between two different axes:
- frequency of occurrence, which indicates how many times the keyword appears in each publication year;
- occurrence ratio, which indicates the frequency of occurrence divided by the total number of ngrams per publication year.
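As a rough illustration of how the two axes relate, the following Python sketch derives the occurrence ratio from per-year counts. The numbers are invented for the example, and `occurrence_ratio` is a hypothetical helper, not part of the service.

```python
def occurrence_ratio(frequency_by_year, total_ngrams_by_year):
    """Divide each year's keyword frequency by that year's total ngram count."""
    ratios = {}
    for year, freq in frequency_by_year.items():
        total = total_ngrams_by_year.get(year, 0)
        # Skip years with no recorded ngrams to avoid division by zero.
        if total > 0:
            ratios[year] = freq / total
    return ratios


# Hypothetical counts, for illustration only.
frequency = {1930: 12, 1931: 30}
totals = {1930: 4_000_000, 1931: 5_000_000}
ratios = occurrence_ratio(frequency, totals)
```

Because the number of digitized materials differs greatly from year to year, the ratio is usually the more suitable axis for comparing usage across periods.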
As of January 2023, the search targets are approximately 1.7 billion keywords derived from the OCR text data of approximately 970,000 books and 1.32 million periodicals from the digitized materials provided in the NDL Digital Collections.
NDL Ngram Viewer was newly developed by the R&D Office at the NDL as an experimental service utilizing full-text data, which is the product of the OCR text conversion project in FY2021.
The datasets used by this service are available through the following link:
- NDL Ngram Data (https://github.com/ndl-lab/ndlngramdata)
(Reference)
- 青池亨. E2533 - NDL Ngram Viewerの公開:全文テキストデータ可視化サービス カレントアウェアネス-E, No.442, 国立国会図書館, 2022-09-01. (in Japanese)
- 国立国会図書館電子情報部電子情報企画課次世代システム開発研究室. 蔵書の新たな探索方法を創る ―NDLのOCRテキスト化―. 国立国会図書館月報. 2022, (739), pp.15-19. (in Japanese)
Number of books and periodicals by year of publication
The following graph shows the number of books and periodicals by decade. It excludes approximately 14,000 items for which the year of publication is unknown.
For more detailed information on the number of materials by publication year, see the "Books" and "Periodicals" columns in "Details of materials to be converted to OCR text" (Excel file, 13 KB) on the "OCR text conversion of digitized materials in FY2021" page.
How to search
Two types of queries are supported.
N.B. "Multiple keyword queries" and "regular expression queries" cannot be used together.
1. Multiple keyword query
By separating multiple keywords with slashes (/), their frequency of occurrence can be simultaneously visualized in order to compare them.
Example 1: "モダンボーイ/モダンガール"
Example 2: "見れる/見られる"
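The following Python sketch shows one way such a query could be split into individual keywords. How the service itself parses the query is not documented here, so `split_keywords` and its handling of empty parts are assumptions made for illustration.

```python
def split_keywords(query):
    """Split a multiple keyword query on "/" and drop empty parts."""
    # Dropping empty parts (e.g. from a trailing slash) is an assumption,
    # not documented behavior of the service.
    return [part for part in query.split("/") if part]


keywords = split_keywords("モダンボーイ/モダンガール")
```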
2. Regular expression query
Search using regular expressions is available. Keywords matching the regular expression are enumerated and displayed in order of total frequency.
- Example 1: "平.盛"
This query lists three-character strings with any single character between "平" and "盛", such as "平重盛" or "平清盛."
- Example 2: "[^あ-んア-ン]{2,2}温泉"
This query lists strings in which "温泉" is preceded by exactly two characters that are neither hiragana nor katakana, such as "登別温泉" or "有馬温泉."
- Example 3: "春の海ひねもす(のたり)*"
This query lists strings consisting of "春の海ひねもす" followed by zero or more repetitions of "のたり", such as "春の海ひねもす," "春の海ひねもすのたり," "春の海ひねもすのたりのたり," etc.
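The three example patterns can be tried locally with Python's `re` module, as sketched below. This assumes that a pattern must match the whole keyword (hence `fullmatch`) and that the service's regular expression dialect agrees with Python's for these patterns; neither is confirmed by this page. The candidate list is made up for the example.

```python
import re

# Hypothetical candidate keywords, for illustration only.
candidates = [
    "平清盛", "平重盛", "平盛",
    "登別温泉", "有馬温泉", "かみ温泉",
    "春の海ひねもす", "春の海ひねもすのたりのたり", "春の海ひねもすのた",
]

patterns = ["平.盛", "[^あ-んア-ン]{2,2}温泉", "春の海ひねもす(のたり)*"]


def matching_keywords(pattern, keywords):
    """Return the keywords that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [k for k in keywords if compiled.fullmatch(k)]


results = {p: matching_keywords(p, candidates) for p in patterns}
for pattern, hits in results.items():
    print(pattern, hits)
```

Note that "平盛" and "かみ温泉" are rejected: the first lacks the middle character required by ".", and the second has hiragana in the two positions before "温泉".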
Functionalities of visualization graph
1. Switching between frequency and ratio of occurrence
By clicking the switch at the top of the graph, you can change the vertical axis of the visualization graph between "frequency of occurrence," which represents the number of occurrences per publication year, and "occurrence ratio," which represents the frequency of occurrence divided by the total number of ngrams per publication year.
2. Adjustment of the number of displayed items
By sliding the "Number of visualizations" bar at the top of the graph, you can set how many of the keywords matching your query are plotted, from 1 to 10, taken in descending order of total frequency (5 by default).
3. Deletion of individual sparklines
Clicking an item in the legend hides its sparkline. The display range of the graph is then automatically adjusted to fit the remaining sparklines.
The following example shows a graph in which the legend for "Kusatsu Onsen" is clicked and hidden.
4. Search from data points
By clicking on a data point on the graph, you can perform a keyword search on our full-text search services for that publication date.
If you select "図書・雑誌 (Books & Periodicals)," "図書のみ (Books only)" or "雑誌のみ (Periodicals only)," you will be linked to search results in the NDL Digital Collections, including materials for which the copyright protection period has not yet expired.
If you select "著作権保護期間満了図書のみ (Only books whose copyright protection period has expired)," you will be directed to the search results in the Next Digital Library.
Shown below is a screenshot taken when the 1935 data point on the "登別温泉" sparkline is clicked, together with the corresponding search results in the NDL Digital Collections.
Features in search results list
The top 10,000 hit keywords are displayed at the bottom of the page in order of total frequency.
1. Hyperlinks to NDL Digital Collections or Next Digital Library
On the right of each hit keyword, a link to the search results for the query keyword in the NDL's full-text search services is displayed.
If you select "図書・雑誌 (Books & Periodicals)," "図書のみ (Books only)" or "雑誌のみ (Periodicals only)," you will be linked to search results in the NDL Digital Collections, including materials for which the copyright protection period has not yet expired.
If you select "著作権保護期間満了図書のみ (Only books whose copyright protection period has expired)," you will be directed to the search results in the Next Digital Library.
2. Downloading search results
By clicking the "Download Results" button, you can perform a bulk download of the search results. The format is tab-separated text. The columns are, from left to right, "Keyword" and "Total_Frequency", followed by one column of frequencies per publication year, with the year as the column name.
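A downloaded file in this layout could be read with Python's standard `csv` module, as in the sketch below. The sample rows are invented, and the code assumes that year column names are plain integers and that an empty cell means zero occurrences; check both against an actual download.

```python
import csv
import io

# Hypothetical sample in the described layout, for illustration only.
sample_tsv = (
    "Keyword\tTotal_Frequency\t1934\t1935\t1936\n"
    "登別温泉\t60\t10\t30\t20\n"
    "有馬温泉\t45\t15\t20\t10\n"
)


def read_results(text):
    """Parse tab-separated results into (keyword, total, {year: count}) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        keyword = row.pop("Keyword")
        total = int(row.pop("Total_Frequency"))
        # Remaining columns are assumed to be publication years.
        by_year = {int(year): int(count) if count else 0
                   for year, count in row.items()}
        rows.append((keyword, total, by_year))
    return rows


rows = read_results(sample_tsv)
```

For a real file, the same function can be applied to the file contents, for example `read_results(open(path, encoding="utf-8").read())`, where `path` and the UTF-8 encoding are likewise assumptions.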
Terms of Use
NDL Ngram Viewer is subject to the terms and conditions of the Creative Commons Attribution 4.0 International License (CC BY 4.0). For details on use, please refer to the Terms of Use of the National Diet Library's website.