Teppei Nakano

Waseda University

H-index: 6

Asia-Japan

About Teppei Nakano

Teppei Nakano is a researcher at Waseda University with an h-index of 6 overall and 4 since 2020.

His recent articles reflect a diverse array of research interests and contributions to the field:

Deep Multi-stream Network for Video-based Calving Sign Detection

Voice or Content?—Exploring Impact of Speech Content on Age Estimation from Voice

Narrow Down Forecast Range: Using Knowledge of Past Operations and Attribute-Dependent Thresholding in Good Fishing Ground Prediction

Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle

Unsupervised Learning of a Dynamic Task Ordering Model for Crowdsourcing (クラウドソーシングにおける動的タスク発注モデルの教師なし学習)

Worker Filtering Criteria for Subjective Evaluation of Synthesized Voice Sound Quality Using Crowdsourcing

PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

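Worker Filtering Criteria for Subjective Evaluation of Synthesized Voice Sound Quality Using Crowdsourcing (クラウドソーシングを用いた合成音声の音質主観評価のためのワーカ選抜基準)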

Teppei Nakano Information

University

Waseda University

Citations(all)

176

Citations(since 2020)

93

Cited By

114

hIndex(all)

6

hIndex(since 2020)

4

i10Index(all)

4

i10Index(since 2020)

1

University Profile Page

Waseda University
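
The h-index and i10-index figures listed above follow standard definitions: the h-index is the largest h such that h papers each have at least h citations, and the i10-index is the number of papers with at least 10 citations. The following is a minimal sketch of both calculations. The citation counts in `example_counts` are invented for illustration and are not Teppei Nakano's actual per-paper record, which this page does not list.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-paper citation counts, invented for illustration only.
example_counts = [25, 18, 12, 10, 7, 6, 3, 1, 0]

print(h_index(example_counts))    # six papers have at least 6 citations each
print(i10_index(example_counts))  # four papers have at least 10 citations
```

A "since 2020" variant would apply the same functions to citation counts restricted to citations received from 2020 onward.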

Top articles of Teppei Nakano

Deep Multi-stream Network for Video-based Calving Sign Detection

Authors

Ryosuke Hyodo,Teppei Nakano,Tetsuji Ogawa

Journal

arXiv preprint arXiv:2302.08493

Published Date

2023/1/10

We have designed a deep multi-stream network for automatically detecting calving signs from video. Calving sign detection from a camera, which is a non-contact sensor, is expected to enable more efficient livestock management. As large-scale, well-developed data cannot generally be assumed when establishing calving detection systems, the basis for making the prediction needs to be presented to farmers during operation, so black-box modeling (also known as end-to-end modeling) is not appropriate. For practical operation of calving detection systems, the present study aims to incorporate expert knowledge into a deep neural network. To this end, we propose a multi-stream calving sign detection network in which multiple calving-related features are extracted from the corresponding feature extraction networks designed for each attribute with different characteristics, such as a cow's posture, rotation, and movement, known as calving signs, and are then integrated appropriately depending on the cow's situation. Experimental comparisons conducted using videos of 15 cows demonstrated that our multi-stream system yielded a significant improvement over the end-to-end system, and the multi-stream architecture significantly contributed to a reduction in detection errors. In addition, the distinctive mixture weights we observed helped provide interpretability of the system's behavior.

Voice or Content?—Exploring Impact of Speech Content on Age Estimation from Voice

Authors

Yuta Ide,Naohiro Tawara,Susumu Saito,Teppei Nakano,Tetsuji Ogawa

Published Date

2023/9/4

To investigate the impact of speech content on age estimation accuracy from voice data, we created a corpus of speech utterances featuring identical content spoken by individuals of varying ages. Subsequently, we analyzed the age estimation outcomes derived from this dataset. Previous studies have identified biases in age-labeled speech corpora regarding speaker age and vocabulary usage. Given that speech content typically varies with the speaker's age during conversations, it's plausible that age estimation results could be influenced by speech content. To address this concern, we developed a dataset in which speakers of different ages delivered speech content that was consistent across all speakers and tailored to the characteristics of each age group. We estimated the speakers' ages both manually, through crowdsourcing, and automatically and then conducted a significance test to assess whether …

Narrow Down Forecast Range: Using Knowledge of Past Operations and Attribute-Dependent Thresholding in Good Fishing Ground Prediction

Authors

Haruki Konii,Teppei Nakano,Yasumasa Miyazawa,Tetsuji Ogawa

Published Date

2023/6/5

This study attempts to substantially narrow down the predicted range of good fishing grounds. In the automatic prediction of good fishing grounds using meteorological and oceanographic information, there is a trade-off between reducing the number of missed detections of good fishing grounds and narrowing down the forecast range. For example, the forecast range will inevitably widen if the system tries to detect every good fishing ground, which has a negative impact on fishers’ operational decision-making. This study, therefore, attempts to introduce techniques to narrow down the forecast range, such as the use of past operational information and stricter thresholding for judgment, to the convolutional autoencoder-based good fishing ground prediction. Experimental comparisons conducted using actual catch information in bullet tuna trolling demonstrated the effectiveness of the proposed …

Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle

Authors

Ryosuke Hyodo,Susumu Saito,Teppei Nakano,Makoto Akabane,Ryoichi Kasuga,Tetsuji Ogawa

Journal

arXiv preprint arXiv:2301.03926

Published Date

2023/1/10

Through a user study in the field of livestock farming, we verify the effectiveness of an XAI framework for video surveillance systems. The systems can be made interpretable by incorporating experts' decision-making processes. AI systems are becoming increasingly common in real-world applications, especially in fields related to human decision-making, and its interpretability is necessary. However, there are still relatively few standard methods for assessing and addressing the interpretability of machine learning-based systems in real-world applications. In this study, we examine the framework of a video surveillance AI system that presents the reasoning behind predictions by incorporating experts' decision-making processes with rich domain knowledge of the notification target. While general black-box AI systems can only present final probability values, the proposed framework can present information relevant to experts' decisions, which is expected to be more helpful for their decision-making. In our case study, we designed a system for detecting signs of calving in cattle based on the proposed framework and evaluated the system through a user study (N=6) with people involved in livestock farming. A comparison with the black-box AI system revealed that many participants referred to the presented reasons for the prediction results, and five out of six participants selected the proposed system as the system they would like to use in the future. It became clear that we need to design a user interface that considers the reasons for the prediction results.

Unsupervised Learning of a Dynamic Task Ordering Model for Crowdsourcing (クラウドソーシングにおける動的タスク発注モデルの教師なし学習)

Authors

柳澤遼, 斎藤奨, 中野鐵兵, 小林哲則, 小川哲司

Journal

電子情報通信学会技術研究報告; 信学技報

Published Date

2022/6/27

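Abstract (translated from Japanese): We propose a method for unsupervised learning of a dynamic task ordering model that optimizes the number of orders according to the difficulty of the data, as a framework for efficiently ensuring the quality of crowdsourced annotation. Because responses collected through crowdsourcing contain errors, reliability is ensured by collecting responses from multiple workers for each sample and aggregating them, for example by majority voting. In general, the more workers take part in the majority vote, the higher the quality of the resulting label, but the ordering cost grows with the number of orders, so it is desirable to reduce the number of workers in the vote while keeping the accuracy of the final label high. We therefore focus on a dynamic task ordering model that keeps placing orders with workers until the variation in responses becomes sufficiently small, under the assumption that the smaller the variation among multiple workers' responses, the more reliable the majority vote, and we propose a method that learns, without supervision, the model parameters that minimize label errors and ordering cost. We verified the effectiveness of the proposed method on an annotation task for livestock monitoring images, achieving performance comparable to supervised learning …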

Worker Filtering Criteria for Subjective Evaluation of Synthesized Voice Sound Quality Using Crowdsourcing

Authors

Moe Yaegashi,Susumu Saito,Teppei Nakano,Tetsuji Ogawa

Journal

IEICE Technical Report; IEICE Tech. Rep.

Published Date

2022/6/10

(in English) We investigate the effect of filtering criteria of crowdworkers on the subjective evaluation results of synthesized voice using crowdsourcing. Currently, crowdsourcing has been used for subjective evaluation of synthesized voice. Although it is desirable to remove workers who do not satisfy the client's requirements, worker filtering criteria have not yet been defined. In this study, we focused on subjective evaluation of sound quality (amount of distortion) and examined filtering criteria. In the filtering test, the comparison task was designed so that attributes other than intonation and sound quality were identical in order to enable evaluation of the ability to distinguish differences in sound quality. In order for the worker to understand the difference in sound quality intuitively, we showed the workers the highly distorted voice several times repeatedly at the beginning of the evaluation. We conducted sound quality …

PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

Authors

Ryo Yanagisawa,Susumu Saito,Teppei Nakano,Tetsunori Kobayashi,Tetsuji Ogawa

Published Date

2022/12/17

Even after more than a decade of crowdsourcing research, we have no standard framework for low-cost quality assurance in crowdsourced data annotation. This paper proposes an unsupervised learning method for dynamic microtask posting which allows each microtask to adjust its own number of collected responses based on the data difficulty. Since crowdsourced data labels are likely to contain errors, researchers often employ majority voting that aggregates responses from multiple workers to calculate a final label. This technique, however, involves a trade-off between label accuracy and cost. This paper presents a dynamic microtask posting model that reduces the total number of collected responses while maintaining the labeling accuracy; we also aim to obtain the model with an “unsupervised” approach, which does not require training through experience of microtask posting for data labeled with …

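Worker Filtering Criteria for Subjective Evaluation of Synthesized Voice Sound Quality Using Crowdsourcing (クラウドソーシングを用いた合成音声の音質主観評価のためのワーカ選抜基準)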

Authors

八重樫萌絵, 斎藤奨, 中野鐵兵, 小川哲司

Journal

研究報告音声言語情報処理 (SLP)

Published Date

2022/6/10

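Abstract (translated from Japanese): We investigated how crowd worker selection criteria affect evaluation results in crowdsourced subjective evaluation of synthesized speech. Crowdsourcing is increasingly used for subjective evaluation of synthesized speech. Ideally, evaluations would be requested only from workers who meet the desired conditions, but no such worker selection criteria have been established for subjective evaluation of synthesized speech. In this study, we focused on the evaluation of sound quality (degree of distortion) and examined worker selection criteria for subjective evaluation of synthesized speech. In the qualification test, the comparison task was designed so that attributes other than intonation and sound quality were identical, making it possible to assess the ability to hear differences in sound quality. In addition, so that workers could intuitively understand differences in sound quality, heavily distorted speech was presented several times in succession at the start of the qualification test. We conducted a sound quality evaluation experiment on Amazon Mechanical Turk and investigated how the following selection criteria affect subjective evaluation results: i) whether the worker evaluates with attention to the amount of distortion (intent comprehension), ii) whether the responses are consistent (response consistency rate), and iii) whether the worker responds with confidence (response confidence). The results showed that measuring intent comprehension and response confidence is effective for worker selection, and that for this purpose it suffices to prepare a few samples that aid intent comprehension (here, speech with poor sound quality) and include them in the comparison task.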

Do You Know How Humans Sound? Exploring a Qualification Test Design for Crowdsourced Evaluation of Voice Synthesis Quality

Authors

Moe Yaegashi,Susumu Saito,Teppei Nakano,Tetsuji Ogawa

Published Date

2022/11/7

This paper explores the effect of crowd worker filtering criteria on the crowdsourced subjective evaluation of synthesized voice. Currently, crowdsourcing is being used for the subjective evaluation of synthesized voice. In this case, it is important to remove workers who do not satisfy the requester's requirements, but effective worker filtering criteria remain unexplored. In this study, we focused on sound quality evaluation and explored effective worker filtering criteria for subjective evaluation of synthesized voice. To filter workers who can evaluate sound quality, we designed a task that compares pairs of synthesized voices with different amounts of distortion and selects less distorted voices. In this task, some pairs included obviously distorted voices to measure the degree of attention to distortion in the evaluation of each worker. The following three criteria for worker filtering were defined: whether the evaluation focused …

Sequential fish catch counter using vision-based fish detection and tracking

Authors

Riko Tanaka,Teppei Nakano,Tetsuji Ogawa

Published Date

2022/2/21

An attempt has been made to develop a system for sequentially counting the number of fish caught using images taken on board. Fish catch counting for each local sea area contributes to fishery resource management and decision support for efficient operation. In this case, visual information is helpful for an intuitive explanation. The developed system consists of fish detection, fish tracking, and overdetected track deletion: to count fish robustly despite their movement around the deck, the fish detection stage attempts to absorb changes in the appearance of the fish, while the tracking stage deliberately avoids using appearance information to prevent the tracks from being unduly disconnected. Experimental comparisons using onboard video data of bullet tuna trolling demonstrated that the system could count fish with 89% precision and 87% recall.

Can Humans Correct Errors From System? Investigating Error Tendencies in Speaker Identification Using Crowdsourcing.

Authors

Yuta Ide,Susumu Saito,Teppei Nakano,Tetsuji Ogawa

Published Date

2022/9/18

An attempt was made to clarify the effectiveness of crowdsourcing on reducing errors in automatic speaker identification (ASID). It is possible to efficiently reduce errors by manually revalidating the unreliable results given by ASID systems. Ideally, errors should be corrected appropriately, and correct answers should not be miscorrected. In addition, a low false acceptance rate is desirable in authentication, but a high false rejection rate should be avoided from a usability viewpoint. It, however, is not certain that humans can achieve such an ideal SID, and in the case of crowdsourcing, the existence of malicious workers cannot be ignored. This study, therefore, investigates whether manual verification of error-prone inputs by crowd workers can reduce ASID errors and whether the resulting corrections are ideal. Experimental investigations on Amazon Mechanical Turk, in which 426 qualified workers identified 256 speech pairs from VoxCeleb data, demonstrated that crowdsourced verification can significantly reduce the number of false acceptances without increasing the number of false rejections compared to the results from the ASID system.

Inlier modeling-based good fishing ground detection for efficient bullet tuna trolling using meteorological and oceanographic information

Authors

Yuka Horiuchi,Teppei Nakano,Yasumasa Miyazawa,Tetsuji Ogawa

Published Date

2022/2/21

An attempt has been made to construct a system for detecting good fishing grounds using meteorological and oceanographic information. Monitoring fishing ground conditions is helpful for fishermen’s decision-making for efficient operations and fishery resource management. Since it is not realistic to monitor the ocean condition of the entire target area, an inlier modeling-based (also referred to as unsupervised) detector is constructed using only the good fishing ground data observed during the operation, and useful features for monitoring fishing ground conditions are also investigated. Experimental comparisons using four years of operation data of bullet tuna trolling demonstrated that the developed system detected good fishing grounds with a recall of about 99%.

Unsupervised Learning of a Dynamic Task Ordering Model for Crowdsourcing

Authors

Ryo Yanagisawa,Susumu Saito,Teppei Nakano,Tetsunori Kobayashi,Tetsuji Ogawa

Journal

IEICE Technical Report; IEICE Tech. Rep.

Published Date

2022/6/27

(in English) An unsupervised learning method for a dynamic task ordering model that optimizes the number of orders according to the difficulty of the data was proposed as a framework for efficiently ensuring annotation quality through crowdsourcing. Since responses collected by crowdsourcing contain errors, the responses were collected from multiple workers for each sample and then aggregated by majority voting to ensure reliability. However, since the monetary cost increases as the number of orders increases, it is desirable to reduce the number of workers who perform majority voting while maintaining the high accuracy of the final label. Therefore, we focus on a dynamic task ordering model that continues to place orders to workers until the variation in responses becomes sufficiently small, based on the assumption that the smaller the variation in responses by multiple workers, the more reliable the majority …
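
The stopping rule described in this abstract (keep ordering responses until their variation is small enough, then take the majority vote) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' model: the function name `dynamic_ordering`, the use of the majority share as the measure of variation, and the hand-set values of `min_votes`, `max_votes`, and `agreement` are all assumptions, whereas the paper learns its model parameters without supervision.

```python
from collections import Counter


def dynamic_ordering(ask_worker, min_votes=3, max_votes=9, agreement=0.8):
    """Collect responses one at a time until the majority share is high enough.

    ask_worker: zero-argument callable returning one worker's label
                (hypothetical stand-in for placing one order).
    agreement:  hypothetical threshold on the share of the most common label;
                a high share is used here as a proxy for low response variation.
    Returns (majority_label, number_of_responses_collected).
    """
    votes = []
    while len(votes) < max_votes:
        votes.append(ask_worker())
        if len(votes) >= min_votes:
            label, count = Counter(votes).most_common(1)[0]
            if count / len(votes) >= agreement:
                return label, len(votes)  # responses agree: stop early
    # Budget exhausted: fall back to a plain majority vote.
    label, _ = Counter(votes).most_common(1)[0]
    return label, len(votes)


# Easy sample: every (simulated) worker gives the same answer.
print(dynamic_ordering(lambda: "calving"))

# Hard sample: simulated workers disagree, so the full budget is spent.
answers = iter(["a", "b", "a", "b", "a", "b", "a", "b", "a"])
print(dynamic_ordering(lambda: next(answers)))
```

In this sketch the easy sample stops after the minimum number of responses, while the contested one consumes all nine, which is the cost-saving behaviour the abstract is after.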

Improving speaker identification performance by crowd-assisted verification of results

Authors

Yuta Ide,Susumu Saito,Teppei Nakano,Tetsuji Ogawa

Journal

情報処理学会研究報告 (Web)

Published Date

2021

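Design and Operation of Explainable Condition Monitoring Systems for Decision Support: The Example of Livestock Video Monitoring (意思決定支援のための説明可能な状態監視システムの設計・運用法 (家畜の映像監視を例に))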

Authors

小川哲司, 兵頭亮介, 斎藤奨, 中野鐵兵

Journal

IEICE Conferences Archives

Published Date

2021/2/23

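(Translated from Japanese) The main purpose of condition monitoring is to detect, at an early stage, states that "differ from normal" and require urgent decision-making, and to notify the party that requested the monitoring. For example, cattle breeding farmers make decisions about calving assistance and the timing of artificial insemination while monitoring their cattle. Because most such condition monitoring deals with highly specialized data, open data hardly exists, and annotation requires expert knowledge; both are barriers to automation with artificial intelligence technology. Moreover, for decision-making, a prediction system whose judgment process is a black box is hard to use. To solve these problems, we consider it promising to design the model so that experts' decision-making processes are incorporated into the condition monitoring model while it can be trained without relying solely on expert annotation. This paper presents a case in which the proposed approach was applied to condition monitoring of breeding cattle.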

Utterance Plan Optimization for a Museum Guide Using an Ising Machine (イジングマシンを用いたミュージアムガイドの発話計画最適化)

Authors

高津弘明, 安藤涼太, 中野鐵兵, 柏川貴弘, 木村浩一, 松山洋一

Published Date

2021

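(Translated from Japanese) We propose a museum guide system that gives efficient and coherent explanations of exhibits based on an utterance plan optimized for each user by an Ising machine. We formulated the utterance plan generation problem as a QUBO (Quadratic Unconstrained Binary Optimization) model that extracts the set of sentences maximizing the user's interest under constraints on discourse structure and total utterance time. To evaluate the proposed method, we built a text corpus annotated with discourse structure, user profiles, and interest levels for artworks and their commentary. Using this dataset, we confirmed that the Digital Annealer, a simulated-annealing-based Ising machine, can solve the QUBO model in practical time without constraint violations. We also conducted a subjective evaluation experiment and confirmed the effectiveness of explaining exhibits in a virtual museum based on scenarios generated by the proposed method.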
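
To make the QUBO idea in the abstract above concrete, here is a toy sketch of selecting sentences to maximize interest under a time budget. It is not the authors' formulation: the discourse structure constraints are omitted, the time limit is folded in as a quadratic penalty that rewards using exactly the budget (an inequality would need extra slack variables), the numbers are invented, and exhaustive search stands in for an Ising machine such as the Digital Annealer. All function and parameter names are hypothetical.

```python
from itertools import product


def build_qubo(interest, duration, budget, penalty):
    """Upper-triangular QUBO matrix for
    minimize  -sum(interest_i * x_i) + penalty * (sum(duration_i * x_i) - budget)^2
    with binary x_i (constant term penalty * budget^2 dropped)."""
    n = len(interest)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # x_i^2 == x_i for binary variables, so linear terms sit on the diagonal.
        q[i][i] = -interest[i] + penalty * (duration[i] ** 2 - 2 * budget * duration[i])
        for j in range(i + 1, n):
            q[i][j] = 2 * penalty * duration[i] * duration[j]
    return q


def energy(q, x):
    """QUBO objective value for the binary assignment x."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def solve_brute_force(q):
    """Try every binary assignment; only feasible for very small problems."""
    return min(product((0, 1), repeat=len(q)), key=lambda x: energy(q, x))


# Invented toy data: three candidate sentences.
interest = [3, 2, 4]   # hypothetical user interest per sentence
duration = [2, 1, 3]   # hypothetical speaking time per sentence
q = build_qubo(interest, duration, budget=3, penalty=5)
best = solve_brute_force(q)
print(best)  # 1 marks a selected sentence
```

With these toy values, the first two sentences together fill the budget and carry more total interest than the third alone, so they form the lowest-energy selection.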

Feature representation learning for calving detection of cows using video frames

Authors

Ryosuke Hyodo,Teppei Nakano,Tetsuji Ogawa

Published Date

2021/1/10

Data-driven feature extraction is examined to realize accurate and robust calving detection. Automatic calving sign detection systems can support farmers' decision making. In this paper, neural networks are designed to extract information relevant to calving signs, which can be observed from video frames, such as the frequency in pre-calving postures, statistics in movement, and statistics in rotation. Experimental comparisons using surveillance videos demonstrate that the proposed feature extraction methods contribute to reducing false positives and explaining the basis of the prediction compared to the end-to-end calving detection system.

Toward building a data-driven system for detecting mounting actions of black beef cattle

Authors

Yuriko Kawano,Susumu Saito,Teppei Nakano,Ikumi Kondo,Ryota Yamazaki,Hiromi Kusaka,Minoru Sakaguchi,Tetsuji Ogawa

Published Date

2021/1/10

This paper tackles building a pattern recognition system that detects whether a pair of Japanese Black cattle captured in a given image region is in a “mounting” action, which is a critically important sign for cattle farmers to detect before artificial insemination. The “mounting” action refers to an action in which one cow bends over another cow, usually when either cow is in estrus. Although a pattern recognition-based approach for detecting such an action would be appreciated as being low-cost and robust, it has not been discussed much due to the complexity of the system architecture, unavailability of datasets, etc. This study presents i) our image dataset construction technique that exploits both object detection algorithm and crowdsourcing for collecting cattle pair images with labels of either “mounting” or not; and ii) a system for detecting the mounting action from any given image of a cattle …

Crowdsourced verification for operating calving surveillance systems at an early stage

Authors

Yusuke Okimoto,Soshi Kawata,Susumu Saito,Teppei Nakano,Tetsuji Ogawa

Published Date

2021/1/10

This study attempts to use crowdsourcing to facilitate the operation of pattern-recognition-based video surveillance systems at an early stage. Target events (i.e. events to be detected during surveillance) are not frequently observed in recorded video, so achieving reliable surveillance on the basis of machine learning requires a sufficient amount of target data. Acquiring sufficient data is time-consuming. However, operating unreliable surveillance systems can induce many false alarms. Crowdsourcing is introduced to address this problem by verifying the unreliable results in data-driven surveillance. Experimental simulation conducted using monitoring video of Japanese black beef cattle demonstrates that crowdsourced verification successfully reduced false alarms in calving detection systems.

VocalTurk: Exploring feasibility of crowdsourced speaker identification

Authors

Susumu Saito,Yuta Ide,Teppei Nakano,Tetsuji Ogawa

Published Date

2021

This paper presents VocalTurk, a feasibility study of crowdsourced speaker identification based on our worker dataset collected in Amazon Mechanical Turk. Crowdsourced data labeling is already widely accepted in speech data processing, but empirical analyses that answer common questions such as “how accurate are workers capable of labeling speech data?” and “what does a good speech-labeling microtask interface look like?” remain underexplored, which would limit the quality and scale of the dataset collection. Focusing on the speaker identification task in particular, we thus conducted two studies in Amazon Mechanical Turk: i) hired 3,800+ unique workers to test their performance and confidence in giving answers to voice pair comparison tasks, and ii) additionally assigned more-difficult tasks of 1-vs-N voice set comparisons to 350+ top-scoring workers to test their accuracy-speed performance across patterns of N={1, 3, 5}. The results revealed some positive findings that would motivate speech researchers toward crowdsourced data labeling: the top-scoring workers were capable of labeling our voice comparison pairs with 99% accuracy after majority voting, and they were even capable of batch-labeling, which shortened their completion time by up to 34% with no statistically significant degradation in accuracy.

Teppei Nakano FAQs

What is Teppei Nakano's h-index at Waseda University?

Teppei Nakano's h-index is 6 overall and 4 since 2020.

What are Teppei Nakano's top articles?

Teppei Nakano's top articles at Waseda University include:

Deep Multi-stream Network for Video-based Calving Sign Detection

Voice or Content?—Exploring Impact of Speech Content on Age Estimation from Voice

Narrow Down Forecast Range: Using Knowledge of Past Operations and Attribute-Dependent Thresholding in Good Fishing Ground Prediction

Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle

Unsupervised Learning of a Dynamic Task Ordering Model for Crowdsourcing (クラウドソーシングにおける動的タスク発注モデルの教師なし学習)

Worker Filtering Criteria for Subjective Evaluation of Synthesized Voice Sound Quality Using Crowdsourcing

PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

Worker Filtering Criteria for Subjective Evaluation of Synthesized Voice Sound Quality Using Crowdsourcing (クラウドソーシングを用いた合成音声の音質主観評価のためのワーカ選抜基準)

...

What is Teppei Nakano's total number of citations?

Teppei Nakano has 176 citations in total.
