Interview with Wenpeng Yin (CCG 2017-2019)

Dr. Wenpeng Yin joined the Cognitive Computation Group as a postdoctoral researcher in 2017. 

Dr. Wenpeng Yin

He is currently a tenure-track Assistant Professor at Penn State University, where he heads the AI4Research lab. In between, he served as an Assistant Professor at Temple University (2022) and a Senior Research Scientist at Salesforce (2019-2021). He received his Ph.D. from the University of Munich, Germany, in 2017. 

Wenpeng’s research interests span AI for Research, Human-Centered AI, large language models, NLP, computer vision, and general machine learning algorithms. He has served as a Senior Area Chair for NAACL 2021, ACL Rolling Review, IJCNLP-AACL 2023, LREC-COLING 2024, and EACL 2024. 

Hi, Wenpeng! Great to hear from you. Please tell me where you’re living and what you’re doing these days.

I live in Berwyn, PA, conveniently close to King of Prussia (KOP), maintaining a balanced lifestyle that blends my dedication to research with the joy of tidying my yard and preparing it to embrace the warm spring.

Nice! What kinds of plants grow in your yard?

I want to grow watermelons (we harvested three big watermelons with yellow flesh last summer), tomatoes, and cucumbers in the backyard, and some tulips in the front yard…but I just found deer have eaten all the new leaves of the tulips.

Aw, I’m sorry to hear about the tulips. Hope they recover, and wishing you all the best with your garden!

What are the most rewarding things about your current work?

Two dimensions: i) as a supervisor, I take immense pride in watching PhD students who begin their journey in NLP research grow into independent contributors to research projects and champions in disseminating our findings; ii) recognizing the impact of our work on industry, evidenced by companies reaching out for commercial applications or collaborations, underscores our tangible contribution to both industry and society at large.

The most surprising?

NLP research at Penn State only started in 2016, when Prof. Rebecca Jane Passonneau joined, and Penn State does not even have an undergraduate-level NLP course; that is exactly the course I recently proposed.

I wish you all the best with putting that course together! 
How connected is your work now with the work you did in our group?

I’ve been extending my recent work, building on my projects in CCG. Initially, my focus in CCG was on textual entailment, which plays a pivotal role as indirect supervision for various NLP tasks. One prominent thread in my recent research (arXiv 2024, CoNLL’22, TACL’22, ACL’23 Tutorial) is a natural extension of this earlier work. Additionally, my involvement in the LORELEI project in CCG, which centered on low-resource language translation, further enriched my research portfolio, including our work on machine translation evaluation (ICLR’24) and one of my main research directions, “Human-Centered AI”.

What new development(s) in the field of NLP are you excited about right now?

Yeah, NLP has come a long way, especially with these large language models (LLMs) making waves. But what really gets me pumped are these four things:

i) “NLP for Other Disciplines”: Some folks thought NLP research was done for when LLMs came onto the scene, rocking super high performance on tasks we’ve been wrestling with for ages. Surprise, surprise: it turns out that now everyone thinks NLP is the bee’s knees. It’s like this golden era where all sorts of disciplines are jumping on the NLP train, not just regular folks but also researchers from other fields who are using it to automate their research game. NLP has never been in the spotlight like this before.

ii) “NLP with Cross-Modalities”: NLP is now effortlessly integrated with various modalities. We’ve discovered a way to seamlessly blend knowledge across different modes, allowing information to flow smoothly between them. This was hard to fathom just a couple of years ago.

iii) “LLM+Agents”: LLM+agent combos are shaping up to be the next big thing. Even though universal LLMs are hogging the limelight, it turns out we still need specialized systems for specific domains.

iv) “Open Source”: Open source is the rockstar of NLP research, making things zoom ahead and keeping everything out in the open. It’s the norm now, making research faster and more transparent.

Thoughts about the state of AI?

Let me first look at the positive side: the behavior of AI systems today is nothing short of mind-blowing compared to just a couple of years ago. We’re witnessing an influx of potential applications that were once beyond imagination, opening up exciting possibilities.

Now, the downsides: i) Inequality is widening across the globe. Different fields benefit from AI unevenly, and people in different geographical areas have unequal access to the latest AI products and infrastructure. The dominance of top AI products by a handful of companies in a select few countries contributes to this disparity. ii) Security concerns are intensifying, with issues like forged images and videos becoming more prevalent. While AI systems often showcase unprecedented performance, researchers still grapple with understanding their inner workings; the interpretability and control of AI systems remain challenging, leaving room for potential misuse. iii) In academia, there is a heavy focus on studying large language models (LLMs) and constructing benchmarks to evaluate their performance. Unfortunately, much of this research is dominated by data- and computation-intensive LLMs, which limits the resources available for delving into the true nature of intelligence.

How are things outside of work?

We’re managing quite well. Our days are primarily occupied with shuttling the kids to a variety of clubs—soccer, dance, piano, gymnastics, and more. Surprisingly, weekends prove to be even busier than weekdays. Fortunately, our proximity to Philadelphia adds a delightful dimension to our lives, offering a diverse array of places like parks, museums, and various activities to explore regularly.

Excellent!  I was glad to hear when you returned to the area, and it’s nice knowing that you’re still nearby.  Do you have a memory to share from your time with the group?

Honestly, one memory that sticks with me from my time at CCG is when my daughter was diagnosed with a brain tumor just five months after joining. It meant spending practically every day at CHOP for about six months. It was a tough period, but what made it bearable was the incredible support from my CCG colleagues. I’m really grateful for their kindness and understanding, especially Dan, who was so flexible with my work during that challenging time. Those experiences, along with the group activities Dan organized, have had a big impact on how I now manage my own group at PSU.

I remember that time well.  I’m glad you felt so supported, and that you’re passing that on!

Please tell me about something you’ve read recently that you would recommend.

My wife and I recently embarked on a shared literary adventure, immersing ourselves in the book “A Woman Makes a Plan: Advice for a Lifetime of Adventure, Beauty, and Success” by Maye Musk. While the computer community recognizes Elon Musk for his groundbreaking ventures like OpenAI, Tesla, PayPal, and SpaceX, we were intrigued by the legendary status of his mother, Maye Musk. Her story fascinated us, and we eagerly sought wisdom from her book, uncovering the depth of her legendary status as a woman and her insights into parenting and educating children. Maye Musk’s experiences and insights transcend borders, proving that legends can be forged regardless of one’s country of origin, gender, or age.

Any advice for the current students and postdocs in the group?

It’s a bit tricky to say whether the current students and postdocs are in the best era (thanks to cool stuff like LLMs) or the toughest one (e.g., publishing papers is getting trickier). But the big lesson from Dan that sticks with me is this: think about what kind of AI/NLP system you want to create, instead of just following the research of others. By figuring out your own research tastes and goals and sticking to them, you’re on the best path to stand out in this community.

That’s great.  Thank you so much for this interview!
For more information on Dr. Wenpeng Yin, please visit his website.

A flowerbed featuring tulips in shades of yellow, pale pink, and deep pink, with a row of daffodils in the background.

Event Semantic Classification in Context (EACL ’24)

Summary by Haoyu Wang

In today’s rapidly evolving field of Natural Language Processing (NLP), the push for a deeper semantic understanding of texts continues to intensify. In this new paper, “Event Semantic Classification in Context,” we demonstrate how classifying events from multiple perspectives can greatly enhance machines’ ability to understand and reason about events.

Understanding the Complex Realm of Event Semantics

Instead of taking the broad-brush approach of classifying easily understood lexical items such as nouns, we delve into the nuanced domain of events. Events in text are not mere occurrences; they are the pivot around which a narrative’s temporal dynamics, causality, and thematic progression revolve. This research classifies events based on six properties: modality, affirmation, specificity, telicity, durativity, and kinesis.

The Six Essential Properties for Event Classification:

  • Modality (Actuality) – Determines whether an event actually takes place.
  • Affirmation – Indicates whether an event is described affirmatively or negatively.
  • Specificity (Genericity) – Ascertains whether an event is a singular occurrence or part of a general trend.
  • Telicity (Lexical Aspect) – Identifies whether an event has a definite end.
  • Durativity (Punctuality) – Indicates whether an event is instantaneous or unfolds over a duration.
  • Kinesis – Differentiates between states and actions.
Figure 1: An example of event semantic classification from six perspectives. The synset of the event is drawn from WordNet (Miller, 1992).
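
To make the annotation schema concrete, here is a minimal sketch of how a single annotated event might be represented in code. This is purely illustrative: the field names, the binary encoding of each property, and the example annotation are assumptions, not the dataset’s actual release format.

```python
# A minimal, illustrative sketch of the six-property schema described above.
# Field names and the binary encoding are assumptions, not the ESC dataset's
# actual release format.
from dataclasses import dataclass

@dataclass
class EventAnnotation:
    sentence: str      # sentence containing the event mention
    trigger: str       # the event trigger word
    modality: bool     # True if the event actually takes place
    affirmation: bool  # True if described affirmatively (not negated)
    specificity: bool  # True for a singular occurrence, False for a generic trend
    telicity: bool     # True if the event has a definite endpoint
    durativity: bool   # True if the event unfolds over a duration, False if punctual
    kinesis: bool      # True for an action, False for a state

# A hypothetical annotated example.
example = EventAnnotation(
    sentence="She finished the report before noon.",
    trigger="finished",
    modality=True,     # the finishing actually happened
    affirmation=True,  # stated affirmatively
    specificity=True,  # a specific, singular occurrence
    telicity=True,     # "finish" has an inherent endpoint
    durativity=False,  # finishing is essentially instantaneous
    kinesis=True,      # an action, not a state
)
```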

The significance of these classifications extends beyond mere semantic labeling. They provide foundational insights into how events are grounded in time and reality, laying the groundwork for more refined event understanding and reasoning—a leap forward in machine comprehension of narratives.

The ESC Dataset

One of the main contributions of this work is the introduction of the ESC (Event Semantic Classification) dataset. This novel bilingual dataset, covering both English and Chinese, is specifically crafted for fine-grained semantic classification tasks. It stands out for including all example sentences from WordNet featuring frequent verbs, each tagged with the six aforementioned semantic properties.

Still Challenging for ChatGPT

Table 4: Experimental results on the ESC dataset (the numbers are averaged F1 scores on English and Chinese). MP denotes the multi-label predictor, and MP+Gloss denotes the gloss-appended version of the multi-label predictor. Bold numbers in each column denote the best result for each property.

We find that these fine-grained semantic understanding tasks remain challenging for ChatGPT, while they can be solved well by fine-tuning smaller language models such as XLM-RoBERTa.
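
As a rough illustration of the fine-tuning baseline mentioned above, the sketch below sets up XLM-RoBERTa as a multi-label classifier over the six properties using the Hugging Face transformers library. The label set and the trigger-marking convention are assumptions for illustration; the paper’s actual multi-label predictor (MP) may be configured differently.

```python
# A minimal sketch (not the authors' released code) of using XLM-RoBERTa as a
# multi-label classifier over the six event properties. The label set and the
# input format are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

PROPERTIES = ["modality", "affirmation", "specificity",
              "telicity", "durativity", "kinesis"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=len(PROPERTIES),
    # One independent binary decision per property: sigmoid outputs,
    # trained with binary cross-entropy during fine-tuning.
    problem_type="multi_label_classification",
)

# Hypothetical input convention: mark the event trigger inline so the
# model knows which event in the sentence to classify.
sentence = "She [EVENT] finished [/EVENT] the report before noon."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]

for prop, p in zip(PROPERTIES, probs):
    print(f"{prop}: {p:.2f}")  # meaningful only after fine-tuning on ESC
```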

Advancing Event Understanding and Reasoning

By integrating the classification of events according to these detailed semantic properties, the research demonstrates a marked improvement in event understanding and reasoning capabilities, evidenced through experiments on tasks such as event extraction, temporal relation extraction, and subevent relation extraction. The dataset and the classification models designed in this study are instrumental in making substantive advancements in these areas. By leveraging datasets like ESC and pushing the boundaries of event classification, the NLP field is inching closer to unlocking machines’ full potential for understanding the intricacies of human language and thought.

To read the full paper: Haoyu Wang, Hongming Zhang, Kaiqiang Song, Dong Yu, and Dan Roth, Event Semantic Classification in Context, Findings of EACL (2024).
Dataset forthcoming.

Haoyu Wang is a third-year PhD student in the Cognitive Computation Group at the University of Pennsylvania, with a research interest in event-centric natural language understanding.