Pasquale Esposito — Computational Linguist & NLP Engineer

Computational Linguist · NLP Engineer · AI Systems Builder

PhD linguist and full-stack NLP engineer. I turn linguistic structure into deployable systems and language platforms built for the modern web.


About

Linguistics meets machine intelligence

PhD-trained Computational Linguist and NLP Engineer with a track record of delivering production-grade AI systems across research and industry environments. I combine deep linguistic expertise (morphology, syntax, discourse analysis, lexicology) with hands-on LLM engineering, bridging a gap that most candidates on either side cannot.

My systems span the full pipeline: transformer model fine-tuning, retrieval-augmented generation (RAG), knowledge graph construction, and large-scale corpus processing. The result is NLP infrastructure that is linguistically informed, interpretable, and built to scale.

PhD

Linguistics


Experience

Research Fellow – NLP Engineer

Department of Industrial Engineering, University of Salerno

Jan 2026 – Present

Designed and trained a custom NER model with a domain-specific label set for Cultural Heritage entities, enabling structured extraction from heterogeneous unstructured textual sources.
Built an ontology-driven RAG chatbot by vectorising library records and transforming unstructured texts into a semantic knowledge base integrated with LLMs for efficient information retrieval.
Implemented an iterative OCR/character recognition system using active learning strategies to achieve reliable performance with very few annotated records.
Developed a semantic SEO optimiser based on ontological data for vertical semantic enrichment and improved discoverability.
Fine-tuned transformer-based models (BERT, GPT); designed data pipelines for preprocessing and annotation; integrated vector databases with LLMs; and conducted systematic model evaluation and benchmarking.

PhD Researcher

University of Salerno

Nov 2022 – Mar 2026

Doctoral thesis: Appìl — An Interface for the Improvement of Lexical Skills. Designed and developed a full-stack adaptive language learning platform integrating computational linguistics, cognitive science, and modern web technologies.
Built a personalised difficulty calibration system using spaced repetition and Item Response Theory (IRT), achieving a 33% improvement in vocabulary acquisition rates among active users.
Developed a lexical transparency algorithm to quantify cross-linguistic morphological similarity between Italian and English, processing 50,000+ word pairs.
Integrated Linguistic Linked Open Data (LLOD) to deliver authentic, domain-specific content; designed a real-time challenge adjustment mechanism grounded in Dual Process Theory for 2,500+ active learners.
Published peer-reviewed research in computational discourse analysis, cross-linguistic modelling, and AI-driven language learning. Final grade: Excellent cum laude.
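
As an illustration only, the IRT-based difficulty calibration mentioned above can be sketched with a one-parameter (Rasch) model. This is a minimal sketch under stated assumptions, not the Appìl implementation: the function names, the learning rate, the 70% target success rate, and the example item difficulties are all hypothetical.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) probability that a learner answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def update_ability(ability: float, difficulty: float, correct: bool,
                   lr: float = 0.3) -> float:
    """One gradient-ascent step on the log-likelihood of the observed answer.

    For the Rasch model the gradient with respect to ability is
    (observed outcome - predicted probability). `lr` is an assumed step size.
    """
    observed = 1.0 if correct else 0.0
    return ability + lr * (observed - p_correct(ability, difficulty))


def next_item(ability: float, item_difficulties: dict, target: float = 0.7) -> str:
    """Pick the item whose predicted success rate is closest to the target."""
    return min(
        item_difficulties,
        key=lambda item: abs(p_correct(ability, item_difficulties[item]) - target),
    )


# Hypothetical item bank: word -> difficulty on the same scale as ability.
items = {"cat": -2.0, "house": -0.8, "ubiquitous": 2.0}
chosen = next_item(ability=0.0, item_difficulties=items)
```

With a learner ability of 0.0, the sketch selects "house", whose predicted success rate (about 0.69) is closest to the assumed 0.7 target. A correct answer then nudges the ability estimate upward via `update_ability`.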

Tutoring & Teaching Support

University of Salerno

Oct 2023 – Sep 2024

Delivered tutoring and teaching support for first-year students in linguistic education, including preparatory and remedial activities.
Improved student comprehension, engagement, and academic performance across language and linguistics courses.

Career & Educational Orientation Advisor

CAOT – University of Salerno

Nov 2022 – May 2024

Organised workshops and career guidance sessions to support informed academic and professional decision-making.
Assessed participants' skills and educational goals; collaborated with teachers and industry professionals.
Facilitated the development of key soft skills including teamwork, communication, and problem-solving.

Skills & Projects

Selected work.


Core Skills

LLM Engineering

Fine-tuning (LoRA, RLHF, instruction tuning)
Retrieval-Augmented Generation (RAG)
Prompt engineering
Evaluation pipelines & LLMOps
Vector databases
Cloud deployment (AWS, Azure, GCP)

Natural Language Processing

Named Entity Recognition (NER)
Relation & information extraction
Dependency parsing & coreference resolution
Semantic role labelling
Large-scale text processing
Corpus construction & annotation

Machine Learning

Transformer architectures (BERT, GPT, T5, LLaMA)
Neural network training & evaluation
Hyperparameter tuning
MLOps workflows
PyTorch
Active learning

Knowledge Engineering

Ontology design (RDF/OWL)
Knowledge graphs
SPARQL
Semantic Web technologies
Linguistic Linked Open Data (LLOD)
Structured knowledge representation

Software Engineering

Python (NumPy, pandas, PyTorch, HuggingFace, spaCy, NLTK)
Java
R
REST API development
Docker & CI/CD pipelines
Git & Agile methodology

Linguistics

Morphological & syntactic analysis
Discourse structure modelling
Lexicology & cross-linguistic modelling
Multilingual NLP & cross-lingual transfer
Computational pragmatics
Vocabulary & lexical frequency modelling

Tech Stack

Python

Java

Docker

Git

AWS

Linux

Featured Projects


Ontology-driven RAG Chatbot

Semantic retrieval system combining knowledge graphs and LLMs for context-aware, structured information access in the Cultural Heritage domain.

Architected a hybrid retrieval pipeline that queries an OWL ontology via SPARQL to ground LLM responses in structured domain knowledge, reducing hallucinations in specialised Q&A tasks. Integrated vector databases and transformer models for lightweight, scalable deployment.

RAG · Knowledge Graphs · LLMs · SPARQL · OWL · Cultural Heritage

Custom NER System

Domain-specific named entity extraction pipeline built for the cultural heritage domain, achieving state-of-the-art F1 on specialised entity types.

Trained and fine-tuned a transformer-based NER model on annotated cultural heritage corpora. Designed a domain-specific label set covering artworks, institutions, historical periods, and persons. Coupled with an active-learning OCR pipeline to minimise annotation effort on scarce data.

NER · NLP · Cultural Heritage · Transformers · Fine-tuning · Active Learning

Appìl Platform

Adaptive language learning system powered by NLP-driven personalisation, IRT-based difficulty calibration, and Linguistic Linked Open Data integration.

Designed the full NLP backbone of an adaptive learning app: lexical transparency algorithm processing 50,000+ word pairs, real-time challenge adjustment via Dual Process Theory, and LLOD integration for authentic domain-specific content. Achieved a 33% improvement in vocabulary acquisition for 2,500+ active learners.

EdTech · NLP · IRT · LLOD · Personalisation · Python · Full-Stack
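
One simple way to approximate a lexical transparency score for Italian and English word pairs is normalised edit similarity. The sketch below is an assumption for illustration, not the algorithm used in Appìl: it treats transparency as purely orthographic, ignores morphology, and the names `levenshtein` and `transparency` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                        # deletion
                curr[j - 1] + 1,                    # insertion
                prev[j - 1] + (char_a != char_b),   # substitution (0 if equal)
            ))
        prev = curr
    return prev[-1]


def transparency(italian: str, english: str) -> float:
    """Orthographic similarity in [0, 1]; 1.0 means identical spellings."""
    a, b = italian.lower(), english.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical word pairs: a close cognate and an unrelated pair.
pairs = [("problema", "problem"), ("cane", "dog")]
scores = {pair: transparency(*pair) for pair in pairs}
```

Here the cognate pair "problema" / "problem" scores 0.875, while "cane" / "dog" scores 0.0. A fuller measure would presumably weight shared roots and affixes rather than raw characters.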

Semantic SEO Optimiser

Ontology-based tool for vertical semantic enrichment of web content, improving discoverability and structured knowledge representation.

Developed a pipeline that leverages domain ontologies to semantically enrich content metadata, aligning it with structured knowledge bases and improving search engine visibility through entity-level annotation and schema markup generation.

Semantic Web · Ontology · SEO · NLP · Knowledge Graphs

PhD Thesis

Appìl: An Interface for the Improvement of Lexical Skills

Designed and developed a full-stack adaptive language learning platform integrating computational linguistics, cognitive science, and modern web technologies.

The ongoing evolution of digital education has opened new possibilities for second language learning, yet many existing platforms remain constrained by commercial interests and limited pedagogical depth. Despite the proliferation of language learning applications, few are explicitly grounded in robust second language acquisition (SLA) theory (Ellis, 2008; Krashen, 1982; Long, 1996) or capable of offering authentic, learner-driven adaptation. This research emerges within that gap, responding to the growing need for open, scientifically informed, and pedagogically coherent digital tools that can personalize instruction while preserving methodological rigor.

This doctoral study explores the intersection of SLA, digital pedagogy, and computational linguistics through the design, development, and evaluation of Appìl, an adaptive, web-based language learning platform that I created under supervision during my doctoral studies. The platform is situated within the broader field of intelligent computer-assisted language learning (ICALL) (Heift & Schulze, 2007). It integrates theoretical insights from SLA, psychology, and cognitive science (Evans, 2008; Kahneman, 2011; Vygotsky, 1986) with technological innovations drawn from natural language processing (NLP) and linguistic linked open data (LLOD) (Chiarcos et al., 2013). The overarching goal is to demonstrate how open, data-driven systems can enhance learner autonomy and motivation (Deci & Ryan, 2014; Zimmerman, 2002) while maintaining instructional coherence within a structured learning framework.

Adopting a mixed-methods approach, the research project combines theoretical synthesis, technical implementation, and empirical evaluation. It includes a systematic review of SLA theories and digital learning methodologies, the computational design of the Appìl platform, and a field-based evaluation of its pedagogical effectiveness. The system’s architecture merges CEFR-aligned progression (Council of Europe, 2020) with NLP-enabled personalization, providing learners with individualized flashcards, adaptive exercises, and real-time feedback tailored to their evolving proficiency levels.

Findings reveal that personalization grounded in SLA theory, rather than purely algorithmic adaptation, significantly enhances learner motivation, vocabulary retention, and engagement. Learners responded positively to flexible and user-driven content pathways, although sustaining participation and addressing digital literacy barriers remained ongoing challenges. The development process, conducted by a single researcher, underscored both the potential and the practical constraints of independent, interdisciplinary platform creation, while also highlighting the centrality of human engagement throughout design and implementation.

Research

Publications & Talks

Peer-reviewed research and international conference contributions in computational linguistics, NLP, knowledge graphs, and AI-driven language learning.

Publications

8 items

2026

Lexical Variation and Knowledge Construction across Historical, Methodological, and Cultural Ecologies

Ciancia, C.; Patrick, P. L.; Esposito, P.

Terminology and Lexicography Research and Practice, John Benjamins, pp. 179–196

2026

Design and Evaluation of a Historical Conversational Agent Embodying Carlo Pisacane

Clarizia, F.; Esposito, P.; Giunto, A.; Loffredo, R.

Proceedings of the 18th International Conference on Computer Supported Education (CSEDU 2026), Vol. 1, pp. 891–901

2026

Enhancing Vocabulary Learning with Lexical Metrics and Personalized Text Difficulty Assessment

Esposito, P.

SSRN

2025

FAIRness of the Linguistic Linked Open Data Cloud: an Empirical Investigation

Pellegrino, M. A.; Esposito, P.; Tuozzo, G.

ACM Journal of Data and Information Quality

2024

Broaden Your Horizon! Play with Semantics via a Knowledge Graph-Based Approach

Esposito, P.; Mazzone, C.; Pellegrino, M.; Scarano, V.

Proceedings of the 16th International Conference on Computer Supported Education (CSEDU 2024), SciTePress, pp. 380–387

2024

The Linguistic Linked Open Data through the Linguists’ Lens

Esposito, P.

DQMLKG 2024: Data Quality meets Machine Learning and Knowledge Graphs

2024

The Linguistic Linked Open Data Cloud: Phenomenal Cosmic Powers... Itty Bitty Quality Space!

Esposito, P.; Pellegrino, M. A.; Scarano, V.; Tuozzo, G.

Proceedings of the International Semantic Web Conference (ISWC 2024)

2024

Empowering Data Literacy Among High School Learners: Insights from a Linked Open Data Workshop

Antelmi, A.; Esposito, P.

MIS4TEL 2024 Proceedings, Springer, Vol. 2, pp. 204–215

Conferences & Talks

14 entries

2026

Cross Linguistic Passages of Meaning: Toward a Tailor-Made Digital System for Word-Sense, Cognates and Translation Pedagogy

Passaggi di senso: traduzioni e linguaggi oltre i confini · Fisciano, Italy

talk

2025

Recalibrating Lexical Development: Spatial Engagement in Digital Language Learning with Appìl

32nd AIA Conference – HUMAN, HUMANE, HUMANITIES · Torino, Italy

talk

2025

What’s the Frequency? Evaluating lexical frequency measures over phonological and morphological effects

UK Language Variation and Change 15 · Lancaster, UK

poster

2025

Shifting Cognitive Spaces in Computer-Assisted Language Learning

Underground Imaginaries 2025 · Napoli, Italy

talk

2024

Empowering Learners: A Tool for Sustainable Language Acquisition

Enhancing Sustainability Conference · Napoli, Italy

talk

2024

Appìl: Enhancing Access to English as a Lingua Franca through Adaptive Learning

ELF Communication Today

talk

2024

The Dual Role of AI in Second Language Learning: Exploring Applications and Addressing Biases

Shifting Boundaries: AI and Human Interactions Redefining Reality · Napoli, Italy

talk

2024

What kind of frequency measures best explain variation in a purely phonological variable

ICLaVe12 · Vienna, Austria

talk

2024

Empowering Data Literacy Among High School Learners

MIS4TEL 2024 · Salamanca, Spain

talk

2024

The Linguistic Linked Open Data through the Linguists' Lens

ESWC 2024 Workshop (DQMLKG) · Crete, Greece

talk

2024

Broaden Your Horizon! Play with Semantics via a Knowledge Graph-Based Approach

CSEDU 2024 · Angers, France

talk

2023

The role of Linguistic Linked Open Data for the development of Appìl

New Trends in English Language Teaching · Chieti–Pescara, Italy

talk

2023

Lexical Frequency Effects on Language Variation

ICHLL · Fisciano, Italy

talk

2023

Appìl – an interface for the improvement of lexical skills

Lectures on Computational Linguistics · Pisa, Italy

poster

Contact

Let's build something together

Whether it's a research collaboration, a production NLP system, or just a conversation about language and AI, I'm always happy to connect.

espositopasqualeb@gmail.com


GitHub

View Profile →

Zenodo

View Publications →

Open to opportunities

Available for freelance, research & full-time roles.

Send me an email

Support

A free platform for those who dream in more than one language

Appìl is an independent, open-access language learning platform built to make vocabulary acquisition personalised, effective, and freely available to everyone — no paywalls, no subscriptions. If you believe in accessible language education, you can help keep it alive and growing.

Buy Me a Coffee

Support the research and development behind Appìl with a small contribution. Every coffee fuels late-night debugging sessions and new linguistic features.

Support on BMC →

GoFundMe Campaign

Help us reignite free linguistic learning. This campaign funds server costs, dataset licensing, and the time needed to keep Appìl open and accessible to all learners.

Contribute on GoFundMe →

Every contribution, big or small, keeps language learning free for everyone.