Computational Linguist · NLP Engineer · AI Systems Builder

PhD linguist and full-stack NLP engineer. I turn linguistic structure into deployable systems and language platforms built for the modern web.

Pasquale Esposito

About

Linguistics meets machine intelligence

PhD-trained Computational Linguist and NLP Engineer with a track record of delivering production-grade AI systems across research and industry environments. I combine deep linguistic expertise (morphology, syntax, discourse analysis, lexicology) with hands-on LLM engineering, bridging a gap that most candidates on either side cannot.

My systems span the full pipeline: transformer model fine-tuning, retrieval-augmented generation (RAG), knowledge graph construction, and large-scale corpus processing. The result is NLP infrastructure that is linguistically informed, interpretable, and built to scale.

PhD

Linguistics

5+

Years in NLP & AI

3

Production AI Systems

6+

Publications

Experience

Research Fellow – NLP Engineer

Department of Industrial Engineering, University of Salerno

Jan 2026 – Present
  • Designed and trained a custom NER model with a domain-specific label set for Cultural Heritage entities, enabling structured extraction from heterogeneous unstructured textual sources.
  • Built an ontology-driven RAG chatbot by vectorising library records and transforming unstructured texts into a semantic knowledge base integrated with LLMs for efficient information retrieval.
  • Implemented an iterative OCR/character recognition system using active learning strategies to achieve reliable performance with very few annotated records.
  • Developed a semantic SEO optimiser based on ontological data for vertical semantic enrichment and improved discoverability.
  • Fine-tuned transformer-based models (BERT, GPT); designed data pipelines for preprocessing and annotation; integrated vector databases with LLMs; and conducted systematic model evaluation and benchmarking.

PhD Researcher

University of Salerno

Nov 2022 – Mar 2026
  • Doctoral thesis: "Appìl: An Interface for the Improvement of Lexical Skills". Designed and developed a full-stack adaptive language learning platform integrating computational linguistics, cognitive science, and modern web technologies.
  • Built a personalised difficulty calibration system using spaced repetition and Item Response Theory (IRT), achieving a 33% improvement in vocabulary acquisition rates among active users.
  • Developed a lexical transparency algorithm to quantify cross-linguistic morphological similarity between Italian and English, processing 50,000+ word pairs.
  • Integrated Linguistic Linked Open Data (LLOD) to deliver authentic, domain-specific content; designed a real-time challenge adjustment mechanism grounded in Dual Process Theory for 2,500+ active learners.
  • Published peer-reviewed research in computational discourse analysis, cross-linguistic modelling, and AI-driven language learning. Final grade: Excellent cum laude.
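
The IRT-based difficulty calibration mentioned above can be illustrated with a minimal sketch. It assumes a one-parameter (Rasch) model, a target success probability of 0.7, and a simple gradient-style ability update; these choices and all function names are illustrative assumptions, not the platform's actual implementation.

```python
import math
from typing import Dict


def p_correct(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Probability of a correct answer under a logistic IRT model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def pick_next_item(ability: float, item_difficulties: Dict[str, float],
                   target: float = 0.7) -> str:
    """Choose the item whose predicted success rate is closest to the target."""
    return min(
        item_difficulties,
        key=lambda name: abs(p_correct(ability, item_difficulties[name]) - target),
    )


def update_ability(ability: float, difficulty: float, correct: bool,
                   learning_rate: float = 0.3) -> float:
    """Nudge the ability estimate by one gradient step on the Rasch log-likelihood."""
    observed = 1.0 if correct else 0.0
    return ability + learning_rate * (observed - p_correct(ability, difficulty))


# Hypothetical item bank: difficulty values are made up for illustration.
items = {"easy": -2.0, "medium": -0.8, "hard": 1.5}
next_item = pick_next_item(0.0, items)
new_ability = update_ability(0.0, items[next_item], correct=True)
```

A correct answer raises the ability estimate and an incorrect one lowers it, so the next selected item drifts towards the learner's current level.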

Tutoring & Teaching Support

University of Salerno

Oct 2023 – Sep 2024
  • Delivered tutoring and teaching support for first-year students in linguistic education, including preparatory and remedial activities.
  • Improved student comprehension, engagement, and academic performance across language and linguistics courses.

Career & Educational Orientation Advisor

CAOT – University of Salerno

Nov 2022 – May 2024
  • Organised workshops and career guidance sessions to support informed academic and professional decision-making.
  • Assessed participants' skills and educational goals; collaborated with teachers and industry professionals.
  • Facilitated the development of key soft skills including teamwork, communication, and problem-solving.

Skills & Projects

Selected work

A selection of systems I've designed and built — spanning semantic retrieval, entity extraction, and adaptive learning.

Core Skills

LLM Engineering

  • Fine-tuning (LoRA, RLHF, instruction tuning)
  • Retrieval-Augmented Generation (RAG)
  • Prompt engineering
  • Evaluation pipelines & LLMOps
  • Vector databases
  • Cloud deployment (AWS, Azure, GCP)

Natural Language Processing

  • Named Entity Recognition (NER)
  • Relation & information extraction
  • Dependency parsing & coreference resolution
  • Semantic role labelling
  • Large-scale text processing
  • Corpus construction & annotation

Machine Learning

  • Transformer architectures (BERT, GPT, T5, LLaMA)
  • Neural network training & evaluation
  • Hyperparameter tuning
  • MLOps workflows
  • PyTorch
  • Active learning

Knowledge Engineering

  • Ontology design (RDF/OWL)
  • Knowledge graphs
  • SPARQL
  • Semantic Web technologies
  • Linguistic Linked Open Data (LLOD)
  • Structured knowledge representation

Software Engineering

  • Python (NumPy, pandas, PyTorch, HuggingFace, spaCy, NLTK)
  • Java
  • R
  • REST API development
  • Docker & CI/CD pipelines
  • Git & Agile methodology

Linguistics

  • Morphological & syntactic analysis
  • Discourse structure modelling
  • Lexicology & cross-linguistic modelling
  • Multilingual NLP & cross-lingual transfer
  • Computational pragmatics
  • Vocabulary & lexical frequency modelling

Tech Stack

Python
Java
R
Docker
Git
AWS
Linux

Featured Projects

Ontology-driven RAG Chatbot

Semantic retrieval system combining knowledge graphs and LLMs for context-aware, structured information access in the Cultural Heritage domain.

Architected a hybrid retrieval pipeline that queries an OWL ontology via SPARQL to ground LLM responses in structured domain knowledge, reducing hallucinations in specialised Q&A tasks. Integrated vector databases and transformer models for lightweight, scalable deployment.

RAG · Knowledge Graphs · LLMs · SPARQL · OWL · Cultural Heritage
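
As a rough illustration of the grounding step described above, the sketch below builds a SPARQL query for an entity label and places the returned facts ahead of the user's question in the prompt. It is a simplified sketch under stated assumptions: the query shape, the function names, and the example facts are hypothetical, and the calls to a SPARQL endpoint and to an LLM are deliberately left out.

```python
from typing import Dict, List


def build_sparql_query(entity_label: str) -> str:
    """Build a SPARQL query listing the properties of a labelled entity."""
    # Escape backslashes and quotes so the label is a valid SPARQL string literal.
    safe = entity_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?property ?value WHERE {\n"
        f'  ?entity rdfs:label "{safe}" .\n'
        "  ?entity ?property ?value .\n"
        "} LIMIT 20"
    )


def build_grounded_prompt(question: str, facts: List[Dict[str, str]]) -> str:
    """Place retrieved ontology facts ahead of the question in the LLM prompt."""
    if not facts:
        context = "(no matching facts found)"
    else:
        context = "\n".join(f"- {f['property']}: {f['value']}" for f in facts)
    return (
        "Answer using only the facts below. "
        "If they are insufficient, say so.\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical rows, shaped like the bindings a SPARQL endpoint might return.
example_facts = [
    {"property": "creator", "value": "Example Artist"},
    {"property": "heldBy", "value": "Example Museum"},
]
prompt = build_grounded_prompt("Who created the example artwork?", example_facts)
```

Instructing the model to rely only on the listed facts, and to say when they are insufficient, is one common way to limit unsupported answers in this kind of setup.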

Custom NER System

Domain-specific named entity extraction pipeline built for the cultural heritage domain, achieving state-of-the-art F1 on specialised entity types.

Trained and fine-tuned a transformer-based NER model on annotated cultural heritage corpora. Designed a domain-specific label set covering artworks, institutions, historical periods, and persons. Coupled with an active-learning OCR pipeline to minimise annotation effort on scarce data.

NER · NLP · Cultural Heritage · Transformers · Fine-tuning · Active Learning
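
The active-learning idea above can be sketched with entropy-based uncertainty sampling: the records the current model is least sure about are sent for annotation first. This is one common strategy, shown here as an assumption. The project's actual selection criterion is not specified, and the record IDs and probabilities below are invented for illustration.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` record IDs with the most uncertain predictions."""
    ranked = sorted(predictions, key=lambda rid: entropy(predictions[rid]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs: record ID -> class probabilities.
predictions = {
    "rec_a": [0.98, 0.01, 0.01],  # confident prediction
    "rec_b": [0.40, 0.35, 0.25],  # very uncertain prediction
    "rec_c": [0.70, 0.20, 0.10],  # moderately uncertain prediction
}
to_label = select_for_annotation(predictions, budget=2)
```

After each annotation round the model is retrained and the remaining records are re-scored, so the labelling budget goes to the examples that are most informative at that point.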

Appìl Platform

Adaptive language learning system powered by NLP-driven personalisation, IRT-based difficulty calibration, and Linguistic Linked Open Data integration.

Designed the full NLP backbone of an adaptive learning app: lexical transparency algorithm processing 50,000+ word pairs, real-time challenge adjustment via Dual Process Theory, and LLOD integration for authentic domain-specific content. Achieved a 33% improvement in vocabulary acquisition for 2,500+ active learners.

EdTech · NLP · IRT · LLOD · Personalisation · Python · Full-Stack
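
To give a feel for what a lexical transparency score can look like, here is a minimal sketch that uses normalised Levenshtein similarity as a stand-in measure. The real algorithm quantifies morphological similarity and is not reproduced here. The edit-distance proxy, the function names, and the example word pairs are all assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def transparency(italian: str, english: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical forms, 0.0 for nothing in common."""
    a, b = italian.lower(), english.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Illustrative word pairs: a close cognate and an unrelated pair.
cognate_score = transparency("problema", "problem")
opaque_score = transparency("cane", "dog")
```

A surface measure like this misses regular suffix correspondences (for example -zione and -tion), which is where a morphology-aware approach would be expected to do better.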

Semantic SEO Optimiser

Ontology-based tool for vertical semantic enrichment of web content, improving discoverability and structured knowledge representation.

Developed a pipeline that leverages domain ontologies to semantically enrich content metadata, aligning it with structured knowledge bases and improving search engine visibility through entity-level annotation and schema markup generation.

Semantic Web · Ontology · SEO · NLP · Knowledge Graphs
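
The schema markup generation step can be illustrated with a short sketch that turns entity-level annotations into schema.org JSON-LD. The input format, the function names, and the example URLs are hypothetical. Only the schema.org terms (`WebPage`, `about`, `Thing`, `name`, `sameAs`) come from the public vocabulary.

```python
import json
from typing import Dict, List


def build_jsonld(page_url: str, title: str, entities: List[Dict[str, str]]) -> Dict:
    """Describe a page and the entities it is about as a schema.org document."""
    about = [
        {
            "@type": e.get("type", "Thing"),  # fall back to the generic type
            "name": e["label"],
            "sameAs": e["uri"],               # link to the knowledge-base entry
        }
        for e in entities
    ]
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": page_url,
        "name": title,
        "about": about,
    }


def to_script_tag(doc: Dict) -> str:
    """Serialise the document as an embeddable JSON-LD script element."""
    body = json.dumps(doc, ensure_ascii=False, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical annotations, as an entity linker might produce them.
annotations = [
    {"label": "Example Museum", "uri": "https://example.org/kb/museum", "type": "Museum"},
    {"label": "Example Period", "uri": "https://example.org/kb/period"},
]
markup = to_script_tag(build_jsonld("https://example.org/page", "Example page", annotations))
```

Embedding the resulting element in a page's HTML exposes the annotated entities to crawlers that read JSON-LD.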

PhD Thesis

Appìl: An Interface for the Improvement of Lexical Skills

Designed and developed a full-stack adaptive language learning platform integrating computational linguistics, cognitive science, and modern web technologies.

The ongoing evolution of digital education has opened new possibilities for second language learning, yet many existing platforms remain constrained by commercial interests and limited pedagogical depth. Despite the proliferation of language learning applications, few are explicitly grounded in robust second language acquisition (SLA) theory (Ellis, 2008; Krashen, 1982; Long, 1996) or capable of offering authentic, learner-driven adaptation. This research emerges within that gap, responding to the growing need for open, scientifically informed, and pedagogically coherent digital tools that can personalize instruction while preserving methodological rigor.

This doctoral study explores the intersection of second language acquisition (SLA), digital pedagogy, and computational linguistics through the design, development, and evaluation of Appìl, an adaptive, web-based language learning platform created by me, under supervision, during my doctoral studies, and situated within the broader field of intelligent computer-assisted language learning (ICALL) (Heift & Schulze, 2007). It integrates theoretical insights from SLA, psychology, and cognitive science (Evans, 2008; Kahneman, 2011; Vygotsky, 1986) with technological innovations drawn from natural language processing (NLP) and linguistic linked open data (LLOD) (Chiarcos et al., 2013). The overarching goal is to demonstrate how open, data-driven systems can enhance learner autonomy and motivation (Deci & Ryan, 2014; Zimmerman, 2002) while maintaining instructional coherence within a structured learning framework.

Adopting a mixed-methods approach, the present research project combines theoretical synthesis, technical implementation, and empirical evaluation. It includes a systematic review of SLA theories and digital learning methodologies, the computational design of the Appìl platform, and a field-based evaluation of its pedagogical effectiveness. The system’s architecture merges CEFR-aligned progression (Council of Europe, 2020) with NLP-enabled personalization, providing learners with individualized flashcards, adaptive exercises, and real-time feedback tailored to their evolving proficiency levels.

Findings reveal that personalization grounded in SLA theory, rather than purely algorithmic adaptation, significantly enhances learner motivation, vocabulary retention, and engagement. Learners responded positively to flexible and user-driven content pathways, although sustaining participation and addressing digital literacy barriers remained ongoing challenges. The development process, conducted by a single researcher, underscored both the potential and the practical constraints of independent, interdisciplinary platform creation, while also highlighting the centrality of human engagement throughout design and implementation.

Research

Publications & Talks

Peer-reviewed research and international conference contributions in computational linguistics, NLP, knowledge graphs, and AI-driven language learning.

Publications

8 items

Lexical Variation and Knowledge Construction across Historical, Methodological, and Cultural Ecologies

Ciancia, C.; Patrick, P. L.; Esposito, P.

Terminology and Lexicography Research and Practice, John Benjamins, pp. 179–196

FAIRness of the Linguistic Linked Open Data Cloud: an Empirical Investigation

Pellegrino, M. A.; Esposito, P.; Tuozzo, G.

ACM Journal of Data and Information Quality

Broaden Your Horizon! Play with Semantics via a Knowledge Graph-Based Approach

Esposito, P.; Mazzone, C.; Pellegrino, M.; Scarano, V.

Proceedings of the 16th International Conference on Computer Supported Education (CSEDU 2024), SciTePress, pp. 380–387

The Linguistic Linked Open Data through the Linguists’ Lens

Esposito, P.

DQMLKG 2024: Data Quality meets Machine Learning and Knowledge Graphs

The Linguistic Linked Open Data Cloud: Phenomenal Cosmic Powers... Itty Bitty Quality Space!

Esposito, P.; Pellegrino, M. A.; Scarano, V.; Tuozzo, G.

Proceedings of the International Semantic Web Conference (ISWC 2024)

Empowering Data Literacy Among High School Learners: Insights from a Linked Open Data Workshop

Antelmi, A.; Esposito, P.

MIS4TEL 2024 Proceedings, Springer, Vol. 2, pp. 204–215

Conferences & Talks

14 entries

Cross Linguistic Passages of Meaning: Toward a Tailor-Made Digital System for Word-Sense, Cognates and Translation Pedagogy

Passaggi di senso: traduzioni e linguaggi oltre i confini · Fisciano, Italy

talk

Recalibrating Lexical Development: Spatial Engagement in Digital Language Learning with Appìl

32nd AIA Conference – HUMAN, HUMANE, HUMANITIES · Torino, Italy

talk

What’s the Frequency? Evaluating lexical frequency measures over phonological and morphological effects

UK Language Variation and Change 15 · Lancaster, UK

poster

Shifting Cognitive Spaces in Computer-Assisted Language Learning

Underground Imaginaries 2025 · Napoli, Italy

talk

Empowering Learners: A Tool for Sustainable Language Acquisition

Enhancing Sustainability Conference · Napoli, Italy

talk

Appìl: Enhancing Access to English as a Lingua Franca through Adaptive Learning

ELF Communication Today

talk

The Dual Role of AI in Second Language Learning: Exploring Applications and Addressing Biases

Shifting Boundaries: AI and Human Interactions Redefining Reality · Napoli, Italy

talk

What kind of frequency measures best explain variation in a purely phonological variable

ICLaVe12 · Vienna, Austria

talk

Empowering Data Literacy Among High School Learners

MIS4TEL 2024 · Salamanca, Spain

talk

The Linguistic Linked Open Data through the Linguists' Lens

ESWC 2024 Workshop (DQMLKG) · Crete, Greece

talk

Broaden Your Horizon! Play with Semantics via a Knowledge Graph-Based Approach

CSEDU 2024 · Angers, France

talk

The role of Linguistic Linked Open Data for the development of Appìl

New Trends in English Language Teaching · Chieti–Pescara, Italy

talk

Lexical Frequency Effects on Language Variation

ICHLL · Fisciano, Italy

talk

Appìl – an interface for the improvement of lexical skills

Lectures on Computational Linguistics · Pisa, Italy

poster

Contact

Let's build something together

Whether it's a research collaboration, a production NLP system, or just a conversation about language and AI, I'm always happy to connect.

Open to opportunities

Available for freelance, research & full-time roles.

Send me an email

Support

A free platform for those who dream in more than one language

Appìl is an independent, open-access language learning platform built to make vocabulary acquisition personalised, effective, and freely available to everyone — no paywalls, no subscriptions. If you believe in accessible language education, you can help keep it alive and growing.

Every contribution, big or small, keeps language learning free for everyone.