Recent research on large language model (LLM) applications increasingly targets practical utility across diverse commercial domains, from software development to education and maritime navigation. KLong, for example, is an open-source LLM agent trained to handle extremely long-horizon tasks, a capability that could support multi-step project management and research workflows. ShipTraj-R1 reframes trajectory prediction as a text-generation problem, with potential benefits for maritime safety and logistics. In software engineering, researchers are testing whether LLMs can turn user feedback from app reviews into actionable requirements, which could shorten product development cycles. Challenges remain, however, particularly in aligning LLM outputs with human expectations: studies report discrepancies between LLM and human judgments when grading essays and generating user stories. As the field evolves, the emphasis is shifting toward human-centered evaluation and adaptive learning strategies that mitigate problems such as hallucination and improve reliability in real-world applications.
This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via prog...
Current aligned language models exhibit a dual failure mode we term the Evasive Servant: they sycophantically validate flawed user beliefs while deflecting responsibility with boilerplate disclaimers....
Despite growing interest in using Large Language Models (LLMs) for educational assessment, it remains unclear how closely they align with human scoring. We present a systematic evaluation of instructi...
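The excerpt does not say which agreement statistic the study reports. Quadratic weighted kappa (QWK) is a common choice in automated essay scoring, so the sketch below illustrates one way "alignment with human scoring" can be quantified. It assumes integer scores on a 0..num_levels-1 scale, and the score lists in the usage line are hypothetical, not data from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, num_levels):
    """Agreement between two raters on integer scores 0..num_levels-1 (num_levels >= 2).

    Returns 1.0 for perfect agreement, roughly 0.0 for chance-level agreement,
    and negative values for systematic disagreement.
    """
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(human)
    observed = Counter(zip(human, model))  # (human_score, model_score) -> count
    hist_human = Counter(human)
    hist_model = Counter(model)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic penalty: missing by 2 points costs 4x missing by 1 point.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Count expected if the two raters scored independently of each other.
            weighted_expected += weight * hist_human[i] * hist_model[j] / n
    if weighted_expected == 0.0:
        return 1.0  # degenerate case: both raters gave every item the same single score
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical human and LLM scores for five essays on a 0-4 scale.
human_scores = [1, 2, 3, 3, 4]
llm_scores = [1, 2, 2, 3, 4]
print(round(quadratic_weighted_kappa(human_scores, llm_scores, num_levels=5), 3))
```

The quadratic weighting is the design choice that matters here: an LLM that is off by one point on many essays is penalized far less than one that occasionally swaps the top and bottom of the scale.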
Recent advancements in reinforcement fine-tuning have significantly improved the reasoning ability of large language models (LLMs). In particular, methods such as group relative policy optimization (G...
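The excerpt cuts off before describing the method, but the core idea of group relative policy optimization is well documented. GRPO samples a group of responses for the same prompt, scores each one, and uses each reward's deviation from the group mean (scaled by the group standard deviation) as its advantage, so no learned critic is needed as a baseline. The sketch below covers only that advantage step, not the clipped policy-gradient loss or the KL penalty, and the reward values are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of responses sampled from the same prompt.

    Each reward is centered on the group mean and scaled by the group standard
    deviation, so the baseline comes from the group itself rather than a critic.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    # eps avoids division by zero when every response gets the same reward;
    # in that case all advantages are 0 and the group contributes no learning signal.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical binary rewards for four responses to one prompt (1 = verified correct).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # prints [1.0, -1.0, -1.0, 1.0]
```

Correct responses receive positive advantages and incorrect ones negative advantages, relative to how the other samples for the same prompt performed.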
Predicting narrative similarity can be understood as an inherently interpretive task: different, equally valid readings of the same text can produce divergent interpretations and thus different simila...
App store reviews provide a constant flow of real user feedback that can help improve software requirements. However, these reviews are often messy, informal, and difficult to analyze manually at scal...
Large language model (LLM) coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address such issues through r...
Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations; yet most mitigation work focuses on open-source models for post-hoc detection and parameter ed...
Large Vision-Language Models (LVLMs) frequently suffer from severe hallucination issues. Existing mitigation strategies predominantly rely on isolated, single-step states to enhance visual focus or su...
Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated frameworks ...