Advancing Developer Knowledge Quality in Human-AI Ecosystems
A lifecycle-driven thesis on engineering complete, reproducible, evolvable, and trustworthy developer knowledge in the age of Stack Overflow, LLMs, and human-AI software engineering.
Thesis Summary
Modern software development operates within a hybrid human-AI ecosystem, where developer knowledge is continuously created, refined, and reused through platforms such as Stack Overflow and through AI-assisted development tools. However, this ecosystem suffers from persistent quality deficiencies: incomplete problem descriptions, irreproducible issues, inconsistent evolution through edits, and vulnerable code snippets.
This thesis advances the central idea that developer knowledge quality is not merely an emergent outcome of community interaction. Instead, it can be systematically engineered across the knowledge lifecycle. The thesis proposes an AI-augmented knowledge quality framework that combines machine-learning-based detection and prediction with LLM-based generation, verification, and repair.
AI-Augmented Knowledge Quality Framework
The thesis organizes developer knowledge quality around four lifecycle pillars. Each pillar targets a fundamental failure point in human-AI developer knowledge ecosystems; a minimal sketch of the pillars as sequential quality gates follows the list below.
Formation
Ensure problem descriptions contain the required context and code before submission.
Reproducibility
Estimate whether reported programming issues can be reproduced reliably.
Evolution
Guide collaborative editing and evaluate whether accepted edits truly improve quality.
Trust
Detect and repair vulnerabilities in real-world, noisy, and non-parsable code snippets.
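For readers who think in code, the sketch below models the four pillars as sequential quality gates over a draft post. It is illustrative only: every function name and string heuristic here is a hypothetical stand-in, and the thesis's actual framework uses ML classifiers and LLM pipelines at each gate.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical quality gates; the real framework uses ML classifiers and
# LLM pipelines at each stage rather than these crude string heuristics.
def formation_ok(post: str) -> bool:
    return any(tok in post for tok in ("def ", "print(", "import "))  # code present?

def reproducibility_ok(post: str) -> bool:
    return "Traceback" in post or "version" in post  # enough context to reproduce?

def evolution_ok(post: str) -> bool:
    return True  # placeholder for edit-quality prediction

def trust_ok(post: str) -> bool:
    return "eval(" not in post  # placeholder for vulnerability scanning

PILLARS: dict[str, Callable[[str], bool]] = {
    "formation": formation_ok,
    "reproducibility": reproducibility_ok,
    "evolution": evolution_ok,
    "trust": trust_ok,
}

@dataclass
class QualityReport:
    findings: dict = field(default_factory=dict)

def assess(post: str) -> QualityReport:
    report = QualityReport()
    for pillar, gate in PILLARS.items():
        report.findings[pillar] = gate(post)
    return report

draft = "TypeError on line 3: print(len(None)) fails under Python version 3.12."
print(assess(draft).findings)
```

Running the sketch yields a per-pillar pass/fail report, mirroring how the framework flags quality issues at each stage of the lifecycle.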
Improving Knowledge Completeness: Identifying Questions Requiring Code Snippets
This study investigates when Stack Overflow questions require code snippets and what happens when such required code is missing. It develops machine learning models to detect questions that need code and introduces CSChecker, a real-time support tool for improving question completeness.
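As a rough, hypothetical illustration of the detection task (not CSChecker's actual feature set or model), a generic text classifier over question bodies might look like this, assuming scikit-learn is available and using invented toy labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data: 1 = the question needs a code snippet, 0 = it does not.
questions = [
    "Why does my function throw a NullPointerException when I call it?",
    "What is the difference between REST and GraphQL?",
    "My regex fails on this input string, what am I doing wrong?",
    "Which book should I read to learn about compilers?",
]
needs_code = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(questions, needs_code)

# Flag a draft at formulation time, before the question is submitted.
draft = "My loop crashes with an IndexError on the last iteration"
print(model.predict([draft]))
```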
Core Contribution
Detects missing required code during question formulation using text-based ML features.
Recognition
Dummy recognition text: Award / venue / distinction will be added here.
Replication Package
Replace with Study 1 replication package link
GENCNIPPET: Automated Generation of Contextual Code Snippets
This study introduces GENCNIPPET, an LLM-powered approach for generating representative code snippets for programming questions that need code. It evaluates foundation and fine-tuned models and motivates practical assistance for users who struggle to construct minimal examples.
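The generation step could be sketched as follows, assuming an OpenAI-style chat API as a stand-in; the prompt text, model name, and helper function are hypothetical, and GENCNIPPET's actual prompts and fine-tuned models are described in the study.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the study's actual prompts and models differ.
PROMPT = (
    "A Stack Overflow question lacks a code example. Generate a minimal, "
    "self-contained snippet that reproduces the described problem.\n\n"
    "Title: {title}\nBody: {body}"
)

def generate_snippet(title: str, body: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(title=title, body=body)}],
    )
    return response.choices[0].message.content

print(generate_snippet(
    "Pandas merge drops rows unexpectedly",
    "Merging two DataFrames on a string key silently loses rows with trailing spaces.",
))
```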
Core Contribution
Generates contextual example code to improve question completeness and clarity.
Recognition
Dummy recognition text: MSR / award / invited extension placeholder.
Replication Package
Replace with Study 2 replication package link
Improving Knowledge Reproducibility: Understanding and Estimating Issue Reproduction
This study examines why programming issues reported in Stack Overflow questions fail to reproduce. It identifies reproducibility challenges, validates them with practitioners, and develops predictive models for estimating whether a reported issue is reproducible.
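A toy version of the estimation step might feed hand-crafted presence/absence signals into a classifier, as below; the feature names and labeled examples are invented, and the study's real feature set and models differ.

```python
from sklearn.ensemble import RandomForestClassifier

def features(q: dict) -> list:
    # Invented presence/absence signals for a reported issue.
    return [
        int(q["has_code"]),
        int(q["has_stack_trace"]),
        int(q["lists_versions"]),
        int(q["provides_input_data"]),
    ]

# Invented labeled questions: 1 = the issue could be reproduced, 0 = it could not.
train = [
    ({"has_code": 1, "has_stack_trace": 1, "lists_versions": 1, "provides_input_data": 1}, 1),
    ({"has_code": 1, "has_stack_trace": 0, "lists_versions": 0, "provides_input_data": 0}, 0),
    ({"has_code": 0, "has_stack_trace": 1, "lists_versions": 0, "provides_input_data": 0}, 0),
    ({"has_code": 1, "has_stack_trace": 1, "lists_versions": 0, "provides_input_data": 1}, 1),
]
X = [features(q) for q, _ in train]
y = [label for _, label in train]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

new_q = {"has_code": 1, "has_stack_trace": 1, "lists_versions": 1, "provides_input_data": 0}
print("estimated reproducibility:", clf.predict_proba([features(new_q)])[0][1])
```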
Core Contribution
Moves beyond code executability to issue-level reproducibility estimation.
Recognition
Dummy recognition text: Journal publication / best paper / award placeholder.
Replication Package
Replace with Study 3 replication package link
EditEx: Predicting and Preventing Low-Quality Edits
This study introduces EditEx, a machine-learning-based framework that predicts potentially rejected edits and explains likely rejection reasons. It supports proactive intervention before low-quality edits enter the collaborative knowledge evolution process.
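Schematically, prediction plus explanation could look like the sketch below, with a linear model's coefficients standing in for EditEx's rejection-reason explanations; all feature names and training edits are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["chars_changed", "touches_code", "adds_signature", "fixes_formatting"]

# Invented labeled edits: 1 = rejected by moderators, 0 = accepted.
X = np.array([
    [400, 1, 1, 0],   # large rewrite that adds a signature -> rejected
    [12,  0, 0, 1],   # small formatting fix -> accepted
    [350, 1, 0, 0],   # sweeping code rewrite -> rejected
    [25,  1, 0, 1],   # small code formatting fix -> accepted
])
y = np.array([1, 0, 1, 0])

clf = LogisticRegression(max_iter=1000).fit(X, y)

edit = np.array([[300, 1, 1, 0]])
print("rejection risk:", clf.predict_proba(edit)[0][1])
for name, weight in zip(FEATURES, clf.coef_[0]):
    print(f"  {name}: {weight:+.3f}")  # crude proxy for likely rejection reasons
```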
Core Contribution
Transforms edit moderation from reactive rejection to proactive quality guidance.
Recognition
Dummy recognition text: EMSE / ICSE Journal First / award placeholder.
Replication Package
Replace with Study 4 replication package link
Do Accepted Edits Improve Stack Overflow Answers?
This study evaluates whether accepted edits actually improve answer quality across multiple dimensions, including relevance, usability, complexity, maintainability, vulnerability, performance, and readability. It challenges the assumption that accepted edits always improve intrinsic knowledge quality.
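The evaluation idea, scoring each dimension before and after an edit and inspecting the deltas, can be illustrated with crude proxy metrics like these; the study's actual per-dimension measurements are far more rigorous.

```python
def proxy_metrics(answer: str) -> dict:
    """Crude per-dimension proxies; higher is better on every axis."""
    words = answer.split()
    return {
        "readability": -len(words) / max(answer.count("."), 1),  # shorter sentences
        "complexity": -answer.count("for ") - answer.count("if "),  # fewer branches/loops
        "has_explanation": float(len(words) > 20),
    }

before = "Use eval(input()). for x in data: if x: print(x)"
after = (
    "Avoid eval on user input; parse it explicitly instead. "
    "For example, int(input()) safely converts a numeric string, "
    "and a list comprehension replaces the manual loop."
)

b, a = proxy_metrics(before), proxy_metrics(after)
for dim in b:
    print(f"{dim}: {a[dim] - b[dim]:+.2f}")  # positive delta = improvement on that axis
```

A mixed delta profile like this one is exactly the phenomenon the study documents: an edit can be socially accepted while improving some quality dimensions and degrading others.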
Core Contribution
Separates socially accepted edits from measurable multi-dimensional quality improvement.
Recognition
Dummy recognition text: ICSME / distinguished recognition / journal invitation placeholder.
Replication Package
Replace with Study 5 replication package link
SafeSnippet: Detecting and Repairing Vulnerabilities in Community Code
This study introduces SafeSnippet, an LLM-powered framework for detecting and repairing CWE vulnerabilities in both parsable and non-parsable code snippets. It expands trustworthiness support beyond traditional static analysis and addresses security risks in real-world developer knowledge.
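The detect-and-repair loop for possibly non-parsable snippets could be sketched like this, again assuming an OpenAI-style chat API as a stand-in; SafeSnippet's actual prompts, retrieval corpus, and CWE handling are specified in the study.

```python
import ast
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def repair_snippet(snippet: str, model: str = "gpt-4o-mini") -> str:
    """Detect a likely CWE weakness and propose a minimally changed fix."""
    try:
        ast.parse(snippet)
        parse_note = "The snippet parses as Python."
    except SyntaxError:  # also catches IndentationError for pasted fragments
        parse_note = "The snippet is a non-parsable fragment; repair it as-is."
    prompt = (
        f"{parse_note}\n"
        "Identify any CWE weakness in this community code snippet and return "
        f"a minimally changed, safe version:\n\n{snippet}"
    )
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# A fragment copied from inside a function: the leading indentation makes it
# non-parsable on its own, and string concatenation into os.system is a
# classic CWE-78 (OS command injection) smell.
print(repair_snippet('    os.system("ping " + host)'))
```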
Core Contribution
Uses LLMs and retrieval-augmented support to verify and repair vulnerable snippets.
Recognition
Dummy recognition text: Security impact / award / artifact placeholder.
Replication Package
Replace with Study 6 replication package link
Replication Packages
Replace the dummy links below with the final GitHub, Zenodo, or institutional repository links.