Advancing Developer Knowledge Quality in Human-AI Ecosystems
A lifecycle-driven thesis on engineering complete, reproducible, evolvable, and trustworthy developer knowledge in the age of Stack Overflow, LLMs, and human-AI software engineering.
Thesis Summary
Modern software development operates within a hybrid human-AI ecosystem, where developer knowledge is continuously created, refined, and reused through platforms such as Stack Overflow and through AI-assisted development tools. However, this ecosystem suffers from persistent quality deficiencies: incomplete problem descriptions, irreproducible issues, inconsistent evolution through edits, and vulnerable code snippets.
This thesis advances the central idea that developer knowledge quality is not merely an emergent outcome of community interaction. Instead, it can be systematically engineered across the knowledge lifecycle. The thesis proposes an AI-augmented knowledge quality framework that combines machine-learning-based detection and prediction with LLM-based generation, verification, and repair.
AI-Augmented Knowledge Quality Framework
The thesis organizes developer knowledge quality around four lifecycle pillars. Each pillar targets a fundamental failure point in human-AI developer knowledge ecosystems; a minimal sketch of the pillars as sequential quality gates follows the list below.
Formation
Ensure problem descriptions contain the required context and code before submission.
Reproducibility
Estimate whether reported programming issues can be reproduced reliably.
Evolution
Guide collaborative editing and evaluate whether accepted edits truly improve quality.
Trust
Detect and repair vulnerabilities in real-world, noisy, and non-parsable code snippets.
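For readers who think in code, the sketch below models the four pillars as sequential quality gates over a draft post. It is illustrative only: every function name and string heuristic here is a hypothetical stand-in, and the thesis's actual framework uses ML classifiers and LLM pipelines at each gate.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical quality gates; the real framework uses ML classifiers and
# LLM pipelines at each stage rather than these crude string heuristics.
def formation_ok(post: str) -> bool:
    return any(tok in post for tok in ("def ", "print(", "import "))  # code present?

def reproducibility_ok(post: str) -> bool:
    return "Traceback" in post or "version" in post  # enough context to reproduce?

def evolution_ok(post: str) -> bool:
    return True  # placeholder for edit-quality prediction

def trust_ok(post: str) -> bool:
    return "eval(" not in post  # placeholder for vulnerability scanning

PILLARS: dict[str, Callable[[str], bool]] = {
    "formation": formation_ok,
    "reproducibility": reproducibility_ok,
    "evolution": evolution_ok,
    "trust": trust_ok,
}

@dataclass
class QualityReport:
    findings: dict = field(default_factory=dict)

def assess(post: str) -> QualityReport:
    report = QualityReport()
    for pillar, gate in PILLARS.items():
        report.findings[pillar] = gate(post)
    return report

draft = "TypeError on line 3: print(len(None)) fails under Python version 3.12."
print(assess(draft).findings)
```

Running the sketch yields a per-pillar pass/fail report, mirroring how the framework flags quality issues at each stage of the lifecycle.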
Improving Knowledge Completeness: Identifying Questions Requiring Code Snippets
This study investigates when Stack Overflow questions require code snippets and what happens when such required code is missing. It develops machine learning models to detect questions that need code and introduces CSChecker, a real-time support tool for improving question completeness.
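As a rough, hypothetical illustration of the detection task (not CSChecker's actual feature set or model), a generic text classifier over question bodies might look like this, assuming scikit-learn is available and using invented toy labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data: 1 = the question needs a code snippet, 0 = it does not.
questions = [
    "Why does my function throw a NullPointerException when I call it?",
    "What is the difference between REST and GraphQL?",
    "My regex fails on this input string, what am I doing wrong?",
    "Which book should I read to learn about compilers?",
]
needs_code = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(questions, needs_code)

# Flag a draft at formulation time, before the question is submitted.
draft = "My loop crashes with an IndexError on the last iteration"
print(model.predict([draft]))
```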
Core Contribution
Detects missing required code during question formulation using text-based ML features.
Recognition
Dummy recognition text: Award / venue / distinction will be added here.
Replication Package
Replace with Study 1 replication package link
GENCNIPPET: Automated Generation of Contextual Code Snippets
This study introduces GENCNIPPET, an LLM-powered approach for generating representative code snippets for programming questions that need code. It evaluates foundation and fine-tuned models and motivates practical assistance for users who struggle to construct minimal examples.
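The generation step could be sketched as follows, assuming an OpenAI-style chat API as a stand-in; the prompt text, model name, and helper function are hypothetical, and GENCNIPPET's actual prompts and fine-tuned models are described in the study.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the study's actual prompts and models differ.
PROMPT = (
    "A Stack Overflow question lacks a code example. Generate a minimal, "
    "self-contained snippet that reproduces the described problem.\n\n"
    "Title: {title}\nBody: {body}"
)

def generate_snippet(title: str, body: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(title=title, body=body)}],
    )
    return response.choices[0].message.content

print(generate_snippet(
    "Pandas merge drops rows unexpectedly",
    "Merging two DataFrames on a string key silently loses rows with trailing spaces.",
))
```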
Core Contribution
Generates contextual example code to improve question completeness and clarity.
Recognition
Dummy recognition text: MSR / award / invited extension placeholder.
Replication Package
Replace with Study 2 replication package link
Improving Knowledge Reproducibility: Understanding and Estimating Issue Reproduction
This study examines why programming issues reported in Stack Overflow questions fail to reproduce. It identifies reproducibility challenges, validates them with practitioners, and develops predictive models for estimating whether a reported issue is reproducible.
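A toy version of the estimation step might feed hand-crafted presence/absence signals into a classifier, as below; the feature names and labeled examples are invented, and the study's real feature set and models differ.

```python
from sklearn.ensemble import RandomForestClassifier

def features(q: dict) -> list:
    # Invented presence/absence signals for a reported issue.
    return [
        int(q["has_code"]),
        int(q["has_stack_trace"]),
        int(q["lists_versions"]),
        int(q["provides_input_data"]),
    ]

# Invented labeled questions: 1 = the issue could be reproduced, 0 = it could not.
train = [
    ({"has_code": 1, "has_stack_trace": 1, "lists_versions": 1, "provides_input_data": 1}, 1),
    ({"has_code": 1, "has_stack_trace": 0, "lists_versions": 0, "provides_input_data": 0}, 0),
    ({"has_code": 0, "has_stack_trace": 1, "lists_versions": 0, "provides_input_data": 0}, 0),
    ({"has_code": 1, "has_stack_trace": 1, "lists_versions": 0, "provides_input_data": 1}, 1),
]
X = [features(q) for q, _ in train]
y = [label for _, label in train]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

new_q = {"has_code": 1, "has_stack_trace": 1, "lists_versions": 1, "provides_input_data": 0}
print("estimated reproducibility:", clf.predict_proba([features(new_q)])[0][1])
```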
Core Contribution
Moves beyond code executability to issue-level reproducibility estimation.
Recognition
Dummy recognition text: Journal publication / best paper / award placeholder.
Replication Package
Replace with Study 3 replication package link
EditEx: Predicting and Preventing Low-Quality Edits
This study introduces EditEx, a machine-learning-based framework that predicts potentially rejected edits and explains likely rejection reasons. It supports proactive intervention before low-quality edits enter the collaborative knowledge evolution process.
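Schematically, prediction plus explanation could look like the sketch below, with a linear model's coefficients standing in for EditEx's rejection-reason explanations; all feature names and training edits are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["chars_changed", "touches_code", "adds_signature", "fixes_formatting"]

# Invented labeled edits: 1 = rejected by moderators, 0 = accepted.
X = np.array([
    [400, 1, 1, 0],   # large rewrite that adds a signature -> rejected
    [12,  0, 0, 1],   # small formatting fix -> accepted
    [350, 1, 0, 0],   # sweeping code rewrite -> rejected
    [25,  1, 0, 1],   # small code formatting fix -> accepted
])
y = np.array([1, 0, 1, 0])

clf = LogisticRegression(max_iter=1000).fit(X, y)

edit = np.array([[300, 1, 1, 0]])
print("rejection risk:", clf.predict_proba(edit)[0][1])
for name, weight in zip(FEATURES, clf.coef_[0]):
    print(f"  {name}: {weight:+.3f}")  # crude proxy for likely rejection reasons
```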
Core Contribution
Transforms edit moderation from reactive rejection to proactive quality guidance.
Recognition
Dummy recognition text: EMSE / ICSE Journal First / award placeholder.
Replication Package
Replace with Study 4 replication package link
Do Accepted Edits Improve Stack Overflow Answers?
This study evaluates whether accepted edits actually improve answer quality across multiple dimensions, including relevance, usability, complexity, maintainability, vulnerability, performance, and readability. It challenges the assumption that accepted edits always improve intrinsic knowledge quality.
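The evaluation idea, scoring each dimension before and after an edit and inspecting the deltas, can be illustrated with crude proxy metrics like these; the study's actual per-dimension measurements are far more rigorous.

```python
def proxy_metrics(answer: str) -> dict:
    """Crude per-dimension proxies; higher is better on every axis."""
    words = answer.split()
    return {
        "readability": -len(words) / max(answer.count("."), 1),  # shorter sentences
        "complexity": -answer.count("for ") - answer.count("if "),  # fewer branches/loops
        "has_explanation": float(len(words) > 20),
    }

before = "Use eval(input()). for x in data: if x: print(x)"
after = (
    "Avoid eval on user input; parse it explicitly instead. "
    "For example, int(input()) safely converts a numeric string, "
    "and a list comprehension replaces the manual loop."
)

b, a = proxy_metrics(before), proxy_metrics(after)
for dim in b:
    print(f"{dim}: {a[dim] - b[dim]:+.2f}")  # positive delta = improvement on that axis
```

A mixed delta profile like this one is exactly the phenomenon the study documents: an edit can be socially accepted while improving some quality dimensions and degrading others.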
Core Contribution
Separates socially accepted edits from measurable multi-dimensional quality improvement.
Recognition
Dummy recognition text: ICSME / distinguished recognition / journal invitation placeholder.
Replication Package
Replace with Study 5 replication package link
SafeSnippet: Detecting and Repairing Vulnerabilities in Community Code
This study introduces SafeSnippet, an LLM-powered framework for detecting and repairing CWE vulnerabilities in both parsable and non-parsable code snippets. It expands trustworthiness support beyond traditional static analysis and addresses security risks in real-world developer knowledge.
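The detect-and-repair loop for possibly non-parsable snippets could be sketched like this, again assuming an OpenAI-style chat API as a stand-in; SafeSnippet's actual prompts, retrieval corpus, and CWE handling are specified in the study.

```python
import ast
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def repair_snippet(snippet: str, model: str = "gpt-4o-mini") -> str:
    """Detect a likely CWE weakness and propose a minimally changed fix."""
    try:
        ast.parse(snippet)
        parse_note = "The snippet parses as Python."
    except SyntaxError:  # also catches IndentationError for pasted fragments
        parse_note = "The snippet is a non-parsable fragment; repair it as-is."
    prompt = (
        f"{parse_note}\n"
        "Identify any CWE weakness in this community code snippet and return "
        f"a minimally changed, safe version:\n\n{snippet}"
    )
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# A fragment copied from inside a function: the leading indentation makes it
# non-parsable on its own, and string concatenation into os.system is a
# classic CWE-78 (OS command injection) smell.
print(repair_snippet('    os.system("ping " + host)'))
```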
Core Contribution
Uses LLMs and retrieval-augmented support to verify and repair vulnerable snippets.
Recognition
Dummy recognition text: Security impact / award / artifact placeholder.
Replication Package
Replace with Study 6 replication package link
Replication Packages
Replace the dummy links below with the final GitHub, Zenodo, or institutional repository links.