Automated leaderboard system for hackathon evaluation using large language models

Bowen Li, Bohan Cheng, Patrick D. Taylor, Dale A. Osborne, Fengling Han, Robert Shen, Iqbal Gondal

Article ID: 3166
Vol 3, Issue 1, 2025
DOI: https://doi.org/10.54517/cte3166
Received: 18 December 2024; Accepted: 17 February 2025; Available online: 24 February 2025; Issue release: 31 March 2025



Abstract

Evaluating large numbers of hackathon submissions quickly, fairly, and at scale is a persistent challenge. Existing automated grading systems often struggle with bias, limited scalability, and a lack of transparency. In this paper, we present a novel hybrid evaluation framework that leverages large language models (LLMs) and a weighted scoring mechanism to address these issues. Our approach classifies hackathon submissions using LLMs, converts Jupyter notebooks to markdown for consistent analysis, and integrates multiple evaluation factors—from technical quality to video presentations—into a single, balanced score. Through dynamic prompt engineering and iterative refinement against manually benchmarked evaluations, we mitigate prompt design biases and ensure stable, fair outcomes. We validate our method in a multi-campus GenAI and Cybersecurity hackathon, demonstrating improved scalability, reduced evaluator workload, and transparent feedback. Our results highlight the potential of hybrid AI-driven frameworks to enhance fairness, adaptability, and efficiency in large-scale educational and competitive environments.
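To make the pipeline described above concrete, the following minimal Python sketch shows one way the two technical steps could be wired together: rendering a submitted Jupyter notebook to markdown with nbconvert, then folding per-criterion ratings into a single weighted score. This is an illustrative sketch, not the authors' implementation; the score_with_llm function, the criterion names, and the weights are assumptions introduced here for clarity.

    # Minimal sketch of the evaluation flow described in the abstract.
    # Assumptions: score_with_llm() is a hypothetical placeholder for an
    # LLM-based rater; criteria and weights are illustrative only.

    from nbconvert import MarkdownExporter

    def notebook_to_markdown(path: str) -> str:
        """Render a Jupyter notebook as markdown so every submission is analysed in one format."""
        body, _resources = MarkdownExporter().from_filename(path)
        return body

    def score_with_llm(markdown: str, criterion: str) -> float:
        """Hypothetical placeholder: ask an LLM to rate the markdown on one criterion (0-10)."""
        raise NotImplementedError("plug in an LLM provider here")

    # Illustrative weights; the paper's actual criteria and weights may differ.
    WEIGHTS = {"technical_quality": 0.5, "presentation": 0.3, "video": 0.2}

    def weighted_score(path: str) -> float:
        """Combine per-criterion LLM ratings into one balanced score for the leaderboard."""
        md = notebook_to_markdown(path)
        return sum(w * score_with_llm(md, c) for c, w in WEIGHTS.items())

In such a setup, the weights would be tuned against the manually benchmarked evaluations mentioned in the abstract, and the per-criterion prompts refined iteratively to reduce prompt design bias.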

Keywords

artificial intelligence; LLM-driven assessment; prompt engineering





Copyright (c) 2025 Author(s)

License URL: https://creativecommons.org/licenses/by/4.0/