
It feels a bit surreal looking back now. What started as an idea – a nagging feeling that we, the students at K.R. Mangalam University, needed a better, more unified space for coding competitions – has grown into A*. Today, A* isn't just code; it's a platform actively used by over 200 K.R. Mangalam University students and has already been the battleground for more than two official university coding competitions.
This wasn't an overnight project. It's been a journey filled with late nights, tough technical decisions, frustrating bugs, and moments of "it actually works!" relief. I wanted to share some of that journey, the technical nuts and bolts, the challenges faced, and the reasoning behind how A* came to be.
The Spark: Why Reinvent the Wheel?
Let's be honest, there are plenty of great online judges out there. But we lacked a centralized K.R. Mangalam University hub. Contests were run ad-hoc, tracking participation was manual, and there wasn't a persistent place to practice problems specifically relevant to our curriculum or interests, or even just to see how you ranked against your K.R. Mangalam University peers over time.
The goal wasn't necessarily to build the best judge, but to build our judge. A platform tailored for K.R. Mangalam University, easy for organizers to manage, and seamless for students to participate in official events and casual practice.
Phase 1: Laying the Foundation - The Tech Stack Choices
Every project starts with foundational choices. I needed something fast to develop with, scalable enough for our initial needs, and flexible.
- Backend: Python & Flask
- Why: Python's readability and vast ecosystem were major draws. Flask, being a microframework, gave me the freedom to structure the application exactly how I wanted without excessive boilerplate. It's fantastic for building APIs, which I knew would be the core of the platform. Libraries like pymongo, requests, flask-session, and python-dotenv integrated easily.
- Database: MongoDB
- Why: This was a key decision. Much of the platform's data doesn't map neatly onto relational tables. User profiles might evolve, problem descriptions involve rich text (HTML), and contest leaderboards needed to store complex, nested information (user scores, per-user problem status, submission attempts, best submission time/ID per problem). MongoDB's document model felt like a natural fit. Storing a contest leaderboard as a single document (or a sub-document) seemed more straightforward than managing multiple relational tables, especially for read-heavy operations like displaying standings. Schemaless flexibility was a bonus during early development.
- Session Management: Redis
- Why: Speed. Session lookups happen on almost every authenticated request. Redis, being an in-memory key-value store, is lightning fast for this. Flask-Session provides excellent integration, making it easy to store session data server-side and keep cookies minimal and secure (a minimal configuration sketch follows this list).
- Frontend: Jinja2, HTML, CSS, Vanilla JS
- Why: Keep it simple initially. Flask integrates tightly with Jinja2 for server-side templating. Standard HTML/CSS for structure and style. For client-side interactions (like submitting code, polling for results, simple UI updates), vanilla JavaScript felt sufficient to avoid pulling in a heavy frontend framework early on. For rich text, Flask-CKEditor provided an easy way to integrate a powerful WYSIWYG editor for problem descriptions and announcements.
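To make the Redis-backed session setup concrete, here is a minimal sketch of wiring Flask and Flask-Session to Redis. The Redis URL, secret key, and specific config values are placeholders, not A*'s actual configuration:

```python
# Minimal sketch of server-side sessions with Flask-Session + Redis.
# The secret key and Redis URL below are placeholders, not A*'s real config.
import os

import redis
from flask import Flask
from flask_session import Session

app = Flask(__name__)
app.config["SECRET_KEY"] = os.environ.get("SECRET_KEY", "dev-only-secret")
app.config["SESSION_TYPE"] = "redis"            # keep session data server-side in Redis
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_USE_SIGNER"] = True         # sign the session cookie
app.config["SESSION_REDIS"] = redis.from_url(
    os.environ.get("REDIS_URL", "redis://localhost:6379")
)

Session(app)  # only a session ID travels in the cookie; the payload lives in Redis
```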
Phase 2: Core Features & Technical Hurdles
With the stack decided, it was time to build the core functionality.
- Authentication: Rolling our own auth is generally a bad idea. We decided early on to integrate with an existing OAuth2 provider (accounts.om-mishra.com in this case, but could be any standard provider). The flow involves: redirecting the user -> handling the callback with an authorization code -> exchanging the code server-side for user info -> using mongodb_client.users.update_one with upsert=True, $set for updating last_logged_in_at, and $setOnInsert to create the user record only if they don't exist (ensuring atomicity). The user's ID and role are then stored in the Redis-backed Flask session.
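In pymongo, that create-or-update step looks roughly like the sketch below. The oauth_id match key, the extra profile fields, and the default role are illustrative assumptions, not A*'s exact schema:

```python
# Sketch of the post-OAuth upsert described above; field names beyond those
# mentioned in the post are illustrative.
from datetime import datetime, timezone
from uuid import uuid4

def upsert_user(mongodb_client, oauth_profile):
    """Create the user on first login, refresh last_logged_in_at on every login."""
    now = datetime.now(timezone.utc)
    mongodb_client.users.update_one(
        {"oauth_id": oauth_profile["id"]},       # match on the provider's stable ID
        {
            "$set": {"last_logged_in_at": now},  # always refreshed
            "$setOnInsert": {                    # written only when the user is new
                "user_id": str(uuid4()),
                "display_name": oauth_profile.get("name"),
                "email": oauth_profile.get("email"),
                "role": "student",
                "created_at": now,
            },
        },
        upsert=True,                             # one atomic create-or-update
    )
```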
- Problem Management: Problems needed structure. A MongoDB document for each problem stores: problem_id (UUID), problem_title, problem_description (HTML from CKEditor), problem_stdin, problem_stdout, problem_level, problem_tags (array), visibility flags (is_visible, is_part_of_competition), competition_id (if applicable), and problem_statistics (counters for submissions, accepts, etc.). Admins needed a simple interface to create/edit these.
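For a sense of shape, an illustrative problem document (all values made up) looks something like this:

```python
# Illustrative shape of a problem document; the values are made up.
example_problem = {
    "problem_id": "a2f6c0d1-0000-0000-0000-000000000000",  # UUID string
    "problem_title": "Balanced Brackets",
    "problem_description": "<p>Given a string of brackets...</p>",  # HTML from CKEditor
    "problem_stdin": "3\n()[]\n([)]\n{[]}",
    "problem_stdout": "YES\nNO\nYES",
    "problem_level": "Easy",
    "problem_tags": ["stack", "strings"],
    "is_visible": True,
    "is_part_of_competition": False,
    "competition_id": None,
    "problem_statistics": {
        "total_submissions": 0,
        "total_accepted_submissions": 0,
        "total_rejected_submissions": 0,
    },
}
```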
- The Judging Engine - The Elephant in the Room: This was the most critical technical challenge. Building a secure, multi-language, resource-limited code execution sandbox is hard. Doing it wrong opens massive security holes.
- The Decision: Outsource it. We chose the Judge0 CE API, which handles sandboxing, multiple languages, and time/memory limits behind a clean interface.
- The Workflow:
- User submits code (language, source code) via an API endpoint (/api/v1/submissions).
- Rate Limiting: Implement a simple check against the submissions collection in MongoDB to prevent spamming (e.g., check created_at timestamp for the user's last submission).
- Backend Prep: The Flask backend receives the submission. It fetches the corresponding problem_stdin and problem_stdout from MongoDB. Crucially, stdin and expected_output must be Base64 encoded before sending to Judge0 when using the base64_encoded=true flag (recommended for handling special characters/newlines reliably).
- Judge0 Submission: Post the source_code, Base64 stdin, Base64 expected_output, and language_id (mapped from our language string like 'python' to Judge0's ID like 71) to Judge0's /submissions endpoint.
- Initial DB Record: If Judge0 accepts the submission (HTTP 201), it returns a token. Immediately, I insert a record into the submissions collection in MongoDB with submission_id (our internal UUID), judge0_submission_id (the token), user_id, problem_id, code, language, initial submission_status ("In Queue", code 0), timestamps, etc.
- Similarity Check: Before inserting, I added a check using difflib.SequenceMatcher. It iterates through existing submissions for the same problem in the database. If the submitted code's ratio against any existing code (from a different user) exceeds a threshold (e.g., 0.8), I set an is_similar: True flag on the new submission document. This is a basic heuristic, not foolproof plagiarism detection, but flags potentially suspicious submissions for review.
- Polling for Results: Judging isn't instant. The frontend uses JavaScript (fetch + setTimeout or setInterval) to periodically call our backend endpoint (/api/v1/submissions/<submission_id>).
- Backend Polling Judge0: This backend endpoint takes our submission_id, finds the corresponding judge0_submission_id (token) from MongoDB, and makes a GET request to Judge0's /submissions/<token> endpoint (with fields=status,stdout,stderr,time,memory).
- Processing Results: When Judge0 returns a final status (e.g., status ID 3 for "Accepted", 4 for "Wrong Answer", etc.), the backend updates the submission document in MongoDB with the final status, time, memory, etc.
- Test Case Comparison: For non-accepted statuses where stdout is available, I added logic to compare the returned stdout (after Base64 decoding if necessary) line-by-line with the problem_stdout stored in MongoDB. This calculates the "X/Y test cases passed" string.
- Updating Stats: On a final status, increment the relevant counters (total_submissions, total_accepted_submissions or total_rejected_submissions) in the corresponding problems document using MongoDB's atomic $inc operator.
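Condensed into code, the submit-then-poll flow above looks roughly like the sketch below. The endpoint paths and fields follow Judge0's public API, but the base URL, auth header, helper structure, and similarity threshold are simplified stand-ins rather than A*'s exact implementation:

```python
# Condensed sketch of the submit-then-poll flow. Judge0 endpoint paths and fields
# follow its public API; the base URL, auth header, and collection layout are
# simplified placeholders.
import base64
import difflib
from datetime import datetime, timezone
from uuid import uuid4

import requests

JUDGE0_URL = "https://judge0.example.com"             # placeholder
HEADERS = {"X-Auth-Token": "change-me"}               # depends on how Judge0 is hosted
LANGUAGE_IDS = {"python": 71, "cpp": 54, "java": 62}  # our name -> Judge0 language_id

def b64(text: str) -> str:
    return base64.b64encode(text.encode()).decode()

def looks_similar(code: str, existing_codes, threshold: float = 0.8) -> bool:
    # Basic heuristic only: flag for review, never auto-reject.
    return any(
        difflib.SequenceMatcher(None, code, other).ratio() > threshold
        for other in existing_codes
    )

def submit(db, user_id, problem, code, language):
    response = requests.post(
        f"{JUDGE0_URL}/submissions?base64_encoded=true",
        headers=HEADERS,
        json={
            "source_code": b64(code),
            "language_id": LANGUAGE_IDS[language],
            "stdin": b64(problem["problem_stdin"]),
            "expected_output": b64(problem["problem_stdout"]),
        },
        timeout=10,
    )
    response.raise_for_status()                       # expect HTTP 201 with a token
    token = response.json()["token"]

    other_codes = [
        s["code"]
        for s in db.submissions.find(
            {"problem_id": problem["problem_id"], "user_id": {"$ne": user_id}},
            {"code": 1},
        )
    ]
    submission_id = str(uuid4())
    db.submissions.insert_one({
        "submission_id": submission_id,
        "judge0_submission_id": token,
        "user_id": user_id,
        "problem_id": problem["problem_id"],
        "code": code,
        "language": language,
        "submission_status": "In Queue",
        "is_similar": looks_similar(code, other_codes),
        "created_at": datetime.now(timezone.utc),
    })
    return submission_id

def poll(db, submission_id):
    sub = db.submissions.find_one({"submission_id": submission_id})
    result = requests.get(
        f"{JUDGE0_URL}/submissions/{sub['judge0_submission_id']}",
        headers=HEADERS,
        params={"base64_encoded": "true", "fields": "status,stdout,stderr,time,memory"},
        timeout=10,
    ).json()

    status_id = result["status"]["id"]     # 1 = In Queue, 2 = Processing, 3 = Accepted, ...
    if status_id > 2 and sub["submission_status"] == "In Queue":
        db.submissions.update_one(
            {"submission_id": submission_id},
            {"$set": {
                "submission_status": result["status"]["description"],
                "time": result.get("time"),
                "memory": result.get("memory"),
            }},
        )
        outcome = ("problem_statistics.total_accepted_submissions" if status_id == 3
                   else "problem_statistics.total_rejected_submissions")
        db.problems.update_one(            # atomic counters, no read-modify-write
            {"problem_id": sub["problem_id"]},
            {"$inc": {"problem_statistics.total_submissions": 1, outcome: 1}},
        )
    return result
```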
- Contest Logic: This required careful modeling in MongoDB.
- Contest Document: Stores contest_id, title, start/end times, linked problem_ids (e.g., contest_problems: { first: id1, second: id2, third: id3 }), description, and contest_statistics.
- Registration: A simple API endpoint (/api/v1/contest/register/<contest_id>) uses update_one on the contest document to $push the user_id into the contest_statistics.contest_participants array and $inc the contest_statistics.total_participants counter. Check if the user is already registered first.
- Live Leaderboard (add_competition_submission): This function is triggered after a submission is judged "Accepted" (or any final status for attempt tracking). It first checks: Is the problem part of a competition? Is the contest currently running? Is the user registered for the contest? If yes:
- It updates the contest_statistics.contest_leaderboard field within the contest document. This field is structured as a dictionary (object) where keys are user_ids.
- contest_leaderboard.<user_id>.score: Calculated based on problem points, penalties (e.g., -2 for each incorrect submission before acceptance, capped at 5), and potentially time bonuses (I added a small bonus based on execution time).
- contest_leaderboard.<user_id>.problems.<problem_id>.has_accepted_submission: Boolean flag.
- contest_leaderboard.<user_id>.problems.<problem_id>.number_of_incorrect_submissions: Counter, incremented only if has_accepted_submission is false.
- contest_leaderboard.<user_id>.problems.<problem_id>.submissions_id: Stores the submission_id of the first or best scoring accepted submission for that problem.
- MongoDB's atomic operators are vital here. Updating a specific user's score or problem status within the nested leaderboard structure uses dot notation (e.g., $set: { "contest_statistics.contest_leaderboard.user123.score": new_score }). This avoids race conditions if multiple submissions from different users finish judging concurrently. I initially used an array for the leaderboard but quickly switched to a dictionary keyed by user_id for easier, atomic updates.
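A trimmed-down sketch of the registration and leaderboard updates is below. The point value, penalty cap, and helper signatures are placeholders, but the dot-notation writes illustrate the atomicity point:

```python
# Sketch of contest registration and the nested leaderboard update described above.
# Dot-notation paths keep each write atomic; point values and the penalty cap are
# placeholders, and the time bonus is omitted.
def register(db, contest_id, user_id):
    result = db.contests.update_one(
        {"contest_id": contest_id,
         "contest_statistics.contest_participants": {"$ne": user_id}},  # skip if already registered
        {"$push": {"contest_statistics.contest_participants": user_id},
         "$inc": {"contest_statistics.total_participants": 1}},
    )
    return result.modified_count == 1

def add_competition_submission(db, contest_id, user_id, problem_id, submission_id, accepted):
    prefix = f"contest_statistics.contest_leaderboard.{user_id}"
    problem_prefix = f"{prefix}.problems.{problem_id}"

    contest = db.contests.find_one({"contest_id": contest_id})
    leaderboard = contest["contest_statistics"].get("contest_leaderboard", {})
    entry = leaderboard.get(user_id, {}).get("problems", {}).get(problem_id, {})

    if entry.get("has_accepted_submission"):
        return  # already solved: later submissions don't change the board

    if not accepted:
        # Count the failed attempt (only while the problem is still unsolved).
        db.contests.update_one(
            {"contest_id": contest_id},
            {"$inc": {f"{problem_prefix}.number_of_incorrect_submissions": 1}},
        )
        return

    incorrect = entry.get("number_of_incorrect_submissions", 0)
    penalty = min(incorrect, 5) * 2        # -2 per wrong attempt, capped
    score_delta = 100 - penalty            # placeholder problem value

    db.contests.update_one(
        {"contest_id": contest_id},
        {
            "$set": {
                f"{problem_prefix}.has_accepted_submission": True,
                f"{problem_prefix}.submissions_id": submission_id,
            },
            "$inc": {f"{prefix}.score": score_delta},
        },
    )
```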
Phase 3: Enter the AI - Enhancing with Gemini
With the core platform stable, I wanted to explore how AI could add value without being gimmicky. Google's Gemini API seemed powerful and relatively easy to integrate via simple HTTPS requests.
- AI Problem Generation: Creating unique, well-structured problems is time-consuming for admins.
- The Idea: Use AI to draft problems based on a difficulty level.
- Implementation (/api/v1/ai/create-problem):
- The admin selects 'Easy', 'Medium', or 'Hard'.
- The backend constructs a very detailed prompt for the Gemini API. This includes: requested difficulty, a list of existing problem titles (to encourage uniqueness), strict instructions on the output format (requesting JSON with specific keys: problem_title, problem_description, problem_stdin, problem_stdout, problem_level, problem_tags, solution), instructions to use HTML for the description, requirements for multiple examples and diverse test cases, and critically, a request for a correct Python solution. I set response_mime_type to application/json.
- The Validation Loop: This was crucial. The AI might hallucinate or provide a solution that doesn't match its own test cases. When Gemini returns the JSON:
- Parse the JSON data.
- Extract the generated problem_stdin and the Python solution code.
- Execute the AI's solution locally: write the solution code to temp_solution.py, then run it with Python's subprocess.run(['python3', 'temp_solution.py'], input=problem_stdin, capture_output=True, text=True, check=True, timeout=10).
- Capture the stdout from this actual execution.
- Replace the problem_stdout in the parsed JSON data with this actual stdout. This ensures the provided test cases are solvable by the provided solution.
- Clean up the temporary Python file.
- The validated (and corrected) problem data is then sent back to the frontend to pre-fill the "Create Problem" form for the admin to review and save.
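The heart of that validation loop is small. This is a simplified sketch in which generated is the already-parsed JSON from Gemini:

```python
# Simplified sketch of the validation loop: run the AI's own solution on the AI's
# stdin and overwrite problem_stdout with what the solution actually printed.
import os
import subprocess

def validate_generated_problem(generated: dict) -> dict:
    """Trust the execution, not the model."""
    with open("temp_solution.py", "w") as f:
        f.write(generated["solution"])
    try:
        result = subprocess.run(
            ["python3", "temp_solution.py"],
            input=generated["problem_stdin"],
            capture_output=True,
            text=True,
            check=True,     # raise if the solution exits non-zero
            timeout=10,     # and bail out if it hangs
        )
        # The expected output becomes whatever the solution really produced.
        generated["problem_stdout"] = result.stdout.rstrip("\n")
        return generated
    finally:
        os.remove("temp_solution.py")   # clean up the temporary file either way
```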
- AI Contest Reporting: Leaderboards show scores, but not the story.
- The Idea: Generate a narrative summary and improvement suggestions after a contest.
- Implementation (/contest-results/<contest_id>):
- Triggered when an admin views the results page of an ended contest if the contest_summary and contest_improvement fields are missing in the contest's MongoDB document.
- Gather context: Contest details, final leaderboard data, problem details (titles, difficulty), aggregated submission stats (pass rates).
- Craft a prompt for Gemini asking for a detailed summary (e.g., 500+ words) and improvement suggestions (e.g., 300+ words) based on the provided data, specifically requesting HTML output within a JSON structure ({ "summary": "<h1>...", "improvement": "<h1>..." }).
- Parse the Gemini response (using json.loads or falling back to regex if parsing fails unexpectedly).
- Store the generated HTML strings in the contest document using update_one. Render these directly on the results page. Added retry logic in case the initial generation was too short.
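Stripped of the prompt-building and retry details, that path reduces to something like the sketch below. Here call_gemini and build_prompt stand in for the actual Gemini HTTPS request and prompt construction; both are assumed, not shown:

```python
# Sketch of the contest-report generation path. `call_gemini(prompt)` and
# `build_prompt(contest)` are stand-ins for the real Gemini HTTPS call and prompt
# construction; retry-if-too-short logic is omitted.
import json
import re

def ensure_contest_report(db, contest, call_gemini, build_prompt):
    # Generate only once: skip if the report fields already exist.
    if contest.get("contest_summary") and contest.get("contest_improvement"):
        return contest["contest_summary"], contest["contest_improvement"]

    raw = call_gemini(build_prompt(contest))
    try:
        report = json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: pull the two HTML strings out with a deliberately loose regex.
        match = re.search(
            r'"summary"\s*:\s*"(.*?)"\s*,\s*"improvement"\s*:\s*"(.*?)"\s*}', raw, re.S
        )
        if not match:
            raise ValueError("Could not parse Gemini response")
        report = {"summary": match.group(1), "improvement": match.group(2)}

    db.contests.update_one(
        {"contest_id": contest["contest_id"]},
        {"$set": {"contest_summary": report["summary"],
                  "contest_improvement": report["improvement"]}},
    )
    return report["summary"], report["improvement"]
```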
- AI User Summaries: A small touch for user profiles.
- Implementation (User Profile Page): If the user_summary field is missing for a user, send a simple prompt to Gemini: "Generate a concise, two-sentence plain text summary of a user based on their display name [{name}] and submissions [{basic stats like count, acceptance rate}]. Return ONLY the two sentences..." Requested text/plain output. Store the result in the user document.
Reflections & Ongoing Challenges
- Judge0 Quirks: Handling occasional timeouts, understanding the different status codes, and managing API key rate limits (randomly rotating through a small pool of keys helps) required iteration.
- MongoDB Schema Design: While flexible, deciding on the optimal structure for the leaderboard and efficiently querying nested data needed refinement. Proper indexing (user_id and problem_id in submissions; contest_id in contests/problems) is non-negotiable for performance (a quick pymongo sketch follows this list).
- AI Reliability: Prompt engineering is an art. Getting consistent JSON, desired length, and relevant content from Gemini takes trial and error. The validation step for problem generation is essential. Sometimes the AI output needs manual editing.
- Code Similarity: difflib is basic. It catches simple copy-paste but struggles with refactored code or logic reimplementation. True plagiarism detection is much harder.
- Scalability: For 200 users, the current setup works well. Scaling to thousands would require more aggressive caching (e.g., Redis for global leaderboards, maybe even contest leaderboards during active contests), optimizing database queries further, and potentially scaling the Flask app horizontally with more workers (Gunicorn/Waitress).
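For the indexing point above, the pymongo calls are short. This sketch mirrors the fields named in this post rather than the full production index set, and the connection string and database name are placeholders:

```python
# Indexes on the hot query paths mentioned above; run once at startup or in a
# migration script. Connection string and database name are placeholders.
from pymongo import ASCENDING, MongoClient

db = MongoClient("mongodb://localhost:27017")["astar"]

db.submissions.create_index([("user_id", ASCENDING), ("created_at", ASCENDING)])  # rate limiting, profiles
db.submissions.create_index([("problem_id", ASCENDING)])                          # similarity check, stats
db.problems.create_index([("competition_id", ASCENDING)])                         # contest problem lookups
db.contests.create_index([("contest_id", ASCENDING)], unique=True)                # leaderboard updates
```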
Where We Are Now & What's Next
A* is live, functional, and actively used for K.R. Mangalam University competitions. The core features are stable. The AI integrations, while needing occasional oversight, add genuine value for admins and users.
The journey isn't over. Future plans include UI/UX refinements based on student feedback, exploring team-based competitions, adding more sophisticated analytics, and continually expanding the practice problem library.
Building A* has been an incredible learning experience, blending web development, database design, external API integration, and even a dip into the world of AI. It's rewarding to see something you built being used by your peers, even with all the bumps along the development road. Hopefully, this deep dive gives some insight into the process behind the platform.