GSTN is inviting submissions for the Analytics Hackathon on Developing a Predictive Model in GST for the year 2024. The last date to apply is September 26.

About

The purpose of this Hackathon is to engage Indian students, researchers, and innovators in developing advanced, data-driven AI and ML solutions based on a given dataset. Participants will have access to a comprehensive dataset containing approximately 900,000 records, each with around 21 attributes and target variables. The data is anonymized and meticulously labeled, and it includes training and testing subsets as well as a non-validated subset reserved specifically for final evaluations by GSTN.

Participants are encouraged to use this dataset to design and implement innovative artificial intelligence (AI) and machine learning (ML) algorithms to tackle the stated challenge.

Additionally, this initiative aims to foster collaboration between academia and industry professionals, driving the development of effective and insightful solutions that strengthen the GST analytics framework.

Who can participate?

Indian students or researchers associated with educational institutions, or working professionals associated with Indian startups and companies, can participate in the Hackathon. Each participant must be a citizen of India.

Structure of the hackathon

  • The Hackathon would be organised as an online event with processes for registering participants, accessing the datasets to be used for each problem statement, and submitting the developed prototypes. There would be an offline event with the shortlisted participants for the finale/second round.
  • The participants are expected to form teams of up to five members including at least one team lead. A participant may only register as a member of a single team.
  • The Hackathon would take place over 45 days from the start of registration to the final date for submission of developed prototypes.
  • Participants would receive a dataset containing 9 lakh records with around 21 attributes each. The data is anonymized and labelled, including trained, validated, and non-validated datasets.
  • Before submitting the solution prototype, participants have to upload their code to a GitHub (https://www.github.com) repository and may optionally upload a demo/product video to YouTube.
  • For online submissions, the following required/optional fields are to be shared for evaluation:
    • Idea/Concept
    • Project Description
    • Source Code URL (github.com)
    • Video URL
    • GitHub Unique Source Code Checksum – Steps to create checksum are mentioned in later steps.
    • Project Report
  • The evaluation process of the Hackathon would be overseen by a distinguished panel of jury members comprising experts from the fields of machine learning, data science, and tax administration. The jury would rigorously assess each submission based on predefined criteria to ensure a fair and comprehensive evaluation.

Problem Statement

Given a dataset D, which consists of:

  • Dtrain ∈ R^(m×n): a matrix representing the training data.
  • Dtest ∈ R^(m1×n): a matrix representing the test data.
  • Ytrain ∈ R^(m×1): the corresponding target variable for the training data.
  • Ytest ∈ R^(m1×1): the corresponding target variable for the test data.
  • The objective is to construct a predictive model Fθ(X) → Ypred that accurately estimates the target variable Yi for new, unseen inputs Xi.
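
The setup above can be restated compactly as follows. This is only a notational sketch: it assumes the loss L is evaluated over the whole training set at once, and it uses θ* for the optimized parameters, as in the testing step.

```latex
D_{\text{train}} \in \mathbb{R}^{m \times n}, \quad
Y_{\text{train}} \in \mathbb{R}^{m \times 1}, \quad
D_{\text{test}} \in \mathbb{R}^{m_1 \times n}, \quad
Y_{\text{test}} \in \mathbb{R}^{m_1 \times 1}

\theta^{*} = \arg\min_{\theta} \; L\bigl(Y_{\text{train}},\, F_{\theta}(D_{\text{train}})\bigr),
\qquad
Y_{\text{pred}} = F_{\theta^{*}}(D_{\text{test}})
```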

Steps:

  1. Model Construction:

Define a predictive function Fθ(X) parameterized by θ that maps input features X to predicted outputs Ypred.

The model Fθ(X) should be designed to capture the relationship between the input features and the target variable effectively.

  2. Training:

Optimize the model parameters θ by minimizing a loss function L(Y, Fθ(X)) using the training data Dtrain.

Consider incorporating feature transformations, feature engineering, or feature selection to enhance the model’s predictive performance.

  3. Testing:

Apply the learned model Fθ*(X) (with optimized parameters θ*) to the test data Dtest to generate predictions Ypred for each input Xj ∈ {X1, X2, …, Xm1}.

  4. Performance Optimization:

Evaluate the model’s performance by calculating accuracy or other relevant metrics M on the test predictions Ypred_test.

Refine the model by iteratively adjusting θ or modifying Fθ(X) to improve performance on the chosen evaluation metrics M.

  5. Submission:

Present the predicted outputs Ypred_test along with a detailed report that includes:

    • The modeling approach employed (properly commented code, supporting citations, etc.).
    • The metrics used for evaluation.
    • Key performance indicators as per the defined metrics for the hackathon.

** Kindly refer to the ‘Submission and Expectation’ page before submitting your solutions.
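
The construction, training, testing, and evaluation steps above can be sketched end to end as follows. This is a minimal illustration and not the organisers' method. It assumes a binary target (the announcement does not state the target type), uses a plain logistic model as Fθ, and trains on synthetic data from a hypothetical `make_data` helper because the hackathon dataset is not reproduced here. All function names are illustrative.

```python
import math
import random

def predict_proba(theta, x):
    """F_theta(x): logistic model; theta[0] is the bias term."""
    z = theta[0] + sum(w * v for w, v in zip(theta[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(theta, X, Y):
    """Mean binary cross-entropy, playing the role of L(Y, F_theta(X))."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(X, Y):
        p = predict_proba(theta, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(X)

def train(X, Y, lr=0.5, epochs=300):
    """Batch gradient descent on the log loss; returns the optimized theta."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(X, Y):
            err = predict_proba(theta, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (the metrics M)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def make_data(rows, rng):
    """Synthetic stand-in for the real data: label is 1 when x0 + x1 > 0."""
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(rows)]
    Y = [1 if x[0] + x[1] > 0 else 0 for x in X]
    return X, Y

rng = random.Random(0)
X_train, Y_train = make_data(200, rng)   # stands in for Dtrain, Ytrain
X_test, Y_test = make_data(100, rng)     # stands in for Dtest, Ytest

theta_star = train(X_train, Y_train)
Y_pred_test = [1 if predict_proba(theta_star, x) >= 0.5 else 0 for x in X_test]
metrics = evaluate(Y_test, Y_pred_test)
print(metrics)
```

In a real entry, the hand-written model and optimizer would likely be replaced by a library implementation, and the feature engineering mentioned in the training step would happen before `train` is called.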

TECH STACK FOR BUILDING AI/ML BASED ALGORITHM

  • Participants are encouraged to innovate by developing their own functions f(x) to tackle the given challenge.
  • Participants are free to use any tech stack for model development, so they can work with the tools and technologies they know best when building their solution and deriving the mathematical function for this Hackathon.
  • Participants are encouraged to explore and experiment with diverse ensemble techniques, blending different machine learning algorithms to enhance performance and attain optimal results on test data.
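
As one illustration of the ensemble idea mentioned above, the sketch below blends several models by soft voting (averaging predicted probabilities) and hard voting (majority of thresholded predictions). It assumes a binary target. The three base models are hypothetical hand-written stand-ins for already-trained learners, and the function names are illustrative rather than taken from any library.

```python
def soft_vote(models, x, weights=None):
    """Weighted average of the probabilities returned by each base model."""
    probs = [m(x) for m in models]
    if weights is None:
        weights = [1.0] * len(models)
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

def hard_vote(models, x, threshold=0.5):
    """Majority vote over thresholded predictions; ties resolve to class 0."""
    votes = sum(1 for m in models if m(x) >= threshold)
    return 1 if votes * 2 > len(models) else 0

# Hypothetical base models: each maps a feature vector to P(label = 1).
def model_a(x):
    return 0.9 if x[0] > 0 else 0.2

def model_b(x):
    return 0.6 if x[1] > 0 else 0.4

def model_c(x):
    return min(1.0, max(0.0, 0.5 + 0.5 * (x[0] + x[1])))

models = [model_a, model_b, model_c]
sample = [0.4, -0.2]

blended_probability = soft_vote(models, sample)
majority_label = hard_vote(models, sample)
print(blended_probability, majority_label)
```

Soft voting keeps each model's confidence, while hard voting only counts decisions. Which one works better on the test data is an empirical question for each team.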

How to Register?

Interested candidates can register via this page.

Prizes

The Hackathon offers the following prizes for the top-performing teams:

  1. First Prize: Rs. 25 lakhs
  2. Second Prize: Rs. 12 lakhs
  3. Third Prize: Rs. 7 lakhs
  4. Special Prize of Rs. 5 lakhs for All-Women Teams (in addition to the top three prizes)
  • Prizes would be awarded only if the jury is satisfied that the designed solution is usable as a viable product.
  • Consolation prizes of Rs. 3 lakh, Rs. 2 lakh, Rs. 1.5 lakh and Rs. 1 lakh would be given in lieu of the announced prizes if the jury finds that no model provides a perfect solution to the problem statement.

Click here for the official notification of Analytics Hackathon on Developing a Predictive Model in GST by GSTN.