LLM4HWDesign

Contest Problem

Introduction

Large Language Models (LLMs) have demonstrated remarkable capabilities in generating high-quality content from natural language prompts, sparking growing interest in their application to hardware design [1,2,3]. The potential of LLMs to streamline design flows and enhance hardware design accessibility for non-experts is significant. Initiatives like Architecture 2.0 [4] aim to transform the hardware design paradigm by leveraging artificial intelligence to create more advanced and efficient hardware systems while significantly reducing manual design overhead.

Despite this significant potential and community excitement, current state-of-the-art (SOTA) pretrained LLMs, such as OpenAI's GPT-4 [5], still struggle to produce practical hardware designs without extensive human intervention. In hardware code generation, for example, these models tend to either (1) generate non-synthesizable or non-functional code that requires human correction, or (2) produce overly simplistic or impractical implementations [3]. This issue stems primarily from the LLMs' limited exposure to hardware design data during pretraining. A pioneering effort, ChipNeMo [6], demonstrates that an in-house large-scale Verilog code dataset can effectively improve LLMs' Verilog code generation abilities; however, no comparable dataset is publicly available, which significantly limits the further development of LLM-assisted hardware design. Developing open-source, high-quality, hardware-specific code datasets is therefore essential for unlocking the full potential of LLM-assisted hardware design.

This year's contest seeks to address this challenge by asking you to help build a large-scale, high-quality Verilog code generation dataset. By open-sourcing this dataset, we aim to establish critical infrastructure for advancing LLM-assisted hardware design workflows. Winning participants will be invited to co-author a technical report summarizing our efforts, insights, and lessons learned, thereby paving the way for future initiatives.

Objective

The goal of this contest is to grow the current Verilog code dataset into a large-scale, high-quality open-source dataset that enables more effective LLM-assisted Verilog code generation through fine-tuning. Participants are asked to (1) collect or generate Verilog code samples and (2) enhance dataset quality through data cleaning and label generation techniques. Participants' contributions will be evaluated based on the improvement their data brings to the fine-tuned LLM.

Problem Definition

To achieve our goal of enriching the Verilog code generation dataset, we take one of the current SOTA datasets, MG-Verilog [7], as the starting point and improve both its scale and its quality. The contest runs in two phases: the first phase aims to increase the scale of the existing dataset, and the second to improve its quality. We define the problem for each phase below.

Phase I

In this phase, we aim to explore scalable methods for collecting and generating Verilog code and corresponding natural language instructions to increase the scale of the Verilog code dataset. Participants are asked to focus on the following areas:

1. Data Collection: Investigate methods to gather new Verilog code samples from various sources, including but not limited to open-source repositories, academic publications, and proprietary designs. All collected samples must be appropriately licensed for open-source and public use (see the collection sketch after this list).

2. Data Generation: Explore techniques that leverage existing LLMs or other tools to generate new Verilog code samples (see the generation sketch after this list). There are no specific restrictions on the approaches participants adopt, provided that the generated content is available for open-source and public use.
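As a concrete starting point for data collection, the sketch below queries GitHub's repository-search API for Verilog projects and keeps only those under permissive licenses. The license whitelist, page count, and the GITHUB_TOKEN environment variable are illustrative assumptions rather than contest requirements; always double-check each repository's license terms before redistributing its code.

```python
import os
import requests

# Illustrative whitelist of licenses commonly considered safe for
# open-source redistribution; verify against the contest's licensing rules.
PERMISSIVE = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "cc0-1.0"}

def find_permissive_verilog_repos(max_pages=2):
    """Search GitHub for Verilog repositories under permissive licenses."""
    headers = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}
    repos = []
    for page in range(1, max_pages + 1):
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": "language:verilog", "per_page": 100, "page": page},
            headers=headers,
            timeout=30,
        )
        resp.raise_for_status()
        for item in resp.json()["items"]:
            lic = (item.get("license") or {}).get("key", "")
            if lic in PERMISSIVE:
                repos.append(item["full_name"])
    return repos

if __name__ == "__main__":
    for name in find_permissive_verilog_repos():
        print(name)  # clone these repositories and extract *.v / *.sv files
```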
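Likewise, existing LLMs can synthesize new instruction/code pairs directly. Below is a minimal generation sketch using the openai Python client; the model name, system prompt, and the SPEC example are our own illustrative choices, and any capable code model can be substituted.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical design specification used only to illustrate the prompt.
SPEC = "a parameterized 4-bit synchronous up-counter with active-low reset"

def generate_sample(spec: str) -> dict:
    """Ask an LLM for a Verilog module plus a matching instruction."""
    resp = client.chat.completions.create(
        model="gpt-4",  # illustrative; any capable code model works
        messages=[
            {"role": "system",
             "content": "You are a Verilog expert. Reply with synthesizable "
                        "Verilog-2001 code only, inside one module."},
            {"role": "user", "content": f"Implement {spec}."},
        ],
    )
    return {"instruction": f"Implement {spec}.",
            "code": resp.choices[0].message.content}

print(generate_sample(SPEC)["code"])
```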

Phase II

In this phase, we aim to explore automatic and effective methods to improve the quality of the MG-Verilog dataset. Specifically, participants are tasked with developing and applying innovative data filtering and labeling methods, thereby enhancing the performance of the fine-tuned LLM on the evaluation dataset. Specific areas to explore include:

1. Data Filtering: Develop techniques to automatically remove low-quality data samples from the dataset (see the compile-check sketch after this list), focusing on reducing the potential harm to performance caused by low-quality data collected in Phase I. Please note that our contest restricts data filtering to static methods, which remove a fixed subset of data samples from the dataset for the entire fine-tuning process. Dynamic data filtering, which adaptively includes or excludes samples across different fine-tuning epochs, is not allowed.

2. Accurate Descriptions: Develop techniques to automatically generate more accurate descriptions for the data samples (see the description-generation sketch after this list). Focus on bridging the gap between high-level instructions and the detailed implementations that LLMs are expected to produce during code generation.

3. Label Design: Create labeling strategies that facilitate the learning process of LLMs. Aim to narrow the gap between the knowledge acquired by LLMs during pretraining and the new knowledge needed during fine-tuning.
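For data filtering, one simple static criterion is whether a sample even compiles. The sketch below uses Icarus Verilog's null target as a cheap compile check; it assumes iverilog is installed and that each sample stores its source under a "code" key, which is our assumption about the data layout rather than the contest's.

```python
import os
import subprocess
import tempfile

def compiles(verilog_code: str) -> bool:
    """Return True if Icarus Verilog can elaborate the code.

    `iverilog -t null` parses and elaborates without emitting output,
    making it a cheap static syntax/semantics check.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".v", delete=False) as f:
        f.write(verilog_code)
        path = f.name
    try:
        result = subprocess.run(
            ["iverilog", "-t", "null", path],
            capture_output=True, timeout=30,
        )
        return result.returncode == 0
    finally:
        os.remove(path)

def filter_dataset(samples):
    """Static filtering: drop non-compiling samples once, before fine-tuning."""
    return [s for s in samples if compiles(s["code"])]
```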
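For description and label generation, a common approach is to prompt an LLM to document the code at several levels of granularity. A minimal sketch follows, again assuming the openai client; the prompt wording and the granularity names are illustrative only.

```python
from openai import OpenAI

client = OpenAI()

def describe(code: str, level: str = "detailed") -> str:
    """Generate a natural-language description for a Verilog sample.

    `level` selects the granularity ("high_level" or "detailed");
    both the levels and the prompt wording are our own choices.
    """
    style = ("one-sentence summary of the module's purpose"
             if level == "high_level"
             else "step-by-step description of ports, registers, and behavior")
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You document Verilog code accurately; never invent "
                        "signals that do not appear in the code."},
            {"role": "user", "content": f"Write a {style} for:\n{code}"},
        ],
    )
    return resp.choices[0].message.content
```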

Scoring

We evaluate participants' submissions in Phase I and Phase II separately, with both phases using the CodeLlama-7B-Instruct model as the target LLM and our in-house evaluation dataset as the target evaluation benchmark. To obtain a valid ranking, participants are expected to participate in both phases.
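Participants who want a local proxy for this evaluation before submitting can fine-tune the same base model themselves. Below is a minimal LoRA fine-tuning sketch using Hugging Face transformers, peft, and datasets; the hyperparameters, prompt format, and toy dataset are illustrative only, and the fine-tuning pipeline in the official starting toolkit takes precedence.

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "codellama/CodeLlama-7b-Instruct-hf"  # target LLM named by the contest

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# LoRA keeps the fine-tune affordable; rank and target modules are illustrative.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
))

# Toy instruction/code pair standing in for the contest dataset.
texts = ["[INST] Implement a 2-to-1 multiplexer. [/INST]\n"
         "module mux2(input a, b, sel, output y);\n"
         "  assign y = sel ? b : a;\nendmodule"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

train_dataset = Dataset.from_dict({"text": texts}).map(
    tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ckpt", per_device_train_batch_size=1,
        gradient_accumulation_steps=16, num_train_epochs=1,
        learning_rate=2e-4, bf16=True, logging_steps=10,
    ),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```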

Submission Guidelines

Each participant is required to submit their data samples and generated labels, together with all code and materials needed to reproduce them. Submissions should be organized by phase.

Award-winning teams are expected to submit a technical report introducing their solutions before the award ceremony. Additionally, they are invited to attend the ICCAD conference to present their solutions and receive their awards in person. Detailed guidelines for the technical report and presentation format will be released soon.

Starting Toolkit

At the beginning of Phase I, we will release a starting toolkit that includes:

1. an existing dataset serving as the base dataset;

2. an example dataset of hardware code from external sources, illustrating the expected format of participants' submissions;

3. a codebase to fine-tune a specific LLM with the base dataset and the example submission dataset;

4. an evaluation script to measure how the example submission dataset mitigates the bias of the base dataset; and

5. the deduplication codebase we will use to remove repeated data samples.

Participants can simply replace the example submission dataset with their own collected datasets and use the resulting metric from the starting toolkit to iteratively improve their datasets during Phase I.
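The released deduplication codebase is authoritative for scoring. For intuition only, near-duplicate removal is commonly done with MinHash locality-sensitive hashing, as in this sketch using the datasketch library; the token-set fingerprint and the 0.8 similarity threshold are our illustrative choices.

```python
from datasketch import MinHash, MinHashLSH

def minhash(code: str, num_perm: int = 128) -> MinHash:
    """Fingerprint Verilog code by its set of whitespace-separated tokens."""
    m = MinHash(num_perm=num_perm)
    for tok in set(code.split()):
        m.update(tok.encode("utf-8"))
    return m

def deduplicate(samples, threshold: float = 0.8):
    """Keep one representative per cluster of near-duplicate samples."""
    lsh = MinHashLSH(threshold=threshold, num_perm=128)
    kept = []
    for i, s in enumerate(samples):
        m = minhash(s["code"])
        if not lsh.query(m):      # no earlier sample is this similar
            lsh.insert(str(i), m)
            kept.append(s)
    return kept
```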

At the beginning of Phase II, we will add the following items to the starting toolkit:

1. the dataset collected from all participants in Phase I;

2. an example set of labels for part of the samples in the collected dataset; and

3. an evaluation script to measure how the example set of labels improves the quality of the collected dataset.
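For orientation while the example labels are pending, here is one hypothetical layout for a labeled sample, loosely inspired by MG-Verilog's multi-grained descriptions; every field name below is our assumption, and the released example set defines the actual schema.

```python
# Hypothetical labeled sample; all field names are illustrative assumptions.
sample = {
    "code": (
        "module counter(input clk, input rst_n, output reg [3:0] q);\n"
        "  always @(posedge clk or negedge rst_n)\n"
        "    if (!rst_n) q <= 4'd0; else q <= q + 4'd1;\n"
        "endmodule\n"
    ),
    "description": {
        # coarse label: what the module does, in one line
        "high_level": "A 4-bit synchronous up-counter with active-low reset.",
        # fine label: implementation details an LLM should reproduce
        "detailed": ("On each rising clock edge, the 4-bit register q "
                     "increments by one; when rst_n is low, q resets to 0."),
    },
}
```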

References

[1] Blocklove, J., Garg, S., Karri, R., & Pearce, H. (2023, September). Chip-chat: Challenges and opportunities in conversational hardware design. In 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD) (pp. 1-6). IEEE.

[2] Liu, M., Pinckney, N., Khailany, B., & Ren, H. (2023, October). VerilogEval: Evaluating large language models for Verilog code generation. In 2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) (pp. 1-8). IEEE.

[3] Fu, Y., Zhang, Y., Yu, Z., Li, S., Ye, Z., Li, C., ... & Lin, Y. C. (2023, October). GPT4AIGChip: Towards next-generation AI accelerator design automation via large language models. In 2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) (pp. 1-9). IEEE.

[4] Reddi, V. J., & Yazdanbakhsh, A. (2023, July). Architecture 2.0: Challenges and Opportunities. In 2023 60th ACM/IEEE Design Automation Conference (DAC) (pp. 1-2). IEEE.

[5] Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., ... & McGrew, B. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.

[6] Liu, M., Ene, T. D., Kirby, R., Cheng, C., Pinckney, N., Liang, R., ... & Ren, H. (2023). ChipNeMo: Domain-adapted LLMs for chip design. arXiv preprint arXiv:2311.00176.

[7] Zhang, Y., Yu, Z., Fu, Y., Wan, C., & Lin, Y. C. (2024, June). MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation. In LAD 2024: International Workshop on LLM-Aided Design.