Policy2Code Prototyping Challenge

Experimenting with Generative AI to Translate Public Benefits Policy into Rules as Code

Let’s work together to enhance public benefits.
Join our Rules as Code + AI experiments today!

The Policy2Code Prototyping Challenge invites experiments to test how generative artificial intelligence (AI) technology can be used to translate government policies for U.S. public benefits programs into plain language and software code.

It aims to explore ways in which generative AI tools such as Large Language Models (LLMs) can help make policy implementation more efficient by converting policies into plain language logic models and software code under an approach known worldwide as Rules as Code (RaC).

Generative AI technologies offer civic technologists and practitioners an opportunity to potentially maximize and save countless human hours spent interpreting policy into code, and translating between code languages — potentially expediting the adoption of Rules as Code. What is unclear is whether or not generative models are advanced enough to be fine tuned to translate rules to code easily, or whether a more robust human-in-the-loop or reinforcement learning process is needed. This challenge is a first step toward understanding that. 

The challenge is open to technologists, policy specialists, and practitioners who want to collaborate and demonstrate the benefits and limitations of using AI in this context.

This challenge is organized by the Digital Benefits Network (DBN) at the Beeck Center for Social Impact + Innovation in partnership with the Massive Data Institute (MDI), both based at Georgetown University in Washington, D.C.

Jump to Sections on This Page: 

Challenge Focus

Participating Teams

Coaches

Data Use and Code Requirements

Who Is This For?

How Does It Work?

Why Participate?

Matchmaking

Tools & Resources

Questions & Answers

About the Host Organizations

Challenge Focus

The Policy2Code Prototyping Challenge seeks experiments to test the use of generative AI in the following scenarios:

  • The translation of U.S. public benefits  policies into software code.
  • The translation of U.S. public benefits policies into plain language logic models.
  • The translation between coding languages, including converting code from legacy systems for benefits eligibility and enrollment. 
  • Recommendations of additional scenarios that enhance policy implementation through digitization.

The demos and analysis from this challenge will be made available to the public, allowing practitioners to gain insights into the capabilities of current technologies and inspiring them to apply the latest advancements in their own projects through open-source code and available resources. We will be using Github Classroom to coordinate and share starter resources as well as instructions across the prototyping teams.

Benefits Policy Focus

The challenge is focused on U.S. public benefit programs like SNAP, WIC, Medicaid, TANF, Basic Income, Unemployment Insurance, SSI/SSDI, tax credits (EITC, CTC), and child care.  

Participating Teams

Thank you to everyone who submitted creative ideas for impactful prototypes!

The following teams are participating in Policy2Code. Stay tuned for more details on Demo Day and seing work from each team.

PolicyPulse

Government policies often underperform due to a lack of clear understanding and measurement of results, making the implementation of statutes into policies a complex and lengthy process. Shifting societal and economic landscapes, implementation challenges, and resource-intensive manual evaluation methods create a gap between legislative intent and real-world impact, leading to inefficiencies and unmet public needs.

We propose the use of an Adversarial Model using a Large Language Model (LLM) to compare the logic model and code from existing policy implementation with those generated directly from the law/legislation. The tool thus developed will allow policy analysts and program managers to compare the logic of current policies with their original legislative intent, identifying gaps and inefficiencies in policy implementation.

This AI-driven approach will help bridge the gap between policy intent and outcomes, providing valuable insights into policy interpretation and all for greater efficiency of policy implementation by policy specialists.

Alexander Chen, Senior Consultant , BDO Canada

Jerry Wilson, Senior Analyst , BDO Canada

Craig Marchand, Vice President, BDO Canada Exponential Labs

Emmanuel Florakas, Partner, Technology Data Leader, Consulting

Rule Rangers

Building off of work to develop an innovative platform to streamline eligibility screenings and application submissions across various benefits and US states. A major goal we have is to empower subject matter experts to configure campaigns, benefit rules, screening logic, and submission logic with minimal reliance on software engineers.

Preston Cabe, Software Engineering Manager, Benefits Data Trust

Daniel Singer, Senior Product Manager, Benefits Data Trust

Miriam Peskowitz, Software Engineer, Benefits Data Trust

Code the Dream presents: Bennie and the Artificial Jets

Leveraging the power of large language models (LLMs) and generative AI, Code the Dream is developing innovative tools to enhance public benefits systems. Our comprehensive benefits screener, created in collaboration with public benefits partners in North Carolina and inspired by Colorado's MyFriendBen, aims to streamline access to vital resources.

Tom Rau, Director of CTD Labs, Code the Dream

Louise Pocock, Senior Program Consultant, Immigrant Services, Virginia Office of New Americans

Brian Espinosa, Senior Software Developer, Radiate Consulting

Mohammad Rezaie, Apprentice Software Developer, Code the Dream

Tam Pham, Apprentice Software Developer, Code the Dream

Sofia Perez, Apprentice UI/UX Designer, Code the Dream

Hoyas Lex Ad Codex

Building on our research in Exploring Rules Communication, we seek to understand how different Large Language Models (LLMs) perform key tasks in Rules as Code (RaC) adoption. We plan to provide documentation to help the ecosystem, and in particular government decision makers, better understand how the technologies currently perform and if public benefits eligibility rules can be more efficiently managed using a generative AI powered RaC approach.We will test three commercially available chatbots (and their related models) to evaluate how they 1) generate eligibility questions for SNAP and Medicaid in at least five states, 2) how they write plain language summaries of the logic, and 3) how they translate policy into code, including which languages are easiest to work with.Based on performance of the models, we will develop a strategy for fine-tuning the chatbots for the non-code generation experiments and the code generation LLMs for the code generation experiments. Our initial plan is to provide a small number of examples to the chatbots, have them extend the examples in different ways to increase the training data, and then use all the examples (hand developed, chatbot generated) to fine-tune each LLM for each experiment.

Lisa Singh, Director, Massive Data Institute (MDI), Sonneborn Chair, Professor, Department of Computer Science, Professor, McCourt School of Public Policy

Mahendran Velauthapillai, McBride Professor, CS Department, Georgetown University

Ariel Kennan, Senior Director, Digital Benefits Network (DBN) at the Beeck Center for Social Impact + Innovation

Tina Amper, Community Advisor, Rules As Code Community of Practice, DBN

Steve Kent, Software Developer, MDI

Jason Goodman, Student Analyst, DBN, MPP/MBA Candidate,Georgetown University

Alessandra Garcia Guevara, Student Analyst, DBN, Georgetown University

Kenny Ahmed, MDI Scholar, MDI,Georgetown University

Jennifer Melot, Technical Lead, Center for Security and Emerging Technology (CSET),Georgetown University

Lightbend

LLMs for better resident experience —The City of South Bend Civic Innovation team and Starlight team (Lightbend) hopes to reduce rejection rates, improve approval rate, and streamline administrative processes for LIHEAP (Low Income Household Energy Assistance Program) in South Bend, Indiana by understanding the approval processes for LIHEAP and other related energy assistance programs and building LLM assisted agents to help social service workers, direct enrollment teams and outreach and support teams improve the net outcomes for eligible applicants.

Shreenath Regunatha, Co-Founder, Starlight

Madi Rogers, Director of Civic Innovation, City of South Bend

Denise Riedl, Chief Innovation Officer, City of South Bend

Catherine Xu, Co-Founder, Starlight

Matthew Williams, First Engineer, Starlight

MITRE AI for Computable Rules

The team is exploring the hypothesis that AI can be used to assess a corpus of policy information to determine its quality and complexity. In addition, the team will leverage AI to develop an efficient method to generate high quality computable rules for machine consumption.

Matt Friedman, Senior Principal- AI and Policy, MITRE

Abigail Gertner, AI-Enhanced Discovery and Decisions Department Manager, MITRE

Laurine Carson, Senior Principal - Policy Analysis, MITRE

Moe Tun - Systems Engineer, MITRE

Shahzad Rajput - AI Engineer, MITRE

Billal Gilani - Policy Analyst, MITRE

Jenny Trinh - Computable Rules Analyst, MITRE

mRelief

mRelief is dedicated to improving access to social services, offering simplified SNAP applications in 10 states. Their user-friendly applications have led to an average application completion time of under 13 minutes and resulted in over 40,000 submissions in 2023. mRelief compared their success rates with FOIA requests for state data, showing a 70% application success rate compared to the states' 58%. However, only 49% of clients who start their self-service application complete it, with increased engagement via a community-based organization boosting this to 81%.

To address completion rate challenges, mRelief is collaborating with Google.org to develop a GenAI chatbot using LLMs, specifically Dialogflow, aiming to launch a MVP in South Carolina by August 2024. They are mindful of Gen AI risks and plan to keep a human in the loop at all stages, handcraft training data, and monitor the chatbot to ensure an empathetic tone. They aim to mitigate biases and misinformation by testing the prototype with small populations and creating an AI policy and ethics committee.

mRelief believes their AI training efforts will benefit the broader Policy2Code movement and hopes to share knowledge by participating in the Prototyping Challenge, looking forward to learning from other participants' risk mitigation approaches.

Dize Hacioglu, CTO, mRelief

Belinda Rodriguez, Product Manager, mRelief

Nava Labradors: Retrieving clear and concise answers to navigator’s public benefit questions

Nava Labs is developing generative AI solutions to support navigators such as caseworkers and call center specialists. Through research with navigators, program beneficiaries, and strategic partners, the team identified multiple use cases in which generative AI shows potential to streamline the process of applications and enhance the experience of applying for services. Currently, we’re prototyping one use case: an assistive chatbot that responds to navigators’ program and policy questions while they help clients complete eligibility screenings and applications. The chatbot uses a large language model (LLM) constrained by Retrieval Augmented Generation (RAG) to search policy guidance, retrieve information relevant to the navigator’s question, and summarize the information in plain language. We’ve partnered with Benefits Data Trust (BDT) to develop the chatbot, using BDT navigator feedback to iterate on its design and functionality. The chatbot searches BDT’s internal policy database to answer SNAP questions. Soon, we’ll test the chatbot’s performance retrieving data from public policy documents and supporting additional benefit programs. Key evaluation metrics include the tool's appropriateness for government services, user acceptability, administrative burden reduction, output accuracy, and algorithmic fairness. We are sharing findings as we progress via public demo days & articles and posting code on Github.

Martelle Esposito, Partnerships and Evaluation Manager + Nava Labs Co-Lead, Nava PBC

Genevieve Gaudet, Design Director, Research and Development + Nava Labs Co-Lead, Nava PBC

Alicia Benish, Partnerships and Evaluation Lead, Nava PBC

Diana Griffin, Product Lead, Nava PBC

Ryan Hansz, Design Lead, Nava PBC

Yoom Lam, Engineering Lead, Nava PBC

Kanaiza Imbuye, Designer, Nava PBC

Charlie Cheng, Engineer, Nava PBC

Kevin Boyer, Engineer, Nava PBC

PolicyEngine

PolicyEngine's goal is to improve accessibility and comprehension of public benefits policy for policymakers, researchers, and the public. Presently, complex eligibility rules and benefit calculations in programs like TANF are inaccessible and inconsistently applied across states. This leads to challenges in understanding benefits and hindering evidence-based policymaking. PolicyEngine believes in encoding benefits policy in open-source software to enable policy analysis, benefit calculators, and automation tools.

Their prototype aims to explore whether language models can expedite the process of translating benefits policy into code. They plan to leverage AI to encode TANF rules into PolicyEngine's Python framework, experimenting with few-shot prompting and fine-tuning language models. They will evaluate the quality of the generated code against expected inputs and outputs using models like GPT-4 and Anthropic's Claude.

The success of this prototype could benefit state agencies, advocates, think tanks, academics, and TANF participants, making the system easier to navigate and enabling data-driven policymaking. Key risks, such as hallucinated or biased rules, will be mitigated through a human-in-the-loop approach, test-driven development, and transparent sharing of training data and models. PolicyEngine believes that transparent open-source code for policy is essential for accountability.

Max Ghenis, CEO, PolicyEngine

Nikhil Woodruff, CTO, PolicyEngine

Anthony Volk, Engineer, PolicyEngine

Ron Basumallik, Senior Software Engineer, Independent Contributor

The RAGTag SNAPpers

Much attention has been placed on the use of large language models (LLMs) to aid individuals in applying for benefits like SNAP. LLMs can simplify the complex legal language surrounding a program’s eligibility requirements, hopefully making it easier to understand and access one’s benefits. Our goal for this project is to take the same philosophy and apply LLMs to help case workers review SNAP applications.

Each individual SNAP application must (by law) be reviewed by a human, who - with the aid of an eligibility system - ultimately makes the decision of whether an application is approved or not. Case workers are under pressure to process applications quickly, accurately, and in high volumes. They must respond to applications within 30 days. They must keep their error rates low as these are publicly reported numbers. And, in 2023, over 2 million households in Illinois alone received SNAP assistance. We believe that an LLM assistant can help case workers 1) process the application by reading in the application PDF, 2) tag areas on the application that provide evidence for an accepted or rejected applications, 3) provide a clear paper trail for the human reviewer to ultimately decide on the SNAP application outcome, and ultimately 4) be a lightweight, flexible model that accommodates differing eligibility guidelines between states.

Evelyn Siu, Data Science - Researcher, Garner Health

Ellie Jackson,Software Engineer, Harvard Growth Lab

Sophie Logan, Data Scientist, Invenergy

SSI/SSDI POMS Translator

Assessing eligibility for the SSI and SSDI disability benefits programs often requires years of experience working with a complicated interweaving set of regulations, operations manuals, and rulings. Rules as Code systems can help translate these legal documents into clear logical models that potential beneficiaries can measure themselves against to understand eligibility. However, Rules as Code systems require careful configuration and can be burdensome to create manually. Our team uses manually created, legally informed Rules as Code systems as templates for Generative AI systems to create Rules as Code for SSI/SSDI eligibility rules, exploring basic eligibility rules, pre-medical eligibility, and medical eligibility criteria while attempting to contribute to the existing Rules as Code ecosystem, particularly around disability benefits.

Nicholas Burka, Data Engineer/Data Architect, Beneficial Computing Inc.

David Xu, Engineer, Civic Tech Toronto

Marc Raifman, Benefits Lawyer

Angela Zhou, Assistant Professor. Data Sciences and Operations, University of Southern California (USC)

Team ImPROMPTu

Converting complex public benefit rules into plain language or working software to support the administration of public benefit programs can be a complex, lengthy, and costly process. Errors in this process can result in a number of issues including eligible recipients being denied benefits, and inaccurate benefit amounts being calculated. We believe that policy SMEs will always be the arbiter of correctness of system implementation. Thus we should give them tools that guide them in describing the essence of the policies  they are expert on for those who are expert in implementation.

We propose a prototype effort to test the use of a Domain Specific Language (DSL) for public benefit programs that enables subject matter experts (SMEs) to accurately communicate their requirements in depth to the teams implementing software to help administer those programs. This prototype effort would test the efficacy of using a simple DSL to better guide the output of an LLM into an intermediate format that can be easily converted to plain language or working software code, whichever is desired. This approach would appropriately place program SMEs as the authoritative source of information used by others to implement program rules as software code or in plain language.

Paul Smith, Chief Technologist, Ad Hoc, LLC

Deirdre Holub, Director of Engineering, Ad Hoc LLC

Greta Opzt, Senior Data Analyst, Ad Hoc LLC

Mark Headd, Senior Director of Technology, Ad Hoc LLC

Alex Mack, Director of Research, Ad Hoc LLC

Ben Kutil, CTO, Ad Hoc LLC

Clara Abdurazak, Strategic Partnerships, Ad Hoc LLC

Corey Kolb, Senior UX Designer, Ad Hoc LLC

John French, Experience Director, Ad Hoc LLC

Patrick Saxton, Director of Engineering, Ad Hoc LLC

Wryen Meek, Product Lead, Ad Hoc LLC

Tech2i Trailblazers

Navigating the complexities of Medicaid and the Children's Health Insurance Program (CHIP) can be challenging for many beneficiaries. To address this, we are introducing an innovative AI-powered solution designed to streamline access to essential healthcare information. Our advanced AI technology provides concise summaries of Medicaid/CHIP policies and features an intelligent chatbot to assist users with their inquiries, ensuring they receive accurate and timely information.

Comprehensive Summarization: This AI system scans vast amounts of Medicaid/CHIP documentation to generate easy-to-understand summaries. Summaries are tailored to address common questions and concerns, reducing the need for beneficiaries to sift through extensive materials.

Interactive Chatbot Assistance: The AI-powered chatbot offers real-time assistance, guiding users through their Medicaid/CHIP queries. Capable of answering a wide range of questions, from eligibility criteria to benefits details and application processes. Provides step-by-step support for common procedures, ensuring users feel confident and informed.

User-Friendly Interface: Designed with accessibility in mind, our interface ensures that users of all ages and technical backgrounds can easily navigate the system.

We envision a future where every Medicaid and CHIP beneficiary has effortless access to the information they need to make informed healthcare decisions. By leveraging cutting-edge AI technology, we aim to demystify healthcare programs, empower users, and ultimately improve public health outcomes.

Sunil Dabbiru, Data Engineer/Data Architect, Tech2i

Bahodir Nigmat, Data Architect, Tech2i

Harshit Raj, Data Engineer, Tech2i

Sergey Tsoy, Solution Architect, Tech2i

Farouk Boudjemaa, Engineer, Tech2i

Truviq

Truviq's project focuses on utilizing AI technology to develop prototypes for converting government policies into online applications. This aims to reduce the manual and error-prone process of translating policies into code, improving compliance, security, and operational efficiency. By merging policy-to-code approaches with GenAI capabilities, the team aims to shorten the time between policy definition and implementation.

Suneel Sundaraneedi, CEO, Truviq Systems

Winfried Hamer, Customer Success & AI Evangelist, Dutch UWV

Chaitanya PVK, Lead Architect, Truviq Systems

Coaches

The Policy2Code teams are receiving multidisciplinary and cross-sector advice from our four coaches.

  • Harold Moore, Director, The Opportunity Project for Cities and Senior Technical Advisor at Beeck Center for Social Impact + Innovation
  • Michael Ribar, Principal Associate for Social Policy and Economics Research at Westat
  • Andrew Gamino-Cheong, Co-Founder and CTO, Trustible, an AI-Governance Software Company 
  • Meg Young, PhD, Participatory Methods Researcher, Algorithmic Impact Methods (AIM) Lab at Data & Society Research Institute

Data Use and Code Requirements

  • All code should be well-documented. 
  • All code and documentation used in the prototypes will be open and available for viewing by the public at the completion of the challenge. It may also be incorporated into Georgetown University research. 
  • Only use policy documentation that is publicly available unless you have permission to share internal documentation.
  • Ensure that you have the rights or permission to use the data or code in your prototype.  
  • Ensure that no personally identifiable information (PII) is shared via your prototype or documentation. 
  • Please only use a sandbox environment, not production systems.

Who Is This For?

The challenge is open to individuals and teams including technologists, policy specialists, academics (including current or graduating students), program administrators, and other practitioners from all sectors who are interested in collaborating on the translation of U.S. public benefits policies into software code and logic models. Individuals are encouraged to join forces to form cross-disciplinary teams!

We believe that teams with at least one technologist and one policy specialist will benefit from collaboration between different domain experts. However, individuals with multi-dimensional expertise are welcome to participate. We encourage all teams to consider public sector and beneficiary perspectives as part of their prototype.

Please see the matchmaking section if you are looking for team members.


How Does It Work?

1. Program Launch 

Join us for the Policy2Code launch webinar on May 3, 2024 from 1-2pm ET. We’ll provide a full overview of the program and leave time for questions. Questions and answers will be shared on this page following the webinar.

2. Apply

To participate in the Policy2Code Challenge, individuals or teams must complete an application form. The deadline for submission is May 22, 2024. The DBN and MDI will select up to 10 teams to join the challenge.

3. Announcement of Participants

We will announce selected teams the week of June 3, 2024

4. Monthly Touchpoints

Once selected, participants will have approximately three months to complete their testing.

Throughout the summer, we’ll host monthly touchpoints to share progress, solicit feedback from fellow participants, and receive support from coaches to help problem-solve issues with experiments. Teams are asked to join at least 2 of the 3 monthly touchpoints. 

Proposed dates: 

  • June 25, 2024, 12 noon to 1:30pm ET
  • July 23, 2024, 12 noon to 1:30pm ET
  • August 22, 2024, 11:30am to 1pm ET

5. Demo Day at BenCon 2024 

The challenge will culminate in a Demo Day at the DBN’s signature event, BenCon 2024, scheduled for September 17-18 in Washington, D.C. Participants will present their experiments, prototypes, tinkering, and other developments to the community for feedback, awareness, and evaluation. 

Participating teams will be invited for in person demos, and a virtual option will also be available. Each member of a demo team will be awarded a $500 honorarium in accordance with the Georgetown University honoraria policy and if their institution allows. The coaches and organizers will also award non-monetary prizes for categories such as best use of open tools, best public sector demo, community choice, design simplicity, etc.

6. Report Findings 

In early 2025, the DBN and MDI will co-author and publish a report summarizing challenge activities, documenting findings from the prototypes, and highlighting the suitability of LLM frameworks and usage. Building on findings from the challenge, the public report will also recommend next steps for using cross-sector, multidisciplinary approaches to scaling and standardizing how benefits eligibility rules are written, tested, and implemented in digital systems for eligibility screening, program enrollment, and policy analysis.

Why Participate?

Understand the Application of Generative AI in Policy Implementation

Gain insights into how generative artificial intelligence technologies, specifically Large Language Models (LLMs), can be utilized to translate policy documents into plain language logic models and software code under the Rules as Code (RaC) approach.

Explore the Benefits and Limitations of Generative AI in Policy Translation

Analyze the potential benefits and limitations of using generative AI tools like LLMs for converting U.S. public benefits policies into software code, plain language logic models, and for translating between coding languages, with an emphasis on efficiency and accuracy.

Collaborate Across Disciplines for Policy Digitization

Discover the importance of cross-disciplinary collaboration between technologists, policy specialists, and practitioners across sectors in developing innovative solutions to translate policies into software code and logic models, applying the latest technological advancements.

Enhance Problem-solving Skills Through Experiments

Engage in hands-on experimentation to test scenarios related to policy digitization, such as converting legacy code systems for benefits eligibility, and recommend additional scenarios that can enhance policy implementation through digitization.

Apply RaC Approach in Real-world Applications

Gain practical experience in utilizing the Rules as Code approach to scale and standardize how benefits eligibility rules are written, tested, and implemented in digital systems, leading to a deeper understanding of digital service delivery.

Matchmaking

Are you looking for team members? 

We ran an open matchmaking process ahead of the application deadline. We have taken down this information since the application period has closed.

Tools & Resources

We have collected tools and resources that may be useful in prototyping. Listing of a tool does not designate endorsement. If you’d like to recommend a tool or resource be added to the list, please send us a note at rulesascode@georgetown.edu

Digital Benefits Hub

The Digitizing Policy + Rules as Code page of the Digital Benefits Hub hosts numerous resources including example projects, case studies, demo videos, research, and more related to Rules as Code. 

The Automation + AI page of the Digital Benefits Hub hosts resources on how to responsibly, ethically, and equitably use artificial intelligence and automation, with strong consideration given to mitigating harms and bias.

Rules Frameworks

LLMs + Tools 

Policy Manuals 

Questions & Answers

Please see below for questions asked by potential participants and answers from the DBN and MDI Team. Please send any additional questions to rulesscode@georgetown.edu.

Question 1I have a question regarding the source material. The website provides links to SNAP manuals and state websites. What should participants consider as the authoritative source for the rules to be used by the generative AI tools? Should it be from an official government website or a third-party source?

Answer 1: All of the mentioned sources are considered acceptable for the Policy2Code challenge. Many nonprofits and private sector organizations develop tools based on these rules, usually referencing original government documents.

There are some national policies that come from the U.S. Department of Agriculture,  the Food and Nutrition Service (FNS). For SNAP policies, the state policy manuals are considered the most reliable source. These manuals can be found in various formats, including dynamic websites and PDFs.  [To see examples of state rules, you can refer to the report authored by DBN-MDI: “Exploring Rules Communication: Moving Beyond Static Documents to Standardized Code for U.S. Public Benefits Programs”]

The SNAP policy manuals link listed on our Policy2Code website references a resource created by the Center on Budget and Policy Priorities. It provides comprehensive access to state SNAP websites and policy manuals. However, if you need information on programs like WIC in a specific state, a simple Google search will yield relevant results. It's worth noting that the quality and availability of the sources may vary.

Question 2Will the coaches offer constructive criticism on the social potential for harm of these projects, or will they be focused primarily on technical assistance?

Answer 2: We have not yet publicly announced our coaches for the Policy2Code challenge, but we are actively working on assembling a team of multidisciplinary experts. These experts will go beyond specific domains such as technology, policy, AI, or AI ethics. We are considering cross-disciplinary support to ensure comprehensive guidance. At the DBN and MDI, we are deeply focused on understanding the civil rights implications of the tools being developed. We recognize that people's livelihoods and lives are significantly impacted by benefit systems, particularly with the introduction of AI and automation. Therefore, we are extremely cautious and strive to raise awareness about these implications.

Part of the purpose of the sandbox nature of this work is to showcase where these tools excel and where they may potentially cause harm. By experimenting away from real-life systems currently in use, we can better understand their performance and impact. Additionally, ethical considerations play a pivotal role in our approach. Comparing different approaches, such as open source versus closed source, can shed light on ethical considerations, biases, and better understanding of the tools. It is worth sharing these findings as part of your project's output.

Question 3Could you elaborate on the output and deliverable and criteria for success?

Answer 3: We are really interested in trying to create a holistic setup to understand what works well and what doesn't when it comes to large language models and rules as code. There are various areas where your project can make an impact, and we don't expect you to figure out every aspect and provide a complete system. Instead, we encourage you to focus on a specific slice that interests you.

For example, if you're not as experienced in coding, you could explore the types of prompts that are effective for generating specific types of code using chatbots. On the other hand, if you excel in working with large language models, you might want to fine-tune them for SNAP rules or rules with specific properties. You could also investigate how to evaluate the performance of language models.

We have deliberately kept the task formulation broad, allowing you to choose a specific task that aligns with our overarching goal. The role of coaches and check-ins is to ensure that your chosen task fits the overall objective.

In summary, you have a task, you have an experiment, and you have results from that experiment. And then perhaps, you can explore some ethical considerations or policy considerations that are connected to your output.

Question 4What are the expected actions for participants in this challenge? Are they expected to examine potential use cases and scenarios for applying AI and LLMs in information collection, decision-making, and the execution of public policy?

Answer 4: The challenge is focused on three core scenarios. The first scenario involves translating policy into software code and plain language, while the second scenario deals with translating code between different programming languages. The third scenario involves extracting code from legacy systems or transitioning from one code language to another for system integration purposes. While we are open to hearing ideas for other scenarios, our core research is focused on these three.

We are limiting the scope of the challenge to policy-focused scenarios specific to U.S. public benefit systems. While we are considering other programs, we are primarily concentrating on the U.S. context, as we are aware that compared to international colleagues, the Rules as Code concept is less advanced in the U.S. We hope to use this challenge to close that gap and produce new evidence and documentation on the potential benefits of implementing Rules as Code in the U.S.

Question 5What is the criteria for selecting teams?

Answer 5: We do not plan to release a public-facing rubric. We have tried to keep the application form as low-burden as possible. And knowing that these are prototypes and experiments, and they're going to change over time. 

We would like to see teams with diverse expertise. We're very interested in teams that consider public sector or benefits perspectives, either  included within a prototyping team, or how you would engage those stakeholders directly. 

We've asked for a description of your intended prototype and how you plan to test for it. What technologies are you thinking about using? How are you mitigating bias and harms potentially in that system? How are you thinking about who is impacted by a potential system?

We're also wanting to make sure that we have a range of solutions and experiments. So we're also looking at diversity of experiments. So that we're able to learn more as a community. When developing your prototype, it is important not to scope it too large. Keep it within a specific scope to ensure you can complete the project successfully.

About the Host Organizations

This challenge is organized by the Digital Benefits Network at the Beeck Center for Social Impact and Innovation in partnership with the Massive Data Institute, both based at Georgetown University in Washington, D.C.

The Digital Benefits Network (DBN) supports government in delivering public benefits services and technology that are accessible, effective, and equitable in order to ultimately increase economic opportunity. We explore current and near term challenges, while also creating space to envision future policies, services, and technologies. The DBN’s Rules as Code Community of Practice (RaC CoP) creates a shared learning and exchange space for people working on public benefits eligibility and enrollment systems — especially those tackling the issue of how policy becomes software code. The RaC CoP brings together cross-sector experts who share approaches, examples, and challenges.

At Georgetown’s McCourt School of Public Policy, the Massive Data Institute (MDI) is an interdisciplinary research institute that connects experts across computer science, data science, public health, public policy, and social science to tackle societal scale issues and impact public policy in ways that improves people’s lives through responsible evidence-based research.

Learn more about Rules as Code and join our community of practice on the Digital Benefits Hub. If you have further questions, email us at rulesascode@georgetown.edu.