Problem Statement
The majority of San José residents speak languages other than English, particularly Spanish and Vietnamese. The City of San José sought to expand access to its 311 service so that it would be more responsive to the needs of these residents.
Project Description
The City of San José built a custom machine-learning translation model to facilitate conversations about 311 services between Spanish- and Vietnamese-speaking residents and municipal employees. To inform its approach, the city worked with Code for America to research resident needs, agency requirements, and use cases for the existing 311 service. The researchers offered monetary incentives and conducted interviews throughout the community to solicit feedback.
The custom San José translation model is built with Google AutoML, which has extensive documentation on preparing training data, creating and managing datasets and models, and evaluating models. Drawing on the research, the city manually wrote a set of English sentences related to the 311 service and paired each one with Spanish and Vietnamese translations to use as training data for the model. The model uses Mexican Spanish, the most common Spanish dialect in the city, and includes multiple Vietnamese words for technical terms that are unique to English.
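The sentence-pairing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the city's actual pipeline: the example sentences are hypothetical, the function name pairs_to_tsv is invented here, and the tab-separated layout (one source and target pair per line) is an assumption about the upload format that should be checked against the current Google documentation.

```python
# Hypothetical 311-related sentence pairs, written for illustration only.
EXAMPLE_PAIRS_ES = [
    ("How do I report a pothole?", "¿Cómo reporto un bache?"),
    ("The streetlight on my block is out.", "La luz de mi cuadra no funciona."),
]


def pairs_to_tsv(pairs):
    """Serialize (source, target) sentence pairs as tab-separated lines.

    Whitespace inside each sentence is collapsed to single spaces so that a
    stray tab or newline cannot break the two-column layout. Pairs with an
    empty side are skipped.
    """
    lines = []
    for source, target in pairs:
        src = " ".join(source.split())
        tgt = " ".join(target.split())
        if src and tgt:
            lines.append(f"{src}\t{tgt}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(pairs_to_tsv(EXAMPLE_PAIRS_ES), end="")
```

Keeping one file per language pair (English to Spanish, English to Vietnamese) would let each translation direction be trained and evaluated separately.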
Project Outcomes and Impact
Using human translators and the Bilingual Evaluation Understudy (BLEU) metric, an automated measure of machine translation quality, the city determined that its custom model improved translation quality by 22 to 55 percent over standard Google Translate.
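For readers unfamiliar with the metric, the following is a simplified sketch of how a BLEU score is computed and how two scores might be compared. It assumes whitespace tokenization, a single reference translation per sentence, and no smoothing. The source does not describe how the city computed its scores or whether the reported improvement is relative or absolute, so the relative_improvement helper shows only one possible reading.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidates, references, max_n=4):
    """Corpus-level BLEU in [0, 1] with one reference per candidate."""
    matched = [0] * max_n
    total = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.split(), ref.split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c, n), ngrams(r, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(min(k, r_counts[g]) for g, k in c_counts.items())
            total[n - 1] += sum(c_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if cand_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty discourages candidates shorter than the reference.
    penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return penalty * math.exp(log_precision)


def relative_improvement(baseline, custom):
    """Percent change of the custom score relative to a nonzero baseline."""
    return (custom - baseline) / baseline * 100
```

In practice, an established tool such as sacreBLEU is preferable to a hand-written implementation, because tokenization and smoothing choices change the resulting scores.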
Replicable Takeaways
San José offers a model for using resident feedback, human translators, and the power of machine learning to customize and improve translations for information about public services. Additionally, the city identified the importance of incorporating existing API tools and adopting a nuanced approach to address jargon and dialects.