People looking for information on legal questions often start their searches online, without a good handle on the terminology. Today’s machine learning tools can help put nonlegal phrasing into context, using artificial intelligence to match people’s situations with specific legal issues, supplying accurate information and connections to potential services.
A team at the Legal Innovation and Technology (LIT) Lab at Suffolk University Law School in Boston, with funding from The Pew Charitable Trusts, is building an application programming interface, or API—known as Spot—that can serve as a computerized issue spotter. Spot could be used by legal services websites and others to help lay users, and its functionality will improve as it accumulates more data and real-life examples.
David Colarusso teaches at Suffolk University and runs the LIT Lab. He started his career teaching high school physics and astronomy, while also running a small software firm with a college friend. Because law proved to be a common thread connecting so much he cared about, Colarusso went to law school and became a public defender. Over time, he started asking questions about why his agency did things the way it did, especially with data, and became convinced it could better serve clients.
While at the Massachusetts Committee for Public Counsel Services, Colarusso took on the role of data scientist in 2015, probably becoming one of the first in the nation at a public defender’s office. His perspectives as a lawyer and a programmer inform his approach and his ability to make connections others might miss. The Suffolk lab, started a little over two years ago, applies existing text classification technology to specific needs in a legal context.
Colarusso recently answered questions about Spot’s growth and its future. His comments represent his personal views. The interview has been edited for clarity and length.
A: Spot is a computerized issue spotter. Give Spot a non-lawyer’s description of a situation, and it returns a list of likely issues from the National Subject Matter Index [NSMI], version 2. Developed by the Legal Design Lab at Stanford University, NSMI provides the legal aid community a standard nomenclature for talking about client needs. It includes issues like eviction, foreclosure, bankruptcy, and child support. Spot is provided as a service over an API. That’s fancy speak meaning that it’s built for use by computer programs, not people. Coders can build things (like websites) on top of the API.
A: Margaret Hagan from the Legal Design Lab at Stanford and I were having a hallway conversation at the Legal Services Corp.’s annual tech conference in 2018 about her lab’s work putting together a taxonomy of legal issues. Someone from Microsoft walked by and mentioned the difficulty they were having building an automated issue spotter. I figured the problem was the training data: a model needs examples that connect legal terminology with the text people use when searching for legal information.
A moderator from Reddit had just given a talk where he observed that they had tens of thousands of laypeople’s legal questions. I suggested we have folks label the Reddit questions with Margaret’s taxonomy. The labeled texts could serve as training data for something like the classifier Microsoft was building. I figured we could get folks to volunteer their time to apply the labels by making an online game. So we snagged the guy from Reddit and worked out how we could use their texts. Now anyone can help label this data (about 75,000 texts) by playing the online game Learned Hands, a partnership between the LIT Lab and Stanford’s Legal Design Lab.
Eventually, we built Spot on top of the Learned Hands data because we wanted to make an issue spotter that would be available for free to legal aid providers, and the existing tools didn’t quite meet the community’s needs.
A: Spot can play the role of a facilitator. For example, if someone uses a website to ask a legal question involving a landlord and tenant dispute, Spot could identify the specific issue, and, using the information from Spot, the website could direct the person to appropriate resources. Our hope is that developers will work with legal aid information providers to integrate Spot into existing or new tools to help folks in need. The big value comes from the ability to do this at scale.
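As a rough illustration of that facilitator role, the sketch below shows how a website might turn a list of scored issues into links to resources. Everything in it is an assumption made for illustration: the `label` and `score` field names, the 0.5 threshold, and the example.org URLs are hypothetical and do not describe Spot's actual response format.

```python
from typing import Dict, List

# Hypothetical mapping from issue labels to help pages (placeholder URLs).
RESOURCES: Dict[str, str] = {
    "eviction": "https://example.org/help/eviction",
    "foreclosure": "https://example.org/help/foreclosure",
    "child support": "https://example.org/help/child-support",
}


def route(issues: List[dict], threshold: float = 0.5) -> List[str]:
    """Return resource links for issues scored at or above the threshold,
    ordered from most to least likely. Labels with no known resource are skipped."""
    confident = [issue for issue in issues if issue["score"] >= threshold]
    confident.sort(key=lambda issue: issue["score"], reverse=True)
    return [RESOURCES[i["label"]] for i in confident if i["label"] in RESOURCES]


# Stand-in for what an issue spotter might return for a landlord-tenant question.
example_issues = [
    {"label": "eviction", "score": 0.91},
    {"label": "bankruptcy", "score": 0.62},
    {"label": "child support", "score": 0.12},
]
links = route(example_issues)
```

In this sketch the site only acts on issues it has a resource for, so an unexpected label is ignored rather than causing an error.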
A: Spot looks for patterns in labeled texts that correlate with certain labels or issues. So we need to show it a lot of examples. It builds on data from the Learned Hands online game, which from its start in 2018 has aimed to crowdsource the labeling of laypeople’s legal questions to train machine learning (ML) classifiers/issue spotters.
The diversity of examples matters because different communities talk about issues in different ways. For Spot to recognize an issue in a text using a particular phrasing, it needs to have seen an example of such phrasing. To protect privacy, users can choose whether Spot forgets or remembers the text they share. If a user gives permission to remember a text, our staff or other legal experts may review it to confirm the presence of issues and use their insights to retrain the issue spotter.
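The idea of learning from labeled texts, and of retraining when new phrasings arrive, can be sketched with a toy classifier. The example below uses a small naive Bayes model written with the Python standard library; this is a stand-in technique chosen for brevity, not a description of how Spot is actually built, and the example sentences and labels are invented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log score.
    Uses add-one smoothing and ignores words never seen in training."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training examples standing in for crowd-labeled questions.
labeled = [
    ("my landlord says i have to move out by friday", "eviction"),
    ("landlord changed the locks on my apartment", "eviction"),
    ("the bank is taking my house over missed mortgage payments", "foreclosure"),
    ("behind on my mortgage and the bank sent a notice", "foreclosure"),
]
model = train(labeled)
before = predict(model, "got a notice to quit")

# Retrain after a reviewer labels a text that uses the unfamiliar phrasing.
labeled.append(("got a notice to quit from my landlord", "eviction"))
model = train(labeled)
after = predict(model, "got a notice to quit")
```

Comparing `before` and `after` shows how a single labeled example of a new phrasing can change what the model predicts, which is why varied examples matter.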
A: We’re building a general-purpose tool with the hope that folks will find creative ways to use it. That being said, four classes of projects come to mind:
But Spot’s potential usage could go well beyond these four areas.
A: They should visit Spot’s homepage, where they can find additional information that should help them decide if Spot is right for them. They can also reach out to me on Twitter at @Colarusso.
A: Ultimately, our goal is to make Spot a sustainable, free-to-use resource for legal aid and similarly focused groups. The real promise lies in the virtuous cycle that arises when users feed data back to Spot. Over time, as more people use Spot and it sees more examples, its performance should improve, and that improved performance can be shared with the community.
Spot is in the early stages of development. Over the next year, we’ll work on expanding the number of issues it can spot as well as its performance. How fast this happens is partially dependent on the data we have available. So anyone interested in improving Spot’s performance should consider playing a round or three of Learned Hands.