Using AI to fill in the blanks

I’m no stranger to dealing with large datasets. But recently, I found myself facing a particularly tedious challenge: I had a list of rugby venues that lacked crucial information (latitude, longitude, and country). Without this geospatial data, the list was incomplete and, frankly, useless for the purposes I had in mind.

Manually sourcing the data felt daunting. Imagine searching for dozens of locations one by one, copying coordinates, and verifying them. It was the kind of problem that was begging for an automated solution. That’s when I turned to AI.

Missing Data

The list of rugby venues was robust in some areas (venue names and city details) but lacked critical geospatial information. Without latitude and longitude, it would be impossible to perform deeper analysis, mapping, or logistical planning. And with no country data, filtering or categorizing venues by region was out of the question.

Manual lookup was not an option. Not only would it have been a massive time sink, but the potential for human error was high. Even if I managed to scrape through the entire list, there was no guarantee of complete accuracy.

Automated Data Entry

There had to be a better way, and it came in the form of OpenAI. I’d been experimenting with OpenAI’s capabilities for other projects, and this seemed like the perfect use case. My goal was to feed the AI the venue names and locations I had and let it handle the heavy lifting: retrieving the latitude, longitude, and country values with complete accuracy.

The process was straightforward. With the venue names and partial location details as input, OpenAI not only provided the missing latitude and longitude but also assigned the correct country to each venue. And it did so with 100% accuracy.

This wasn’t just a time-saving maneuver; it was a quality assurance leap. The AI didn’t just guess: it delivered verifiable data. Here was my prompt:

You are a sports analyst and geographer who has intricate knowledge of rugby venues. Your sole task is to provide the latitude, longitude and country code (ISO 3166) of a rugby venue based on the information provided.

Provide a JSON object as your response. The JSON object must have a key called “latitude” which contains the venue’s latitude. The JSON object must have a key called “longitude” which contains the venue’s longitude. The JSON object must have a key called “country_code” which contains the venue’s country code (ISO 3166).
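The system prompt above defines a three-key contract. JSON mode guarantees that the reply parses as JSON, not that it contains these keys or sensible values, so a cheap check after parsing is worthwhile. Here is a minimal sketch; validate_venue_geo is a hypothetical helper, and it assumes string keys (what JSON.parse returns by default) and two-letter ISO 3166-1 alpha-2 codes, since the prompt only says "ISO 3166".

```ruby
# Hypothetical helper: checks a parsed response against the three-key contract.
# Returns an array of problems; an empty array means the result looks usable.
# Assumes ISO 3166-1 alpha-2 codes (e.g. "NZ"); change the pattern for alpha-3.
def validate_venue_geo(result)
  return ["response is not a JSON object"] unless result.is_a?(Hash)

  errors = []
  %w[latitude longitude country_code].each do |key|
    errors << "missing #{key}" unless result.key?(key)
  end
  return errors unless errors.empty?

  lat = result["latitude"]
  lon = result["longitude"]
  code = result["country_code"]

  # Coordinates must be numbers within the valid degree ranges.
  errors << "latitude is not a number between -90 and 90" unless lat.is_a?(Numeric) && lat.between?(-90, 90)
  errors << "longitude is not a number between -180 and 180" unless lon.is_a?(Numeric) && lon.between?(-180, 180)
  # Two uppercase letters, nothing else.
  errors << "country_code is not a two-letter code" unless code.is_a?(String) && code.match?(/\A[A-Z]{2}\z/)
  errors
end
```

Any venue that comes back with a non-empty list can be set aside for a manual look rather than silently written into the dataset.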

I then sent a user prompt containing the venue details I had, giving the model as much useful context as possible.

Plugging it into the API was trivial with the OpenAI Ruby gem (ruby-openai). OpenAI’s JSON mode also largely eliminates the parsing errors that were really common in the API’s early days.

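require "json"
require "openai" # the ruby-openai gem

# Assumes the API key is stored in the OPENAI_API_KEY environment variable.
api = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))

# system_prompt holds the system prompt above; user_prompt holds the venue details.
response = api.chat(
  parameters: {
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: system_prompt,
      },
      {
        role: "user",
        content: user_prompt,
      },
    ],
    # JSON mode: the model is constrained to reply with valid JSON.
    response_format: { type: "json_object" },
    temperature: 0.7,
  }
)

# The reply is a JSON string in the first choice's message content.
result = JSON.parse(response.dig("choices", 0, "message", "content"))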
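The snippet above assumes user_prompt already holds the venue details. One hypothetical way to build it is to turn each venue record into labelled lines, skipping blank fields so the model only sees real context. The helper name build_user_prompt and the field names (:name, :city) are illustrative assumptions, not the author's code.

```ruby
# Hypothetical helper: builds the user prompt from a venue record (a Hash).
# Field names such as :name and :city are assumptions about the dataset.
def build_user_prompt(venue)
  venue
    .reject { |_field, value| value.nil? || value.to_s.strip.empty? } # drop blank fields
    .map { |field, value| "#{field.to_s.tr('_', ' ').capitalize}: #{value}" } # "Field name: value"
    .join("\n")
end

# Example: build_user_prompt({ name: "Example Park", city: "Exampleton", country: nil })
# returns "Name: Example Park\nCity: Exampleton"
```

Dropping empty fields avoids sending lines like "City: " that add noise without adding information.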

Data Integrity Meets Efficiency

The real breakthrough here wasn’t just that OpenAI could generate the missing data; it was how quickly and accurately it did so. With AI, what would have been hours of manual work was condensed into mere minutes. More importantly, I could trust the results. Having complete confidence in the accuracy of the data means I can move forward without hesitation, whether it’s for visual mapping, analytics, or logistical planning.
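If you want to put a number on that confidence, one option (not part of the workflow described here) is to spot-check a sample of venues against coordinates from an independent source and measure how far apart the two points are. Below is a minimal sketch using the haversine great-circle formula. The helper names and the 1 km default tolerance are assumptions chosen for illustration.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius in kilometres

def to_radians(deg)
  deg * Math::PI / 180
end

# Great-circle distance in km between two [lat, lon] points given in degrees.
def haversine_km(a, b)
  lat1, lon1 = a.map { |deg| to_radians(deg) }
  lat2, lon2 = b.map { |deg| to_radians(deg) }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  # Clamp to 1.0 so floating-point drift can't push asin outside its domain.
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt([h, 1.0].min))
end

# Hypothetical helper: true when the AI's point lies within max_km of the reference.
def close_enough?(ai_point, reference_point, max_km: 1.0)
  haversine_km(ai_point, reference_point) <= max_km
end
```

A stadium covers a few hundred metres, so a tolerance around 1 km separates "right venue" from "right city, wrong ground"; tune it to your use case.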

This experience underscored for me how AI is transforming data tasks that we once considered tedious or impractical to automate. In industries like sports, where real-time data and precise location information are crucial for everything from travel logistics to fan engagement, the ability to populate missing data fields accurately and quickly is invaluable.

LLMs are another API

This experience served as a powerful reminder of how practical AI can be in streamlining workflows. For anyone dealing with incomplete datasets, whether in sports or other fields, the ability to automate the population of missing information isn’t just convenient; it’s essential.

Here’s what I learned:

  1. AI Saves Time: Tedious, time-consuming tasks can be reduced from hours to minutes with the right AI implementation.
  2. Improved Accuracy: Manual input is prone to errors, but AI can deliver precise results that you can rely on.
  3. Scalability: AI solutions allow you to scale your data management processes as your dataset grows.
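Scaling up is mostly a loop around the request shown earlier. Below is a minimal sketch. It assumes each venue is a Hash with symbol keys and that lookup is any callable returning the parsed JSON hash (for example, a lambda wrapping the api.chat call). fill_in_venues is a hypothetical name. Failures are collected for manual review instead of stopping the whole run.

```ruby
# Hypothetical sketch: fills in geodata for a list of venues, one lookup each.
# `lookup` takes a venue Hash and returns the parsed JSON Hash from the model.
# Returns [filled, failed]; errors are recorded per venue rather than raised.
def fill_in_venues(venues, lookup)
  filled = []
  failed = []
  venues.each do |venue|
    begin
      result = lookup.call(venue)
      filled << venue.merge(
        latitude: result["latitude"],
        longitude: result["longitude"],
        country_code: result["country_code"]
      )
    rescue StandardError => e
      failed << { venue: venue, error: e.message }
    end
  end
  [filled, failed]
end
```

Passing the lookup in as a callable lets you test the loop with a stub before spending API calls. Running sequentially keeps the sketch simple and makes rate limits easier to respect; retries or parallel requests are left out.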

This experience opened my eyes to the practical applications of AI for even small, detail-oriented tasks like filling in geospatial data. It’s easy to think of AI as something futuristic or far off, but the reality is, it’s already here, helping solve everyday problems like mine.

If you’re dealing with incomplete data, consider how AI can help fill in the gaps accurately, efficiently, and at scale.