This past year was defined by my work at Find AI. I had joined as a contractor in 2022 when it was still an AI lab searching for direction. Working alongside two non-technical founders, I brought technical skills that complemented theirs as we explored various innovative applications of LLMs.
About a year ago, we prototyped a search engine concept that would become Find AI's core product - a novel LLM-powered system for finding people and companies. Having built similar systems at Moonlight, I recognized its potential immediately. The founders agreed, and we pivoted the entire company around this search product.
This opportunity came at a crucial time. Prior to Find AI, I had been working on independent software products like Booklet and Postcard. After two VC-backed startup failures, I had wanted to try a different approach to entrepreneurship. What I didn't realize then was that I was in a rut - startup burnout had led me to pursue increasingly conservative projects, which ultimately proved unsuccessful.
As my excitement about Find AI began to eclipse my personal projects, I recognized something had shifted. I joined as a late-coming cofounder and threw myself completely into the work. Over the next year, I built an engineering team, implemented an asynchronous working style, and tackled formidable technical challenges. As early OpenAI adopters, we developed sophisticated tools for scaling and evaluating LLMs, and I dove deep into search system architecture and vector databases.
Then, a few weeks ago, everything changed. One of the original founders set the company on a sudden, divergent new direction, which led to my departure. While the decision to leave was difficult, I knew in my gut that we had made a positive impact. More importantly, working on Find AI had rekindled my optimism about venture-scale startups and the transformative potential of AI.
Finding direction
In the aftermath, I felt lost about next steps. Starting another company seemed like an obvious path, but something felt off. I didn't have a clear vision of what to build or who to build it with. Instead of making a momentum-driven decision, I chose to take a thoughtful approach.
After contemplation and long walks, I outlined what I wanted in my next role:
Focus on deep work and building. My best days were always those spent coding quietly until lunch. While I could have pursued product or engineering management, I recognized that I didn't enjoy meetings and politics.
In-person collaboration with ambitious people. Having worked remotely since founding Moonlight in 2017, I missed office culture. I'd seen how remote work could sometimes be misused to prioritize lifestyle over impact. An in-person environment felt like a signal of serious commitment.
Living in a livable, walkable U.S. city. As someone who hasn't owned a car in over a decade, and who developed a deep appreciation for urbanism during my digital nomad years, this was non-negotiable.
Working on AI. Through Find AI, I'd witnessed firsthand how powerful and disruptive LLM technology could be. The excitement around AI's potential was palpable, and I wanted to be at its epicenter.
This clarity led me to pursue staff engineering roles at AI companies. Coding digital products has been my craft for twelve years - it's my rare and valuable skill.
Deciding not to be a founder felt like a weight lifting from my shoulders. But, the prospect of technical interviews was daunting. I hadn't held the title "Software Engineer" in over a decade. While I had the skills, the interview process - particularly the leetcode puzzles - was intimidating. Rather than shy away, I embraced the challenge, studying algorithms, reading books, and practicing problems.
Finding home
The location decision proved more complex. After graduating in 2013, I'd spent four years in San Francisco before becoming a digital nomad. Those travels took me to Mexico City, Buenos Aires, Barcelona, and London, deepening my appreciation for urban living. In 2019, I settled in New York City, seemingly the perfect choice for someone valuing urbanism in the US.
My travels continued, particularly to Nordic cities like Copenhagen, Oslo, and Stockholm. I developed a specific appreciation for compact city design and realized how much I missed easy access to nature within NYC's urban sprawl. During my fourth trip to Copenhagen in three years, I had an epiphany while running along the waterfront: almost everything I loved about Copenhagen - the quiet streets, bicycle-friendly infrastructure, compact layout, access to nature - was available in San Francisco.
I spent three weeks in San Francisco this Fall as a trial. It felt like coming home, and claims of the city's demise proved exaggerated. While I'd hoped New York would emerge from the pandemic as a stronger tech hub, it remained primarily focused on finance and applied technology. As groundbreaking AI companies like OpenAI and Anthropic emerged, they were based in San Francisco.
The pandemic had created uncertainty about SF's future, but today it has reemerged as the clear center for those serious about technology careers.
The next chapter
I went through nearly a hundred interviews over one month, pushing myself beyond my comfort zone and direct network. After passing numerous technical interviews and exploring many exciting AI companies, I found myself with multiple compelling options.
In the end, one opportunity stood out. I've accepted an offer to join an AI startup in San Francisco as a member of the technical staff, starting later this month. More details to come soon.
Today, my belongings are somewhere in Tennessee, making their way west. I've said goodbye to Manhattan and am headed to San Francisco to begin this new chapter.
I recently returned from a trip to San Francisco. While there, I presented to the innovation group of a large insurance company about how startups are applying AI.
This post shares that presentation. In addition to the written version, I've recorded an audio version of it here, too.
We all know that ChatGPT can write essays, suggest travel itineraries, and draft emails. These tasks are powerful, but only scratch the surface of what LLMs can do.
In the past year, a wave of startups has emerged, leveraging AI to rethink industries ranging from software development to operations to marketing. These companies are building entirely new categories of AI-driven tools, seeking to disrupt current businesses.
In this presentation, I’ll explore the emerging archetypes of LLM-powered apps: the core techniques, architectures, and approaches shaping the next generation of products.
I’m Philip, and I write a blog called Contraption Company. For the past two years, I've been the CTO of Find AI, a startup building a search engine for people and companies. We’re pushing OpenAI’s technology to its extremes—making over one hundred million requests this year alone—and uncovering innovative ways to apply its power. Today, I’ll share some of those lessons and ideas with you.
By the end of this presentation, my goal is for you to understand how businesses are applying LLMs in practice. With this toolkit of patterns, you’ll be able to identify opportunities in your own company to improve efficiency with LLMs and decide whether it makes sense to build or buy solutions.
We'll go through three parts in this presentation, from basic to advanced.
In part one, we'll review building block technologies - like chat, embeddings, semantic search, fine-tuning, and some non-LLM tools.
In part two, we'll look at basic applications of LLMs that power most startups - such as code generation, text to SQL, summarization, advanced moderation, text generation, analysis, intent detection, and data labeling.
In part three, we'll review advanced applications of LLMs that represent more frontier applications: retrieval-augmented generation (RAG), agents, and swarms.
First, let's review foundational "building block" technologies that power LLM apps.
Chat lies at the heart of most LLM applications. As we review advanced techniques like “intent detection” and “retrieval-augmented generation,” the underlying interface is still chat: input text is processed by an LLM, which generates output text.
Input text typically consists of both instructions and user-provided data. Hosted model providers like OpenAI and Anthropic charge for the length of the input text, which is measured in “tokens.” The maximum input size, known as the context window, varies by model.
Currently, Google’s Gemini 1.5 Pro model offers the largest input capacity, handling up to 2 million tokens—roughly equivalent to the text of 10 books. For example, it can process the entire Harry Potter series in its input and perform tasks like generating a chapter-by-chapter summary of the spells used. However, it’s important to note that recall isn’t perfect, and including large volumes of context can sometimes reduce performance.
The primary models in use today are OpenAI’s GPT-4o, Anthropic’s Claude, and Meta’s LLaMA. All of these models generally offer comparable performance.
Smaller, more cost-efficient models, such as GPT-4o-mini, are also available. These require less computational power, enabling higher throughput on the same hardware. As a rule of thumb, these "mini" models typically cost about 1/10th as much as standard models, but are less accurate.
LLMs include a “temperature” parameter that developers can adjust for each request. This parameter controls the randomness of the output: higher temperatures produce more creative responses, while lower temperatures yield more predictable results.
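To make this concrete, here's a minimal sketch of a chat request with the temperature parameter, using OpenAI's Python SDK. The model name, prompt, and temperature value are illustrative choices, not recommendations.

```python
# Minimal chat request using the OpenAI Python SDK (pip install openai).
# Prompt and temperature are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,  # lower = more predictable, higher = more creative
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the plot of Harry Potter in one sentence."},
    ],
)

print(response.choices[0].message.content)
```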
LLMs output text, and hosted providers charge for the length of that output. However, most models cap output at roughly 14 pages of text, so output length tends to contribute far less to overall costs than input length.
While we typically think of output from LLMs as plain text sentences, they can also return structured data using formats like JSON. Providers such as OpenAI have introduced tools to enforce specific output formats, ensuring reliability and accuracy. This capability allows you to transform unstructured data into structured formats or request multiple data points in a single call, streamlining tasks that would otherwise require separate queries.
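As a quick illustration, here's a sketch of structured output using the parse helper in recent versions of OpenAI's Python SDK. The contact schema is an invented example.

```python
# Sketch of structured output: enforce a JSON schema via a Pydantic model.
# The Contact schema and input text are made up for illustration.
from openai import OpenAI
from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    company: str
    title: str

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the contact details from the text."},
        {"role": "user", "content": "Jane Doe is the CTO of Acme Corp."},
    ],
    response_format=Contact,  # the response is constrained to this schema
)

contact = completion.choices[0].message.parsed  # a Contact instance
print(contact.name, contact.company, contact.title)
```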
Among the major model developers today, OpenAI and Anthropic provide hosted solutions, where the companies manage the infrastructure and you pay per request. In contrast, Meta’s LLaMA is open-source, giving you the flexibility to run it on your own servers.
Based on our experience using OpenAI’s GPT-4o at Find AI, a useful mental model is that a typical LLM call costs around one cent, assuming standard input and output sizes. However, if you process a large amount of data—such as the full text of all the Harry Potter books—the cost can rise to approximately $2.50 per call.
Hosted model providers offer enterprise-grade support. For example, Microsoft can deploy a dedicated instance of an OpenAI model for you, ensuring privacy, HIPAA compliance, and other enterprise requirements.
Self-hosting a model involves significant complexity, requiring you to forecast capacity, deploy and manage servers, and optimize request routing. Due to these challenges, many businesses rely on vendors to handle these tasks, further blurring the line between hosted and self-hosted models.
For context, one H-100 GPU, often considered the workhorse for high-performance AI workloads and recommended for models like LLaMA, typically costs around $2,500 per month on a cloud provider.
The next building block is embeddings. Embedding models are algorithms used in LLM applications, though they are not themselves LLMs. They convert text into numerical representations that capture its underlying meaning, enabling us to measure the relatedness of text mathematically.
Embedding algorithms transform text into vectors, which are essentially points in a multi-dimensional space. These vectors encode meaning as a series of numbers, allowing us to determine how similar two pieces of text are based on their proximity in this space.
OpenAI offers some of the most advanced embedding models available today. Their most advanced model returns 3,072-dimensional vectors, can process inputs in multiple languages, and is widely used to extract and compare textual meaning. However, there are many different embedding algorithms, and it’s crucial to use the same algorithm consistently across your text for accurate results.
By measuring the distance between points, we can determine how closely related different concepts are. For example, “cat” and “dog” sit closer to each other than either does to “sandwich,” reflecting their greater similarity in meaning. LLM applications leverage embeddings to enable searches based on semantic meaning rather than just keywords.
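Here's a small sketch of that idea in Python: embed the three words with OpenAI's embeddings endpoint and compare them with cosine similarity. The model choice is illustrative.

```python
# Sketch: compare "cat", "dog", and "sandwich" using OpenAI embeddings.
import numpy as np
from openai import OpenAI

client = OpenAI()

words = ["cat", "dog", "sandwich"]
resp = client.embeddings.create(model="text-embedding-3-large", input=words)
vectors = {w: np.array(d.embedding) for w, d in zip(words, resp.data)}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("cat vs dog:     ", cosine_similarity(vectors["cat"], vectors["dog"]))
print("cat vs sandwich:", cosine_similarity(vectors["cat"], vectors["sandwich"]))
```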
Historically, search applications have relied on keyword-based approaches to find relevant text. Tools like Elasticsearch and Algolia use this traditional method, often employing algorithms such as Levenshtein distance to match similar strings. This approach works well for locating exact or similar keywords—for example, searching “dog breeds” might return “list of dog breeds.” However, it might miss relevant results like “poodles” and mistakenly include irrelevant ones like “hot dog ingredients.”
Semantic search represents a new generation of search technology, widely used in LLM applications. Instead of focusing on keywords, it evaluates meaning by measuring the cosine distance between embedded vectors. With semantic search, a query like “dog breeds” would correctly identify “poodles” as relevant while excluding “hot dog ingredients.”
As you explore LLM applications, it’s important to understand that semantic search is a foundational technology powering many of them.
As semantic search becomes integral to many LLM applications, specialized databases for storing and searching vectors are gaining traction. Some options, like pgvector, are free and open source, serving as an extension to the widely used PostgreSQL database. Others, such as Pinecone and Milvus, are standalone vector databases designed specifically for this purpose.
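As a rough sketch of the pgvector approach, the snippet below stores embeddings in PostgreSQL and searches them by cosine distance. The table, column names, and connection string are hypothetical, and it assumes the pgvector extension is installed.

```python
# Sketch of pgvector usage with psycopg; table and column names are hypothetical.
import psycopg
from openai import OpenAI

openai_client = OpenAI()

def embed(text: str) -> str:
    vec = openai_client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
    return "[" + ",".join(str(x) for x in vec) + "]"  # pgvector accepts this literal format

with psycopg.connect("postgresql://localhost/app") as conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")  # requires pgvector to be installed
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body text, embedding vector(1536))")

    cur.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        ("poodles are a dog breed", embed("poodles are a dog breed")),
    )

    # "<=>" is pgvector's cosine-distance operator: smaller distance = more similar.
    cur.execute("SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 3", (embed("dog breeds"),))
    print(cur.fetchall())
```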
Storing vectors can be resource-intensive because they don’t compress well, and maintaining fast search speeds requires computationally expensive algorithms.
At Find AI, we initially implemented semantic search using pgvector alongside our application data. However, we found that 90% of our disk space and 99% of our CPU were consumed by vector calculations, resulting in slow performance. Eventually, we transitioned to Pinecone, a managed vector database optimized for this workload. While it significantly improved performance, it also became more expensive than our primary application database.
A takeaway is that there are infrastructure costs to running LLM applications beyond the LLMs themselves, and these can be substantial.
Fine-tuning is an important concept in working with LLMs. It allows you to take a pre-trained model and further train it for your specific use case. This can be done with both hosted and self-hosted models. One common approach is to fine-tune a less expensive model to perform a specific task at a level comparable to a more costly model.
However, fine-tuning comes with significant trade-offs. The process is often slow and expensive, and it can be difficult to assess whether fine-tuning has introduced negative impacts on the model’s performance in other areas. For these reasons, I typically recommend avoiding fine-tuning until you have a mature AI program. It’s better thought of as a scaling tool rather than a starting point for developing AI applications.
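For reference, kicking off a fine-tune of a hosted OpenAI model looks roughly like this in the Python SDK. The training file and base model are illustrative.

```python
# Sketch of starting a fine-tune on a hosted model. File name and base model
# are placeholders; the training file contains chat-formatted examples.
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # fine-tune the cheaper model to imitate the larger one
)
print(job.id, job.status)  # poll with client.fine_tuning.jobs.retrieve(job.id)
```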
As the final part of the “Building Blocks” section, I want to highlight a few tools provided by Anthropic and OpenAI that, while not LLMs themselves, can play an important role in LLM applications.
Moderation: Both OpenAI and Anthropic offer advanced moderation APIs that can review text and flag potential safety issues. These tools are sophisticated enough to differentiate between nuanced phrases like “I hope you kill this presentation” and “I hope you kill the presenter.” Many LLM applications integrate these moderation endpoints as a preliminary step before executing application logic.
Voice: Speech-to-text and text-to-speech technologies have become quick and reliable, enabling most text-based applications to be seamlessly adapted into voice-based ones. It’s worth noting, however, that most voice-driven LLM applications work by first converting voice to text and then using the same text-based LLM tools discussed here. Essentially, it’s just a different user interface.
Image generation: Image generation has advanced significantly and is a powerful tool often used alongside LLMs. While not directly powered by LLMs, it complements many AI-driven applications, expanding their functionality.
Batch processing: Hosted model providers like OpenAI offer discounts of up to 50% if you allow a 24-hour turnaround for requests instead of requiring immediate responses. This can be particularly useful for background tasks, such as data analysis. By taking advantage of batch processing, you can dramatically lower costs, especially for tasks that don’t need real-time results.
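Here's a minimal sketch of the Batch API flow: write requests to a .jsonl file, upload it, and create a batch with a 24-hour completion window. The documents and IDs are placeholders.

```python
# Sketch of the OpenAI Batch API. Each line of the .jsonl file is one request
# with a custom_id so results can be matched back up later.
import json
from openai import OpenAI

client = OpenAI()

with open("requests.jsonl", "w") as f:
    for i, text in enumerate(["First document...", "Second document..."]):
        f.write(json.dumps({
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": f"Label this text: {text}"}]},
        }) + "\n")

batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the discount comes from accepting up to a 24-hour turnaround
)
print(batch.id, batch.status)  # poll later with client.batches.retrieve(batch.id)
```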
Next, we'll review some basic LLM applications.
The first major archetype of LLM applications is code generation. LLMs excel at tasks ranging from generating basic functions to making contextual modifications across multiple files and even building full-stack features. By analyzing multiple files as input, these models can maintain consistency and streamline development workflows.
AI-powered code generation has brought a step-function increase in productivity, making it an essential tool for developers. As one CTO of a billion-dollar company told me, “Developers who haven’t adopted AI are now considered low-performers.” While we’ll explore various startups and tools in this presentation, I want to emphasize that AI is no longer a “future” tool in software development—it’s already the standard.
A notable category of code generation is text-to-SQL. AI excels at generating database queries, making it possible for even non-technical users to easily ask questions of data stores and warehouses. LLM models can analyze available data structures, including tables and columns, and generate complex, advanced queries. I rely heavily on AI for SQL queries, and there have been instances where it produced queries I initially thought were impossible.
Text-to-SQL can even improve customer-facing applications. Traditional filter interfaces—commonly used to narrow data in tables via dropdowns, typeaheads, and tags—are a staple of CRMs, customer support tools, and similar platforms. These interfaces work by generating SQL queries behind the scenes to retrieve results.
With AI, these cumbersome filter-based UIs are being replaced by natural language input. Users can now enter queries like “Companies in the USA with 50-100 employees,” and the AI automatically generates the appropriate SQL query, eliminating the need for complex and bloated interfaces.
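A minimal sketch of this pattern: hand the model the schema and a natural-language question, and ask it to return only SQL. The schema and question below are made up, and you'd want to review or validate the output before running it.

```python
# Sketch of text-to-SQL: provide the schema plus a question, get one query back.
from openai import OpenAI

client = OpenAI()

schema = """
companies(id, name, country, employee_count)
people(id, name, title, company_id)
"""

question = "Companies in the USA with 50-100 employees"

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # deterministic output is usually preferable for SQL generation
    messages=[
        {"role": "system", "content": f"You write PostgreSQL queries. Only output SQL, nothing else.\nSchema:\n{schema}"},
        {"role": "user", "content": question},
    ],
)

sql = response.choices[0].message.content
print(sql)  # review or validate before executing against a real database
```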
Summarization is one of the core strengths of LLMs. By providing text, you can receive concise, high-quality summaries. Summaries can also be structured, such as condensing a news article into a tweet or transforming a historical article into a timeline.
Here’s an example of an email newsletter created using my software, Booklet. It analyzes new posts and discussions in a community and generates all the content automatically. The subject line, titles, and summaries in the email are all generated by AI. This newsletter is sent to thousands of people daily—completely automated, with no human intervention.
Earlier, I mentioned that model providers offer free safety-focused moderation tools. However, LLMs can also be leveraged to build more advanced, rule-based moderation systems. For example, in a customer support forum, you can provide the community rules to an LLM and have it review posts to ensure compliance. These automated community management systems are quick and reliable.
Interestingly, most moderation applications also prompt the LLM to provide a reason for its judgment. Asking the model to explain its decisions not only adds transparency but often improves its accuracy.
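Here's a sketch of what rule-based moderation with a reason might look like, using JSON mode so the verdict comes back as structured data. The rules and post are invented.

```python
# Sketch of rule-based moderation: community rules, a post, and a JSON verdict
# with a reason. Rules and post text are made up.
import json
from openai import OpenAI

client = OpenAI()

rules = "1. No advertising. 2. Stay on topic (customer support for our product). 3. Be respectful."
post = "Check out my new crypto course at example.com!!!"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # ask for valid JSON back
    messages=[
        {"role": "system", "content": f"You moderate a forum with these rules:\n{rules}\n"
                                      'Return JSON: {"compliant": true/false, "reason": "..."}'},
        {"role": "user", "content": post},
    ],
)

verdict = json.loads(response.choices[0].message.content)
print(verdict["compliant"], "-", verdict["reason"])
```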
The next archetype is text generation, where LLMs excel at creating new content. One particularly effective use case is combining two existing documents into a cohesive new one. For instance, if you have a document titled “How to File a Reimbursement” and another titled “How to Add Your Child to Your Account,” you can prompt an LLM to generate a new article, such as “How to File a Reimbursement on Behalf of a Child.”
Text generation is a key feature in many marketing startups. For example, LoopGenius leverages LLMs to automatically generate, test, and refine Facebook ads. Tools like Copy.ai and Jasper focus on creating content for marketing pages, helping businesses improve their SEO strategies.
However, AI-generated content is flooding the internet. It’s now easier than ever for companies to add millions of pages to their websites, leading to an oversaturation of material. As a result, it’s likely that Google will adapt its algorithms to address the proliferation of AI-driven content.
The next archetype is analysis, where LLMs can evaluate data and provide decisions. For example, you can ask ChatGPT to compare a job description and a resume to analyze whether a candidate is a good match for the role—and it performs this task remarkably well.
At Find AI, we also leverage analysis. When you run a search like “Startup founders who have a dog,” the system asks OpenAI to review profiles one by one and determine, “Is this person a startup founder who has a dog?”
Currently, recruiting is one of the most common use cases for analysis. Many companies rely on AI for initial applicant screening, significantly streamlining the hiring process. Applicant AI is one example, but many similar tools are emerging in this space.
Intent detection is one of my favorite applications of LLMs. We’ve all encountered traditional phone menus that say, “Press 1 for new customer enrollment, press 2 for billing, press 3 for sales,” and so on. AI can replace this process by simply asking, “Why are you calling?” and then routing the caller to the appropriate department. This technique, where AI maps a user’s input to a predefined set of options, is known as intent detection.
Intent detection is a foundational technique widely used in more advanced AI applications because it enables systems to navigate decision trees. Many customer interactions are essentially a series of decisions, and LLMs can make these processes feel seamless by converting them into natural language exchanges. At Find AI, for example, every search begins with an intent detection step, where we ask the LLM, “Is this query about a person or a company?”
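As an illustration of the pattern (not Find AI's actual prompt or code), an intent-detection step can be as simple as asking the model to pick one label from a fixed set:

```python
# Sketch of intent detection: map a free-form query onto a small set of labels.
# The intent labels are made up for illustration.
from openai import OpenAI

client = OpenAI()

INTENTS = ["person_search", "company_search", "other"]

def detect_intent(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": f"Classify the query into exactly one of: {', '.join(INTENTS)}. Reply with the label only."},
            {"role": "user", "content": query},
        ],
    )
    label = response.choices[0].message.content.strip()
    return label if label in INTENTS else "other"  # fall back if the model improvises

print(detect_intent("Startup founders who have a dog"))  # expected: person_search
```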
Call centers have been early adopters of intent detection, integrating it into customer support and sales workflows. Companies like Observe AI and PolyAI are reimagining these functions with solutions that blend the strengths of LLMs and human agents.
LLMs are increasingly being used in analytics for data labeling, a critical task in tools like customer support and sales systems. Tags and labels help track things like feature requests or objections, tasks that previously required customer support agents to spend significant time manually tagging conversations. Now, LLMs can automate this process entirely.
This capability is particularly useful for analyzing historical data. For example, you could instruct an LLM to review all past customer conversations and identify instances where a company requested an API.
At Find AI, we use LLMs to label every search after it’s run, applying tags like “Person at a particular company” or “Location-based search.”
Data labeling also pairs well with the Batch processing capability discussed earlier. By allowing up to 24 hours for a response, you can significantly reduce costs while efficiently processing large volumes of data.
Building LLMs required massive amounts of human-labeled data, leading to the rise of companies over the past decade that specialize in data labeling, such as Scale AI and Snorkel AI. Interestingly, many of these tools, which were once entirely human-driven, have now evolved to incorporate both AI and human-based labeling systems. As a result, there is now a robust ecosystem of reliable tools available for data labeling, combining the efficiency of AI with the precision of human input.
In the final section, we’ll explore advanced applications of LLMs, focusing on complex and cutting-edge techniques at the forefront of AI development.
The first advanced technique we’ll cover is retrieval-augmented generation (RAG). This approach enables an LLM to retrieve relevant information to improve its responses. After a user inputs a query, the LLM retrieves specific data, feeds it into the model, and generates a more accurate output.
A common use case for RAG is improving help documentation. For example, if a user asks, “How do I submit an expense report?” we want the LLM to access relevant documents about expense reporting to provide the correct answer. However, including all help docs in every query would be prohibitively expensive, and overwhelming the context with too much information could decrease accuracy.
The goal of RAG is to retrieve and include only the most relevant documents—perhaps two or three—to assist with the query. This is achieved using the foundational technologies of embeddings and vector databases.
Here’s how most RAG applications work: beforehand, all data (such as help docs) is broken down into smaller chunks, like paragraphs. Each chunk is embedded and stored in a vector database. When a user asks a question like “How do I file an expense report?” the system retrieves only the most relevant articles from the database. By feeding this targeted information into the LLM, RAG enhances the response.
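Here's a minimal, in-memory sketch of that flow: embed the chunks, rank them against the question by cosine similarity, and pass only the top matches to the model. The documents are placeholders; a real system would use a vector database rather than a Python list.

```python
# Minimal in-memory RAG sketch: embed chunks, retrieve the closest ones,
# and answer using only that retrieved context.
import numpy as np
from openai import OpenAI

client = OpenAI()

chunks = [
    "To file an expense report, open Finance > Expenses and click 'New report'.",
    "To add a child to your account, go to Settings > Family members.",
    "Password resets are handled from the login page via 'Forgot password'.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [np.array(d.embedding) for d in resp.data]

chunk_vectors = embed(chunks)

question = "How do I file an expense report?"
q_vec = embed([question])[0]

# Rank chunks by cosine similarity and keep the top two.
scores = [float(np.dot(q_vec, v) / (np.linalg.norm(q_vec) * np.linalg.norm(v))) for v in chunk_vectors]
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided documentation:\n" + "\n".join(top_chunks)},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```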
RAG is foundational to many LLM applications today because it allows companies to incorporate unique, business-specific information into responses while keeping costs manageable. This technique is already widely used in customer support tools, such as Intercom’s chatbots, and powers other AI-driven applications like Perplexity AI.
In many ways, RAG is the core method businesses use to tailor AI systems to their specific logic and needs.
The next advanced technique is Agents, which have become a hot topic in the AI space. If you visit a startup accelerator today, you’ll likely find a dozen startups touting their agent-based solutions, many of them raising millions in funding.
The definition of an agent remains somewhat fluid, but I like the one from the Quick Start Guide to Large Language Models: an agent is an LLM with access to tools. These tools define the agent’s functionality and can, in theory, be anything.
The most popular agent today is ChatGPT. If you ask ChatGPT about the tools it has access to, it will list: Bio for memory, DALL-E for image generation, Python for executing code, and Web for internet searches. This is also the key difference between ChatGPT and the OpenAI API: these four tools are not available to API users.
Developers can create tools for agents using code, enabling a wide range of functionalities—from retrieving data and submitting forms to processing refunds. These tools can incorporate user-specific context and include safeguards and limitations to ensure proper usage.
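Here's a sketch of handing an LLM a tool via OpenAI's function-calling interface. The refund function and its safeguard are hypothetical.

```python
# Sketch of giving an LLM a tool. The refund function and its $100 safeguard
# are hypothetical; the tool-definition format follows OpenAI's chat API.
import json
from openai import OpenAI

client = OpenAI()

def process_refund(order_id: str, amount: float) -> str:
    if amount > 100:
        return "Refused: refunds over $100 require human approval."  # safeguard
    return f"Refunded ${amount:.2f} for order {order_id}."

tools = [{
    "type": "function",
    "function": {
        "name": "process_refund",
        "description": "Refund a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount": {"type": "number"},
            },
            "required": ["order_id", "amount"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    messages=[{"role": "user", "content": "Please refund $20 on order #4521."}],
)

# If the model decided to call the tool, execute it with the arguments it produced.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(process_refund(**args))
```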
LLMs can now even interact with computers, extending their capabilities beyond traditional tasks. Robotic Process Automation (RPA) has long allowed developers to automate actions like browsing websites or performing operations. However, agents are taking this further. For instance, Anthropic’s new Computer Use feature gives LLMs a computer, allowing them to perform tasks such as web browsing, clicking buttons, and responding to error messages.
This advancement has significant implications. Compared to traditional RPA tools, agents are less fragile and far more adaptable, making them better suited to dynamic and complex workflows.
Agents represent the cutting edge of AI today, with startups equipping LLMs with a wide range of tools to tackle complex tasks. Veritas Labs is developing agents to automate healthcare operations and customer support. AiSDR has created a virtual salesperson that autonomously finds leads, sends emails, responds to customer inquiries, and schedules meetings. Meanwhile, Cognition AI has introduced Devin, touted as “the world’s first AI software engineer,” capable of accepting tasks and writing the code needed to complete them.
Agents are pushing the boundaries of LLM technology, enabling some of the first fully autonomous LLM applications.
The final advanced application I want to discuss is the concept of Swarms—AI agents that collaborate to achieve a shared goal. OpenAI introduced this idea, along with the name, through an open-source project called “Swarm.” The core concept is to have a team of specialized AI agents that work together, each focusing on specific tasks.
For example, imagine a swarm designed for handling expense reports. One agent could guide users through submitting expense reports, another could review and approve them by accessing relevant data (like past reports or messaging team members), and a third could handle reimbursements, including sending payments and updating bookkeeping. By dividing tasks among multiple agents, you can enhance safety and control—such as ensuring the expense review agent only processes documents and doesn’t access subjective information from the submitter.
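As a sketch of the idea, here's roughly what a two-agent handoff looks like with OpenAI's experimental Swarm library. The agents and instructions are invented, and since Swarm is an educational project, its API may change.

```python
# Sketch using OpenAI's experimental Swarm library (from the openai/swarm repo).
# The agents and handoff logic are hypothetical.
from swarm import Swarm, Agent

client = Swarm()

reviewer = Agent(
    name="Expense Reviewer",
    instructions="Review expense reports against policy. Only look at the submitted documents.",
)

def transfer_to_reviewer():
    """Hand the conversation to the review agent once a report is submitted."""
    return reviewer

intake = Agent(
    name="Expense Intake",
    instructions="Help the user submit an expense report, then hand off for review.",
    functions=[transfer_to_reviewer],
)

response = client.run(
    agent=intake,
    messages=[{"role": "user", "content": "I need to file a $42 taxi receipt from Tuesday."}],
)
print(response.messages[-1]["content"])
```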
Swarms represent the near future of generative AI applications. As agent platforms mature and standards for agent collaboration emerge, the adoption of swarms will likely become widespread, unlocking new possibilities for AI-driven workflows.
The goal of this presentation was to help you understand what people are actually doing with LLMs.
We covered building blocks, such as chat, embeddings, and semantic search. Then, we explored basic applications such as code generation, summarization, moderation, analysis, intent detection, and data labeling. Finally, we explored advanced applications - such as RAG, agents, and swarms.
Understanding the archetypes of LLM applications can help you identify opportunities to improve business processes and workflows with AI. Additionally, the discussion around hosted versus self-hosted solutions, along with potential vendors, should equip you to make informed decisions about when to build versus buy and how to evaluate the sophistication of various tools.
In software engineering, AI is already the present—not the future—and I believe we’ll see this same transformative impact extend across many other functions and industries. Thank you for taking the time to explore these ideas with me today.
APIs allow developers to interact with a product using code. There are many different ways to build an API, and many tools to make it easier for customers to adopt. In this post, I’ll take you behind the scenes of how we built the Find AI API, from its technical foundations to the tools we used to simplify developer adoption.
I'll start with the end customer experience, then work our way back to the internal architecture.
Explainer video
Here's a video I created to explain how to use the Find AI API:
I recorded the video using a DJI Pocket 3 with their lavalier mic and edited it in Descript. I recorded almost an hour of footage - so I spent a lot of time editing the video to be as short and clear as possible. Descript's text-based editing tool, originally designed for podcasts, made it easy to scan through retakes and figure out which was best.
Video tutorials help people understand the end-to-end integration process of an API before diving into detailed documentation. Every customer who has integrated with our API started by watching this video, so it was a good use of time.
Generated client libraries
In the video, calling the API looks straightforward because you can install a Find AI client library and call import FindAI from "find-ai" to make requests. Client libraries save developers time by abstracting away boilerplate code and reducing the need to read extensive documentation.
We provide official client libraries in Python, Node, Ruby, and Go, making integration accessible across multiple programming languages.
Companies like Stripe and OpenAI have set a high standard, making libraries a key part of their developer ecosystems. So, developers now expect client libraries in multiple languages whenever integrating with a new API. Writing and maintaining these libraries manually, however, would be both time-consuming and error-prone.
This is where Stainless comes in. Stainless reads our OpenAPI specification and automatically generates client libraries in multiple languages. The tool was created by Alex, who built similar systems at Stripe, and it now powers libraries for OpenAI, Anthropic, and Cloudflare.
For the Find AI API, every single user I’ve spoken to relies on one of our Stainless-generated client libraries. If you’re building an API, providing robust client libraries isn’t just a nice-to-have—it’s the new standard.
Interactive docs
Clear documentation is the foundation of a great developer experience. It helps users understand how to interact with your API and what data they can send or retrieve.
Initially, we used Swagger to generate our API documentation. Swagger reads an OpenAPI spec and creates interactive docs, allowing users to input their API key and test endpoints directly. This interactivity is a fantastic way for developers to use an API before writing any code. It’s also what I use in the explainer video. Our Swagger docs are still available at usefind.ai/api/docs.
However, Swagger had limitations. Its design felt dated, and it wasn’t easy to add additional text or media to guide users through setup or multi-step calls.
To address this, we switched to Mintlify for our primary documentation. Mintlify offers the same interactive features as Swagger but provides more flexibility for customization. For example, we embedded the explainer video and added step-by-step guides to explain each function in detail.
When designing the Find AI API, we opted for a RESTful architecture. While newer paradigms like GraphQL and gRPC are gaining popularity, we chose REST because the ecosystem has largely standardized around OpenAPI for documenting APIs.
The OpenAPI specification serves as the backbone of our API ecosystem. It’s a machine-readable file that defines what the API can do. When we update the spec, tools like Stainless automatically regenerate client libraries in multiple languages, and our documentation on Mintlify and Swagger updates automatically.
This unified workflow ensures that our API is consistent and always up-to-date for developers.
Usage-based billing
One of the key architectural decisions we made was to adopt usage-based billing. We wanted our pricing model to reflect the value provided to users. For example, if a query requests 100 matches but only 50 exist, the user is billed for 50. This ensures fairness and aligns costs with usage.
To implement this model, we used Stripe’s usage-based billing. Setting up usage tracking and integrating with Stripe was surprisingly straightforward. Customers simply add a credit card to begin using the API, and Stripe charges their credit card weekly based on their usage.
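For illustration, reporting metered usage with Stripe's classic usage-record API looks roughly like this in Python. This is a sketch rather than our exact integration; the IDs and quantities are placeholders, and newer Stripe accounts may use the Billing Meters APIs instead.

```python
# Sketch: report metered usage to Stripe with the classic usage-record API.
# Subscription item ID, API key, and quantity are placeholders.
import time
import stripe

stripe.api_key = "sk_test_..."  # placeholder key

# After a query returns, report how many matches were actually delivered.
stripe.SubscriptionItem.create_usage_record(
    "si_123",              # the customer's metered subscription item (placeholder)
    quantity=50,           # bill for the 50 matches that were returned
    timestamp=int(time.time()),
    action="increment",
)
```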
This approach has worked well for our customers and ensures a seamless payment experience while scaling with their needs.
Demo mode
Another important decision was to include a demo environment in the API. Since using the full API requires adding a credit card, we wanted to provide a way for developers to experiment with the product risk-free.
To achieve this, we allow developers to issue a demo-mode API key. This key returns placeholder data (e.g., results like example.com) without incurring any costs. It’s particularly useful for mimicking the API’s functionality in development environments.
Looking at our analytics, however, demo mode hasn’t been widely used. Most developers were comfortable testing with the production API and seemed hesitant to rely on test-mode data. If I were to rebuild the API, I’d likely skip the demo mode entirely.
Try it out
If you want to incorporate people and company search into your application, check out the Find AI API.
Recent events have reminded me of a phrase I’ve long used in the startup world: “Having a TJ.”
Before Staffjoy became a company, it was just a side project. Our first user was TJ, whose biggest challenge was scheduling his workforce. Every week, TJ would meet with us to explain his problems. We’d show him what we were working on, and he’d provide invaluable feedback. TJ became the lifeblood of our startup—a real person with a real problem, collaborating with us to find a solution. Over time, “TJ” evolved into a metaphorical persona representing our customer base: “What would TJ want?”
Our minimum viable product at Staffjoy involved just emailing spreadsheets of schedules back and forth with TJ. Despite its simplicity—and perhaps clunkiness—he was happy to use it because we were addressing his core workforce management issues. TJ wasn’t distracted by unnecessary features; he cared about solving his problem.
With TJ’s help, we built an app, got into the Y Combinator Fellowship, raised a seed round, and helped more customers. TJ’s feedback and enthusiasm were instrumental in guiding Staffjoy from an idea into a venture we worked on for two years.
Many startups fail to secure even a single customer or create something that one person genuinely wants. Having a “TJ” keeps a company focused on solving real problems for real people. Individuals like TJ validate assumptions, offer honest feedback to prioritize work, answer spontaneous questions, and become references for future customers. They confirm that the company is tackling a genuine need. Once you’ve built something that satisfies TJ, you can seek out more customers like them.
In other companies I’ve been involved with, there’s always been that “TJ”—the first customer who has a problem, collaborates on the solution, and then champions your product. If you’re building a startup and don’t yet have a passionate user, I recommend focusing on finding that early adopter who can provide feedback. If you can’t find such a user, perhaps you’re addressing the wrong problem.
Later, as the industry shifted amid consolidations and shutdowns, TJ was laid off. Responding to the market dynamics, we pivoted, but we struggled to find another TJ and ultimately shut down. Losing a “TJ” can be a canary in the coal mine for a startup.
A passionate early customer keeps a startup team motivated and working on the right thing. Most startups focus on growth too early and fail to make something that a single customer wants. The TJ lesson is that a successful product starts with one customer, and that one customer’s love of the product is rooted in a problem they desperately want your help solving.
It's better to have 100 users love you than 1 million kinda like you. The true seed of scale is love, and you can't buy it, hack it, or game it. A product that is deeply loved is one that can scale. - Sam Altman
Earlier this year, I attended a talk in NYC by Vinay Hiremath, co-founder of Loom. He explained a mental model that's stuck with me.
Here’s the model: When a startup competes with an incumbent, it has an innovative product but seeks distribution. The incumbent has distribution—all its customers—but seeks innovation. So, they race: the startup tries to capture the incumbent’s customers before the incumbent can develop a better product.
Sometimes, the innovator wins, such as when Google surpassed Yahoo or the iPhone overtook BlackBerry.
Other times, the incumbent prevails. In the case of Slack vs. Microsoft Teams, Microsoft Teams now reports about ten times as many daily active users as Slack. Salesforce has also stood the test of time against many innovators.
Some ongoing races include Linear vs. Jira and ChatGPT vs. Google.
To win with innovation, small companies need to be hard to copy (like Figma), have strong network effects (like Facebook), or be ignored by incumbents (such as Lyft eschewing taxi laws).
Big tech companies should not be underestimated. They have become skilled at building products and often let startups do the hard work of validating new markets before they compete. They sometimes engage in tactics that are unethical and potentially illegal, such as cloning features to stifle emerging competitors—a strategy Instagram notoriously employed against Snapchat and later TikTok. These actions often go unchecked because if the incumbent dominates the market, the startup may not have the resources or time to pursue legal action.
I often think about this model because it applies well to many markets. As a startup, you should always ask, “Can somebody just copy this?” As an incumbent, you should ask, “Are we nimble enough to keep our product competitive?” Either way, the first step to winning a race is recognizing that you’re in one.
This week, I presented at the Mindstone AI meetup in NYC about internal tools we built at Find AI. We use OpenAI extensively to build a search engine for people and companies - making millions of daily LLM requests.
In this presentation, I covered two internal tools we built to improve our understanding and usage of OpenAI. The first is a semantic search engine we built on top of OpenAI Embeddings to understand the performance and accuracy of vector-based semantic search. The second is a qualitative model evaluation tool we built to compare the performance of different AI models for our use cases. These tools are internal research products that have never been shown publicly.
Earlier this month, I traveled to the Alsace wine region of France to explore the craft of wine. Their harvest season had just officially kicked off, so winemakers were beginning to pick grapes and produce their 2024 vintage.
I love finding people who focus on mastery of one skill. Winemaking is one of the classic crafts, and Alsace is a historic region filled with tradition. Many of the winemakers came from a multi-generational lineage of producers.
Even amid the tradition and rules, I saw innovation. In a region known for its white wines, four producers had successfully lobbied for the government to award grand cru designations to their Pinot Noir wines. I visited some of these producers and felt their renewed sense of autonomy.
I brought a DJI Pocket 3 camera to document the visit and turned my footage into a little video about a day in Alsace. Take a look:
At Find AI, we use OpenAI a lot. Last week, we made 19 million requests.
Understanding what's happening at that scale can be challenging. It's a classic OODA loop:
Observe what our application is doing and which systems are triggering requests
Orient around what's happening, such as which models are the most costly in aggregate
Decide how to make the system more efficient, such as by testing a more efficient model or shorter prompt
Act by rolling out changes
Velvet, an AI Gateway, is the tool in our development stack that enables this observability and optimization loop. I worked with them this week to produce a video about how we use data to optimize our AI-powered apps at Find AI.
The video covers observability tools in development, cost attribution, using the OpenAI Batch API, evaluating new models, and fine-tuning. I hope it's a useful resource for people running AI models in production.