July 10, 2024

The Future of Human-Machine Interaction

The next generation of tools will intuitively understand who you are and what you want to accomplish. We envision a future where we can abstract tools and programming languages to a declarative state (“what you want done”) and use personalization modules to derive the imperative implementation (“how it’s done”).

"It is our job to create computing technology such that nobody has to program. And that the programming language is human, everybody in the world is now a programmer. This is the miracle of artificial intelligence."
- Jensen Huang

Achieving this goal would be a technological renaissance equivalent in magnitude to the advent of personal computing. It would put any conceivable software program directly in the hands of anyone creative enough to imagine its utility. Isn’t this a future worth chasing?

We certainly believe so. The idea of a programming language being human is much more nuanced than it might appear on the surface. It goes beyond using a Latin script with English grammar or a logographic script with Chinese phonetics. Within it lies the crucial idea of human-human alignment—something we deal with every day in the form of communication. Even between humans who speak the same language, this alignment is a non-trivial problem. The same words that come out of a manager’s mouth may be interpreted by an employee as something entirely different. It is only through trial and error and an alignment of expectations and perspectives that we reach a cooperative understanding.

The same scenario plays out in human-machine alignment. Solving this problem is at the core of what it means to have a human programming language, and doing so would birth a plethora of tools that enable humans to directly interface with computers without needing opinionated or restrictive translation programs.

So why doesn’t this exist today? Why have we invented thousands of programming languages, tens of thousands of frameworks, and countless UI wrappers for these languages rather than just using human language?

Background

The answer to this question lies in ambiguity and abstraction.

The underlying hardware used by computers is rooted in a precision system that only speaks its native language of 0s and 1s. It lacks the concept of ambiguity, expecting instructions to be explicit and self-contained. In contrast, natural language is filled with ambiguity. Implied cultural norms, interpersonal dynamics, and individual biases litter text and speech in the form of assumptions and veiled implications. The meaning of human language is heavily context dependent and is just as much about what is written as what is not.

The two systems are by nature incompatible. Something as simple as asking a student to add 4 and 5 already relies on assumptions. Most people perform the operation assuming the decimal system and expect the answer to be 9. But 11 is just as valid an answer in the octal system, or 1 if modulo-8 arithmetic is used. For hardware, these assumptions cannot be made; they need to be explicitly stated.
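
To make the point concrete, here is the same request spelled out in a small Python sketch, with each assumption made explicit:

```python
# The same request, "add 4 and 5", under three different explicit assumptions.
decimal_result = 4 + 5        # 9, assuming ordinary decimal arithmetic
octal_digits = oct(4 + 5)     # '0o11': the same value written in base 8
modular_result = (4 + 5) % 8  # 1, assuming modulo-8 arithmetic

print(decimal_result, octal_digits, modular_result)  # 9 0o11 1
```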

To convey this complexity to machines, programming languages act as a bridge between the ambiguity-prone, context-dependent nature of human language and the precision required by computers to execute tasks. Languages like C and SQL serve as a shared intermediary that can be written by humans as a more concise subset of natural language and compiled into the 0s and 1s required by the machine. It’s a compromise.

But if we make the parallel to human communication, why would two humans who share no common language seek to communicate by both learning a foreign language that neither party speaks?

Human Speaking the Language of the Computer

Why not have the engineer learn to speak the language of the computer - machine code?

This is certainly done. It’s a time-consuming process that involves telling the hardware every instruction that needs to be executed, from loading registers to branching. Imagine needing six instructions just to add two numbers together. How many instructions would it take to build a web application? The benefit of speaking a language the computer speaks is that you have absolute control over what the machine does; the tradeoff is that it’s massively time consuming. Programming in machine code is akin to designing proteins one nucleotide at a time. It’s certainly possible and gives you absolute control, but for this task, using a higher level representation—amino acids—makes it much simpler.

Programming languages are all about abstracting away the details that the user doesn’t need to know. A Python programmer building a simple web server doesn’t need to manage memory by hand the way an operating system written in C must carefully allocate and de-allocate it. We invent different languages at different levels of abstraction to allow the user to focus on what’s important to them.
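
As a small illustration of how much detail even one level of abstraction hides, the sketch below uses Python’s standard dis module to show the virtual-machine instructions behind a one-line addition; real machine code sits another level further down.

```python
import dis

def add(a, b):
    return a + b

# Print the bytecode instructions the Python virtual machine executes for a
# single addition: several load/add/return steps for one line of source code.
dis.dis(add)
```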

So can’t we just abstract higher and higher and eventually get to natural language? Let’s take a brief look at imperative and declarative programming languages for an answer. Declarative programming languages like SQL are typically domain specific, narrowing the scope of use and abstracting to the level where the user just has to specify “what” they want done. Imperative programming languages like C are more general purpose and allow the user to specify “how” they want something done. As an example, if you want to fetch data from a database, you typically don’t care how the data is fetched so long as it is relatively performant and you get the right results. A language like SQL therefore makes an opinionated decision about how it will carry out a task. The user cannot request a different approach, but this is fine because it’s generally unnecessary. You also can’t write a web server in SQL, because it’s been purpose built as a language for databases.
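
The contrast is easiest to see side by side. Below is a minimal sketch using Python’s built-in sqlite3 module and a made-up animals table: the declarative query states the result we want and lets the engine decide how to get it, while the imperative version spells out the loop and the filter ourselves.

```python
import sqlite3

# A made-up table of animals, just to contrast the two styles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, kind TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)",
                 [("Whiskers", "cat"), ("Rex", "dog"), ("Mittens", "cat")])

# Declarative: state *what* you want; the engine decides how to fetch it.
declarative = [row[0] for row in
               conn.execute("SELECT name FROM animals WHERE kind = 'cat'")]

# Imperative: spell out *how* to get the same result, row by row.
imperative = []
for name, kind in conn.execute("SELECT name, kind FROM animals"):
    if kind == "cat":
        imperative.append(name)

print(declarative)  # ['Whiskers', 'Mittens']
print(imperative)   # ['Whiskers', 'Mittens']
```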

When working with natural language, the issue is that by default it can be either imperative or declarative. Taking the database example, a user can write any number of possible queries at varying levels of detail:

  1. “Get me all rows in the database that are related to cats” (declarative)
  2. “Make a connection to the database using XYZ library, initialize a list variable, then using a for loop over every row in the database, check if the row is related to cats by using a cosine-similarity comparison between the vector column in the database and the vector representation of a cat stored in file XYZ” (mostly imperative)

How do you interpret the natural language query? Do you ignore the imperative details? What if the query is truly ambiguous? How can you have a general purpose declarative language?
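
As one illustration of how much is left unsaid, here is a rough sketch of what the second, imperative request might compile down to. The row data, the similarity threshold, and the stand-in “cat vector” are all invented details that the declarative version never specifies.

```python
import math

# Hypothetical rows: each has some text and a pre-computed embedding vector.
rows = [
    {"text": "Cats sleep most of the day", "vector": [0.9, 0.1, 0.0]},
    {"text": "Quarterly revenue grew 4%", "vector": [0.0, 0.2, 0.9]},
]
cat_vector = [1.0, 0.0, 0.0]  # stands in for the vector "stored in file XYZ"

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# One possible reading: loop over every row and keep the ones whose vector is
# close enough to the cat vector. The 0.8 threshold is yet another assumption
# the natural language request never states.
cat_rows = [row["text"] for row in rows
            if cosine_similarity(row["vector"], cat_vector) > 0.8]
print(cat_rows)  # ['Cats sleep most of the day']
```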

These are all design choices that need to be considered when building a human programming language, and just as with modern programming languages, we believe there will be many tools serving a diverse range of purposes. That said, the one thread shared between them all is the idea that they require the user and the machine to be aligned on what should be done.

Computer Speaking the Language of the Human

This brings us back to the concept of learning to speak the other language—this time, computers learning to speak the language of humans. None of the hardware limitations that we’ve talked about have significantly changed over the years, but what has changed is the advent of language models that have the ability to act as a sophisticated compiler between human language and the world of bits and bytes.

As we’ve mentioned before, language is nuanced—how then do these modern day large language models bridge this gap?

To start, language models train on a massive amount of data to predict what a human would say in numerous contexts. They learn cultural and context-dependent responses from thousands of Reddit posts and internet forums. They learn the intricacies of the English language, customs of formality, and a wealth of facts. They are statistical machines that build an understanding of the world based on the distribution of knowledge available on the internet. Beyond understanding the world, these models have undergone numerous alignment techniques such as RLHF, instruction fine-tuning, and prompt tuning in order to act faithfully to the designer’s intentions. We have seen this in the form of models that refuse to generate toxic content, models that are never belligerent despite being asked numerous inane questions, and models that attempt to replicate the speech patterns of famous figures.
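
For a sense of what those techniques consume, here is an illustrative (entirely made-up) sketch of two kinds of training records involved: an instruction fine-tuning pair and an RLHF-style preference pair.

```python
# Instruction fine-tuning: pair a prompt with the response the designer wants
# the model to imitate.
instruction_example = {
    "prompt": "Summarize this email politely for a colleague: ...",
    "response": "Hi! Here's a quick summary of the email: ...",
}

# Preference data (RLHF-style): rank two candidate responses so the model
# learns which behavior the designer prefers.
preference_example = {
    "prompt": "Write insulting nicknames for my coworker.",
    "chosen": "I'd rather not help with insults, but I can help you draft "
              "constructive feedback instead.",
    "rejected": "Sure, here are a few insults: ...",
}
```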

The task of AI alignment is a big research goal with even bigger implications. Some researchers focus on alignment as being centered around encoding the collective set of human values and goals into these language models. But if no one can even define a singular set of human values or expectations, how can we measure success?

We think the future of human programming has a more interesting path in the idea of alignment to the individual. Each person has a different perspective and way of thinking about the world; if you average out all these perspectives, you get a machine that is presumably less creative and less uniquely human than one aligned to a single person. The precise modeling of a singular thought process, and the constraints this imposes, drives innovation rather than hindering the model’s ability.

We aren’t trying to replace humans; we’re building tools that intuitively understand who you are and what you really want done, rather than having you learn the intricacies of the tool and work within the limitations of the machine.

Where We’re Headed

To begin this journey, we’re diving deep into the field of consulting—an area that is all about human-human alignment, with lots of inefficiency in the back-and-forth communication cycles. We think this is the perfect testing ground for understanding human intent and aligning a machine to it. We conduct the typical iterations of a consulting process with humans in the foreground and replicate all of them with a machine behind the scenes.

We don’t aim to replace the experts in the field—their years of experience dealing with clients are invaluable. But we can definitely lighten their load by properly understanding client expectations, speeding up the feedback cycle, and offloading simpler tasks to AI agents. If we model both the expert and the client, we can build a de facto expectation-alignment layer that streamlines the process.

Instead of an iteration cycle taking three or four back-and-forth emails, the new cycle can be as simple as a client specifying requirements for a web application, having the changes instantly propagated, and the client commenting on the resulting output, explaining why it meets or doesn’t meet requirements and how they came to that conclusion. This is a lofty goal—if the client wants a complex web service built, this goes back to the idea of “how can human language be declarative?” How can we map out all of this functionality?

To start, we’re working with a more manageable use case. The specific area we’re currently targeting is data transformation in the procurement and healthcare sectors. By data transformation, we mean the general set of extraction, categorization, and labeling tasks that are often outsourced to Mechanical Turk and contractors in other countries. Many of these tasks can be attempted with language models, but the models often fall short when handling out-of-distribution data, and the lack of exact metrics to optimize for means a human expert’s opinion is frequently still required.
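
As a rough sketch of the shape of such a task (not our production pipeline), the example below labels a procurement line item and falls back to a human reviewer when the model can’t commit to a category. It assumes an OpenAI-style chat API; the model name, category labels, and review hook are all placeholders.

```python
from openai import OpenAI  # assumes an OpenAI-style chat client

client = OpenAI()
CATEGORIES = ["medical devices", "office supplies", "IT services"]  # made-up labels

def request_human_review(line_item: str) -> str:
    # Stand-in for routing the item into a human expert's review queue.
    return f"NEEDS_REVIEW: {line_item}"

def categorize(line_item: str) -> str:
    """Label a procurement line item, deferring to a human when unsure."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Answer with exactly one of: " + ", ".join(CATEGORIES)
                        + ". If none clearly fits, answer UNSURE."},
            {"role": "user", "content": line_item},
        ],
    )
    label = response.choices[0].message.content.strip()
    # Ambiguous or out-of-distribution items go to a human expert instead.
    return label if label in CATEGORIES else request_human_review(line_item)

print(categorize("Nitrile exam gloves, box of 200"))
```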

The end goal is still being explored. Regardless of whether we build a set of tools for these specific verticals or a more general human-machine alignment and personalization platform for others to use during user onboarding, we’re very excited about the potential for a next generation of tools powered by the tenet of human-machine alignment.

Let us know your thoughts! If you are interested or have use cases you’d like to talk through, we’re all ears. Reach out to us at lucas@tesselai.com!