I’ve gotten it out the door. The AI-focused MVP built for Third Bridge Creative is up and running.
https://hearincolor.com/

So what’s going on here?
In terms of data, I found the 18,393 Pitchfork Reviews dataset on Kaggle. It came as a SQLite database that needed some manipulation, which I carried out in a Jupyter notebook. I’m finding Jupyter notebooks to be a great vehicle for this type of work, so in the same notebook I wrote a script to chop the data into chunks and then tokenize it. Tokens are the units LLMs use to process information.
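The chunking step was roughly this shape. This is a simplified sketch, not the actual notebook code: it slides a window over words rather than true tokens, and the function name and numbers are mine.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split a review into overlapping word-based chunks.

    A rough stand-in for token-based chunking: real tokens don't map
    1:1 to words, but the sliding-window idea is the same. The overlap
    keeps a sentence that straddles a boundary present in both chunks.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# A fake 450-word "review" to show the window sliding
review = " ".join(f"word{i}" for i in range(450))
chunks = chunk_text(review)
```

With a 200-word window and a 20-word overlap, a 450-word review comes out as three chunks, each sharing its first 20 words with the tail of the previous one.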
Again, working within the same notebook, I took that processed data and leveraged OpenAI’s embeddings model to build … you guessed it … the embeddings. From OpenAI:
An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
There are semantic relationships between data points in the vector database, and it’s the distances between those points that the LLM generating the finished product uses to build coherent output. The digested data was then sent to a vector database I’d set up on Pinecone, purpose-built for exactly this kind of data.
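To make the “distance measures relatedness” idea concrete, here’s a toy example. The three-dimensional vectors below are invented for illustration (real OpenAI embeddings have 1,536 dimensions), but the math is the same cosine-similarity measure the vector search relies on.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: ~1.0 means pointing the same way (related),
    values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings for three review snippets
shoegaze    = [0.9, 0.1, 0.2]
dream_pop   = [0.85, 0.15, 0.25]  # stylistically close to shoegaze
death_metal = [0.1, 0.9, 0.3]     # not so much

cosine_similarity(shoegaze, dream_pop)    # high  -> related
cosine_similarity(shoegaze, death_metal)  # lower -> less related
```

That comparison, run across every chunk in the database, is what surfaces the reviews most related to a query.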
I’m guessing only the nerds are with me at this point, so let’s keep moving.
Now with the data massaged into place, I turned to the task of actually interacting with it, pointing GPT-3.5 at it to do its hand-waving. The app at this point was a Next.js frontend, and within the same codebase I’d set up a FastAPI backend. The backend sifts through the Pitchfork reviews, pulls the top 3 most relevant articles, and, with some prompt engineering, finesses them into a prompt that gets sent to the LLM.
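The prompt-assembly step looks something like this. To be clear, this is a hand-rolled sketch of the idea, not the production code: the function name, the template wording, and the sample data are all mine.

```python
def build_prompt(question: str, reviews: list[dict]) -> str:
    """Stitch the top retrieved reviews into one prompt for the LLM."""
    context = "\n\n".join(
        f"Review of '{r['album']}' by {r['artist']}:\n{r['excerpt']}"
        for r in reviews
    )
    return (
        "You are a music critic. Using the Pitchfork excerpts below as "
        "context, answer the question in your own voice.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Illustrative stand-ins for the top 3 hits from the vector search
top_3 = [
    {"album": "Loveless", "artist": "My Bloody Valentine", "excerpt": "..."},
    {"album": "Souvlaki", "artist": "Slowdive", "excerpt": "..."},
    {"album": "Nowhere", "artist": "Ride", "excerpt": "..."},
]
prompt = build_prompt("What defines the shoegaze sound?", top_3)
```

The resulting string is what gets handed to GPT-3.5, so the model answers grounded in the retrieved reviews rather than from memory alone.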
The selected articles were used as context that GPT understood and pulled from in order to structure and present its own creative assessment and writing. I know I’m moving fast here. This post is too long.
A decent amount of LangChain was utilized to juggle that data, from the database search through the prompt engineering to the final hand-off to GPT.
There’s quite a bit more I want to do with the app. ThirdEyes is currently a semantic-search type of app; I’d like to leverage what I’ve got for more of a creative writing task. I’ve built out a backlog of ideas that I’m excited to run past Sam and Bear, and I hope to speak with them in the next couple of days.
More on the way. If you’d like a peek behind the curtain, let me know. I’m considering putting up some of the code I wrote to make it available for people. I’m just hesitant to give away state secrets yet.
Hit the app and break it, please. Let me know what works for you, and especially what doesn’t.
