It’s easy to forget that the Generative Era is barely more than a year into its fledgling life. LLMs were cast upon us, and then GPT-4 delivered tremendous excitement with dramatic improvements in helping us create new content and data.
However, our new research* covering a quarter of the Global 2000 shows things have not moved as fast as many of us expected. Only 5% of enterprises are committing significant technology spend to GenAI and successfully deploying GenAI solutions across multiple parts of their business, while two-thirds are doing practically nothing.
So, what will spark enterprises to move faster with GenAI and keep it from the graveyard of so many previous technology “innovations”?
The big problem with any type of new tech, since John Mauchly initiated modern computer programming in 1949 with Short Code, is that business folks dump it on the tech people to figure out and implement for them. While they get excited about the tech’s potential to make their businesses slicker, smarter, and more competitive, they do not believe they need to get pulled deep into the technology to understand exactly how it can do that and what the business needs to do to exploit it.
This is why so many businesses got sold down the river with cloud migration during the pandemic and RPA just prior. They bought into the vision the technology firms were selling but gave it to technologists to implement that vision without changing how they ran their businesses to drive that change.
As Microsoft CEO Satya Nadella recently bemoaned, the uptake of AI “hinges on other companies doing ‘the hard work’ of changing their cultures.” Easier said than done, Satya, but perhaps Microsoft needs to invest in fast-track change management services to help your clients buy more Copilot licenses?
Enter GPT-4o… another iteration of GenAI that just took things to a much more human level
The one thing that has been consistent since ChatGPT (running on GPT-3.5) was launched in November 2022 has been the continual proliferation of LLMs and the growing capabilities of the technology. However, it is the latest iteration of GPT that makes the biggest advancement yet, and it will surely wake up the majority of enterprise leaders, as we pointed out when GPT-4 hit the streets last year.
- Multimodal makes everything much more human. The thing I am loving about GPT-4o, apart from it being twice as fast as GPT-4, is the “omni” or multimodal capability that brings text, vision, and speech into the same neural network. With GPT-4, these were processed separately, with voice being transcribed into plain text before it reached the LLM, so all the tone and emotion captured in the audio were reduced to plain, boring text. Net-net, GPT-4o can process images, audio, video, and text simultaneously, whereas GPT-4 could only process text and images. In effect, the old GPT was like texting a friend; GPT-4o is like calling a friend.
- Real-time human-to-machine conversation is now possible. In short, we are able to converse naturally, with real energy, emotion, and expressiveness, without our words first being converted to text. We’ll also be able to interrupt it, have it change its tone of voice, and have it respond with emotion. The whole nature of collaboration with machines has gone to a new level.
- Enhanced multilingual support and capabilities. GPT-4o has greatly improved the quality and speed of ChatGPT’s international language capabilities compared to previous models. It can communicate fluently in dozens of languages, making it accessible to more users globally. The model demonstrates more robust performance in non-English languages and translation tasks. Combined with its human-like chat and collaboration, surely the excuses not to invest in generative customer engagement are now moot?
- It really does have human eyes now. GPT-4o can read the expressions on people’s faces and judge their emotions when you simply point your iPhone camera at them. This thing really does have eyes that process what we see, beyond transactional images. While GPT-4 had optical potential, GPT-4o is making AI optical capability much more real.
- It’s being incorporated into Apple’s iPhone and Google’s Android operating systems. The earlier version of ChatGPT Voice available in the iPhone and Android app allowed you to converse with the AI in a relatively natural way, but it wasn’t listening to what you were saying; rather, it converted your speech to text and analyzed that instead. With GPT-4o processing speech natively, Siri and Google Assistant should soon become much more human than their current transactional forms, which most of us turn off because they’re just so useless.
- Summaries are concise and relevant. GPT-4o provides summaries of conversations and searches that are very accurate in both tone and length, while GPT-4 often produced inaccurate language and tone that required a lot of supervision to get right. Is this finally the end of legacy Google search and bad call transcripts? Surely disruption of legacy text strings is now in full play?
- Visual interpretation and data tables are much more usable and accurate, ready to support business needs. GPT-4o accurately converts image data into a clean table format without misinterpretations. It is precise in converting text and data, while previous versions made a lot of errors. Research capabilities are more detailed, provide more accurate breakdowns of data and analysis, and provide real practical examples. Do we really need to keep relying on clunky old data and analytics tools that require so much manual manipulation to get what we need?
- Image generation capabilities are just so much sharper. GPT-4o produces images that are more visually appealing and conceptually accurate. It is much more usable for enterprise projects needing high-quality visuals than what we have experienced using the current versions of DALL-E (for example). GPT-4 gave us a taste, but surely we are now seeing the potential to create content ourselves without the need for expensive agencies and outdated, complicated software packages?
- The cost of accessing its APIs is 50% cheaper. OpenAI has clearly realized its costs are holding back wary enterprises and is now pricing many of its core APIs, such as the Chat Completions API, Assistants API, and Batch API, at 50% less. Are we finally going to be freed from decades of legacy software, abhorrent license fees and meaningless code bases?
- Coding is vastly improved. So far, many developers are lauding the improvements in GPT-4o’s ability to handle coding projects, such as producing several thousand lines of code in under 10 minutes, work that previously took many hours of prompt engineering. It can also create multiple apps in Python that the previous version struggled with. According to one developer, “4o not only solved it and provided clear concise dissection of the solution. 4 can be easily tricked into going down a death spiral it does not know how to backtrack correctly. OpenAI did incredible improvements to 4o. I can see Model 5 gonna start to get rid of human programmers for good.” We recently discussed how GenAI is already making radical improvements to human-heavy legacy code development, and these new advancements are reinforcing the end of legacy coding as we know it.
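To put the 50% API price cut mentioned above into budget terms, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration only: the function name `monthly_api_cost`, the token volumes, and the per-million-token prices are hypothetical placeholders, not OpenAI's published rates. Only the 50% reduction comes from the point above.

```python
# Back-of-the-envelope API budget comparison.
# All prices and volumes below are HYPOTHETICAL placeholders, not published rates.

def monthly_api_cost(input_tokens, output_tokens,
                     input_price_per_million, output_price_per_million):
    """Return the monthly spend for a given token volume and per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_million
    output_cost = (output_tokens / 1_000_000) * output_price_per_million
    return input_cost + output_cost

# Assumed monthly workload for an enterprise pilot (illustrative only).
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 50_000_000

# Assumed "old" prices per million tokens (placeholders).
OLD_INPUT_PRICE = 10.0
OLD_OUTPUT_PRICE = 30.0

# The 50% reduction cited in the article, applied to both prices.
DISCOUNT = 0.5
NEW_INPUT_PRICE = OLD_INPUT_PRICE * DISCOUNT
NEW_OUTPUT_PRICE = OLD_OUTPUT_PRICE * DISCOUNT

old_cost = monthly_api_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                            OLD_INPUT_PRICE, OLD_OUTPUT_PRICE)
new_cost = monthly_api_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                            NEW_INPUT_PRICE, NEW_OUTPUT_PRICE)

print(f"Old monthly cost: ${old_cost:,.2f}")
print(f"New monthly cost: ${new_cost:,.2f}")
print(f"Monthly saving:   ${old_cost - new_cost:,.2f}")
```

Because the cut applies to both input and output prices, the same workload costs exactly half as much, or the same budget covers twice the volume. Swap in your own volumes and the current published prices before using numbers like these in a business case.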
The Bottom Line: Just as we were giving up the ghost on GenAI, it becomes more human than ever
I am one of the biggest cynics when it comes to tech innovation and business change for one reason: there needs to be a bloody great burning platform to force businesses to adopt. Until now, one of the main reasons for murdering the technology in this death spiral of a thousand pilots has been the inability to adapt it to so many business scenarios. With GPT-4o, many of those reasons fall away. Ambitious C-suites will clamor louder than ever to see this AI tech embedded in their organizations and will seek leaders to defrost their frozen middle ranks to make this happen for them. Your job may not be replaced directly by AI, but you will more likely be replaced by someone who knows how to use AI if you don’t wake up and get with the GenAI program.
To conclude, I will go back to the main excitement behind GenAI… it is disruptive because it helps us create new data and new content. But it needs to become an extension of our humanness to do that, not merely another technology tool that can add some value in bits and pieces. Having multimodal capability that brings speech, text, video, and content together in one neural network that we can communicate with in real time and immerse into our day-to-day activities is the game changer we have been unwittingly waiting for. Now it is here, and we can only imagine how quickly this will keep evolving as OpenAI, Google, Anthropic, Apple, Microsoft, NVIDIA, and co keep pumping all their investments into this emerging tech.
* The survey was conducted in collaboration with Genpact. We will be releasing the full study on 5/22