You're on a phone call on your Android phone. Some suspicious "bank executives" tell you that a fraudulent transaction is being attempted through your account, and advise you to transfer your money to another account to keep it safe. Genuine concern, or malicious intent? In the world Google envisions, your Android smartphone will alert you in real time during the call about a "possible scam", with specifics (banks will never ask you to move your money to keep it safe) and the option to end the call. This is the world Google envisions using artificial intelligence (AI). Powering it is Gemini Nano, the smallest of the Gemini AI models, which runs entirely on-device.
Alphabet CEO Sundar Pichai speaks at the Google I/O event on Tuesday, May 14, 2024 in Mountain View, California. (AP Photo/Jeff Chiu)
If this week's I/O 2024 keynote put Google's determination not to appear left behind in the bigger AI race on full display, it is likely down to the fast-paced competition. OpenAI's GPT is credited with kickstarting this AI era a little over a year ago, and despite some management turmoil last year, the company has never looked back.
With everyone else playing catch-up, Microsoft wisely decided to partner with OpenAI, investing over $10 billion and adopting GPT as the basis for the Copilot assistant in Microsoft 365.
A day before Google was set to talk about its Gemini updates and its vision for AI on Android, OpenAI threw a curveball: the new GPT-4o model. Make no mistake, this is a significant step forward, accepting video and audio inputs far beyond the text interactions these models were already proficient at.
It becomes an assistant that can read your facial expressions and tone of voice and hold conversations in context, much as a smart human would, translating what it reads or hears in real time. Through your phone's camera, it can also see the world as you do. Should Google Lens and Microsoft Translator be worried?
You can even talk to the AI, discuss the questions on your mind, and rehearse a presentation you have to deliver. The AI becomes a companion. At this point, my mind went back to OpenAI's partnership with Be My Eyes, which plans to upgrade from GPT-4 to GPT-4o as the basis of its guidance for visually impaired users. That is a concrete use case, and one that will undoubtedly have a positive impact on the world.
This is what it has all been building towards. Generative AI, whether on-device or reliant on the cloud, is now incredibly powerful, and therefore more capable than we imagined. Models are becoming ever smarter, able to follow complex and subtle instructions about form and style. No wonder they can now plan travel itineraries for human users; at least on paper, until real-world experience shows them tripping over their own feet.
For now, be amazed and horrified in equal measure.
The fact that Google used Gemini Live to respond, in less than 24 hours, to OpenAI's GPT-4o feature that lets the AI see the world through your phone's camera highlights just how little room there is to get this wrong. The announcements have been made, but the pressure on the teams to get this right continues to mount. The experience resembles human chat: natural voices, conversational flow, even mid-sentence interruptions. In both Google's and OpenAI's demos, the AI was able to accurately identify the world seen through a phone's camera. Did I mention amazing and terrifying as the underlying emotions?
My attention was drawn to the advantage underlying Google's announcements: an ecosystem of services already used by millions. At some point, you start wondering too. How much does AI already know about us as individuals? And how much more will it know in the coming months?
What will happen in the future?
This is where we are headed. Google plans to integrate the Gemini model into a sidebar in Gmail, Docs, Drive and more, not just for Workspace users but for all of us who pay for Gemini Advanced (it hasn't rolled out to my account yet, but it will at some point). You could then have an AI agent organise all the receipts in your Gmail (shopping, travel and so on) into a spreadsheet; use a tool that finds the order details for an item you want to return and helps you through the process (difficult to implement at a global scale, given how varied shopping sites' processes are); or have the AI assistant plan a trip based on what you tell it about the places you want to visit in a city and the food you want to eat.
Data is the key to making all this smart functionality work. Gemini for Gmail, Docs, Drive and other Google services rests on running a model tailored specifically to the data in your devices and accounts. Responses to queries should carry better context, answers can be drawn from your documents and chats, and the model can learn what you search for over time (sports scores, for instance). The AI is also less likely to hallucinate or misinterpret context.
Google Assistant is being replaced by Gemini for a specific reason: to help predict what you're trying to do and to build context and relevance into its suggestions, whether that's a phone call or a document. The AI in Messages reads the conversation and understands its context, allowing it to offer suggestions and help specific to what you're doing at the time.
The point is, we lost the battle to keep our data behind a privacy layer a long time ago. When I asked, Google executives repeatedly insisted that they do not use any user data to train their AI models, and that there will be clear options to turn off AI features, whether on a Google service or an Android phone. OpenAI, for its part, has limited the audio options available in GPT-4o's voice modality until further training and safety measures are in place.
Competition is the reason real-world AI is being deployed so rapidly. No discussion would be complete without mentioning Veo, Google's text-to-video generation tool, currently available only to select creators. The level of realism in what has been demonstrated has to be seen to be believed. It recalls earlier this year, when OpenAI detailed its own text-to-video generation AI tool, Sora, which was so realistic that the company declined to release it publicly, at least for the time being.
OpenAI's move to release a ChatGPT app for Apple's Mac computers was an interesting one: not Windows, even though Microsoft has been a pillar of support for years. Is that OpenAI chief technology officer Mira Murati's way of signalling that the rumoured deal with Apple is ready to be sealed? Is the Mac simply where most ChatGPT users are, or should we read between the lines? I wouldn't be surprised if Apple leaned on OpenAI to power its AI ambitions. We should find out next month.
Vishal Mathur is the technology editor at Hindustan Times. Tech Tonic is a weekly column that looks at the impact personal technology has on our lives, and vice versa. The views expressed are personal.