Eric Zhù

93 Followers

Published in

Towards Data Science

·2 days ago

Please Use Streaming Workload to Benchmark Vector Databases

Why static workload is insufficient and what I learned by comparing HNSWLIB and DiskANN using streaming workload — Vector databases are built for high-dimensional vector retrieval. Today, many vectors are embeddings generated by deep neural nets like GPTs and CLIP to represent data points such as pieces of text, images, or audio tracks. Embeddings are used in many applications like search engines, recommendation systems, and chatbots. You can…

Vector Database

9 min read


Published in

Towards Data Science

·Aug 18

Finding Needles in a Haystack — Search Indexes for Jaccard Similarity

From basic concepts to exact and approximate indexes — Vector databases are in the news for being the external memory of large language models (LLMs). Today's vector databases are new systems built on decade-old research called approximate nearest neighbor (ANN) indexes. These indexing algorithms take many high-dimensional vectors (e.g., float32[]) and build a data structure that supports finding…

Jaccard

15 min read


May 10

GPT-4’s Maze Navigation: A Deep Dive into ReAct Agent and LLM’s Thoughts

The Sparks: My colleagues at Microsoft Research recently published a paper that demonstrated GPT-4’s navigational and mapping capability.

Gpt 4

10 min read


Mar 18

Human-Aligned Text-to-SQL Evaluation

In my last post about Text-to-SQL using GPT-3.5, I pointed out the issue with the existing benchmark’s evaluation metric: it rejects perfectly fine SQL queries for not having exactly the same execution result as the golden (a.k.a. labeled) queries. In this post, I discuss a new metric I created to better evaluate…

Text To Sql

5 min read


Mar 7

What is Coming Next for Text-to-SQL

Text-to-SQL is a natural language processing (NLP) task that involves converting natural language questions into SQL queries that can be executed on a database. Here is a Text-to-SQL example on a TV show database: Question: What is the content of TV Channel with serial name “Sky Radio”? Model: SELECT Content…

Text To Sql

10 min read

Researcher at Microsoft Research.
