MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2024-11-05 04:28:55 +01:00

Author	SHA1	Message	Date
Clément Renault	d28f18658e	Retrieve the previous version of the words prefixes FST	2022-01-27 10:05:59 +01:00
bors[bot]	38d23546a5	Merge #431 431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method to take more time and we were trying to reduce the time spent in it. With `@ManyTheFish` we found out that we could skip some of the work we were doing by: - discarding the prefixes that were shorter than a specific threshold (default: 2). - discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4). - remove the unused threshold that was specifying a minimum amount of word docids to merge. We will take more time to do some more optimization, like stop clearing and recomputing from scratch the database, we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does. I keep this PR as a draft as I want to further test the real gain if it is enough or not if it is valid or not. I advise reviewers to review commit by commit to see the changes bit by bit, reviewing the whole PR can be hard. Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-27 07:04:56 +00:00
bors[bot]	c63f945093	Merge #441 441: Changes related to the rebranding r=curquiza a=meili-bot _This PR is auto-generated._ - [X] Change the name `MeiliSearch` to `Meilisearch` in README. - [x] ⚠️ Ensure the bot did not update part you don’t want it to update, especially in the code examples in the Getting started. - [x] Please, ensure there is no other "MeiliSearch". For example, in the comments or in the tests name. - [x] Put the new logo on the README if needed -> still using the milli logo so far Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com> Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-01-26 17:07:37 +00:00
Clémentine Urquizar	0f213f2202	Replace MeiliSearch by Meilisearch	2022-01-26 17:49:55 +01:00
Clémentine Urquizar	de808a391a	Replace meilisearch by Meilisearch	2022-01-26 17:48:22 +01:00
meili-bot	0d282e3cc5	Update README.md	2022-01-26 16:33:16 +01:00
bors[bot]	d342c3c357	Merge #438 438: CLI improvements r=Kerollmops a=MarinPostma I've made the following changes to the cli: - `settings-update` become `settings`, with two subcommands: `update` and `show`. - `document-addition` becomes `documents` with a subcommands: `add` (I'll add a feature to list documents later) - `search` now has an interactive mode `-i` - search return the number of documents and the time it took to perform the search. Co-authored-by: mpostma <postma.marin@protonmail.com>	2022-01-26 15:18:20 +00:00
Clément Renault	f9b214f34e	Apply suggestions from code review Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2022-01-26 11:28:11 +01:00
bors[bot]	e1cc025cbd	Merge #440 440: fix(fuzzer): fix the fuzzer after #430 r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-25 16:33:57 +00:00
Clément Renault	f04cd19886	Introduce a max prefix length parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	1514dfa1b7	Introduce a max proximity parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	23ea3ad738	Remove the useless threshold when computing the word prefix pair proximity	2022-01-25 17:04:23 +01:00
Clément Renault	e3c34684c6	Fix a bug where we were skipping most of the prefix pairs	2022-01-25 17:04:23 +01:00
mpostma	b5f01b52c7	cli improvements	2022-01-25 14:08:30 +01:00
Tamo	fb51d511be	fix(fuzzer): fix the fuzzer after #430	2022-01-25 12:08:47 +01:00
bors[bot]	9f2ff71581	Merge #434 434: bump milli to v0.22.0 r=curquiza a=irevoire This is breaking because of this PR: `98a365aaae` Should we do a special branch to only release the [patch](https://github.com/meilisearch/milli/pull/433) for https://github.com/meilisearch/MeiliSearch/issues/2082 (which is non-breaking)? Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-24 17:31:20 +00:00
bors[bot]	fd177b63f8	Merge #423 423: Remove an unused file r=irevoire a=irevoire This empty file is not included anywhere Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-19 14:18:05 +00:00
bors[bot]	8433516d85	Merge #430 430: Document batch support r=Kerollmops a=MarinPostma This pr adds support for document batches in milli. It changes the API of the `IndexDocuments` builder by adding a `add_documents` method. The API of the updates is changed a little, with the `UpdateBuilder` being renamed to `IndexerConfig` and being passed to the update builders. This makes it easier to pass around structs that need to access the indexer config, rather that extracting the fields each time. This change impacts many function signatures and simplify them. The change in not thorough, and may require another PR to propagate to the whole codebase. I restricted to the necessary for this PR. Co-authored-by: Marin Postma <postma.marin@protonmail.com>	2022-01-19 13:32:59 +00:00
Marin Postma	0c84a40298	document batch support reusable transform rework update api add indexer config fix tests review changes Co-authored-by: Clément Renault <clement@meilisearch.com> fmt	2022-01-19 12:40:20 +01:00
bors[bot]	74962b2fd9	Merge #435 435: Ensure we get no documents and no error when filtering on an empty db r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-18 10:46:26 +00:00
Tamo	01968d7ca7	ensure we get no documents and no error when filtering on an empty db	2022-01-18 11:40:30 +01:00
Tamo	367f403693	bump milli	2022-01-17 16:41:34 +01:00
bors[bot]	8f4499090b	Merge #433 433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there is currently zero documents containing this field thus we return an empty RoaringBitmap instead of throwing an internal error Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 14:06:53 +00:00
bors[bot]	4c516c00da	Merge #426 426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2 # Pull Request ## What does this PR do? Fixes https://github.com/meilisearch/MeiliSearch/issues/1480 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? ## Changes The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes. Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched ## Questions Since `matching_bytes` does not return number of bytes but grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API? Thank you very much `@ManyTheFish` for helping 😄 Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>	2022-01-17 13:39:00 +00:00
Tamo	d1ac40ea14	fix(filter): Fix two bugs. - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there is currently zero documents containing this field thus we returns an empty RoaringBitmap instead of throwing an internal error	2022-01-17 13:51:46 +01:00
bors[bot]	15bbde1022	Merge #432 432: Fuzzer r=Kerollmops a=irevoire Provide a first way of fuzzing the indexing part of milli. It depends on [cargo-fuzz](https://rust-fuzz.github.io/book/cargo-fuzz.html) Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 12:50:26 +00:00
Samyak S Sarnayak	c0313f3026	Use chars for highlight instead of graphemes Tokenizer v0.2.7 uses chars instead of graphemes for matching bytes. `unicode-segmentation` dependency isn't needed anymore. Also, oxidised the highlight code :) Co-authored-by: many <maxime@meilisearch.com>	2022-01-17 13:15:31 +05:30
Samyak S Sarnayak	2d7607734e	Run cargo fmt on matching_words.rs	2022-01-17 13:04:33 +05:30
Samyak S Sarnayak	5ab505be33	Fix highlight by replacing num_graphemes_from_bytes num_graphemes_from_bytes has been renamed in the tokenizer to num_chars_from_bytes. Highlight now works correctly!	2022-01-17 13:02:55 +05:30
Samyak S Sarnayak	c10f58b7bd	Update tokenizer to v0.2.7	2022-01-17 13:02:00 +05:30
Samyak S Sarnayak	e752bd06f7	Fix matching_words tests to compile successfully The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59	2022-01-17 11:37:45 +05:30
Samyak S Sarnayak	30247d70cd	Fix search highlight for non-unicode chars The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function returns the number of matching grapheme clusters instead of bytes. Should this function be renamed then? Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched	2022-01-17 11:37:44 +05:30
Tamo	0605c0ac68	apply review comments	2022-01-13 18:51:08 +01:00
Tamo	b22c80106f	add some settings to the fuzzed milli and use the published version of arbitrary json	2022-01-13 15:35:24 +01:00
Tamo	c94952e25d	update the readme + dependencies	2022-01-12 18:30:11 +01:00
Tamo	e1053989c0	add a fuzzer on milli	2022-01-12 17:57:54 +01:00
bors[bot]	559e019de1	Merge #424 424: Store the geopoint in three dimensions r=Kerollmops a=irevoire Related to this issue: https://github.com/meilisearch/MeiliSearch/issues/1872 Fix the whole computation of distance for any “geo” operations (sort or filter). Now when you sort points they are returned to you in the right order. And when you filter on a specific radius you only get points included in the radius. This PR changes the way we store the geo points in the RTree. Instead of considering the latitude and longitude as orthogonal coordinates, we convert them to real orthogonal coordinates projected on a sphere with a radius of 1. This is the conversion formulae: x = cos(lat) · cos(lng), y = cos(lat) · sin(lng), z = sin(lat). Which, in rust, translate to this function: ```rust pub fn lat_lng_to_xyz(coord: &[f64; 2]) -> [f64; 3] { let [lat, lng] = coord.map(|f| f.to_radians()); let x = lat.cos() * lng.cos(); let y = lat.cos() * lng.sin(); let z = lat.sin(); [x, y, z] } ``` Storing the points on a sphere is easier / faster to compute than storing the point on an approximation of the real earth shape. But when we need to compute the distance between two points we still need to use the haversine distance which works with latitude and longitude. So, to do the fewest search-time computation possible I'm now associating every point with its `DocId` and its lat/lng. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-10 15:23:43 +00:00
bors[bot]	660eac50b2	Merge #427 427: Handle escaped characters in filters r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-10 15:01:23 +00:00
Tamo	92804f6f45	apply clippy suggestions	2022-01-10 15:59:04 +01:00
Tamo	0fcde35a20	Update filter-parser/src/value.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-10 15:53:44 +01:00
Tamo	3c7ea1d298	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-10 15:19:21 +01:00
bors[bot]	74594be234	Merge #429 429: Benchmark CIs: not use a default label to call the GH runner r=irevoire a=curquiza Since we now have multiple self-hosted github runners, we need to differentiate them calling them in the CI. The `self-hosted` label is the default one, so we need to use the unique and appropriate one for the benchmark machine Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-01-04 15:41:08 +00:00
Clémentine Urquizar	3d99686f7a	Change self-hosted label by benchmarks	2022-01-04 16:01:01 +01:00
bors[bot]	c039562723	Merge #428 428: Reintroduce the gitignore for the fuzzer r=Kerollmops a=irevoire Reintroduce the gitignore in the fuzz directory Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-04 12:09:06 +00:00
Tamo	9bdcd42b9b	reintroduce the gitignore for the fuzzer	2022-01-04 13:07:32 +01:00
bors[bot]	4cae691b86	Merge #425 425: Push the result of the benchmarks to influxdb r=irevoire a=irevoire Now execute a benchmark for every PR merged into main and then upload the results to influxdb. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-04 11:04:16 +00:00
Tamo	6a1216bd51	Integrate telegraf into our CI	2022-01-04 11:59:05 +01:00
Tamo	02a21fd309	Handle the escapes of quote in the filters	2022-01-04 04:04:10 +01:00
Tamo	98a365aaae	store the geopoint in three dimensions	2021-12-14 12:21:24 +01:00
Tamo	d671d6f0f1	remove an unused file	2021-12-13 19:27:34 +01:00
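
PR #424 above quotes `lat_lng_to_xyz` and notes that distances are still computed with the haversine formula on latitude and longitude. The std-only Rust sketch below (an illustration, not milli's code) pairs that quoted function with a haversine distance and a straight-line chord helper, to show why ordering unit-sphere points by Euclidean distance gives the same order as great-circle distance. `haversine_distance`, `chord_length` and the constant `EARTH_RADIUS_M` (the mean Earth radius, 6,371,008.8 m) are assumptions added for illustration.

```rust
/// Mean Earth radius in meters (assumed value, for illustration only).
pub const EARTH_RADIUS_M: f64 = 6_371_008.8;

/// Projects `[lat, lng]` (in degrees) onto the unit sphere, as quoted in PR #424.
pub fn lat_lng_to_xyz(coord: &[f64; 2]) -> [f64; 3] {
    let [lat, lng] = coord.map(|f| f.to_radians());
    let x = lat.cos() * lng.cos();
    let y = lat.cos() * lng.sin();
    let z = lat.sin();
    [x, y, z]
}

/// Great-circle distance in meters between two `[lat, lng]` points (haversine formula).
pub fn haversine_distance(a: &[f64; 2], b: &[f64; 2]) -> f64 {
    let [lat1, lng1] = a.map(|f| f.to_radians());
    let [lat2, lng2] = b.map(|f| f.to_radians());
    let h = ((lat2 - lat1) / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * ((lng2 - lng1) / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing sqrt(h) slightly above 1 before asin.
    2.0 * EARTH_RADIUS_M * h.sqrt().min(1.0).asin()
}

/// Straight-line (chord) distance between two points on the unit sphere.
pub fn chord_length(p: &[f64; 3], q: &[f64; 3]) -> f64 {
    p.iter().zip(q).map(|(a, b)| (a - b).powi(2)).sum::<f64>().sqrt()
}
```

For a central angle θ the chord between two unit-sphere points is 2·sin(θ/2), which grows monotonically with θ up to antipodal points. A nearest-neighbour search in 3D therefore ranks points in great-circle order, while the exact distance in meters still needs the haversine formula and the original lat/lng, which is consistent with the PR keeping lat/lng next to each `DocId`.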


1534 Commits
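
The highlight fix in #426 (commits `30247d70cd`, `5ab505be33`, `c0313f3026`) converts a count of matching bytes into a count of chars so that `<mark>` wraps only the matched part of a word. Below is a hedged, std-only Rust sketch of that idea. `num_chars_from_bytes` here is a hypothetical reimplementation whose semantics (count the chars that start inside the first `num_bytes` bytes) are assumed rather than taken from the tokenizer, and `highlight_match` is an illustrative helper, not the HTTP UI's code.

```rust
/// Hypothetical stand-in for the tokenizer's `num_chars_from_bytes`:
/// counts the chars of `text` that start within the first `num_bytes` bytes.
pub fn num_chars_from_bytes(text: &str, num_bytes: usize) -> usize {
    text.char_indices().take_while(|(i, _)| *i < num_bytes).count()
}

/// Wraps only the matched prefix of `word` in `<mark>` tags.
pub fn highlight_match(word: &str, matching_bytes: usize) -> String {
    let matched_chars = num_chars_from_bytes(word, matching_bytes);
    // Turn the char count back into a byte offset that lies on a char boundary;
    // if every char matched, the split point is the end of the word.
    let split = word
        .char_indices()
        .nth(matched_chars)
        .map_or(word.len(), |(i, _)| i);
    format!("<mark>{}</mark>{}", &word[..split], &word[split..])
}
```

Working in chars rather than slicing at a raw byte count avoids cutting a multi-byte char such as `é` in half, which would make Rust panic when slicing a `&str` off a char boundary.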