MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2024-11-29 16:24:26 +01:00

Author	SHA1	Message	Date
Tamo	367f403693	bump milli	2022-01-17 16:41:34 +01:00
bors[bot]	8f4499090b	Merge #433 433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire - Stop lowercasing the field when looking in the field id map - When a field id does not exist, it means there are currently zero documents containing this field, thus we return an empty RoaringBitmap instead of throwing an internal error Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 14:06:53 +00:00
bors[bot]	4c516c00da	Merge #426 426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2 # Pull Request ## What does this PR do? Fixes https://github.com/meilisearch/MeiliSearch/issues/1480 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? ## Changes The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes. Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched ## Questions Since `matching_bytes` does not return number of bytes but grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API? Thank you very much `@ManyTheFish` for helping 😄 Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>	2022-01-17 13:39:00 +00:00
Tamo	d1ac40ea14	fix(filter): Fix two bugs. - Stop lowercasing the field when looking in the field id map - When a field id does not exist, it means there are currently zero documents containing this field, thus we return an empty RoaringBitmap instead of throwing an internal error	2022-01-17 13:51:46 +01:00
bors[bot]	15bbde1022	Merge #432 432: Fuzzer r=Kerollmops a=irevoire Provide a first way of fuzzing the indexing part of milli. It depends on [cargo-fuzz](https://rust-fuzz.github.io/book/cargo-fuzz.html) Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 12:50:26 +00:00
Samyak S Sarnayak	c0313f3026	Use chars for highlight instead of graphemes Tokenizer v0.2.7 uses chars instead of graphemes for matching bytes. `unicode-segmentation` dependency isn't needed anymore. Also, oxidised the highlight code :) Co-authored-by: many <maxime@meilisearch.com>	2022-01-17 13:15:31 +05:30
Samyak S Sarnayak	2d7607734e	Run cargo fmt on matching_words.rs	2022-01-17 13:04:33 +05:30
Samyak S Sarnayak	5ab505be33	Fix highlight by replacing num_graphemes_from_bytes num_graphemes_from_bytes has been renamed in the tokenizer to num_chars_from_bytes. Highlight now works correctly!	2022-01-17 13:02:55 +05:30
Samyak S Sarnayak	c10f58b7bd	Update tokenizer to v0.2.7	2022-01-17 13:02:00 +05:30
Samyak S Sarnayak	e752bd06f7	Fix matching_words tests to compile successfully The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59	2022-01-17 11:37:45 +05:30
Samyak S Sarnayak	30247d70cd	Fix search highlight for non-unicode chars The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function returns the number of matching grapheme clusters instead of bytes. Should this function be renamed then? Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched	2022-01-17 11:37:44 +05:30
Tamo	0605c0ac68	apply review comments	2022-01-13 18:51:08 +01:00
mpostma	0515c6e844	bug(http): fix task duration	2022-01-13 16:41:07 +01:00
Irevoire	38176181ac	fix(dump): Fix the import of dump from the v24 and before	2022-01-13 16:40:58 +01:00
Tamo	b22c80106f	add some settings to the fuzzed milli and use the published version of arbitrary json	2022-01-13 15:35:24 +01:00
bors[bot]	a7e634bd4f	Merge #2074 2074: fix(dump): Fix the import of dumps when there is no data.ms r=irevoire a=irevoire Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-01-13 13:47:03 +00:00
bors[bot]	78a381a30b	Merge #2076 2076: fix(dump): Fix the import of dump from the v24 and before r=ManyTheFish a=irevoire Same as https://github.com/meilisearch/MeiliSearch/pull/2073 but on main this time Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-01-13 13:09:23 +00:00
Irevoire	343bce6a29	fix(dump): Fix the import of dump from the v24 and before	2022-01-13 13:23:57 +01:00
mpostma	d263f762bf	feat(http): accept empty document additions wip	2022-01-13 12:46:56 +01:00
Irevoire	dfaeb19566	fix(dump): Fix the import of dumps when there is no data.ms	2022-01-13 12:30:58 +01:00
Tamo	c94952e25d	update the readme + dependencies	2022-01-12 18:30:11 +01:00
Tamo	e1053989c0	add a fuzzer on milli	2022-01-12 17:57:54 +01:00
bors[bot]	010dcc3e80	Merge #2066 2066: bug(http): fix task duration r=MarinPostma a=MarinPostma `@gmourier` found that the duration in the task view was not computed correctly; this PR fixes it. `@curquiza`, I let you decide if we need to make a hotfix out of this or wait for the next release. This is not breaking. Co-authored-by: mpostma <postma.marin@protonmail.com>	2022-01-12 14:50:58 +00:00
bors[bot]	d0aa5f747c	Merge #2067 2067: chore(all): fix rust edition r=irevoire a=MarinPostma I hadn't correctly set the rust edition in my previous pr, and cargo was returning a warning. This time I followed this guide: https://doc.rust-lang.org/edition-guide/editions/transitioning-an-existing-project-to-a-new-edition.html Co-authored-by: mpostma <postma.marin@protonmail.com>	2022-01-12 13:32:42 +00:00
mpostma	f6d53e03f1	chore(http): migrate from structopt to clap3	2022-01-12 14:07:19 +01:00
mpostma	3ecebd15ee	chore(all): fix rust edition	2022-01-12 11:14:50 +01:00
mpostma	db83e39a7f	bug(http): fix task duration	2022-01-11 18:01:25 +01:00
bors[bot]	5d48f72ade	Merge #2065 2065: MeiliSearch v0.25.0: `stable` -> `main` r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com> Co-authored-by: many <maxime@meilisearch.com> Co-authored-by: Marin Postma <postma.marin@protonmail.com> Co-authored-by: Maxime Legendre <maximelegendre@MacBook-Pro-de-Maxime.local> Co-authored-by: Maxime Legendre <maximelegendre@mbp-de-maxime.home> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-01-11 16:30:22 +00:00
bors[bot]	1818026a84	Merge #2057 2057: fix(dump): Uncompress the dump IN the data.ms r=irevoire a=irevoire When loading a dump with docker, we had two problems. After creating a tempdirectory, uncompressing and re-indexing the dump: 1. We try to `move` the new “data.ms” onto the currently present one. The problem is that the `data.ms` is often a mount point, because that's what people usually do with docker. We can't override a mount point, and thus we were throwing an error. 2. The tempdir is created in `/tmp`, which is usually quite small AND may not be on the same partition as the `data.ms`. This means when we tried to move the dump over the `data.ms`, it was also failing because we can't move data between two partitions. ------------------ 1 was fixed by deleting the content of the `data.ms` and moving the content of the tempdir inside the `data.ms`. If someone tries to create volumes inside the `data.ms` that's their problem, not ours. 2 was fixed by creating the tempdir inside of the `data.ms`. If a user mounted their `data.ms` on a large partition, there is no reason they should be unable to load a big dump because their `/tmp` was too small. This solves the issue; now the dump is extracted and indexed on the same partition where the `data.ms` will live. fix #1833 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-10 17:57:16 +00:00
bors[bot]	0ad7d38eec	Merge #2061 2061: Update dashboard for v0.25.0 r=curquiza a=mdubus Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>	2022-01-10 16:29:31 +00:00
Morgane Dubus	b17ad5c2be	Update with latest release of the dashboard	2022-01-10 17:10:09 +01:00
bors[bot]	559e019de1	Merge #424 424: Store the geopoint in three dimensions r=Kerollmops a=irevoire Related to this issue: https://github.com/meilisearch/MeiliSearch/issues/1872 Fix the whole computation of distance for any “geo” operations (sort or filter). Now when you sort points they are returned to you in the right order. And when you filter on a specific radius you only get points included in the radius. This PR changes the way we store the geo points in the RTree. Instead of considering the latitude and longitude as orthogonal coordinates, we convert them to real orthogonal coordinates projected on a sphere with a radius of 1. This is the conversion formula. ![image](https://user-images.githubusercontent.com/7032172/145990456-eefe840a-384f-4486-848b-81d0036814ec.png) Which, in rust, translates to this function: ```rust pub fn lat_lng_to_xyz(coord: &[f64; 2]) -> [f64; 3] { let [lat, lng] = coord.map(|f| f.to_radians()); let x = lat.cos() * lng.cos(); let y = lat.cos() * lng.sin(); let z = lat.sin(); [x, y, z] } ``` Storing the points on a sphere is easier / faster to compute than storing the point on an approximation of the real earth shape. But when we need to compute the distance between two points we still need to use the haversine distance which works with latitude and longitude. So, to do as little search-time computation as possible, I'm now associating every point with its `DocId` and its lat/lng. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-10 15:23:43 +00:00
bors[bot]	1824b3c07b	Merge #2060 2060: chore(all) set rust edition to 2021 r=MarinPostma a=MarinPostma Set the rust edition for the project to 2021; this makes the MSRV v1.56 #2058 Co-authored-by: Marin Postma <postma.marin@protonmail.com>	2022-01-10 15:04:14 +00:00
bors[bot]	660eac50b2	Merge #427 427: Handle escaped characters in filters r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-10 15:01:23 +00:00
Tamo	92804f6f45	apply clippy suggestions	2022-01-10 15:59:04 +01:00
Tamo	0fcde35a20	Update filter-parser/src/value.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-10 15:53:44 +01:00
Tamo	3c7ea1d298	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-10 15:19:21 +01:00
Tamo	c9c7da3626	fix(dump): Uncompress the dump IN the data.ms When loading a dump with docker, we had two problems. After creating a tempdirectory, uncompressing and re-indexing the dump: 1. We try to `move` the new “data.ms” onto the currently present one. The problem is that the `data.ms` is often a mount point, because that's what people usually do with docker. We can't override a mount point, and thus we were throwing an error. 2. The tempdir is created in `/tmp`, which is usually quite small AND may not be on the same partition as the `data.ms`. This means when we tried to move the dump over the `data.ms`, it was also failing because we can't move data between two partitions. ============== 1 was fixed by deleting the content of the `data.ms` and moving the content of the tempdir inside the `data.ms`. If someone tries to create volumes inside the `data.ms` that's their problem, not ours. 2 was fixed by creating the tempdir inside of the `data.ms`. If a user mounted their `data.ms` on a large partition, there is no reason they should be unable to load a big dump because their `/tmp` was too small. This solves the issue; now the dump is extracted and indexed on the same partition where the `data.ms` will live. fix #1833	2022-01-10 14:56:03 +01:00
Morgane Dubus	030a90523d	Update dashboard for v0.25.0	2022-01-10 10:50:57 +01:00
bors[bot]	56d223a51d	Merge #2059 2059: change indexed doc count on error r=irevoire a=MarinPostma change `indexed_documents` and `deleted_documents` to return 0 instead of null when empty when the task has failed. close #2053 Co-authored-by: Marin Postma <postma.marin@protonmail.com>	2022-01-06 15:55:50 +00:00
Marin Postma	f558ff826a	feat(http): task view indexed and deleted documents return 0 instead of null	2022-01-06 14:55:02 +01:00
Marin Postma	5fb4ed60e7	chore(all) set rust edition to 2021	2022-01-06 13:30:45 +01:00
bors[bot]	0d2a358cc2	Merge #2056 2056: Allow any header for CORS r=curquiza a=curquiza Bug fix: sending the `User-Agent` header via the browser used to trigger a CORS error. `@bidoubiwa` thanks for the bug report! Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-01-05 16:45:51 +00:00
Clémentine Urquizar	595250c93e	Allow any header for CORS	2022-01-05 15:38:47 +01:00
bors[bot]	c636988935	Merge #2055 2055: fix(dump): Fix the loading of dump with empty indexes r=irevoire a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-05 14:31:53 +00:00
Tamo	eea483c470	fix(dump): Fix the loading of dump with empty indexes	2022-01-05 15:08:21 +01:00
bors[bot]	d53c61a6d0	Merge #2054 2054: Bug(auth): Wrap key list in results r=irevoire a=ManyTheFish fix #2052 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-01-04 15:44:55 +00:00
bors[bot]	74594be234	Merge #429 429: Benchmark CIs: not use a default label to call the GH runner r=irevoire a=curquiza Since we now have multiple self-hosted github runners, we need to differentiate them when calling them in the CI. The `self-hosted` label is the default one, so we need to use the unique and appropriate one for the benchmark machine <img width="925" alt="Capture d’écran 2022-01-04 à 15 42 18" src="https://user-images.githubusercontent.com/20380692/148079840-49cd7878-5912-46ff-8ab8-bf646777f782.png"> Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-01-04 15:41:08 +00:00
Clémentine Urquizar	3d99686f7a	Change self-hosted label by benchmarks	2022-01-04 16:01:01 +01:00
ManyTheFish	c0d4f71a34	Bug(auth): Wrap key list in results	2022-01-04 14:10:30 +01:00
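
The geo change described in Merge #424 above can be sketched roughly as follows. This is only an illustration, not the code from milli: `lat_lng_to_xyz` is copied from the commit message, while `chord_distance_sq`, `haversine_m`, and the `EARTH_RADIUS_M` constant are hypothetical names and values assumed here. The idea shown is that straight-line distance between unit-sphere points orders neighbours the same way as great-circle distance, while the haversine formula is still what yields a distance in meters from the stored lat/lng.

```rust
/// Mean Earth radius in meters (an assumed value for this sketch).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Convert `[lat, lng]` in degrees to a point on the unit sphere
/// (the function quoted in the #424 commit message).
pub fn lat_lng_to_xyz(coord: &[f64; 2]) -> [f64; 3] {
    let [lat, lng] = coord.map(|f| f.to_radians());
    let x = lat.cos() * lng.cos();
    let y = lat.cos() * lng.sin();
    let z = lat.sin();
    [x, y, z]
}

/// Squared straight-line (chord) distance between two points in 3D.
/// Hypothetical helper: a spatial index can compare points with a
/// Euclidean metric like this one.
pub fn chord_distance_sq(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    a.iter().zip(b.iter()).map(|(p, q)| (p - q).powi(2)).sum()
}

/// Haversine great-circle distance in meters between two
/// `[lat, lng]` points given in degrees. Hypothetical helper.
pub fn haversine_m(a: &[f64; 2], b: &[f64; 2]) -> f64 {
    let [lat1, lng1] = a.map(|f| f.to_radians());
    let [lat2, lng2] = b.map(|f| f.to_radians());
    let dlat = lat2 - lat1;
    let dlng = lng2 - lng1;
    let h = (dlat / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * (dlng / 2.0).sin().powi(2);
    // Clamp to 1.0 to guard against rounding just above the asin domain.
    2.0 * EARTH_RADIUS_M * h.sqrt().min(1.0).asin()
}
```

For example, with Paris at `[48.8566, 2.3522]`, Lyon at `[45.7640, 4.8357]`, and Tokyo at `[35.6762, 139.6503]`, both `chord_distance_sq` on the converted points and `haversine_m` on the raw coordinates rank Lyon as closer to Paris than Tokyo.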


7545 Commits