MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2024-11-30 00:34:26 +01:00

Author	SHA1	Message	Date
ad hoc	9b064e53e7	fix(http, lib): rename_min_word_length_for_typo into rename_min_word_size_for_typo	2022-04-17 10:02:56 +02:00
bors[bot]	289bfd46ff	Merge #2321 2321: Bump milli r=curquiza a=irevoire Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-04-14 11:51:15 +00:00
Irevoire	64b0a50a58	chore: bump milli	2022-04-14 12:12:54 +02:00
Clémentine Urquizar - curqui	a68e3a79fb	Merge pull request #497 from meilisearch/v0.26.1 Update version for the next release (v0.26.1)	2022-04-14 11:53:31 +02:00
bors[bot]	b1333ab5b0	Merge #2320 2320: chore(http, lib): rename typo to typo_tolerance r=irevoire a=MarinPostma fix #2319 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-14 09:50:39 +00:00
Clémentine Urquizar	8d630a6f62	Update version for the next release (v0.26.1)	2022-04-14 11:44:06 +02:00
Clémentine Urquizar - curqui	d362278a41	Merge pull request #494 from meilisearch/flatten-what-is-needed Only flatten the required objects	2022-04-14 11:43:28 +02:00
Tamo	00f78d6b5a	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-04-14 11:14:08 +02:00
Tamo	399fba16bb	only flatten an object if it's nested	2022-04-14 11:14:08 +02:00
Tamo	c2469b6765	create the json-depth-checker crate	2022-04-14 11:14:08 +02:00
ad hoc	276dc6043a	chore(http, lib): rename typo to typo_tolerance	2022-04-14 10:42:06 +02:00
bors[bot]	7791ef90e7	Merge #493 493: Use smartstring to store the external id in our hashmap r=Kerollmops a=irevoire We need to store all the external id (primary key) in a hashmap associated to their internal id. The smartstring remove heap allocation / memory usage and should improve the cache locality. I ran the benchmarks to measure the impact of this PR on the indexing time. I think we should merge it whatever happens thought because it'll decrease the memory consumption. --------- This improve really sliiiiiightly the performances but improve the memory usage thus it should be merged. ``` group indexing_main_6b073738 indexing_use-smartsring_3f343511 ----- ---------------------- -------------------------------- indexing/Indexing geo_point 1.02 25.2±0.20s ? ?/sec 1.00 24.8±0.13s ? ?/sec indexing/Indexing movies in three batches 1.00 18.2±0.10s ? ?/sec 1.00 18.2±0.23s ? ?/sec indexing/Indexing movies with default settings 1.00 17.5±0.09s ? ?/sec 1.01 17.7±0.11s ? ?/sec indexing/Indexing songs in three batches with default settings 1.00 68.3±1.01s ? ?/sec 1.00 68.0±0.95s ? ?/sec indexing/Indexing songs with default settings 1.00 63.2±0.78s ? ?/sec 1.00 63.0±0.58s ? ?/sec indexing/Indexing songs without any facets 1.02 59.6±1.00s ? ?/sec 1.00 58.5±1.03s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 62.8±0.38s ? ?/sec 1.00 62.6±1.02s ? ?/sec indexing/Indexing wiki 1.01 1009.2±25.25s ? ?/sec 1.00 998.1±11.27s ? ?/sec indexing/Indexing wiki in three batches 1.01 1142.0±9.97s ? ?/sec 1.00 1134.4±11.21s ? ?/sec ``` Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-13 20:28:28 +00:00
Tamo	ee64f4a936	Use smartstring to store the external id in our hashmap We need to store all the external id (primary key) in a hashmap associated to their internal id during. The smartstring remove heap allocation / memory usage and should improve the cache locality.	2022-04-13 21:22:07 +02:00
bors[bot]	b9e676b8ca	Merge #2316 2316: Add version flag r=Kerollmops a=sanders41 # Pull Request ## What does this PR do? Fixes #2315 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Paul Sanders <psanders1@gmail.com>	2022-04-13 17:24:09 +00:00
bors[bot]	6c06fb226d	Merge #2307 2307: Feat(Analytics): Add analytics for search format options r=irevoire a=ManyTheFish Specification: [#120](https://github.com/meilisearch/specifications/pull/120) ([f5c6a8e](`f5c6a8e183`)) fix #2308 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-04-13 12:01:52 +00:00
bors[bot]	456887a54a	Merge #496 496: Improve the performances of the flattening subcrate r=irevoire a=Kerollmops This PR adds some benchmarks to the _flatten-serde-json_ crate, this crate is responsible for transforming the original documents into flat versions that the engine can understand. It can probably be speed-up and this is why I added benchmarks to it. I make some interesting performance improvements when I replaced the `json!` macro calls. ``` flatten/simple time: [452.44 ns 453.31 ns 454.18 ns] change: [-15.036% -14.751% -14.473%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0007 s (4.9M i flatten/complex time: [1.0101 us 1.0131 us 1.0160 us] change: [-18.001% -17.775% -17.536%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe ``` --- _I removed this particular commit from this PR._ The reason is that the two other commits were enough for this PR to give enough impact and be merged. We will continue to explore where we can get performances later. But when I changed the flattening function to accept an owned version of the objects, we lost a lot of performances. Yes, I rewrote the benchmarks (locally) to clone the input object (and measured both, previous and new versions, with the cloning benchmarks). Maybe cloning the benchmark inputs is not the right thing to do... ``` Benchmarking flatten/simple: Collecting 100 samples in estimated 5.0005 s (6.7M it flatten/simple time: [746.46 ns 749.59 ns 752.70 ns] change: [+40.082% +40.714% +41.347%] (p = 0.00 < 0.05) Performance has regressed. Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0047 s (2.9M i flatten/complex time: [1.7311 us 1.7342 us 1.7368 us] change: [+40.976% +41.398% +41.807%] (p = 0.00 < 0.05) Performance has regressed. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild ``` Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-04-13 11:14:29 +00:00
Kerollmops	b3cec1a383	Prefer using direct method calls instead of using the json macros	2022-04-13 13:12:57 +02:00
Kerollmops	436d2032c4	Add benchmarks to the flatten-serde-json subcrate	2022-04-13 13:12:57 +02:00
bors[bot]	3828635fb2	Merge #489 489: fix distinct count bug r=curquiza a=MarinPostma fix https://github.com/meilisearch/meilisearch/issues/2152 I think the issue was that we didn't take off the excluded candidates from the initial candidates when returning the candidates with the search result. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-13 10:15:30 +00:00
ad hoc	dda28d7415	exclude excluded canditates from search result candidates	2022-04-13 12:10:35 +02:00
ad hoc	cd83014fff	add test for disctinct nb hits	2022-04-13 12:10:35 +02:00
ad hoc	bbb6728d2f	add distinct attributes to cli	2022-04-13 12:10:35 +02:00
bors[bot]	49fbbacafc	Merge #492 492: Add the new `Specify breaking` check to bors.toml r=curquiza a=curquiza Should prevent this problem: https://github.com/meilisearch/milli/pull/489#issuecomment-1094988060 Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-04-13 08:59:40 +00:00
Clémentine Urquizar - curqui	7ad582f39f	Update bors.toml	2022-04-13 10:56:56 +02:00
Clémentine Urquizar - curqui	aa896f0e7a	Update bors.toml	2022-04-13 10:56:56 +02:00
Clémentine Urquizar - curqui	0261a0e3cf	Add the new `Specify breaking` check to bors.toml	2022-04-13 10:56:55 +02:00
Paul Sanders	41249be274	Add version flag	2022-04-12 15:22:36 -04:00
ManyTheFish	5809d3ae0d	Add first benchmarks on formatting	2022-04-12 16:31:58 +02:00
bors[bot]	049cf0fcee	Merge #2313 2313: fix(search): remove the back and forth between the IndexMap and the serde_json::Map r=irevoire a=irevoire This is ok because we're using the preserve_order feature in serde_json which is already internally using an IndexMap. See https://github.com/meilisearch/meilisearch/pull/2298#discussion_r845228412_ Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-12 14:17:26 +00:00
Tamo	2ee210483f	fix(search): remove the back and forth between the IndexMap and the serde_json::Map This is ok because we're using the preserve_order feature in serde_json which is already internally using an IndexMap.	2022-04-12 16:12:52 +02:00
ManyTheFish	827cedcd15	Add format option structure	2022-04-12 13:42:14 +02:00
ManyTheFish	011f8210ed	Make compute_matches more rust idiomatic	2022-04-12 10:19:02 +02:00
bors[bot]	6b0737384b	Merge #491 491: remove the unused key warning r=curquiza a=irevoire When I copy-pasted my flatten crate I forgot to remove the key used to publish the package and that throw a warning. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-11 16:55:25 +00:00
bors[bot]	13205066f3	Merge #2311 2311: Change version for the next release (v0.27.0) r=irevoire a=curquiza Fixes #2310 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-11 14:49:33 +00:00
Clémentine Urquizar	b3661bf8ec	Change version for the next release (v0.27.0)	2022-04-11 16:25:15 +02:00
ManyTheFish	0990e95830	Feat(Analytics): Add analytics for search format options	2022-04-11 14:53:15 +02:00
Tamo	e153418b8a	remove the unused key warning	2022-04-11 14:52:41 +02:00
bors[bot]	f67167fa9f	Merge #2178 2178: Refacto docker r=irevoire a=irevoire closes #2166 and #2085 ----------- I noticed many people had issues with the default configuration of our Dockerfile. Some examples: - #2166: If you use ubuntu and mount your `data.ms` in a volume (as shown in the [doc](https://docs.meilisearch.com/learn/getting_started/installation.html#download-and-launch)), you can't run meilisearch - #2085: Here, meilisearch was not able to erase the `data.ms` when loading a dump because it's the mount point Currently, we don't show how to use the snapshot and dumps with docker in the documentation. And it's quite hard to do: - You either send a big command to meilisearch to change the dump-path, snapshot-path and db-path a single directory and then mount that one - Or you mount three volumes - And there were other issues on the slack community I think this PR solve the problem. Now the image contains the `meilisearch` binary in the `/bin` directory, so it's easy to find and always in the `PATH`. It creates a `data` directory and moves the working-dir in it. So now you can find the `dumps`, `snapshots` and `data.ms` directory in `/data`. Here is the new command to run meilisearch with a volume: ``` docker run -it --rm -v $PWD/meili_data:/data -p 7700:7700 getmeili/meilisearch:latest ``` And if you need to import a dump or a snapshot, you don't need to restart your container and mount another volume. 
You can directly hit the `POST /dumps` route and then run: ``` docker run -it --rm -v $PWD/meili_data:/data -p 7700:7700 getmeili/meilisearch:latest meilisearch --import-dump dumps/20220217-152115159.dump ``` ------- You can already try this PR with the following docker image: ``` getmeili/meilisearch:test-docker-v0.26.0 ``` If you want to use the v0.25.2 I created another image; ``` getmeili/meilisearch:test-docker-v0.25.2 ``` ------ If you're using helm I created a branch [here](https://github.com/meilisearch/meilisearch-kubernetes/tree/test-docker-v0.26.0) that use the v0.26.0 image with the good volume 👍 If you use this conf with the v0.25.2, it should also work. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-11 12:01:25 +00:00
bors[bot]	31584f34e8	Merge #2298 2298: Nested fields r=irevoire a=irevoire There are a few things that I want to fix _AFTER_ merging this PR. For the following RCs. ## Stop the useless conversion In the `search.rs` I convert a `Document` to a `Value`, and then the `Value` to a `Document` and then back to a `Value` etc. I should stop doing all these conversion and stick to one format. Probably by merging my `permissive-json-pointer` crate into meilisearch. That would also give me the opportunity to work directly with obkvs and stops deserializing fields I don't need. ## Add more test specific to the nested Everything seems to works but I should write tests to double check that the nested works well with the `formatted` field. ## See how I could stop iterating on hashmap and instead fill them correctly This is related to milli. I really often needs to iterate over hashmap to see if a field is a subset of another field. I could probably generate a structure containing all the possible key values. ie. the user say `doggo` is an attribute to retrieve. Instead of iterating on all the attributes to retrieve to check if `doggo.name` is a subset of `doggo`. I should insert `doggo.name` in the attributes to retrieve map. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-11 11:45:37 +00:00
bors[bot]	a70e0a6422	Merge #2304 2304: chore(bors): comments clippy out r=curquiza a=irevoire There is currently an issue with clippy that stops us from merging PRs. https://github.com/rust-lang/rust-clippy/issues/8662#issuecomment-1093899755 We can't use clippy in the CI while that's not merged Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-11 11:20:18 +00:00
Tamo	348345f555	chore(bors): comments clippy out There is currently an issue with clippy that stops us from merging PRs. https://github.com/rust-lang/rust-clippy/issues/8662#issuecomment-1093899755 We can't use clippy in the CI while that's not merged	2022-04-11 13:19:00 +02:00
Tamo	683206e140	feat(docker): refactoring the dockerfile - Move the meilisearch binary to `/bin/meilisearch` so it's always in scope. - Create a `meili_data` directory used as the default working directory	2022-04-11 13:14:44 +02:00
bors[bot]	c8306616e0	Merge #490 490: Enforce labelling for the PRs r=curquiza a=curquiza - Enforce one of the following labels to make the CI pass: `no breaking`, `DB breaking`, `API breaking` (milli API, not the Meilisearch API of course), or `skip changelog`. This new CI is now `Required` in the GitHub settings for merging a PR. - Adapt the release drafter to these new labels - rename `skip-changelog` into `skip changelog` according to the new label name Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-11 08:24:23 +00:00
Clémentine Urquizar	9383629d13	Enforce labelling for the PRs	2022-04-09 23:47:06 +02:00
ManyTheFish	a16de5de84	Symplify format and remove intermediate function	2022-04-08 11:20:41 +02:00
ManyTheFish	a769e09dfa	Make token_crop_bounds more rust idiomatic	2022-04-07 20:15:14 +02:00
Tamo	69d312209e	feat(search): Implements the nested fields See https://github.com/meilisearch/specifications/pull/121	2022-04-07 19:47:20 +02:00
bors[bot]	9ac2fd1c37	Merge #487 487: Update version (v0.26.0) r=Kerollmops a=curquiza breaking because of #458 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-07 17:10:24 +00:00
bors[bot]	80ae020bee	Merge #458 458: Nested fields r=Kerollmops a=irevoire For the following document: ```json { "id": 1, "person": { "name": "tamo", "age": 25, } } ``` Suppose the user sets `person` as a filterable attribute. We need to store `person` in the filterable _obviously_. But we also need to keep track of `person.name` and `person.age` somewhere. That’s where I changed a little bit the logic of the engine. Currently, we have a function called `faceted_field` that returns the union of the filterable and sortable. I renamed this function in `user_defined_faceted_field`. And now, when we finish indexing documents, we look at all the fields and see if they « match » a `user_defined_faceted_field`. So in our case: - does `id` match `person`: 🔴 - does `person.name` match `person`: 🟢 - does `person.age` match `person`: 🟢 And thus, we insert in the database the following faceted fields: `person, person.name, person.age`. The good thing about that solution is that we generate everything during the indexing phase, and then during the search, we can access our field without recomputing too much globbing. ----- Now the bad thing is that I had to create a new db. And if that was only one db, that would be ok, but actually, I need to do the same for the: - Displayed attributes - Attributes to retrieve - Attributes to highlight - Attribute to crop `@Kerollmops` Do you think there is a better way to do it? Apart from all the code, can we have a problem because we have too many dbs? Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-07 16:26:09 +00:00
Tamo	bab898ce86	move the flatten-serde-json crate inside of milli	2022-04-07 18:20:44 +02:00
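
The field matching described in the merge of #458 above (does `person.name` match the user-defined `person`?) can be illustrated with a minimal sketch. This is not milli's actual code: the function names `field_matches` and `expand_faceted_fields` are hypothetical, and the sketch assumes field names are already flattened into dotted paths.

```rust
/// Hypothetical helper: returns true when `field` is `selector` itself
/// or a dotted sub-field of it (e.g. `person.name` under `person`).
fn field_matches(selector: &str, field: &str) -> bool {
    match field.strip_prefix(selector) {
        // Either an exact match, or the remainder starts a nested path.
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}

/// Hypothetical helper: expands the user-defined faceted fields with every
/// flattened document field that matches one of them, without duplicates.
fn expand_faceted_fields(user_defined: &[&str], all_fields: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = user_defined.iter().map(|s| s.to_string()).collect();
    for field in all_fields {
        let matched = user_defined.iter().any(|sel| field_matches(sel, field));
        let already_present = out.iter().any(|f| f.as_str() == *field);
        if matched && !already_present {
            out.push((*field).to_string());
        }
    }
    out
}

fn main() {
    // Fields of the example document from the commit message, once flattened.
    let faceted = expand_faceted_fields(&["person"], &["id", "person.name", "person.age"]);
    println!("{:?}", faceted);
}
```

Checking for the `.` separator after the prefix keeps an unrelated field such as `personal` from matching `person`. Doing this expansion once at indexing time is what lets the search path avoid repeating the match.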

8509 Commits