MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2025-06-30 10:28:30 +02:00

Author	SHA1	Message	Date
Tamo	399fba16bb	only flatten an object if it's nested	2022-04-14 11:14:08 +02:00
Tamo	c2469b6765	create the json-depth-checker crate	2022-04-14 11:14:08 +02:00
bors[bot]	7791ef90e7	Merge #493 493: Use smartstring to store the external id in our hashmap r=Kerollmops a=irevoire We need to store all the external ids (primary keys) in a hashmap associated with their internal ids. Smartstring removes heap allocations, reduces memory usage, and should improve cache locality. I ran the benchmarks to measure the impact of this PR on the indexing time. I think we should merge it whatever happens, though, because it will decrease memory consumption. --------- This improves performance only very slightly but improves memory usage, so it should be merged. ``` group indexing_main_6b073738 indexing_use-smartsring_3f343511 ----- ---------------------- -------------------------------- indexing/Indexing geo_point 1.02 25.2±0.20s ? ?/sec 1.00 24.8±0.13s ? ?/sec indexing/Indexing movies in three batches 1.00 18.2±0.10s ? ?/sec 1.00 18.2±0.23s ? ?/sec indexing/Indexing movies with default settings 1.00 17.5±0.09s ? ?/sec 1.01 17.7±0.11s ? ?/sec indexing/Indexing songs in three batches with default settings 1.00 68.3±1.01s ? ?/sec 1.00 68.0±0.95s ? ?/sec indexing/Indexing songs with default settings 1.00 63.2±0.78s ? ?/sec 1.00 63.0±0.58s ? ?/sec indexing/Indexing songs without any facets 1.02 59.6±1.00s ? ?/sec 1.00 58.5±1.03s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 62.8±0.38s ? ?/sec 1.00 62.6±1.02s ? ?/sec indexing/Indexing wiki 1.01 1009.2±25.25s ? ?/sec 1.00 998.1±11.27s ? ?/sec indexing/Indexing wiki in three batches 1.01 1142.0±9.97s ? ?/sec 1.00 1134.4±11.21s ? ?/sec ``` Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-13 20:28:28 +00:00
Tamo	ee64f4a936	Use smartstring to store the external id in our hashmap We need to store all the external ids (primary keys) in a hashmap associated with their internal ids. Smartstring removes heap allocations, reduces memory usage, and should improve cache locality.	2022-04-13 21:22:07 +02:00
bors[bot]	456887a54a	Merge #496 496: Improve the performances of the flattening subcrate r=irevoire a=Kerollmops This PR adds some benchmarks to the _flatten-serde-json_ crate, which is responsible for transforming the original documents into flat versions that the engine can understand. It can probably be sped up, which is why I added benchmarks to it. I got some interesting performance improvements when I replaced the `json!` macro calls. ``` flatten/simple time: [452.44 ns 453.31 ns 454.18 ns] change: [-15.036% -14.751% -14.473%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0007 s (4.9M i flatten/complex time: [1.0101 us 1.0131 us 1.0160 us] change: [-18.001% -17.775% -17.536%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe ``` --- _I removed this particular commit from this PR._ The reason is that the two other commits had enough impact for this PR to be merged. We will continue to explore where we can gain performance later. But when I changed the flattening function to accept an owned version of the objects, we lost a lot of performance. Yes, I rewrote the benchmarks (locally) to clone the input object (and measured both the previous and new versions with the cloning benchmarks). Maybe cloning the benchmark inputs is not the right thing to do... ``` Benchmarking flatten/simple: Collecting 100 samples in estimated 5.0005 s (6.7M it flatten/simple time: [746.46 ns 749.59 ns 752.70 ns] change: [+40.082% +40.714% +41.347%] (p = 0.00 < 0.05) Performance has regressed. Benchmarking flatten/complex: Collecting 100 samples in estimated 5.0047 s (2.9M i flatten/complex time: [1.7311 us 1.7342 us 1.7368 us] change: [+40.976% +41.398% +41.807%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild ``` Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-04-13 11:14:29 +00:00
Kerollmops	b3cec1a383	Prefer using direct method calls instead of using the json macros	2022-04-13 13:12:57 +02:00
Kerollmops	436d2032c4	Add benchmarks to the flatten-serde-json subcrate	2022-04-13 13:12:57 +02:00
bors[bot]	3828635fb2	Merge #489 489: fix distinct count bug r=curquiza a=MarinPostma fix https://github.com/meilisearch/meilisearch/issues/2152 I think the issue was that we didn't take off the excluded candidates from the initial candidates when returning the candidates with the search result. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-13 10:15:30 +00:00
ad hoc	dda28d7415	exclude excluded candidates from search result candidates	2022-04-13 12:10:35 +02:00
ad hoc	cd83014fff	add test for distinct nb hits	2022-04-13 12:10:35 +02:00
ad hoc	bbb6728d2f	add distinct attributes to cli	2022-04-13 12:10:35 +02:00
bors[bot]	49fbbacafc	Merge #492 492: Add the new `Specify breaking` check to bors.toml r=curquiza a=curquiza Should prevent this problem: https://github.com/meilisearch/milli/pull/489#issuecomment-1094988060 Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-04-13 08:59:40 +00:00
Clémentine Urquizar - curqui	7ad582f39f	Update bors.toml	2022-04-13 10:56:56 +02:00
Clémentine Urquizar - curqui	aa896f0e7a	Update bors.toml	2022-04-13 10:56:56 +02:00
Clémentine Urquizar - curqui	0261a0e3cf	Add the new `Specify breaking` check to bors.toml	2022-04-13 10:56:55 +02:00
bors[bot]	6b0737384b	Merge #491 491: remove the unused key warning r=curquiza a=irevoire When I copy-pasted my flatten crate, I forgot to remove the key used to publish the package, and that throws a warning. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-11 16:55:25 +00:00
Tamo	e153418b8a	remove the unused key warning	2022-04-11 14:52:41 +02:00
bors[bot]	c8306616e0	Merge #490 490: Enforce labelling for the PRs r=curquiza a=curquiza - Enforce one of the following labels to make the CI pass: `no breaking`, `DB breaking`, `API breaking` (milli API, not the Meilisearch API of course), or `skip changelog`. This new CI is now `Required` in the GitHub settings for merging a PR. - Adapt the release drafter to these new labels - rename `skip-changelog` into `skip changelog` according to the new label name Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-11 08:24:23 +00:00
Clémentine Urquizar	9383629d13	Enforce labelling for the PRs	2022-04-09 23:47:06 +02:00
bors[bot]	9ac2fd1c37	Merge #487 487: Update version (v0.26.0) r=Kerollmops a=curquiza breaking because of #458 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-07 17:10:24 +00:00
bors[bot]	80ae020bee	Merge #458 458: Nested fields r=Kerollmops a=irevoire For the following document: ```json { "id": 1, "person": { "name": "tamo", "age": 25 } } ``` Suppose the user sets `person` as a filterable attribute. Obviously, we need to store `person` in the filterable attributes. But we also need to keep track of `person.name` and `person.age` somewhere. That’s where I changed the logic of the engine a little. Currently, we have a function called `faceted_field` that returns the union of the filterable and sortable attributes. I renamed this function to `user_defined_faceted_field`. And now, when we finish indexing documents, we look at all the fields and see if they "match" a `user_defined_faceted_field`. So in our case: - does `id` match `person`: 🔴 - does `person.name` match `person`: 🟢 - does `person.age` match `person`: 🟢 And thus, we insert the following faceted fields in the database: `person, person.name, person.age`. The good thing about that solution is that we generate everything during the indexing phase, and then during the search, we can access our field without recomputing too much globbing. ----- Now the bad thing is that I had to create a new db. And if that was only one db, that would be ok, but actually, I need to do the same for the: - Displayed attributes - Attributes to retrieve - Attributes to highlight - Attributes to crop `@Kerollmops` Do you think there is a better way to do it? Apart from all the code, can we have a problem because we have too many dbs? Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-07 16:26:09 +00:00
Tamo	bab898ce86	move the flatten-serde-json crate inside of milli	2022-04-07 18:20:44 +02:00
Tamo	ab458d8840	fix tests after rebase	2022-04-07 17:00:00 +02:00
Irevoire	4f3ce6d9cd	nested fields	2022-04-07 16:58:46 +02:00
Clémentine Urquizar	ee1d627803	Update version (v0.26.0)	2022-04-07 15:56:10 +02:00
bors[bot]	4ae7aea3b2	Merge #486 486: Update version (v0.25.0) r=curquiza a=curquiza v0.25.0 will be released once #478 is merged Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-06 11:40:41 +00:00
bors[bot]	aadb0c58c9	Merge #478 478: Disable typo on attribute r=Kerollmops a=MarinPostma disable typo on attributes Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-05 23:45:35 +00:00
ad hoc	86249e2ae4	add missing \t in cli update display Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-04-05 21:35:06 +02:00
ad hoc	b799f3326b	rename merge_nothing to merge_ignore_values	2022-04-05 18:44:35 +02:00
ad hoc	201fea0fda	limit extract_word_docids memory usage	2022-04-05 14:14:15 +02:00
ad hoc	5cfd3d8407	add exact attributes documentation	2022-04-05 14:10:22 +02:00
Clémentine Urquizar	9eec44dd98	Update version (v0.25.0)	2022-04-05 12:06:42 +02:00
ad hoc	b85cd4983e	remove field_id_from_position	2022-04-05 09:50:34 +02:00
ad hoc	dac81b2d44	add missing \n in cli settings	2022-04-05 09:48:56 +02:00
ad hoc	ab185a59b5	fix infos	2022-04-05 09:46:56 +02:00
ad hoc	59e41d98e3	add comments to integration test	2022-04-04 21:17:06 +02:00
ad hoc	1810927dbd	rephrase exact_attributes doc	2022-04-04 21:04:49 +02:00
ad hoc	b7694c34f5	remove println	2022-04-04 21:00:07 +02:00
ad hoc	6cabd47c32	fix typo in comment	2022-04-04 20:59:20 +02:00
ad hoc	9963f11172	fix infos crate compilation issue	2022-04-04 20:54:03 +02:00
ad hoc	c8d3a09af8	add integration test for disable typo on attributes	2022-04-04 20:54:03 +02:00
ad hoc	bfd81ce050	add exact attributes to cli settings	2022-04-04 20:54:03 +02:00
ad hoc	6b2c2509b2	fix bug in exact search	2022-04-04 20:54:03 +02:00
ad hoc	56b4f5dce2	add exact prefix to query_docids	2022-04-04 20:54:03 +02:00
ad hoc	21ae4143b1	add exact_word_prefix to Context	2022-04-04 20:54:03 +02:00
ad hoc	e8f06f6c06	extract exact_word_prefix_docids	2022-04-04 20:54:03 +02:00
ad hoc	6dd2e4ffbd	introduce exact_word_prefix database in index	2022-04-04 20:54:03 +02:00
ad hoc	ba0bb29cd8	refactor WordPrefixDocids to take dbs instead of indexes	2022-04-04 20:54:02 +02:00
ad hoc	c4c6e35352	query exact_word_docids in resolve_query_tree	2022-04-04 20:54:02 +02:00
ad hoc	8d46a5b0b5	extract exact word docids	2022-04-04 20:54:02 +02:00
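
The field matching described in the Nested fields merge (#458) above can be sketched roughly as follows. This is a minimal Rust sketch, not milli's actual code: the helper names `matches_faceted_field` and `expand_faceted_fields` are hypothetical, and it assumes a field matches when it equals the user-defined name or extends it with a `.` separator. That rule is consistent with the three examples in the commit message but is not confirmed by it.

```rust
/// Hypothetical helper (not milli's API): returns true when `field` is
/// `selector` itself or a nested child of it, i.e. `selector` followed by `.`.
fn matches_faceted_field(selector: &str, field: &str) -> bool {
    match field.strip_prefix(selector) {
        // Either an exact match, or the remainder starts a nested path.
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}

/// Hypothetical helper: keeps every flattened field name that matches
/// at least one user-defined faceted field.
fn expand_faceted_fields<'a>(user_defined: &[&str], fields: &[&'a str]) -> Vec<&'a str> {
    fields
        .iter()
        .copied()
        .filter(|field| user_defined.iter().any(|sel| matches_faceted_field(sel, field)))
        .collect()
}

fn main() {
    // `person` is the only attribute the user declared as filterable.
    let user_defined = ["person"];
    // Field names seen after flattening the example document.
    let fields = ["id", "person", "person.name", "person.age"];

    let faceted = expand_faceted_fields(&user_defined, &fields);
    println!("{:?}", faceted);
}
```

Doing this expansion once at indexing time, as the commit message describes, means the search path can look up the stored list instead of re-running the match for every query.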


1662 Commits