Commit Graph

10042 Commits

Author SHA1 Message Date
bors[bot]
5e754b3ee0
Merge #708
708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec

# Pull Request

## Related issue
Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115 

## What does this PR do?
1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x. 
   This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a  `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built.
 
2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-30 17:47:34 +00:00
bors[bot]
e1612fcb01
Merge #712
712: Fix bulk facet indexing bug r=Kerollmops a=loiclec

# Pull Request

## Related issue
Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3165

## What does this PR do?
Fixes a bug where indexing certain numbers of filterable attribute values in bulk led to corrupted facet databases. This was due to a lossy integer conversion which would ultimately prevent entire levels of the facet database to be written into LMDB.

More specifically, this change was made:
```diff
      - if cur_writer_len as u8 >= self.min_level_size {
      + if cur_writer_len >= self.min_level_size as usize {
```
I also checked other comparisons to `min_level_size` and other conversions such as `x as u8` in this part of the codebase.



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-30 16:51:48 +00:00
Clémentine Urquizar - curqui
da8044f91e
Update download-latest.sh
Co-authored-by: Louis Dureuil <louis.dureuil@gmail.com>
2022-11-30 16:55:32 +01:00
Clémentine Urquizar - curqui
9e3b1eb7a8
Update download-latest.sh
Co-authored-by: Louis Dureuil <louis.dureuil@gmail.com>
2022-11-30 16:55:27 +01:00
Loïc Lecrenier
9dd4b33a9a Fix bulk facet indexing bug 2022-11-30 14:27:36 +01:00
curquiza
eab1156f8c Update script to be used with macOS apple silicon 2022-11-30 14:14:46 +01:00
curquiza
8d405fad12 Simplify download-latest.sh script 2022-11-30 13:53:12 +01:00
jiangbo212
bf96b6df93 clippy fix change 2022-11-30 17:59:06 +08:00
jiangbo212
9c28632498 Merge branch 'main' into fix-3037 2022-11-30 09:38:01 +08:00
jiangbo212
38982d13fe fix issue 3037 2022-11-30 00:03:22 +08:00
bors[bot]
de22116b3d
Merge #711
711: Replace deprecated gh actions r=curquiza a=pnhatminh

# Pull Request

## Related issue
Fixes #678

## What does this PR do?
- Replace deprecated github action command with newly defined command.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Minh Pham <minh.pham@codelink.io>
2022-11-29 09:56:22 +00:00
bors[bot]
6150aa73b0
Merge #3148
3148: Bring back `release-v0.30.0` into `main` r=Kerollmops a=curquiza

Following this message

https://github.com/meilisearch/meilisearch/pull/3145#issuecomment-1329296168

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
2022-11-29 09:23:44 +00:00
Minh Pham
5f78522044 Updagte 2022-11-29 10:11:38 +07:00
Gregory Conrad
87e2bc3bed fix(reindex): reindex in a few more cases
Cases: whenever searchable_fields OR user_defined_searchable_fields is modified
2022-11-28 13:12:19 -05:00
bors[bot]
14840a24c6
Merge #3149
3149: Fix the dump tests r=Kerollmops a=irevoire

You'll need to trust me on this one. But the tests in the release-v0.30.0 were deactivated for a long time, and I don't know what was wrong with them.
Anyway, I checked that these SHA did match the tasks view we're expecting, and it looks good to me.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-28 15:48:36 +00:00
Tamo
41eb986f65
Fix the dump tests 2022-11-28 16:39:22 +01:00
Loïc Lecrenier
61b58b115a Don't create partial matching words for synonyms in ngrams 2022-11-28 16:32:28 +01:00
Clémentine Urquizar - curqui
457a473b72
Bring back release-v0.30.0 into release-v0.30.0-temp (final: into main) (#3145)
* Fix error code of the "duplicate index found" error

* Use the content of the ProcessingTasks in the tasks cancelation system

* Change the missing_filters error code into missing_task_filters

* WIP Introduce the invalid_task_uid error code

* Use more precise error codes/message for the task routes

+ Allow star operator in delete/cancel tasks
+ rename originalQuery to originalFilters
+ Display error/canceled_by in task view even when they are = null
+ Rename task filter fields by using their plural forms
+ Prepare an error code for canceledBy filter
+ Only return global tasks if the API key action `index.*` is there

* Add canceledBy task filter

* Update tests following task API changes

* Rename original_query to original_filters everywhere

* Update more insta-snap tests

* Make clippy happy

They're a happy clip now.

* Make rustfmt happy

>:-(

* Fix Index name parsing error message to fit the specification

* Bump milli version to 0.35.1

* Fix the new error messages

* fix the error messages and add tests

* rename the error codes for the sake of consistency

* refactor the way we send the cli informations + add the analytics for the config file and ssl usage

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

* add a comment over the new infos structure

* reformat, sorry @kero

* Store analytics for the documents deletions

* Add analytics on all the settings

* Spawn threads with names

* Spawn rayon threads with names

* update the distinct attributes to the spec update

* update the analytics on the search route

* implements the analytics on the health and version routes

* Fix task details serialization

* Add the question mark to the task deletion query filter

* Add the question mark to the task cancelation query filter

* Fix tests

* add analytics on the task route

* Add all the missing fields of the new task query type
* Create a new analytics for the task deletion
* Create a new analytics for the task creation

* batch the tasks seen events

* Update the finite pagination analytics

* add the analytics of the swap-indexes route

* Stop removing the DB when failing to read it

* Rename originalFilters into originalFilters

* Rename matchedDocuments into providedIds

* Add `workflow_dispatch` to flaky.yml

* Bump grenad to 0.4.4

* Bump milli to version v0.37.0

* Don't multiply total memory returned by sysinfo anymore

sysinfo now returns bytes rather than KB

* Add a dispatch to the publish binaries workflow

* Fix publish release CI

* Don't use gold but the default linker

* Always display details for the indexDeletion task

* Fix the insta tests

* refactorize the whole test suite
1. Make a call to assert_internally_consistent automatically when snapshoting the scheduler. There is no point in snapshoting something broken and expect the dumb humans to notice.
2. Replace every possible call to assert_internally_consistent by a snapshot of the scheduler. It takes as many lines and ensure we never change something without noticing in any tests ever.
3. Name every snapshots: it's easier to debug when something goes wrong and easier to review in general.
4. Stop skipping breakpoints, it's too easy to miss something. Now you must explicitely show which path is the scheduler supposed to use.
5. Add a timeout on the channel.recv, it eases the process of writing tests, now when something file you get a failure instead of a deadlock.

* rebase on release-v0.30

* makes clippy happy

* update the snapshots after a rebase

* try to remove the flakyness of the failing test

* Add more analytics on the ranking rules positions

* Update the dump test to check for the dumpUid dumpCreation task details

* send the ranking rules as a string because amplitude is too dumb to process an array as a single value

* Display a null dumpUid until we computed the dump itself on disk

* Update tests

* Check if the master key is missing before returning an error

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-11-28 16:27:41 +01:00
Gregory Conrad
d3182f3830 refactor: Change return type to keep consistency with others 2022-11-28 10:02:03 -05:00
bors[bot]
f698e6cfdf
Merge #707
707: Add all_obkv_to_json function r=Kerollmops a=GregoryConrad

## What does this PR do?
When embedding milli in an application (other than Meilisearch), it often makes sense to not use the `displayed_attributes` functionality and instead just use milli as a full document store. Thus, this PR adds a function, `all_obkv_to_json`, to supplement the already exposed `milli::obkv_to_json` so that those embedding milli *do not* need to deal with `displayed_attributes` if they don't need to.

~This PR also introduces a slight breaking change: `obkv_to_json` now accepts a reference to `obkv::KvReaderU16` instead of taking ownership of it. As far as I can tell, this seems like a change for the better (`obkv_to_json` only acts upon `obkv` rather than consuming it), but I can change it back if you so desire.~ (reverted in [935a724](935a724c57))

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
2022-11-28 14:52:45 +00:00
Loïc Lecrenier
f70856bab1 Remove memory usage test that fails when many tests are run in parallel 2022-11-28 12:55:28 +01:00
Loïc Lecrenier
80588daae5 Fix compilation error in formatting benches 2022-11-28 10:27:15 +01:00
Loïc Lecrenier
e2ebed62b1 Don't create partial matching words for synonyms, split words, phrases 2022-11-28 10:20:13 +01:00
Loïc Lecrenier
8284bd760f Relax memory ordering of operations within the test CountingAlloc 2022-11-28 10:20:13 +01:00
Loïc Lecrenier
8d0ace2d64 Avoid creating a MatchingWord for words that exceed the length limit 2022-11-28 10:20:13 +01:00
Loïc Lecrenier
86c34a996b Deduplicate matching words 2022-11-28 10:20:13 +01:00
Minh Pham
eba7af1d2c Replace deprecated gh actions 2022-11-27 06:47:08 +07:00
Gregory Conrad
e0d24104a3 refactor: Rewrite another method chain to be more readable 2022-11-26 13:33:19 -05:00
Gregory Conrad
2db738dbac refactor: rewrite method chain to be more readable 2022-11-26 13:26:39 -05:00
bors[bot]
84dd2e4df1
Merge #710
710: Update Clippy to use Rust Stable r=irevoire a=Kerollmops

This PR changes the CI to use Rust stable for Clippy and Rustfmt. This way we will reduce the number of times we break the CI. [The version will only change every two months or so](https://www.whatrustisit.com/).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-11-24 15:57:04 +00:00
Clément Renault
3d06ea41ea
Keep a nightly for rustfmt
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-24 16:54:40 +01:00
Clément Renault
3958db4b17
Update the CI to use Rust Stable 2022-11-24 16:26:48 +01:00
Gregory Conrad
935a724c57 revert: Revert pass by reference API change 2022-11-24 10:08:23 -05:00
bors[bot]
914f8b118c
Merge #3119
3119: Dump tests r=Kerollmops a=irevoire

Reenable the dump tests

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-11-24 10:17:38 +00:00
Gregory Conrad
ed29cceae9 perf: Prevent reindex in searchable set case when not needed 2022-11-23 22:33:06 -05:00
Gregory Conrad
bb9e33bf85 perf: Prevent reindex in searchable reset case when not needed 2022-11-23 22:01:46 -05:00
Gregory Conrad
7c0e544839 feat: Add all_obkv_to_json function 2022-11-23 21:18:58 -05:00
Colby Allen
6ecd486d1b Bumps cargo_toml version to most up to date 2022-11-23 16:27:54 -07:00
Gregory Conrad
d19c8672bb perf: limit reindex to when exact_attributes changes 2022-11-23 15:50:53 -05:00
Tamo
7b8641a7af
fix the dump tests
The issue was linked to the fact that the debug implementation of the PhantomData wasn't the same between rust stable and rust nightly.
This was causing an issue while snapshsotting the settings and this commit fix it by representing the settings as json which already ignores the PhantomData
2022-11-23 16:59:20 +01:00
bors[bot]
f509d81ec9
Merge #3100
3100: Add a dispatch to the publish binaries workflow r=Kerollmops a=curquiza

Add `worklfow_dispatch` event to publish binaries workflow to allow the manually trigger

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-11-21 19:31:20 +00:00
Kerollmops
8d73ae80bb Add a dispatch to the publish binaries workflow 2022-11-21 18:50:57 +01:00
bors[bot]
57c9f03e51
Merge #697
697: Fix bug in prefix DB indexing r=loiclec a=loiclec

Where the batch's information was not properly updated in cases where only the proximity changed between two consecutive word pair proximities.

Closes partially https://github.com/meilisearch/meilisearch/issues/3043



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-17 15:22:01 +00:00
bors[bot]
00129e112a
Merge #3041
3041: Add `workflow_dispatch` to flaky.yml r=irevoire a=curquiza

To be able to run the job manual and don't wait for one week

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-11-17 13:21:11 +00:00
bors[bot]
467e742bd1
Merge #702
702: Update version for the next release (v0.37.0) in Cargo.toml files r=ManyTheFish a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-11-17 12:54:27 +00:00
curquiza
cd5aaa3a9f Update version for the next release (v0.37.0) in Cargo.toml files 2022-11-17 12:50:07 +00:00
bors[bot]
8ceb199dca
Merge #696
696: Fix Facet Indexing bugs r=Kerollmops a=loiclec

1. Handle keys with variable length correctly

Closes (partially) https://github.com/meilisearch/meilisearch/issues/3042 
This issue is now easily reproducible with the updated fuzz tests, which now generate keys with variable lengths.

2. Prevent adding facets to the database if their encoded value does not satisfy `valid_lmdb_key`.

Closes (partially) https://github.com/meilisearch/meilisearch/issues/2743
This fixes an indexing failure when a document had a filterable attribute containing a value whose length is higher than ~500 bytes. For now, this fix is just meant to prevent crashes. Better handling of long values of filterable attributes will be handled in a separate PR.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-11-17 11:56:16 +00:00
Loïc Lecrenier
777eb3fa00 Add insta-snaps for test of bug 3043 2022-11-17 12:21:27 +01:00
Loïc Lecrenier
0caadedd3b Make clippy happy 2022-11-17 12:17:53 +01:00
Loïc Lecrenier
ac3baafbe8 Truncate facet values that are too long before indexing them 2022-11-17 11:29:42 +01:00