Commit Graph

9257 Commits

Author SHA1 Message Date
ManyTheFish
87e3d27878
update extract word pair proximity to support deladd obkvs 2023-10-30 11:34:02 +01:00
ManyTheFish
6bcf8b4f8c
update extract word position docids 2023-10-30 11:34:02 +01:00
ManyTheFish
46aa75abdb
update extract word docids 2023-10-30 11:34:02 +01:00
ManyTheFish
2597bbd107
Make script language docids map taking a tuple of roaring bitmaps expressing the deletions and the additions 2023-10-30 11:34:00 +01:00
Clément Renault
e2bc054604
Update extract_facet_string_docids to support deladd obkvs 2023-10-30 11:32:36 +01:00
Clément Renault
fcd3a1434d
Update extract_facet_number_docids to support deladd obkvs 2023-10-30 11:31:04 +01:00
Clément Renault
a82dee21e0
Rename docid_fid into fid_docid 2023-10-30 11:31:02 +01:00
Clément Renault
bc45c1206d
Implement all the facet extraction paths and simplify them 2023-10-30 11:29:08 +01:00
Clément Renault
6ae4100f07
Generate the DelAdd for is_null, is_empty, and exists 2023-10-30 11:29:08 +01:00
Clément Renault
0c47defeee
Work on fid docid facet values rewrite 2023-10-30 11:29:06 +01:00
ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions 2023-10-30 11:24:19 +01:00
ManyTheFish
1dd97578a8
Make the transform struct return diff-based documents obkvs 2023-10-30 11:22:07 +01:00
ManyTheFish
f5ef69293b
deactivate prefix dbs 2023-10-30 11:22:07 +01:00
ManyTheFish
1c5705c164
clean PR warnings 2023-10-30 11:22:05 +01:00
ManyTheFish
66c2c82a18
Split wpp in several sorters 2023-10-30 11:15:02 +01:00
ManyTheFish
28a8d0ccda
Fix word pair proximity 2023-10-30 11:15:02 +01:00
ManyTheFish
96be85396d
Use a vecDeque in wpp database 2023-10-30 11:15:02 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use 2023-10-30 11:15:02 +01:00
ManyTheFish
b541d48847
Add buffer to the obkv writter 2023-10-30 11:15:02 +01:00
ManyTheFish
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids 2023-10-30 11:15:02 +01:00
ManyTheFish
db1ca21231
add puffin in sorter into reeder function 2023-10-30 11:15:00 +01:00
ManyTheFish
11ea5acff9
Fix 2023-10-30 11:13:10 +01:00
ManyTheFish
8d77736a67
Fix fid_word_docids 2023-10-30 11:13:10 +01:00
ManyTheFish
748b333161
Add usefull debug assert before key insertion in database 2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5
Wip 2023-10-30 11:13:08 +01:00
meili-bors[bot]
2614e7d9ca
Merge #4174
4174: Fix warnings r=dureuill a=irevoire

Fix all the warnings found in the CI: https://github.com/meilisearch/meilisearch/actions/runs/6622576021/job/17988323623

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-10-30 10:12:54 +00:00
Tamo
e7244aa485 fix warnings 2023-10-30 11:00:46 +01:00
meili-bors[bot]
9cacc82307
Merge #4169
4169: update charabia r=curquiza a=ManyTheFish

Update Charabia to v0.8.5 and add the new khmer tokenizer

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-26 17:21:30 +00:00
ManyTheFish
4c6fddb1cb update charabia 2023-10-26 17:01:10 +02:00
meili-bors[bot]
62ea81bef6
Merge #4132
4132: Extract the creation and last updated timestamp from v2 dumps r=irevoire a=vivek-26

# Pull Request

## Related issue
Fixes #2989

## What does this PR do?
This PR - 
- extracts the `created_at` and `updated_at` dates from v2 dumps.
- updates the unit tests.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-10-24 08:50:57 +00:00
Vivek Kumar
f28f09ae2f
update tests for v2 dumps 2023-10-24 14:10:46 +05:30
meili-bors[bot]
ca52021079
Merge #4154
4154: Update version for the next release (v1.5.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-10-23 12:00:50 +00:00
curquiza
ee6f79d60b Update version for the next release (v1.5.0) in Cargo.toml 2023-10-23 11:49:07 +00:00
meili-bors[bot]
e4c24ca6a3
Merge #4151
4151: Bring back changes from v1.4.2 into `release-v1.5.0` r=dureuill a=curquiza

This will bring the fixes in v1.4.2 for v1.5.0 release

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
Co-authored-by: Louis Dureuil <louis.dureuil@gmail.com>
2023-10-23 10:11:11 +00:00
Louis Dureuil
2bae9550c8
Add explanatory comment 2023-10-23 12:06:28 +02:00
Vivek Kumar
32c78ac8b1
add/update tests when search with distinct attribute & pagination with no ranking 2023-10-23 12:06:27 +02:00
Vivek Kumar
5fe7c4545a
compute all candidates correctly when skipping 2023-10-23 12:02:45 +02:00
curquiza
2042229927
Update version for the next release (v1.4.2) in Cargo.toml 2023-10-23 12:02:45 +02:00
meili-bors[bot]
eae9eab181
Merge #4126
4126: Make the experimental route /metrics activable via HTTP r=dureuill a=braddotcoffee

# Pull Request

## Related issue
Closes #4086

## What does this PR do?
- [x] Make `/metrics` available via HTTP as described in #4086 
- [x] The users can still launch Meilisearch using the `--experimental-enable-metrics` flag.
- [x] If the flag `--experimental-enable-metrics` is activated, a call to the `GET /experimental-features` route right after the launch will show `"metrics": true` even if the user has not called the `PATCH /experimental-features` route yet.
- [x] Even if the --experimental-enable-metrics flag is present at launch, calling the `PATCH /experimental-features` route with `"metrics": false` disables the experimental feature.
- [x] Update the spec
    - I was unable to find docs in this repository to update about the `/experimental-features` endpoint. I'll happily update if you point me in the right direction!

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: bwbonanno <bradfordbonanno@gmail.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-10-23 08:51:37 +00:00
Louis Dureuil
cf8dad1ca0
index_scheduler.features() is no longer fallible 2023-10-23 10:38:56 +02:00
bwbonanno
dd619913da Use RwLock to never persist cli state to db 2023-10-19 12:45:57 -07:00
meili-bors[bot]
9b55ff16e9
Merge #4134
4134: Bump rustix from 0.36.15 to 0.36.16 r=Kerollmops a=dependabot[bot]

Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.15 to 0.36.16.
<details>
<summary>Commits</summary>
<ul>
<li><a href="6534992521"><code>6534992</code></a> chore: Release rustix version 0.36.16</li>
<li><a href="4928cf7a38"><code>4928cf7</code></a> Disable riscv64 testing.</li>
<li><a href="8cc159c4c3"><code>8cc159c</code></a> Fix the <code>test_ttyname_ok</code> test when /dev/stdin is inaccessable. (<a href="https://redirect.github.com/bytecodealliance/rustix/issues/821">#821</a>)</li>
<li><a href="6dc7ba9478"><code>6dc7ba9</code></a> Downgrade dependencies and disable tests to compile under Rust 1.48.</li>
<li><a href="ded8986e7e"><code>ded8986</code></a> Disable MIPS in CI. (<a href="https://redirect.github.com/bytecodealliance/rustix/issues/793">#793</a>)</li>
<li><a href="739f9c3ba0"><code>739f9c3</code></a> Fixes for <code>Dir</code> on macOS, FreeBSD, and WASI.</li>
<li><a href="87481a97f4"><code>87481a9</code></a> Merge pull request from GHSA-c827-hfw6-qwvm</li>
<li>See full diff in <a href="https://github.com/bytecodealliance/rustix/compare/v0.36.15...v0.36.16">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=rustix&package-manager=cargo&previous-version=0.36.15&new-version=0.36.16)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).

</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-19 08:01:36 +00:00
dependabot[bot]
e761db582f
Bump rustix from 0.36.15 to 0.36.16
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.15 to 0.36.16.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](https://github.com/bytecodealliance/rustix/compare/v0.36.15...v0.36.16)

---
updated-dependencies:
- dependency-name: rustix
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-18 18:42:12 +00:00
bwbonanno
d8c649b3cd Return recoverable error if we fail to retrieve metrics state 2023-10-18 08:28:24 -07:00
meili-bors[bot]
5e0485d8dd
Merge #4131
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish

## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.

## Stats

### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)

### Impact on search relevancy

<details>

| dataset_name | host_name        | Relevancy rate (Precision) | completion_rate  25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | 1_4_0            | percentile-75 |           0.00% |          12.50% |          35.00% |          45.00% |
| FBIS         | 1_4_0            | percentile-90 |          20.00% |          40.00% |                 |         100.00% |
| FBIS         | 1_4_0            | average       |           5.78% |          11.16% |          21.90% |          26.29% |
| FBIS         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | reduce_proximity | percentile-75 |           0.00% |          15.00% |          35.00% |          40.00% |
| FBIS         | reduce_proximity | percentile-90 |          20.00% |          40.00% |          85.00% |         100.00% |
| FBIS         | reduce_proximity | average       |           5.55% |          11.34% |          21.75% |          26.14% |
| FR94         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | 1_4_0            | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | 1_4_0            | average       |           5.95% |          12.07% |          18.70% |          25.57% |
| FR94         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | reduce_proximity | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | reduce_proximity | average       |           5.79% |          12.00% |          18.70% |          25.53% |
| FT           | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | 1_4_0            | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | 1_4_0            | percentile-90 |          20.00% |          50.00% |          65.00% |         100.00% |
| FT           | 1_4_0            | average       |           5.08% |          12.58% |          20.00% |          25.49% |
| FT           | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | reduce_proximity | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | reduce_proximity | percentile-90 |          10.00% |          45.00% |          60.00% |         100.00% |
| FT           | reduce_proximity | average       |           5.01% |          12.64% |          20.10% |          25.53% |
| LAT          | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | 1_4_0            | percentile-75 |           5.00% |          15.00% |          30.00% |          30.00% |
| LAT          | 1_4_0            | percentile-90 |          15.00% |          45.00% |          60.00% |          80.00% |
| LAT          | 1_4_0            | average       |           4.80% |          11.80% |          17.88% |          21.62% |
| LAT          | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | reduce_proximity | percentile-75 |           0.00% |          11.11% |          25.00% |          35.00% |
| LAT          | reduce_proximity | percentile-90 |          15.00% |          45.00% |          55.00% |          80.00% |
| LAT          | reduce_proximity | average       |           4.43% |          11.23% |          17.32% |          21.45% |

</details>

### Impact on Search time

| dataset_name | host_name        |      25.00% |      50.00% |      75.00% |     100.00% | Average     |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS         | 1_4_0            |        3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS         | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94         | 1_4_0            | 2.236666667 |        4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94         | reduce_proximity |        2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT           | 1_4_0            | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 |  10.0787919 |
| FT           | reduce_proximity |        4.51 | 5.981666667 | 7.701342282 | 6.766917293 |  6.23998156 |
| LAT          | 1_4_0            | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT          | reduce_proximity |        6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |

## Technical approach

- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4

## Related

TBD

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
ManyTheFish
27eec21415 Fix tests 2023-10-18 16:03:22 +02:00
Vivek Kumar
62cc97ba70
update tests to include created_at and updated-at in v2 dumps 2023-10-18 13:31:39 +05:30
Vivek Kumar
fed59cc1d5
extract created_at and updated_at dates from v2 dumps 2023-10-18 13:30:24 +05:30
bwbonanno
2b3adef796 Use index_scheduler from configured app_data in middleware 2023-10-17 08:17:13 -07:00
bwbonanno
956cfc5487 Add runtime check to metrics middleware 2023-10-16 13:48:57 -07:00