Commit Graph

  • a89aa9c4de support fetching data from Prospectus Blade He 2025-01-14 16:21:48 -0600
  • e230a5bf15 minor change Blade He 2025-01-09 12:19:24 -0600
  • 91c86bb983 update AUS Prospectus configuration Blade He 2025-01-08 17:40:57 -0600
  • 0a867dcf07 complete configuration for AUS Prospectus Blade He 2025-01-07 16:25:13 -0600
  • 201a809ffa comment remove_abundant_data function Blade He 2025-01-06 15:27:43 -0600
  • c335992ced update requirements.txt Blade He 2025-01-06 13:56:09 -0600
  • 9348e32caa support more performance fee keywords Blade He 2025-01-06 13:14:20 -0600
  • 65e752e25a implement merge_output_data function; whether to output in this format depends on confirmation with the data/developer teams Blade He 2024-12-18 09:19:55 -0600
  • 309bb714f6 fix issue when parsing data via Vision function. Blade He 2024-12-11 16:49:04 -0600
  • d673a99e21 switch back to extracting data directly from the image stream, instead of first extracting text from the image stream and then extracting data from that text. Reason: the quality of text extracted from the image stream is not good enough. Blade He 2024-12-10 16:17:47 -0600
  • f71e2968cc simplify code Blade He 2024-12-09 22:24:40 -0600
  • 75ea5e70de 1. support fetching data from pages with garbled text (messy code) via the ChatGPT4o Vision function 2. add multilingual share feature configuration Blade He 2024-12-09 17:47:42 -0600
  • d96f77fe00 Split share class names when multiple share classes appear on the same line Blade He 2024-12-06 16:31:42 -0600
  • d79b05885d optimize prompts for TOR Blade He 2024-12-06 14:50:34 -0600
  • a25991e2bb 1. Set TOR reported name priority 2. Optimize investment mapping logic Blade He 2024-12-06 09:54:43 -0600
  • 95c386911c Clean fund name after getting response from ChatGPT Blade He 2024-12-04 22:08:09 -0600
  • 70362b554f Fix issue in "The last fund name of previous PDF page" logic: if the current page fund name starts with "The last fund name of previous PDF page" and is followed by more content, remove "The last fund name of previous PDF page". Blade He 2024-12-04 16:57:52 -0600
  • 36fbaa946e Add a prefix statement when passing the last fund name of the previous PDF page: page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}" Blade He 2024-12-03 11:50:31 -0600
  • a11a99fdc3 1. Optimize instructions: do not fetch data with an "up to" statement. 2. Add exception handler in function. Blade He 2024-12-03 11:27:28 -0600
  • bc32860f87 remove_abundant_data Blade He 2024-12-02 17:16:56 -0600
  • c146497052 optimize share feature judgment logic: accumulation with capitalisation and institutional income with distribution Blade He 2024-12-02 13:11:49 -0600
  • 352886ade2 update instructions for TER, OGC, Performance Fees Blade He 2024-12-02 11:45:19 -0600
  • 276ff93a1d Optimize drilldown algorithm. (1) Share class names with currency. Reason: in the document, the currency is not next to the share name. Solution: if no relevant text can be found in the PDF page contents and the last word of the share class name is a currency, remove the currency from the share class name and try again. After implementing this solution, recall increased from 95% to 96%. (2) Relevant text not found in the current PDF page text. Reason: the text may be on the previous page. Solution: merge the previous page text into the current page text and search for the relevant value again. After implementing this solution, recall increased from 96% to 98%. Blade He 2024-11-26 16:35:07 -0600
  • a09778d9d1 Create EMEA AR API code file. Optimize annotation list for drilldown. Blade He 2024-11-26 11:24:29 -0600
  • fb356fce76 1. optimize drilldown algorithm 2. support calculating drilldown recall metrics Blade He 2024-11-25 15:11:03 -0600
  • 78fb283130 update python libraries Blade He 2024-11-25 11:11:02 -0600
  • fc80093557 optimize investment mapping Blade He 2024-11-22 14:54:52 -0600
  • f1c0290588 Optimize investment mapping algorithm. Blade He 2024-11-21 16:36:58 -0600
  • 5b9f9416de 1. Update mapping for multilingual share class names. 2. Optimize currency retrieval logic Blade He 2024-11-21 11:37:58 -0600
  • 843bbbd13f dynamically load instructions for multilingual documents. Blade He 2024-11-20 17:00:22 -0600
  • 067d89e0f9 Add datapoint_reportedname.json for dynamically loading reported names based on document language. Blade He 2024-11-19 16:49:15 -0600
  • 8223ca9a5c minor change Blade He 2024-11-18 16:13:24 -0600
  • a42c0b5c2b optimize fund retrieval instructions Blade He 2024-11-13 10:25:08 -0600
  • 7a41b03634 1. optimize instructions for fund name 2. optimize drilldown logic Blade He 2024-11-12 17:01:10 -0600
  • c2d2e54670 "total match" logic for single-word values: consider the "\n" character scenario Blade He 2024-11-12 11:40:19 -0600
  • 5b67bd332b optimize drilldown algorithm Blade He 2024-11-12 11:20:38 -0600
  • c6c3e99d3e integrate pdf drilldown logic to pdf_util.py Blade He 2024-11-11 16:34:25 -0600
  • c34e2e960e optimize drilldown algorithm Blade He 2024-11-08 15:00:34 -0600
  • 81f855f725 support drilldown data to PDF Blade He 2024-11-08 11:22:35 -0600
  • 0349033eaf add more statistics methods Blade He 2024-11-06 16:39:42 -0600
  • 81a424b00d Support replacing share class names in the database with more readable names. Example (document 532422720): M&G European Credit Investment Fund A CHFH Acc -> M&G European Credit Investment Fund A CHF H Accumulation Blade He 2024-11-05 11:14:56 -0600
  • 2645d528b1 support output data point reported name Blade He 2024-10-29 16:47:45 -0500
  • 9d453c9fae minor updates Blade He 2024-10-28 15:15:55 -0500
  • fa763f4f14 1. optimize instructions 2. optimize mapping algorithm Blade He 2024-10-24 16:24:21 -0500
  • 53dadf61f4 optimize keywords/instructions for special-case documents. Blade He 2024-10-23 16:56:43 -0500
  • 171f3b6d1f optimize for OGC data extraction. Blade He 2024-10-23 16:07:54 -0500
  • 03365227b9 optimize instructions Blade He 2024-10-21 11:04:53 -0500
  • 3f2bb38208 Resolve issue where the first records have a share class name but no fund name (the fund name is in the previous page text). Blade He 2024-10-16 16:55:32 -0500
  • f166e73362 optimize data extraction algorithm: if no numeric cost value can be found in the PDF page text, extract data via ChatGPT Vision Blade He 2024-10-15 15:57:54 -0500
  • 8b651f374c optimize instructions Blade He 2024-10-14 09:12:05 -0500
  • df66489c5f support the scenario where fund and share have the same name. Blade He 2024-10-11 13:14:04 -0500
  • 92a26cd262 optimize configuration Blade He 2024-10-11 12:16:34 -0500
  • 17284c74f0 optimize for investment mapping: share feature logic Blade He 2024-10-09 14:07:07 -0500
  • 04a2409c58 optimize investment mapping algorithm Blade He 2024-10-08 23:53:55 -0500
  • aa2c2332ae optimize for more cases Blade He 2024-10-08 17:16:01 -0500
  • 8bd6008425 refactor code Blade He 2024-10-07 10:34:13 -0500
  • b18c48efeb Minor change Blade He 2024-10-03 16:31:16 -0500
  • f0dd7f9e89 Handle cases with multiple share short names. Blade He 2024-10-02 17:25:25 -0500
  • edb90c718e Optimize mapping algorithm. Some share class names contain multiple short names, e.g. in "CPR Invest Global Disruptive Opportunities Class I sw EUR - Acc" the short names are I and sw. Support getting all short names from a share class name. Blade He 2024-10-02 15:08:26 -0500
  • 3bb13947af Optimize mapping algorithm: (1) if a fund/share name contains multiple currencies including USD, remove USD; (2) fix issue when splitting words without spaces; (3) if the share class name contains no currency, take the currency from the document mapping entry with the same fund name and the same short share class name. Blade He 2024-10-02 13:25:08 -0500
  • f06355e0c8 optimize mapping algorithm: check whether "-" is used to connect share names Blade He 2024-10-02 11:38:11 -0500
  • 035f028155 optimize mapping algorithm Blade He 2024-10-01 16:46:59 -0500
  • 3adbd7631a optimize mapping algorithm Blade He 2024-10-01 15:31:15 -0500
  • d92053a16e optimize mapping metrics algorithm Blade He 2024-10-01 12:19:45 -0500
  • 18174bf1cf optimize mapping: choose the proper candidate mapping list. Blade He 2024-10-01 11:35:29 -0500
  • 60a26377e5 optimize investment mapping algorithm Blade He 2024-09-30 16:32:56 -0500
  • 3aa596ea33 optimize mapping logic Blade He 2024-09-27 16:39:56 -0500
  • 39cd53dc33 support calculating mapping metrics based on document investment mapping in database Blade He 2024-09-27 13:20:50 -0500
  • 0c4c541319 optimize mapping algorithm, this is the fixed version to confirm mapping metrics Blade He 2024-09-27 09:25:11 -0500
  • 7eba9a52ae revert algorithm to the better-performing version Blade He 2024-09-26 19:25:17 -0500
  • d25bae936c Optimize investment mapping algorithm. Blade He 2024-09-26 12:18:37 -0500
  • 598e2ab820 investment mapping: optimize for currency logic Blade He 2024-09-25 17:28:22 -0500
  • dd6701f18c 1. optimize investment mapping algorithm 2. implement investment mapping metrics Blade He 2024-09-25 15:15:38 -0500
  • 0f14bf4a7a 1. get document/provider mapping data 2. optimize metrics algorithm 3. Expand max token length after switching ChatGPT4o to the 2024-08-06 version. Blade He 2024-09-23 17:21:02 -0500
  • 8496c7b5ed optimize instructions; optimize metrics algorithm Blade He 2024-09-20 16:46:44 -0500
  • 91530d6089 add more description for Performance Fees calculation rules Blade He 2024-09-20 11:58:48 -0500
  • 40bcce4404 instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, or NONE Blade He 2024-09-20 10:26:18 -0500
  • c4985ac75f optimize data extraction and metrics calculation algorithms Blade He 2024-09-19 22:45:08 -0500
  • 48dc8690c3 support extracting data from PDF page images Blade He 2024-09-19 16:29:26 -0500
  • 67371e534e only calculate metrics for intersection document list Blade He 2024-09-19 11:54:51 -0500
  • 27b3540c63 optimize metrics calculation algorithm Blade He 2024-09-19 11:44:17 -0500
  • 98e86a6cfd implement data extraction metrics calculation. Blade He 2024-09-18 17:10:54 -0500
  • 50e6c3c19d minor change Blade He 2024-09-16 16:43:03 -0500
  • 932870f406 support splitting text when the output exceeds 4K tokens. Blade He 2024-09-16 12:03:13 -0500
  • 0f6dbd27eb optimize instructions for performance fees. Blade He 2024-09-13 16:10:44 -0500
  • e17414173a update to get more precise results Blade He 2024-09-12 16:00:49 -0500
  • d56ac9482e Adjust for output example format Blade He 2024-09-11 09:24:36 -0500
  • 0887608719 support auto-mapping fund/ share by raw names. Blade He 2024-09-09 17:34:53 -0500
  • 878383a72c support extracting continuation page(s) so that next-page data without a table header is not missed. Blade He 2024-09-06 16:29:35 -0500
  • 1caf552065 support extracting data via ChatGPT4o. The instructions are generated dynamically. Blade He 2024-09-05 17:22:26 -0500
  • 7c83f9152a try to improve page filter precision Blade He 2024-09-04 17:01:12 -0500
  • 7198450e53 support calculating page filter metrics. Blade He 2024-09-03 17:07:53 -0500
  • f81e2862f3 update prompts to extract TOR, OGC, TER, Performance fees data. Blade He 2024-08-30 16:37:00 -0500
  • 63da030fe1 update general prompts Blade He 2024-08-29 17:05:58 -0500
  • 134b365b68 Try to generate general prompts for LUX English AR: - Support output of fund name, share name, TER, performance fees, OGC - Only output data points and values that can be found in the page text - Output fund-level and share-level data separately - List some special cases to cover as many cases as possible. Blade He 2024-08-28 16:44:19 -0500
  • 32676728f6 optimize prompts Blade He 2024-08-28 10:21:26 -0500
  • 15720d8bfd 1. All-in-one text-and-image chat function via ChatGPT4o 2. many experiments extracting data in two ways: page text or page image. Blade He 2024-08-26 17:17:39 -0500
  • 843f588015 support chatting with images via ChatGPT4o Blade He 2024-08-26 11:19:07 -0500
  • 6519dc23d4 support filter pages by data point keywords Blade He 2024-08-23 16:38:11 -0500
  • 993664cf78 add many data preparation functions. Blade He 2024-08-22 10:37:56 -0500
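
Commits 36fbaa946e and 70362b554f describe a round trip: the previous page's last fund name is prepended to the page text, and the prefix is later removed from a returned fund name when more content follows it. The sketch below reproduces the f-string quoted in 36fbaa946e; the function names are hypothetical, and returning a bare prefix unchanged is an assumption, not something the commit states.

```python
# Prefix string and f-string format taken from commit 36fbaa946e.
PREFIX = "The last fund name of previous PDF page"


def add_previous_fund_context(page_text: str, previous_page_fund_name: str) -> str:
    # Prepend the previous page's last fund name so share-class rows that
    # appear before any fund heading on this page can be attributed to it.
    return f"\n{PREFIX}: {previous_page_fund_name}\n{page_text}"


def strip_previous_page_prefix(fund_name: str) -> str:
    # Per commit 70362b554f: if the fund name starts with the prefix and more
    # content follows, drop the prefix. (Assumption: a bare prefix with
    # nothing after it is returned unchanged.)
    stripped = fund_name.strip()
    if stripped.startswith(PREFIX):
        remainder = stripped[len(PREFIX):].lstrip(" :\n")
        if remainder:
            return remainder
    return stripped
```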
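
Commit 276ff93a1d describes two drilldown fallbacks: drop a trailing currency from the share class name and retry, then search the previous page text merged with the current page. A minimal sketch under stated assumptions: the currency set, the function name, and the whitespace-normalized substring search are hypothetical stand-ins for whatever matching the project actually uses.

```python
# Hypothetical currency set; the commit does not list which codes it checks.
CURRENCIES = {"USD", "EUR", "GBP", "CHF", "JPY", "AUD", "SGD", "HKD"}


def _normalize(text: str) -> str:
    # Lower-case and collapse all whitespace (including "\n") to single spaces.
    return " ".join(text.lower().split())


def drilldown_share_name(share_name, page_text, previous_page_text=None):
    """Return (matched_name, used_previous_page), or None if nothing matches."""
    candidates = [share_name]
    words = share_name.split()
    # Fallback 1: drop a trailing currency code and retry.
    if len(words) > 1 and words[-1].upper() in CURRENCIES:
        candidates.append(" ".join(words[:-1]))

    # Fallback 2: search the previous page text merged with the current page.
    texts = [(page_text, False)]
    if previous_page_text:
        texts.append((previous_page_text + "\n" + page_text, True))

    for text, used_previous in texts:
        haystack = _normalize(text)
        for candidate in candidates:
            if _normalize(candidate) in haystack:
                return candidate, used_previous
    return None
```

The current page is tried with both candidates before the merged text, mirroring the order in which the commit message introduces the two fixes.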
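
Commit c2d2e54670 notes that "total match" for a single-word value must account for "\n". One plausible reading, sketched here as an assumption: a whole-token match where any whitespace, not just a space, counts as a boundary. Both function names are hypothetical; the naive version is included only to show the failure case.

```python
import re


def naive_total_match(value: str, text: str) -> bool:
    # Space-only boundaries: misses a value that is followed by "\n".
    return f" {value} " in f" {text} "


def total_match(value: str, text: str) -> bool:
    # Whole-token match: the value must be bounded by start/end of text or by
    # any whitespace character, so a trailing "\n" counts as a boundary.
    pattern = r"(?<!\S)" + re.escape(value) + r"(?!\S)"
    return re.search(pattern, text) is not None
```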
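
Commit 81a424b00d gives one example of making database share class names readable: "A CHFH Acc" becomes "A CHF H Accumulation". The sketch below handles that example by splitting a fused currency-plus-H token and expanding abbreviations; both lookup tables are hypothetical, and only the CHFH and Acc entries come from the commit.

```python
# Hypothetical lookup tables; the commit gives only the CHFH/Acc example.
CURRENCIES = {"CHF", "EUR", "GBP", "USD", "JPY", "SGD"}
ABBREVIATIONS = {"Acc": "Accumulation", "Inc": "Income", "Dist": "Distribution"}


def make_readable(share_name: str) -> str:
    tokens = []
    for token in share_name.split():
        # Split a fused "<currency>H" token (e.g. CHFH) into currency + "H".
        if len(token) == 4 and token.endswith("H") and token[:3] in CURRENCIES:
            tokens.extend([token[:3], "H"])
        else:
            # Expand known abbreviations; keep everything else unchanged.
            tokens.append(ABBREVIATIONS.get(token, token))
    return " ".join(tokens)
```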
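
Commit 3bb13947af states two currency rules for mapping: remove USD when a fund/share name contains several currencies, and, when the share class name has no currency, borrow it from the document mapping entry with the same fund name and short share class name. A sketch of both rules; the function names and the dict keys of the mapping entries are assumptions.

```python
def drop_usd_if_multiple(currencies):
    # With several currencies in a fund/share name, USD is removed.
    if len(currencies) > 1 and "USD" in currencies:
        return [c for c in currencies if c != "USD"]
    return list(currencies)


def fallback_currency(fund_name, short_share_name, document_mapping):
    # document_mapping: hypothetical list of dicts with keys
    # "fund_name", "short_share_name" and "currency".
    for entry in document_mapping:
        if (
            entry.get("fund_name") == fund_name
            and entry.get("short_share_name") == short_share_name
            and entry.get("currency")
        ):
            return entry["currency"]
    return None
```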
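
Commit 40bcce4404 handles placeholder values (-, *, **, N/A, N/A%, N/A %, NONE) through prompt instructions. As a different technique, the same rule can also be enforced as a post-processing filter on extracted records; the sketch below assumes records are dicts with a hypothetical "value" key and compares placeholders case-insensitively.

```python
PLACEHOLDERS = {"-", "*", "**", "N/A", "N/A%", "N/A %", "NONE"}


def is_reportable(value) -> bool:
    # Reject empty values and the placeholder markers listed in the commit
    # (compared case-insensitively, after trimming surrounding whitespace).
    if value is None:
        return False
    text = str(value).strip()
    return text != "" and text.upper() not in PLACEHOLDERS


def drop_placeholders(records, key="value"):
    # records: hypothetical list of dicts produced by the extraction step.
    return [r for r in records if is_reportable(r.get(key))]
```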